
memory-quality-auditor skill

/.claude/skills/memory-quality-auditor

This skill analyzes memory retrieval quality, identifies drift and staleness, and generates a remediation backlog with actionable improvements.

npx playbooks add skill oimiragieo/agent-studio --skill memory-quality-auditor


SKILL.md
---
name: memory-quality-auditor
description: Audit memory retrieval quality (drift, staleness, citation-groundedness) and produce a remediation backlog.
version: 1.0.0
model: sonnet
invoked_by: both
user_invocable: true
tools: [Read, Write, Edit, Glob, Grep, Bash, Skill, MemoryRecord]
args: '--mode summary|full [--hours 24]'
error_handling: graceful
streaming: supported
---

# Memory Quality Auditor

Audit the memory system as a unified retrieval layer (STM/MTM/LTM files + index + spawn citation outcomes).

## Scope

- Retrieval drift signals
- Stale-memory ratio
- Evidence injection coverage
- Citation usage and groundedness continuity
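
The skill does not specify how the stale-memory ratio is calculated. As a minimal sketch, assume each memory entry carries a last-updated timestamp and that an entry counts as stale once it is older than a chosen window. The function name `stale_ratio`, the `max_age_hours` parameter, and the sample data are hypothetical and not part of the skill.

```python
from datetime import datetime, timedelta, timezone


def stale_ratio(last_updated, now, max_age_hours=24.0):
    """Return the fraction of entries last updated before the cutoff.

    Assumption: "stale" means older than max_age_hours. The skill itself
    does not define staleness, so this rule is illustrative only.
    """
    if not last_updated:
        return 0.0
    cutoff = now - timedelta(hours=max_age_hours)
    stale = sum(1 for ts in last_updated if ts < cutoff)
    return stale / len(last_updated)


# Hypothetical sample: four entries, two of them older than 24 hours.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
entries = [
    now - timedelta(hours=1),
    now - timedelta(hours=30),
    now - timedelta(hours=72),
    now - timedelta(hours=5),
]
ratio = stale_ratio(entries, now)
print(ratio)  # 0.5
```

Passing `now` explicitly instead of reading the clock inside the function keeps the metric reproducible between audit runs.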

## Workflow

1. Read memory artifacts and latest eval reports.
2. Compute quality metrics and threshold status.
3. Emit a remediation backlog with TDD checks.
4. Record findings in memory, along with an optional evolution recommendation.

Overview

This skill audits the quality of a memory retrieval layer across short-term, mid-term, and long-term stores. It identifies retrieval drift, measures staleness, evaluates citation-groundedness, and generates an actionable remediation backlog. Output is designed to feed into test-driven fixes and optional memory evolution recommendations.

How this skill works

The auditor ingests memory artifacts (STM/MTM/LTM files), index metadata, and the latest evaluation reports. It computes quality metrics such as drift signals, stale-memory ratio, evidence injection coverage, and citation continuity. It then compares metrics to configurable thresholds and emits a prioritized remediation backlog with TDD-style checks and suggested fixes. Findings are recorded back into memory and can include evolution recommendations for indexing or retention policies.
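
The comparison of metrics against configurable thresholds might look roughly like the sketch below. The metric names follow the scope list, but the `Threshold` type, the `threshold_status` function, and every numeric limit are assumptions made for illustration. The skill does not publish its actual threshold values or status format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    limit: float
    higher_is_better: bool  # True for coverage-style metrics, False for ratios to minimize


def threshold_status(metrics, thresholds):
    """Map each metric name to "pass" or "fail" against its threshold."""
    status = {}
    for name, value in metrics.items():
        t = thresholds[name]
        ok = value >= t.limit if t.higher_is_better else value <= t.limit
        status[name] = "pass" if ok else "fail"
    return status


# Hypothetical limits and measurements, not values defined by the skill.
THRESHOLDS = {
    "stale_memory_ratio": Threshold(limit=0.20, higher_is_better=False),
    "evidence_injection_coverage": Threshold(limit=0.90, higher_is_better=True),
    "citation_groundedness": Threshold(limit=0.80, higher_is_better=True),
}
metrics = {
    "stale_memory_ratio": 0.35,
    "evidence_injection_coverage": 0.95,
    "citation_groundedness": 0.80,
}
status = threshold_status(metrics, THRESHOLDS)
print(status)
```

Recording the direction of each metric next to its limit avoids a common mistake where a ratio that should be minimized is checked as if higher were better.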

When to use it

  • After model updates that may change retrieval behavior
  • When retrieval results become less relevant or inconsistent
  • Prior to deploying memory schema or indexing changes
  • During periodic quality assurance of memory systems
  • When citation or evidence grounding complaints increase

Best practices

  • Run audits regularly and after any retrieval-index change
  • Define clear thresholds for drift, staleness, and citation coverage
  • Prioritize backlog items by user impact and frequency of failing queries
  • Include reproducible TDD checks for each remediation item
  • Record audit results into memory to track regression over time

Example use cases

  • Detecting retrieval drift after a vector encoder update, then rolling back the encoder or recomputing the affected embeddings
  • Identifying stale facts in LTM and creating a refresh backlog for high-value entries
  • Measuring how often responses cite original evidence rather than hallucinate, and producing fixes to increase citation-groundedness
  • Validating that evidence injection coverage meets minimum requirements for critical user workflows
  • Generating prioritized TDD tasks after an integration that altered the index format
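
One way to approximate the citation-groundedness measurement from the use cases above is to count a response as grounded only when it cites at least one memory entry and every cited entry was actually retrieved for that query. This is a sketch under that assumption. The record fields `cited_ids` and `retrieved_ids` are hypothetical and do not reflect the skill's real log format.

```python
def groundedness_rate(responses):
    """Return the fraction of responses whose citations are all backed by retrieval.

    Assumption: a response with no citations, or with any citation that was
    not in its retrieved set, counts as ungrounded.
    """
    if not responses:
        return 0.0
    grounded = 0
    for response in responses:
        cited = set(response["cited_ids"])
        retrieved = set(response["retrieved_ids"])
        if cited and cited <= retrieved:
            grounded += 1
    return grounded / len(responses)


# Hypothetical spawn citation outcomes.
responses = [
    {"cited_ids": ["m1"], "retrieved_ids": ["m1", "m2"]},              # grounded
    {"cited_ids": [], "retrieved_ids": ["m3"]},                        # no citation
    {"cited_ids": ["m9"], "retrieved_ids": ["m4"]},                    # cites unretrieved entry
    {"cited_ids": ["m5", "m6"], "retrieved_ids": ["m5", "m6", "m7"]},  # grounded
]
rate = groundedness_rate(responses)
print(rate)  # 0.5
```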

FAQ

What inputs are required for the audit?

Memory artifact files for STM/MTM/LTM, index metadata, and recent evaluation reports or query logs.

How are remediation priorities determined?

By combining metric severity, frequency of affected queries, and user-impact weighting to create a prioritized backlog.
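
The answer above does not say how the three factors are combined. A minimal sketch, assuming a simple multiplicative score, is shown below. The `BacklogItem` type, its field scales, and the sample items are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    title: str
    severity: float       # 0..1, how far the metric is past its threshold
    frequency: float      # 0..1, fraction of queries affected
    impact_weight: float  # relative user-impact weighting

    @property
    def priority(self):
        # Assumption: multiply the factors; the skill may combine them differently.
        return self.severity * self.frequency * self.impact_weight


# Hypothetical remediation items.
items = [
    BacklogItem("Refresh stale LTM entries", 0.5, 0.25, 2.0),
    BacklogItem("Fix missing citations", 1.0, 0.5, 1.0),
    BacklogItem("Re-embed drifted index shard", 0.5, 0.5, 0.5),
]
backlog = sorted(items, key=lambda item: item.priority, reverse=True)
titles = [item.title for item in backlog]
print(titles)
```

A multiplicative score pushes an item to the bottom whenever any one factor is near zero. A weighted sum would be the alternative if a severe but rare failure should still rank highly.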