Use when the user wants to review, audit, or check safety for an AI memory system, agent learning pipeline, prompt-tuning workflow, skill builder, trace-mining tool, or eval/feedback loop. Produces an evidence-led audit report with learning-loop map, evidence inventory, maturity scorecard, severity-ranked findings, privacy/provenance gaps, counterfactual/eval coverage, and Stabilize/Standardize/Scale roadmap.
Audit LLM learning loops from raw traces to promoted artifacts: privacy, provenance, eval gates, rollback, stale behavior, overfitting, context poisoning, and regression risk. Load only the needed reference file.
| Reference | Contains |
|---|---|
| evidence-inventory.md | Evidence table and status labels |
| audit-domains.md | Domain checks |
| generated-skill-checks.md | Generated executable skill checks |
| findings-and-roadmap.md | Severity and roadmap rules |
| report-template-and-guardrails.md | Report skeleton and guardrails |
Start from evidence, not architecture claims.
```bash
rg --files | rg '(memory|rule|skill|prompt|eval|trace|span|session|feedback|artifact|registry|policy|retention|provenance|redact|pii|secret)'
rg -n "memory|rule|skill|prompt|eval|trace|span|session|feedback|artifact|registry|rollback|canary|retention|redact|PII|secret|provenance|counterfactual" .
# --hidden is needed to list .github/workflows; -g '!.git' keeps .git internals out of the listing.
rg --files --hidden -g '!.git' | rg '(^|/)(README|docs|design|architecture|evals|traces|skills|prompts|policies|\.github/workflows)'
```

Before writing findings, produce a factual brief covering: the runtime/approval boundary; evidence and outcome signals; promoted artifacts; promotion, canary, deprecation, and rollback paths; and storage, privacy, tool-permission, and tenancy risks.
If artifacts are unavailable, label the gap. Do not imply traces, dashboards, stores, datasets, policies, or code were reviewed when they were not provided.
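To make gap labeling mechanical rather than impressionistic, a minimal sketch can count candidate file paths per evidence category and print an explicit gap label for any category with no candidates. The category names, path patterns, and the `inventory_gaps` helper are illustrative assumptions, not part of the referenced files. It uses `find` and `grep` instead of `rg` so it runs where ripgrep is not installed, and it matches paths only, so a hit means "candidate to review", not "reviewed".

```bash
#!/usr/bin/env bash
# Minimal sketch: per evidence category, count file paths that look relevant
# and label categories with no candidates as gaps. Category names and patterns
# are illustrative assumptions; inventory_gaps is a hypothetical helper.

inventory_gaps() {
  local root="${1:-.}"
  local entry category pattern count
  for entry in \
    'traces:trace|span|session' \
    'evals:eval|counterfactual' \
    'memory:memory|rule|skill|prompt' \
    'provenance:provenance|registry|artifact' \
    'privacy:retention|redact|pii|secret'; do
    category="${entry%%:*}"   # text before the first colon
    pattern="${entry#*:}"     # text after the first colon
    # Match relative file paths only (not file contents), skipping .git.
    count=$(cd "$root" && find . -path ./.git -prune -o -type f -print \
      | grep -Eic "$pattern" || true)
    if [ "${count:-0}" -eq 0 ]; then
      printf '%s\tGAP: not provided, not reviewed\n' "$category"
    else
      printf '%s\t%s candidate file(s)\n' "$category" "$count"
    fi
  done
}
```

Run it as `inventory_gaps path/to/repo` and copy any `GAP` lines into the evidence inventory verbatim, so the report never implies those artifacts were reviewed.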
| Phase | Action | Detail |
|---|---|---|
| 1 | Scope the learning loop | Build the brief from direct evidence |
| 2 | Inventory evidence | Use evidence-inventory.md |
| 3 | Score maturity | Score each relevant area 0-4 |
| 4 | Audit domains | Apply audit-domains.md |
| 5 | Prioritize findings | Use findings-and-roadmap.md |
| 6 | Produce report | Use report-template-and-guardrails.md |
Scale: 0 absent; 1 ad hoc/local; 2 incomplete; 3 owned, versioned, gated; 4 measured, privacy-aware, regression-tested, reviewable. Never average scores.
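The scale is a fixed set of five integer levels, so a fractional value such as 2.5 can only come from averaging. A minimal sketch of that rule, using a hypothetical `score_label` helper, maps each level to its label and rejects anything else.

```bash
#!/usr/bin/env bash
# Sketch: map a 0-4 maturity score to its scale label; reject any other value,
# including fractional results of averaging. score_label is hypothetical.

score_label() {
  case "$1" in
    0) echo "absent" ;;
    1) echo "ad hoc/local" ;;
    2) echo "incomplete" ;;
    3) echo "owned, versioned, gated" ;;
    4) echo "measured, privacy-aware, regression-tested, reviewable" ;;
    *) echo "invalid score '$1': expected an integer 0-4 (do not average)" >&2
       return 1 ;;
  esac
}
```

For example, `score_label 3` prints `owned, versioned, gated`, while `score_label 2.5` prints an error and returns a non-zero status.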
not clean learning evidence.

Use these headings exactly when producing the final audit:

```markdown
## Executive Summary
## Evidence Reviewed
## Architecture and Learning Loop
## Maturity Scorecard
## Critical Findings
## High Findings
## Medium Findings
## Low Findings
## Prioritized Roadmap
## Open Questions
```

Scorecard row format:

```markdown
| Domain | Score (0-4) | Evidence | Rationale |
```
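Before delivering the report, a lint pass can confirm the two format rules above. This is a minimal sketch, assuming the report is a Markdown file whose scorecard table sits directly under `## Maturity Scorecard`; `check_report` is a hypothetical helper name. It requires the level-2 headings to match the list exactly and in order, and each scorecard data row to carry a single integer from 0 to 4.

```bash
#!/usr/bin/env bash
# Sketch: lint a finished audit report. Assumes Markdown with the scorecard
# table under "## Maturity Scorecard"; check_report is a hypothetical helper.

REQUIRED_HEADINGS=(
  "## Executive Summary"
  "## Evidence Reviewed"
  "## Architecture and Learning Loop"
  "## Maturity Scorecard"
  "## Critical Findings"
  "## High Findings"
  "## Medium Findings"
  "## Low Findings"
  "## Prioritized Roadmap"
  "## Open Questions"
)

check_report() {
  local report="$1" status=0 expected actual
  # Level-2 headings must match the required list exactly, in order, no extras.
  expected=$(printf '%s\n' "${REQUIRED_HEADINGS[@]}")
  actual=$(grep -E '^## ' "$report" || true)
  if [ "$expected" != "$actual" ]; then
    echo "headings do not match the required list" >&2
    status=1
  fi
  # Scorecard data rows must hold a single integer 0-4 in the Score column.
  awk -F'|' '
    /^## / { in_score = ($0 == "## Maturity Scorecard"); next }
    in_score && /^\|/ {
      score = $3
      gsub(/[ \t]/, "", score)
      if (score ~ /^Score/ || score ~ /^:?-+:?$/) next   # header or separator row
      if (score !~ /^[0-4]$/) { print "bad score row: " $0 > "/dev/stderr"; bad = 1 }
    }
    END { exit bad }
  ' "$report" || status=1
  return "$status"
}
```

The heading check is deliberately strict: an extra `##` heading fails it too, which matches "use these headings exactly". Headings inside fenced code blocks in the report would also be counted, a limitation of this sketch.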
Lead with findings ordered by severity. Every finding must include this block:
- Severity:
- Evidence checked: include `path:line` for local file evidence when available
- Impact:
- Affected learning artifacts or runtime surfaces:
- Recommended fix:
- Owner/function:
- Sequencing dependency:

Use findings-and-roadmap.md for severity classification and roadmap sequencing.
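A missing field in a finding block is easy to overlook in review, so it can also be checked mechanically. This is a minimal sketch, assuming each finding is written as a bullet block starting with `- Severity:` and that findings are separated by blank lines; `check_findings` is a hypothetical helper. It uses awk paragraph mode (`RS = ""`), so two findings not separated by a blank line would be read as one block.

```bash
#!/usr/bin/env bash
# Sketch: verify that every finding block lists all seven required fields.
# Assumes blank-line-separated blocks that start with "- Severity:";
# check_findings is a hypothetical helper.

REQUIRED_FIELDS=(
  "Severity:"
  "Evidence checked:"
  "Impact:"
  "Affected learning artifacts or runtime surfaces:"
  "Recommended fix:"
  "Owner/function:"
  "Sequencing dependency:"
)

check_findings() {
  local IFS=';'                          # join field names with ';' for awk
  local fields="${REQUIRED_FIELDS[*]}"
  awk -v fields="$fields" '
    BEGIN { RS = ""; n = split(fields, req, ";") }   # one record per paragraph
    index($0, "- Severity:") {
      f++
      for (i = 1; i <= n; i++)
        if (index($0, "- " req[i]) == 0) {
          printf "finding %d is missing \"%s\"\n", f, req[i] > "/dev/stderr"
          bad = 1
        }
    }
    END { exit bad }
  ' "$1"
}
```

Running `check_findings report.md` prints one line per missing field and returns non-zero if any finding is incomplete.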