Use when the user wants to review, audit, or check safety for an AI memory system, agent learning pipeline, prompt-tuning workflow, skill builder, trace-mining tool, or eval/feedback loop. Produces an evidence-led audit report with learning-loop map, evidence inventory, maturity scorecard, severity-ranked findings, privacy/provenance gaps, counterfactual/eval coverage, and Stabilize/Standardize/Scale roadmap.
100
100%
Does it follow best practices?
Impact
100%
1.28x
Average score across 3 eval scenarios
Passed
No known issues
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description that clearly defines both what the skill produces and when it should be triggered. It uses third-person voice correctly, lists concrete deliverables, and provides a comprehensive set of natural trigger terms covering the specific domain of AI learning system auditing. The description is well-structured with a clear 'Use when' clause followed by specific outputs.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete outputs: 'learning-loop map, evidence inventory, maturity scorecard, severity-ranked findings, privacy/provenance gaps, counterfactual/eval coverage, and Stabilize/Standardize/Scale roadmap.' These are detailed, concrete deliverables. | 3 / 3 |
| Completeness | Explicitly answers both 'when' ('Use when the user wants to review, audit, or check safety for an AI memory system...') and 'what' ('Produces an evidence-led audit report with learning-loop map, evidence inventory, maturity scorecard...'). The 'Use when' clause is present and detailed. | 3 / 3 |
| Trigger Term Quality | Includes a rich set of natural trigger terms: 'review', 'audit', 'check safety', 'AI memory system', 'agent learning pipeline', 'prompt-tuning workflow', 'skill builder', 'trace-mining tool', 'eval/feedback loop'. These cover many natural ways a user might describe the need for this skill. | 3 / 3 |
| Distinctiveness Conflict Risk | Targets a very specific niche (auditing AI memory/learning systems) with distinct triggers like 'AI memory system', 'agent learning pipeline', 'trace-mining tool'. This is unlikely to conflict with general coding, document, or data analysis skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
100%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a high-quality skill that efficiently communicates a complex audit methodology. It excels at progressive disclosure by keeping the main file focused on workflow and critical decision points while delegating detailed reference material to well-organized supporting files. The combination of executable discovery commands, structured finding contracts, explicit maturity scoring anchors, and a clear phased workflow makes this immediately actionable for auditing LLM learning systems.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is lean and efficient. It avoids explaining what LLMs, learning loops, or auditing are. Every section delivers specific instructions, patterns, or constraints without padding. The table-based workflow and reference structure minimize token usage while maximizing information density. | 3 / 3 |
| Actionability | Provides executable grep commands for initial evidence gathering, exact report heading structure, precise finding block format with required fields, concrete maturity scoring scale (0-4 with anchors), and specific audit anchors with actionable criteria (e.g., 'post-hoc masking is not privacy safety if raw logs are retained'). The guidance is specific and directly usable. | 3 / 3 |
| Workflow Clarity | The 6-phase workflow table is clearly sequenced with each phase linked to its reference file. The 'First Actions' section establishes an evidence-first approach before findings. Critical audit anchors serve as validation checkpoints (e.g., requiring counterexamples before rule promotion, requiring held-out validation plus separate promotion gate). The finding contract enforces structured output with severity ordering. | 3 / 3 |
| Progressive Disclosure | Excellent progressive disclosure with a clear reference table at the top linking to 5 separate reference files, each with a concise description of its contents. The body contains only the overview, workflow, and critical anchors, while detailed domain checks, evidence tables, severity rules, and report templates are appropriately delegated to reference files. All references are one level deep and clearly signaled. | 3 / 3 |
| Total | | 12 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 11 / 11 Passed
Validation for skill structure
No warnings or errors.