Name: sharaf/llm-learning-system-auditor
Rating: 100 (1 review)
Author: sharaf

Use when the user wants to review, audit, or check safety for an AI memory system, agent learning pipeline, prompt-tuning workflow, skill builder, trace-mining tool, or eval/feedback loop. Produces an evidence-led audit report with learning-loop map, evidence inventory, maturity scorecard, severity-ranked findings, privacy/provenance gaps, counterfactual/eval coverage, and Stabilize/Standardize/Scale roadmap.

Quality: 100%
Does it follow best practices?

Impact: 100% (1.28x)
Average score across 3 eval scenarios

Security: Passed
No known issues

Quality

Content

100%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a high-quality skill that efficiently communicates a complex audit methodology. It excels at progressive disclosure by keeping the main file focused on workflow and critical decision points while delegating detailed reference material to well-organized supporting files. The combination of executable discovery commands, structured finding contracts, explicit maturity scoring anchors, and a clear phased workflow makes this immediately actionable for auditing LLM learning systems.

Dimension	Reasoning	Score
Conciseness	The content is lean and efficient. It avoids explaining what LLMs, learning loops, or auditing are. Every section delivers specific instructions, patterns, or constraints without padding. The table-based workflow and reference structure minimize token usage while maximizing information density.	3 / 3
Actionability	Provides executable grep commands for initial evidence gathering, exact report heading structure, precise finding block format with required fields, concrete maturity scoring scale (0-4 with anchors), and specific audit anchors with actionable criteria (e.g., 'post-hoc masking is not privacy safety if raw logs are retained'). The guidance is specific and directly usable.	3 / 3
Workflow Clarity	The 6-phase workflow table is clearly sequenced with each phase linked to its reference file. The 'First Actions' section establishes an evidence-first approach before findings. Critical audit anchors serve as validation checkpoints (e.g., requiring counterexamples before rule promotion, requiring held-out validation plus separate promotion gate). The finding contract enforces structured output with severity ordering.	3 / 3
Progressive Disclosure	Excellent progressive disclosure with a clear reference table at the top linking to 5 separate reference files, each with a concise description of its contents. The body contains only the overview, workflow, and critical anchors, while detailed domain checks, evidence tables, severity rules, and report templates are appropriately delegated to reference files. All references are one level deep and clearly signaled.	3 / 3
Total		12 / 12 (Passed)
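
The Actionability and Workflow Clarity rows refer to a finding block with required fields, severity ordering, and a 0-4 maturity scale. The skill's actual field names and severity labels are not reproduced in this review, so the following is only a minimal sketch under assumptions: `Finding`, `Severity`, `validate_maturity`, `rank_findings`, and the sample evidence paths are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity levels; the skill's own labels may differ.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    # Field names are assumptions, not the skill's real finding contract.
    title: str
    severity: Severity
    evidence: str        # pointer to supporting evidence (illustrative path)
    recommendation: str


def validate_maturity(score: int) -> int:
    """Accept only integer scores on the 0-4 scale mentioned in the review."""
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 4:
        raise ValueError(f"maturity score must be an integer from 0 to 4, got {score!r}")
    return score


def rank_findings(findings: list[Finding]) -> list[Finding]:
    """Order findings from most to least severe; ties keep their input order."""
    return sorted(findings, key=lambda f: f.severity, reverse=True)


# Illustrative data only; none of these findings come from a real audit.
findings = [
    Finding("Raw logs retained after masking", Severity.HIGH,
            "logs/raw/", "Purge or encrypt raw logs"),
    Finding("No held-out validation before promotion", Severity.CRITICAL,
            "eval/config", "Add a held-out split and a separate promotion gate"),
    Finding("Memory entries lack provenance", Severity.MEDIUM,
            "memory/store", "Record a source for every entry"),
]
ranked = rank_findings(findings)
```

Using an integer-backed enum keeps the severity ordering explicit and sortable, and rejecting out-of-range maturity scores early avoids a scorecard that silently mixes scales.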

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong description that clearly defines both what the skill produces and when it should be triggered. It uses third-person voice correctly, lists concrete deliverables, and provides a comprehensive set of natural trigger terms covering the specific domain of AI learning system auditing. The description is well-structured with a clear 'Use when' clause followed by specific outputs.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete outputs: 'learning-loop map, evidence inventory, maturity scorecard, severity-ranked findings, privacy/provenance gaps, counterfactual/eval coverage, and Stabilize/Standardize/Scale roadmap.' These are detailed, concrete deliverables.	3 / 3
Completeness	Explicitly answers both 'when' ('Use when the user wants to review, audit, or check safety for an AI memory system...') and 'what' ('Produces an evidence-led audit report with learning-loop map, evidence inventory, maturity scorecard...'). The 'Use when' clause is present and detailed.	3 / 3
Trigger Term Quality	Includes a rich set of natural trigger terms: 'review', 'audit', 'check safety', 'AI memory system', 'agent learning pipeline', 'prompt-tuning workflow', 'skill builder', 'trace-mining tool', 'eval/feedback loop'. These cover many natural ways a user might describe the need for this skill.	3 / 3
Distinctiveness / Conflict Risk	Targets a very specific niche (auditing AI memory/learning systems) with distinct triggers like 'AI memory system', 'agent learning pipeline', 'trace-mining tool'. The skill is unlikely to conflict with general coding, document, or data analysis skills.	3 / 3
Total		12 / 12 (Passed)

Validation

100%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Reviewed

2 months ago
