Runs the description-drift experiment — spawns all Claude Code agents simultaneously to collect self-reported capabilities, then compares them against static frontmatter descriptions to reveal how reliable orchestrator routing based on descriptions actually is. Use when measuring description drift across the agent fleet, re-running the capability collection experiment, analyzing a specific agent's self-reported capabilities, or auditing whether frontmatter descriptions accurately reflect agent behavior.
72
88%
Does it follow best practices?
Impact: not scored. No eval scenarios have been run.
Passed
No known issues
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly articulates a specific, niche capability with concrete actions and explicit trigger conditions. It uses third-person voice correctly, provides a comprehensive 'Use when...' clause with multiple natural trigger scenarios, and occupies a highly distinctive domain, which minimizes the risk of conflict with other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: spawns Claude Code agents, collects self-reported capabilities, compares against static frontmatter descriptions, reveals routing reliability. These are detailed, concrete operations. | 3 / 3 |
| Completeness | Clearly answers both what (spawns agents, collects self-reported capabilities, compares against frontmatter descriptions) and when (explicit 'Use when...' clause with four specific trigger scenarios: measuring description drift, re-running experiments, analyzing specific agent capabilities, auditing frontmatter accuracy). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms: 'description drift', 'capability collection experiment', 'self-reported capabilities', 'frontmatter descriptions', 'orchestrator routing', 'agent fleet', 'auditing'. These are terms a user in this domain would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: 'description-drift experiment' is a very specific concept unlikely to overlap with other skills. The combination of agent spawning, capability collection, and frontmatter comparison creates a unique fingerprint. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, actionable skill with clear workflows and concrete commands. Its main weakness is moderate verbosity — the invocation mode section duplicates information between the flowchart and prose, and the gap analysis categories include explanatory context that could be trimmed. The skill effectively handles branching logic (single vs multi-agent, bash vs no-bash) and provides a clear output format for analysis results.
Suggestions
Remove the prose re-explanation of single-agent and multi-agent modes below the Mermaid flowchart — the flowchart already captures the logic clearly, and the duplication adds ~20 lines of redundant content.
Trim the gap analysis category descriptions to just the category name and flag label, removing the 'possible causes' explanations (e.g., 'description is aspirational, the agent's training doesn't surface that behavior...') which Claude can infer.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is mostly efficient but includes some unnecessary explanation. The 'Overview' section explains what description drift is and why it matters, concepts Claude can infer. The invocation mode section duplicates information between the flowchart and the prose explanations below it. The gap analysis categories include explanatory 'possible causes' text that adds bulk without actionable value. | 2 / 3 |
| Actionability | The skill provides fully executable bash commands, specific script paths, concrete JSON output formats, and a clear gap analysis output template. Commands are copy-paste ready with clear variable substitution points (AGENT_ID_HERE, AGENT_NAME). The workflow steps are specific and concrete. | 3 / 3 |
| Workflow Clarity | The full experiment workflow is clearly sequenced with numbered steps, explicit handling of edge cases (agents without Bash access), and a clear decision flowchart for invocation mode selection. The workflow includes validation checkpoints (waiting for all Tasks to complete before dumping, checking for agent-map.json existence) and handles branching paths explicitly. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear sections (Overview, Invocation Mode, Setup, Full Workflow, Single-Agent, Gap Analysis, Resources), but the gap analysis section is quite lengthy and could be split into a separate reference file. The Mermaid flowchart followed by prose re-explanation of the same logic inflates the main file. References to external files (template, scripts) are clearly signaled in the Resources section. | 2 / 3 |
| Total | | 10 / 12 Passed |
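The rubric total above can be checked with a short tally. This is a minimal sketch: the names `implementation_scores` and `tally` are illustrative, not part of the review tooling, and the 3-point maximum per dimension is read off the table. Note that the plain ratio (10 / 12) is not the same as the 77% headline figure, so the page presumably applies a weighting that is not shown here.

```python
# Dimension scores copied from the Implementation table (each out of 3).
implementation_scores = {
    "Conciseness": 2,
    "Actionability": 3,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 2,
}

MAX_PER_DIMENSION = 3  # assumption: every dimension is scored out of 3


def tally(scores, max_per_dimension=MAX_PER_DIMENSION):
    """Return (points earned, points available) for a rubric table."""
    total = sum(scores.values())
    maximum = max_per_dimension * len(scores)
    return total, maximum


total, maximum = tally(implementation_scores)
print(f"{total} / {maximum}")  # prints "10 / 12"
```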
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
4e61312