agent-capability-analyzer

Runs the description-drift experiment — spawns all Claude Code agents simultaneously to collect self-reported capabilities, then compares them against static frontmatter descriptions to reveal how reliable orchestrator routing based on descriptions actually is. Use when measuring description drift across the agent fleet, re-running the capability collection experiment, analyzing a specific agent's self-reported capabilities, or auditing whether frontmatter descriptions accurately reflect agent behavior.
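As a rough illustration of the comparison this description refers to, the sketch below scores how far an agent's self-reported capabilities diverge from its frontmatter description using token-level Jaccard distance. This is an assumption for illustration only: the skill's actual gap analysis is not shown on this page, and the names `tokenize` and `drift_score` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def drift_score(frontmatter_description: str, self_report: str) -> float:
    """Jaccard distance between the two token sets.

    0.0 means identical vocabularies, 1.0 means no shared tokens.
    Hypothetical metric; not the skill's own gap-analysis method.
    """
    described = tokenize(frontmatter_description)
    reported = tokenize(self_report)
    if not described and not reported:
        return 0.0  # two empty texts cannot drift from each other
    return 1.0 - len(described & reported) / len(described | reported)


score = drift_score(
    "reviews python code",
    "reviews python code and writes tests",
)
print(score)
```

A token-overlap measure like this only flags vocabulary differences; it cannot tell whether a missing capability matters for routing, which is why a categorized gap analysis remains necessary.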

72

Quality

88%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, actionable skill that clearly guides Claude through a multi-agent capability analysis experiment. Its main strengths are concrete executable commands, clear workflow sequencing with decision points, and a well-defined output format for gap analysis. Its weaknesses are moderate redundancy between the Mermaid diagram and textual explanations, and some inline content (gap analysis categories) that could be more concise or split out.

Suggestions

Remove the redundancy between the Mermaid diagram and the textual explanations that follow it — keep one or the other, not both saying the same thing.

Trim the Gap Analysis category explanations to just the category names and flags — Claude can infer 'possible causes' without being told.

Dimension | Reasoning | Score

Conciseness

The skill is reasonably efficient but includes some redundancy — the invocation mode section repeats information that appears again in the Full Experiment Workflow and Single-Agent Analysis sections. The Mermaid diagram, while useful, duplicates the textual explanations that follow it. The Gap Analysis category explanations include some unnecessary elaboration (e.g., explaining possible causes) that Claude could infer.

2 / 3

Actionability

The skill provides concrete, executable bash commands throughout — specific script invocations with exact flags, a one-liner for JSON lookup, clear template substitution instructions, and explicit tool usage patterns (Task spawning with subagent_type). The commands are copy-paste ready with clear placeholder substitution.

3 / 3

Workflow Clarity

Both the full experiment and single-agent workflows are clearly sequenced with numbered steps. The Mermaid flowchart provides decision logic for mode selection. Validation is present — checking script output ('Populated N agents, skipped M'), waiting for all Tasks to complete before dumping, and the gap analysis itself serves as a verification step. The branching logic for agents with/without Bash access is explicitly handled.

3 / 3

Progressive Disclosure

The skill references external resources (template, scripts) appropriately and has a Resources section. However, the content is long: the Mermaid diagram, two mode explanations (textual + diagram), and the full gap analysis methodology are all inline. The gap analysis output format and category definitions could be split into a referenced file. No bundle files were provided to verify that the referenced paths exist.

2 / 3

Total: 10 / 12

Passed
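The Workflow Clarity reasoning above mentions validating the script output line 'Populated N agents, skipped M'. A minimal sketch of such a check follows, assuming the line has exactly the shape quoted in the review; the real script's output may differ, and `parse_summary` is a hypothetical helper, not part of the skill.

```python
import re

# Assumed format, taken from the fragment quoted in the review:
# "Populated N agents, skipped M"
SUMMARY_RE = re.compile(r"Populated (\d+) agents?, skipped (\d+)")


def parse_summary(output: str) -> tuple[int, int]:
    """Return (populated, skipped) counts found in the script output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("summary line not found in script output")
    return int(match.group(1)), int(match.group(2))


populated, skipped = parse_summary("Populated 12 agents, skipped 3")
print(populated, skipped)
```

Raising on a missing summary line, rather than returning zeros, keeps a failed population step from being mistaken for an empty fleet.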

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly articulates a specific, niche capability with concrete actions and explicit trigger guidance. It uses third person voice throughout, provides multiple natural trigger terms, and has a well-structured 'Use when...' clause covering the key scenarios. The description is distinctive enough to avoid conflicts with other skills while being comprehensive enough for accurate routing.

Dimension | Reasoning | Score

Specificity

Lists multiple specific concrete actions: spawns Claude Code agents, collects self-reported capabilities, compares against static frontmatter descriptions, reveals routing reliability. These are detailed, concrete operations.

3 / 3

Completeness

Clearly answers both what (spawns agents, collects self-reported capabilities, compares against frontmatter descriptions) and when (explicit 'Use when...' clause covering measuring drift, re-running experiments, analyzing specific agents, auditing frontmatter accuracy).

3 / 3

Trigger Term Quality

Includes strong natural trigger terms: 'description drift', 'capability collection experiment', 'self-reported capabilities', 'frontmatter descriptions', 'orchestrator routing', 'agent fleet', 'auditing'. These are terms a user in this domain would naturally use.

3 / 3

Distinctiveness Conflict Risk

Highly distinctive niche — 'description-drift experiment' is a very specific concept unlikely to overlap with other skills. The combination of agent spawning, capability collection, and frontmatter comparison creates a unique fingerprint.

3 / 3

Total: 12 / 12

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository
Jamie-BitFlight/claude_skills
Reviewed
