
audit-agents-skills

Audit Claude Code agents, skills, and commands for quality and production readiness. Use when evaluating skill quality, checking production readiness scores, or comparing agents against best-practice templates.

47

Quality

51%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./examples/skills/audit-agents-skills/SKILL.md

Quality

Content

27%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill reads more like a product specification or design document than an actionable skill for Claude. It is extremely verbose, spending significant tokens on industry justification, methodology rationale, and comparison tables that don't help Claude execute the audit. The core actionable content (scoring criteria, detection patterns, workflow steps) is buried in explanatory prose and could be condensed to roughly 25% of the current length while improving clarity.

Suggestions

Cut the 'Industry Context', 'Scoring Philosophy' rationale, 'Comparison: Command vs Skill', and 'Changelog' sections entirely — these don't help Claude execute the audit and waste ~150 lines of context.

Move the full scoring criteria definitions and detection patterns into actual bundle files (e.g., `scoring/criteria.yaml`, `detection/patterns.py`) and reference them, rather than inlining partial examples that aren't complete enough to use.

Add explicit validation checkpoints in the workflow: e.g., 'Verify at least 1 file found before proceeding to scoring' and 'Validate YAML parse succeeded before scoring criteria'.

Condense the main SKILL.md to a concise overview (~80-100 lines) covering: scan paths, scoring formula, grade thresholds, output format, and pointers to supporting files for criteria details and detection logic.
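
The two validation checkpoints suggested above could look roughly like the sketch below. It is an illustration only, not the skill's actual implementation: `AuditError`, `discover_files`, `parse_frontmatter`, and `run_audit` are hypothetical names, scanning for `*.md` files is an assumption about the scan paths, and the flat `key: value` parser is a stand-in for a real YAML parser.

```python
from pathlib import Path


class AuditError(Exception):
    """Raised when a workflow checkpoint fails (hypothetical name)."""


def discover_files(root: Path) -> list[Path]:
    """Checkpoint 1: require at least one Markdown file before scoring."""
    files = sorted(root.rglob("*.md"))  # assumption: skills live in *.md files
    if not files:
        raise AuditError(f"no files found under {root}; nothing to score")
    return files


def parse_frontmatter(text: str) -> dict[str, str]:
    """Checkpoint 2: fail loudly if the frontmatter cannot be parsed.

    Assumption: frontmatter is a flat block of `key: value` lines between
    two `---` markers. A real audit would use a full YAML parser here.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise AuditError("missing opening '---' frontmatter marker")
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise AuditError("missing closing '---' frontmatter marker") from None
    fields = {}
    for line in lines[1:end]:
        if not line.strip():
            continue  # skip blank lines inside the frontmatter block
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            raise AuditError(f"unparseable frontmatter line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields


def run_audit(root: Path) -> dict[str, dict[str, str]]:
    """Run discovery, then parse every file; scoring would start only after both pass."""
    parsed = {}
    for path in discover_files(root):
        parsed[str(path)] = parse_frontmatter(path.read_text(encoding="utf-8"))
    return parsed
```

Raising a dedicated exception at each checkpoint stops the audit before any scores are produced from missing or malformed input, which is the failure mode the suggestion targets.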

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Extremely verbose at ~400+ lines. Explains concepts Claude already knows (what PDF libraries do, what Jaccard similarity is, what CI/CD is). Includes extensive industry context, changelog, comparison tables, and methodology justifications that don't help Claude execute the task. The 'Industry Context' section alone is pure padding. | 1 / 3 |
| Actionability | Provides some concrete code snippets (frontmatter parsing, keyword detection, token counting) and JSON output schemas, but much of the guidance is descriptive rather than executable. The actual audit workflow relies on references to `scoring/criteria.yaml` which isn't provided, and the 'Usage' section shows hypothetical CLI invocations rather than concrete implementation steps Claude can follow. | 2 / 3 |
| Workflow Clarity | The 5-phase workflow (Discovery → Scoring → Comparative → Report → Fix Suggestions) is clearly sequenced, but lacks validation checkpoints between phases. There's no explicit verification that discovery found files correctly before scoring, no validation that scores are consistent, and no error recovery if files can't be parsed. The workflow reads more like a design document than operational instructions. | 2 / 3 |
| Progressive Disclosure | Monolithic wall of text with everything inline. References `scoring/criteria.yaml` and various example directories that don't exist in the bundle. No bundle files are provided, so all the referenced paths are broken. The content that could be split into separate files (detection patterns, scoring criteria, output examples, CI/CD integration) is all crammed into one massive document. | 1 / 3 |
| Total | | 6 / 12 (Passed) |

Description

75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a solid description that clearly communicates both what the skill does and when to use it, with an explicit 'Use when...' clause. Its main weakness is that the capability descriptions are somewhat high-level and could benefit from more concrete action verbs, and the trigger terms could include more natural user phrasings. Overall it performs well for skill selection purposes.

Suggestions

Add more specific concrete actions, e.g., 'Scores skill descriptions, validates YAML frontmatter, checks for missing fields, and generates improvement recommendations.'

Expand trigger terms with natural user phrasings like 'review my skill', 'validate agent', 'lint skills', 'check skill quality', or 'is my agent production-ready'.
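
As a rough illustration of the trigger-term suggestion, the sketch below checks the skill's current description for a 'Use when' clause and for a few of the phrasings suggested above. The function name `review_description` is hypothetical, and plain substring matching is an assumption made for illustration; the review does not say how trigger terms are actually scored.

```python
def review_description(description: str, trigger_phrases: list[str]) -> dict:
    """Report whether a 'Use when' clause exists and which phrases are absent."""
    lowered = description.lower()
    return {
        "has_use_when": "use when" in lowered,
        "missing_triggers": [p for p in trigger_phrases if p.lower() not in lowered],
    }


# The skill's description as quoted at the top of this review.
DESCRIPTION = (
    "Audit Claude Code agents, skills, and commands for quality and production "
    "readiness. Use when evaluating skill quality, checking production readiness "
    "scores, or comparing agents against best-practice templates."
)

# A few of the natural phrasings suggested above.
SUGGESTED = ["review my skill", "validate agent", "lint skills", "check skill quality"]

report = review_description(DESCRIPTION, SUGGESTED)
print(report)
```

Under this crude check the description has its 'Use when' clause but contains none of the four suggested phrasings verbatim, which matches the review's point that common user wordings are missing.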

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (Claude Code agents, skills, commands) and some actions (audit, evaluate quality, check production readiness scores, compare against templates), but the actions are somewhat high-level rather than listing multiple concrete operations like 'lint skill files, validate YAML frontmatter, score description quality'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (audit Claude Code agents, skills, and commands for quality and production readiness) and 'when' (Use when evaluating skill quality, checking production readiness scores, or comparing agents against best-practice templates) with an explicit 'Use when...' clause. | 3 / 3 |
| Trigger Term Quality | Includes relevant terms like 'audit', 'skill quality', 'production readiness', 'agents', 'skills', 'commands', and 'best-practice templates', but misses common variations users might say such as 'review my skill', 'is my agent ready', 'skill lint', 'validate skill', or 'skill score'. | 2 / 3 |
| Distinctiveness Conflict Risk | The niche is very specific (auditing Claude Code agents/skills/commands for production readiness), which is unlikely to conflict with other skills. The combination of 'audit', 'production readiness scores', and 'best-practice templates' creates a distinct trigger profile. | 3 / 3 |
| Total | | 10 / 12 (Passed) |

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| skill_md_line_count | SKILL.md is long (548 lines); consider splitting into references/ and linking | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 (Passed) |
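
For readers who want to reproduce the two warnings locally, a minimal sketch of equivalent checks follows. It is not Tessl's validator: `validate_skill` is a hypothetical name, and both `MAX_LINES` and `ALLOWED_KEYS` are assumptions, since the review states neither the line threshold nor the set of recognised frontmatter keys.

```python
MAX_LINES = 500  # assumption: the real threshold is not stated in the review
ALLOWED_KEYS = {"name", "description", "metadata"}  # assumption: illustrative only


def validate_skill(line_count: int, frontmatter_keys: set[str]) -> list[str]:
    """Return warning strings modelled on the two warnings reported above."""
    warnings = []
    if line_count > MAX_LINES:
        warnings.append(f"skill_md_line_count: SKILL.md is long ({line_count} lines)")
    unknown = sorted(frontmatter_keys - ALLOWED_KEYS)
    if unknown:
        warnings.append(f"frontmatter_unknown_keys: {', '.join(unknown)}")
    return warnings


# Example: a 548-line file with one key outside the assumed allow-list.
# "version" is a made-up key used purely for illustration.
for warning in validate_skill(548, {"name", "description", "version"}):
    print(warning)
```

Returning warnings as a list rather than raising keeps these checks non-blocking, in line with their 'Warning' result in the table.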

Repository
FlorianBruniaux/claude-code-ultimate-guide
Reviewed
