Audit Claude Code agents, skills, and commands for quality and production readiness. Use when evaluating skill quality, checking production readiness scores, or comparing agents against best-practice templates.
- Score: 61 (55%)
- Does it follow best practices? Passed (no known issues)
- Impact: Pending (no eval scenarios have been run)
Optimize this skill with Tessl:
`npx tessl skill review --optimize ./examples/skills/audit-agents-skills/SKILL.md`

Quality
Discovery: 75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is well-structured with a clear 'what' and explicit 'when' clause, making it complete and distinctive. Its main weakness is that the actions described are somewhat high-level ('audit', 'evaluate', 'check') rather than enumerating specific concrete operations, and the trigger terms could include more natural user phrasings.
Suggestions
- Add more specific concrete actions, e.g., 'Scores skill descriptions, validates YAML frontmatter, checks for missing fields, and generates improvement recommendations'.
- Expand trigger terms with natural user phrasings like 'review my skill', 'validate SKILL.md', 'is my agent ready for production', or 'lint my commands'.
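One way to act on the trigger-term suggestion is to check a description against a list of expected user phrasings. A minimal sketch, assuming a simple word-overlap heuristic; the phrasing list and the 0.5 threshold are illustrative assumptions, not part of the skill:

```python
import re

def _words(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation (naive tokenization)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def trigger_coverage(description: str, phrasings: list[str]) -> dict[str, bool]:
    """Mark each candidate user phrasing as covered if most of its words
    appear in the skill description."""
    desc_words = _words(description)
    coverage = {}
    for phrase in phrasings:
        words = _words(phrase)
        overlap = len(words & desc_words) / len(words) if words else 0.0
        coverage[phrase] = overlap >= 0.5  # threshold is an assumption
    return coverage

description = (
    "Audit Claude Code agents, skills, and commands for quality "
    "and production readiness."
)
phrasings = [
    "review my skill",
    "is my agent ready for production",
    "check production readiness",
]
print(trigger_coverage(description, phrasings))
```

Consistent with the critique above, phrasings such as 'review my skill' score as uncovered against the current description, while phrasings that reuse its literal terms score as covered.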
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (Claude Code agents, skills, commands) and some actions (audit, evaluate quality, check production readiness scores, compare against templates), but the actions are somewhat high-level rather than listing multiple concrete operations like 'lint skill files, validate YAML frontmatter, score description quality'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (audit Claude Code agents, skills, and commands for quality and production readiness) and 'when' (Use when evaluating skill quality, checking production readiness scores, or comparing agents against best-practice templates) with an explicit 'Use when...' clause. | 3 / 3 |
| Trigger Term Quality | Includes relevant terms like 'audit', 'skill quality', 'production readiness scores', 'agents', 'commands', and 'best-practice templates', but misses common variations users might say such as 'review my skill', 'is my agent ready', 'skill lint', 'validate skill', or 'SKILL.md'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The niche of auditing Claude Code agents/skills/commands for production readiness is quite specific and unlikely to conflict with other skills. Terms like 'production readiness scores' and 'best-practice templates' create a distinct identity. | 3 / 3 |
| Total | | 10 / 12 (Passed) |
Implementation: 35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is extremely comprehensive but suffers from severe verbosity: it reads more like a design document or RFC than an actionable skill for Claude. The industry context, methodology justifications, and extensive inline examples bloat the content far beyond what is needed for execution. The core workflow is reasonable but buried under unnecessary explanation, and a key external dependency (scoring/criteria.yaml) may not exist.
Suggestions
- Reduce content by 60-70%: Remove the Industry Context section entirely, trim Methodology rationale to 2-3 lines, and move Detection Patterns, CI/CD Integration, and Output Examples to separate referenced files.
- Make the workflow self-contained: Either inline the essential scoring criteria directly (instead of referencing scoring/criteria.yaml) or provide clear fallback behavior if the file doesn't exist.
- Add validation checkpoints: Include explicit error handling for missing directories, malformed frontmatter, and missing criteria files in the workflow phases.
- Remove explanations of basic concepts: Drop Python code comments explaining what Jaccard similarity is, how YAML parsing works, and what token estimation does - Claude already knows these.
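The fallback suggested above could be sketched as follows. The default criteria, weights, and file path here are assumptions for illustration, not the skill's actual scoring rules:

```python
import os

# Hypothetical inline defaults, used when the external criteria file is
# absent. Dimension names mirror the report; the max scores are assumed.
DEFAULT_CRITERIA = {
    "conciseness": {"max": 3},
    "actionability": {"max": 3},
    "workflow_clarity": {"max": 3},
    "progressive_disclosure": {"max": 3},
}

def load_criteria(path: str = "scoring/criteria.yaml") -> dict:
    """Load scoring criteria, falling back to inline defaults if the
    file is missing or malformed."""
    if not os.path.exists(path):
        return DEFAULT_CRITERIA
    import yaml  # PyYAML; only needed when the file actually exists
    with open(path) as f:
        data = yaml.safe_load(f)
    return data if isinstance(data, dict) else DEFAULT_CRITERIA
```

With a fallback like this, the workflow degrades gracefully instead of failing when scoring/criteria.yaml is absent.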
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~400+ lines. Includes extensive industry context (LangChain report statistics), methodology justifications, comparison tables, CI/CD integration examples, maintenance instructions, and changelog that are largely unnecessary for Claude to execute the skill. Explains concepts Claude already knows (what Jaccard similarity is, how YAML parsing works, what pre-commit hooks are). | 1 / 3 |
| Actionability | Provides concrete Python code snippets for detection patterns and JSON output schemas, but the core workflow relies on external files (scoring/criteria.yaml) that may not exist. The actual execution steps are somewhat abstract - it describes what the system should do rather than providing fully executable, self-contained instructions Claude can follow. | 2 / 3 |
| Workflow Clarity | The 5-phase workflow is clearly sequenced (Discovery → Scoring → Comparative → Report → Fix Suggestions), but validation checkpoints are missing. There's no explicit verification step after scoring to confirm results are reasonable, no error handling for missing directories or malformed files, and no feedback loop if the scoring criteria file is absent or malformed. | 2 / 3 |
| Progressive Disclosure | References external files (scoring/criteria.yaml, related commands, reference templates) which is good, but the SKILL.md itself is monolithic with enormous inline content that should be in separate files. The scoring criteria details, detection patterns, industry context, CI/CD integration, and output examples could all be split into referenced documents rather than inlined. | 2 / 3 |
| Total | | 7 / 12 (Passed) |
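The table above mentions Jaccard similarity as one of the skill's detection patterns. A minimal sketch of how it might flag near-duplicate skill descriptions; the 0.8 threshold is an assumption, not the skill's documented value:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two texts as word-set overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def near_duplicates(descriptions: dict[str, str], threshold: float = 0.8):
    """Return pairs of skill names whose descriptions look near-identical."""
    names = sorted(descriptions)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if jaccard(descriptions[x], descriptions[y]) >= threshold
    ]
```

A check like this supports the distinctiveness scoring in the Discovery section: two skills whose descriptions exceed the threshold risk being selected interchangeably.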
Validation: 72% (8 / 11 checks passed)

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (549 lines); consider splitting into references/ and linking | Warning |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 8 / 11 Passed |