Evaluates any repository's agentic development maturity. Use when auditing a codebase for best practices in agents, skills, instructions, MCP config, and prompts. Produces a scored report with specific remediation steps.
Overall score: 76
Quality: 66% — Does it follow best practices?
Impact: 93% — 1.50x average score across 3 eval scenarios
Advisory: Suggest reviewing before use

Optimize this skill with Tessl:
`npx tessl skill review --optimize ./.github/skills/agentic-evaluator/SKILL.md`

Quality
Discovery
85% — Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description that clearly communicates a specific, niche capability with explicit 'Use when' guidance and concrete outputs. The main weakness is that trigger terms lean toward domain jargon ('agentic development maturity', 'MCP config') which may not match how all users naturally phrase their requests. Overall it performs well across most dimensions.
Suggestions
Add more natural-language trigger variations such as 'agent setup review', 'check repo for agentic best practices', or 'how ready is my project for AI agents' to improve discoverability.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple concrete actions: 'evaluates repository's agentic development maturity', 'auditing a codebase for best practices', 'produces a scored report with specific remediation steps'. These are specific, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both what ('Evaluates any repository's agentic development maturity', 'Produces a scored report with specific remediation steps') and when ('Use when auditing a codebase for best practices in agents, skills, instructions, MCP config, and prompts'). | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'repository', 'codebase', 'agents', 'skills', 'MCP config', 'prompts', 'audit'. However, these are somewhat specialized/jargon-heavy and may miss natural user phrasings like 'check my repo setup', 'evaluate my project', or 'agentic readiness'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The niche of 'agentic development maturity' evaluation is highly specific and unlikely to conflict with other skills. The combination of agents, skills, MCP config, and scored reports creates a very distinct trigger profile. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation
47% — Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill has a well-structured multi-phase evaluation workflow with clear scoring criteria and a concrete output template, which are its strongest aspects. However, it severely violates its own 'Lean Context Principle'—the skill is far too long, embedding extensive research citations, best practice guides, and remediation patterns that should either be in bundled files or omitted entirely. The irony of a skill about token efficiency being one of the most token-heavy skills undermines its credibility.
Suggestions
Move the 'Lean Context Principle', 'Skill Development Best Practices', 'Skill Quality Dimensions', and 'Remediation Patterns' sections into bundled reference files (e.g., lean-context.md, best-practices.md, remediation.md) and reference them with 'See: filename.md'—this would cut SKILL.md by ~60%.
Remove or drastically shorten the SkillsBench citations and research references—Claude doesn't need academic paper numbers to execute the evaluation; just state the derived rules concisely.
Add concrete executable examples for discovery scanning (e.g., shell commands like `find .github -name '*.md' | wc -l` or a script snippet for frontmatter validation) rather than relying on natural language prompts.
Actually provide the referenced bundle files (checklist.md, report-template.md) to support the progressive disclosure pattern claimed in the skill.
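As a sketch of the kind of executable check these suggestions call for, here is a minimal, stdlib-only frontmatter scanner. The file layout under `.github/skills` and the allowed-key list are assumptions for illustration, not keys taken from the Tessl spec:

```python
from pathlib import Path

# Assumed allow-list for illustration; the actual spec's keys may differ.
ALLOWED_KEYS = {"name", "description", "license", "metadata"}

def frontmatter_keys(text: str) -> set[str]:
    """Return the top-level keys of a '---'-delimited YAML frontmatter block."""
    parts = text.split("---", 2)
    if len(parts) < 3:  # no complete frontmatter block found
        return set()
    keys = set()
    for line in parts[1].splitlines():
        # Top-level keys start at column 0 and contain a colon;
        # indented lines belong to nested values and are skipped.
        if line and not line[0].isspace() and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys

def scan(root: Path = Path(".github/skills")) -> int:
    """Print unknown-key warnings for every SKILL.md; return the issue count."""
    issues = 0
    for skill_md in root.rglob("SKILL.md"):
        unknown = frontmatter_keys(skill_md.read_text()) - ALLOWED_KEYS
        if unknown:
            print(f"{skill_md}: unknown frontmatter key(s) {sorted(unknown)}")
            issues += 1
    return issues
```

Running `scan()` at a repository root would surface the same kind of `frontmatter_unknown_keys` warning the validator reports below.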
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | This skill is extremely verbose at ~400+ lines. It extensively explains concepts Claude already knows (what agentic patterns are, how folder structures work, what YAML frontmatter is). It includes lengthy sections on the 'Lean Context Principle' with noise/signal tables, SkillsBench research citations, and best practices that are general knowledge rather than actionable evaluation steps. The irony is that a skill about lean context is itself bloated with context. | 1 / 3 |
| Actionability | The scoring rubric tables with specific point allocations and criteria are concrete and actionable. The report template provides a clear output format. However, there are no executable code examples—the 'Running the Evaluator' section just shows natural language prompts, and the actual evaluation logic (how to count lines, check frontmatter validity, detect noise patterns) is left implicit rather than provided as concrete commands or scripts. | 2 / 3 |
| Workflow Clarity | The 7-phase workflow (Discovery Scan → Foundation → Skills → Agents → Instructions → Consistency → Generate Report) is clearly sequenced with explicit criteria and point allocations at each phase. Each phase has a structured table of checks with specific criteria. The report generation phase provides a complete output template with prioritized issues (P0/P1/P2) and recommendations. | 3 / 3 |
| Progressive Disclosure | The skill references supporting files (checklist.md, report-template.md) and a related skill (project-scaffold), which is good progressive disclosure. However, no bundle files are actually provided, and the SKILL.md itself is monolithic—the Lean Context Principle section, Best Practices section, Remediation Patterns, and Size Guidelines could all be moved to bundled reference files, keeping the core SKILL.md focused on the evaluation workflow. | 2 / 3 |
| Total | | 8 / 12 Passed |
Validation
90% — Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |
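The lone warning above is typically resolved by nesting nonstandard keys under `metadata`. A hedged sketch of the before/after shapes — the keys `author` and `tags` are illustrative placeholders, not keys taken from this skill's actual frontmatter:

```yaml
# Before: unknown top-level keys trigger frontmatter_unknown_keys
name: agentic-evaluator
description: Evaluates any repository's agentic development maturity...
author: jane-doe          # unknown key (illustrative)
tags: [audit, agents]     # unknown key (illustrative)
---
# After: nonstandard keys moved under metadata
name: agentic-evaluator
description: Evaluates any repository's agentic development maturity...
metadata:
  author: jane-doe
  tags: [audit, agents]
```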