Evaluates any repository's agentic development maturity. Use when auditing a codebase for best practices in agents, skills, instructions, MCP config, and prompts. Produces a scored report with specific remediation steps.
Score: 76

- Quality: 66% — Does it follow best practices?
- Impact: 93% (1.50x) — Average score across 3 eval scenarios
- Advisory: Suggest reviewing before use
Optimize this skill with Tessl:

`npx tessl skill review --optimize ./.github/skills/agentic-evaluator/SKILL.md`

Quality
Discovery — 85%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description that clearly communicates a specific, niche capability with an explicit 'Use when' trigger clause. It names concrete outputs (scored report, remediation steps) and specific audit targets. The main weakness is that trigger terms could better cover natural user phrasing variations beyond the somewhat technical vocabulary used.
Suggestions

- Add natural user-facing trigger terms like 'evaluate my repo', 'agentic readiness check', 'how well is my project set up for agents', or 'review my SKILL.md files' to improve discoverability.
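One way that suggestion could look in practice is a revised frontmatter block that folds natural user phrasing into the description. This is a hypothetical sketch, not the skill's actual frontmatter; the `name` value and exact wording are assumptions:

```yaml
name: agentic-evaluator
description: >
  Evaluates any repository's agentic development maturity and produces a
  scored report with specific remediation steps. Use when auditing a codebase
  for best practices in agents, skills, instructions, MCP config, and prompts,
  or when asked to "evaluate my repo", run an "agentic readiness check", or
  "review my SKILL.md files".
```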
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple concrete actions: evaluates repository maturity, audits for best practices across specific areas (agents, skills, instructions, MCP config, prompts), produces a scored report with remediation steps. | 3 / 3 |
| Completeness | Clearly answers both what ('Evaluates any repository's agentic development maturity, produces a scored report with remediation steps') and when ('Use when auditing a codebase for best practices in agents, skills, instructions, MCP config, and prompts') with an explicit 'Use when' clause. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'auditing', 'codebase', 'best practices', 'agents', 'skills', 'MCP config', 'prompts', but misses natural user phrases like 'how mature is my repo', 'agentic readiness', 'evaluate my setup', or 'SKILL.md'. The terms are somewhat specialized and may not match how users naturally phrase requests. | 2 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche — 'agentic development maturity' evaluation with specific focus on agents, skills, instructions, MCP config, and prompts is unlikely to conflict with other skills. The domain is narrow and well-defined. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation — 47%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill has a well-structured evaluation workflow with clear phases, scoring rubrics, and output templates, but it massively violates its own principles by being extremely verbose (~350+ lines with extensive inline reference material). The Lean Context Principle section, best practices, remediation patterns, and SkillsBench citations should be moved to bundled reference files, leaving SKILL.md focused on the core evaluation workflow. The skill would benefit enormously from practicing what it preaches.
Suggestions

- Move the 'Lean Context Principle,' 'Skill Development Best Practices,' 'Size Guidelines Reference,' 'Skill Quality Dimensions,' and 'Remediation Patterns' sections into bundled files (e.g., reference.md, remediation.md) and reference them with 'See: reference.md' — this would cut SKILL.md by ~60% and align with the skill's own advice.
- Remove or drastically trim the SkillsBench citations and Anthropic guidance summaries — these are background justification, not actionable instructions for performing an evaluation.
- Add a concrete validation mechanism (e.g., a script or structured JSON output schema) rather than relying entirely on subjective manual scoring via natural language prompts.
- Trim the noise/signal tables to 2-3 examples each instead of 7+ — Claude can generalize the pattern from fewer examples.
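The suggested validation mechanism could start as small as a script that runs a few mechanical checks over a SKILL.md and emits a structured, JSON-serializable report. Everything below is a hedged sketch under assumed conventions: the 200-line budget, the `See: <file>.md` pointer pattern, and the check names are all hypothetical, not part of the skill itself:

```python
import json
import re
from pathlib import Path

def evaluate_skill(path: str) -> dict:
    """Score a SKILL.md on a few mechanically checkable criteria
    and return a JSON-serializable report."""
    text = Path(path).read_text(encoding="utf-8")
    checks = {
        # Lean Context Principle: flag files over a (hypothetical) size budget
        "under_200_lines": len(text.splitlines()) <= 200,
        # Frontmatter block present at the top of the file
        "has_frontmatter": text.startswith("---"),
        # Progressive disclosure: at least one 'See: <file>.md' pointer
        "references_bundled_files": bool(re.search(r"\bSee:\s*\S+\.md", text)),
    }
    return {
        "file": str(path),
        "checks": checks,
        "score": sum(checks.values()),
        "max_score": len(checks),
    }

# Example: serialize the report for downstream tooling
# print(json.dumps(evaluate_skill(".github/skills/agentic-evaluator/SKILL.md"), indent=2))
```

Even a minimal script like this turns two of the rubric's dimensions (Conciseness, Progressive Disclosure) into repeatable automated checks instead of purely subjective scoring.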
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | At ~350+ lines, this skill is extremely verbose and violates its own 'Lean Context Principle.' It includes extensive explanations of concepts Claude already knows (what noise vs signal is, how folder structures work, general best practices), lengthy tables restating common sense, and ironically embeds the very anti-patterns it warns against. The SkillsBench citations and Anthropic guidance summaries are reference material that should be in bundled files, not inline. | 1 / 3 |
| Actionability | The evaluation workflow provides concrete checklists with point values and specific file paths to scan, which is actionable. However, there are no executable code snippets or scripts — the 'Running the Evaluator' section just shows natural language prompts. The scoring is manual/subjective with no automated validation commands. The phase-by-phase checklist tables are useful but lack precise decision criteria for edge cases. | 2 / 3 |
| Workflow Clarity | The 7-phase workflow (Discovery → Foundation → Skills → Agents → Instructions → Consistency → Report) is clearly sequenced with explicit scoring criteria per phase. Each phase has a structured table of checks with point allocations. The report template provides a clear output format with prioritized issues (P0/P1/P2) and remediation steps. | 3 / 3 |
| Progressive Disclosure | The skill references supporting files (checklist.md, report-template.md) and mentions a related skill (project-scaffold), showing awareness of progressive disclosure. However, the SKILL.md itself is a monolithic wall containing extensive reference material (Lean Context Principle section, Size Guidelines, Skill Quality Dimensions, Skill Development Best Practices, Remediation Patterns) that should be in bundled files per its own advice. The irony is stark — it violates the very progressive disclosure pattern it evaluates others on. | 2 / 3 |
| Total | | 8 / 12 Passed |
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |