
agentic-evaluator

Evaluates any repository's agentic development maturity. Use when auditing a codebase for best practices in agents, skills, instructions, MCP config, and prompts. Produces a scored report with specific remediation steps.

Overall score: 76

Quality: 66% (Does it follow best practices?)

Impact: 93% (1.50x, average score across 3 eval scenarios)

Security by Snyk: Advisory (suggest reviewing before use)

Optimize this skill with Tessl

npx tessl skill review --optimize ./.github/skills/agentic-evaluator/SKILL.md

Quality

Discovery

85%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong description that clearly communicates a specific, niche capability with explicit 'Use when' guidance and concrete outputs. The main weakness is that trigger terms lean toward domain jargon ('agentic development maturity', 'MCP config') which may not match how all users naturally phrase their requests. Overall it performs well across most dimensions.

Suggestions

Add more natural-language trigger variations such as 'agent setup review', 'check repo for agentic best practices', or 'how ready is my project for AI agents' to improve discoverability.
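As an illustration, the suggested trigger variations could be folded into the frontmatter description, sketched below; the exact schema and wording are assumptions for illustration, not the maintainer's text:

```yaml
---
name: agentic-evaluator
description: >
  Evaluates any repository's agentic development maturity and produces a
  scored report with specific remediation steps. Use when auditing a codebase
  for best practices in agents, skills, instructions, MCP config, and prompts,
  or when asked to do an "agent setup review", "check my repo for agentic
  best practices", or "how ready is my project for AI agents".
---
```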

Dimension scores:

Specificity

Lists multiple concrete actions: 'evaluates repository's agentic development maturity', 'auditing a codebase for best practices', 'produces a scored report with specific remediation steps'. These are specific, actionable capabilities.

3 / 3

Completeness

Clearly answers both what ('Evaluates any repository's agentic development maturity', 'Produces a scored report with specific remediation steps') and when ('Use when auditing a codebase for best practices in agents, skills, instructions, MCP config, and prompts').

3 / 3

Trigger Term Quality

Includes some relevant terms like 'repository', 'codebase', 'agents', 'skills', 'MCP config', 'prompts', 'audit'. However, these are somewhat specialized/jargon-heavy and may miss natural user phrasings like 'check my repo setup', 'evaluate my project', or 'agentic readiness'.

2 / 3

Distinctiveness Conflict Risk

The niche of 'agentic development maturity' evaluation is highly specific and unlikely to conflict with other skills. The combination of agents, skills, MCP config, and scored reports creates a very distinct trigger profile.

3 / 3

Total: 11 / 12 (Passed)

Implementation

47%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill has a well-structured multi-phase evaluation workflow with clear scoring criteria and a concrete output template, which are its strongest aspects. However, it severely violates its own 'Lean Context Principle'—the skill is far too long, embedding extensive research citations, best practice guides, and remediation patterns that should either be in bundled files or omitted entirely. The irony of a skill about token efficiency being one of the most token-heavy skills undermines its credibility.

Suggestions

Move the 'Lean Context Principle', 'Skill Development Best Practices', 'Skill Quality Dimensions', and 'Remediation Patterns' sections into bundled reference files (e.g., lean-context.md, best-practices.md, remediation.md) and reference them with 'See: filename.md'—this would cut SKILL.md by ~60%.

Remove or drastically shorten the SkillsBench citations and research references—Claude doesn't need academic paper numbers to execute the evaluation; just state the derived rules concisely.

Add concrete executable examples for discovery scanning (e.g., shell commands like `find .github -name '*.md' | wc -l` or a script snippet for frontmatter validation) rather than relying on natural language prompts.

Actually provide the referenced bundle files (checklist.md, report-template.md) to support the progressive disclosure pattern claimed in the skill.
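The suggestion about executable discovery commands could be sketched roughly as follows; the `.github/skills/` layout and the `SKILL.md` filename are assumptions taken from this page, not a documented Tessl interface:

```shell
# Sketch of a discovery scan over a repository root (default: current dir).
scan_skills() {
  root="${1:-.}"
  # Count skill definition files under the repo's .github tree
  find "$root/.github" -name 'SKILL.md' 2>/dev/null | wc -l
  # Flag any SKILL.md whose first line is not the YAML frontmatter fence '---'
  find "$root/.github" -name 'SKILL.md' 2>/dev/null | while read -r f; do
    head -n 1 "$f" | grep -qx -- '---' || echo "missing frontmatter: $f"
  done
}
```

Run as `scan_skills path/to/repo`: the first line of output is the skill count, followed by one line per file that is missing frontmatter.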

Dimension scores:

Conciseness

This skill is extremely verbose at ~400+ lines. It extensively explains concepts Claude already knows (what agentic patterns are, how folder structures work, what YAML frontmatter is). It includes lengthy sections on 'Lean Context Principle' with noise/signal tables, SkillsBench research citations, and best practices that are general knowledge rather than actionable evaluation steps. The irony is that a skill about lean context is itself bloated with context.

1 / 3

Actionability

The scoring rubric tables with specific point allocations and criteria are concrete and actionable. The report template provides a clear output format. However, there are no executable code examples—the 'Running the Evaluator' section just shows natural language prompts, and the actual evaluation logic (how to count lines, check frontmatter validity, detect noise patterns) is left implicit rather than providing concrete commands or scripts.

2 / 3
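The implicit evaluation logic named above (counting lines, checking frontmatter validity, detecting noise patterns) could be pinned down as commands; this is a sketch under stated assumptions, where the 300-line threshold and the 'SkillsBench' noise marker are illustrative, not values taken from the skill itself:

```shell
# Hypothetical spot-checks for one SKILL.md file.
check_skill() {
  f="$1"
  lines=$(wc -l < "$f")
  # Size: flag files past an assumed 300-line guideline
  [ "$lines" -le 300 ] || echo "over size guideline: $lines lines"
  # Frontmatter validity: the file must open with the YAML fence '---'
  head -n 1 "$f" | grep -qx -- '---' || echo "invalid frontmatter: $f"
  # Noise: count research citations, which add tokens without actionable steps
  printf 'citation mentions: %s\n' "$(grep -c 'SkillsBench' "$f")"
}
```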

Workflow Clarity

The 7-phase workflow (Discovery Scan → Foundation → Skills → Agents → Instructions → Consistency → Generate Report) is clearly sequenced with explicit criteria and point allocations at each phase. Each phase has a structured table of checks with specific criteria. The report generation phase provides a complete output template with prioritized issues (P0/P1/P2) and recommendations.

3 / 3

Progressive Disclosure

The skill references supporting files (checklist.md, report-template.md) and a related skill (project-scaffold), which is good progressive disclosure. However, no bundle files are actually provided, and the SKILL.md itself is monolithic—the Lean Context Principle section, Best Practices section, Remediation Patterns, and Size Guidelines could all be moved to bundled reference files, keeping the core SKILL.md focused on the evaluation workflow.

2 / 3

Total: 8 / 12 (Passed)

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria: frontmatter_unknown_keys
Description: Unknown frontmatter key(s) found; consider removing or moving to metadata
Result: Warning

Total: 10 / 11 (Passed)

Repository: 0xrabbidfly/eric-cartman (Reviewed)
