Evaluates any repository's agentic development maturity. Use when auditing a codebase for best practices in agents, skills, instructions, MCP config, and prompts. Produces a scored report with specific remediation steps.
Install with Tessl CLI
npx tessl i github:0xrabbidfly/eric-cartman --skill agentic-evaluator
Quality: 81% (does it follow best practices?)
Impact: 93% — 1.50x average score across 3 eval scenarios
Score a repository's implementation of agentic development patterns and provide actionable remediation guidance. Works on any codebase—your own or external repos.
Evaluate this repository's agentic development patterns.
Generate a scored report using the agentic-evaluator skill.

| Category | Points | Focus |
|---|---|---|
| Foundation | 25 | Root instructions, structure, MCP config |
| Skills | 25 | Frontmatter, examples, right-sizing |
| Agents | 20 | Tools, mission, handoffs |
| Instructions | 20 | applyTo patterns, coverage |
| Consistency | 10 | Naming, no duplicates, cross-refs |

| Score | Grade | Interpretation |
|---|---|---|
| 90-100 | A | Excellent — Production-ready |
| 80-89 | B | Good — Minor improvements needed |
| 70-79 | C | Adequate — Noticeable gaps |
| 60-69 | D | Developing — Significant work needed |
| <60 | F | Foundational — Start with basics |
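The grade bands above can be expressed as a small lookup; this is an illustrative sketch, not code shipped with the skill:

```python
# Hypothetical helper mirroring the grade table; the skill defines
# only the table itself, not this implementation.
def grade(score: int) -> tuple[str, str]:
    """Map a 0-100 evaluation score to a letter grade and interpretation."""
    bands = [
        (90, "A", "Excellent - production-ready"),
        (80, "B", "Good - minor improvements needed"),
        (70, "C", "Adequate - noticeable gaps"),
        (60, "D", "Developing - significant work needed"),
    ]
    for floor, letter, meaning in bands:
        if score >= floor:
            return letter, meaning
    return "F", "Foundational - start with basics"

print(grade(81))  # -> ('B', 'Good - minor improvements needed')
```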
Don't tell the agent what it can figure out on its own.
Every line in a context file costs tokens and competes for attention. Instructions that restate discoverable facts add noise, not signal. Keep only rules that correct specific agent mistakes your evals reveal.
— Theo vs Maple debate, Feb 2026

SkillsBench (Li et al., arXiv:2602.12670): curated skills +16.2 pp average, self-generated −1.3 pp. Comprehensive docs hurt (−2.9 pp). Focused 2–3 module skills are optimal.
Domain sensitivity: Software Engineering tasks benefit least from skills (+4.5 pp) — agents already know this domain from pretraining. Be extra ruthless trimming SE-focused rules (TypeScript, React, file conventions). Non-SE domains (auth flows, regulatory, infra) benefit most.
| Noise pattern | Why it's noise |
|---|---|
| "Use TypeScript" | Agent sees .ts/.tsx files and tsconfig.json |
| "Run npm run dev to start" | Agent reads package.json scripts |
| "Project uses React" | Agent sees react in package.json dependencies |
| "Files are in src/" | Agent explores the folder structure |
| "Use ESLint" | Agent finds .eslintrc / eslint.config.* |
| "Components are in PascalCase" | Agent infers from existing filenames |
| "Use Jest for testing" | Agent reads jest.config.* or vitest.config.* |
| Signal pattern | Why it's signal |
|---|---|
| "Never log tokens or auth headers" | Safety rule the agent can't infer |
| "Use OBO flow, not app-only for Graph" | Architectural decision not in code |
| "All strings must be localizable (EN/FR)" | Policy the agent would skip |
| "Prefer discriminated unions over throwing" | Style choice against agent default |
| "Do not cache user-specific content in shared caches" | Non-obvious constraint |
| "Use CSS Modules; never duplicate properties in inline + class" | Prevents a specific recurring bug |
Before adding a rule, ask: "Would the agent do the wrong thing without this?"
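A coarse first pass at that question can be automated; the patterns below are hypothetical examples drawn from the noise table, and real triage still needs human judgment:

```python
import re

# Illustrative only: a few regexes for "discoverable fact" noise
# patterns from the table above. Not an exhaustive or official list.
NOISE_PATTERNS = [
    r"\buse (typescript|react|jest|eslint)\b",
    r"\brun npm run \w+",
    r"\bfiles are in \w+/",
]

def looks_like_noise(rule: str) -> bool:
    """Flag rules that restate facts an agent can discover on its own."""
    rule_lc = rule.lower()
    return any(re.search(p, rule_lc) for p in NOISE_PATTERNS)

print(looks_like_noise("Use TypeScript"))                    # True
print(looks_like_noise("Never log tokens or auth headers"))  # False
```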
Scan for agentic artifacts at these locations:
├── .github/
│ ├── copilot-instructions.md
│ ├── skills/*/SKILL.md
│ ├── agents/*.md
│ ├── instructions/*.instructions.md
│ ├── prompts/*.md
│ ├── commands/*.md
│ ├── references/*.md
│ └── mcp.json
├── .claude/
│ ├── claude.md
│ └── skills/
├── .cursor/
│ └── prompts/
├── .vscode/
│ └── mcp.json
└── AGENTS.md

Record file counts and line counts per artifact type.
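A scan like this can be sketched with `pathlib` globs over the tree above; the counting logic here is an assumption for illustration, not part of the skill definition:

```python
from pathlib import Path

# Glob patterns follow the artifact locations listed above.
ARTIFACT_GLOBS = {
    "root instructions": [".github/copilot-instructions.md", "AGENTS.md", ".claude/claude.md"],
    "skills": [".github/skills/*/SKILL.md", ".claude/skills/*/SKILL.md"],
    "agents": [".github/agents/*.md"],
    "instructions": [".github/instructions/*.instructions.md"],
    "prompts": [".github/prompts/*.md", ".cursor/prompts/*.md"],
    "mcp config": [".github/mcp.json", ".vscode/mcp.json"],
}

def scan(repo: Path) -> dict[str, tuple[int, int]]:
    """Return {artifact type: (file count, total line count)}."""
    report = {}
    for kind, patterns in ARTIFACT_GLOBS.items():
        files = [f for pat in patterns for f in repo.glob(pat)]
        lines = sum(len(f.read_text(errors="ignore").splitlines()) for f in files)
        report[kind] = (len(files), lines)
    return report
```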
| Check | Points | Criteria |
|---|---|---|
| Root instructions exist | 4 | .github/copilot-instructions.md OR AGENTS.md OR .claude/claude.md |
| Root instructions quality | 4 | Has project context, tech stack, non-negotiables (50+ lines) |
| Lean context (low noise) | 5 | No discoverable facts restated (see Lean Context Principle) |
| No auto-generated context | 2 | Instructions are human-authored; any auto-generated content has eval validation |
| .github/ structure | 4 | Organized folders for artifacts |
| README mentions agentic features | 3 | Documents how to use AI assistance |
| MCP config exists | 3 | .github/mcp.json or .vscode/mcp.json |
| Check | Points | Criteria |
|---|---|---|
| Skills folder exists | 2 | .github/skills/ present |
| Valid frontmatter | 4 | name + description in YAML |
| "When to Use" section | 3 | Clear trigger scenarios |
| Examples included | 3 | Concrete code/command examples |
| Right-sized | 3 | SKILL.md: 100–300 lines / 1–2K tokens; total skill ≤5K tokens |
| Activation scope | 2 | Trigger patterns limit concurrent skills to 2–3 per task |
| Progressive disclosure | 3 | 3-tier: metadata → body → bundled files |
| Cover key workflows | 5 | Testing, deployment, or domain-specific |
Progressive Disclosure (per Anthropic guidance):
name + description loaded at startup

Per SkillsBench ecosystem data (47K skills): median SKILL.md is ~1.5K tokens, median total skill ~2.5K tokens. "Detailed" skills (+18.8 pp) outperform "Comprehensive" ones (−2.9 pp). Keep SKILL.md focused; move reference tables, templates, and examples into bundled files.
✅ Good: See: templates/component.template.tsx for scaffolding
❌ Bad: Embedding 200-line template directly in SKILL.md
Frontmatter schema:
---
name: required # lowercase-hyphenated
description: required # includes "Use when..." trigger
version: optional # semantic versioning
---

| Check | Points | Criteria |
|---|---|---|
| Agents folder exists | 2 | .github/agents/ present |
| Valid frontmatter | 3 | name, description, tools declared |
| Clear mission | 4 | Single responsibility, defined workflow |
| Handoff patterns | 3 | References other agents (@agent-name) |
| Skill references | 3 | Uses See: skill-name for capabilities |
| Right-sized | 2 | 100-400 lines |
| Tools match MCP | 3 | Declared tools are available |
Frontmatter schema:
---
name: required
description: required
model: optional # e.g., "Claude Opus 4.5 (copilot)"
target: optional # e.g., "vscode"
tools: required # array of allowed tools
---

| Check | Points | Criteria |
|---|---|---|
| Instructions folder exists | 2 | .github/instructions/ present |
| Has applyTo patterns | 4 | Valid glob patterns in frontmatter |
| Has code examples | 3 | Good/bad pattern comparisons |
| No discoverable noise | 3 | Every rule fails the "would the agent get this wrong?" test |
| No conflicting guidance | 2 | No contradictions between root instructions, scoped instructions, and skills |
| Right-sized | 3 | 50-200 lines with concrete guidance |
| Coverage analysis | 3 | Patterns match actual codebase files |
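The coverage check can be approximated by matching each `applyTo` glob against the file list; a minimal sketch, assuming the globs have already been read from the instruction files (note `fnmatch` handles only simple globs, not VS Code-style `**` patterns):

```python
from fnmatch import fnmatch
from pathlib import Path

def coverage(repo: Path, apply_to_globs: list[str]) -> dict[str, int]:
    """Count repo files matched by each applyTo glob.

    A glob matching zero files signals a stale or mistyped pattern.
    """
    files = [p.relative_to(repo).as_posix() for p in repo.rglob("*") if p.is_file()]
    return {g: sum(fnmatch(f, g) for f in files) for g in apply_to_globs}
```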
Frontmatter schema:
---
applyTo: required # glob pattern(s)
excludeAgent: optional
---

| Check | Points | Criteria |
|---|---|---|
| Naming conventions | 2 | lowercase-hyphenated |
| No duplicates | 2 | No redundant agent/prompt pairs |
| Cross-refs resolve | 2 | @agent-name and "See: skill" work |
| Version fields | 2 | Mature skills have version: |
| Supporting files organized | 2 | Templates in skill subdirs |
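Naming and frontmatter checks like these are mechanical; here is an illustrative sketch that uses a naive `---` split rather than a real YAML parser:

```python
import re

# lowercase-hyphenated names, per the consistency check above
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def check_frontmatter(text: str, required: set[str]) -> list[str]:
    """Return a list of problems for one artifact file's frontmatter."""
    problems = []
    if not text.startswith("---"):
        return ["missing frontmatter block"]
    block = text.split("---", 2)[1]
    fields = dict(
        line.split(":", 1) for line in block.strip().splitlines() if ":" in line
    )
    for key in required:
        if key not in fields:
            problems.append(f"missing required field: {key}")
    name = fields.get("name", "").strip()
    if name and not NAME_RE.match(name):
        problems.append(f"name not lowercase-hyphenated: {name}")
    return problems
```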
Output using this structure:
# Agentic Evaluation Report
**Repository**: [name]
**Evaluated**: [timestamp]
**Overall Score**: X/100 (Grade: X)
## Score Breakdown
| Category | Score | Max | Notes |
|----------|-------|-----|-------|
| Foundation | X | 25 | ... |
| Skills | X | 25 | ... |
| Agents | X | 20 | ... |
| Instructions | X | 20 | ... |
| Consistency | X | 10 | ... |
## Artifacts Found
| Type | Count | Avg Lines | Status |
|------|-------|-----------|--------|
| Skills | X | X | ✅/⚠️/❌ |
| Agents | X | X | ✅/⚠️/❌ |
| Instructions | X | X | ✅/⚠️/❌ |
## Issues Found
### P0 (Critical)
- [ ] Issue → Remediation
### P1 (High)
- [ ] Issue → Remediation
### P2 (Medium)
- [ ] Issue → Remediation
## Recommendations
1. **Quick Win**: [Lowest effort, highest impact]
2. **Next Step**: [Logical follow-up]
3. **Long Term**: [Strategic improvement]

| Artifact | Min | Max | Token budget | Notes |
|---|---|---|---|---|
| Root instructions | 50 | 300 | ≤3K | Project overview, non-negotiables |
| Skills (SKILL.md) | 80 | 300 | 1–2K | Single workflow focus; move extras to bundled files |
| Skills (total) | — | — | ≤5K | SKILL.md + scripts/ + references/ combined |
| Agents | 100 | 400 | ≤4K | Clear mission, defined workflow |
| Instructions | 50 | 200 | ≤2K | File-specific patterns |
Why these limits? SkillsBench (Li et al., 2026) tested 47K ecosystem skills: median SKILL.md ~1.5K tokens, median total ~2.5K tokens. "Detailed" skills (+18.8 pp) beat "Comprehensive" ones (−2.9 pp). 4+ skills per task = only +5.9 pp vs +18.6 pp for 2–3 skills.
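The budgets above can be enforced with a rough size check; the chars/4 token estimate is a common heuristic, not an exact tokenizer, and the budget keys here are illustrative:

```python
# Token budgets from the table above (assumed mapping, for illustration).
BUDGETS = {"skill": 2000, "skill_total": 5000, "agent": 4000, "instructions": 2000}

def estimate_tokens(text: str) -> int:
    """Rough token count: ~4 characters per token."""
    return len(text) // 4

def over_budget(kind: str, text: str) -> bool:
    """True if the artifact text exceeds its token budget."""
    return estimate_tokens(text) > BUDGETS[kind]
```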
Signals to split:
Rate each skill on four dimensions (0–3 each, /12 total). Ecosystem mean is 6.2/12 — aim for ≥9/12 on production skills.
| Dimension | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Completeness | Missing required sections | Has frontmatter only | Has workflow + examples | Full structure with bundled resources |
| Clarity | Ambiguous, wall of text | Some structure | Clear headings + steps | Scannable, progressive disclosure |
| Specificity | Vague platitudes | General guidance | Domain-specific procedures | Concrete steps with verifiable outputs |
| Examples | None | Pseudocode only | One working example | Good/bad comparisons with context |
Source: SkillsBench Appendix A.3 quality rubric, adapted.
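Tallying the rubric is straightforward; a minimal sketch of the /12 total, purely illustrative:

```python
DIMENSIONS = ("completeness", "clarity", "specificity", "examples")

def rubric_total(scores: dict[str, int]) -> int:
    """Sum the four 0-3 dimension scores into a /12 total."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("score all four dimensions exactly once")
    for dim, s in scores.items():
        if not 0 <= s <= 3:
            raise ValueError(f"{dim} must be 0-3, got {s}")
    return sum(scores.values())
```

A production skill should land at 9 or above; the ecosystem mean of 6.2 corresponds to roughly "1–2" across the board.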
From Anthropic's Agent Skills guidance and SkillsBench (Li et al., 2026):
See: reference.md

name and description — Claude uses these to decide whether to trigger the skill.

When files exceed size limits, use these splitting strategies:
Split into:
skill-name/
├── SKILL.md # Core workflow (80-300 lines, 1-2K tokens)
├── reference.md # Detailed reference material
├── patterns.md # Code patterns and examples
├── checklist.md # Validation checklist
└── templates/ # Reusable templates
    ├── component.template.tsx
    └── test.template.ts

Total skill directory should stay ≤5K tokens. The agent loads SKILL.md eagerly but only reads bundled files on demand (See: reference.md).
Split into sub-agents:
.github/agents/
├── workflow-orchestrator.md # Main agent, coordinates
├── workflow-analyzer.md # Sub-agent: analysis phase
├── workflow-implementer.md # Sub-agent: implementation
└── workflow-validator.md     # Sub-agent: validation

Split by concern:
.github/instructions/
├── typescript.instructions.md # Language patterns
├── react-components.instructions.md # Framework patterns
└── api-routes.instructions.md       # API patterns

# Agentic Evaluation Report
**Repository**: basic-express-app
**Overall Score**: 35/100 (Grade: F)
## Artifacts Found
| Type | Count |
|------|-------|
| Root instructions | 0 |
| Skills | 0 |
## Issues Found
### P0 (Critical)
- [ ] No root instructions → Create `.github/copilot-instructions.md`
### Recommendations
1. **Quick Win**: Create copilot-instructions.md with project overview

# Agentic Evaluation Report
**Repository**: ai-hub-portal
**Overall Score**: 92/100 (Grade: A)
## Score Breakdown
| Category | Score | Max |
|----------|-------|-----|
| Foundation | 25 | 25 |
| Skills | 23 | 25 |
| Agents | 19 | 20 |
| Instructions | 18 | 20 |
| Consistency | 7 | 10 |
## Issues Found
### P2 (Medium)
- [ ] 2 skills missing `version:` → Add version to mature skills

On current repo:
Evaluate this repository using the agentic-evaluator skill.

On external repo:
Clone [repo-url] and evaluate its agentic patterns.

With threshold:
Evaluate this repo. Fail if score < 70.

project-scaffold — Generate missing artifacts identified by evaluator
checklist.md — Quick manual validation
report-template.md — Output format