Interactive skill creation and eval-driven optimization. Triggers: create a skill, make a skill, new skill, scaffold skill, optimize skill, run evals, improve skill. Uses AskUserQuestion for interview; WebSearch for research; Bash for eval execution. Outputs: complete skill directory with SKILL.md, tile.json, evals, and repo integration.
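The output bundle centers on a SKILL.md plus a tile.json manifest. A minimal illustrative sketch of such a manifest follows; the field names here are assumptions for illustration, not the actual Tessl tile.json schema:

```json
{
  "name": "skill-creator",
  "description": "Interactive skill creation and eval-driven optimization.",
  "evals": ["evals/scenario-1.json"]
}
```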
Overall score: 93

| Metric | Score | Notes |
|---|---|---|
| Quality | 94% | Does it follow best practices? |
| Impact | 91% | 1.26x average score across 3 eval scenarios |

Status: Passed, no known issues.
Discovery
100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong, comprehensive description that excels across all dimensions. It provides specific actions, natural trigger phrases with example user utterances, a clear workflow summary, and explicit negative boundaries. The only minor weakness is its length — it's quite verbose and could be slightly more concise — but the content is high quality and well-structured for skill selection.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions: create, scaffold, build, fix, improve, benchmark, and optimize skills. Also details the full workflow, including the structured interview, SKILL.md + tile.json scaffolding, linting, the CLI pipeline, the eval run, and benchmark logging. | 3 / 3 |
| Completeness | Clearly answers both 'what' (the full workflow from interview to benchmark logging) and 'when' (explicit trigger scenarios such as creating, fixing, and improving skills, and running evals). Also includes a 'Do NOT use for' clause, which strengthens the 'when' guidance. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would say: 'create a skill for X', 'build me a skill that does Y', 'scaffold a skill called Z', 'fixing', 'eval scores', 'description not triggering', 'tessl', plus an explicit note that it applies even without saying 'tessl'. Also includes negative triggers to reduce false matches. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive: it focuses specifically on Tessl/Claude skill creation and evaluation, a clear niche. The explicit 'Do NOT use for' clause (editing application code, debugging, refactoring, general documentation, presentations) actively reduces conflict risk with other skills. | 3 / 3 |
| Total | | 12 / 12 (Passed) |
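A description earning these scores might resemble the following SKILL.md frontmatter sketch. This is a hypothetical reconstruction from the reasoning above; the field names and exact phrasing are illustrative assumptions, not the skill's actual metadata:

```yaml
# Hypothetical SKILL.md frontmatter -- illustrative only
name: skill-creator
description: >
  Create, scaffold, fix, improve, and benchmark skills. Use when the user
  says "create a skill for X", "build me a skill that does Y", "scaffold a
  skill called Z", or mentions eval scores or a description not triggering.
  Do NOT use for editing application code, debugging, refactoring, or
  general documentation.
```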
Implementation
85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-architected skill with excellent workflow clarity, actionability, and progressive disclosure. The multi-phase process is clearly sequenced with validation gates and feedback loops. The main weakness is verbosity — at ~400+ lines with detailed tables and repeated cross-references, it could be tightened by extracting more content to companion files and trimming explanatory text that Claude can infer from context.
Suggestions
Consider extracting the full interview table (Phase 1) and the Tessl CLI pipeline details (§2.6) into companion rule files to reduce the main SKILL.md length and improve conciseness.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is comprehensive but verbose in places: the full interview table, multiple mode paths, and extensive phase descriptions push it well beyond what is strictly necessary. Some sections (like the integrated example) earn their space, but the repeated cross-references and detailed step-by-step instructions for Tessl CLI commands could be more compact or extracted to companion files. | 2 / 3 |
| Actionability | Highly actionable throughout: concrete CLI commands (`tessl tile lint`, `tessl scenario generate`), exact file structures (the tile.json schema, criteria.json with specific constraints such as scores summing to 100), specific interview questions with fallback logic, and a complete integrated example showing input to output. Nearly everything is copy-paste ready or directly executable. | 3 / 3 |
| Workflow Clarity | Excellent multi-step workflow with explicit phases, clear sequencing (interview → scaffold → lint → CLI pipeline → eval → optimize → log), validation checkpoints (lint check, completeness check, gate check), feedback loops (Phase 5 → re-eval → log → warn → re-optimize), and explicit non-negotiables such as #6, which ensures mutations complete before eval execution. The boundary between read and write operations is clearly articulated. | 3 / 3 |
| Progressive Disclosure | Well structured, with clear references to companion rule files (scaffold-rules.md, activation-design.md, benchmark-loop.md, eval-runner.md) that are one level deep and clearly signaled. The main SKILL.md serves as an overview with appropriate detail, deferring full implementation specifics to the referenced files. Navigation is straightforward. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
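The criteria.json constraint noted under Actionability (scores summing to 100) is easy to enforce with a small validator. The sketch below assumes a hypothetical criteria.json shape, a list of objects each carrying a `score` field; the real Tessl schema may differ:

```python
import json

def check_criteria_weights(criteria_json: str) -> bool:
    """Return True when the criterion scores sum to exactly 100.

    Assumes a hypothetical criteria.json layout: a JSON list of
    objects with a "score" field. Adjust for the real schema.
    """
    criteria = json.loads(criteria_json)
    return sum(c["score"] for c in criteria) == 100

# Hypothetical example content for criteria.json
example = '[{"name": "correctness", "score": 60}, {"name": "style", "score": 40}]'
print(check_criteria_weights(example))  # -> True
```

Running this kind of check before `tessl tile lint` catches a common authoring mistake (weights drifting away from 100 after an edit) without waiting for the full pipeline.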
Validation
100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 11 / 11 passed. No warnings or errors.