Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.
82
94%
Does it follow best practices?
Impact
65%
1.80x
Average score across 5 eval scenarios
Risky
Do not use without reviewing
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly defines a narrow, specific use case with concrete actions, explicit trigger guidance via a 'Use when' clause with multiple natural phrasings, and strong distinctiveness through domain-specific terminology. It also helpfully specifies accepted input formats. The description is concise yet comprehensive.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists concrete actions: 'Evaluates SKILL.md submissions across 11 dimensions (8 official Tessl rubric + 3 bonus)', and specifies input formats: 'GitHub repo URLs, file paths, or raw SKILL.md pastes'. Multiple specific capabilities are named. | 3 / 3 |
| Completeness | Clearly answers both 'what' (evaluates SKILL.md submissions across 11 dimensions for the AIE London 2026 Skills Contest) and 'when' (explicit 'Use when' clause with six specific trigger phrases). Also specifies accepted input formats. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger phrases users would actually say: 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', 'how would this score on the leaderboard', 'rate my skill before I submit', 'give me judge feedback on this skill'. These are highly natural and varied. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive niche: specifically targets 'AI Engineer London 2026 Skills Contest' SKILL.md evaluation with the Tessl rubric. The domain-specific terminology (AIE26, SKILL.md, leaderboard, contest) makes it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with excellent workflow clarity and progressive disclosure. The 5-phase evaluation pipeline is clearly sequenced with validation gates, and the output format is precisely specified with copy-paste-ready templates. Minor verbosity in scope-limiting statements and some behavioral constraints that Claude could infer prevent a perfect conciseness score.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Mostly efficient but includes some unnecessary scaffolding language (e.g., 'Evaluation only — no editing, ranking, or CLI execution' and 'Refuse non-AIE26 skills and contest logistics questions' are behavioral constraints Claude could infer). The phase structure adds overhead but serves an organizational purpose. | 2 / 3 |
| Actionability | Highly actionable with concrete checklists (structural checks with checkboxes), a specific scoring formula ('round((sum of 8 scores / 24) * 100)'), exact output format templates with table structures, and clear score scales (1-3). Claude knows exactly what to produce. | 3 / 3 |
| Workflow Clarity | The 5-phase workflow is clearly sequenced with explicit gates (Phase 2 structural check must pass before Phase 3 scoring), a short-file bypass path, and clear conditional logic ('If any check fails: Report each failure... Do NOT proceed'). Each phase has defined inputs and outputs. | 3 / 3 |
| Progressive Disclosure | Appropriately delegates detailed rubric criteria to 'references/scoring-rubric.md' and calibration examples to 'references/example-evaluation.md', keeping the main skill as an overview with clear one-level-deep references. Content is well-split between the overview workflow and reference materials. | 3 / 3 |
| Total | | 11 / 12 Passed |
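
The scoring formula and the Phase 2 gate quoted in the table above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names `rubric_percentage` and `evaluate`, the result dictionary shape, and the example failure string are all hypothetical. Only the formula `round((sum of 8 scores / 24) * 100)`, the 1-3 score scale, and the "do not proceed if any structural check fails" rule come from the review.

```python
def rubric_percentage(scores):
    """Convert eight 1-3 dimension scores into a 0-100 percentage."""
    if len(scores) != 8:
        raise ValueError("expected exactly 8 dimension scores")
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each score must be 1, 2, or 3")
    # Formula quoted in the review: round((sum of 8 scores / 24) * 100)
    return round(sum(scores) / 24 * 100)


def evaluate(structural_failures, scores):
    """Hypothetical gate: report structural failures and stop, else score."""
    if structural_failures:
        # Mirrors 'If any check fails: Report each failure... Do NOT proceed'
        return {"status": "failed_structure", "failures": list(structural_failures)}
    return {"status": "scored", "percentage": rubric_percentage(scores)}


if __name__ == "__main__":
    print(evaluate([], [3, 3, 3, 3, 3, 3, 3, 2]))
    print(evaluate(["missing frontmatter"], [3] * 8))
```

Note that Python's built-in `round` resolves exact halves to the nearest even integer; the review does not say how the skill breaks ties, so that behavior is an assumption of this sketch.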
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.