Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.
Score: 82
Best practices: 94% (Does it follow best practices?)
Impact: 65%, 1.80x (average score across 5 eval scenarios)
Risky: Do not use without reviewing
Evaluate SKILL.md files for the AI Engineer London 2026 Skills Contest (skillleaderboard.alan-626.workers.dev/AIE26). Score on the official Tessl rubric plus three bonus dimensions. Accept GitHub repo URLs, local file paths, or raw SKILL.md pastes.
Evaluation only — no editing, ranking, or CLI execution. Refuse non-AIE26 skills and contest logistics questions.
Extract name, description, line count, and reference file count from the SKILL.md.
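The extraction step could look roughly like the sketch below. It is a minimal illustration rather than the skill's actual implementation: the function name `extract_metadata` is hypothetical, the frontmatter is parsed as simple `key: value` lines instead of full YAML, and "reference file count" is assumed to mean the number of distinct `references/...` paths mentioned in the file.

```python
import re


def extract_metadata(skill_md: str) -> dict:
    """Pull name, description, line count, and reference count from SKILL.md text."""
    lines = skill_md.splitlines()
    meta = {"name": "", "description": ""}

    # Frontmatter is assumed to be simple `key: value` lines between two --- lines.
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep and key.strip() in meta:
                meta[key.strip()] = value.strip().strip("'\"")

    # Assumption: count distinct references/... paths mentioned anywhere in the text.
    refs = set(re.findall(r"references/[\w-]+(?:\.[\w-]+)*", skill_md))

    return {**meta, "line_count": len(lines), "reference_file_count": len(refs)}
```

A real evaluator reading from a GitHub repo or local path might instead count the files that exist under `references/`; the text-based count is used here only so the sketch works on raw pastes too.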
Very short files (< 20 lines): If the file has valid frontmatter with name and description but is under 20 lines, skip the checks below. Say "This skill appears incomplete (<N> lines). Scoring what exists..." and go to Phase 3.
For all other files, check all of these:
- Frontmatter is present (starts with ---, ends with ---)
- name field present and non-empty
- description field present and non-empty

If any check fails: Report each failure with a specific fix instruction. Do NOT proceed to scoring.
If all pass: Proceed to Phase 3.
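The gating logic described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the skill's own code: `precheck` and its return labels (`"incomplete"`, `"stop"`, `"score"`) are hypothetical names, the fix messages are example wording, and the frontmatter is again treated as plain `key: value` lines.

```python
def precheck(skill_md: str) -> tuple[str, list[str]]:
    """Return (next_step, messages) for a SKILL.md before scoring."""
    lines = skill_md.splitlines()
    fields: dict[str, str] = {}
    has_frontmatter = False

    # Frontmatter must open with --- on the first line and close with a later ---.
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                has_frontmatter = True
                break
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()

    failures = []
    if not has_frontmatter:
        failures.append("Add frontmatter delimited by --- lines at the top of the file.")
    for field in ("name", "description"):
        if not fields.get(field):
            failures.append(f"Add a non-empty {field} field to the frontmatter.")

    # Short files with valid frontmatter skip the checks and go straight to scoring.
    if not failures and len(lines) < 20:
        notice = f"This skill appears incomplete ({len(lines)} lines). Scoring what exists..."
        return "incomplete", [notice]

    # Any failure blocks scoring; otherwise proceed to Phase 3.
    return ("stop" if failures else "score"), failures
```

Both `"incomplete"` and `"score"` lead on to Phase 3; only `"stop"` halts with fix instructions.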
Load references/scoring-rubric.md for detailed per-dimension criteria.
Score the 8 official Tessl dimensions. For each dimension, assign a score out of 3 and give one line of reasoning with evidence.
Core score formula: round((sum of 8 scores / 24) * 100)
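As a worked illustration of the formula, here is a small sketch. The function name `core_score` is hypothetical, and the rounding mode is an assumption: the source only says `round(...)`, so this version rounds halves up using integer arithmetic. Python's built-in `round` uses round-half-to-even and would return 12 rather than 13 when the eight scores sum to 3 (12.5 before rounding).

```python
def core_score(scores: list[int]) -> int:
    """Convert 8 per-dimension scores (each out of 3) to a 0-100 core score."""
    if len(scores) != 8:
        raise ValueError("expected exactly 8 dimension scores")
    total = sum(scores)
    # Equivalent to round((total / 24) * 100) with halves rounded up (assumed mode).
    return (total * 100 + 12) // 24


# Example: eight dimensions scored 3, 3, 2, 2, 3, 2, 2, 3 sum to 20.
print(core_score([3, 3, 2, 2, 3, 2, 2, 3]))  # 20 / 24 * 100 = 83.33..., prints 83
```

The divisor 24 is the maximum possible total: 8 dimensions times 3 points each.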
Using the bonus criteria from references/scoring-rubric.md, score the 3 bonus dimensions: Innovation, Style, and Vibes.
Report as +X/9.
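The bonus total is a plain sum reported separately from the core score. A minimal sketch, assuming each bonus dimension is an integer from 0 to 3 (the lower bound is not stated in the source) and using the hypothetical helper name `bonus_label`:

```python
def bonus_label(innovation: int, style: int, vibes: int) -> str:
    """Format the three bonus scores as the +X/9 string used in the scorecard."""
    scores = (innovation, style, vibes)
    # Assumed valid range per dimension: 0 to 3 inclusive.
    if any(not 0 <= s <= 3 for s in scores):
        raise ValueError("each bonus score must be between 0 and 3")
    return f"+{sum(scores)}/9"


print(bonus_label(3, 2, 2))  # prints +7/9
```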
Produce the full output in this exact format:
## Scorecard: <skill-name>
### Core Score: XX/100
| Dimension | Score | Reasoning |
|------------------------|-------|------------------------------|
| Specificity | X/3 | <one line with evidence> |
| Trigger Terms | X/3 | <one line with evidence> |
| Completeness | X/3 | <one line with evidence> |
| Distinctiveness | X/3 | <one line with evidence> |
| Conciseness | X/3 | <one line with evidence> |
| Actionability | X/3 | <one line with evidence> |
| Workflow Clarity | X/3 | <one line with evidence> |
| Progressive Disclosure | X/3 | <one line with evidence> |
### Bonus Score: +X/9
| Dimension | Score | Reasoning |
|------------|-------|------------------------------|
| Innovation | X/3 | <one line with evidence> |
| Style | X/3 | <one line with evidence> |
| Vibes | X/3 | <one line with evidence> |
### Detailed Feedback
#### <Dimension Name> (X/3)
<paragraph: what's strong, what's weak, direct quotes from the SKILL.md,
specific fix suggestions if score < 3>

Repeat the detailed feedback section for all 11 dimensions. End with:
### Verdict
<2-3 sentences: is this competition-ready? What is the single
highest-leverage improvement? Be specific.>

Load references/example-evaluation.md on request for calibration. For batch requests, produce one scorecard per skill sequentially.