Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.
Leaderboard snapshot: overall score 82; best practices 94%; impact 65%; 1.80x average score across 5 eval scenarios; risk rating: Risky (do not use without reviewing).
A dual-purpose evaluation skill for the AI Engineer London 2026 Skills Contest (skillleaderboard.alan-626.workers.dev/AIE26). Designed for Tessl judges scoring submissions, but equally useful for contestants self-checking before they submit.
Name: aie26-skill-judge
The skill accepts a SKILL.md in three ways: a GitHub repo URL, a local file path, or a raw paste. It auto-detects which format it's receiving and normalizes before evaluation. Detection logic:

- Starts with `https://github.com` or `github.com` → repo URL; fetch the repo's SKILL.md
- Starts with `---` (YAML frontmatter) → raw paste
- Starts with `/` or `~`, or contains a `.md` file extension → local file path

The core rubric has 8 dimensions, each scored 1-3, mirroring the official contest rubric for comparability.
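The detection heuristic above can be sketched roughly as follows (function and label names are illustrative, not taken from the skill itself):

```python
def detect_input_format(submission: str) -> str:
    """Classify a submission as a repo URL, raw paste, or local file path."""
    s = submission.strip()
    if s.startswith(("https://github.com", "github.com")):
        return "repo_url"    # fetch SKILL.md from the repo
    if s.startswith("---"):
        return "raw_paste"   # YAML frontmatter marks a pasted SKILL.md
    if s.startswith(("/", "~")) or ".md" in s:
        return "local_path"
    return "unknown"
```

Note the ordering matters: a raw paste is checked before the path heuristic, since a pasted SKILL.md also "contains `.md`" in its body text only rarely but predictably starts with frontmatter.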
| Category | Dimension | What it measures |
|---|---|---|
| Description | Specificity | Concrete, actionable capabilities listed |
| Description | Trigger Terms | Natural phrases users would actually say |
| Description | Completeness | Clear "what" (purpose) and "when" (usage scenarios) |
| Description | Distinctiveness | Low conflict with existing skills; clear niche |
| Content | Conciseness | Token efficiency; no padding or over-explanation |
| Content | Actionability | Executable instructions, concrete examples, specific constraints |
| Content | Workflow Clarity | Sequenced phases with explicit exit gates and loop-back conditions |
| Content | Progressive Disclosure | Layered references loaded only when needed |
Core score: 8 dimensions × 3 points max = 24 raw, normalized to 0-100.
3 additional dimensions, each scored 1-3. Reported separately to preserve comparability with the official rubric.
| Dimension | What it measures |
|---|---|
| Innovation | Novel approach, creative problem framing, not a rehash of existing tools |
| Style | Authorial voice, tone consistency, reads like a human wrote it (not AI slop) |
| Vibes | "Would I install this?", solves a real itch, compelling hook, confident attitude |
Bonus score: reported as "+X/9" alongside the core score.
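A minimal sketch of the score arithmetic (assuming normalization is simply raw/24 × 100 with rounding; the skill doesn't prescribe an exact rounding rule, so this is an interpretation):

```python
def core_score(dimension_scores: list[int]) -> int:
    """Normalize 8 dimensions scored 1-3 (raw max 24) to a 0-100 scale."""
    assert len(dimension_scores) == 8
    raw = sum(dimension_scores)  # ranges 8..24 on a 1-3 scale
    return round(raw / 24 * 100)

def bonus_score(dimension_scores: list[int]) -> str:
    """Report the 3 bonus dimensions separately as '+X/9'."""
    assert len(dimension_scores) == 3
    return f"+{sum(dimension_scores)}/9"
```

One consequence of this simple normalization: since each dimension scores at least 1, the lowest achievable core score is 33, not 0.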
Each dimension uses a 1-3 scale.

The skill produces a structured scorecard followed by per-dimension detailed feedback:
## Scorecard: <skill-name>
### Core Score: XX/100
| Dimension | Score | Reasoning |
|----------------------|-------|------------------------------|
| Specificity | X/3 | <one line> |
| Trigger Terms | X/3 | <one line> |
| Completeness | X/3 | <one line> |
| Distinctiveness | X/3 | <one line> |
| Conciseness | X/3 | <one line> |
| Actionability | X/3 | <one line> |
| Workflow Clarity | X/3 | <one line> |
| Progressive Disclosure | X/3 | <one line> |
### Bonus Score: +X/9
| Dimension | Score | Reasoning |
|------------|-------|------------------------------|
| Innovation | X/3 | <one line> |
| Style | X/3 | <one line> |
| Vibes | X/3 | <one line> |
### Detailed Feedback
#### Specificity (X/3)
<paragraph: what's strong, what to fix, specific examples from the SKILL.md>
#### Trigger Terms (X/3)
<paragraph>
#### Completeness (X/3)
<paragraph>
#### Distinctiveness (X/3)
<paragraph>
#### Conciseness (X/3)
<paragraph>
#### Actionability (X/3)
<paragraph>
#### Workflow Clarity (X/3)
<paragraph>
#### Progressive Disclosure (X/3)
<paragraph>
#### Innovation (X/3)
<paragraph>
#### Style (X/3)
<paragraph>
#### Vibes (X/3)
<paragraph>
### Verdict
<2-3 sentence summary: is this competition-ready, and what is the single
highest-leverage improvement the author should make>

The skill runs a 5-phase sequential evaluation. Structural validation comes first: confirm the name and description fields, that the line count is under 500, and that the SKILL.md is parseable. If structural issues exist, report them and stop (don't score a broken submission).

The tone is authoritative but constructive, like a senior judge giving feedback at a pitch competition.

Before scoring, the skill checks:

- Frontmatter is valid YAML (starts with `---`, ends with `---`)
- The `name` field is present and non-empty
- The `description` field is present and non-empty

If any check fails, the skill reports the failures with fix instructions and does not proceed to scoring.

The skill never shells out to `tessl skill review` or any other CLI command; it's a pure conversational evaluation.

Bundled resources:

- `docs/`
- `superpowers/`
- `evals/` (scenario-1 through scenario-5)
- `references/`
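The preflight checks could be sketched like this (a hypothetical helper, not the skill's actual implementation; it uses plain string checks rather than a YAML parser):

```python
def preflight(skill_md: str) -> list[str]:
    """Return failure messages; an empty list means the submission may be scored."""
    failures: list[str] = []
    lines = skill_md.splitlines()
    stripped = [l.strip() for l in lines[1:]]
    # Frontmatter must open with --- and close with a second ---
    if not lines or lines[0].strip() != "---" or "---" not in stripped:
        failures.append("Frontmatter missing or unterminated (needs opening and closing ---)")
        return failures
    front = lines[1 : stripped.index("---") + 1]
    # name and description must be present and non-empty
    for field in ("name", "description"):
        if not any(l.strip().startswith(f"{field}:") and l.split(":", 1)[1].strip()
                   for l in front):
            failures.append(f"{field} field missing or empty")
    if len(lines) >= 500:
        failures.append("SKILL.md is not under 500 lines")
    return failures
```

Usage: call `preflight(text)` before scoring; if the returned list is non-empty, report each message with fix instructions and stop, per the workflow above.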