Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.
A Claude skill that evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions — 8 from the official Tessl rubric + 3 bonus (innovation, style, vibes).
Built for Tessl judges scoring a batch of submissions; also useful for contestants self-checking before they submit.
🏆 See it in action: We evaluated all 15 AIE26 submissions and ranked the top 10 — jump to the results
Feed it a SKILL.md (GitHub URL, file path, or raw paste) and it runs a 5-phase evaluation:
| Phase | What happens |
|---|---|
| 1. Ingest | Detects input format, extracts name + metadata |
| 2. Structural Check | Validates frontmatter, line count, trigger terms — blocks scoring if broken |
| 3. Core Evaluation | Scores 8 official Tessl dimensions (Specificity, Trigger Terms, Completeness, Distinctiveness, Conciseness, Actionability, Workflow Clarity, Progressive Disclosure) |
| 4. Bonus Evaluation | Scores Innovation, Style, and Vibes |
| 5. Synthesize | Produces a scorecard, per-dimension feedback, and a verdict |
Core score: 0-100, normalized from the 8 core dimensions (each scored 1-3, so 24 raw points max). Bonus: +X/9, reported separately.
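As a rough sketch, the Phase 2 structural gate might look like this in Python (the field name and the line-count threshold are illustrative assumptions, not the skill's actual checks):

```python
import re

def structural_check(skill_md: str) -> list[str]:
    """Return blocking problems; an empty list means scoring may proceed."""
    problems = []

    # YAML frontmatter must open and close with '---' fences.
    match = re.match(r"^---\n(.*?)\n---\n", skill_md, flags=re.DOTALL)
    if match is None:
        problems.append("missing or unclosed YAML frontmatter")
    else:
        frontmatter = match.group(1)
        # The description field carries the trigger terms the router matches on.
        if "description:" not in frontmatter:
            problems.append("frontmatter has no description field")

    # Keep the main file lean (threshold is an assumption, not the rubric's number).
    if len(skill_md.splitlines()) > 500:
        problems.append("SKILL.md exceeds the assumed 500-line budget")

    return problems
```

If this returns a non-empty list, scoring is blocked and the problems are reported instead of a scorecard.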
Say any of:

- "judge my AIE26 contest skill"
- "score this SKILL.md for the contest"
- "review my skill submission"
- "how would this score on the leaderboard"
Core dimensions (official Tessl rubric):

| Dimension | What it measures |
|---|---|
| Specificity | Concrete, actionable capabilities listed |
| Trigger Terms | Natural phrases users would actually say |
| Completeness | Clear "what" (purpose) and "when" (usage) |
| Distinctiveness | Low conflict risk; clear niche |
| Conciseness | Token efficiency; no padding |
| Actionability | Executable instructions, concrete examples |
| Workflow Clarity | Sequenced phases with exit gates |
| Progressive Disclosure | Layered references loaded on demand |
Bonus dimensions:

| Dimension | What it measures |
|---|---|
| Innovation | Novel approach, not a commodity wrapper |
| Style | Human authorial voice, tone consistency |
| Vibes | "Would I install this?" + compelling hook |
Each dimension scored 1 (Weak), 2 (Adequate), 3 (Strong). Detailed criteria in references/scoring-rubric.md.
```
aie26-skill-judge/
├── SKILL.md                  5-phase evaluation workflow
├── references/
│   ├── scoring-rubric.md     Detailed criteria for all 11 dimensions
│   └── example-evaluation.md Worked example (devcon-hack-coach, 100/100)
└── README.md                 This file
```

From the Tessl registry:

```
tessl install paker-it/aie26-skill-judge
```

Or directly — clone this repo and point your Claude Code config at the directory:

```
git clone https://github.com/mertpaker/aie26-skill-judge.git ~/.claude/skills/aie26-skill-judge
```

See references/example-evaluation.md for a full worked evaluation of the devcon-hack-coach skill (Core: 100/100, Bonus: +8/9).
We used aie26-skill-judge to evaluate all 15 submissions on the AIE26 leaderboard and rank the top 10.
This evaluation is a weekend experiment — a way to dogfood the skill against real submissions, not a judgment on anyone's work. Every skill on the leaderboard represents time, creativity, and craft that we respect. Scores are generated by an LLM applying a rubric and will vary between runs; they are not definitive rankings. If your skill appears here and you'd like it removed, open an issue and we'll take it down immediately. We love all developers who showed up and built something.
"Here are all the AIE26 contest submissions. Judge each one using the aie26-skill-judge rubric (8 core dimensions scored 1-3 from references/scoring-rubric.md + 3 bonus dimensions). Produce a scorecard for each, then rank the top 10 and compare their pros and cons."
Each skill was evaluated independently by a separate agent using the scoring rubric, then results were compiled and ranked.
How to read the scores: Core is our judge score — round((sum of 8 dimension scores / 24) * 100) using the aie26-skill-judge rubric. Leaderboard is the official Tessl automated review score from the AIE26 contest page. These are two independent scoring systems and may disagree.
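That arithmetic is easy to reproduce. As a sanity check, plugging wigo's eight dimension scores from the breakdown table into the formula recovers its 96/100 (a minimal sketch, not the skill's actual code):

```python
def core_score(dimension_scores: list[int]) -> int:
    """Normalize eight 1-3 dimension scores to a 0-100 Core score."""
    assert len(dimension_scores) == 8 and all(1 <= s <= 3 for s in dimension_scores)
    return round(sum(dimension_scores) / 24 * 100)

def bonus_score(bonus_scores: list[int]) -> str:
    """Bonus (Innovation, Style, Vibes) is reported separately as +X/9."""
    return f"+{sum(bonus_scores)}/9"

# wigo's eight core dimensions: seven 3s and a 2 for Progressive Disclosure.
print(core_score([3, 3, 3, 3, 3, 3, 3, 2]))  # → 96
print(bonus_score([3, 2, 3]))                # → +8/9
```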
| Rank | Skill | Author | Core | Bonus | Leaderboard | Registry |
|---|---|---|---|---|---|---|
| 1 | wigo | Paulo Matos | 96/100 | +8/9 | 86/100 | 82/100 |
| 2 | devcon-hack-coach | Mert Paker | 96/100 | +8/9 | 100/100 | 100/100 |
| 3 | evidence-verifier | Macey Baker | 96/100 | +7/9 | 93/100 | — |
| 4 | k8s-security-audit | Juan | 92/100 | +7/9 | 100/100 | — |
| 5 | spec-interrogator | Jakub Czarnowski | 88/100 | +8/9 | 100/100 | — |
| 6 | skill-writer | Juan | 88/100 | +7/9 | 100/100 | — |
| 7 | shekel-ui | Omer Bresinski | 83/100 | +7/9 | 85/100 | — |
| 8 | de-llm-ify-writing | Alan Pope | 79/100 | +5/9 | 53/100 | — |
| 9 | agent-school | James Moss | 78/100 | +8/9 | 81/100 | 85/100 |
| 10 | writing-clearly-and-concisely | Martin Wimpress | 77/100 | +7/9 | 73/100 | — |
Registry scores from tessl.io. Paste-submitted skills (—) don't have registry pages.
Dimension-level breakdown for the top 5:

| Dimension | wigo | devcon-hack-coach | evidence-verifier | k8s-security-audit | spec-interrogator |
|---|---|---|---|---|---|
| Specificity | 3 | 3 | 3 | 3 | 3 |
| Trigger Terms | 3 | 3 | 3 | 3 | 3 |
| Completeness | 3 | 3 | 3 | 3 | 3 |
| Distinctiveness | 3 | 3 | 3 | 3 | 3 |
| Conciseness | 3 | 3 | 3 | 2 | 3 |
| Actionability | 3 | 3 | 3 | 3 | 3 |
| Workflow Clarity | 3 | 3 | 2 | 3 | 2 |
| Progressive Disclosure | 2 | 3 | 3 | 2 | 3 |
| Innovation | 3 | 3 | 2 | 2 | 3 |
| Style | 2 | 3 | 3 | 2 | 3 |
| Vibes | 3 | 2 | 2 | 3 | 2 |
1. wigo (Paulo Matos)

| Pros | Cons |
|---|---|
| Most innovative technique: mines Claude's own .jsonl session logs to reconstruct context | All content in one file — Python script and suggestion matrix should be in references |
| Solves a universal problem: "where was I?" after context-switching | Voice is technically precise but personality-neutral |
| Explicit parallel/sequential phasing with conditional branching | Heavier than others — 9400 chars of inline code |
| Memorable name, strong hook, you'd share this with teammates |
2. devcon-hack-coach (Mert Paker)

| Pros | Cons |
|---|---|
| Best workflow clarity — 4 phases with named exit gates, loop-back conditions, and a terminal state | Narrow audience: only useful for DevCon 2026 attendees |
| Strongest voice: "That's three features. Pick one." — pushy coach persona never slips | Event-specific scoping limits shelf life |
| Textbook progressive disclosure: 5 references, each tied to a specific phase | Could be generalized to "any 24h hackathon" for broader reach |
| Spec-before-code hard gate is a genuinely original coaching angle |
3. evidence-verifier (Macey Baker)

| Pros | Cons |
|---|---|
| Leanest skill of all — zero waste, every line earns its place | No exit gate: what happens when a claim is blocked? |
| Evidence table output format is immediately actionable | Only 3 trigger phrases (rubric wants 3-6) |
| Strong epistemic stance: "refuse to certify without evidence" | Concept is "obviously good" rather than "surprisingly brilliant" |
| Worked mini-example grounds the template in reality | No reference files — simple but no room to deepen |
4. k8s-security-audit (Juan)

| Pros | Cons |
|---|---|
| 8+ trigger phrases covering every way someone asks for a k8s audit | Essential Evidence Commands block should be in references/, not inline |
| Real kubectl/jq commands make it production-ready | Voice is professional but impersonal |
| 8 audit categories with severity taxonomy = serious depth | Innovation limited — follows well-known CIS/NSA frameworks |
| Strongest practical "vibes" — you'd install this tomorrow | Partial progressive disclosure |
5. spec-interrogator (Jakub Czarnowski)

| Pros | Cons |
|---|---|
| Highest innovation density: "propose a recommended answer with every question" | Soft stop condition — relies on user to say "we're done" |
| Best craft: "Kill scope creep on sight", "never ask a cold question" | Implicit phases with no named exit gates |
| "Read the codebase instead of asking" — a rule nobody else thought of | No output by default — some users won't know to ask for a deliverable |
| Most concise: entire skill in ~50 lines | Slightly less "production-ready" feel than the top 4 |
6. skill-writer (Juan)

| Pros | Cons |
|---|---|
| Meta-skill: teaches you to write skills — useful for the entire ecosystem | Workflow gates are thin — "return to the relevant step" lacks specifics |
| "Be pushy — Claude undertriggers" is opinionated, practical advice | Progressive disclosure reference isn't gated with load conditions |
| Copy-paste-ready minimal example included | Innovation is predictable: meta-skill for a skill ecosystem |
| Clear folder structure and YAML rules for newcomers |
7. shekel-ui (Omer Bresinski)

| Pros | Cons |
|---|---|
| Textbook progressive disclosure: lean main file, 4 named reference docs | No trigger phrases — user-invocable: false means router can't find it |
| Extremely specific: named fonts, oklch tokens, exact utility classes | Workflow deferred entirely to reference file — main SKILL.md has no phases |
| Strong opinionated voice: "hermetically sealed", "warm editorial" | Niche audience: only useful if you're building with this specific design system |
| Concise and zero-waste throughout |
8. de-llm-ify-writing (Alan Pope)

| Pros | Cons |
|---|---|
| Names specific LLM tells: "staccato sentence patterns", "stock contrast constructions" | No trigger phrases at all — users can't find it |
| Concrete quality checklist for evaluating prose | No workflow phases or sequencing — reference doc, not a skill |
| Strong authorial voice with real opinions about writing | Some redundancy in anti-pattern section |
| Addresses a timely, real problem (AI-sounding prose) |
9. agent-school (James Moss)

| Pros | Cons |
|---|---|
| Novel concept: generate persistent tile artifacts to teach agents | Trigger terms are implicit, not conversational |
| 5 clear phases with user confirmation checkpoints | Somewhat verbose — could trim 15-20% |
| High innovation: bakes knowledge into agent systems, not one-off answers | Niche audience: tessl tile authors only |
| Strong authoritative voice throughout |
10. writing-clearly-and-concisely (Martin Wimpress)

| Pros | Cons |
|---|---|
| Excellent specificity: concrete before/after examples for every rule | No conversational trigger phrases |
| Novel AI-pattern taxonomy: banned words, puffery detection | No workflow — rules listed but not sequenced |
| Progressive disclosure to prose-style-reference for extended tasks | Academic presentation — useful but not exciting |
| Every line earns its place — tight and concise |
Built for the AI Engineer London 2026 (AIE26) skills contest. Submission by Mert Paker.
Tessl review score: 94% (Description 100%, Content 85%).