
paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

Quality: 94% (does it follow best practices?) · Impact: 65%, 1.80× (average score across 5 eval scenarios) · Security by Snyk: Risky (do not use without reviewing)


aie26-skill-judge

A Claude skill that evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions — 8 from the official Tessl rubric + 3 bonus (innovation, style, vibes).

Built for Tessl judges scoring a batch. Useful for contestants self-checking before they submit.

Tessl Registry

🏆 See it in action: We evaluated all 15 AIE26 submissions and ranked the top 10 — jump to the results


What it does

Feed it a SKILL.md (GitHub URL, file path, or raw paste) and it runs a 5-phase evaluation:

| Phase | What happens |
|---|---|
| 1. Ingest | Detects input format, extracts name + metadata |
| 2. Structural Check | Validates frontmatter, line count, trigger terms — blocks scoring if broken |
| 3. Core Evaluation | Scores 8 official Tessl dimensions (Specificity, Trigger Terms, Completeness, Distinctiveness, Conciseness, Actionability, Workflow Clarity, Progressive Disclosure) |
| 4. Bonus Evaluation | Scores Innovation, Style, and Vibes |
| 5. Synthesize | Produces a scorecard, per-dimension feedback, and a verdict |

Core score: 0-100 (normalized from 8 dimensions × 3 points max). Bonus: +X/9, reported separately.
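Phase 2 is the only gate that can block scoring outright. A minimal sketch of what such a gate might check — the specific checks, field names, and line limit here are illustrative assumptions, not the skill's actual logic:

```python
import re

def structural_check(skill_md: str, max_lines: int = 500) -> list[str]:
    """Hypothetical Phase 2 gate: return a list of blocking problems.

    An empty list means scoring may proceed. The required fields and
    the line limit are assumptions for illustration only.
    """
    problems = []
    if not skill_md.startswith("---"):
        # YAML frontmatter must open the file
        problems.append("missing YAML frontmatter")
    else:
        m = re.match(r"^---\n(.*?)\n---", skill_md, re.DOTALL)
        if not m:
            problems.append("unterminated frontmatter block")
        elif "description:" not in m.group(1):
            problems.append("frontmatter lacks a description field")
    if len(skill_md.splitlines()) > max_lines:
        problems.append(f"exceeds {max_lines} lines")
    return problems
```

A well-formed file (frontmatter with a description, under the limit) returns an empty list and passes to Phase 3.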

Triggers

Say any of:

  • "judge my AIE26 contest skill"
  • "score this SKILL.md for the contest"
  • "review my skill submission"
  • "how would this score on the leaderboard"
  • "rate my skill before I submit"
  • "give me judge feedback on this skill"

Scoring dimensions

Core (Official Tessl Rubric)

| Dimension | What it measures |
|---|---|
| Specificity | Concrete, actionable capabilities listed |
| Trigger Terms | Natural phrases users would actually say |
| Completeness | Clear "what" (purpose) and "when" (usage) |
| Distinctiveness | Low conflict risk; clear niche |
| Conciseness | Token efficiency; no padding |
| Actionability | Executable instructions, concrete examples |
| Workflow Clarity | Sequenced phases with exit gates |
| Progressive Disclosure | Layered references loaded on demand |

Bonus

| Dimension | What it measures |
|---|---|
| Innovation | Novel approach, not a commodity wrapper |
| Style | Human authorial voice, tone consistency |
| Vibes | "Would I install this?" + compelling hook |

Each dimension is scored 1 (Weak), 2 (Adequate), or 3 (Strong). Detailed criteria are in references/scoring-rubric.md.

What's in the bundle

```
aie26-skill-judge/
├── SKILL.md                          5-phase evaluation workflow
├── references/
│   ├── scoring-rubric.md             Detailed criteria for all 11 dimensions
│   └── example-evaluation.md         Worked example (devcon-hack-coach, 100/100)
└── README.md                         This file
```

Install

From the Tessl registry:

```shell
tessl install paker-it/aie26-skill-judge
```

Or directly — clone this repo and point your Claude Code config at the directory:

```shell
git clone https://github.com/mertpaker/aie26-skill-judge.git ~/.claude/skills/aie26-skill-judge
```

Example output

See references/example-evaluation.md for a full worked evaluation of the devcon-hack-coach skill (Core: 100/100, Bonus: +8/9).


AIE26 Leaderboard Evaluation — Top 10

We used aie26-skill-judge to evaluate all 15 submissions on the AIE26 leaderboard and rank the top 10.

Disclaimer

This evaluation is a weekend experiment — a way to dogfood the skill against real submissions, not a judgment on anyone's work. Every skill on the leaderboard represents time, creativity, and craft that we respect. Scores are generated by an LLM applying a rubric and will vary between runs; they are not definitive rankings. If your skill appears here and you'd like it removed, open an issue and we'll take it down immediately. We love all developers who showed up and built something.

Prompt used

"Here are all the AIE26 contest submissions. Judge each one using the aie26-skill-judge rubric (8 core dimensions scored 1-3 from references/scoring-rubric.md + 3 bonus dimensions). Produce a scorecard for each, then rank the top 10 and compare their pros and cons."

Each skill was evaluated independently by a separate agent using the scoring rubric, then results were compiled and ranked.

How to read the scores: Core is our judge score — round((sum of 8 dimension scores / 24) * 100) using the aie26-skill-judge rubric. Leaderboard is the official Tessl automated review score from the AIE26 contest page. These are two independent scoring systems and may disagree.
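The Core formula above can be checked directly against the tables: wigo's row in the dimension breakdown (seven 3s and a 2 for Progressive Disclosure) sums to 23, which normalizes to the 96/100 shown in the rankings.

```python
def core_score(dims: list[int]) -> int:
    """Core judge score: round((sum of 8 dimension scores / 24) * 100)."""
    assert len(dims) == 8 and all(1 <= d <= 3 for d in dims)
    return round(sum(dims) / 24 * 100)

# wigo's row from the dimension breakdown: seven 3s, one 2
wigo = [3, 3, 3, 3, 3, 3, 3, 2]
print(core_score(wigo))  # → 96, matching wigo's Core column

# Bonus is a plain sum out of 9 (Innovation, Style, Vibes)
wigo_bonus = [3, 2, 3]
print(sum(wigo_bonus))  # → 8, the +8/9 in the rankings
```

Note that Core and Leaderboard can disagree for the same skill, since the latter comes from Tessl's own automated review.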

Rankings

| Rank | Skill | Author | Core | Bonus | Leaderboard | Registry |
|---|---|---|---|---|---|---|
| 1 | wigo | Paulo Matos | 96/100 | +8/9 | 86/100 | 82/100 |
| 2 | devcon-hack-coach | Mert Paker | 96/100 | +8/9 | 100/100 | 100/100 |
| 3 | evidence-verifier | Macey Baker | 96/100 | +7/9 | 93/100 | — |
| 4 | k8s-security-audit | Juan | 92/100 | +7/9 | 100/100 | — |
| 5 | spec-interrogator | Jakub Czarnowski | 88/100 | +8/9 | 100/100 | — |
| 6 | skill-writer | Juan | 88/100 | +7/9 | 100/100 | — |
| 7 | shekel-ui | Omer Bresinski | 83/100 | +7/9 | 85/100 | — |
| 8 | de-llm-ify-writing | Alan Pope | 79/100 | +5/9 | 53/100 | — |
| 9 | agent-school | James Moss | 78/100 | +8/9 | 81/100 | 85/100 |
| 10 | writing-clearly-and-concisely | Martin Wimpress | 77/100 | +7/9 | 73/100 | — |

Registry scores from tessl.io. Paste-submitted skills (—) don't have registry pages.

Dimension-by-dimension breakdown (Top 5)

| Dimension | wigo | devcon-hack-coach | evidence-verifier | k8s-security-audit | spec-interrogator |
|---|---|---|---|---|---|
| Specificity | 3 | 3 | 3 | 3 | 3 |
| Trigger Terms | 3 | 3 | 3 | 3 | 3 |
| Completeness | 3 | 3 | 3 | 3 | 3 |
| Distinctiveness | 3 | 3 | 3 | 3 | 3 |
| Conciseness | 3 | 3 | 3 | 2 | 3 |
| Actionability | 3 | 3 | 3 | 3 | 3 |
| Workflow Clarity | 3 | 3 | 2 | 3 | 2 |
| Progressive Disclosure | 2 | 3 | 3 | 2 | 3 |
| Innovation | 3 | 3 | 2 | 2 | 3 |
| Style | 2 | 3 | 3 | 2 | 3 |
| Vibes | 3 | 2 | 2 | 3 | 2 |

Pros & Cons

<details>
<summary><strong>1. wigo</strong> (96 + 8) — Paulo Matos</summary>

| Pros | Cons |
|---|---|
| Most innovative technique: mines Claude's own .jsonl session logs to reconstruct context | All content in one file — Python script and suggestion matrix should be in references |
| Solves a universal problem: "where was I?" after context-switching | Voice is technically precise but personality-neutral |
| Explicit parallel/sequential phasing with conditional branching | Heavier than others — 9400 chars of inline code |
| Memorable name, strong hook, you'd share this with teammates | |

</details>
<details>
<summary><strong>2. devcon-hack-coach</strong> (96 + 8) — Mert Paker</summary>

| Pros | Cons |
|---|---|
| Best workflow clarity — 4 phases with named exit gates, loop-back conditions, and a terminal state | Narrow audience: only useful for DevCon 2026 attendees |
| Strongest voice: "That's three features. Pick one." — pushy coach persona never slips | Event-specific scoping limits shelf life |
| Textbook progressive disclosure: 5 references, each tied to a specific phase | Could be generalized to "any 24h hackathon" for broader reach |
| Spec-before-code hard gate is a genuinely original coaching angle | |

</details>
<details>
<summary><strong>3. evidence-verifier</strong> (96 + 7) — Macey Baker</summary>

| Pros | Cons |
|---|---|
| Leanest skill of all — zero waste, every line earns its place | No exit gate: what happens when a claim is blocked? |
| Evidence table output format is immediately actionable | Only 3 trigger phrases (rubric wants 3-6) |
| Strong epistemic stance: "refuse to certify without evidence" | Concept is "obviously good" rather than "surprisingly brilliant" |
| Worked mini-example grounds the template in reality | No reference files — simple but no room to deepen |

</details>
<details>
<summary><strong>4. k8s-security-audit</strong> (92 + 7) — Juan</summary>

| Pros | Cons |
|---|---|
| 8+ trigger phrases covering every way someone asks for a k8s audit | Essential Evidence Commands block should be in references/, not inline |
| Real kubectl/jq commands make it production-ready | Voice is professional but impersonal |
| 8 audit categories with severity taxonomy = serious depth | Innovation limited — follows well-known CIS/NSA frameworks |
| Strongest practical "vibes" — you'd install this tomorrow | Partial progressive disclosure |

</details>
<details>
<summary><strong>5. spec-interrogator</strong> (88 + 8) — Jakub Czarnowski</summary>

| Pros | Cons |
|---|---|
| Highest innovation density: "propose a recommended answer with every question" | Soft stop condition — relies on user to say "we're done" |
| Best craft: "Kill scope creep on sight", "never ask a cold question" | Implicit phases with no named exit gates |
| "Read the codebase instead of asking" — a rule nobody else thought of | No output by default — some users won't know to ask for a deliverable |
| Most concise: entire skill in ~50 lines | Slightly less "production-ready" feel than the top 4 |

</details>
<details>
<summary><strong>6. skill-writer</strong> (88 + 7) — Juan</summary>

| Pros | Cons |
|---|---|
| Meta-skill: teaches you to write skills — useful for the entire ecosystem | Workflow gates are thin — "return to the relevant step" lacks specifics |
| "Be pushy — Claude undertriggers" is opinionated, practical advice | Progressive disclosure reference isn't gated with load conditions |
| Copy-paste-ready minimal example included | Innovation is predictable: meta-skill for a skill ecosystem |
| Clear folder structure and YAML rules for newcomers | |

</details>
<details>
<summary><strong>7. shekel-ui</strong> (83 + 7) — Omer Bresinski</summary>

| Pros | Cons |
|---|---|
| Textbook progressive disclosure: lean main file, 4 named reference docs | No trigger phrases — user-invocable: false means router can't find it |
| Extremely specific: named fonts, oklch tokens, exact utility classes | Workflow deferred entirely to reference file — main SKILL.md has no phases |
| Strong opinionated voice: "hermetically sealed", "warm editorial" | Niche audience: only useful if you're building with this specific design system |
| Concise and zero-waste throughout | |

</details>
<details>
<summary><strong>8. de-llm-ify-writing</strong> (79 + 5) — Alan Pope</summary>

| Pros | Cons |
|---|---|
| Names specific LLM tells: "staccato sentence patterns", "stock contrast constructions" | No trigger phrases at all — users can't find it |
| Concrete quality checklist for evaluating prose | No workflow phases or sequencing — reference doc, not a skill |
| Strong authorial voice with real opinions about writing | Some redundancy in anti-pattern section |
| Addresses a timely, real problem (AI-sounding prose) | |

</details>
<details>
<summary><strong>9. agent-school</strong> (78 + 8) — James Moss</summary>

| Pros | Cons |
|---|---|
| Novel concept: generate persistent tile artifacts to teach agents | Trigger terms are implicit, not conversational |
| 5 clear phases with user confirmation checkpoints | Somewhat verbose — could trim 15-20% |
| High innovation: bakes knowledge into agent systems, not one-off answers | Niche audience: tessl tile authors only |
| Strong authoritative voice throughout | |

</details>
<details>
<summary><strong>10. writing-clearly-and-concisely</strong> (77 + 7) — Martin Wimpress</summary>

| Pros | Cons |
|---|---|
| Excellent specificity: concrete before/after examples for every rule | No conversational trigger phrases |
| Novel AI-pattern taxonomy: banned words, puffery detection | No workflow — rules listed but not sequenced |
| Progressive disclosure to prose-style-reference for extended tasks | Academic presentation — useful but not exciting |
| Every line earns its place — tight and concise | |

</details>

Credits

Built for the AI Engineer Europe 2026 skills contest. Submission by Mert Paker.

Tessl review score: 94% (Description 100%, Content 85%).
