Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.
Each dimension is scored 1 (Weak), 2 (Adequate), or 3 (Strong).
Does the description name concrete, actionable capabilities?
| Score | Criteria |
|---|---|
| 1 | Vague or abstract ("helps with development", "AI-powered assistant"). No specific actions named. |
| 2 | Names some capabilities but mixes concrete with vague. Unclear what the skill actually does vs. what it aspires to. |
| 3 | Every capability is a concrete verb + object ("generates Kubernetes RBAC policies", "scores SKILL.md submissions on 11 dimensions"). Reader knows exactly what they get. |
Red flags for 1: "helps", "assists", "enhances", "leverages" without a direct object.
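To make the contrast concrete, here is a sketch of a weak vs. strong frontmatter description (the RBAC skill is invented for illustration; the field name follows standard SKILL.md frontmatter):

```markdown
---
# Scores 1: no capability named, every verb is a red-flag word
description: An AI-powered assistant that helps with Kubernetes and enhances your workflow.
---

---
# Scores 3: every capability is a concrete verb + object
description: Generates Kubernetes RBAC policies from plain-English access requirements,
  audits existing Role and RoleBinding manifests for over-broad permissions, and
  produces least-privilege diffs.
---
```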
Does the description include natural phrases a user would actually say?
| Score | Criteria |
|---|---|
| 1 | No trigger phrases, or phrases no human would say ("utilize the skill to evaluate"). |
| 2 | 1-2 trigger phrases present but generic ("help me with X") or forced-sounding. |
| 3 | 3-6 natural trigger phrases that read like something a real person would type. Covers both direct requests and situational triggers. |
Strong examples: "judge my AIE26 contest skill", "score this for the contest", "will this win?" Weak examples: "invoke skill evaluation mode", "perform assessment".
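A sketch of what a 3 looks like in practice, using the same invented RBAC skill; note the mix of direct requests and one situational trigger:

```markdown
---
description: Generates Kubernetes RBAC policies from plain-English requirements.
  Use when you say "write an RBAC policy for this service account", "audit my
  cluster roles", "why is this pod forbidden", or when a kubectl command fails
  with a permissions error.
---
```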
Does the description cover both what (purpose) and when (usage scenarios)?
| Score | Criteria |
|---|---|
| 1 | Missing either what or when entirely. Reader can't tell what it does OR when to use it. |
| 2 | Has what but weak/missing when, or vice versa. Purpose is clear but activation context is vague. |
| 3 | Crystal clear on both. Reader knows the skill's purpose AND can identify the exact moment they'd reach for it. |
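The description at the top of this rubric is one model of the pattern; a minimal invented sketch:

```markdown
---
# What: the first sentence names the purpose.
# When: the second names the exact moments a user would reach for it.
description: Scores SKILL.md submissions on 11 dimensions. Use when you say
  "judge my contest skill" or paste a SKILL.md and ask "will this win?"
---
```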
Is this skill clearly different from existing skills? Low conflict risk?
| Score | Criteria |
|---|---|
| 1 | Overlaps heavily with common built-in skills or well-known existing skills. Would confuse the skill router. |
| 2 | Somewhat distinct but shares surface area with adjacent skills. Trigger terms could collide. |
| 3 | Clear niche. No realistic conflict with existing skills. Name + description + triggers carve out unique territory. |
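A hypothetical illustration of collision risk, assuming a router that matches on name, description, and trigger terms:

```markdown
---
# Scores 1: collides with every generic review skill the router knows
name: code-review
description: Reviews code and suggests improvements.
---

---
# Scores 3: name, description, and triggers carve out unique territory
name: aie26-skill-judge
description: Scores SKILL.md submissions for the AI Engineer London 2026 contest.
---
```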
Is the content token-efficient? No padding or over-explanation?
| Score | Criteria |
|---|---|
| 1 | Bloated with filler, redundant sections, or verbose explanations of simple concepts. Could be half the length. |
| 2 | Some unnecessary prose but core content is present. Could trim 20-30% without losing substance. |
| 3 | Every line earns its place. No padding, no redundancy, no over-explaining. Uses tables and lists over paragraphs where appropriate. |
Red flags for 1: Repeating the same instruction in different words, explaining what markdown formatting is, long preambles before the actual instructions.
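A before/after sketch of the same instruction, trimmed:

```markdown
<!-- Scores 1: a preamble plus the same rule stated three ways -->
Before we begin, it is important to understand that output formatting
matters a great deal. Always format your output as a table. Tables are
the preferred output format. Avoid paragraphs when a table would work.

<!-- Scores 3: one line, same rule -->
Output results as a markdown table, one row per dimension.
```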
Does the content contain executable instructions with concrete examples?
| Score | Criteria |
|---|---|
| 1 | Abstract methodology or theory. No examples, no constraints, no concrete steps. |
| 2 | Some concrete instructions but mixed with vague guidance ("handle edge cases appropriately"). |
| 3 | Every instruction is specific enough to execute without interpretation. Includes examples, constraints, expected outputs. |
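An invented example of the jump from vague guidance to an executable instruction with a constraint and an expected output:

```markdown
<!-- Scores 2: requires interpretation -->
Handle malformed submissions appropriately.

<!-- Scores 3: specific action, constraint, expected output -->
If the submission has no frontmatter block, score dimensions 1-4 as 1 and
open the report with the line: "No frontmatter found - description
dimensions defaulted to 1."
```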
Are instructions sequenced into clear phases with exit gates?
| Score | Criteria |
|---|---|
| 1 | Unstructured wall of instructions. No clear order or phases. |
| 2 | Has phases/sections but missing exit gates, or unclear when to move between phases. |
| 3 | Numbered/named phases with explicit entry conditions, exit gates, and loop-back conditions. Reader always knows where they are in the workflow. |
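A minimal sketch of one phase written to score 3, with an explicit entry condition, exit gate, and loop-back (the phase names and steps are invented):

```markdown
## Phase 2: Score dimensions
Entry: submission parsed and frontmatter extracted (Phase 1 complete).
1. Score each of the 11 dimensions 1-3 against its rubric table.
2. Record one line of evidence per score.
Exit gate: all 11 dimensions have both a score and an evidence line.
If any evidence line is empty, return to step 2 before starting Phase 3.
```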
Does the skill load information only when needed?
| Score | Criteria |
|---|---|
| 1 | Everything in one file. No reference files. Or: references exist but are loaded eagerly. |
| 2 | Some references exist but loading isn't well-timed, or reference structure is unclear. |
| 3 | Reference files loaded only at the phase that needs them. Main SKILL.md is lean. References are clearly named and scoped. |
Note: A simple skill that genuinely doesn't need references can still score 3 — progressive disclosure means "don't front-load what isn't needed yet", not "must have reference files."
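A sketch of well-timed loading, assuming hypothetical reference files named for the phase that consumes them:

```markdown
Do not read any reference file at startup. Load each one only at the
phase that needs it:
- Phase 2 (scoring): read references/rubric-details.md
- Phase 4 (report): read references/report-template.md
```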
Is the approach genuinely novel, or a creative framing nobody else has tried?
| Score | Criteria |
|---|---|
| 1 | Commodity wrapper around a well-known tool or API. No novel framing. "ChatGPT but for X." |
| 2 | Applies existing techniques to a specific domain in a useful way. Competent but not surprising. |
| 3 | Genuinely novel approach, creative problem framing, or addresses a gap nobody else has filled. Makes you think "why didn't this exist already?" |
Does the writing have a distinct, consistent authorial voice?
| Score | Criteria |
|---|---|
| 1 | Reads like generic AI-generated text. No authorial voice. Corporate-bland or template-obvious. |
| 2 | Has some personality but inconsistent. Mixes voices or defaults to generic in places. |
| 3 | Consistent, confident authorial voice throughout. Reads like a person with opinions wrote it. Tone matches the skill's purpose. |
Strong signal: The voice section (if present) has specific examples, not just adjectives. Weak signal: "Be helpful and professional" — says nothing.
The gut-check composite. Score it on three sub-questions: would you install it? Would you recommend it? Would you remember the name?
| Score | Criteria |
|---|---|
| 1 | Fails all three. Academic exercise or toy demo. No pull. |
| 2 | Passes 1-2. Useful but not exciting, or exciting but not practical. |
| 3 | Passes all three. You'd install it, recommend it, and remember the name. |
Repository contents: docs, superpowers, evals (scenario-1 through scenario-5), references.