paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

# AIE26 Scoring Rubric — Detailed Criteria

## Core Dimensions (Official Tessl Rubric)

Each dimension is scored 1 (Weak), 2 (Adequate), or 3 (Strong).
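The scorecard implied by this rubric can be sketched in a few lines. The dimension names below come from this document; the aggregation (a plain sum per group) is my own assumption for illustration, not part of the official contest scoring.

```python
# Hypothetical scorecard for the 11 rubric dimensions.
CORE = [
    "Specificity", "Trigger Terms", "Completeness", "Distinctiveness",
    "Conciseness", "Actionability", "Workflow Clarity", "Progressive Disclosure",
]
BONUS = ["Innovation", "Style", "Vibes"]
VALID_SCORES = {1, 2, 3}  # 1 = Weak, 2 = Adequate, 3 = Strong


def validate(scores: dict[str, int]) -> None:
    """Check that every dimension is present and scored 1-3."""
    missing = set(CORE + BONUS) - scores.keys()
    if missing:
        raise ValueError(f"unscored dimensions: {sorted(missing)}")
    bad = {d: s for d, s in scores.items() if s not in VALID_SCORES}
    if bad:
        raise ValueError(f"scores must be 1, 2, or 3: {bad}")


def totals(scores: dict[str, int]) -> tuple[int, int]:
    """Return (core_total, bonus_total): max 24 core, 9 bonus."""
    validate(scores)
    return (sum(scores[d] for d in CORE), sum(scores[d] for d in BONUS))


# A perfect submission under this (assumed) aggregation:
totals({d: 3 for d in CORE + BONUS})  # (24, 9)
```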


### Specificity

Does the description name concrete, actionable capabilities?

| Score | Criteria |
|---|---|
| 1 | Vague or abstract ("helps with development", "AI-powered assistant"). No specific actions named. |
| 2 | Names some capabilities but mixes concrete with vague. Unclear what the skill actually does vs. what it aspires to. |
| 3 | Every capability is a concrete verb + object ("generates Kubernetes RBAC policies", "scores SKILL.md submissions on 11 dimensions"). Reader knows exactly what they get. |

Red flags for a 1: "helps", "assists", "enhances", "leverages" without a direct object.


### Trigger Terms

Does the description include natural phrases a user would actually say?

| Score | Criteria |
|---|---|
| 1 | No trigger phrases, or phrases no human would say ("utilize the skill to evaluate"). |
| 2 | 1-2 trigger phrases present, but generic ("help me with X") or forced-sounding. |
| 3 | 3-6 natural trigger phrases that read like something a real person would type. Covers both direct requests and situational triggers. |

Strong examples: "judge my AIE26 contest skill", "score this for the contest", "will this win?" Weak examples: "invoke skill evaluation mode", "perform assessment".


### Completeness

Does the description cover both *what* (purpose) and *when* (usage scenarios)?

| Score | Criteria |
|---|---|
| 1 | Missing either the *what* or the *when* entirely. Reader can't tell what it does OR when to use it. |
| 2 | Has the *what* but a weak or missing *when*, or vice versa. Purpose is clear but activation context is vague. |
| 3 | Crystal clear on both. Reader knows the skill's purpose AND can identify the exact moment they'd reach for it. |

### Distinctiveness

Is this skill clearly different from existing skills, with low conflict risk?

| Score | Criteria |
|---|---|
| 1 | Overlaps heavily with common built-in skills or well-known existing skills. Would confuse the skill router. |
| 2 | Somewhat distinct but shares surface area with adjacent skills. Trigger terms could collide. |
| 3 | Clear niche. No realistic conflict with existing skills. Name + description + triggers carve out unique territory. |

### Conciseness

Is the content token-efficient, with no padding or over-explanation?

| Score | Criteria |
|---|---|
| 1 | Bloated with filler, redundant sections, or verbose explanations of simple concepts. Could be half the length. |
| 2 | Some unnecessary prose, but core content is present. Could trim 20-30% without losing substance. |
| 3 | Every line earns its place. No padding, no redundancy, no over-explaining. Uses tables and lists over paragraphs where appropriate. |

Red flags for a 1: repeating the same instruction in different words, explaining what markdown formatting is, long preambles before the actual instructions.


### Actionability

Does the content contain executable instructions with concrete examples?

| Score | Criteria |
|---|---|
| 1 | Abstract methodology or theory. No examples, no constraints, no concrete steps. |
| 2 | Some concrete instructions, but mixed with vague guidance ("handle edge cases appropriately"). |
| 3 | Every instruction is specific enough to execute without interpretation. Includes examples, constraints, expected outputs. |

### Workflow Clarity

Are instructions sequenced into clear phases with exit gates?

| Score | Criteria |
|---|---|
| 1 | Unstructured wall of instructions. No clear order or phases. |
| 2 | Has phases/sections but missing exit gates, or unclear when to move between phases. |
| 3 | Numbered/named phases with explicit entry conditions, exit gates, and loop-back conditions. Reader always knows where they are in the workflow. |
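A score-3 phase structure is easy to picture in code. The phase names and exit-gate checks below are invented for illustration (loosely modeled on the skill this rubric ships with); a real skill would state its phases in prose inside SKILL.md.

```python
# Hypothetical sketch of "named phases with explicit exit gates".
PHASES = [
    # (phase name, exit gate: advance only when this returns True)
    ("1. Parse submission", lambda s: s["skill_md"] is not None),
    ("2. Score dimensions", lambda s: len(s["scores"]) == 11),
    ("3. Write verdict",    lambda s: s["verdict"] != ""),
]


def advance(state: dict) -> str:
    """Return the first phase whose exit gate is not yet satisfied."""
    for name, gate in PHASES:
        if not gate(state):
            return name  # the reader always knows where they are
    return "complete"
```

Because each gate is an explicit predicate, there is never ambiguity about when to move between phases, which is exactly what the score-3 row asks for.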

### Progressive Disclosure

Does the skill load information only when needed?

| Score | Criteria |
|---|---|
| 1 | Everything in one file. No reference files. Or: references exist but are loaded eagerly. |
| 2 | Some references exist, but loading isn't well-timed or the reference structure is unclear. |
| 3 | Reference files loaded only at the phase that needs them. Main SKILL.md is lean. References are clearly named and scoped. |

Note: A simple skill that genuinely doesn't need references can still score 3 — progressive disclosure means "don't front-load what isn't needed yet", not "must have reference files."


## Bonus Dimensions

### Innovation

Is the approach or problem framing genuinely novel?

| Score | Criteria |
|---|---|
| 1 | Commodity wrapper around a well-known tool or API. No novel framing. "ChatGPT but for X." |
| 2 | Applies existing techniques to a specific domain in a useful way. Competent but not surprising. |
| 3 | Genuinely novel approach, creative problem framing, or addresses a gap nobody else has filled. Makes you think "why didn't this exist already?" |

### Style

Does the content have a consistent, human authorial voice?

| Score | Criteria |
|---|---|
| 1 | Reads like generic AI-generated text. No authorial voice. Corporate-bland or template-obvious. |
| 2 | Has some personality but inconsistent. Mixes voices or defaults to generic in places. |
| 3 | Consistent, confident authorial voice throughout. Reads like a person with opinions wrote it. Tone matches the skill's purpose. |

Strong signal: the voice section (if present) has specific examples, not just adjectives. Weak signal: "Be helpful and professional" — says nothing.


### Vibes

The gut-check composite. Score based on these three sub-questions:

1. Would I install this? — Does it solve a real problem I've had, or is it a solution looking for a problem?
2. Does the voice feel human? — Confident and opinionated, or hedging and generic?
3. Is the hook compelling? — After reading the name + description + first 10 lines, do I want to keep reading?

| Score | Criteria |
|---|---|
| 1 | Fails all three. Academic exercise or toy demo. No pull. |
| 2 | Passes 1-2. Useful but not exciting, or exciting but not practical. |
| 3 | Passes all three. You'd install it, recommend it, and remember the name. |
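The Vibes mapping is simple arithmetic over the three sub-questions. The function name and parameters below are mine; the thresholds come straight from the table above.

```python
# Vibes score from the three gut-check sub-questions.
def vibes_score(would_install: bool, voice_human: bool, hook_compelling: bool) -> int:
    passes = sum([would_install, voice_human, hook_compelling])
    if passes == 0:
        return 1   # fails all three
    if passes <= 2:
        return 2   # passes 1-2
    return 3       # passes all three


vibes_score(True, True, True)  # 3: you'd install it, recommend it, remember the name
```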
