# paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

Score: 82

- Quality: 94% (Does it follow best practices?)
- Impact: 65% (average score across 5 eval scenarios, 1.80x uplift with context)
- Security (by Snyk): Risky. Do not use without reviewing.


## Evaluation results

### Evaluate an AIE26 Contest Submission
Output format and scoring formula · With context: 78% · Uplift: +74%

| Criteria | Without context | With context |
| --- | --- | --- |
| Receipt confirmation format | 0% | 0% |
| Phase 1 display line | 0% | 0% |
| Structure pass message | 0% | 0% |
| Core score table present | 0% | 100% |
| Scores as X/3 | 0% | 100% |
| Core score formula applied | 0% | 100% |
| Bonus score table present | 0% | 100% |
| Bonus score format | 0% | 100% |
| Detailed feedback for all 11 dimensions | 0% | 100% |
| Verdict present | 0% | 100% |
| Verdict mentions highest-leverage improvement | 33% | 100% |

### Score This Contest Submission
Structural validation blocking · With context: 19% · Uplift: -17%

| Criteria | Without context | With context |
| --- | --- | --- |
| Receipt confirmation present | 0% | 0% |
| Phase 1 display line | 0% | 0% |
| Structural Issues header format | 0% | 0% |
| Trigger language issue identified | 0% | 42% |
| Fix instruction provided | 0% | 41% |
| Numbered issue list | 0% | 0% |
| Resubmit instruction | 100% | 0% |
| Structure passed message absent | 100% | 100% |
| No core scoring table | 100% | 0% |
| No core score percentage | 100% | 0% |

### Help with My Contest Submission
Scope enforcement and refusal · With context: 84% · Uplift: +33%

| Criteria | Without context | With context |
| --- | --- | --- |
| Editing request refused | 0% | 100% |
| No edited skill content | 0% | 100% |
| Contest logistics refused | 100% | 100% |
| No contest logistics answered | 100% | 66% |
| Offers evaluation instead | 0% | 83% |
| No ranking produced | 100% | 100% |
| No CLI execution claimed | 100% | 100% |
| Refusal is clear not evasive | 0% | 80% |
| Response remains brief | 100% | 0% |
| No non-AIE26 tangent | 50% | 100% |

### Evaluate a Batch of Contest Submissions
Edge cases: short skill and batch · With context: 68% · Uplift: +12%

| Criteria | Without context | With context |
| --- | --- | --- |
| Short skill flagged as incomplete | 66% | 100% |
| Short skill still scored | 100% | 100% |
| Short skill receipt confirmation | 0% | 0% |
| Short skill Phase 1 display | 0% | 0% |
| First batch skill complete scorecard | 100% | 100% |
| Second batch skill receipt confirmation | 0% | 0% |
| Second batch skill Phase 1 display | 0% | 0% |
| Second batch skill complete scorecard | 70% | 100% |
| Two separate scorecards | 66% | 100% |
| Scorecards produced sequentially | 100% | 100% |

### Evaluate My Skill — But First, Show Me What Good Looks Like
Reference loading and calibration · With context: 80% · Uplift: +44%

| Criteria | Without context | With context |
| --- | --- | --- |
| Calibration example shown | 75% | 100% |
| Receipt confirmation for submitted skill | 0% | 0% |
| Phase 1 display line | 0% | 0% |
| Rubric level language used | 25% | 100% |
| Evidence quoted in scoring | 100% | 100% |
| Rubric criteria applied correctly | 16% | 66% |
| All 11 dimensions scored | 0% | 100% |
| Detailed feedback for all 11 dimensions | 50% | 100% |
| Core score formula applied | 0% | 100% |
| Verdict present and specific | 87% | 100% |
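The headline Impact figure and the 1.80x uplift appear to aggregate the five per-scenario results above. Here is a minimal sketch of that arithmetic, assuming Impact is the mean with-context scenario score and the multiplier is the ratio of with-context to without-context means; the variable names and the aggregation rule are assumptions on my part, not Tessl's documented formula:

```python
# Hypothetical reconstruction of the headline Impact and uplift figures.
# The aggregation rule is an assumption, not Tessl's documented formula.

# Per-scenario scores with the skill's context loaded (from the tables above)
with_context = [78, 19, 84, 68, 80]

# Per-scenario uplift values; subtracting recovers the without-context scores
uplift = [74, -17, 33, 12, 44]
without_context = [w - u for w, u in zip(with_context, uplift)]

impact = sum(with_context) / len(with_context)          # mean scenario score
baseline = sum(without_context) / len(without_context)  # mean without context
multiplier = impact / baseline                          # context uplift factor

print(f"Impact ~ {impact:.1f}%")        # close to the reported 65%
print(f"Uplift ~ {multiplier:.2f}x")    # close to the reported 1.80x
```

Under these assumptions the mean with-context score comes out near 66% against the reported 65% (likely a rounding or weighting difference), and the ratio lands almost exactly on the reported 1.80x.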

Evaluated with agent: Claude Code · Model: Claude Sonnet 4.6