
paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

Overall score: 82
Quality: 94% (does it follow best practices?)
Impact: 65% at 1.80x (average score across 5 eval scenarios)
Security (by Snyk): Risky. Do not use without reviewing.


SKILL.md

---
name: aie26-skill-judge
description: Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say "judge my AIE26 contest skill", "score this SKILL.md for the contest", "review my skill submission", "how would this score on the leaderboard", "rate my skill before I submit", or "give me judge feedback on this skill". Accepts GitHub repo URLs, file paths, or raw SKILL.md pastes.
---

# AIE26 Skill Judge

Evaluate SKILL.md files for the AI Engineer London 2026 Skills Contest (skillleaderboard.alan-626.workers.dev/AIE26). Score on the official Tessl rubric plus three bonus dimensions. Accept GitHub repo URLs, local file paths, or raw SKILL.md pastes.

Evaluation only — no editing, ranking, or CLI execution. Refuse non-AIE26 skills and contest logistics questions.

## The 5-Phase Evaluation

### Phase 1 — Ingest

Extract name, description, line count, and reference file count from the SKILL.md.
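The extraction step can be sketched in Python. This is a hypothetical helper, not part of the skill: the field names come from the skill's own checklist, while counting links into a `references/` directory as the reference-file count is an assumption.

```python
import re

def ingest(skill_md: str) -> dict:
    """Extract the Phase 1 fields from a raw SKILL.md string.

    Illustrative sketch only; the references/ link heuristic is an
    assumption, not the skill's defined behavior.
    """
    lines = skill_md.splitlines()
    fields = {"name": None, "description": None}
    # Frontmatter: the first block delimited by `---` lines.
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            m = re.match(r"(name|description):\s*(.*)", line)
            if m:
                fields[m.group(1)] = m.group(2).strip()
    # Treat distinct links into references/ as reference files.
    ref_count = len(set(re.findall(r"references/[\w.-]+", skill_md)))
    return {**fields, "line_count": len(lines), "reference_files": ref_count}
```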

### Phase 2 — Structural Check

Very short files (< 20 lines): If the file has valid frontmatter with name and description but is under 20 lines, skip the checks below. Say "This skill appears incomplete (<N> lines). Scoring what exists..." and go to Phase 3.

For all other files, check all of these:

  • Frontmatter block exists (starts with ---, ends with ---)
  • name field present and non-empty
  • description field present and non-empty
  • Total line count <= 500
  • Description includes trigger language ("Use when...", natural phrases, or usage scenarios)

If any check fails: Report each failure with a specific fix instruction. Do NOT proceed to scoring.

If all pass: Proceed to Phase 3.
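The checklist above can be sketched as a validator that returns one fix instruction per failure. A minimal sketch under stated assumptions: the exact trigger-language test is simplified here (the rubric reference holds the real criteria), and `meta` is assumed to be the dict produced by an ingest step.

```python
def structural_check(meta: dict, skill_md: str) -> list[str]:
    """Return failure messages with fix instructions; empty means all pass."""
    failures = []
    lines = skill_md.splitlines()
    # Frontmatter block: starts with ---, ends with ---.
    if not (lines and lines[0].strip() == "---"
            and "---" in [l.strip() for l in lines[1:]]):
        failures.append("Frontmatter block missing: wrap the header in --- delimiters.")
    if not meta.get("name"):
        failures.append("Add a non-empty `name` field to the frontmatter.")
    if not meta.get("description"):
        failures.append("Add a non-empty `description` field to the frontmatter.")
    if len(lines) > 500:
        failures.append(f"File is {len(lines)} lines; trim to 500 or fewer.")
    # Simplified trigger-language check (assumption: phrase matching).
    desc = (meta.get("description") or "").lower()
    if "use when" not in desc and "use this" not in desc:
        failures.append("Description lacks trigger language, e.g. 'Use when...'.")
    return failures
```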

### Phase 3 — Core Evaluation

Load references/scoring-rubric.md for detailed per-dimension criteria.

Score the 8 official Tessl dimensions. For each dimension:

  1. Read the detailed criteria from the rubric reference
  2. Find specific evidence in the SKILL.md (quote it)
  3. Assign 1 (Weak), 2 (Adequate), or 3 (Strong)
  4. Write a one-line reasoning that references the evidence

Core score formula: round((sum of 8 scores / 24) * 100)
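The formula can be checked with a small helper (illustrative only; the function name is not part of the skill):

```python
def core_score(dimension_scores: list[int]) -> int:
    """Apply the core score formula: round((sum of 8 scores / 24) * 100)."""
    assert len(dimension_scores) == 8, "exactly 8 official dimensions"
    assert all(1 <= s <= 3 for s in dimension_scores), "scores are 1-3"
    return round(sum(dimension_scores) / 24 * 100)
```

For example, all-Strong (eight 3s) yields 100, all-Adequate (eight 2s) yields 67.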

### Phase 4 — Bonus Evaluation

Using the bonus criteria from references/scoring-rubric.md, score the 3 bonus dimensions:

  • Innovation: Is this a novel approach or a commodity wrapper?
  • Style: Does it have a consistent, human authorial voice?
  • Vibes: Would you install this? Does it solve a real itch? Is the hook compelling?

Report as +X/9.

### Phase 5 — Synthesize

Produce the full output in this exact format:

## Scorecard: <skill-name>

### Core Score: XX/100

| Dimension              | Score | Reasoning                    |
|------------------------|-------|------------------------------|
| Specificity            | X/3   | <one line with evidence>     |
| Trigger Terms          | X/3   | <one line with evidence>     |
| Completeness           | X/3   | <one line with evidence>     |
| Distinctiveness        | X/3   | <one line with evidence>     |
| Conciseness            | X/3   | <one line with evidence>     |
| Actionability          | X/3   | <one line with evidence>     |
| Workflow Clarity       | X/3   | <one line with evidence>     |
| Progressive Disclosure | X/3   | <one line with evidence>     |

### Bonus Score: +X/9

| Dimension  | Score | Reasoning                    |
|------------|-------|------------------------------|
| Innovation | X/3   | <one line with evidence>     |
| Style      | X/3   | <one line with evidence>     |
| Vibes      | X/3   | <one line with evidence>     |

### Detailed Feedback

#### <Dimension Name> (X/3)
<paragraph: what's strong, what's weak, direct quotes from the SKILL.md,
specific fix suggestions if score < 3>

Repeat the detailed feedback section for all 11 dimensions. End with:

### Verdict
<2-3 sentences: is this competition-ready? What is the single
highest-leverage improvement? Be specific.>

Load references/example-evaluation.md on request for calibration. For batch requests, produce one scorecard per skill sequentially.

Files: README.md, SKILL.md, tessl.json, tile.json