
paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

82

1.80x
Quality: 94%. Does it follow best practices?

Impact: 65% (1.80x). Average score across 5 eval scenarios.

Security by Snyk: Risky. Do not use without reviewing.


evals/scenario-4/criteria.json

{
  "context": "Tests whether the agent correctly handles two edge cases: (1) a very short SKILL.md under 20 lines, which should be flagged as incomplete but still scored, and (2) a batch evaluation request with two skills, which should produce two separate complete scorecards sequentially.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Short skill flagged as incomplete",
      "description": "The evaluation of emoji-picker explicitly flags it as incomplete, too short, or references the low line count — mentions 'incomplete', '< 20 lines', 'very short', or similar",
      "max_score": 12
    },
    {
      "name": "Short skill still scored",
      "description": "Despite being flagged as incomplete, the emoji-picker still receives dimension scores — a scorecard table or X/3 scores appear in its evaluation",
      "max_score": 12
    },
    {
      "name": "Short skill receipt confirmation",
      "description": "Output contains a receipt confirmation for emoji-picker in the format 'Got it — evaluating `emoji-picker` (N lines). Running the gauntlet.'",
      "max_score": 8
    },
    {
      "name": "Short skill Phase 1 display",
      "description": "Output contains a Phase 1 display line for emoji-picker: 'Evaluating `emoji-picker` — N lines, N reference files mentioned.'",
      "max_score": 8
    },
    {
      "name": "First batch skill complete scorecard",
      "description": "The emoji-picker receives a scorecard with a Core Score and at least some dimension scores (not just a refusal or flag)",
      "max_score": 8
    },
    {
      "name": "Second batch skill receipt confirmation",
      "description": "Output contains a receipt confirmation for git-commit-helper in the format 'Got it — evaluating `git-commit-helper` (N lines). Running the gauntlet.'",
      "max_score": 8
    },
    {
      "name": "Second batch skill Phase 1 display",
      "description": "Output contains a Phase 1 display line for git-commit-helper: 'Evaluating `git-commit-helper` — N lines, N reference files mentioned.'",
      "max_score": 8
    },
    {
      "name": "Second batch skill complete scorecard",
      "description": "The git-commit-helper receives a complete scorecard with all 8 core dimensions scored and a Core Score percentage",
      "max_score": 10
    },
    {
      "name": "Two separate scorecards",
      "description": "The output contains two distinct scorecard headings — 'Scorecard: emoji-picker' and 'Scorecard: git-commit-helper' — rather than a merged or combined evaluation",
      "max_score": 12
    },
    {
      "name": "Scorecards produced sequentially",
      "description": "The emoji-picker scorecard is fully completed before the git-commit-helper scorecard begins — they are not interleaved",
      "max_score": 14
    }
  ]
}
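
The page does not say how a `weighted_checklist` is turned into a single number. The following is a minimal sketch of one plausible reading, assuming the result is the sum of awarded points divided by the sum of `max_score` values. The function names, the `awarded` mapping, and the clamping rule are hypothetical and are not taken from Tessl's tooling.

```python
import json

# Trimmed copy of the checklist above: item names and max_score values only.
CRITERIA = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Short skill flagged as incomplete", "max_score": 12},
    {"name": "Short skill still scored", "max_score": 12},
    {"name": "Short skill receipt confirmation", "max_score": 8},
    {"name": "Short skill Phase 1 display", "max_score": 8},
    {"name": "First batch skill complete scorecard", "max_score": 8},
    {"name": "Second batch skill receipt confirmation", "max_score": 8},
    {"name": "Second batch skill Phase 1 display", "max_score": 8},
    {"name": "Second batch skill complete scorecard", "max_score": 10},
    {"name": "Two separate scorecards", "max_score": 12},
    {"name": "Scorecards produced sequentially", "max_score": 14}
  ]
}
""")


def total_possible(criteria):
    """Sum of max_score over every checklist item."""
    return sum(item["max_score"] for item in criteria["checklist"])


def weighted_percentage(criteria, awarded):
    """Return awarded points as a percentage of the total possible.

    `awarded` maps item name -> points given by a grader (hypothetical input).
    Missing items count as 0, and points are clamped to [0, max_score].
    """
    earned = 0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, item["max_score"]))
    return 100.0 * earned / total_possible(criteria)


if __name__ == "__main__":
    print(total_possible(CRITERIA))
    print(weighted_percentage(CRITERIA, {
        "Two separate scorecards": 12,
        "Scorecards produced sequentially": 14,
    }))
```

Under this assumption the ten weights sum to 100, so each `max_score` reads directly as a percentage share. A run that satisfies only the two scorecard-separation items would earn 26%.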
