paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

82

1.80x
Quality: 94% (Does it follow best practices?)

Impact: 65%, 1.80x (average score across 5 eval scenarios)

Security by Snyk: Risky. Do not use without reviewing.

evals/scenario-1/criteria.json

{
  "context": "Tests whether the agent runs the full 5-phase evaluation workflow and produces output in the exact required format, including the receipt confirmation, Phase 1 display line, scorecard tables, detailed feedback for all 11 dimensions, the correct core score formula, bonus score format, and a properly structured verdict.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Receipt confirmation format",
      "description": "Output contains the receipt confirmation with backtick-quoted skill name, line count in parentheses, and the phrase 'Running the gauntlet.' — e.g. \"Got it — evaluating `X` (N lines). Running the gauntlet.\"",
      "max_score": 8
    },
    {
      "name": "Phase 1 display line",
      "description": "Output contains Phase 1 display line in format \"Evaluating `<name>` — <N> lines, <N> reference files mentioned.\"",
      "max_score": 8
    },
    {
      "name": "Structure pass message",
      "description": "Output contains \"Structure checks passed. Scoring...\" or equivalent phrase indicating all Phase 2 checks passed before proceeding",
      "max_score": 6
    },
    {
      "name": "Core score table present",
      "description": "Output contains a scorecard table with all 8 official dimensions: Specificity, Trigger Terms, Completeness, Distinctiveness, Conciseness, Actionability, Workflow Clarity, Progressive Disclosure",
      "max_score": 10
    },
    {
      "name": "Scores as X/3",
      "description": "Each of the 8 core dimension scores is expressed as X/3 (e.g. \"2/3\" or \"3/3\")",
      "max_score": 6
    },
    {
      "name": "Core score formula applied",
      "description": "The displayed Core Score (e.g. \"XX/100\") is consistent with round((sum of 8 dimension scores / 24) * 100)",
      "max_score": 12
    },
    {
      "name": "Bonus score table present",
      "description": "Output contains a bonus score table with all 3 bonus dimensions: Innovation, Style, Vibes",
      "max_score": 8
    },
    {
      "name": "Bonus score format",
      "description": "Bonus total is expressed as \"+X/9\" format (e.g. \"+7/9\")",
      "max_score": 8
    },
    {
      "name": "Detailed feedback for all 11 dimensions",
      "description": "Output contains a \"Detailed Feedback\" section with individual subsections for all 11 dimensions (8 core + 3 bonus), each with a heading that includes the dimension name and score",
      "max_score": 16
    },
    {
      "name": "Verdict present",
      "description": "Output contains a \"Verdict\" section at the end of the evaluation",
      "max_score": 6
    },
    {
      "name": "Verdict mentions highest-leverage improvement",
      "description": "The Verdict section identifies a specific single improvement suggestion (not just 'good job') — something concrete that references the skill content",
      "max_score": 12
    }
  ]
}
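The score arithmetic that the checklist verifies can be sketched in a few lines of Python. This is an illustration only, not the skill's implementation: the function names `core_score` and `bonus_label` are hypothetical, and the 0-3 range per bonus dimension is an assumption inferred from the "+X/9" total across three dimensions.

```python
# Illustrative sketch; names are hypothetical and not taken from the skill.

CORE_DIMENSIONS = [
    "Specificity", "Trigger Terms", "Completeness", "Distinctiveness",
    "Conciseness", "Actionability", "Workflow Clarity", "Progressive Disclosure",
]
BONUS_DIMENSIONS = ["Innovation", "Style", "Vibes"]


def core_score(scores: dict[str, int]) -> int:
    """Apply round((sum of 8 dimension scores / 24) * 100) from the checklist."""
    missing = [d for d in CORE_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing core dimensions: {missing}")
    total = sum(scores[d] for d in CORE_DIMENSIONS)  # each score is X/3
    return round((total / 24) * 100)


def bonus_label(scores: dict[str, int]) -> str:
    """Format the bonus total as "+X/9".

    Assumption: each of the 3 bonus dimensions is scored 0-3.
    """
    total = sum(scores[d] for d in BONUS_DIMENSIONS)
    return f"+{total}/9"


if __name__ == "__main__":
    core = dict(zip(CORE_DIMENSIONS, [3, 3, 2, 3, 2, 3, 2, 2]))  # sum = 20
    bonus = {"Innovation": 3, "Style": 2, "Vibes": 2}            # sum = 7
    print(f"Core Score: {core_score(core)}/100")  # 20/24 -> 83
    print(f"Bonus: {bonus_label(bonus)}")         # +7/9
```

Python's built-in `round` rounds exact halves to the nearest even integer, so a dimension sum of 15 (62.5) yields 62 rather than 63. The checklist does not say which tie-breaking rule the skill expects, so treat that edge case as unspecified. Separately, the eleven `max_score` values in the checklist sum to 100.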

README.md

SKILL.md

tessl.json

tile.json