
paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

Score: 82 (1.80x)

Quality: 94%. Does it follow best practices?

Impact: 65% (1.80x). Average score across 5 eval scenarios.

Security (by Snyk): Risky. Do not use without reviewing.


evals/scenario-2/criteria.json

{
  "context": "Tests whether the agent correctly identifies structural problems in a SKILL.md before attempting to score it. The submitted skill has a description that contains no trigger language — it is entirely abstract with no natural phrases a user would say. The agent must report this in the required structural failure format, provide actionable fixes, and must NOT proceed to scoring.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Receipt confirmation present",
      "description": "Output contains the receipt confirmation 'Got it — evaluating `log-analyzer` (N lines). Running the gauntlet.' before the structural failure report",
      "max_score": 8
    },
    {
      "name": "Phase 1 display line",
      "description": "Output contains Phase 1 display line: 'Evaluating `log-analyzer` — N lines, N reference files mentioned.'",
      "max_score": 8
    },
    {
      "name": "Structural Issues header format",
      "description": "Output contains the phrase 'Structural Issues' followed by a count in parentheses — e.g. 'Structural Issues (1 found):' or 'Structural Issues (2 found):'",
      "max_score": 14
    },
    {
      "name": "Trigger language issue identified",
      "description": "Output explicitly identifies the missing trigger language in the description as a structural issue — mentions trigger phrases, trigger language, activation phrases, or natural phrases that users would say",
      "max_score": 14
    },
    {
      "name": "Fix instruction provided",
      "description": "The reported structural issue includes a specific actionable fix instruction (not just identifying the problem, but telling what to do — e.g. add trigger phrases or usage scenarios to the description)",
      "max_score": 12
    },
    {
      "name": "Numbered issue list",
      "description": "Structural issues are presented as a numbered list (1., 2., etc.) rather than as an undifferentiated paragraph",
      "max_score": 8
    },
    {
      "name": "Resubmit instruction",
      "description": "Output instructs the user to fix the issues before resubmitting (any phrasing equivalent to 'Fix these before resubmitting')",
      "max_score": 8
    },
    {
      "name": "Structure passed message absent",
      "description": "Output does NOT contain the phrases 'Structure checks passed' or 'Scoring...'; since checks failed, neither should appear",
      "max_score": 8
    },
    {
      "name": "No core scoring table",
      "description": "Output does NOT contain a scorecard table with the 8 core dimensions (Specificity, Trigger Terms, Completeness, etc.)",
      "max_score": 10
    },
    {
      "name": "No core score percentage",
      "description": "Output does NOT contain a Core Score expressed as XX/100",
      "max_score": 10
    }
  ]
}
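
The `max_score` values in the checklist above sum to 100, so the points awarded per item add up to a score out of 100. How the platform actually grades each item is not shown here. As a rough illustration only, the sketch below hand-codes four of the more mechanical criteria as regex or substring checks and tallies their points. The `CHECKS` table and the `score` helper are hypothetical names introduced for this example, and the regexes are one possible reading of the item descriptions, not the grader's real logic.

```python
import re

# Hypothetical local checks for a subset of the checklist items.
# Point values are copied from the max_score fields in criteria.json;
# the matching rules are assumptions, not the platform's implementation.
CHECKS = {
    "Structural Issues header format": (
        14,
        lambda out: re.search(r"Structural Issues \(\d+ found\):", out) is not None,
    ),
    "Numbered issue list": (
        8,
        lambda out: re.search(r"^\s*1\.\s", out, re.MULTILINE) is not None,
    ),
    "Structure passed message absent": (
        8,
        lambda out: "Structure checks passed" not in out and "Scoring..." not in out,
    ),
    "No core score percentage": (
        10,
        lambda out: re.search(r"Core Score[^\n]*\b\d{1,3}/100", out) is None,
    ),
}


def score(output: str) -> tuple[int, int]:
    """Return (points earned, points possible) for the checks defined above."""
    earned = sum(points for points, check in CHECKS.values() if check(output))
    possible = sum(points for points, _ in CHECKS.values())
    return earned, possible


# A made-up agent response that stops at the structural failure report.
sample = (
    "Structural Issues (1 found):\n"
    "1. Description has no trigger language. Add phrases a user would say.\n"
    "Fix these before resubmitting.\n"
)
print(score(sample))  # (40, 40)
```

The remaining items (for example, whether the fix instruction is specific and actionable) call for judgment rather than pattern matching, so they are left out of this sketch.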
