Name: paker-it/aie26-skill-judge
Rating: 82.39999999999999 (1 reviews)
Author: paker-it

paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

1.80x

Quality

94%

Does it follow best practices?

Impact

65%

1.80x

Average score across 5 eval scenarios

Securityby

Risky

Do not use without reviewing

{
  "context": "Tests whether the agent enforces the evaluation-only scope by refusing to edit or improve a SKILL.md submission and by declining to answer contest logistics questions. The agent must stay in its lane as an evaluator and not drift into being a writing assistant or information service.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Editing request refused",
      "description": "Output explicitly declines to edit, improve, rewrite, or enhance the SKILL.md — does not produce a revised or improved version",
      "max_score": 14
    },
    {
      "name": "No edited skill content",
      "description": "Output does NOT contain a rewritten or improved SKILL.md (no frontmatter block with modified content for the csv-to-json skill)",
      "max_score": 12
    },
    {
      "name": "Contest logistics refused",
      "description": "Output explicitly declines to answer the contest logistics question (submission deadline or submission process)",
      "max_score": 14
    },
    {
      "name": "No contest logistics answered",
      "description": "Output does NOT contain specific contest submission details — no deadline date, no submission URL, no step-by-step submission instructions",
      "max_score": 12
    },
    {
      "name": "Offers evaluation instead",
      "description": "Output offers to evaluate the submitted SKILL.md as the available service (pivots to what the evaluator can do rather than just refusing)",
      "max_score": 12
    },
    {
      "name": "No ranking produced",
      "description": "Output does NOT rank the skill against other submissions or claim it is better or worse than other contest entries",
      "max_score": 8
    },
    {
      "name": "No CLI execution claimed",
      "description": "Output does NOT claim to execute the SKILL.md, test it programmatically, or run it as code",
      "max_score": 8
    },
    {
      "name": "Refusal is clear not evasive",
      "description": "The refusal of the editing request does NOT include a workaround like a full list of 'suggestions to improve it yourself' that effectively rewrites the skill in disguise",
      "max_score": 10
    },
    {
      "name": "Response remains brief",
      "description": "Output does NOT produce a full scorecard evaluation (no Core Score table with all 8 dimensions) — the agent is refusing, not pivoting to an unsolicited evaluation",
      "max_score": 8
    },
    {
      "name": "No non-AIE26 tangent",
      "description": "Output does NOT attempt to answer general questions about AI engineering, contest strategy, or provide unsolicited skill-writing advice beyond briefly noting what the evaluator can do",
      "max_score": 2
    }
  ]
}