
paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

82

1.80x
Quality

94%

Does it follow best practices?

Impact

65%

1.80x

Average score across 5 eval scenarios

Security by Snyk

Risky

Do not use without reviewing


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly defines a narrow, specific use case with concrete actions, explicit trigger guidance via a 'Use when' clause with multiple natural phrasings, and strong distinctiveness through domain-specific terminology. It also helpfully specifies accepted input formats. The description is concise yet comprehensive.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists concrete actions: 'Evaluates SKILL.md submissions across 11 dimensions (8 official Tessl rubric + 3 bonus)', and specifies input formats: 'GitHub repo URLs, file paths, or raw SKILL.md pastes'. Multiple specific capabilities are named. | 3 / 3 |
| Completeness | Clearly answers both 'what' (evaluates SKILL.md submissions across 11 dimensions for the AIE London 2026 Skills Contest) and 'when' (explicit 'Use when' clause with six specific trigger phrases). Also specifies accepted input formats. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger phrases users would actually say: 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', 'how would this score on the leaderboard', 'rate my skill before I submit', 'give me judge feedback on this skill'. These are highly natural and varied. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive niche: specifically targets 'AI Engineer London 2026 Skills Contest' SKILL.md evaluation with the Tessl rubric. The domain-specific terminology (AIE26, SKILL.md, leaderboard, contest) makes it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured skill with excellent workflow clarity and progressive disclosure. The 5-phase evaluation pipeline is clearly sequenced with validation gates, and the output format is precisely specified with copy-paste-ready templates. Minor verbosity in scope-limiting statements and some behavioral constraints that Claude could infer prevent a perfect conciseness score.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Mostly efficient but includes some unnecessary scaffolding language (e.g., 'Evaluation only — no editing, ranking, or CLI execution' and 'Refuse non-AIE26 skills and contest logistics questions' are behavioral constraints Claude could infer). The phase structure adds overhead but serves an organizational purpose. | 2 / 3 |
| Actionability | Highly actionable with concrete checklists (structural checks with checkboxes), a specific scoring formula ('round((sum of 8 scores / 24) * 100)'), exact output format templates with table structures, and clear score scales (1-3). Claude knows exactly what to produce. | 3 / 3 |
| Workflow Clarity | The 5-phase workflow is clearly sequenced with explicit gates (Phase 2 structural check must pass before Phase 3 scoring), a short-file bypass path, and clear conditional logic ('If any check fails: Report each failure... Do NOT proceed'). Each phase has defined inputs and outputs. | 3 / 3 |
| Progressive Disclosure | Appropriately delegates detailed rubric criteria to 'references/scoring-rubric.md' and calibration examples to 'references/example-evaluation.md', keeping the main skill as an overview with clear one-level-deep references. Content is well-split between the overview workflow and reference materials. | 3 / 3 |
| Total | | 11 / 12 |

Passed
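
The scoring formula and the structural gate quoted in the Actionability and Workflow Clarity rows can be illustrated with a minimal Python sketch. It assumes eight integer scores on the 1-3 scale and a dictionary of named structural checks. The function names, the check names, and the shape of the returned dictionary are hypothetical and do not come from the skill itself. Only the formula 'round((sum of 8 scores / 24) * 100)' and the rule that a failed structural check blocks scoring are taken from the review.

```python
from typing import Dict, List, Union

NUM_DIMENSIONS = 8       # the 8 official rubric dimensions mentioned in the review
MAX_PER_DIMENSION = 3    # scores use a 1-3 scale


def rubric_percentage(scores: List[int]) -> int:
    """Apply round((sum of 8 scores / 24) * 100) to eight 1-3 scores."""
    if len(scores) != NUM_DIMENSIONS:
        raise ValueError(f"expected {NUM_DIMENSIONS} scores, got {len(scores)}")
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each score must be 1, 2, or 3")
    # Note: Python's round() rounds exact halves to the nearest even integer.
    # The review does not say which rounding mode the skill intends.
    return round((sum(scores) / (NUM_DIMENSIONS * MAX_PER_DIMENSION)) * 100)


def evaluate(
    structural_checks: Dict[str, bool], scores: List[int]
) -> Dict[str, Union[str, int, List[str]]]:
    """Hypothetical gate: report failures and stop, otherwise score."""
    failures = [name for name, passed in structural_checks.items() if not passed]
    if failures:
        # Mirrors 'If any check fails: Report each failure... Do NOT proceed'.
        return {"status": "failed_structural_check", "failures": failures}
    return {"status": "scored", "percentage": rubric_percentage(scores)}
```

For example, `evaluate({"has_frontmatter": True}, [3, 3, 3, 3, 2, 2, 2, 2])` would reach the scoring step with a sum of 20 out of 24, while any check set to `False` would return the list of failures without computing a percentage.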

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

