
paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

Score: 82 (1.80x)

Quality: 94%
Does it follow best practices?

Impact: 65% (1.80x)
Average score across 5 eval scenarios

Security by Snyk: Risky
Do not use without reviewing


evals/scenario-5/task.md

Evaluate My Skill — But First, Show Me What Good Looks Like

Problem/Feature Description

A developer is submitting their first entry to the AI Engineer London 2026 Skills Contest. Before they get a score, they want to understand what a strong evaluation looks like — they've seen some contest results online but aren't sure what separates a 70 from a 100. They want to see an example of a top-scoring evaluation, and then get a full evaluation of their own skill.

Their skill is a PR reviewer — a tool that reviews GitHub pull requests for code quality and security issues. They've been using it internally and think it's pretty solid, but they're not sure how it will hold up against the official rubric.

Output Specification

Write your full response to response.md in your working directory. Include:

  1. Your answer to their calibration question
  2. The complete evaluation of their SKILL.md

Input Files

The following file is provided as input. Extract it before beginning.

=============== FILE: inputs/submission.md ===============

---
name: pr-reviewer
description: Reviews GitHub pull requests for code quality, security vulnerabilities, and completeness. Use when you say "review this PR", "check my pull request", "give me PR feedback", "audit this code change", or "is this PR ready to merge". Works with GitHub PR URLs and raw diffs.
---

PR Reviewer

You review GitHub pull requests systematically across three dimensions: code quality, security, and completeness.

Scope

Code review only. Do not merge, approve, or comment via the GitHub API. Do not explain git or GitHub concepts.

Workflow

Phase 1 — Ingest

Accept input as:

  • A GitHub PR URL (fetch the diff)
  • A raw diff pasted directly

Extract:

  • Number of files changed
  • Lines added and removed
  • Primary language(s)
  • Whether tests are included in the diff

Display: "Reviewing PR — [N] files changed, +[X]/-[Y] lines, language: [lang]."

Phase 2 — Security Scan

Check for:

  • Hardcoded secrets or credentials (API keys, passwords, tokens)
  • Injection vulnerabilities (SQL, shell, template)
  • Insecure dependencies (newly added packages with known CVEs)
  • Sensitive data logged or exposed in error messages

Flag each finding as CRITICAL, HIGH, or MEDIUM.
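The secret and credential checks could be approximated with a pattern scan restricted to added lines. A minimal sketch follows; the patterns and severity mapping are illustrative assumptions, and real scanners (such as those behind Snyk) use far richer rule sets:

```python
import re

# Hypothetical patterns for demonstration only; not an exhaustive rule set
SECRET_PATTERNS = {
    "CRITICAL": [
        re.compile(r"(?i)aws_secret_access_key\s*[=:]\s*\S+"),
        re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),
    ],
    "HIGH": [
        re.compile(r"(?i)(api[_-]?key|password|token)\s*[=:]\s*['\"][^'\"]{8,}['\"]"),
    ],
}

def scan_added_lines(diff_text):
    """Scan only lines the PR adds; return (severity, line) findings."""
    findings = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue  # skip context, removals, and file headers
        for severity, patterns in SECRET_PATTERNS.items():
            for pat in patterns:
                if pat.search(line):
                    findings.append((severity, line[1:].strip()))
    return findings

print(scan_added_lines('+password = "hunter2hunter2"'))
# [('HIGH', 'password = "hunter2hunter2"')]
```

Scanning only added lines keeps the review focused on what the PR introduces rather than pre-existing issues.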

Phase 3 — Code Quality

Check for:

  • Function/variable naming clarity
  • Cyclomatic complexity (flag functions over ~10 branches)
  • Missing or inadequate error handling
  • Commented-out code left in
  • Test coverage for new logic (flag if tests are absent)

Phase 4 — Completeness

Check for:

  • PR description present and meaningful
  • Breaking changes documented
  • Migration steps included (if schema or API changes)
  • Related issues linked
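Some of these completeness checks lend themselves to simple heuristics over the PR description text. A sketch with admittedly crude keyword tests; every threshold and pattern here is an assumption for illustration:

```python
import re

def completeness_checks(description):
    """Crude heuristics over a PR description (illustrative only)."""
    text = description.lower()
    return {
        # "meaningful" approximated as at least ten words
        "description_meaningful": len(description.split()) >= 10,
        "breaking_changes_noted": "breaking" in text,
        "migration_steps_noted": "migration" in text,
        # Linked issues written like "#42", "closes #42", "fixes #7"
        "issues_linked": bool(re.search(r"#\d+", description)),
    }

desc = ("Adds rate limiting to the API gateway. "
        "Breaking change: the retry header moved. Closes #42.")
print(completeness_checks(desc))
```

Keyword matching of this kind produces false positives and negatives, which is why the skill frames these as review prompts for a judgment-capable reviewer rather than hard pass/fail gates.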

Phase 5 — Report

Output:

## PR Review Summary

**Files changed:** N | **Lines:** +X/-Y

### Security (X issues)
[findings by severity]

### Code Quality (X issues)
[findings]

### Completeness (X issues)
[findings]

### Recommendation
APPROVE / REQUEST CHANGES / NEEDS DISCUSSION

**Rationale:** [1-2 sentences]

Edge Cases

  • No diff provided: Ask the user to paste the diff or provide a PR URL.
  • Very large PR (>50 files): Flag as high-risk and limit detailed review to files with security findings.
  • Non-code files only: Skip code quality phase, focus on completeness.
