
paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

82

Quality: 94% (Does it follow best practices?)

Impact: 65%, 1.80x (average score across 5 eval scenarios)

Security (by Snyk): Risky. Do not use without reviewing.


evals/scenario-1/task.md

Evaluate an AIE26 Contest Submission

Problem/Feature Description

A developer has submitted their SKILL.md file to the AI Engineer London 2026 Skills Contest leaderboard. Before the scores are posted publicly, a team member wants a full evaluation using the official judging system to double-check the submission. The skill in question is a SQL query reviewer — a tool designed to help developers catch security and performance issues in their database queries before merging to production.

The team needs the complete evaluation report so they can decide whether to encourage the contestant to revise before final submission or let it stand as-is.

Output Specification

Write the complete evaluation report to a file named evaluation.md in your working directory. This should be the full scorecard with all sections.

Input Files

The following file is provided as input. Extract it before beginning.

=============== FILE: inputs/submission.md ===============

---
name: sql-query-reviewer
description: Reviews SQL queries for security vulnerabilities, performance issues, and best practices. Use when you say "review this SQL", "check my query for problems", "is this query safe to run", "audit my SQL", or "find bugs in this query". Works with SELECT, INSERT, UPDATE, and DELETE statements.
---

SQL Query Reviewer

You review SQL queries across three dimensions: security, performance, and best practices.

Scope

Review SQL queries only. Do not write new queries, design schemas, or explain general database concepts unrelated to the submitted query.

Workflow

Phase 1 — Parse

Read the submitted query. Extract:

  • Query type (SELECT / INSERT / UPDATE / DELETE)
  • Number of tables referenced
  • Whether a WHERE clause is present
  • Number of JOINs
  • Whether any user-supplied values appear inline (injection risk)

Display: "Reviewing [query-type] on [N] table(s) — [M] JOIN(s), WHERE clause [present/absent]."
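The Phase 1 extraction is written as instructions for the model, but the same fields can be approximated mechanically, which can help when spot-checking the skill's output during judging. The sketch below is a rough regex heuristic, not a SQL parser: the name `parse_query` is hypothetical, it only counts tables named directly after FROM, JOIN, INTO, or UPDATE, and it treats any quoted string literal as a possible user-supplied value.

```python
import re

# Hypothetical helper: approximates the Phase 1 fields with regexes.
# Not a SQL parser; subqueries, comma joins and quoted identifiers are ignored.
STATEMENT = re.compile(r"\s*(SELECT|INSERT|UPDATE|DELETE)\b", re.IGNORECASE)
TABLE_REF = re.compile(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def parse_query(sql: str) -> dict:
    match = STATEMENT.match(sql)
    query_type = match.group(1).upper() if match else "UNKNOWN"
    # Distinct table names that follow FROM, JOIN, INTO or UPDATE.
    tables = {name.lower() for name in TABLE_REF.findall(sql)}
    return {
        "type": query_type,
        "tables": len(tables),
        "joins": len(re.findall(r"\bJOIN\b", sql, re.IGNORECASE)),
        "where": re.search(r"\bWHERE\b", sql, re.IGNORECASE) is not None,
        # Any quoted literal is treated as a possible inline user-supplied value.
        "inline_values": re.search(r"'[^']*'", sql) is not None,
    }


info = parse_query(
    "SELECT u.name FROM users u JOIN orders o ON o.user_id = u.id "
    "WHERE u.email = 'a@example.com'"
)
print(info)  # {'type': 'SELECT', 'tables': 2, 'joins': 1, 'where': True, 'inline_values': True}
```

Each dictionary field corresponds to one slot in the Phase 1 display line, so a judge can compare the skill's stated counts against this rough baseline.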

Phase 2 — Security Analysis

Check for:

  • SQL injection vectors — Are user-supplied values concatenated directly into the query string rather than parameterized?
  • Privilege escalation — Does the query reference system tables, grant statements, or admin procedures?
  • Data exposure — Does a SELECT * appear in a context where column minimization is important (e.g., APIs, logging)?

Flag each issue as CRITICAL, WARNING, or INFO.
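The injection check in Phase 2 hinges on the difference between concatenating a value into the SQL text and binding it as a parameter. The sketch below uses Python's built-in `sqlite3` module and an invented `users` table to show why concatenation deserves a CRITICAL flag: the classic `' OR '1'='1` payload turns a single-row lookup into a full-table read, while the parameterized version treats the same input as a plain string.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    # Invented example table with two rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("alice@example.com",), ("bob@example.com",)],
    )
    return conn


def lookup_concatenated(conn: sqlite3.Connection, email: str) -> list:
    # Vulnerable: the value becomes part of the SQL text itself.
    return conn.execute("SELECT id FROM users WHERE email = '" + email + "'").fetchall()


def lookup_parameterized(conn: sqlite3.Connection, email: str) -> list:
    # Safe: the driver binds the value separately from the SQL text.
    return conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchall()


conn = setup()
payload = "x' OR '1'='1"
print(len(lookup_concatenated(conn, payload)))   # 2: every row is returned
print(len(lookup_parameterized(conn, payload)))  # 0: matched as a literal string
```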

Phase 3 — Performance Analysis

Check for:

  • Missing WHERE clause on a table that is likely large (flag if query touches more than 2 tables without filtering)
  • Subqueries in SELECT lists that could be rewritten as JOINs
  • Missing LIMIT on queries that return potentially unbounded result sets
  • Cartesian products (JOINs with no ON clause)

Flag each as HIGH, MEDIUM, or LOW impact.
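Two of the Phase 3 checks, JOINs without an ON clause and potentially unbounded SELECTs, are simple enough to approximate with pattern matching. This is a hypothetical heuristic: `performance_flags` is an invented name, the HIGH and MEDIUM levels are assumptions (the submission does not say which impact each check carries), USING is treated as equivalent to ON, and "unbounded" is read as "no WHERE and no LIMIT", which is only one interpretation. An explicit CROSS JOIN is also flagged, which may or may not be wanted.

```python
import re


def performance_flags(sql: str) -> list:
    """Hypothetical heuristics for two Phase 3 checks; impact levels are assumed."""
    text = " ".join(sql.upper().split())  # normalise case and whitespace
    flags = []
    # Every JOIN should have ON or USING before the next clause keyword.
    for segment in re.split(r"\bJOIN\b", text)[1:]:
        clause = re.split(r"\b(?:WHERE|GROUP|ORDER|LIMIT)\b", segment)[0]
        if not re.search(r"\b(?:ON|USING)\b", clause):
            flags.append(("HIGH", "JOIN without ON clause: possible Cartesian product"))
    # A SELECT with neither WHERE nor LIMIT may return an unbounded result set.
    is_select = text.startswith("SELECT")
    has_where = re.search(r"\bWHERE\b", text) is not None
    has_limit = re.search(r"\bLIMIT\b", text) is not None
    if is_select and not has_where and not has_limit:
        flags.append(("MEDIUM", "No WHERE or LIMIT: potentially unbounded result set"))
    return flags


# Two flags: HIGH for the missing ON clause, then MEDIUM for the unbounded SELECT.
print(performance_flags("SELECT * FROM orders JOIN customers"))
```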

Phase 4 — Best Practices

Check for:

  • Explicit column listing vs SELECT * (SELECT * is acceptable in CTEs but not in final SELECT)
  • NULL handling (IS NULL vs = NULL)
  • Consistent alias usage
  • Meaningful table aliases (avoid single-letter aliases for non-obvious tables)
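The NULL-handling check matters because `= NULL` never evaluates to true in standard SQL: comparing any value with NULL yields NULL (unknown), so the row is filtered out. The sketch below demonstrates this with `sqlite3` and an invented `customers` table, and adds a hypothetical one-line lint, `flags_eq_null`, that a reviewer could use to catch the pattern.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, phone TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, None), (2, "555-0100")])

# "= NULL" compares against an unknown value, so the predicate is never true.
wrong = conn.execute("SELECT id FROM customers WHERE phone = NULL").fetchall()
# "IS NULL" is the correct test for a missing value.
right = conn.execute("SELECT id FROM customers WHERE phone IS NULL").fetchall()
print(wrong)  # []
print(right)  # [(1,)]


def flags_eq_null(sql: str) -> bool:
    """Hypothetical lint: '= NULL', '!= NULL' or '<> NULL' should use IS [NOT] NULL."""
    return re.search(r"(?:=|<>)\s*NULL\b", sql, re.IGNORECASE) is not None
```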

Phase 5 — Report

Output a structured report:

Summary: Found X issues (Y critical, Z warnings, W info)

Security Issues:
[list]

Performance Issues:
[list]

Best Practices:
[list]

Verdict: SAFE / REVIEW NEEDED / CRITICAL
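The Phase 5 summary counts issues on a three-level CRITICAL/WARNING/INFO scale, but the submission does not state how those counts map to the SAFE / REVIEW NEEDED / CRITICAL verdict, and Phase 3 uses a separate HIGH/MEDIUM/LOW scale. The sketch below assumes the simplest mapping (any critical issue gives CRITICAL, otherwise any warning gives REVIEW NEEDED, otherwise SAFE) and assumes issues have already been mapped onto the three-level scale; `build_summary` is a hypothetical name, and any other severity label is ignored.

```python
from collections import Counter


def build_summary(severities: list) -> tuple:
    """Hypothetical Phase 5 helper; the verdict thresholds are an assumption."""
    counts = Counter(severities)
    # Only the three summary levels are counted; other labels are ignored.
    total = counts["CRITICAL"] + counts["WARNING"] + counts["INFO"]
    summary = (
        f"Summary: Found {total} issues ({counts['CRITICAL']} critical, "
        f"{counts['WARNING']} warnings, {counts['INFO']} info)"
    )
    if counts["CRITICAL"]:
        verdict = "CRITICAL"
    elif counts["WARNING"]:
        verdict = "REVIEW NEEDED"
    else:
        verdict = "SAFE"
    return summary, f"Verdict: {verdict}"


print(build_summary(["CRITICAL", "WARNING", "INFO", "INFO"]))
```

A stricter mapping (for example, treating three or more warnings as CRITICAL) would be equally defensible; the point is that the submission leaves this choice to the model.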

Edge Cases

  • Empty query: Return "No query provided — please paste a SQL statement."
  • Non-SQL input: Return "Input does not appear to be SQL. Supported: SELECT, INSERT, UPDATE, DELETE."
  • Very long queries (>200 lines): Flag that analysis may be incomplete, still proceed.
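The Edge Cases list amounts to a pre-check that runs before Phase 1. A minimal sketch, assuming a hypothetical `triage` function that returns a status which the skill would then translate into the messages quoted above. It also accepts statements starting with WITH, an assumption based on Phase 4 mentioning CTEs, even though the non-SQL message lists only SELECT, INSERT, UPDATE, and DELETE.

```python
import re

# Assumed set of accepted leading keywords (WITH added for CTEs).
SUPPORTED = re.compile(r"(?:WITH|SELECT|INSERT|UPDATE|DELETE)\b", re.IGNORECASE)


def triage(text: str) -> dict:
    """Hypothetical pre-check mirroring the Edge Cases list."""
    stripped = text.strip()
    if not stripped:
        return {"status": "empty", "incomplete": False}
    if not SUPPORTED.match(stripped):
        return {"status": "non_sql", "incomplete": False}
    # Long queries are still analysed, only flagged as possibly incomplete.
    return {"status": "ok", "incomplete": len(stripped.splitlines()) > 200}


print(triage("SELECT 1"))  # {'status': 'ok', 'incomplete': False}
```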

Tile files: README.md, SKILL.md, tessl.json, tile.json