
paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

Score: 82 (multiplier: 1.80x)

Quality: 94% — Does it follow best practices?

Impact: 65% — Average score across 5 eval scenarios

Security (by Snyk): Risky — Do not use without reviewing


Score This Contest Submission

Problem/Feature Description

A team member has drafted a SKILL.md for the AI Engineer London 2026 Skills Contest and wants it evaluated before submitting to the leaderboard. They've been iterating on the content for a while and are feeling confident about it, but want an official judge review first.

The skill is an elaborate log-analysis assistant with extensive documentation. Please run it through the judging process and provide the evaluation.

Output Specification

Write the evaluation output to evaluation.md in your working directory.

Input Files

The following file is provided as input. Extract it before beginning.

=============== FILE: inputs/submission.md ===============

---
name: log-analyzer
description: A sophisticated log analysis system that processes application logs, system logs, and structured event streams to identify anomalies, extract patterns, and surface actionable insights using advanced parsing techniques and machine learning-inspired heuristics.
---

Log Analyzer

You analyze log files and event streams to identify patterns, anomalies, and actionable insights.

Overview

This skill provides comprehensive log analysis capabilities for engineering teams dealing with production systems. It handles a wide variety of log formats and can correlate events across multiple sources to identify root causes of incidents.

Supported Log Formats

The following log formats are supported:

  • Apache/Nginx access logs (Combined Log Format)
  • Syslog (RFC 3164 and RFC 5424)
  • JSON structured logs
  • AWS CloudTrail events
  • Kubernetes event logs
  • Docker container logs
  • Java stack traces
  • Python tracebacks
  • Node.js error dumps
  • Custom delimited formats

Analysis Dimensions

Anomaly Detection

Anomaly detection operates by establishing baselines from the first N lines of a log file and then flagging deviations. The following anomaly types are detected:

  • Frequency spikes — A particular event type occurring at more than 3x its baseline rate
  • Error rate increases — Error-level events rising above 5% of total events in a rolling window
  • New event types — Event signatures not seen in the baseline period
  • Temporal gaps — Periods of expected activity where no events are logged
  • Cascading failures — Chains of errors that follow a common root cause within a 60-second window
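The frequency-spike and new-event-type checks above could be sketched as follows. This is an illustrative implementation only, not part of the skill; the function name and the event-type-string representation are assumptions.

```python
from collections import Counter

def detect_frequency_spikes(baseline_events, observed_events, threshold=3.0):
    """Flag event types occurring at more than `threshold` times their
    baseline rate, and event types never seen in the baseline at all.
    Both inputs are lists of event-type strings; rates are normalized
    per list so differently sized windows compare fairly."""
    base_counts = Counter(baseline_events)
    obs_counts = Counter(observed_events)
    base_total = max(len(baseline_events), 1)
    obs_total = max(len(observed_events), 1)

    anomalies = {}
    for event, count in obs_counts.items():
        base_rate = base_counts.get(event, 0) / base_total
        obs_rate = count / obs_total
        if base_rate == 0:
            anomalies[event] = ("new event type", obs_rate)
        elif obs_rate / base_rate > threshold:
            anomalies[event] = ("frequency spike", obs_rate / base_rate)
    return anomalies
```

With a baseline of 3% errors, an observed window at 19% errors (a 6.3x increase) would be flagged as a spike, while an event type absent from the baseline is flagged as new.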

Pattern Extraction

Pattern extraction uses template-based clustering to group similar log lines:

  1. Tokenize each line into fixed and variable parts
  2. Cluster lines with matching fixed-part templates
  3. Report the top-N most frequent templates and their variable value distributions
  4. Highlight templates that appear only once (potential one-off errors)
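The four steps above could be sketched with a simple regex-based tokenizer. This is a minimal sketch, assuming numbers, hex IDs, and quoted strings are the variable parts; the real clustering could use any tokenization scheme.

```python
import re
from collections import Counter

def extract_templates(lines, top_n=10):
    """Cluster log lines by template: replace variable-looking tokens
    (quoted strings, hex IDs, numbers) with placeholders, then count."""
    def to_template(line):
        line = re.sub(r'"[^"]*"', '"<STR>"', line)         # quoted values
        line = re.sub(r'\b0x[0-9a-fA-F]+\b', '<HEX>', line)
        line = re.sub(r'\b\d+(\.\d+)?\b', '<NUM>', line)   # ints and floats
        return line

    counts = Counter(to_template(line) for line in lines)
    frequent = counts.most_common(top_n)                    # step 3
    singletons = [t for t, c in counts.items() if c == 1]   # step 4
    return frequent, singletons
```

Lines differing only in their variable parts ("user 1 logged in", "user 2 logged in") collapse into one template, while a one-off line surfaces as a singleton.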

Correlation Analysis

When multiple log sources are provided, correlation analysis links events across sources:

  • Align events by timestamp (with configurable timezone offset)
  • Match request IDs, trace IDs, or user IDs across sources
  • Build a causality graph: event A in source 1 → event B in source 2 within T seconds
  • Summarize the most common causal chains

Workflow

Phase 1 — Ingest

Accept the log input as:

  • A file path (will be read from disk)
  • A raw paste (detect format automatically)
  • A directory path (will process all .log files recursively)

Extract metadata:

  • Total line count
  • Detected format
  • Time range covered (first and last timestamp)
  • Source system (if determinable)

Display: "Ingested [N] lines — [format] format, [time range], source: [system or unknown]."
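Format auto-detection for a raw paste could be sketched as a per-line heuristic. This is a rough sketch covering three of the listed formats; the regexes and return labels are illustrative assumptions.

```python
import json
import re

def detect_format(line):
    """Guess the log format of a single line: JSON structured logs,
    syslog (PRI header like <34>), or Combined Log Format access logs."""
    try:
        json.loads(line)
        return "json"
    except ValueError:
        pass
    if re.match(r'<\d+>', line):             # RFC 3164/5424 PRI header
        return "syslog"
    if re.search(r'" \d{3} \d+', line):      # '"GET / ..." 200 2326'
        return "access-log"
    return "unknown"
```

In practice the detector would sample several lines and take a majority vote rather than trust a single line.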

Phase 2 — Preprocess

Normalize timestamps to UTC. Parse each line into:

  • Timestamp
  • Severity level (DEBUG, INFO, WARN, ERROR, FATAL, UNKNOWN)
  • Source component (service name, hostname, etc.)
  • Message body
  • Structured fields (if JSON)

Report: "Parsed [N] lines successfully. [M] lines could not be parsed (see parse-errors.txt)."
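The parse step could be sketched for the timestamp/severity format used in Example 1 below. The regex and field names are illustrative assumptions; a real parser would need one pattern per supported format.

```python
import re
from datetime import datetime, timezone

LINE_RE = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+'
    r'(?P<level>DEBUG|INFO|WARN|ERROR|FATAL)\s+'
    r'(?P<rest>.*)'
)

def parse_line(line):
    """Parse one line into timestamp, severity, and message body;
    return None for unparseable lines (counted as parse errors)."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    return {
        "timestamp": ts.replace(tzinfo=timezone.utc),  # normalize to UTC
        "severity": m.group("level"),
        "message": m.group("rest"),
    }
```

Lines returning None would be appended to parse-errors.txt and counted toward the M unparsed lines in the report.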

Phase 3 — Anomaly Detection

Apply the anomaly detection algorithms described above. For each detected anomaly:

  • Report the anomaly type
  • Show the baseline vs. observed value
  • Quote the most representative log line(s)
  • Suggest a likely cause (1 sentence)

Phase 4 — Pattern Extraction

Apply template clustering. Report:

  • Top 10 most frequent templates with counts
  • Top 5 templates appearing only once
  • Any templates matching known error signatures (OOM, connection refused, timeout)

Phase 5 — Correlation

If multiple sources are provided, run correlation analysis. Report:

  • Top 5 causal chains discovered
  • Any unmatched events that appear to be orphaned (no corresponding events in other sources)

Phase 6 — Summary Report

Produce a structured markdown report with:

  • Executive summary (3-5 sentences)
  • Anomalies section (sorted by severity)
  • Patterns section
  • Correlation section (if applicable)
  • Recommended next steps (3 bullet points)

Configuration Options

The following configuration options can be provided by the user:

| Option | Default | Description |
| --- | --- | --- |
| baseline_lines | 1000 | Number of lines used to establish baseline |
| frequency_threshold | 3.0 | Multiplier above baseline to flag as spike |
| error_rate_threshold | 0.05 | Fraction of events that must be errors to flag |
| window_size | 60 | Rolling window size in seconds for rate calculations |
| max_templates | 10 | Number of top templates to report |
| cascade_window | 60 | Seconds within which to look for cascading failures |
| timezone_offset | 0 | Offset in hours from UTC for correlation |
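The defaults above could be captured in a small config object. This is a sketch only; the skill itself takes these as plain user-supplied options, and the class name is an assumption.

```python
from dataclasses import dataclass

@dataclass
class AnalyzerConfig:
    """Defaults mirroring the configuration table above."""
    baseline_lines: int = 1000
    frequency_threshold: float = 3.0
    error_rate_threshold: float = 0.05
    window_size: int = 60          # seconds
    max_templates: int = 10
    cascade_window: int = 60       # seconds
    timezone_offset: int = 0       # hours from UTC

def config_with_overrides(**overrides):
    """Build a config from user-supplied overrides, keeping defaults
    for everything not specified."""
    return AnalyzerConfig(**overrides)
```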

Edge Cases

Malformed Lines

Lines that cannot be parsed are written to parse-errors.txt and counted separately. Parsing failures above 20% of total lines trigger a warning: "High parse failure rate ([N]%) — verify log format."

Very Large Files

Files above 100MB are processed in streaming mode. Pattern extraction is limited to the first 50,000 lines. A notice is displayed: "Large file detected — pattern extraction limited to first 50,000 lines."
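Streaming mode could be sketched as a lazy line generator, so a file above 100MB never needs to fit in memory. The function name and the `limit` parameter are illustrative assumptions.

```python
def stream_lines(fh, limit=None):
    """Yield lines lazily from an open file object (or any iterable of
    lines). `limit` caps how many lines are yielded, e.g. 50,000 when
    pattern extraction is restricted on large files."""
    for n, line in enumerate(fh, start=1):
        if limit is not None and n > limit:
            break
        yield line.rstrip("\n")
```

Downstream phases consume the generator directly, so anomaly detection can still scan the whole file while pattern extraction stops at its limit.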

Empty Input

If no log data is provided, return: "No log data received. Please paste log lines or provide a file path."

Insufficient Input

If fewer than 100 lines are provided, skip anomaly detection (insufficient baseline) and report: "Insufficient input — anomaly detection requires at least 100 lines."

Binary Files

If the input appears to be binary (>10% non-printable characters), return: "Input appears to be a binary file. Log Analyzer requires text-format logs."
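The >10% non-printable rule above could be sketched as a byte-level check. The choice of which bytes count as "printable" (printable ASCII plus tab, LF, and CR) is an assumption for illustration.

```python
def looks_binary(data: bytes, threshold=0.10):
    """Return True if more than `threshold` of the bytes are
    non-printable (excluding tab/LF/CR), per the binary-input rule."""
    if not data:
        return False
    printable = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    non_printable = sum(1 for b in data if b not in printable)
    return non_printable / len(data) > threshold
```

A UTF-8 text file with occasional multi-byte characters could trip this heuristic, so a production check might decode first and count on characters instead of raw bytes.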

Limitations

  • Does not execute system commands or access external log aggregation services
  • Cannot read from remote hosts directly (S3, GCS, Splunk) — paste the content instead
  • Does not support real-time streaming — works on snapshots only
  • Correlation analysis requires at least 2 log sources with overlapping timestamps

Examples

Example 1: Simple Error Detection

Input:

2024-01-15 10:00:01 INFO  service=api msg="Request received" path=/users method=GET
2024-01-15 10:00:02 INFO  service=api msg="Request received" path=/orders method=POST
2024-01-15 10:00:03 ERROR service=api msg="Database connection failed" host=db-01
2024-01-15 10:00:04 ERROR service=api msg="Database connection failed" host=db-01
2024-01-15 10:00:05 ERROR service=api msg="Database connection failed" host=db-01

Expected output:

  • Anomaly detected: ERROR frequency spike (baseline: 0%, observed: 60%)
  • Pattern: "Database connection failed" template appears 3 times
  • Suggested cause: Database host db-01 is unreachable

Example 2: Multi-Source Correlation

When two log sources are provided (API logs + database logs), the correlation analysis will attempt to match:

  • API request IDs to database query IDs
  • API error timestamps to database error timestamps
  • API response times to database query durations

Performance Notes

For typical engineering use cases (application logs under 10MB), full analysis completes in under 30 seconds. For larger files, streaming mode may take several minutes depending on the complexity of the patterns.

The template clustering algorithm is O(N log N) in the number of unique templates, so very diverse logs (e.g., free-text messages with no structure) may take longer to process.

Output Format

All output is in Markdown format. Structured data (anomaly tables, template lists) uses Markdown tables. Code blocks are used for log line examples.

The final report is written to analysis-report.md unless a different output path is specified by the user.
