Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.
- Score: 82
- Follows best practices: 94%
- Impact: 65%
- Average score across 5 eval scenarios: 1.80x
- Risk: Risky (do not use without reviewing)
A team member has drafted a SKILL.md for the AI Engineer London 2026 Skills Contest and wants it evaluated before submitting to the leaderboard. They've been iterating on the content for a while and are feeling confident about it, but want an official judge review first.
The skill is an elaborate log-analysis assistant with extensive documentation they wrote. Please run it through the judging process and provide the evaluation.
Write the evaluation output to evaluation.md in your working directory.
The following file is provided as input. Extract it before beginning.
You analyze log files and event streams to identify patterns, anomalies, and actionable insights.
This skill provides comprehensive log analysis capabilities for engineering teams dealing with production systems. It handles a wide variety of log formats and can correlate events across multiple sources to identify root causes of incidents.
The following log formats are supported:
Anomaly detection operates by establishing baselines from the first N lines of a log file and then flagging deviations. The following anomaly types are detected:
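The frequency-spike check described above can be sketched roughly as follows. The SKILL.md does not prescribe an implementation, so the function name and the rate comparison are illustrative; the 3.0 multiplier mirrors the `frequency_threshold` default, and a level absent from the baseline is treated as novel and flagged:

```python
from collections import Counter

def detect_frequency_spikes(baseline, window, threshold=3.0):
    """Flag severity levels whose rate in `window` exceeds
    threshold x their baseline rate (illustrative sketch)."""
    if not window:
        return {}
    base_counts = Counter(baseline)
    win_counts = Counter(window)
    spikes = {}
    for level, count in win_counts.items():
        base_rate = base_counts.get(level, 0) / max(len(baseline), 1)
        win_rate = count / len(window)
        # A level never seen in the baseline counts as a spike outright.
        if base_rate == 0 or win_rate > threshold * base_rate:
            spikes[level] = win_rate
    return spikes
```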
Pattern extraction uses template-based clustering to group similar log lines:
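Template clustering of this kind is commonly done by masking variable tokens so that structurally identical lines collapse to one template. A minimal sketch, where the specific masking rules (numbers, hex ids, quoted strings) are assumptions rather than part of the skill, and `max_templates` mirrors the configuration option:

```python
import re
from collections import Counter

def to_template(line):
    """Mask variable tokens so similar lines share one template."""
    line = re.sub(r'"[^"]*"', '"<STR>"', line)          # quoted strings
    line = re.sub(r'\b0x[0-9a-fA-F]+\b', '<HEX>', line)  # hex ids
    line = re.sub(r'\b\d+\b', '<NUM>', line)             # numbers
    return line

def top_templates(lines, max_templates=10):
    """Return the most frequent templates with their counts."""
    return Counter(to_template(l) for l in lines).most_common(max_templates)
```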
When multiple log sources are provided, correlation analysis links events across sources:
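One way to sketch the cross-source linking, assuming events have been reduced to `(epoch_seconds, message)` pairs (an assumption made here for brevity) and using the `cascade_window` option to bound how far apart linked events may be:

```python
def correlate(events_a, events_b, cascade_window=60):
    """Pair each event in source A with events in source B that occur
    within cascade_window seconds after it (illustrative sketch)."""
    pairs = []
    for ta, msg_a in events_a:
        for tb, msg_b in events_b:
            if 0 <= tb - ta <= cascade_window:
                pairs.append((msg_a, msg_b, tb - ta))
    return pairs
```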
Accept the log input as:
Extract metadata:
Display: "Ingested [N] lines — [format] format, [time range], source: [system or unknown]."
Normalize timestamps to UTC. Parse each line into:
Report: "Parsed [N] lines successfully. [M] lines could not be parsed (see parse-errors.txt)."
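A minimal parsing sketch for the timestamped logfmt-style lines shown in the usage example below. The regex and field handling are illustrative, not prescribed by the skill; `timezone_offset` mirrors the configuration option, and an unmatched line returns `None` so it can be counted toward parse-errors.txt:

```python
import re
from datetime import datetime, timedelta, timezone

LINE_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) '
    r'(?P<level>[A-Z]+) (?P<rest>.*)$'
)

def parse_line(line, timezone_offset=0):
    """Parse one line into a UTC timestamp, level, and key=value fields.
    Returns None when the line does not match (a parse error)."""
    m = LINE_RE.match(line)
    if not m:
        return None
    local = datetime.strptime(m['ts'], '%Y-%m-%d %H:%M:%S').replace(
        tzinfo=timezone(timedelta(hours=timezone_offset)))
    fields = {k: v.strip('"')
              for k, v in re.findall(r'(\w+)=("[^"]*"|\S+)', m['rest'])}
    return {'timestamp': local.astimezone(timezone.utc),
            'level': m['level'], 'fields': fields}
```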
Apply the anomaly detection algorithms described above. For each detected anomaly:
Apply template clustering. Report:
If multiple sources are provided, run correlation analysis. Report:
Produce a structured markdown report with:
The following configuration options can be provided by the user:
| Option | Default | Description |
|---|---|---|
| baseline_lines | 1000 | Number of lines used to establish baseline |
| frequency_threshold | 3.0 | Multiplier above baseline to flag as spike |
| error_rate_threshold | 0.05 | Fraction of events that must be errors to flag |
| window_size | 60 | Rolling window size in seconds for rate calculations |
| max_templates | 10 | Number of top templates to report |
| cascade_window | 60 | Seconds within which to look for cascading failures |
| timezone_offset | 0 | Offset in hours from UTC for correlation |
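The options above map naturally onto a small configuration object with the table's defaults; a sketch using a Python dataclass (the class name is illustrative):

```python
from dataclasses import dataclass

@dataclass
class AnalyzerConfig:
    """Defaults taken from the options table above."""
    baseline_lines: int = 1000
    frequency_threshold: float = 3.0
    error_rate_threshold: float = 0.05
    window_size: int = 60
    max_templates: int = 10
    cascade_window: int = 60
    timezone_offset: int = 0
```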
Lines that cannot be parsed are written to parse-errors.txt and counted separately. Parsing failures above 20% of total lines trigger a warning: "High parse failure rate ([N]%) — verify log format."

Files above 100MB are processed in streaming mode. Pattern extraction is limited to the first 50,000 lines. A notice is displayed: "Large file detected — pattern extraction limited to first 50,000 lines."
If no log data is provided, return: "No log data received. Please paste log lines or provide a file path."
If fewer than 100 lines are provided, skip anomaly detection (insufficient baseline) and report: "Insufficient input — anomaly detection requires at least 100 lines."
If the input appears to be binary (>10% non-printable characters), return: "Input appears to be a binary file. Log Analyzer requires text-format logs."
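The binary-input check above can be sketched as a non-printable-byte ratio test; the exact set of bytes treated as printable (ASCII printables plus tab, newline, and carriage return) is an assumption:

```python
def looks_binary(data: bytes, threshold=0.10):
    """Return True when more than `threshold` of the bytes are
    non-printable, excluding common whitespace."""
    if not data:
        return False
    printable = set(range(0x20, 0x7f)) | {0x09, 0x0a, 0x0d}
    bad = sum(b not in printable for b in data)
    return bad / len(data) > threshold
```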
Input:

```
2024-01-15 10:00:01 INFO service=api msg="Request received" path=/users method=GET
2024-01-15 10:00:02 INFO service=api msg="Request received" path=/orders method=POST
2024-01-15 10:00:03 ERROR service=api msg="Database connection failed" host=db-01
2024-01-15 10:00:04 ERROR service=api msg="Database connection failed" host=db-01
2024-01-15 10:00:05 ERROR service=api msg="Database connection failed" host=db-01
```

Expected output:
When two log sources are provided (API logs + database logs), the correlation analysis will attempt to match:
For typical engineering use cases (application logs under 10MB), full analysis completes in under 30 seconds. For larger files, streaming mode may take several minutes depending on the complexity of the patterns.
The template clustering algorithm is O(N log N) in the number of unique templates, so very diverse logs (e.g., free-text messages with no structure) may take longer to process.
All output is in Markdown format. Structured data (anomaly tables, template lists) uses Markdown tables. Code blocks are used for log line examples.
The final report is written to analysis-report.md unless a different output path is specified by the user.
```
docs/
superpowers/
evals/
  scenario-1/
  scenario-2/
  scenario-3/
  scenario-4/
  scenario-5/
references/
```