
paker-it/aie26-skill-judge

Evaluates SKILL.md submissions for the AI Engineer London 2026 Skills Contest across 11 dimensions (8 official Tessl rubric + 3 bonus). Use when you say 'judge my AIE26 contest skill', 'score this SKILL.md for the contest', 'review my skill submission', or 'how would this score on the leaderboard'. Accepts GitHub repo URLs, file paths, or raw pastes.

Score: 82 (multiplier: 1.80x)

Quality: 94% — Does it follow best practices?

Impact: 65% — Average score across 5 eval scenarios

Security (by Snyk): Risky — Do not use without reviewing


Score This Contest Submission

Problem/Feature Description

A team member has drafted a SKILL.md for the AI Engineer London 2026 Skills Contest and wants it evaluated before submitting to the leaderboard. They've been iterating on the content for a while and are feeling confident about it, but want an official judge review first.

The skill is an elaborate log-analysis assistant with extensive documentation. Please run it through the judging process and provide the evaluation.

Output Specification

Write the evaluation output to evaluation.md in your working directory.

Input Files

The following file is provided as input. Extract it before beginning.

=============== FILE: inputs/submission.md ===============

---
name: log-analyzer
description: A sophisticated log analysis system that processes application logs, system logs, and structured event streams to identify anomalies, extract patterns, and surface actionable insights using advanced parsing techniques and machine learning-inspired heuristics.
---

Log Analyzer

You analyze log files and event streams to identify patterns, anomalies, and actionable insights.

Overview

This skill provides comprehensive log analysis capabilities for engineering teams dealing with production systems. It handles a wide variety of log formats and can correlate events across multiple sources to identify root causes of incidents.

Supported Log Formats

The following log formats are supported:

  • Apache/Nginx access logs (Combined Log Format)
  • Syslog (RFC 3164 and RFC 5424)
  • JSON structured logs
  • AWS CloudTrail events
  • Kubernetes event logs
  • Docker container logs
  • Java stack traces
  • Python tracebacks
  • Node.js error dumps
  • Custom delimited formats

Analysis Dimensions

Anomaly Detection

Anomaly detection operates by establishing baselines from the first N lines of a log file and then flagging deviations. The following anomaly types are detected:

  • Frequency spikes — A particular event type occurring at more than 3x its baseline rate
  • Error rate increases — Error-level events rising above 5% of total events in a rolling window
  • New event types — Event signatures not seen in the baseline period
  • Temporal gaps — Periods of expected activity where no events are logged
  • Cascading failures — Chains of errors that follow a common root cause within a 60-second window
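The frequency-spike and new-event-type checks above could be sketched as follows. This is an illustrative implementation only, not part of the skill; the function name and the event-type-string representation are assumptions.

```python
from collections import Counter

def detect_frequency_spikes(baseline_events, observed_events, threshold=3.0):
    """Flag event types occurring at more than `threshold` times their
    baseline rate, and event types never seen in the baseline at all.
    Both inputs are lists of event-type strings; rates are normalized
    per list so differently sized windows compare fairly."""
    base_counts = Counter(baseline_events)
    obs_counts = Counter(observed_events)
    base_total = max(len(baseline_events), 1)
    obs_total = max(len(observed_events), 1)

    anomalies = {}
    for event, count in obs_counts.items():
        base_rate = base_counts.get(event, 0) / base_total
        obs_rate = count / obs_total
        if base_rate == 0:
            anomalies[event] = ("new event type", obs_rate)
        elif obs_rate / base_rate > threshold:
            anomalies[event] = ("frequency spike", obs_rate / base_rate)
    return anomalies
```

With a baseline of 3% errors, an observed window at 19% errors (a 6.3x increase) would be flagged as a spike, while an event type absent from the baseline is flagged as new.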

Pattern Extraction

Pattern extraction uses template-based clustering to group similar log lines:

  1. Tokenize each line into fixed and variable parts
  2. Cluster lines with matching fixed-part templates
  3. Report the top-N most frequent templates and their variable value distributions
  4. Highlight templates that appear only once (potential one-off errors)
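The four steps above could be sketched with a simple regex-based tokenizer. This is a minimal sketch, assuming numbers, hex IDs, and quoted strings are the variable parts; the real clustering could use any tokenization scheme.

```python
import re
from collections import Counter

def extract_templates(lines, top_n=10):
    """Cluster log lines by template: replace variable-looking tokens
    (quoted strings, hex IDs, numbers) with placeholders, then count."""
    def to_template(line):
        line = re.sub(r'"[^"]*"', '"<STR>"', line)         # quoted values
        line = re.sub(r'\b0x[0-9a-fA-F]+\b', '<HEX>', line)
        line = re.sub(r'\b\d+(\.\d+)?\b', '<NUM>', line)   # ints and floats
        return line

    counts = Counter(to_template(line) for line in lines)
    frequent = counts.most_common(top_n)                    # step 3
    singletons = [t for t, c in counts.items() if c == 1]   # step 4
    return frequent, singletons
```

Lines differing only in their variable parts ("user 1 logged in", "user 2 logged in") collapse into one template, while a one-off line surfaces as a singleton.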

Correlation Analysis

When multiple log sources are provided, correlation analysis links events across sources:

  • Align events by timestamp (with configurable timezone offset)
  • Match request IDs, trace IDs, or user IDs across sources
  • Build a causality graph: event A in source 1 → event B in source 2 within T seconds
  • Summarize the most common causal chains

Workflow

Phase 1 — Ingest

Accept the log input as:

  • A file path (will be read from disk)
  • A raw paste (detect format automatically)
  • A directory path (will process all .log files recursively)

Extract metadata:

  • Total line count
  • Detected format
  • Time range covered (first and last timestamp)
  • Source system (if determinable)

Display: "Ingested [N] lines — [format] format, [time range], source: [system or unknown]."
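Format auto-detection for a raw paste could be sketched as a per-line heuristic. This is a rough sketch covering three of the listed formats; the regexes and return labels are illustrative assumptions.

```python
import json
import re

def detect_format(line):
    """Guess the log format of a single line: JSON structured logs,
    syslog (PRI header like <34>), or Combined Log Format access logs."""
    try:
        json.loads(line)
        return "json"
    except ValueError:
        pass
    if re.match(r'<\d+>', line):             # RFC 3164/5424 PRI header
        return "syslog"
    if re.search(r'" \d{3} \d+', line):      # '"GET / ..." 200 2326'
        return "access-log"
    return "unknown"
```

In practice the detector would sample several lines and take a majority vote rather than trust a single line.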

Phase 2 — Preprocess

Normalize timestamps to UTC. Parse each line into:

  • Timestamp
  • Severity level (DEBUG, INFO, WARN, ERROR, FATAL, UNKNOWN)
  • Source component (service name, hostname, etc.)
  • Message body
  • Structured fields (if JSON)

Report: "Parsed [N] lines successfully. [M] lines could not be parsed (see parse-errors.txt)."
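The parse step could be sketched for the timestamp/severity format used in Example 1 below. The regex and field names are illustrative assumptions; a real parser would need one pattern per supported format.

```python
import re
from datetime import datetime, timezone

LINE_RE = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+'
    r'(?P<level>DEBUG|INFO|WARN|ERROR|FATAL)\s+'
    r'(?P<rest>.*)'
)

def parse_line(line):
    """Parse one line into timestamp, severity, and message body;
    return None for unparseable lines (counted as parse errors)."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    return {
        "timestamp": ts.replace(tzinfo=timezone.utc),  # normalize to UTC
        "severity": m.group("level"),
        "message": m.group("rest"),
    }
```

Lines returning None would be appended to parse-errors.txt and counted toward the M unparsed lines in the report.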

Phase 3 — Anomaly Detection

Apply the anomaly detection algorithms described above. For each detected anomaly:

  • Report the anomaly type
  • Show the baseline vs. observed value
  • Quote the most representative log line(s)
  • Suggest a likely cause (1 sentence)

Phase 4 — Pattern Extraction

Apply template clustering. Report:

  • Top 10 most frequent templates with counts
  • Top 5 templates appearing only once
  • Any templates matching known error signatures (OOM, connection refused, timeout)

Phase 5 — Correlation

If multiple sources are provided, run correlation analysis. Report:

  • Top 5 causal chains discovered
  • Any unmatched events that appear to be orphaned (no corresponding events in other sources)

Phase 6 — Summary Report

Produce a structured markdown report with:

  • Executive summary (3-5 sentences)
  • Anomalies section (sorted by severity)
  • Patterns section
  • Correlation section (if applicable)
  • Recommended next steps (3 bullet points)

Configuration Options

The following configuration options can be provided by the user:

| Option | Default | Description |
| --- | --- | --- |
| baseline_lines | 1000 | Number of lines used to establish baseline |
| frequency_threshold | 3.0 | Multiplier above baseline to flag as spike |
| error_rate_threshold | 0.05 | Fraction of events that must be errors to flag |
| window_size | 60 | Rolling window size in seconds for rate calculations |
| max_templates | 10 | Number of top templates to report |
| cascade_window | 60 | Seconds within which to look for cascading failures |
| timezone_offset | 0 | Offset in hours from UTC for correlation |
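The defaults above could be captured in a small config object. This is a sketch only; the skill itself takes these as plain user-supplied options, and the class name is an assumption.

```python
from dataclasses import dataclass

@dataclass
class AnalyzerConfig:
    """Defaults mirroring the configuration table above."""
    baseline_lines: int = 1000
    frequency_threshold: float = 3.0
    error_rate_threshold: float = 0.05
    window_size: int = 60          # seconds
    max_templates: int = 10
    cascade_window: int = 60       # seconds
    timezone_offset: int = 0       # hours from UTC

def config_with_overrides(**overrides):
    """Build a config from user-supplied overrides, keeping defaults
    for everything not specified."""
    return AnalyzerConfig(**overrides)
```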

Edge Cases

Malformed Lines

Lines that cannot be parsed are written to parse-errors.txt and counted separately. Parsing failures above 20% of total lines trigger a warning: "High parse failure rate ([N]%) — verify log format."

Very Large Files

Files above 100MB are processed in streaming mode. Pattern extraction is limited to the first 50,000 lines. A notice is displayed: "Large file detected — pattern extraction limited to first 50,000 lines."
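Streaming mode could be sketched as a lazy line generator, so a file above 100MB never needs to fit in memory. The function name and the `limit` parameter are illustrative assumptions.

```python
def stream_lines(fh, limit=None):
    """Yield lines lazily from an open file object (or any iterable of
    lines). `limit` caps how many lines are yielded, e.g. 50,000 when
    pattern extraction is restricted on large files."""
    for n, line in enumerate(fh, start=1):
        if limit is not None and n > limit:
            break
        yield line.rstrip("\n")
```

Downstream phases consume the generator directly, so anomaly detection can still scan the whole file while pattern extraction stops at its limit.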

Empty Input

If no log data is provided, return: "No log data received. Please paste log lines or provide a file path."

Insufficient Input

If fewer than 100 lines are provided, skip anomaly detection (insufficient baseline) and report: "Insufficient input — anomaly detection requires at least 100 lines."

Binary Files

If the input appears to be binary (>10% non-printable characters), return: "Input appears to be a binary file. Log Analyzer requires text-format logs."
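The >10% non-printable rule above could be sketched as a byte-level check. The choice of which bytes count as "printable" (printable ASCII plus tab, LF, and CR) is an assumption for illustration.

```python
def looks_binary(data: bytes, threshold=0.10):
    """Return True if more than `threshold` of the bytes are
    non-printable (excluding tab/LF/CR), per the binary-input rule."""
    if not data:
        return False
    printable = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    non_printable = sum(1 for b in data if b not in printable)
    return non_printable / len(data) > threshold
```

A UTF-8 text file with occasional multi-byte characters could trip this heuristic, so a production check might decode first and count on characters instead of raw bytes.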

Limitations

  • Does not execute system commands or access external log aggregation services
  • Cannot read from remote hosts directly (S3, GCS, Splunk) — paste the content instead
  • Does not support real-time streaming — works on snapshots only
  • Correlation analysis requires at least 2 log sources with overlapping timestamps

Examples

Example 1: Simple Error Detection

Input:

2024-01-15 10:00:01 INFO  service=api msg="Request received" path=/users method=GET
2024-01-15 10:00:02 INFO  service=api msg="Request received" path=/orders method=POST
2024-01-15 10:00:03 ERROR service=api msg="Database connection failed" host=db-01
2024-01-15 10:00:04 ERROR service=api msg="Database connection failed" host=db-01
2024-01-15 10:00:05 ERROR service=api msg="Database connection failed" host=db-01

Expected output:

  • Anomaly detected: ERROR frequency spike (baseline: 0%, observed: 60%)
  • Pattern: "Database connection failed" template appears 3 times
  • Suggested cause: Database host db-01 is unreachable

Example 2: Multi-Source Correlation

When two log sources are provided (API logs + database logs), the correlation analysis will attempt to match:

  • API request IDs to database query IDs
  • API error timestamps to database error timestamps
  • API response times to database query durations

Performance Notes

For typical engineering use cases (application logs under 10MB), full analysis completes in under 30 seconds. For larger files, streaming mode may take several minutes depending on the complexity of the patterns.

The template clustering algorithm is O(N log N) in the number of unique templates, so very diverse logs (e.g., free-text messages with no structure) may take longer to process.

Output Format

All output is in Markdown format. Structured data (anomaly tables, template lists) uses Markdown tables. Code blocks are used for log line examples.

The final report is written to analysis-report.md unless a different output path is specified by the user.
