Optimize your skills: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
- Average score across 24 eval scenarios: 88 (1.07x)
- Does it follow best practices? 94%
- Impact: 88%
- Result: Passed, no known issues
A developer has just applied a set of changes to a log-analyzer skill file and wants to make sure nothing was broken in the process. Before they commit the updated skill, they want a thorough quality audit that checks for common mistakes that can sneak in during editing: broken code syntax, invalid commands, dead links, and content that shouldn't be in the skill at all.
Audit the updated SKILL.md below and produce a structured quality report. Check every potential issue category and give a clear pass/fail result for each, with specific details where problems are found.
Produce a file accuracy_check.md containing the structured quality report.
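One way to sketch the "broken code syntax" category of such an audit is to run every fenced Python block through `ast.parse`. This is a minimal illustration, not a required interface: the function name, the regex, and the pass/fail line format are all assumptions.

```python
import ast
import re

def check_python_blocks(markdown: str) -> list[str]:
    """Return one pass/fail line per ```python block found in the markdown."""
    results = []
    # Non-greedy capture of each fenced python block's body.
    for i, body in enumerate(re.findall(r"```python\n(.*?)```", markdown, re.DOTALL), 1):
        try:
            ast.parse(body)
            results.append(f"block {i}: PASS")
        except SyntaxError as exc:
            results.append(f"block {i}: FAIL (line {exc.lineno}: {exc.msg})")
    return results

# Illustrative input: one broken block (unclosed paren), one valid block.
doc = "```python\nx = (1\n```\n```python\ny = 2\n```"
print(check_python_blocks(doc))
```

`ast.parse` only validates Python, so other languages in the skill (shell commands, JavaScript) would each need their own checker in a full audit.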
The following files are provided as inputs. Extract them before beginning.
=============== FILE: skills/log-analyzer/SKILL.md ===============
Parse and analyze application logs using our internal logq tool.
Find errors in the last hour:
```
logq query --source app-logs --level error --since 1h --format json
```

Count error types:
```python
import subprocess
import json

def count_errors(log_source: str, window: str = "1h") -> dict:
    result = subprocess.run(
        ["logq", "query", "--source", log_source, "--level", "error", "--since", window, "--format", "json"],
        capture_output=True, text=True
    )
    logs = json.loads(result.stdout
    error_counts = {}
    for entry in logs:
        err_type = entry.get("error_type", "unknown")
        error_counts[err_type] = error_counts.get(err_type, 0) + 1
    return error_counts
```

Export results to CSV:
```javascript
const { exec } = require('child_process');
const fs = require('fs');

function exportLogs(source, outputFile) {
  exec(`logq export --source ${source} --format csv --output ${outputFile}`, (err, stdout, stderr) => {
    if (err) throw err;
    console.log('Export complete:', outputFile);
  });
}
```

For more query patterns and aggregation options see QUERY_PATTERNS.md.
Use logq stats to get percentile breakdowns:
```
logq stats --source app-logs --metric response_time --percentiles 50,95,99 --since 24h
```

Note: HTTP is a stateless request/response protocol. Each log entry represents one request-response cycle. TCP connections may be reused.

=============== END FILE ===============
=============== FILE: skills/log-analyzer/QUERY_PATTERNS.md ===============
Advanced logq query patterns and aggregations.
```
logq aggregate --group-by error_type --count
logq aggregate --group-by service --sum response_time
```

- `--since 1h` — last 1 hour
- `--since 24h` — last 24 hours
- `--since 7d` — last 7 days
=============== END FILE ===============

evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
scenario-19
scenario-20
scenario-21
scenario-22
scenario-23
scenario-24
skills
compare-skill-model-performance
optimize-skill-instructions
references
optimize-skill-performance
optimize-skill-performance-and-instructions
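The "dead links" category from the audit prompt above can be sketched in the same spirit: collect every .md file a skill references and confirm it exists alongside the skill. The function name and the uppercase-filename regex below are simplifying assumptions for illustration only.

```python
import re
import tempfile
from pathlib import Path

def find_dead_links(skill_text: str, skill_dir: Path) -> list[str]:
    """Return referenced .md filenames that do not exist in skill_dir."""
    # Assumption: companion files use uppercase names like QUERY_PATTERNS.md.
    refs = set(re.findall(r"\b([A-Z0-9_]+\.md)\b", skill_text))
    return sorted(name for name in refs if not (skill_dir / name).exists())

# Illustrative check against a throwaway directory with one real companion file.
d = Path(tempfile.mkdtemp())
(d / "QUERY_PATTERNS.md").write_text("aggregation patterns")
print(find_dead_links("See QUERY_PATTERNS.md and MISSING.md.", d))  # ['MISSING.md']
```

A fuller checker would also parse markdown link syntax and handle relative subdirectories, but this existence test already catches the most common editing mistake: renaming or deleting a referenced file.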