Collect and normalize agent logs, discover installed verifiers, and dispatch LLM judges to evaluate adherence. Produces per-session verdicts and aggregated reports.
91
90%
Does it follow best practices?
Impact
96%
3.09xAverage score across 3 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent follows the audit-logs skill's scripting patterns: finding SCRIPTS_PATH using the correct find command pattern, invoking scripts with uv run python3, using --max-sessions to limit sessions, and using --search before auditing for targeted investigations.",
"type": "weighted_checklist",
"checklist": [
{
"name": "SCRIPTS_PATH find pattern",
"description": "audit-recent.sh uses a find command to locate SCRIPTS_PATH that searches both $(pwd)/.tessl/tiles AND $HOME/.tessl/tiles (both paths must be present)",
"max_score": 15
},
{
"name": "Correct script path in find",
"description": "The find command in SCRIPTS_PATH lookup searches for run_pipeline.py specifically (e.g. -path '*/audit-logs/scripts/run_pipeline.py' or similar), and strips the filename to get the directory",
"max_score": 10
},
{
"name": "uv run python3 invocation",
"description": "Both scripts invoke pipeline scripts using `uv run python3 \"$SCRIPTS_PATH/...\"` — not bare `python3`, `python`, or `$SCRIPTS_PATH/script.py` directly",
"max_score": 15
},
{
"name": "SCRIPTS_PATH empty check",
"description": "audit-recent.sh checks whether SCRIPTS_PATH is empty (e.g. if [ -z \"$SCRIPTS_PATH\" ]) before attempting to run the pipeline",
"max_score": 10
},
{
"name": "max-sessions flag",
"description": "audit-recent.sh passes --max-sessions to run_pipeline.py (value of 2 or similar small number to account for the current session)",
"max_score": 15
},
{
"name": "search-before-audit pattern",
"description": "audit-search.sh uses --search (or a search script) to find matching sessions before running the pipeline on specific sessions",
"max_score": 15
},
{
"name": "Small batch for search results",
"description": "audit-search.sh audits only a small number of search matches (2-3 sessions), not all results",
"max_score": 10
},
{
"name": "No auto Phase 2 progression",
"description": "Neither script automatically invokes Phase 2 or Phase 3 commands — scripts stop after Phase 1",
"max_score": 10
}
]
}