Analyze agent sessions against verifier checklists, detect friction points, and create structured verifiers from skills and docs. Produces per-session verdicts and aggregated quality reports.
88
86%
Does it follow best practices?
Impact
97%
2.93xAverage score across 3 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent follows the analyze-sessions skill's scripting patterns: finding SCRIPTS_PATH using the correct find command pattern, invoking scripts with python3, using --max-sessions to limit sessions, and using --search before analyzing for targeted investigations.",
"type": "weighted_checklist",
"checklist": [
{
"name": "SCRIPTS_PATH find pattern",
"description": "analyze-recent.sh uses a find command to locate SCRIPTS_PATH that searches both $(pwd)/.tessl/tiles AND $HOME/.tessl/tiles (both paths must be present)",
"max_score": 15
},
{
"name": "Correct script path in find",
"description": "The find command in SCRIPTS_PATH lookup searches for run_pipeline.py specifically (e.g. -path '*/agent-quality/skills/analyze-sessions/scripts/run_pipeline.py' or similar), and strips the filename to get the directory",
"max_score": 10
},
{
"name": "python3 invocation",
"description": "Both scripts invoke pipeline scripts using `python3 \"$SCRIPTS_PATH/...\"` — not bare `python`, `$SCRIPTS_PATH/script.py` directly, or `uv run python3`",
"max_score": 15
},
{
"name": "SCRIPTS_PATH empty check",
"description": "analyze-recent.sh checks whether SCRIPTS_PATH is empty (e.g. if [ -z \"$SCRIPTS_PATH\" ]) before attempting to run the pipeline",
"max_score": 10
},
{
"name": "max-sessions flag",
"description": "analyze-recent.sh passes --max-sessions to run_pipeline.py (value of 2 or similar small number to account for the current session)",
"max_score": 15
},
{
"name": "search-before-analyze pattern",
"description": "analyze-search.sh uses --search (or a search script) to find matching sessions before running the pipeline on specific sessions",
"max_score": 15
},
{
"name": "Small batch for search results",
"description": "analyze-search.sh analyzes only a small number of search matches (2-3 sessions), not all results",
"max_score": 10
},
{
"name": "No auto Phase 2 progression",
"description": "Neither script automatically invokes Phase 2 or Phase 3 commands — scripts stop after Phase 1",
"max_score": 10
}
]
}