Name: try-tessl/agent-quality
Rating: 88.65 (1 review)
Author: try-tessl

try-tessl/agent-quality

Analyze agent sessions against verifier checklists, detect friction points, and create structured verifiers from skills and docs. Produces per-session verdicts and aggregated quality reports.

Quality: 86% (Does it follow best practices?)

Impact: 97%, 2.93x (average score across 3 eval scenarios)

Security: Passed (no known issues)

{
  "context": "Tests whether the agent follows the analyze-sessions skill's scripting patterns: finding SCRIPTS_PATH using the correct find command pattern, invoking scripts with python3, using --max-sessions to limit sessions, and using --search before analyzing for targeted investigations.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "SCRIPTS_PATH find pattern",
      "description": "analyze-recent.sh uses a find command to locate SCRIPTS_PATH that searches both $(pwd)/.tessl/tiles AND $HOME/.tessl/tiles (both paths must be present)",
      "max_score": 15
    },
    {
      "name": "Correct script path in find",
      "description": "The find command in SCRIPTS_PATH lookup searches for run_pipeline.py specifically (e.g. -path '*/agent-quality/skills/analyze-sessions/scripts/run_pipeline.py' or similar), and strips the filename to get the directory",
      "max_score": 10
    },
    {
      "name": "python3 invocation",
      "description": "Both scripts invoke pipeline scripts using `python3 \"$SCRIPTS_PATH/...\"` — not bare `python`, `$SCRIPTS_PATH/script.py` directly, or `uv run python3`",
      "max_score": 15
    },
    {
      "name": "SCRIPTS_PATH empty check",
      "description": "analyze-recent.sh checks whether SCRIPTS_PATH is empty (e.g. if [ -z \"$SCRIPTS_PATH\" ]) before attempting to run the pipeline",
      "max_score": 10
    },
    {
      "name": "max-sessions flag",
      "description": "analyze-recent.sh passes --max-sessions to run_pipeline.py (value of 2 or similar small number to account for the current session)",
      "max_score": 15
    },
    {
      "name": "search-before-analyze pattern",
      "description": "analyze-search.sh uses --search (or a search script) to find matching sessions before running the pipeline on specific sessions",
      "max_score": 15
    },
    {
      "name": "Small batch for search results",
      "description": "analyze-search.sh analyzes only a small number of search matches (2-3 sessions), not all results",
      "max_score": 10
    },
    {
      "name": "No auto Phase 2 progression",
      "description": "Neither script automatically invokes Phase 2 or Phase 3 commands — scripts stop after Phase 1",
      "max_score": 10
    }
  ]
}
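The first five checklist items describe a concrete shape for analyze-recent.sh, which can be sketched as below. This is a minimal illustration under stated assumptions, not the tile's actual script: the function names `find_scripts_path` and `analyze_recent` are hypothetical, the directory layout under `.tessl/tiles` is not specified here beyond the `-path` pattern quoted in the checklist, and `run_pipeline.py` may need arguments other than `--max-sessions`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of analyze-recent.sh. Only the -path pattern, the
# python3 invocation style, and the --max-sessions flag come from the
# checklist; everything else is an assumption made for illustration.

# Print the directory that holds run_pipeline.py, searching both the
# project-local and the user-level tile directories. Prints nothing if
# the script is not found.
find_scripts_path() {
  local match
  match=$(find "$(pwd)/.tessl/tiles" "$HOME/.tessl/tiles" \
    -path '*/agent-quality/skills/analyze-sessions/scripts/run_pipeline.py' \
    -print 2>/dev/null | head -n 1)
  # Strip the filename so only the scripts directory remains.
  if [ -n "$match" ]; then
    dirname "$match"
  fi
}

# Run the pipeline on a small number of recent sessions, then stop.
analyze_recent() {
  local SCRIPTS_PATH
  SCRIPTS_PATH=$(find_scripts_path)
  # Bail out before invoking python3 if the lookup came back empty.
  if [ -z "$SCRIPTS_PATH" ]; then
    echo "run_pipeline.py not found under .tessl/tiles" >&2
    return 1
  fi
  # 2 rather than 1, to account for the current session.
  python3 "$SCRIPTS_PATH/run_pipeline.py" --max-sessions 2
}
```

Calling `analyze_recent` as the last line of the script runs the pipeline once and exits, which also keeps it in line with the "No auto Phase 2 progression" item.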

Checklist file: evals/scenario-2/criteria.json