CtrlK
BlogDocsLog inGet started
Tessl Logo

tessl-labs/audit-logs

Collect and normalize agent logs, discover installed verifiers, and dispatch LLM judges to evaluate adherence. Produces per-session verdicts and aggregated reports.

91

3.09x
Quality

90%

Does it follow best practices?

Impact

96%

3.09x

Average score across 3 eval scenarios

SecuritybySnyk

Passed

No known issues

Overview
Quality
Evals
Security
Files

criteria.jsonevals/scenario-2/

{
  "context": "Tests whether the agent follows the audit-logs skill's scripting patterns: finding SCRIPTS_PATH using the correct find command pattern, invoking scripts with uv run python3, using --max-sessions to limit sessions, and using --search before auditing for targeted investigations.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "SCRIPTS_PATH find pattern",
      "description": "audit-recent.sh uses a find command to locate SCRIPTS_PATH that searches both $(pwd)/.tessl/tiles AND $HOME/.tessl/tiles (both paths must be present)",
      "max_score": 15
    },
    {
      "name": "Correct script path in find",
      "description": "The find command in SCRIPTS_PATH lookup searches for run_pipeline.py specifically (e.g. -path '*/audit-logs/scripts/run_pipeline.py' or similar), and strips the filename to get the directory",
      "max_score": 10
    },
    {
      "name": "uv run python3 invocation",
      "description": "Both scripts invoke pipeline scripts using `uv run python3 \"$SCRIPTS_PATH/...\"` — not bare `python3`, `python`, or `$SCRIPTS_PATH/script.py` directly",
      "max_score": 15
    },
    {
      "name": "SCRIPTS_PATH empty check",
      "description": "audit-recent.sh checks whether SCRIPTS_PATH is empty (e.g. if [ -z \"$SCRIPTS_PATH\" ]) before attempting to run the pipeline",
      "max_score": 10
    },
    {
      "name": "max-sessions flag",
      "description": "audit-recent.sh passes --max-sessions to run_pipeline.py (value of 2 or similar small number to account for the current session)",
      "max_score": 15
    },
    {
      "name": "search-before-audit pattern",
      "description": "audit-search.sh uses --search (or a search script) to find matching sessions before running the pipeline on specific sessions",
      "max_score": 15
    },
    {
      "name": "Small batch for search results",
      "description": "audit-search.sh audits only a small number of search matches (2-3 sessions), not all results",
      "max_score": 10
    },
    {
      "name": "No auto Phase 2 progression",
      "description": "Neither script automatically invokes Phase 2 or Phase 3 commands — scripts stop after Phase 1",
      "max_score": 10
    }
  ]
}

evals

tile.json