Analyze agent sessions against verifier checklists, detect friction points, and create structured verifiers from skills and docs. Produces per-session verdicts and aggregated quality reports.
Read this when debugging or understanding how the analysis pipeline works internally.
- verifiers/ directory installed in .tessl/tiles/ or ~/.tessl/tiles/
- claude CLI installed and authenticated (judges are dispatched via claude -p --model haiku)

If no verifiers are found, suggest the user create some:
No verifiers found in any installed tiles. Let's create some — which skills or rules would you like to check agent behavior against?
Options:
- A specific skill — pick an installed tile and I'll extract verifiers from its SKILL.md
- Project rules — extract from your CLAUDE.md, AGENTS.md, or .cursor/rules/
- Something specific — tell me what you want to track and I'll create verifiers for it
To create verifiers, I'll use the create-verifiers skill included in this tile.
Then activate the create-verifiers skill to walk the user through verifier creation. New verifier tiles should be created outside .tessl/ (e.g. in tiles/) and installed as a local file source:
tessl tile new --name <workspace>/my-verifiers --path tiles/my-verifiers --workspace <workspace>
# ... create verifiers with the create-verifiers skill ...
tessl install file:tiles/my-verifiers --watch-local

Verifiers are discovered in all verifiers/ directories (root, skill subdirectories, anywhere in the tile), and claude -p is dispatched to evaluate each session against each verifier. With --friction, steps 5-6 also run friction reviewers in parallel.
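Dispatching one judge can be pictured as a plain subprocess call. A minimal sketch, assuming the prompt is delivered on stdin; the flags are the ones this document names (claude -p --model haiku), but dispatch_judges.py's real invocation may differ:

```python
import subprocess

def run_judge(prompt: str, cmd=("claude", "-p", "--model", "haiku")) -> str:
    """Send one judge prompt to the model CLI and return its reply.

    Sketch only: the default flags mirror the ones named in this document;
    the actual dispatch_judges.py invocation may differ.
    """
    result = subprocess.run(
        list(cmd),
        input=prompt,        # prompt delivered on stdin
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Because the judge is a one-shot `-p` (print-mode) call with no tool access, each session/verifier pair can be dispatched independently and in parallel.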
All raw log content is passed through redact_secrets() in normalize_logs.py before any further processing. This strips API keys, bearer tokens, AWS credentials, Stripe keys, and other common secret patterns. Prepared transcripts and judge inputs only ever see redacted content.
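The redaction step amounts to a set of regex substitutions applied before anything else sees the text. A minimal sketch; the pattern list here is illustrative, not the actual patterns in normalize_logs.py:

```python
import re

# Illustrative patterns only -- the real redact_secrets() in
# normalize_logs.py covers more credential formats.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),            # OpenAI-style API keys
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]{16,}"),  # bearer tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),               # AWS access key IDs
    re.compile(r"sk_live_[A-Za-z0-9]{16,}"),       # Stripe live keys
]

def redact_secrets(text: str) -> str:
    """Replace anything matching a known secret pattern with [REDACTED]."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Running redaction at the raw-log boundary means prepared transcripts, judge prompts, and all downstream artifacts only ever contain the redacted form.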
Session transcripts contain untrusted content — tool outputs, web page text, user messages, and other data from prior agent sessions. This content is passed to LLM judges for evaluation, creating a theoretical indirect prompt injection surface. Mitigations:
- review_session.py wraps transcripts in <transcript> tags and explicitly instructs the judge that the content is data to evaluate, not instructions to follow, and to ignore any embedded instructions or prompt overrides.
- Judges run via claude -p with no tool access. Even if an injection influenced the judge's reasoning, it cannot escalate beyond the verdict output.
- (--sessions filter)
- claude -p --model haiku (called by dispatch_judges.py)

All analysis data is stored in ~/.tessl/session-analyses/<project-slug>/:
~/.tessl/session-analyses/-Users-amy-dev-myproject/
├── raw/ # Collected raw logs
│ ├── claude-code/
│ ├── codex/
│ ├── gemini/
│ ├── cursor-ide/
│ └── cursor-agent/
├── normalized/ # NormalizedEvent JSONL
│ └── <agent>/
├── runs/
│ └── <timestamp>/ # Each analysis run
│ ├── manifest.json # What was analyzed (tiles, verifiers, agents, sessions)
│ ├── prepared/ # Condensed transcripts
│ │ └── <agent>/
│ ├── verdicts/ # Per-session judge verdicts
│ │ └── <tile>/ # Namespaced by tile
│ │ └── <agent>/
│ ├── verdicts-aggregate.json
│ ├── friction/ # Per-session friction reviews (with --friction)
│ │ └── <agent>/
│ ├── friction-summary.json # Aggregated friction data
│ └── synthesis.json # Combined verifier + friction findings
└── latest -> runs/<timestamp>/ # Symlink to most recent run

The project slug uses the same dash-encoding as tessl's agent-logs cache (e.g. /Users/amy/dev/myproject -> -Users-amy-dev-myproject).
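Going by the example above, the dash-encoding looks like a plain slash-to-dash substitution on the absolute path. A sketch under that assumption (the real encoding may handle trailing slashes or dashes in path components differently):

```python
def project_slug(project_dir: str) -> str:
    # Assumption: plain slash-to-dash substitution, matching the example
    # /Users/amy/dev/myproject -> -Users-amy-dev-myproject; the real
    # encoding may treat edge cases differently.
    return project_dir.replace("/", "-")

print(project_slug("/Users/amy/dev/myproject"))  # -Users-amy-dev-myproject
```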
When --project-dir is passed multiple paths (e.g. worktrees or separate checkouts), each path gets its own analysis directory with independent raw/, normalized/, and runs/<timestamp>/ trees. The merge scripts (merge_verdicts.py, merge_friction.py) accept multiple --dir values and aggregate verdicts from all of them. The aggregated output (verdicts-aggregate.json, friction-summary.json, synthesis.json) is written to the primary (first) path's run directory.
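The multi-directory aggregation step can be sketched as follows. This is a hypothetical reimplementation, assuming each analysis directory holds verdicts/<tile>/<agent>/*.json files with a boolean "passed" field; the real merge_verdicts.py may use a different schema and flags:

```python
import json
from pathlib import Path

def merge_verdict_dirs(dirs):
    """Aggregate per-session verdicts from several analysis directories.

    Hypothetical sketch: assumes verdict JSON files under each directory's
    verdicts/ tree, each carrying a boolean "passed" field.
    """
    merged = []
    for d in dirs:
        for path in sorted(Path(d).glob("verdicts/**/*.json")):
            merged.append(json.loads(path.read_text()))
    passed = sum(1 for v in merged if v.get("passed"))
    return {
        "sessions": len(merged),
        "passed": passed,
        "pass_rate": passed / max(len(merged), 1),
    }
```

In the real pipeline the aggregated result is then written to the first path's run directory, which is where verdicts-aggregate.json, friction-summary.json, and synthesis.json land.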