CtrlK
BlogDocsLog inGet started
Tessl Logo

try-tessl/agent-quality

Analyze agent sessions against verifier checklists, detect friction points, and create structured verifiers from skills and docs. Produces per-session verdicts and aggregated quality reports.

88

2.93x
Quality

86%

Does it follow best practices?

Impact

97%

2.93x

Average score across 3 eval scenarios

SecuritybySnyk

Passed

No known issues

Overview
Quality
Evals
Security
Files

pipeline-reference.mdskills/analyze-sessions/references/

Pipeline Reference

Read this when debugging or understanding how the analysis pipeline works internally.

Prerequisites

  • At least one tile with a verifiers/ directory installed in .tessl/tiles/ or ~/.tessl/tiles/
  • Agent logs to analyze (from Claude Code, Codex, Gemini, or Cursor)
  • claude CLI installed and authenticated (judges are dispatched via claude -p --model haiku)

If no verifiers are found, suggest the user create some:

No verifiers found in any installed tiles. Let's create some — which skills or rules would you like to check agent behavior against?

Options:

  1. A specific skill — pick an installed tile and I'll extract verifiers from its SKILL.md
  2. Project rules — extract from your CLAUDE.md, AGENTS.md, or .cursor/rules/
  3. Something specific — tell me what you want to track and I'll create verifiers for it

To create verifiers, I'll use the create-verifiers skill included in this tile.

Then activate the create-verifiers skill to walk the user through verifier creation. New verifier tiles should be created outside .tessl/ (e.g. in tiles/) and installed as a local file source:

tessl tile new --name <workspace>/my-verifiers --path tiles/my-verifiers --workspace <workspace>
# ... create verifiers with the create-verifiers skill ...
tessl install file:tiles/my-verifiers --watch-local

How the Pipeline Works

  1. Collects raw logs from coding agents on your machine
  2. Normalizes them to a common format (with automatic secret redaction — see Security below)
  3. Discovers verifiers in installed tiles (searches all verifiers/ directories — root, skill subdirectories, anywhere in the tile)
  4. Prepares session transcripts for LLM review
  5. Dispatches haiku judges via claude -p to evaluate each session against each verifier
  6. Merges verdicts into an aggregate summary

With --friction, steps 5-6 also run friction reviewers in parallel:

  • Dispatches separate haiku calls to detect friction (errors, backtracking, user frustration)
  • Merges friction results and synthesizes them with verifier data
  • Classifies each friction event by its relationship to installed tiles:
    • preventable — skill covers this, agent didn't follow
    • introduced — skill instructions caused the friction
    • adjacent — in the skill's domain, not covered by verifiers
    • unrelated — general agent/environment issues

Security

Secret Redaction

All raw log content is passed through redact_secrets() in normalize_logs.py before any further processing. This strips API keys, bearer tokens, AWS credentials, Stripe keys, and other common secret patterns. Prepared transcripts and judge inputs only ever see redacted content.

Indirect Prompt Injection

Session transcripts contain untrusted content — tool outputs, web page text, user messages, and other data from prior agent sessions. This content is passed to LLM judges for evaluation, creating a theoretical indirect prompt injection surface. Mitigations:

  1. Judge framingreview_session.py wraps transcripts in <transcript> tags and explicitly instructs the judge that the content is data to evaluate, not instructions to follow, and to ignore any embedded instructions or prompt overrides.
  2. Structured output — judges must return a specific JSON verdict schema. There is no mechanism for a judge to take actions, write files, or execute code.
  3. No tools — judges run via claude -p with no tool access. Even if an injection influenced the judge's reasoning, it cannot escalate beyond the verdict output.
  4. Limited blast radius — the worst case is a biased verdict (a check incorrectly marked as passed/failed). Verdicts are reviewed by the orchestrating agent and presented to the user, so anomalies are visible.

Scripts

Pipeline Orchestrator

  • run_pipeline.pyrun the full pipeline in a single command (calls all scripts below)

Collection & Normalization

Search & Discovery

Evaluation

Friction (parallel pipeline, enabled with --friction)

Analysis (tile-level)

Data Storage

All analysis data is stored in ~/.tessl/session-analyses/<project-slug>/:

~/.tessl/session-analyses/-Users-amy-dev-myproject/
├── raw/                          # Collected raw logs
│   ├── claude-code/
│   ├── codex/
│   ├── gemini/
│   ├── cursor-ide/
│   └── cursor-agent/
├── normalized/                   # NormalizedEvent JSONL
│   └── <agent>/
├── runs/
│   └── <timestamp>/             # Each analysis run
│       ├── manifest.json         # What was analyzed (tiles, verifiers, agents, sessions)
│       ├── prepared/             # Condensed transcripts
│       │   └── <agent>/
│       ├── verdicts/             # Per-session judge verdicts
│       │   └── <tile>/           # Namespaced by tile
│       │       └── <agent>/
│       ├── verdicts-aggregate.json
│       ├── friction/             # Per-session friction reviews (with --friction)
│       │   └── <agent>/
│       ├── friction-summary.json  # Aggregated friction data
│       └── synthesis.json         # Combined verifier + friction findings
└── latest -> runs/<timestamp>/   # Symlink to most recent run

The project slug uses the same dash-encoding as tessl's agent-logs cache (e.g. /Users/amy/dev/myproject -> -Users-amy-dev-myproject).

Multi-Path Analyses

When --project-dir is passed multiple paths (e.g. worktrees or separate checkouts), each path gets its own analysis directory with independent raw/, normalized/, and runs/<timestamp>/ trees. The merge scripts (merge_verdicts.py, merge_friction.py) accept multiple --dir values and aggregate verdicts from all of them. The aggregated output (verdicts-aggregate.json, friction-summary.json, synthesis.json) is written to the primary (first) path's run directory.

README.md

tile.json