try-tessl/agent-quality

Analyze agent sessions against verifier checklists, detect friction points, and create structured verifiers from skills and docs. Produces per-session verdicts and aggregated quality reports.

Quality: 86% (Does it follow best practices?)

Impact: 97% (2.93x average score across 3 eval scenarios)

Security: Passed (no known issues)

Verdict Schema

Name: try-tessl/agent-quality
Rating: 88.65 (1 review)
Author: try-tessl

Per-Session Verdict

One JSON file per session, written to `verdicts/<agent>/<session>.verdict.json`:

{
  "session_file": "normalized/claude-code/session-abc.jsonl",
  "agent": "claude-code",
  "instructions": [
    {
      "file": "use-tailwind-for-styling.json",
      "instruction": "Use Tailwind CSS for all styling",
      "tile": "anthropics/frontend-design",
      "relevant": true,
      "checks": [
        {
          "name": "tailwind-classes-used",
          "applicable": true,
          "passed": true,
          "confidence": "high",
          "evidence": "Turn 12: wrote className='flex items-center gap-4' in Card.tsx"
        },
        {
          "name": "no-inline-styles",
          "applicable": true,
          "passed": false,
          "confidence": "high",
          "evidence": "Turn 18: used style={{ marginTop: 8 }} in Header.tsx"
        }
      ]
    },
    {
      "file": "run-tests-after-changes.json",
      "instruction": "Run tests after making code changes",
      "tile": "amyh/project-rules",
      "relevant": true,
      "checks": [
        {
          "name": "tests-run-after-edit",
          "applicable": true,
          "passed": true,
          "confidence": "medium",
          "evidence": "Turn 25: ran 'bun run test' after editing api.ts at turn 22"
        }
      ]
    }
  ],
  "_meta": {
    "model": "claude-haiku-4-5-20251001",
    "started_at": "2026-03-11T14:30:00Z",
    "completed_at": "2026-03-11T14:30:04Z",
    "duration_ms": 4200,
    "input_tokens": 12500,
    "output_tokens": 1800,
    "token_source": "api",
    "transcript_chars": 44059,
    "checks_count": 3
  }
}
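A minimal Python sketch of where a verdict lands on disk. It assumes that the `<session>` part of the path is the session file's name without its `.jsonl` suffix, and that `verdicts/` sits under the analysis directory. `verdict_path` and `write_verdict` are hypothetical helper names, not part of the tile.

```python
import json
from pathlib import Path, PurePosixPath


def verdict_path(analysis_dir: Path, verdict: dict) -> Path:
    """Hypothetical helper: map a verdict to verdicts/<agent>/<session>.verdict.json."""
    # Assumption: <session> is the stem of session_file ("session-abc.jsonl" -> "session-abc").
    session = PurePosixPath(verdict["session_file"]).stem
    return analysis_dir / "verdicts" / verdict["agent"] / f"{session}.verdict.json"


def write_verdict(analysis_dir: Path, verdict: dict) -> Path:
    """Hypothetical helper: write one verdict as pretty-printed JSON and return its path."""
    path = verdict_path(analysis_dir, verdict)
    path.parent.mkdir(parents=True, exist_ok=True)  # create verdicts/<agent>/ on first use
    path.write_text(json.dumps(verdict, indent=2) + "\n", encoding="utf-8")
    return path
```

Under these assumptions, the example verdict resolves to `analysis/verdicts/claude-code/session-abc.verdict.json` when `analysis_dir` is `Path("analysis")`.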

Field Definitions

Top-level

| Field | Type | Description |
|---|---|---|
| `session_file` | string | Path to the normalized session JSONL file (relative to the analysis directory) |
| `agent` | string | Agent name: `claude-code`, `codex`, `gemini`, `cursor-ide`, or `cursor-agent` |
| `instructions` | array | One entry per evaluated instruction |
| `_meta` | object | Cost and timing metadata from the judge |

Instruction Entry

| Field | Type | Description |
|---|---|---|
| `file` | string | Verifier filename (e.g. `use-tailwind-for-styling.json`) |
| `instruction` | string | The instruction text |
| `tile` | string | Source tile identifier |
| `relevant` | bool | Whether this instruction is relevant to the session |
| `checks` | array | One entry per checklist item (empty when `relevant` is false) |

Check Entry

| Field | Type | Description |
|---|---|---|
| `name` | string | Checklist item name (from the verifier JSON) |
| `applicable` | bool | Whether this check's `relevant_when` condition applies to the session |
| `passed` | bool \| null | Whether the agent followed the rule; `null` when `applicable` is false |
| `confidence` | `"high"` \| `"medium"` \| `"low"` | How clear the evidence is |
| `evidence` | string | Short sentence citing specific turns |
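The three tables above map naturally onto Python `TypedDict` definitions, which can help type-check code that reads verdict files. This is a sketch. The class names and the `failed_checks` helper are hypothetical, and `_meta` is kept as a loose dict here.

```python
from typing import Any, Dict, List, Literal, Optional, TypedDict

Agent = Literal["claude-code", "codex", "gemini", "cursor-ide", "cursor-agent"]
Confidence = Literal["high", "medium", "low"]


class CheckEntry(TypedDict):
    name: str                # checklist item name from the verifier JSON
    applicable: bool         # did the item's relevant_when condition apply?
    passed: Optional[bool]   # None (JSON null) when applicable is False
    confidence: Confidence
    evidence: str            # short sentence citing specific turns


class InstructionEntry(TypedDict):
    file: str                # verifier filename
    instruction: str
    tile: str
    relevant: bool
    checks: List[CheckEntry]  # empty when relevant is False


class Verdict(TypedDict):
    session_file: str
    agent: Agent
    instructions: List[InstructionEntry]
    _meta: Dict[str, Any]    # left untyped in this sketch


def failed_checks(entry: InstructionEntry) -> List[str]:
    """Names of applicable checks the agent did not pass."""
    return [c["name"] for c in entry["checks"] if c["applicable"] and c["passed"] is False]
```

For the Tailwind instruction in the example verdict, `failed_checks` returns `["no-inline-styles"]`.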

Metadata (`_meta`)

| Field | Type | Description |
|---|---|---|
| `model` | string | Model ID used for judging |
| `started_at` | string | ISO 8601 timestamp when the judge started |
| `completed_at` | string | ISO 8601 timestamp when the judge completed |
| `duration_ms` | int | Wall-clock time in milliseconds |
| `input_tokens` | int \| null | Input tokens consumed |
| `output_tokens` | int \| null | Output tokens consumed |
| `token_source` | string | Origin of the token counts: `"api"`, `"estimated"`, or `"unavailable"` |
| `transcript_chars` | int | Size of the session transcript in characters |
| `checks_count` | int | Total number of checklist items evaluated |
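The next sketch assembles a `_meta` object from the fields in the table above. It makes three assumptions. First, the judge records timezone-aware start and end times. Second, the timestamps are written in UTC at second resolution, while `duration_ms` is taken from the full-precision clock. That split is one way to read the example's `14:30:00` to `14:30:04` span alongside its `4200` ms duration. Third, `token_source` is `"unavailable"` only when no counts exist at all. `build_meta` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_meta(model: str, started: datetime, completed: datetime,
               transcript: str, checks_count: int,
               input_tokens: Optional[int] = None,
               output_tokens: Optional[int] = None,
               estimated: bool = False) -> dict:
    """Hypothetical helper that assembles a `_meta` object.

    `started` and `completed` are expected to be timezone-aware datetimes.
    """
    def iso(ts: datetime) -> str:
        # UTC, second resolution, trailing "Z" as in the example verdict.
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Assumption: "unavailable" means no counts at all; "estimated" is flagged by the caller.
    if input_tokens is None and output_tokens is None:
        token_source = "unavailable"
    elif estimated:
        token_source = "estimated"
    else:
        token_source = "api"

    return {
        "model": model,
        "started_at": iso(started),
        "completed_at": iso(completed),
        # Exact integer milliseconds computed from the full-precision datetimes.
        "duration_ms": (completed - started) // timedelta(milliseconds=1),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "token_source": token_source,
        "transcript_chars": len(transcript),
        "checks_count": checks_count,
    }
```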

Rules

- `passed` MUST be `null` when `applicable` is `false`.
- `checks` SHOULD be an empty array (`[]`) when `relevant` is `false`.
- `confidence` should be `"low"` when the transcript was truncated or ambiguous.
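The first two rules can be checked mechanically on a parsed verdict. The third depends on the transcript itself, so it cannot be verified from the verdict alone. Below is a minimal validator sketch. `rule_violations` is a hypothetical name, and its confidence-value check comes from the Check Entry table rather than from the rules list.

```python
from typing import Any, Dict, List

VALID_CONFIDENCE = {"high", "medium", "low"}


def rule_violations(verdict: Dict[str, Any]) -> List[str]:
    """Return human-readable violations of the verdict rules (empty list if none)."""
    problems: List[str] = []
    for inst in verdict.get("instructions", []):
        label = inst.get("file", "<unknown>")
        checks = inst.get("checks", [])
        # SHOULD: an irrelevant instruction carries no checks.
        if inst.get("relevant") is False and checks:
            problems.append(f"{label}: checks should be [] when relevant is false")
        for check in checks:
            where = f"{label}/{check.get('name', '<unnamed>')}"
            # MUST: passed is null whenever the check does not apply.
            if check.get("applicable") is False and check.get("passed") is not None:
                problems.append(f"{where}: passed must be null when applicable is false")
            if check.get("confidence") not in VALID_CONFIDENCE:
                problems.append(f"{where}: unexpected confidence {check.get('confidence')!r}")
    return problems
```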

Aggregated Verdicts

`verdicts-aggregate.json` rolls up per-check results across all sessions:

{
  "timestamp": "2026-03-11T14:35:00Z",
  "sessions_count": 15,
  "tiles": {
    "anthropics/frontend-design": {
      "instructions": {
        "use-tailwind-for-styling.json": {
          "instruction": "Use Tailwind CSS for all styling",
          "checks": {
            "tailwind-classes-used": {
              "applicable_count": 10,
              "passed_count": 9,
              "pass_rate": 0.90,
              "confidence_breakdown": { "high": 8, "medium": 2, "low": 0 }
            },
            "no-inline-styles": {
              "applicable_count": 10,
              "passed_count": 8,
              "pass_rate": 0.80,
              "confidence_breakdown": { "high": 7, "medium": 2, "low": 1 }
            }
          }
        }
      },
      "overall_pass_rate": 0.82
    }
  },
  "cost": {
    "total_input_tokens": 187500,
    "total_output_tokens": 27000,
    "estimated_cost_usd": 0.12
  }
}
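Below is a minimal Python sketch of the roll-up. It assumes that only applicable checks are counted, which matches the example, where each `confidence_breakdown` sums to `applicable_count`. It also assumes that `pass_rate` is `passed_count / applicable_count` rounded to two places, and that `overall_pass_rate` pools every applicable check in a tile. The source does not spell out those last two points, so treat them as assumptions. `estimated_cost_usd` is omitted because it depends on per-model pricing not given here. `aggregate` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List


def aggregate(verdicts: Iterable[Dict[str, Any]], timestamp: str) -> Dict[str, Any]:
    """Hypothetical roll-up of per-session verdicts into the aggregate shape."""
    sessions: List[Dict[str, Any]] = list(verdicts)
    tiles: Dict[str, Any] = {}
    total_in = total_out = 0

    for verdict in sessions:
        meta = verdict.get("_meta", {})
        # Null token counts (token_source "unavailable") contribute nothing.
        total_in += meta.get("input_tokens") or 0
        total_out += meta.get("output_tokens") or 0
        for inst in verdict["instructions"]:
            tile = tiles.setdefault(inst["tile"], {"instructions": {}})
            entry = tile["instructions"].setdefault(
                inst["file"], {"instruction": inst["instruction"], "checks": {}})
            for check in inst["checks"]:
                if not check["applicable"]:
                    continue  # non-applicable checks are excluded from every count
                stats = entry["checks"].setdefault(check["name"], {
                    "applicable_count": 0, "passed_count": 0, "pass_rate": 0.0,
                    "confidence_breakdown": {"high": 0, "medium": 0, "low": 0}})
                stats["applicable_count"] += 1
                if check["passed"] is True:
                    stats["passed_count"] += 1
                stats["confidence_breakdown"][check["confidence"]] += 1

    for tile in tiles.values():
        applicable = passed = 0
        for entry in tile["instructions"].values():
            for stats in entry["checks"].values():
                stats["pass_rate"] = round(stats["passed_count"] / stats["applicable_count"], 2)
                applicable += stats["applicable_count"]
                passed += stats["passed_count"]
        # Assumption: the overall rate pools all applicable checks in the tile.
        tile["overall_pass_rate"] = round(passed / applicable, 2) if applicable else None

    return {
        "timestamp": timestamp,
        "sessions_count": len(sessions),
        "tiles": tiles,
        "cost": {"total_input_tokens": total_in, "total_output_tokens": total_out},
    }
```

Under the pooling assumption, the two checks in the example would give (9 + 8) / (10 + 10) = 0.85. The example's `overall_pass_rate` of 0.82 may reflect other instructions in that tile that are not shown, or a different definition.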