try-tessl/agent-quality

Analyze agent sessions against verifier checklists, detect friction points, and create structured verifiers from skills and docs. Produces per-session verdicts and aggregated quality reports.


Verdict Schema

Per-Session Verdict

One JSON file per session, written to verdicts/<agent>/<session>.verdict.json:

{
  "session_file": "normalized/claude-code/session-abc.jsonl",
  "agent": "claude-code",
  "instructions": [
    {
      "file": "use-tailwind-for-styling.json",
      "instruction": "Use Tailwind CSS for all styling",
      "tile": "anthropics/frontend-design",
      "relevant": true,
      "checks": [
        {
          "name": "tailwind-classes-used",
          "applicable": true,
          "passed": true,
          "confidence": "high",
          "evidence": "Turn 12: wrote className='flex items-center gap-4' in Card.tsx"
        },
        {
          "name": "no-inline-styles",
          "applicable": true,
          "passed": false,
          "confidence": "high",
          "evidence": "Turn 18: used style={{ marginTop: 8 }} in Header.tsx"
        }
      ]
    },
    {
      "file": "run-tests-after-changes.json",
      "instruction": "Run tests after making code changes",
      "tile": "amyh/project-rules",
      "relevant": true,
      "checks": [
        {
          "name": "tests-run-after-edit",
          "applicable": true,
          "passed": true,
          "confidence": "medium",
          "evidence": "Turn 25: ran 'bun run test' after editing api.ts at turn 22"
        }
      ]
    }
  ],
  "_meta": {
    "model": "claude-haiku-4-5-20251001",
    "started_at": "2026-03-11T14:30:00Z",
    "completed_at": "2026-03-11T14:30:04Z",
    "duration_ms": 4200,
    "input_tokens": 12500,
    "output_tokens": 1800,
    "token_source": "api",
    "transcript_chars": 44059,
    "checks_count": 3
  }
}

Field Definitions

Top-level

| Field | Type | Description |
|---|---|---|
| session_file | string | Path to normalized session JSONL (relative to analysis dir) |
| agent | string | Agent name: claude-code, codex, gemini, cursor-ide, cursor-agent |
| instructions | array | One entry per instruction evaluated |
| _meta | object | Cost and timing metadata from the judge |

Instruction Entry

| Field | Type | Description |
|---|---|---|
| file | string | Verifier filename (e.g. use-tailwind-for-styling.json) |
| instruction | string | The instruction text |
| tile | string | Source tile identifier |
| relevant | bool | Whether this instruction is relevant to the session |
| checks | array | One entry per checklist item (empty when relevant is false) |

Check Entry

| Field | Type | Description |
|---|---|---|
| name | string | Checklist item name (from verifier JSON) |
| applicable | bool | Whether this check's relevant_when applies to the session |
| passed | bool \| null | Whether the agent followed the rule. null when applicable is false |
| confidence | "high" \| "medium" \| "low" | How clear the evidence is |
| evidence | string | Short sentence citing specific turns |

Metadata (_meta)

| Field | Type | Description |
|---|---|---|
| model | string | Model ID used for judging |
| started_at | string | ISO timestamp when judge started |
| completed_at | string | ISO timestamp when judge completed |
| duration_ms | int | Wall-clock time in milliseconds |
| input_tokens | int \| null | Input tokens consumed |
| output_tokens | int \| null | Output tokens consumed |
| token_source | string | "api", "estimated", or "unavailable" |
| transcript_chars | int | Size of session transcript in characters |
| checks_count | int | Total checklist items evaluated |
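
The field tables above can be mirrored as static types. The following is a minimal sketch using Python's `typing.TypedDict`; the class names (`Check`, `Instruction`, `Meta`, `Verdict`) and the `failed_checks` helper are illustrative assumptions, not part of the schema or of any shipped tooling.

```python
from typing import List, Literal, Optional, TypedDict


class Check(TypedDict):
    name: str
    applicable: bool
    passed: Optional[bool]  # None (JSON null) when applicable is False
    confidence: Literal["high", "medium", "low"]
    evidence: str


class Instruction(TypedDict):
    file: str
    instruction: str
    tile: str
    relevant: bool
    checks: List[Check]  # empty when relevant is False


class Meta(TypedDict):
    model: str
    started_at: str
    completed_at: str
    duration_ms: int
    input_tokens: Optional[int]
    output_tokens: Optional[int]
    token_source: Literal["api", "estimated", "unavailable"]
    transcript_chars: int
    checks_count: int


class Verdict(TypedDict):
    session_file: str
    agent: str
    instructions: List[Instruction]
    _meta: Meta


def failed_checks(verdict: Verdict) -> List[str]:
    """Hypothetical helper: list "file:check-name" for each applicable check that failed."""
    return [
        f"{ins['file']}:{chk['name']}"
        for ins in verdict["instructions"]
        for chk in ins["checks"]
        if chk["applicable"] and chk["passed"] is False
    ]
```

Applied to the example verdict in this document, `failed_checks` would report only the `no-inline-styles` check. TypedDict annotations are checked by static type checkers, not at runtime, so a verdict loaded with `json.load` still needs separate validation.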

Rules

  • passed MUST be null when applicable is false
  • checks SHOULD be empty [] when relevant is false
  • confidence SHOULD be "low" when the transcript was truncated or the evidence is ambiguous
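
The first two rules can be checked mechanically. Below is a minimal sketch of such a check; the function name `rule_violations` and the message wording are assumptions made for illustration. The third rule depends on how the transcript was processed, so it cannot be verified from the verdict file alone and is not covered here.

```python
from typing import Any, Dict, List

CONFIDENCE_LEVELS = {"high", "medium", "low"}


def rule_violations(verdict: Dict[str, Any]) -> List[str]:
    """Return human-readable messages for each rule the verdict breaks."""
    problems: List[str] = []
    for ins in verdict["instructions"]:
        checks = ins["checks"]
        # checks SHOULD be empty [] when relevant is false
        if not ins["relevant"] and checks:
            problems.append(f"{ins['file']}: checks should be [] when relevant is false")
        for chk in checks:
            label = f"{ins['file']}:{chk['name']}"
            # passed MUST be null when applicable is false
            if not chk["applicable"] and chk["passed"] is not None:
                problems.append(f"{label}: passed must be null when applicable is false")
            # confidence is restricted to the three documented levels
            if chk["confidence"] not in CONFIDENCE_LEVELS:
                problems.append(f"{label}: unknown confidence {chk['confidence']!r}")
    return problems
```

An empty list means no violations were found. A caller could treat the MUST rule as a hard failure and the SHOULD rule as a warning; this sketch reports both the same way.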

Aggregated Verdicts

verdicts-aggregate.json rolls up the per-session verdicts across all sessions:

{
  "timestamp": "2026-03-11T14:35:00Z",
  "sessions_count": 15,
  "tiles": {
    "anthropics/frontend-design": {
      "instructions": {
        "use-tailwind-for-styling.json": {
          "instruction": "Use Tailwind CSS for all styling",
          "checks": {
            "tailwind-classes-used": {
              "applicable_count": 10,
              "passed_count": 9,
              "pass_rate": 0.90,
              "confidence_breakdown": { "high": 8, "medium": 2, "low": 0 }
            },
            "no-inline-styles": {
              "applicable_count": 10,
              "passed_count": 8,
              "pass_rate": 0.80,
              "confidence_breakdown": { "high": 7, "medium": 2, "low": 1 }
            }
          }
        }
      },
      "overall_pass_rate": 0.85
    }
  },
  "cost": {
    "total_input_tokens": 187500,
    "total_output_tokens": 27000,
    "estimated_cost_usd": 0.12
  }
}
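
The per-check statistics could be derived from the per-session verdicts roughly as follows. This is a sketch under stated assumptions: `pass_rate` is taken to be `passed_count / applicable_count` (which matches the example values, e.g. 9 / 10 = 0.90), `confidence_breakdown` is taken to count applicable checks only, and the function name `aggregate_checks` is hypothetical. For brevity it returns a flat dict keyed by `(tile, file, check name)` rather than the nested layout shown above, and it does not compute `overall_pass_rate` or `cost`.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (tile, verifier file, check name)


def aggregate_checks(verdicts: Iterable[Dict[str, Any]]) -> Dict[Key, Dict[str, Any]]:
    """Roll per-session check results up into per-check counts and pass rates."""
    applicable: Counter = Counter()
    passed: Counter = Counter()
    confidence: Dict[Key, Counter] = {}

    for verdict in verdicts:
        for ins in verdict["instructions"]:
            if not ins["relevant"]:
                continue
            for chk in ins["checks"]:
                if not chk["applicable"]:
                    continue  # non-applicable checks carry passed = null
                key = (ins["tile"], ins["file"], chk["name"])
                applicable[key] += 1
                if chk["passed"]:
                    passed[key] += 1
                confidence.setdefault(key, Counter())[chk["confidence"]] += 1

    return {
        key: {
            "applicable_count": applicable[key],
            "passed_count": passed[key],
            # Assumption: pass rate is passed / applicable, rounded to 2 places
            "pass_rate": round(passed[key] / applicable[key], 2),
            "confidence_breakdown": {
                level: confidence[key][level] for level in ("high", "medium", "low")
            },
        }
        for key in applicable
    }
```

Only keys with at least one applicable check appear in the result, so the division never sees a zero denominator.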
