Collect and normalize agent logs, discover installed verifiers, and dispatch LLM judges to evaluate adherence. Produces per-session verdicts and aggregated reports.
91
90%
Does it follow best practices?
Impact
96%
3.09xAverage score across 3 eval scenarios
Passed
No known issues
Read this when presenting audit results to the user.
Focus on what happened and what to fix. Do NOT use percentages or aggregate stats — this is one session. Group checks by tile so the user knows where each rule comes from.
When everything passes:
## Session Check — claude-code/abc123
✅ All checks passed — your agent followed the rules.
### elevenlabs/text-to-speech — ✅ all passing
✓ activated skill when needed
✓ no-hardcoded-key, correct-header, correct-js-package, valid-setting-ranges
### anthropics/frontend-design — ✅ all passing
✓ activated skill when needed
✓ use-tailwind-for-styling, no-inline-styles, dev-server-before-screenshot, ts-extensions
Everything looks good. Want to check more sessions or add verifiers for other behaviors?When there are failures:
## Session Check — claude-code/abc123
### elevenlabs/text-to-speech — ✅ all passing
✓ activated skill when needed
✓ no-hardcoded-key, correct-header, correct-js-package, valid-setting-ranges
### amyh/frontend-design — ⚠ 2 issues
✗ skill not activated
✓ tailwind-classes-used, no-css-modules, ts-extensions
1. **no-inline-styles** — FAILED
Rule: "Use Tailwind utility classes, never inline styles"
What happened: Agent used style={{}} on 3 components (turns 12, 18, 24)
→ Fix: Replace inline styles with Tailwind classes in these files
2. **dev-server-before-screenshot** — FAILED
Rule: "Start dev server before taking screenshots"
What happened: Agent ran screenshot tool at turn 31 without starting dev server
→ Fix: Run `bun dev` and retake the screenshotAlways explain what the rule says vs what actually happened. Offer to fix each issue.
When friction is enabled, add after the verifier results:
### Friction Points
**amyh/frontend-design**
Preventable (skill has the answer, agent didn't follow):
- **repeated_failure** (turns 15-22, major) — Agent tried 3x to screenshot without dev server
→ dev-server-before-screenshot verifier also failed. Skill says to do this.
Introduced (skill instructions caused the problem):
- **wrong_approach** (turn 8, moderate) — Skill says use CSS modules but project uses Tailwind
→ Fix: update the skill's styling instructions
**Unrelated**
- **tool_misuse** (turn 32, minor) — Wrong file path in Read tool, resolved in 1 turnStart with a tile-level overview (when 2+ tiles), then per-tile details. Lead with what's working.
## Cross-Session Analysis (12 sessions, last 7 days)
### Overview
✅ elevenlabs/text-to-speech — activated (1/3 sessions), all checks passing
⚠ amyh/frontend-design — never activated, 2 recurring issues
➖ amyh/webapp-testing — not relevant to recent sessions
### elevenlabs/text-to-speech — ✅ all passing
✓ no-hardcoded-key — 100% (4/4 applicable)
✓ correct-header — 100% (4/4)
✓ correct-js-package — 100% (2/2)
✓ valid-setting-ranges — 100% (6/6)
### amyh/frontend-design — ⚠ 2 recurring issues
✗ never activated
Consistently passing:
✓ tailwind-classes-used — 95% (10/10 applicable)
✓ ts-extensions — 100% (12/12)
✓ no-css-modules — 100% (8/8)
Improving:
✓ no-inline-styles — 80% now (was 60% last week) — trending up
Recurring issues:
1. **dev-server-before-screenshot** — 40% pass rate (2/5 applicable sessions)
Pattern: Agent consistently skips starting dev server before screenshots
→ This tile is installed locally — review its SKILL.md to clarify the
instruction, or add to AGENTS.md as a backup.
2. **color-tokens-only** — 60% pass rate (3/5 applicable sessions)
Pattern: Agent uses raw hex values instead of design tokens
→ Run `tessl skill review tiles/frontend-design --optimize` to
tighten the instruction wording.When friction is enabled, extend the tile sections:
### amyh/frontend-design — ⚠ 2 failing checks + friction
Adherence: 75% pass rate (dev-server-before-screenshot: 40%)
Friction breakdown (4 events across 3 sessions):
Preventable × 2 — agent not following existing instructions
Introduced × 1 — skill's styling instruction conflicts with project setup
Adjacent × 1 — responsive layout struggles not covered by skill
→ Priority: fix the introduced friction first (wrong instruction),
then strengthen activation for preventable issues