Collect and normalize agent logs, discover installed verifiers, and dispatch LLM judges to evaluate adherence. Produces per-session verdicts and aggregated reports.
Create verifiers — structured pass/fail checklists that track any aspect of agent behavior you care about. Verifiers can come from skills, project docs, or rules files (the sources are detailed below).
Each verifier captures one instruction with a checklist of binary checks that an LLM judge evaluates against session transcripts.
Verifier JSON files go in a directory called verifiers/. This directory is always flat (no nesting inside it). There are two valid locations:
Put the verifiers/ directory inside the skill directory the verifiers apply to:

```
target-tile/
  tile.json
  skills/
    frontend-design/
      SKILL.md
      verifiers/            # adjacent to SKILL.md
        use-tailwind-for-styling.json
        run-tests-after-changes.json
    webapp-testing/
      SKILL.md
      verifiers/            # each skill gets its own
        use-playwright.json
```

If verifiers apply to the tile generally (not a specific skill), put them at the tile root.
When creating a new tile just to hold verifiers, there are two approaches:
One tile per skill — cleaner traceability, put verifiers/ at the tile root:

```
tiles/frontend-design-verifiers/
  tile.json
  docs/
    overview.md
  verifiers/
    use-tailwind-for-styling.json
    run-tests-after-changes.json
```

One tile for multiple skills — all verifiers in a single verifiers/ directory, with a naming convention to trace each verifier back to the skill it applies to (e.g. prefix with the skill name):
```
tiles/my-project-verifiers/
  tile.json
  docs/
    overview.md
  verifiers/
    frontend-design--use-tailwind.json
    frontend-design--run-tests.json
    webapp-testing--use-playwright.json
```

Either approach works. One tile per skill is easier to manage if verifiers change at different rates. A single tile is simpler if there are only a few verifiers across skills.
See verifier-schema.md for the full JSON schema and examples.
Ask the user what to create verifiers from, or identify it from context. Sources can be:
- tile.json to find skills, then read each SKILL.md and its references
- CLAUDE.md, AGENTS.md, .cursor/rules/, or any file the user points to

Also identify the output target. This depends on how the tile is installed — check tessl.json to determine:
IMPORTANT: Never write verifiers into .tessl/tiles/ — that directory is tessl-managed and will be overwritten on install/update.
If the tile has "source": "file:..." in tessl.json — it's a local tile. The file path points to the editable source directory. Write verifiers there:

- verifiers/ inside each skill's directory (adjacent to SKILL.md), or at the tile root for general verifiers
- Changes sync automatically if --watch-local is active; otherwise re-run tessl install file:<path>

If the tile has only "version": (no file: source) — it's from the registry or git. The source is read-only. Create a new companion tile to hold verifiers:
```
# Create a new tile alongside the project (not in .tessl/)
tessl tile new --name <workspace>/<tile-name>-verifiers --path tiles/<tile-name>-verifiers --workspace <workspace>

# Install it so tessl tracks it
tessl install file:tiles/<tile-name>-verifiers --watch-local
```

Decide with the user: one companion tile per source tile, or one combined verifier tile with a naming convention prefix.
If the user specifies a different path — use that instead.
Verifier-only tiles: If the target tile will only contain verifiers (no skills, rules, or other content), it must also include a short docs/ file to pass tessl tile lint. Create docs/overview.md with a brief description of what the verifiers check and note that they can be applied with the audit-logs skill. Add "docs": "docs/overview.md" to tile.json. The overview must include a markdown link to every verifier JSON file (e.g. [Use uv for Python](../verifiers/use-uv-for-python.json)) so that all verifiers are discoverable from the docs.
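As a sketch, a minimal docs/overview.md for a verifier-only tile might look like the following (the tile description and verifier names are illustrative, not prescribed):

```markdown
# Frontend Design Verifiers

Verifiers that check adherence to the frontend-design skill's styling
and testing instructions. Apply them with the audit-logs skill.

## Verifiers

- [Use Tailwind for styling](../verifiers/use-tailwind-for-styling.json)
- [Run tests after changes](../verifiers/run-tests-after-changes.json)
```

Note that every verifier JSON file gets a markdown link, which is what makes it discoverable from the docs.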
Tiles with existing content: If the target tile already has skills, docs, or other content, every verifier must still be linked via a markdown link from somewhere reachable from tile.json (e.g. a docs file, SKILL.md, or a references file). Choose the most natural place — for example, a "Verifiers" section in the existing docs file, or a dedicated docs/verifiers.md if the existing docs are focused on something else. Use your judgement about where it fits best, but every verifier JSON must be linked.
Read extraction-guide.md for what to extract and field guidance.
Read all source material thoroughly:
- SKILL.md and all linked references (references/, scripts/)

For each source, identify every instruction that directs an agent to do or not do something specific.
For each instruction found, create a verifier JSON file with:
- instruction, relevant_when, context filled in
- sources filled in only if the verifier is at the tile root or in a standalone verifier-only tile. When verifiers are embedded inside a skill directory (e.g. skills/my-skill/verifiers/), omit sources — the source is implicitly the skill the verifier lives inside. This avoids sources drifting out of sync with the actual location.
- checklist: [] (empty — filled in Phase 2)

See verifier-schema.md for the schema.
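Assuming the fields above (check verifier-schema.md for the authoritative shape), a Phase 1 verifier file embedded inside a skill directory might look like this — the values are illustrative, and sources is omitted because the skill location implies it:

```json
{
  "instruction": "Use Tailwind CSS for all styling",
  "relevant_when": "the agent writes or edits frontend markup or styles",
  "context": "The frontend-design skill mandates Tailwind utility classes over ad-hoc CSS.",
  "checklist": []
}
```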
File naming: short kebab-case slug from the instruction (e.g. use-tailwind-for-styling.json).
For skill sources: also create an activation verifier if appropriate — "was the skill loaded?" See the activation section in extraction-guide.md.
For docs/rules sources: skip activation verifiers (docs are loaded automatically).
Do NOT pause here — proceed directly to filling out checklists.
For each instruction file, decompose into checklist items following extraction-guide.md.
Each checklist item needs:
- name — short identifier (1-4 words, kebab-case)
- rule — binary pass/fail check a judge can evaluate
- relevant_when — when this specific check applies

After filling out all checklists, run validation:
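For illustration (confirm exact field names against verifier-schema.md), the checklist field of a verifier file might decompose a styling instruction into checks like:

```json
"checklist": [
  {
    "name": "tailwind-classes-used",
    "rule": "New or edited components are styled with Tailwind utility classes rather than inline styles or custom CSS",
    "relevant_when": "the agent adds or changes component styling"
  },
  {
    "name": "no-new-css-files",
    "rule": "No new standalone .css files are introduced",
    "relevant_when": "the agent creates new frontend files"
  }
]
```

Each item is a binary judgment an LLM judge can make from a session transcript alone.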
```
uv run python3 scripts/validate_verifiers.py <verifiers-dir>
```

Fix any errors and re-run until clean.
Now pause and present the full set of verifiers for review. Ask the user to think critically about:
- Whether the relevant_when fields match the user's actual workflow. A rule might say "when writing React components" but the user's agents rarely do that.

Present the list clearly:
```
## Verifiers created — please review

Source: skills/frontend-design/SKILL.md (14 verifiers, 23 checklist items)
1. use-tailwind-for-styling.json — "Use Tailwind CSS for all styling" (3 checks)
2. run-dev-server-first.json — "Start dev server before screenshots" (1 check)
3. prefer-shadcn-components.json — "Use shadcn/ui over custom" (2 checks)
...

Source: CLAUDE.md (8 verifiers, 11 checklist items)
1. ts-extensions-in-imports.json — "Use .ts extensions in imports" (2 checks)
2. run-lint-after-changes.json — "Run bun lint after changes" (1 check)
...

Total: 22 verifiers, 34 checklist items.

Are these all worth tracking? Should any be removed or modified?
Think about: which of these actually matter for your workflow,
and whether the "relevant_when" conditions match when your agents
actually encounter these situations.
```

Wait for user confirmation. Remove or adjust verifiers based on feedback before proceeding.
Run tessl tile lint on the target tile to verify the tile structure is valid:
```
tessl tile lint <tile-path>
```

Fix any errors and re-run until clean.
If a new tile was created (not adding verifiers to an existing one), offer to install it so the audit pipeline can discover the verifiers:
```
tessl install file:<tile-path> --watch-local
```

--watch-local keeps the installed copy in sync as verifiers are added or edited. Without it, changes require re-running tessl install.
Present a final summary:
```
Source: skills/frontend-design/SKILL.md
  Instructions: 12 (2 removed during review)
  Checklist items: 19
  Coverage: 12/14 extracted (2 removed by user)

Total: 18 verifiers, 28 checklist items
```

| File | Read before |
|---|---|
| extraction-guide.md | Step 2: reading source material |
| verifier-schema.md | Step 3: creating verifier files |
| validate_verifiers.py | Step 4: after writing verifiers |
| tessl tile lint | Step 6: after validation passes |