Collect and normalize agent logs, discover installed verifiers, and dispatch LLM judges to evaluate adherence. Produces per-session verdicts and aggregated reports.
A verifier is a JSON file containing one instruction and its checklist of things to evaluate. Each instruction produces one file. Judge agents use these to score agent sessions.
Verifiers live in verifiers/ directories inside a tile. For tiles with skills, place verifiers inside the skill directory. For verifier-only tiles, place them at the tile root.
Tile with skills (default):

    my-tile/
      tile.json
      skills/
        my-skill/
          SKILL.md
          verifiers/
            use-tailwind-for-styling.json
            run-tests-after-changes.json

Verifier-only tile:

    my-tile/
      tile.json
      docs/
        overview.md
      verifiers/
        prefer-bun-over-npm.json

The audit pipeline discovers verifiers anywhere in the tile tree: at the root, in skill subdirectories, or in any other location with a verifiers/ directory.
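
The discovery rule above can be sketched in a few lines of Python. This is only an illustration, not the audit pipeline's actual code: the function name `discover_verifiers` is hypothetical, and it assumes that every `.json` file directly inside a `verifiers/` directory counts as a verifier.

```python
from pathlib import Path


def discover_verifiers(tile_root):
    """Return every *.json file that sits directly inside a verifiers/ directory.

    Hypothetical helper: the real pipeline's discovery logic is not shown in
    this document, so this only mirrors the rule as described.
    """
    root = Path(tile_root)
    found = []
    # rglob walks the whole tree, so nesting depth does not matter.
    for directory in root.rglob("verifiers"):
        if directory.is_dir():
            found.extend(directory.glob("*.json"))
    return sorted(found)
```

Calling `discover_verifiers("my-tile")` on either layout above would pick up the root-level and the skill-level verifier files alike.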
File names should be short kebab-case slugs derived from the instruction (e.g. use-tailwind-for-styling.json).
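
As a rough aid for the naming convention, the sketch below builds a candidate slug from an instruction and checks whether a file name is kebab-case. Both helper names are hypothetical. The candidate keeps every word, so an author would usually shorten it by hand (for instance, dropping words to arrive at use-tailwind-for-styling.json).

```python
import re

# Lowercase alphanumeric words joined by single hyphens, ending in .json.
KEBAB_JSON = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*\.json")


def candidate_slug(instruction):
    """Turn an instruction into a long kebab-case file name to shorten by hand."""
    words = re.findall(r"[a-z0-9]+", instruction.lower())
    return "-".join(words) + ".json"


def is_kebab_slug(filename):
    """Return True when the file name is a kebab-case slug with a .json suffix."""
    return KEBAB_JSON.fullmatch(filename) is not None
```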
{
"instruction": "Use Tailwind CSS for all styling",
"relevant_when": "Agent is writing or modifying frontend React components",
"context": "The project uses Tailwind v4 with the Vite plugin. Inline styles and CSS modules should be avoided. Tailwind classes go in className attributes on JSX elements.",
"sources": [
{
"type": "file",
"filename": "skills/frontend-design/SKILL.md",
"tile": "anthropics/frontend-design@1.2.0",
"line_no": 42
}
],
"checklist": [
{
"name": "tailwind-classes-used",
"rule": "Agent uses Tailwind utility classes (className='...') when writing JSX/TSX components",
"relevant_when": "Agent writes or modifies React component files"
},
{
"name": "no-inline-styles",
"rule": "Agent does not use inline style objects or style={{ }} attributes",
"relevant_when": "Agent writes or modifies React component files"
},
{
"name": "no-css-modules",
"rule": "Agent does not create or import .module.css files",
"relevant_when": "Agent creates new style files or imports for components"
}
]
}

| Field | Type | Required | Description |
|---|---|---|---|
| instruction | string | yes | The rule from the source material, stated positively and specifically |
| relevant_when | string | yes | When this instruction applies at the session level. If the session has nothing to do with this scenario, the judge skips the entire instruction |
| context | string | yes | 2-3 sentences of background: definitions, applicability, edge cases. Helps the judge understand intent without reading the full source |
| sources | array | no | Where this instruction came from (see Sources below). Omit when the verifier is embedded inside a skill directory, since the source is implied by its location. Include only for root-level or standalone verifiers. |
| checklist | array | yes | One or more checks to evaluate (see Checklist below) |
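
The top-level field rules in this table could be checked mechanically before a verifier is handed to a judge. The following is a minimal sketch under that assumption; `check_top_level` is a hypothetical helper, not part of any published tooling, and it takes an already-parsed JSON object (a Python dict).

```python
REQUIRED_STRINGS = ("instruction", "relevant_when", "context")


def check_top_level(verifier):
    """Return a list of problems with the top-level fields (empty list means OK)."""
    problems = []
    for field in REQUIRED_STRINGS:
        value = verifier.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field}: required non-empty string")
    checklist = verifier.get("checklist")
    if not isinstance(checklist, list) or not checklist:
        problems.append("checklist: required array with at least one item")
    # sources is optional, but must be an array when it is present.
    if "sources" in verifier and not isinstance(verifier["sources"], list):
        problems.append("sources: must be an array when present")
    return problems
```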
Each source identifies where the instruction was found:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | yes | "file" (from a document) or "user" (stated by user directly) |
| filename | string | when type=file | Path to source file, relative to tile root |
| tile | string | no | Tile identifier if instruction came from an installed tile (e.g. "anthropics/frontend-design@1.2.0") |
| line_no | int | no | Line number in the source file |
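
The conditional requirement on filename is easy to miss, so here is a small sketch of how a source entry might be checked. `check_source` is a hypothetical name, and the error strings are illustrative only.

```python
def check_source(source):
    """Return a list of problems with one sources entry (empty list means OK)."""
    problems = []
    kind = source.get("type")
    if kind not in ("file", "user"):
        problems.append('type: must be "file" or "user"')
    # filename is only required for file-based sources.
    if kind == "file" and not isinstance(source.get("filename"), str):
        problems.append("filename: required when type is file")
    if "tile" in source and not isinstance(source["tile"], str):
        problems.append("tile: must be a string")
    line_no = source.get("line_no")
    # bool is a subclass of int in Python, so reject it explicitly.
    if line_no is not None and (isinstance(line_no, bool) or not isinstance(line_no, int)):
        problems.append("line_no: must be an integer")
    return problems
```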
Each checklist item is one binary check the judge evaluates:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Short identifier, 1-4 words, kebab-case. Must be unique within the file |
| rule | string | yes | What the agent should or shouldn't do. Binary and specific: a judge must be able to answer yes/no |
| relevant_when | string | yes | When this specific check applies. Can be narrower than the instruction-level relevant_when |
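
The naming and uniqueness constraints on checklist items can be expressed as a short check. This is a sketch under the rules stated in the table; `check_checklist` is a hypothetical helper, and whether a rule is truly binary still needs human judgment.

```python
import re

# One to four lowercase alphanumeric words joined by hyphens.
NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+){0,3}")


def check_checklist(checklist):
    """Return a list of problems with the checklist items (empty list means OK)."""
    problems = []
    seen = set()
    for index, item in enumerate(checklist):
        name = item.get("name")
        if not isinstance(name, str) or not NAME.fullmatch(name):
            problems.append(f"item {index}: name must be 1-4 kebab-case words")
        elif name in seen:
            problems.append(f"item {index}: duplicate name {name!r}")
        else:
            seen.add(name)
        for field in ("rule", "relevant_when"):
            value = item.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: {field} is required")
    return problems
```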
| Avoid (needs interpretation) | Use instead (binary) |
|---|---|
| "Properly handles errors" | "Uses try/catch around external API calls" |
| "Follows the import style" | "Local import paths use .ts extension, not .js" |
| "Good commit messages" | "Commit message contains more than 5 words" |
| "Creative layout" | "Uses at least ONE of: asymmetric grid, overlapping elements, rotated content" |
Split when an instruction contains:
Each instruction should have 1-5 checklist items. A typical instruction has 1-3. If you're writing more than 5, you may be over-decomposing — consider whether some checks are really testing the same thing.
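
A linter could surface the size guideline as a warning rather than a hard failure. The sketch below assumes that policy; the function name and message strings are hypothetical.

```python
def checklist_size_note(checklist):
    """Classify a checklist by length: empty is an error, over 5 is a warning."""
    count = len(checklist)
    if count == 0:
        return "error: at least one check is required"
    if count > 5:
        return "warning: more than 5 checks, possibly over-decomposed"
    return "ok"
```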
{
"instruction": "Always run collect_logs.py before normalize_logs.py",
"relevant_when": "Agent is running the log normalization pipeline",
"context": "The normalization script expects raw logs to exist in the raw/ directory. Running it without collecting first will produce empty output or errors.",
"sources": [
{
"type": "file",
"filename": "skills/audit-skill/SKILL.md",
"line_no": 85
}
],
"checklist": [
{
"name": "collect-before-normalize",
"rule": "Agent runs collect_logs.py before running normalize_logs.py in the same session",
"relevant_when": "Agent runs normalize_logs.py"
}
]
}

{
"instruction": "Use .ts extensions in local imports, not .js",
"relevant_when": "Agent is writing or editing TypeScript files with local imports",
"context": "The project uses modern TypeScript with native .ts resolution. Using .js extensions in imports is a legacy pattern that should be avoided.",
"sources": [
{
"type": "file",
"filename": "CLAUDE.md",
"line_no": 28
}
],
"checklist": [
{
"name": "uses-ts-extension",
"rule": "Local import paths in written or edited TypeScript files end in .ts",
"relevant_when": "Agent writes or edits files containing local imports"
},
{
"name": "no-js-extension",
"rule": "No local import paths in written or edited TypeScript files end in .js",
"relevant_when": "Agent writes or edits files containing local imports"
}
]
}

{
"instruction": "Always use bun, never npm or yarn",
"relevant_when": "Agent runs package management commands",
"context": "User preference for bun as the package manager. This applies to install, add, remove, and run commands.",
"sources": [
{
"type": "user"
}
],
"checklist": [
{
"name": "uses-bun",
"rule": "Agent uses bun (not npm or yarn) for package management commands like install, add, remove, run",
"relevant_when": "Agent runs package management commands"
}
]
}
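
To illustrate how the two relevant_when levels might interact during judging, here is a minimal sketch. It is not the judge's actual implementation: the function `evaluate`, its parameters, and the verdict labels "skipped", "pass", and "fail" are all assumptions made for this example. The relevance and pass/fail decisions, which a judge would make by reading the session, are passed in as plain values.

```python
def evaluate(verifier, session_matches, check_matches, check_passes):
    """Map each check name to "skipped", "pass", or "fail".

    session_matches: whether the instruction-level relevant_when applies.
    check_matches / check_passes: dicts keyed by check name holding the
    judge's per-check decisions (hypothetical inputs for this sketch).
    """
    names = [item["name"] for item in verifier["checklist"]]
    # An irrelevant session skips the entire instruction.
    if not session_matches:
        return {name: "skipped" for name in names}
    verdicts = {}
    for name in names:
        # A check can be narrower than the instruction and still not apply.
        if not check_matches.get(name, False):
            verdicts[name] = "skipped"
        else:
            verdicts[name] = "pass" if check_passes.get(name, False) else "fail"
    return verdicts
```

For the bun verifier above, a session with no package management commands would yield `{"uses-bun": "skipped"}` under these assumptions.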