tessl-labs/audit-logs

Collect and normalize agent logs, discover installed verifiers, and dispatch LLM judges to evaluate adherence. Produces per-session verdicts and aggregated reports.

91 (3.09x)

Quality: 90% (Does it follow best practices?)

Impact: 96%, 3.09x (average score across 3 eval scenarios)

Security (by Snyk): Passed, no known issues


evals/scenario-3/criteria.json

{
  "context": "Tests whether the agent correctly structures verifier files for a locally-sourced tile: placing them inside the skill directory, using kebab-case naming, omitting the sources field, creating an activation verifier, and writing binary checklist items that decompose multi-part instructions.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Verifiers in skill dir",
      "description": "All verifier JSON files are placed inside the skill directory (e.g. tiles/data-pipeline/skills/ingest/verifiers/) — NOT at the tile root, NOT in .tessl/",
      "max_score": 12
    },
    {
      "name": "Flat verifiers dir",
      "description": "The verifiers/ directory contains only JSON files directly — no subdirectories inside it",
      "max_score": 8
    },
    {
      "name": "Kebab-case file names",
      "description": "All verifier JSON filenames use kebab-case slugs (e.g. use-pandas-for-loading.json) — no spaces, underscores, or uppercase",
      "max_score": 8
    },
    {
      "name": "Sources field omitted",
      "description": "No verifier file includes a sources field (sources are implied by the skill directory location)",
      "max_score": 12
    },
    {
      "name": "Activation verifier",
      "description": "At least one verifier file is specifically about whether the ingest skill was activated/loaded by the agent",
      "max_score": 12
    },
    {
      "name": "Binary checklist rules",
      "description": "Every checklist item rule is a binary yes/no check — no subjective or graded language like 'properly', 'clearly', 'well-structured'",
      "max_score": 10
    },
    {
      "name": "Multi-part decomposition",
      "description": "Instructions with multiple independent requirements are split into separate checklist items (e.g. log source field and log row_count field are separate checks, not combined into one)",
      "max_score": 10
    },
    {
      "name": "Specific decision-point relevant_when",
      "description": "At least 3 verifier files have relevant_when fields describing a specific decision point (e.g. 'when writing a new data loading function') rather than a broad activity (e.g. 'when working on the pipeline')",
      "max_score": 8
    },
    {
      "name": "Context fields present",
      "description": "Every verifier JSON file includes a non-empty context field",
      "max_score": 8
    },
    {
      "name": "Prohibition verifier present",
      "description": "At least one verifier covers a prohibition from the skill (e.g. do not use polars/dask, do not skip validation, do not use Python logging module)",
      "max_score": 12
    }
  ]
}
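Several of the checklist items above are mechanical enough to check with a script before an LLM judge is involved. The following is a minimal sketch, not part of the tile itself: the function name `check_verifiers_dir` and the result keys are hypothetical, and it assumes each verifier file is a single JSON object. It covers only four items (flat directory, kebab-case file names, omitted `sources`, non-empty `context`); the remaining items need a reader's judgment.

```python
import json
import re
import tempfile
from pathlib import Path

# Lowercase alphanumeric words joined by single hyphens, ending in .json.
KEBAB_JSON = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*\.json$")


def check_verifiers_dir(verifiers_dir: Path) -> dict[str, bool]:
    """Hypothetical helper: run four mechanical checks on a verifiers/ directory."""
    entries = list(verifiers_dir.iterdir())
    json_files = [p for p in entries if p.is_file() and p.suffix == ".json"]
    # Assumption: every verifier file parses as one JSON object.
    docs = [json.loads(p.read_text(encoding="utf-8")) for p in json_files]
    return {
        # Only JSON files directly inside the directory, no subdirectories.
        "flat_verifiers_dir": all(
            p.is_file() and p.suffix == ".json" for p in entries
        ),
        # No spaces, underscores, or uppercase in file names.
        "kebab_case_file_names": all(
            KEBAB_JSON.match(p.name) is not None for p in json_files
        ),
        # No verifier declares a sources field.
        "sources_field_omitted": all(
            isinstance(d, dict) and "sources" not in d for d in docs
        ),
        # Every verifier has a non-empty string context field.
        "context_fields_present": all(
            isinstance(d, dict)
            and isinstance(d.get("context"), str)
            and d["context"].strip() != ""
            for d in docs
        ),
    }


if __name__ == "__main__":
    # Build a throwaway directory with one well-formed verifier file.
    with tempfile.TemporaryDirectory() as tmp:
        verifiers = Path(tmp)
        (verifiers / "use-pandas-for-loading.json").write_text(
            json.dumps({"context": "Data loading in the ingest skill"}),
            encoding="utf-8",
        )
        print(check_verifiers_dir(verifiers))
```

Each key maps to one checklist item and yields a plain yes/no result, which mirrors the binary style the criteria ask of verifier rules.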
