
tessl-labs/eval-setup

Generate eval scenarios from repo commits, configure multi-agent runs, execute baseline and with-context evals, and compare the results: the full setup pipeline that runs before improvement begins.
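The baseline-versus-with-context comparison above reduces to a per-scenario difference. Below is a minimal sketch, assuming each scenario ends up with one numeric score per run. `ScenarioResult`, `context_delta`, and `mean_delta` are hypothetical names, not part of the Tessl CLI, and the scores in the demo are made-up illustrative values.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    # Hypothetical record: one scenario scored in both runs.
    name: str
    baseline: float      # score with no context files injected
    with_context: float  # score with context files injected


def context_delta(results: list[ScenarioResult]) -> dict[str, float]:
    """Return with_context - baseline for each scenario.

    A positive delta suggests the context files helped the agent;
    a negative delta suggests they hurt.
    """
    return {r.name: r.with_context - r.baseline for r in results}


def mean_delta(results: list[ScenarioResult]) -> float:
    """Average delta across scenarios (0.0 when there are none)."""
    deltas = context_delta(results)
    return sum(deltas.values()) / len(deltas) if deltas else 0.0


if __name__ == "__main__":
    # Illustrative scores only, not real eval output.
    results = [
        ScenarioResult("scenario-1", baseline=0.5, with_context=0.75),
        ScenarioResult("scenario-2", baseline=0.75, with_context=0.5),
    ]
    print(context_delta(results))
    print(mean_delta(results))
```

Reporting the per-scenario deltas alongside the mean matters: a mean near zero can hide context files that help some scenarios and hurt others.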

Overall score: 90% (validation of skill structure: does it follow best practices?)


evals/scenario-1/rubric.json

{
  "context": "Testing whether an agent following the eval-setup skill correctly guides a new user through the full eval setup pipeline, including prerequisites, commit browsing, context file detection, scenario generation, and running the eval.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "checks_prerequisites",
      "description": "The agent verifies that the user is logged in (e.g., runs `tessl whoami`) before proceeding with setup steps.",
      "max_score": 1
    },
    {
      "name": "browses_commits",
      "description": "The agent runs `tessl repo select-commits acme/backend` to show actual commits from the repo, rather than asking the user to supply commit hashes directly without any browsing step.",
      "max_score": 3
    },
    {
      "name": "auto_detects_context_files",
      "description": "The agent automatically searches the repository for context files (CLAUDE.md, *.mdc, AGENTS.md, tessl.json, etc.) rather than asking the user to name them without first investigating.",
      "max_score": 2
    },
    {
      "name": "uses_context_flag",
      "description": "The agent includes a `--context` flag when running `tessl scenario generate`, specifying appropriate glob patterns for the detected context files.",
      "max_score": 2
    },
    {
      "name": "workspace_in_eval_run",
      "description": "The agent includes `--workspace=<name>` when running `tessl eval run`. Omitting `--workspace` would cause the command to fail.",
      "max_score": 2
    },
    {
      "name": "explains_baseline_vs_context",
      "description": "The agent explains that each scenario runs twice — once without context files (baseline) and once with them injected — and that the delta shows whether CLAUDE.md is helping the agent.",
      "max_score": 2
    }
  ]
}
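To make the weights concrete: the six items above carry 1 + 3 + 2 + 2 + 2 + 2 = 12 points in total. Below is a minimal sketch of turning per-item grades into an overall fraction. The aggregation rule (clamp each award to `[0, max_score]`, then divide total awarded by total available) is an assumption for illustration, not a documented description of how Tessl scores a `weighted_checklist`. `score_weighted_checklist` is a hypothetical helper.

```python
import json

# The rubric above, trimmed to the fields this sketch uses.
RUBRIC_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "checks_prerequisites", "max_score": 1},
    {"name": "browses_commits", "max_score": 3},
    {"name": "auto_detects_context_files", "max_score": 2},
    {"name": "uses_context_flag", "max_score": 2},
    {"name": "workspace_in_eval_run", "max_score": 2},
    {"name": "explains_baseline_vs_context", "max_score": 2}
  ]
}
"""


def score_weighted_checklist(rubric: dict, awarded: dict[str, float]) -> float:
    """Return the fraction of available points earned (0.0 to 1.0).

    Assumed rule: each award is clamped to [0, max_score], items with no
    grade earn 0, and the result is total awarded / total max_score.
    """
    if rubric.get("type") != "weighted_checklist":
        raise ValueError("expected a weighted_checklist rubric")
    total_max = 0.0
    total_awarded = 0.0
    for item in rubric["checklist"]:
        max_score = item["max_score"]
        got = awarded.get(item["name"], 0.0)
        total_awarded += min(max(got, 0.0), max_score)
        total_max += max_score
    return total_awarded / total_max if total_max else 0.0


if __name__ == "__main__":
    rubric = json.loads(RUBRIC_JSON)
    # Illustrative grades: every item met except workspace_in_eval_run.
    awarded = {item["name"]: item["max_score"] for item in rubric["checklist"]}
    awarded["workspace_in_eval_run"] = 0
    print(score_weighted_checklist(rubric, awarded))
```

Under this rule, missing the 3-point `browses_commits` item costs three times as much as missing the 1-point `checks_prerequisites` item, which is the point of weighting the checklist.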
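The `auto_detects_context_files` item expects the agent to look for context files itself rather than ask the user. Below is a minimal sketch of that search using `pathlib`. The pattern list comes from the rubric's examples and is not exhaustive (the rubric says "etc."). Skipping `.git` and `node_modules` is an added assumption. `find_context_files` is a hypothetical helper. The rubric does not show the exact syntax `tessl scenario generate --context` expects for its patterns, so the sketch stops at listing the matches.

```python
import tempfile
from pathlib import Path

# Hypothetical pattern list built from the rubric's examples
# (CLAUDE.md, *.mdc, AGENTS.md, tessl.json); real repos may use others.
CONTEXT_PATTERNS = ["CLAUDE.md", "AGENTS.md", "tessl.json", "*.mdc"]
SKIP_DIRS = {".git", "node_modules"}  # assumption: never sources of context


def find_context_files(repo_root: str) -> list[str]:
    """Return sorted, repo-relative POSIX paths of likely context files."""
    root = Path(repo_root)
    found = set()
    for pattern in CONTEXT_PATTERNS:
        # rglob matches the pattern in root and in every subdirectory.
        for path in root.rglob(pattern):
            rel = path.relative_to(root)
            if path.is_file() and not SKIP_DIRS.intersection(rel.parts):
                found.add(rel.as_posix())
    return sorted(found)


if __name__ == "__main__":
    # Build a throwaway repo layout to show what gets picked up.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "CLAUDE.md").write_text("# project notes\n")
        (repo / ".cursor" / "rules").mkdir(parents=True)
        (repo / ".cursor" / "rules" / "style.mdc").write_text("rules\n")
        (repo / "src").mkdir()
        (repo / "src" / "app.py").write_text("print('hi')\n")
        print(find_context_files(tmp))
```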

Install with Tessl CLI

npx tessl i tessl-labs/eval-setup

evals/
  scenario-1/
    rubric.json
    task.md
README.md
tile.json