Generate eval scenarios from repo commits, configure multi-agent runs, execute baseline and with-context evals, and compare the results: the full setup pipeline before improvement work begins.
- Does it follow best practices?
- Validation for skill structure
- Loading evals
Install with the Tessl CLI:

```shell
npx tessl i experiments/eval-setup@0.3.1
```