Generate eval scenarios from repo commits, configure multi-agent runs, execute baseline and with-context evals, and compare the results: the full setup pipeline that precedes improvement work.
Overall score: 90%
Does it follow best practices?
Validation for skill structure
| Check | Without context | With context |
| --- | --- | --- |
| checks_prerequisites | 0% | 100% |
| browses_commits | 0% | 100% |
| auto_detects_context_files | 0% | 100% |
| uses_context_flag | 0% | 100% |
| workspace_in_eval_run | 0% | 100% |
| explains_baseline_vs_context | 100% | 100% |
Without context: $0.7838 · 2m 19s · 32 turns · 2,830 in / 6,441 out tokens
With context: $0.4344 · 1m 58s · 18 turns · 324 in / 5,729 out tokens
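The run stats above can be compared directly. As a minimal sketch (the figures are copied from the published results above; the dictionary keys and `pct_saved` helper are illustrative, not part of any Tessl API), the relative savings work out like this:

```python
# Baseline (without context) vs with-context run stats, copied from the
# eval results above for the "Validation for skill structure" scenario.
baseline = {"cost_usd": 0.7838, "turns": 32, "tokens_in": 2830, "tokens_out": 6441}
with_context = {"cost_usd": 0.4344, "turns": 18, "tokens_in": 324, "tokens_out": 5729}

def pct_saved(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`, one decimal place."""
    return round(100 * (before - after) / before, 1)

cost_saved = pct_saved(baseline["cost_usd"], with_context["cost_usd"])
turns_saved = pct_saved(baseline["turns"], with_context["turns"])
print(f"cost saved:  {cost_saved}%")   # ~44.6% cheaper with context
print(f"turns saved: {turns_saved}%")  # ~43.8% fewer agent turns
```

So for this scenario, running with context roughly halves both cost and the number of agent turns while also lifting most checks from 0% to 100%.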
| Check | Without context | With context |
| --- | --- | --- |
| does_not_use_last_only | 33% | 100% |
| finds_generation_ids | 100% | 100% |
| downloads_each_separately | 66% | 100% |
| explains_why | 0% | 100% |
Without context: $0.4949 · 1m 31s · 26 turns · 1,820 in / 4,649 out tokens
With context: $0.4564 · 1m 48s · 19 turns · 64 in / 5,333 out tokens
Install with Tessl CLI
npx tessl i tessl-labs/eval-setup