
tessl-labs/eval-setup

Generate eval scenarios from repo commits, configure multi-agent runs, execute baseline and with-context evals, and compare results: the full setup pipeline that runs before improvement begins.

Overall score: 90%

Does it follow best practices?

Validation for skill structure


Evaluation results

100%

84%

Scenario 1

Criteria                       Without context   With context
checks_prerequisites           0%                100%
browses_commits                0%                100%
auto_detects_context_files     0%                100%
uses_context_flag              0%                100%
workspace_in_eval_run          0%                100%
explains_baseline_vs_context   100%              100%

Without context: $0.7838 · 2m 19s · 32 turns · 2,830 in / 6,441 out tokens

With context: $0.4344 · 1m 58s · 18 turns · 324 in / 5,729 out tokens

100%

50%

Scenario 2

Criteria                    Without context   With context
does_not_use_last_only      33%               100%
finds_generation_ids        100%              100%
downloads_each_separately   66%               100%
explains_why                0%                100%

Without context: $0.4949 · 1m 31s · 26 turns · 1,820 in / 4,649 out tokens

With context: $0.4564 · 1m 48s · 19 turns · 64 in / 5,333 out tokens
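Each run summary above packs cost, wall-clock time, turn count, and token usage into a single line. The following is a minimal Python sketch that parses those lines and reports the relative change from the baseline run to the with-context run. It assumes the `$cost · Xm Ys · N turns · A in / B out tokens` layout seen in the four lines on this page holds for every run; that layout is inferred from this page, not taken from a documented Tessl format. `RunSummary`, `parse_summary`, and `pct_change` are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass

# Assumption: the layout is inferred from the summary lines on this page, e.g.
# "$0.7838 · 2m 19s · 32 turns · 2,830 in / 6,441 out tokens"
SUMMARY_RE = re.compile(
    r"\$(?P<cost>[\d.]+) · (?P<min>\d+)m (?P<sec>\d+)s · (?P<turns>\d+) turns"
    r" · (?P<tin>[\d,]+) in / (?P<tout>[\d,]+) out tokens"
)


@dataclass
class RunSummary:
    cost_usd: float
    seconds: int
    turns: int
    tokens_in: int
    tokens_out: int


def parse_summary(line: str) -> RunSummary:
    """Parse one run-summary line into numeric fields."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    return RunSummary(
        cost_usd=float(m["cost"]),
        seconds=int(m["min"]) * 60 + int(m["sec"]),
        turns=int(m["turns"]),
        # Token counts use thousands separators, so strip commas first.
        tokens_in=int(m["tin"].replace(",", "")),
        tokens_out=int(m["tout"].replace(",", "")),
    )


def pct_change(before: float, after: float) -> float:
    """Relative change from the baseline value to the with-context value, in percent."""
    return (after - before) / before * 100


# (baseline line, with-context line) per scenario, copied from this page.
SCENARIOS = {
    "Scenario 1": (
        "$0.7838 · 2m 19s · 32 turns · 2,830 in / 6,441 out tokens",
        "$0.4344 · 1m 58s · 18 turns · 324 in / 5,729 out tokens",
    ),
    "Scenario 2": (
        "$0.4949 · 1m 31s · 26 turns · 1,820 in / 4,649 out tokens",
        "$0.4564 · 1m 48s · 19 turns · 64 in / 5,333 out tokens",
    ),
}

if __name__ == "__main__":
    for name, (baseline_line, context_line) in SCENARIOS.items():
        base = parse_summary(baseline_line)
        ctx = parse_summary(context_line)
        print(
            f"{name}: cost {pct_change(base.cost_usd, ctx.cost_usd):+.1f}%, "
            f"time {pct_change(base.seconds, ctx.seconds):+.1f}%, "
            f"turns {pct_change(base.turns, ctx.turns):+.1f}%"
        )
```

Applied to the four lines above, the comparison shows that the with-context run was cheaper and used fewer turns in both scenarios. It was also faster in Scenario 1, but in Scenario 2 it took longer in wall-clock time (1m 48s vs. 1m 31s).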

Install with Tessl CLI:

npx tessl i tessl-labs/eval-setup

Evaluated agent: Claude Code
