Name: tessl-labs/eval-improve
Rating: 0.944 (1 review)
Author: tessl-labs


Analyze eval results, diagnose low-scoring criteria, fix tile content, and re-run evals: the full improvement loop, automated.

Quality: 89% (Does it follow best practices?)

Impact: 98% (1.30x average score across 7 eval scenarios)

Security: Passed (no known issues)

evals/scenario-2/task.md

You have eval results for a tile. The user wants you to analyze them.

The user says: "My eval scores look bad. Can you look at the latest results and tell me what needs to be fixed and in what priority order?"

Analyze the eval results by running the appropriate commands, classify each criterion into the correct bucket, and present a clear summary to the user.

Tile contents:

evals/
  scenario-1/
  scenario-2/
    rubric.json
    task.md
  scenario-3/
  scenario-4/
  scenario-5/
  scenario-6/
  scenario-7/
skills/
README.md
tile.json
