jbaruch/coding-policy

General-purpose coding policy for Baruch's AI agents


evals/scenario-18/task.md

Eval Curation — Curate the Suite

Problem Description

You're running a curation pass over an eval suite for a tile. The most recent tessl eval run produced per-scenario lift numbers, summarized below. The tile's owner wants a curation summary before the next publish.

Output Specification

Write a file named curation-summary.md in the working directory. The file's content depends on the suite's state:

  • If any scenarios need curation, list each one with its identified cause (from the tile's three-cause framework), the recommended action, and the reasoning behind it.
  • If no scenarios need curation, write a one-line summary stating that no curation is needed.

Do not fabricate diagnoses for scenarios that don't need them.
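The two output branches above can be sketched as follows. This is a minimal illustration, not the tile's own tooling: the `Finding` type, the function names, the bullet layout, and the exact wording of the one-line summary are all assumptions. The causes and actions themselves must come from the tile's three-cause framework, which this sketch does not encode.

```python
from pathlib import Path
from typing import NamedTuple


class Finding(NamedTuple):
    """One scenario that needs curation (hypothetical structure)."""
    scenario: str
    cause: str       # taken from the tile's three-cause framework
    action: str      # recommended action
    reasoning: str


def render_summary(findings: list[Finding]) -> str:
    """Return the file body: one entry per finding, or a one-line summary."""
    if not findings:
        # Assumed wording; the task only requires a one-line statement.
        return "No curation needed.\n"
    lines = []
    for f in findings:
        lines.append(f"- {f.scenario}")
        lines.append(f"  - Cause: {f.cause}")
        lines.append(f"  - Recommended action: {f.action}")
        lines.append(f"  - Reasoning: {f.reasoning}")
    return "\n".join(lines) + "\n"


def write_summary(findings: list[Finding], directory: str = ".") -> Path:
    """Write curation-summary.md into the given working directory."""
    path = Path(directory) / "curation-summary.md"
    path.write_text(render_summary(findings), encoding="utf-8")
    return path
```

Keeping the rendering separate from the file write makes the empty case easy to check: an empty list of findings yields exactly one line and no per-scenario diagnoses.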

Per-Scenario Lift Summary

Lift values are means across 3 runs.

| Scenario | with-context | baseline | lift |
| --- | --- | --- | --- |
| merge-with-canonical-flag | 96 | 41 | +55 |
| reply-with-fixed-in-template | 92 | 35 | +57 |
| discover-bot-id-via-graphql | 88 | 38 | +50 |
| compose-pr-body-with-author-model-line | 90 | 47 | +43 |
| chain-poll-then-merge-after-green | 94 | 51 | +43 |
| refuse-publish-with-uncommitted-changes | 100 | 100 | 0 |
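
In every row above, the lift equals the with-context score minus the baseline score. The sketch below recomputes the column under that assumption; the `ScenarioResult` type and the helper names are illustrative and not part of tessl. It only reproduces the arithmetic and makes no judgment about which scenarios need curation.

```python
from typing import NamedTuple


class ScenarioResult(NamedTuple):
    """Mean scores for one scenario (values copied from the summary table)."""
    name: str
    with_context: int
    baseline: int


RESULTS = [
    ScenarioResult("merge-with-canonical-flag", 96, 41),
    ScenarioResult("reply-with-fixed-in-template", 92, 35),
    ScenarioResult("discover-bot-id-via-graphql", 88, 38),
    ScenarioResult("compose-pr-body-with-author-model-line", 90, 47),
    ScenarioResult("chain-poll-then-merge-after-green", 94, 51),
    ScenarioResult("refuse-publish-with-uncommitted-changes", 100, 100),
]


def lift(result: ScenarioResult) -> int:
    """Assumed definition: with-context score minus baseline score."""
    return result.with_context - result.baseline


def format_lift(value: int) -> str:
    """Render lift the way the table does: signed when nonzero, bare 0 otherwise."""
    return f"{value:+d}" if value else "0"


if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.name}: {format_lift(lift(r))}")
```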