jbaruch/coding-policy

General-purpose coding policy for Baruch's AI agents

1.24x

Quality

90%

Does it follow best practices?

Impact

97%

1.24x

Average score across 14 eval scenarios

Securityby

Passed

No known issues

Eval Curation — Diagnose a Near-Zero-Lift Scenario

Name: jbaruch/coding-policy
Rating: 96.4 (1 reviews)
Author: jbaruch

Problem Description

You're running a curation pass over an eval suite for a tile. The most recent tessl eval run produced per-scenario lift numbers, and one positive-case scenario is sitting at near-zero lift after multiple runs. The tile's owner needs a diagnosis report and a recommended action before the next publish.

The scenario under review is below. The lift signal across three runs: with-context 92, baseline 90.8 (lift +1.2).

Output Specification

Write a file named diagnosis.md in the working directory containing:

Cause identification — name the cause of the near-zero lift (use the canonical name from the tile's eval-curation guidance).
Recommended action — one of retire, fix-task, rewrite-criteria.
Reasoning — one or two sentences citing why the cause is what you identified, and why the chosen action is the one prescribed for that cause.

Do not edit the scenario itself in this task — your output is the diagnosis report only.

Input Files

=============== FILE: evals/scenario-imperative-mood/task.md ===============

Write a Git Commit Message

You've just added a function that validates email addresses against RFC 5322. Author a single git commit that captures this change. Produce the commit message body only (no git commit invocation).

=============== FILE: evals/scenario-imperative-mood/criteria.json =============== { "context": "Tests whether the agent produces a commit message in imperative mood, per the commit-conventions tile.", "type": "weighted_checklist", "checklist": [ { "name": "imperative mood", "max_score": 60, "description": "The first line of the commit message uses imperative mood (e.g., 'Add', 'Validate', 'Implement') rather than past tense ('Added', 'Validated') or present participle ('Adding', 'Validating')" }, { "name": "subject line under 72 chars", "max_score": 40, "description": "The subject line is 72 characters or fewer" } ] }

rules

README.md

tile.json