jbaruch/coding-policy

General-purpose coding policy for Baruch's AI agents

Quality: 91% (Does it follow best practices?)
Impact: 96%, 1.31x (Average score across 10 eval scenarios)
Security: Advisory (Suggest reviewing before use)

Eval Coverage Gap Analysis

Name: jbaruch/coding-policy
Rating: 95.56 (1 review)
Author: jbaruch

Problem/Feature Description

A team has built a Tessl skill called deploy that automates a deployment workflow. The skill picks different paths based on the target environment, runs post-deploy verification before declaring success, and emits notifications about the outcome.

Their existing eval suite has two scenarios, both happy paths: one deploys to staging and reports success, the other deploys to production with an approval flag set and also reports success. The team suspects coverage is thin but isn't sure exactly what's missing.

They ask you to audit the eval coverage, identify what the existing two scenarios don't exercise, and write new scenarios that fill those gaps.
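One way to start the audit is to treat each behavior of the skill as a dimension and list which combinations the two existing scenarios never touch. The sketch below is only an illustration: the dimension names and values (`environment`, `approval_flag`, `verification`) are assumptions inferred from the description above, as is the guess that the staging scenario runs without the approval flag. The real skill may branch on other inputs, and notification behavior would need its own dimension once its possible outcomes are known.

```python
from itertools import product

# Hypothetical dimensions inferred from the task description; the real
# deploy skill may branch on different or additional inputs.
DIMENSIONS = {
    "environment": ["staging", "production"],
    "approval_flag": [True, False],
    "verification": ["pass", "fail"],
}

# The two existing happy-path scenarios expressed as points in that space.
# Assumption: the staging scenario does not set the approval flag.
COVERED = [
    {"environment": "staging", "approval_flag": False, "verification": "pass"},
    {"environment": "production", "approval_flag": True, "verification": "pass"},
]


def uncovered_combinations(dimensions, covered):
    """Return every combination of dimension values that no scenario exercises."""
    names = list(dimensions)
    seen = {tuple(scenario[name] for name in names) for scenario in covered}
    return [
        dict(zip(names, combo))
        for combo in product(*(dimensions[name] for name in names))
        if combo not in seen
    ]


def unexercised_values(dimensions, covered):
    """Return, per dimension, the values that appear in no scenario at all."""
    return {
        name: [v for v in values if all(s[name] != v for s in covered)]
        for name, values in dimensions.items()
    }


if __name__ == "__main__":
    for gap in uncovered_combinations(DIMENSIONS, COVERED):
        print(gap)
    print(unexercised_values(DIMENSIONS, COVERED))
```

Values that no scenario exercises at all (here, a failing verification) are usually the first gaps worth writing up. Full combinations such as production without approval come next, and each should be justified in the written analysis rather than added mechanically.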

Output Specification

Produce a written coverage analysis describing the gaps you identified and why each one matters. Pick an appropriate filename.
For each gap, create a new scenario directory under evals/. The file layout and criteria format should follow this tile's conventions for eval scenarios — consult the tile's own rules and existing scenarios rather than guessing.

evals/scenario-4/task.md
