jbaruch/coding-policy

General-purpose coding policy for Baruch's AI agents

evals/scenario-4/task.md

Eval Coverage Gap Analysis

Problem/Feature Description

A team has built a Tessl skill called deploy that automates a deployment workflow. The skill picks different paths based on the target environment, runs post-deploy verification before declaring success, and emits notifications about the outcome.

Their existing eval suite has two scenarios, both happy paths: one deploys to staging and reports success, the other deploys to production with an approval flag set and also reports success. The team suspects coverage is thin but isn't sure exactly what's missing.

They ask you to audit the eval coverage, identify what the existing two scenarios don't exercise, and write new scenarios that fill those gaps.
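One way to make the audit concrete is to list the behaviours the skill description mentions as dimensions and check which values the two existing scenarios actually exercise. The sketch below is only an illustration: the dimension names and values (`approval`, `verification`, `notification`, and so on) are assumptions inferred from the description above, not identifiers from the deploy skill or from this tile's conventions.

```python
# Hypothetical coverage dimensions inferred from the skill description.
# All names and values here are illustrative assumptions.
DIMENSIONS = {
    "environment": ["staging", "production"],
    "approval": ["set", "missing"],
    "verification": ["pass", "fail"],
    "notification": ["success", "failure"],
}

# The two existing happy-path scenarios. A scenario lists only the
# dimensions it is known to exercise; "reports success" is assumed to
# imply that verification passed and a success notification was emitted.
EXISTING = [
    {"environment": "staging", "verification": "pass", "notification": "success"},
    {
        "environment": "production",
        "approval": "set",
        "verification": "pass",
        "notification": "success",
    },
]


def unexercised(dimensions, scenarios):
    """Return, per dimension, the values that no scenario exercises."""
    gaps = {}
    for name, values in dimensions.items():
        seen = {s[name] for s in scenarios if name in s}
        missing = [v for v in values if v not in seen]
        if missing:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    for dimension, values in unexercised(DIMENSIONS, EXISTING).items():
        print(f"{dimension}: never exercised with {', '.join(values)}")
```

Under these assumptions, the environment dimension is fully exercised, while a missing approval flag, a failed verification, and a failure notification are never exercised. Each of those is a candidate gap to justify in the written analysis.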

Output Specification

  • Produce a written coverage analysis describing the gaps you identified and why each one matters. Pick an appropriate filename.
  • For each gap, create a new scenario directory under evals/. The file layout and criteria format should follow this tile's conventions for eval scenarios — consult the tile's own rules and existing scenarios rather than guessing.
