General-purpose coding policy for Baruch's AI agents
91
93%
Does it follow best practices?
Impact
91%
1.15xAverage score across 12 eval scenarios
Advisory
Suggest reviewing before use
A team has built a Tessl skill called deploy that automates a deployment workflow. The skill has three decision points:
staging or production. Staging deploys proceed automatically; production deploys require a manual approval flag.The team generated eval scenarios and currently has two:
The team asks you to review the eval coverage, identify what's missing, and write new scenario directories to fill the gaps. Each new scenario needs a task.md and criteria.json following the standard weighted checklist format.
coverage-analysis.md listing the gaps you identified and why each mattersevals/scenario-c/, evals/scenario-d/) containing task.md and criteria.jsonFocus on decision branches that aren't covered by the existing scenarios. Do not recreate scenarios A or B.