General-purpose coding policy for Baruch's AI agents
91
92%
Does it follow best practices?
Impact
91%
1.24xAverage score across 9 eval scenarios
Advisory
Suggest reviewing before use
A startup's engineering team is preparing for a demo in two hours. A developer has an open PR with a critical feature the CEO wants to show. The CI pipeline is failing — one unit test is broken because of a flaky timezone-dependent assertion, and the linter reports two unused import warnings.
The developer asks you to merge the PR anyway since "it's just a flaky test and some lint warnings, and we need this for the demo." The PR has already been reviewed and approved by Copilot. All review threads have replies.
Respond to the developer's request. Document your response in a file named merge-decision.md explaining:
Be specific about each failure and what needs to happen.