Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing.
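A minimal sketch of how the three ideas can fit together. It assumes that "eval-first" means a model is accepted for a task only after it passes that task's eval cases. It assumes that "decomposition" means splitting work into subtasks that are each routed on their own. It assumes that "cost-aware routing" means choosing the cheapest model that clears the eval threshold. The names `Model`, `pass_rate`, `route`, and `route_subtasks`, the flat per-call cost, and the toy models are all hypothetical stand-ins, not a real model API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical model identifier
    cost_per_call: float       # assumed flat cost per call, arbitrary units
    run: Callable[[str], str]  # stand-in for a real model API call


# An eval case pairs an input prompt with a checker for the model's output.
EvalCase = Tuple[str, Callable[[str], bool]]


def pass_rate(model: Model, cases: List[EvalCase]) -> float:
    """Fraction of eval cases whose output passes its checker (0.0 if none)."""
    if not cases:
        return 0.0
    passed = sum(1 for prompt, check in cases if check(model.run(prompt)))
    return passed / len(cases)


def route(models: List[Model], cases: List[EvalCase], threshold: float) -> Optional[Model]:
    """Eval-first, cost-aware: try models cheapest first, return the first that meets the threshold."""
    for model in sorted(models, key=lambda m: m.cost_per_call):
        if pass_rate(model, cases) >= threshold:
            return model
    return None


def route_subtasks(models: List[Model], subtasks: Dict[str, List[EvalCase]],
                   threshold: float) -> Dict[str, Optional[str]]:
    """Decomposition: route each subtask independently against its own eval cases."""
    assignments: Dict[str, Optional[str]] = {}
    for name, cases in subtasks.items():
        chosen = route(models, cases, threshold)
        assignments[name] = chosen.name if chosen else None
    return assignments


# Toy models (hypothetical): the cheap one can only uppercase, the expensive one can also reverse.
def _toy_small(prompt: str) -> str:
    op, _, arg = prompt.partition(":")
    return arg.upper() if op == "upper" else arg


def _toy_large(prompt: str) -> str:
    op, _, arg = prompt.partition(":")
    return arg.upper() if op == "upper" else arg[::-1]


models = [Model("large", 10.0, _toy_large), Model("small", 1.0, _toy_small)]
subtasks = {
    "uppercase": [("upper:abc", lambda out: out == "ABC"), ("upper:xy", lambda out: out == "XY")],
    "reverse": [("rev:abc", lambda out: out == "cba")],
}
assignments = route_subtasks(models, subtasks, threshold=1.0)
print(assignments)  # {'uppercase': 'small', 'reverse': 'large'}
```

The cheap model is kept for the subtask it provably handles, and the expensive one is used only where the cheap one fails its evals. In practice the cost field would come from real pricing (for example per token), and the eval suite would be far larger than a few cases.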
Impact: not available (no eval scenarios have been run).
Passed: no known issues.