| Version: 7.12.0 (average score across 8 eval scenarios) | Agent success vs baseline: 1.05x | Does it follow best practices? — | |
| Version: 3.0.0 (average score across 9 eval scenarios) | Agent success vs baseline: 1.44x | Does it follow best practices? — | |