Collect and normalize agent logs, discover installed verifiers, and dispatch LLM judges to evaluate adherence. Produces per-session verdicts and aggregated reports.
91
90%
Does it follow best practices?
Impact
96%
3.09xAverage score across 3 eval scenarios
Passed
No known issues
A platform engineering team runs regular compliance audits to check whether their AI agents are following internal guidelines. They have the audit-logs tile installed in their project and want to make the audit process repeatable. Today, running an audit requires knowing the right sequence of commands, which scripts to call, and which flags to use — knowledge that's inconsistently applied across team members.
The team lead wants two things: a reusable shell script that any team member can run to kick off a Phase 1 compliance check on recent sessions, and a separate script for when someone has a specific concern to investigate (e.g. "check how agents handle authentication changes"). The scripts should be robust and follow the recommended patterns for using the audit pipeline.
Produce two shell scripts:
audit-recent.sh — Runs a standard Phase 1 compliance check on the most recent sessions for the current project directory. Should handle the case where the running session itself might be captured.
audit-search.sh — Takes a search query as an argument ($1) and first searches for matching sessions, then audits a small number of them. Should demonstrate the recommended workflow for investigating a specific concern.
Both scripts should be executable and include brief comments explaining what each section does. Also produce a README.md explaining when to use each script.