Use when the user wants to review, audit, or safety-check an AI memory system, agent learning pipeline, prompt-tuning workflow, skill builder, trace-mining tool, or eval/feedback loop. Produces an evidence-led audit report with a learning-loop map, evidence inventory, maturity scorecard, severity-ranked findings, privacy/provenance gaps, counterfactual/eval coverage, and a Stabilize/Standardize/Scale roadmap.
A product team has been running PromptTuner for six weeks, automatically improving their customer support agent's system prompt every week. The optimization loop has promoted a new prompt in each of its four completed runs, and the team is pleased with the improving scores in the log.
The head of AI is now asking for a full audit before expanding the system to additional product lines. She wants to understand how mature the current setup is, what risks exist, and what needs to be fixed before scaling further.
The source code and supporting files for PromptTuner are in the inputs/ directory. No access to live dashboards, session logs, or deployed infrastructure has been provided; only the files in that directory are available.
Produce a single file called audit_report.md with a complete audit of the PromptTuner system.
Your report should:
Where you reference specific code behavior, cite the relevant file and line. Where information was not made available, note it as a gap rather than assuming it is in order.