Use when the user wants to review, audit, or check safety for an AI memory system, agent learning pipeline, prompt-tuning workflow, skill builder, trace-mining tool, or eval/feedback loop. Produces an evidence-led audit report with learning-loop map, evidence inventory, maturity scorecard, severity-ranked findings, privacy/provenance gaps, counterfactual/eval coverage, and Stabilize/Standardize/Scale roadmap.
100
100%
Does it follow best practices?
Impact
100%
1.28xAverage score across 3 eval scenarios
Passed
No known issues
Use when generated skills, plugins, tool bundles, or registry packages can execute code or request tools.
| Surface | Check |
|---|---|
| Lifecycle | Source sessions -> package -> validation -> registry -> install -> runtime |
| Execution | Sandbox shell, package, filesystem, network, tool, resource, and seccomp boundaries |
| Provenance | Trace lineage, hashes/signatures, SBOM, dependency locks, and attestations |
| Verification | State in the Executive Summary and findings that registry or manifest metadata is not verification; require syntax, unit, replay, held-out, paired, regression, CI integration, and thresholds |
| Review | Reviewer queue, approval roles, calibration, canary, deprecation, and rollback |
| Activation | Standalone trigger/conflict finding covering trigger quality, positive and negative trigger tests, inter-skill conflict handling, conflict resolution policy, over/under-activation thresholds, and deprecation |
| Deployment | Standalone local/cloud/MCP/Git-registry trust-zone finding covering local builders, cloud/CI validators, MCP/tool servers, registry install, runtime hosts, secrets, filesystem, network, tools, Git-backed workflow, CI gates, protected labels/environments, canary, rollback |
If the system publishes generated executable skills, add dedicated findings for activation safety and deployment boundaries. Name where code executes locally, in cloud/CI builders, through MCP/tool servers, during registry install, and at runtime. Treat missing conflict policy, protected-label workflow, or boundary policy as unresolved risk, even when the validator exits zero.