Visualize whether skills, rules, and agent definitions are actually followed — auto-generates scenarios at 3 prompt strictness levels, runs agents, classifies behavioral sequences, and reports compliance rates with full tool call timelines
61
61%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
Loading evals