Eval Run Status

Agent: codex
Model: gpt-5.4

Score: 100% (agent success rate when using this plugin)
Improvement: 1.96x (agent success rate improvement when using this plugin compared to baseline)
Baseline: 51% (agent success rate without this plugin)
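
The improvement figure is consistent with a simple ratio of the two success rates reported above (100% with the plugin, 51% without). The sketch below shows that arithmetic. It assumes the ratio definition, which the page does not state explicitly, and the function name `improvement_ratio` is hypothetical.

```python
def improvement_ratio(with_plugin_pct: float, baseline_pct: float) -> float:
    """Return how many times higher the with-plugin success rate is.

    Assumption: "improvement" means with-plugin rate divided by baseline rate.
    """
    if baseline_pct <= 0:
        # A zero baseline would make the ratio undefined.
        raise ValueError("baseline_pct must be positive")
    return with_plugin_pct / baseline_pct


# Values taken from the summary above: score 100%, baseline 51%.
ratio = improvement_ratio(100, 51)
print(f"{ratio:.2f}x")  # prints "1.96x"
```

Under this assumption, 100 / 51 rounds to the 1.96x shown in the summary.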

Approval gate on connector-dependent skill

Criterion                               Baseline   With plugin
Compatibility report produced           100%       100%
Capability matrix present               100%       100%
Connector classified host-dependent     100%       100%
Credential assumption documented        100%       100%
Risk score high or blocked              50%        100%
Approval gate triggered                 0%         100%
Direct approval wording used            0%         100%
No overbroadening                       50%        100%
Translation mode named                  62%        100%
Host-specific assumptions listed        83%        100%

Clean port with kebab naming and OpenAI metadata

Criterion                                 Baseline   With plugin
Name normalized to kebab-case             0%         100%
agents/openai.yaml created                0%         100%
openai.yaml has display name              0%         100%
openai.yaml has icon field                0%         100%
openai.yaml has accent color              0%         100%
Frontmatter description is lower-case     0%         100%
Frontmatter includes trigger conditions   80%        100%
Claude runtime references replaced        100%       100%
skill.zip produced                        100%       100%
Round-trip review present                 58%        100%