monkey-thought-translator

Eval Details

Eval Run Status: Completed
Agent: codex
Model: gpt-5.4
Score (agent success rate when using this plugin): 100%
Improvement (agent success rate when using this plugin, relative to baseline): 1.96x
Baseline (agent success rate without this plugin): 51%
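The reported improvement is consistent with dividing the score by the baseline. The sketch below assumes that is how the figure is derived; the page does not state the formula, and the function name `improvement_ratio` is hypothetical.

```python
def improvement_ratio(score_pct: float, baseline_pct: float) -> float:
    """Return how many times higher the score is than the baseline.

    Assumption: improvement = score / baseline. The dashboard does not
    document its formula; this only reproduces the displayed number.
    """
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive to form a ratio")
    return score_pct / baseline_pct


# Values taken from the eval details above.
ratio = improvement_ratio(100, 51)
print(f"{ratio:.2f}x")  # 1.96x
```

Under this reading, 1.96x means the agent succeeded almost twice as often with the plugin as without it.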

Evaluation results

100%

37%

Port the Notion Task Dashboard to ChatGPT

Approval gate on connector-dependent skill

Criteria                               Without context   With context
Compatibility report produced                            100%
Capability matrix present                                100%
Connector classified host-dependent                      100%
Credential assumption documented                         100%
Risk score high or blocked             50%               100%
Approval gate triggered                                  100%
Direct approval wording used                             100%
No overbroadening                      50%               100%
Translation mode named                 62%               100%
Host-specific assumptions listed       83%               100%

61%

Translate Our Meeting Summarizer Skill to ChatGPT

Clean port with kebab naming and OpenAI metadata

Criteria                                  Without context   With context
Name normalized to kebab-case                               100%
agents/openai.yaml created                                  100%
openai.yaml has display name                                100%
openai.yaml has icon field                                  100%
openai.yaml has accent color                                100%
Frontmatter description is lower-case                       100%
Frontmatter includes trigger conditions   80%               100%
Claude runtime references replaced                          100%
skill.zip produced                                          100%
Round-trip review present                 58%               100%
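For readers unfamiliar with the "Name normalized to kebab-case" criterion, the following is a minimal sketch of one way such a normalization could work. It is an illustration only: the function `to_kebab_case` and its exact rules are assumptions, not the plugin's or the grader's actual logic.

```python
import re


def to_kebab_case(name: str) -> str:
    """Convert a free-form or camelCase name to kebab-case.

    Hypothetical helper for illustration; the eval's own rules are not
    shown on this page.
    """
    # Insert a hyphen at lower-to-upper camelCase boundaries.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", name)
    # Collapse every run of non-alphanumeric characters into one hyphen.
    s = re.sub(r"[^A-Za-z0-9]+", "-", s)
    # Trim stray hyphens at the ends and lower-case the result.
    return s.strip("-").lower()


print(to_kebab_case("Meeting Summarizer"))    # meeting-summarizer
print(to_kebab_case("meetingSummarizer_v2"))  # meeting-summarizer-v2
```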