monkey-thought-translator

Eval Details

Eval Run Status: Completed
Agent: codex
Model: gpt-5.4

Score: 100% (agent success rate when using this plugin)
Improvement: 1.96x (agent success rate improvement when using this plugin compared to baseline)
Baseline: 51% (agent success rate without this plugin)
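
The improvement figure is consistent with dividing the success rate with the plugin by the baseline success rate. A minimal sketch of that arithmetic follows; the function name `improvement_factor` is hypothetical and the page does not state how the dashboard computes the value.

```python
def improvement_factor(with_plugin: float, baseline: float) -> float:
    """Ratio of the success rate with the plugin to the baseline success rate.

    Assumption: the reported improvement is a plain ratio of the two rates.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return with_plugin / baseline


# Values taken from the report: 100% with the plugin, 51% without it.
factor = improvement_factor(100.0, 51.0)
print(f"{factor:.2f}x")  # prints "1.96x"
```

A ratio rather than a difference is what makes the figure read as "1.96x"; the absolute gain is 49 percentage points.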

Evaluation results

100%

37%

Port the Notion Task Dashboard to ChatGPT

Approval gate on connector-dependent skill

| Criteria | Without context | With context |
| --- | --- | --- |
| Compatibility report produced | 100% | 100% |
| Capability matrix present | 100% | 100% |
| Connector classified host-dependent | 100% | 100% |
| Credential assumption documented | 100% | 100% |
| Risk score high or blocked | 50% | 100% |
| Approval gate triggered | 0% | 100% |
| Direct approval wording used | 0% | 100% |
| No overbroadening | 50% | 100% |
| Translation mode named | 62% | 100% |
| Host-specific assumptions listed | 83% | 100% |

100%

61%

Translate Our Meeting Summarizer Skill to ChatGPT

Clean port with kebab naming and OpenAI metadata

| Criteria | Without context | With context |
| --- | --- | --- |
| Name normalized to kebab-case | 0% | 100% |
| agents/openai.yaml created | 0% | 100% |
| openai.yaml has display name | 0% | 100% |
| openai.yaml has icon field | 0% | 100% |
| openai.yaml has accent color | 0% | 100% |
| Frontmatter description is lower-case | 0% | 100% |
| Frontmatter includes trigger conditions | 80% | 100% |
| Claude runtime references replaced | 100% | 100% |
| skill.zip produced | 100% | 100% |
| Round-trip review present | 58% | 100% |
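
The first criterion above checks that the skill name is normalized to kebab-case. As an illustration only, the sketch below shows one common way to do that normalization. The helper `to_kebab_case` and the sample names are hypothetical; the page does not show the rules the plugin or the eval actually applies.

```python
import re


def to_kebab_case(name: str) -> str:
    """Lower-case a name and join its words with single hyphens.

    Assumption: words are separated by case changes or by any run of
    non-alphanumeric characters (spaces, underscores, punctuation).
    """
    # Insert a space at lower-to-upper boundaries: "MeetingSummarizer" -> "Meeting Summarizer"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Collapse every run of non-alphanumeric characters into one hyphen
    hyphenated = re.sub(r"[^A-Za-z0-9]+", "-", spaced)
    # Drop leading/trailing hyphens and lower-case the result
    return hyphenated.strip("-").lower()


# Hypothetical inputs, not taken from the eval itself.
print(to_kebab_case("Meeting Summarizer"))     # prints "meeting-summarizer"
print(to_kebab_case("MeetingSummarizer"))      # prints "meeting-summarizer"
print(to_kebab_case("meeting_summarizer_v2"))  # prints "meeting-summarizer-v2"
```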