Four-skill presentation system: ingest talks into a rhetoric vault, run interactive clarification, generate a speaker profile, then create new presentations that match your documented patterns. Includes an 88-entry Presentation Patterns taxonomy for scoring, brainstorming, and go-live preparation.
96
93%
Does it follow best practices?
Impact
97%
1.21x
Average score across 30 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent correctly triages illustration fix requests (regenerate vs. edit), follows prompt engineering rules for image edits, applies the versioning strategy, flags PIL masking as the wrong approach, and flags prompt quality anti-patterns in a guardrail audit, all based on the illustration editing and iteration guidelines.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Regenerate for content additions",
"description": "The triage plan recommends regeneration from the full prompt (not image editing) for slide 19 (adding a third soldier) — because editing strips the style when adding new content to the image",
"max_score": 10
},
{
"name": "Edit for content removal",
"description": "The triage plan recommends image editing (not regeneration) for slide 12 and/or slide 40 (removing labels/borders) — because the model preserves style when removing content",
"max_score": 10
},
{
"name": "Simplified anchor flagged",
"description": "The prompt audit flags Edit 2 (slide 15) as problematic because the prompt uses a shortened version of the style anchor instead of the full version — the full specificity of the anchor is what makes the style work",
"max_score": 10
},
{
"name": "Safety suffix: no new elements",
"description": "Edit prompts in the triage plan include 'DO NOT add any new elements' as a safety suffix (for removal/erasure operations)",
"max_score": 8
},
{
"name": "Safety suffix: background continuation",
"description": "Edit commands in the triage plan include 'let background continue naturally' (or similar phrasing about no parchment patch) as a safety suffix when erasing content",
"max_score": 8
},
{
"name": "Explicit preservation instructions",
"description": "Edit prompts in the triage plan include explicit 'keep [X]' instructions for elements that should be preserved — not just what to change but what to leave alone",
"max_score": 10
},
{
"name": "Versioned output naming",
"description": "The triage plan specifies saving edited/fixed images as versioned files (e.g., slide-12-v2.jpg, slide-25-v3.jpg) instead of overwriting the original — preserving previous versions for comparison",
"max_score": 8
},
{
"name": "Fix mode for near-perfect images",
"description": "The triage plan recommends a targeted fix (not full regeneration) for slide 25, which is described as 90% correct with only road visibility needing adjustment",
"max_score": 8
},
{
"name": "PIL masking flagged as wrong approach",
"description": "The triage plan identifies the PIL/programmatic masking used for the slide 47 builds as the wrong approach and recommends using the model's native image editing instead of pasting colored rectangles",
"max_score": 10
},
{
"name": "Missing preservation flagged",
"description": "The prompt audit flags Edit 1 (slide 8: 'Remove the tank from the background') as missing explicit preservation instructions — removing content without saying what to keep risks the model removing other elements too",
"max_score": 10
},
{
"name": "Content modification via edit flagged",
"description": "The prompt audit flags Edit 4 and/or Edit 5 as potentially problematic, because changing content (hat style, text size) via image editing risks stripping the style; these edits should be noted as risky or recommended for regeneration instead",
"max_score": 8
}
]
}
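The rubric above has type `weighted_checklist`, and its eleven `max_score` values sum to 100. The source does not say how a grader combines them, so the following is only a minimal sketch of one plausible aggregation: earned points divided by available points. The `Criterion` class, the `weighted_score` function, and the `awarded` grader output are hypothetical names introduced for illustration; only the criterion names and weights come from the checklist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    max_score: int


# Names and weights copied from the checklist above.
CRITERIA = [
    Criterion("Regenerate for content additions", 10),
    Criterion("Edit for content removal", 10),
    Criterion("Simplified anchor flagged", 10),
    Criterion("Safety suffix: no new elements", 8),
    Criterion("Safety suffix: background continuation", 8),
    Criterion("Explicit preservation instructions", 10),
    Criterion("Versioned output naming", 8),
    Criterion("Fix mode for near-perfect images", 8),
    Criterion("PIL masking flagged as wrong approach", 10),
    Criterion("Missing preservation flagged", 10),
    Criterion("Content modification via edit flagged", 8),
]


def weighted_score(criteria, awarded):
    """Return the fraction of available points earned (0.0 to 1.0).

    ASSUMPTION: the real evaluator's aggregation is not documented in the
    source. Here, criteria missing from `awarded` count as 0 and each value
    is clamped to the range [0, max_score].
    """
    total = sum(c.max_score for c in criteria)
    if total == 0:
        raise ValueError("checklist has no points to award")
    earned = sum(
        min(max(awarded.get(c.name, 0), 0), c.max_score) for c in criteria
    )
    return earned / total


# Hypothetical grader output: full marks everywhere except two criteria.
awarded = {c.name: c.max_score for c in CRITERIA}
awarded["PIL masking flagged as wrong approach"] = 0
awarded["Versioned output naming"] = 4

print(f"{weighted_score(CRITERIA, awarded):.0%}")
```

Under this assumed aggregation, missing the PIL masking criterion costs 10 of 100 points, more than missing either safety-suffix criterion (8 points each).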