Five-skill presentation system: ingest talks into a rhetoric vault, run interactive clarification on the vault, generate a speaker profile, create presentations that match your documented patterns, and produce the visual layer (deck illustrations and thumbnail). Includes a 102-entry Presentation Patterns taxonomy (91 observable patterns, 11 unobservable go-live items) for scoring, brainstorming, and go-live preparation.
93
Does it follow best practices? 94%
Impact: 93%
1.43x average score across 21 eval scenarios
Advisory
Suggest reviewing before use
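The scenario that follows is a `weighted_checklist` whose eleven `max_score` values sum to 100. A minimal sketch of turning per-criterion awards into a scenario score is shown below. It assumes a grader awards anywhere from 0 to `max_score` points per criterion (partial credit allowed) and that criteria without an award count as zero; `max_total` and `score_scenario` are hypothetical helpers, not part of the skill's scripts.

```python
import json

# Abbreviated copy of the rubric that follows: names and max_score values only.
RUBRIC = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Both checker scripts referenced", "max_score": 15},
    {"name": "Data attribution FAIL surfaced", "max_score": 12},
    {"name": "Closing FAIL surfaced", "max_score": 12},
    {"name": "Cut-lines FAIL surfaced", "max_score": 10},
    {"name": "Rushed-closing recurring issue noted", "max_score": 8},
    {"name": "Vague image prompt flagged (slide 5)", "max_score": 10},
    {"name": "Schema validation confirmed up front", "max_score": 8},
    {"name": "Big-idea singleton confirmed", "max_score": 5},
    {"name": "Thesis preview/payoff ordering confirmed", "max_score": 5},
    {"name": "Illustration coverage commentary", "max_score": 7},
    {"name": "Report distinguishes FAIL vs FLAG vs INFO", "max_score": 8}
  ]
}
""")


def max_total(rubric: dict) -> int:
    """Sum of every criterion's max_score (the scenario's denominator)."""
    return sum(item["max_score"] for item in rubric["checklist"])


def score_scenario(rubric: dict, awarded: dict[str, int]) -> float:
    """Return earned points as a percentage of max_total(rubric).

    Assumption (hypothetical grading model): each criterion earns 0..max_score
    points, keyed by its name; criteria missing from `awarded` earn 0.
    """
    if rubric.get("type") != "weighted_checklist":
        raise ValueError("expected a weighted_checklist rubric")
    known = {item["name"] for item in rubric["checklist"]}
    unknown = set(awarded) - known
    if unknown:
        raise ValueError(f"awards for unknown criteria: {sorted(unknown)}")
    earned = 0
    for item in rubric["checklist"]:
        points = awarded.get(item["name"], 0)
        if not 0 <= points <= item["max_score"]:
            raise ValueError(f"{item['name']}: {points} outside 0..{item['max_score']}")
        earned += points
    return 100.0 * earned / max_total(rubric)


if __name__ == "__main__":
    # Hypothetical grading: every criterion met except the slide 5 prompt check.
    awards = {item["name"]: item["max_score"] for item in RUBRIC["checklist"]}
    awards["Vague image prompt flagged (slide 5)"] = 0
    print(max_total(RUBRIC))               # 100
    print(score_scenario(RUBRIC, awards))  # 90.0
```

Because the weights already total 100, the percentage equals the raw point total here; normalizing by `max_total` keeps the sketch correct for rubrics whose weights sum to something else.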
{
"context": "Tests whether the agent runs the two Phase 4 checkers (check-rhetorical.py + guardrail-check.py) against outline.yaml, surfaces their structured output, and adds judgment-based commentary the deterministic scripts can't make (e.g., vague image prompts, illustration coverage). The fixture is a valid outline carrying deliberate quality issues the audit should catch.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Both checker scripts referenced",
"description": "The report explicitly invokes (or shows output from) both `check-rhetorical.py outline.yaml` and `guardrail-check.py outline.yaml speaker-profile.json`. The agent does not write its own audit logic from scratch when the scripts exist",
"max_score": 15
},
{
"name": "Data attribution FAIL surfaced",
"description": "The guardrail-check.py data-attribution check flags slide 4 (87% statistic with no source). The agent's report includes this finding",
"max_score": 12
},
{
"name": "Closing FAIL surfaced",
"description": "The guardrail-check.py closing check flags a missing call to action (CTA) in the closing chapter: slides 9 and 10 contain no explicit action-item language. The agent's report surfaces this",
"max_score": 12
},
{
"name": "Cut-lines FAIL surfaced",
"description": "The guardrail-check.py cut-lines check flags that no chapters or slides are marked cuttable, so the talk can't be compressed for shorter slots. The agent's report surfaces this finding",
"max_score": 10
},
{
"name": "Rushed-closing recurring issue noted",
"description": "The agent's judgment commentary flags the 2-minute closing chapter as below the speaker profile's `rushed_closing` recurring-issue threshold (≥3 min), and notes the 'wrap up fast' tag in argument_beats as a corroborating signal",
"max_score": 8
},
{
"name": "Vague image prompt flagged (slide 5)",
"description": "The agent's judgment commentary flags slide 5's image_prompt ('Components flying apart.') as a one-liner that doesn't reference the style anchor and lacks the specificity of the other prompts. This is judgment territory — not script-detected",
"max_score": 10
},
{
"name": "Schema validation confirmed up front",
"description": "Before running the audit checks, the agent confirms `outline_schema.py outline.yaml` exits 0 — the YAML loads cleanly. This is the gate before any further analysis",
"max_score": 8
},
{
"name": "Big-idea singleton confirmed",
"description": "The check-rhetorical.py output (and/or the agent's summary) confirms exactly one slide carries big_idea: true (slide 2)",
"max_score": 5
},
{
"name": "Thesis preview/payoff ordering confirmed",
"description": "The check-rhetorical.py output confirms slide 2 (preview) precedes slide 9 (payoff). The agent surfaces this PASS",
"max_score": 5
},
{
"name": "Illustration coverage commentary",
"description": "The agent notes that this is an illustration-strategy talk (style_anchor present) and surveys image_prompt coverage on non-EXCEPTION slides — flagging any FULL/IMG+TXT slide missing the [STYLE ANCHOR] token or missing an image_prompt entirely",
"max_score": 7
},
{
"name": "Report distinguishes FAIL vs FLAG vs INFO",
"description": "The report preserves the status labels from the scripts — FAIL/PASS/WARN from guardrail-check.py, PASS/FLAG/N/A/INFO from check-rhetorical.py. Labels are not paraphrased or homogenized",
"max_score": 8
}
]
}
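Three of the outline properties the rubric expects the audit to confirm (exactly one `big_idea` slide, the thesis preview before the payoff, and `image_prompt` coverage with the `[STYLE ANCHOR]` token on FULL and IMG+TXT slides) can be sketched as plain checks. This is not check-rhetorical.py or guardrail-check.py. It is a minimal sketch assuming the outline has already been loaded (for example, from outline.yaml) into a flat list of slide dicts, with hypothetical `number`, `layout`, and `thesis` keys alongside the `big_idea` and `image_prompt` fields the rubric names. The PASS/FLAG strings borrow the rubric's label vocabulary; the real scripts' output format may differ.

```python
from typing import Any

# Assumed slide shape: a dict with a hypothetical "number", "layout", and
# "thesis" key, plus the "big_idea" and "image_prompt" fields the rubric names.
Slide = dict[str, Any]

ILLUSTRATED_LAYOUTS = {"FULL", "IMG+TXT"}  # EXCEPTION slides are skipped
STYLE_TOKEN = "[STYLE ANCHOR]"


def check_big_idea(slides: list[Slide]) -> tuple[str, str]:
    """Exactly one slide should carry big_idea: true."""
    hits = [s["number"] for s in slides if s.get("big_idea") is True]
    if len(hits) == 1:
        return "PASS", f"big_idea on slide {hits[0]}"
    return "FLAG", f"expected one big_idea slide, found {hits}"


def check_preview_payoff(slides: list[Slide]) -> tuple[str, str]:
    """The first thesis preview should come before the first payoff."""
    preview = [s["number"] for s in slides if s.get("thesis") == "preview"]
    payoff = [s["number"] for s in slides if s.get("thesis") == "payoff"]
    if not preview or not payoff:
        return "FLAG", "missing thesis preview or payoff slide"
    if min(preview) < min(payoff):
        return "PASS", f"preview (slide {min(preview)}) precedes payoff (slide {min(payoff)})"
    return "FLAG", "payoff appears before preview"


def illustration_gaps(slides: list[Slide]) -> list[str]:
    """List FULL / IMG+TXT slides missing an image_prompt or the style-anchor token."""
    gaps = []
    for s in slides:
        if s.get("layout") not in ILLUSTRATED_LAYOUTS:
            continue  # EXCEPTION and other layouts are out of scope here
        prompt = (s.get("image_prompt") or "").strip()
        if not prompt:
            gaps.append(f"slide {s['number']}: no image_prompt")
        elif STYLE_TOKEN not in prompt:
            gaps.append(f"slide {s['number']}: image_prompt lacks {STYLE_TOKEN}")
    return gaps


if __name__ == "__main__":
    # Hypothetical outline fragment loosely modeled on the scenario's fixture.
    outline = [
        {"number": 2, "big_idea": True, "thesis": "preview", "layout": "FULL",
         "image_prompt": "[STYLE ANCHOR] A single bridge spanning two cliffs."},
        {"number": 5, "layout": "FULL", "image_prompt": "Components flying apart."},
        {"number": 7, "layout": "EXCEPTION"},
        {"number": 9, "thesis": "payoff", "layout": "IMG+TXT"},
    ]
    print(check_big_idea(outline))        # ('PASS', 'big_idea on slide 2')
    print(check_preview_payoff(outline))  # ('PASS', 'preview (slide 2) precedes payoff (slide 9)')
    for gap in illustration_gaps(outline):
        print(gap)
```

The coverage check is mechanical only: it catches slide 5's missing style anchor, but judging whether a prompt like "Components flying apart." is too vague still falls to the reviewer, as the rubric notes.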