Four-skill presentation system: ingest talks into a rhetoric vault, run interactive clarification, generate a speaker profile, then create new presentations that match your documented patterns. Includes an 88-entry Presentation Patterns taxonomy for scoring, brainstorming, and go-live preparation.
96
93%
Does it follow best practices?
Impact
97%
1.21x
Average score across 30 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent runs all 9 guardrail check categories with correct PASS/FAIL/WARN labeling, sources thresholds from the profile, and correctly identifies the planted violations.",
"type": "weighted_checklist",
"checklist": [
{
"name": "All 9 guardrail items present",
"description": "The report contains at least 9 distinct labeled check items covering: slide budget, Act 1 ratio, branding/conference elements, profanity, data attribution, time-sensitive content, closing completeness, cut lines, and anti-patterns — each as a separate line or block",
"max_score": 15
},
{
"name": "PASS/FAIL/WARN labels used",
"description": "Guardrail check items use the literal labels [PASS], [FAIL], or [WARN] (or equivalent format) — not just descriptive prose without labels",
"max_score": 10
},
{
"name": "WARN for near-limit Act 1",
"description": "The Act 1 ratio check produces [WARN] (not [PASS] and not [FAIL]) because the ratio is within 5 percentage points of the configured limit: Act 1 spans slides 8-33 (26 of 60 slides, or 43.3%), which is under the 45% limit but inside the 5-percentage-point warn threshold",
"max_score": 15
},
{
"name": "Data attribution flagged",
"description": "The report flags missing data attribution — at least one of: slide 16 (84% stat with no source), slide 17 (71% stat with no source), or slide 21 ('Various reports' is not a real source)",
"max_score": 12
},
{
"name": "Cut lines missing flagged",
"description": "The report flags that no [CUT LINE] markers are present in the outline despite modular_design being enabled in the profile",
"max_score": 10
},
{
"name": "Profile slide budgets used",
"description": "The slide budget check uses the value from the speaker profile's guardrail_sources.slide_budgets (68 max for a 45-minute talk) — not a hardcoded number",
"max_score": 10
},
{
"name": "Recurring issue flagged",
"description": "The anti-patterns section references at least one of the speaker's known recurring_issues from the profile (meme accretion in Act 1's opening sequence or theoretical framing delay)",
"max_score": 10
},
{
"name": "Pattern score projection",
"description": "The report includes a pattern score projection or estimated score, rather than omitting it or substituting a bare list of patterns",
"max_score": 10
},
{
"name": "Structured summary block",
"description": "The report includes a structured summary block listing all checks together (not scattered across unstructured prose)",
"max_score": 8
}
]
}
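The "WARN for near-limit Act 1" criterion rests on a small piece of arithmetic that is easy to get wrong (inclusive slide ranges, percentage points versus relative percent). The following is a minimal sketch of that three-state check, not the skill's actual implementation: the function name `check_act1_ratio`, the `RatioCheck` type, and the parameter names are hypothetical, and treating a ratio exactly at the limit as [WARN] rather than [FAIL] is an assumption the rubric does not settle. The 45% limit, the 5-percentage-point warn margin, and the slide numbers come from the rubric above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatioCheck:
    """Result of one guardrail check (hypothetical structure)."""
    label: str        # "PASS", "WARN", or "FAIL"
    ratio_pct: float  # Act 1 share of the deck, in percent


def check_act1_ratio(first_slide: int, last_slide: int, total_slides: int,
                     limit_pct: float, warn_margin_pct: float = 5.0) -> RatioCheck:
    """Classify the Act 1 share of the deck against a configured limit.

    The slide range is inclusive, so slides 8-33 count as 26 slides.
    Assumption: a ratio above the limit fails; a ratio at or below the
    limit but within `warn_margin_pct` percentage points of it warns.
    """
    act1_slides = last_slide - first_slide + 1
    ratio_pct = 100.0 * act1_slides / total_slides

    if ratio_pct > limit_pct:
        label = "FAIL"
    elif ratio_pct >= limit_pct - warn_margin_pct:
        label = "WARN"
    else:
        label = "PASS"
    return RatioCheck(label=label, ratio_pct=ratio_pct)


# Scenario from the rubric: Act 1 is slides 8-33 of a 60-slide deck, limit 45%.
result = check_act1_ratio(8, 33, 60, limit_pct=45.0)
print(f"[{result.label}] Act 1 ratio: {result.ratio_pct:.1f}% (limit 45%)")
# prints: [WARN] Act 1 ratio: 43.3% (limit 45%)
```

In a real run the limit would presumably be read from the speaker profile rather than passed as a literal, in the same way the rubric expects the slide budget to come from `guardrail_sources.slide_budgets`.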