Four-skill presentation system: ingest talks into a rhetoric vault, run interactive clarification, generate a speaker profile, then create new presentations that match your documented patterns. Includes an 88-entry Presentation Patterns taxonomy for scoring, brainstorming, and go-live preparation.
96
93%
Does it follow best practices?
Impact
97%
1.21x
Average score across 30 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests the vault-clarification skill's interactive session: rhetoric clarification with one-at-a-time questioning, humor post-mortem with per-beat grading, blind spot probing grounded in analysis observations, infrastructure config capture for first session, and proper DB updates.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Rhetoric clarification — one question at a time",
"description": "The agent asks about the delayed_self_introduction pattern as a focused single question — not bundled with other topics. The question references the specific observation (content before intro, slide 4) rather than being generic.",
"max_score": 10
},
{
"name": "Intent confirmation stored",
"description": "The updated tracking-database.json contains a confirmed_intents entry for the delayed intro pattern with fields: pattern, intent (deliberate/accidental/context_dependent), rule, and note.",
"max_score": 10
},
{
"name": "Humor grading per beat",
"description": "Each of the 4 identified humor beats receives a grade from the set {hit, nod, flat, spontaneous_hit}. Grades are stored in a humor_postmortem structure on the talk entry.",
"max_score": 12
},
{
"name": "Spontaneous humor probe",
"description": "The agent specifically asks about the 8-second transcript gap after slide 15 — referencing the gap evidence rather than asking generically. Also asks about any humor NOT in the transcript.",
"max_score": 10
},
{
"name": "Promote to planned recommendation",
"description": "For any spontaneous humor that landed well, the output includes a recommendation about promoting it to a planned beat in future deliveries.",
"max_score": 8
},
{
"name": "Blind spot — demo engagement",
"description": "The agent asks about audience engagement during the live coding demo (slides 14-16), referencing the specific observation that the transcript has minimal dialogue.",
"max_score": 8
},
{
"name": "Blind spot — theatrical opening",
"description": "The agent asks about the 'Judgment Day' theatrical element at slides 1-2, probing for stage effects, costume elements, or dramatic entrance not visible in transcript.",
"max_score": 8
},
{
"name": "Infrastructure config captured",
"description": "The updated tracking-database.json has non-empty values for speaker_name, speaker_handle, shownotes_url_pattern, and at least one publishing_process field (export_format, qr_code, or shortener).",
"max_score": 10
},
{
"name": "Session marked complete",
"description": "config.clarification_sessions_completed is incremented from 0 to 1 in the updated tracking database.",
"max_score": 12
},
{
"name": "Rhetoric summary updated",
"description": "The updated rhetoric-style-summary.md incorporates findings from the clarification session — at minimum, the confirmed intent for delayed intro and humor effectiveness data.",
"max_score": 12
}
]
}
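The page does not say how a "weighted_checklist" is aggregated into a scenario score. The following is a minimal sketch, assuming the simplest reading: each item is awarded between 0 and its max_score, and the scenario score is the earned total divided by the sum of all max_score values. The function names and the awarded mapping are hypothetical, and the item names are shortened from the ones in the checklist above.

```python
import json

# max_score values copied from the checklist above; names are shortened
# for illustration and are not the exact strings in the criteria file.
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Rhetoric clarification", "max_score": 10},
    {"name": "Intent confirmation stored", "max_score": 10},
    {"name": "Humor grading per beat", "max_score": 12},
    {"name": "Spontaneous humor probe", "max_score": 10},
    {"name": "Promote to planned recommendation", "max_score": 8},
    {"name": "Blind spot: demo engagement", "max_score": 8},
    {"name": "Blind spot: theatrical opening", "max_score": 8},
    {"name": "Infrastructure config captured", "max_score": 10},
    {"name": "Session marked complete", "max_score": 12},
    {"name": "Rhetoric summary updated", "max_score": 12}
  ]
}
"""


def total_max(criteria: dict) -> int:
    """Sum of max_score over every checklist item."""
    return sum(item["max_score"] for item in criteria["checklist"])


def weighted_percentage(criteria: dict, awarded: dict) -> float:
    """Earned points as a percentage of the maximum (assumed aggregation rule).

    `awarded` maps item name -> points given; missing items count as 0.
    """
    earned = 0
    for item in criteria["checklist"]:
        cap = item["max_score"]
        got = awarded.get(item["name"], 0)
        if not 0 <= got <= cap:
            raise ValueError(f"{item['name']}: {got} is outside 0..{cap}")
        earned += got
    return 100.0 * earned / total_max(criteria)


criteria = json.loads(CRITERIA_JSON)

# Hypothetical run: full marks everywhere except "Session marked complete".
awarded = {item["name"]: item["max_score"] for item in criteria["checklist"]}
awarded["Session marked complete"] = 0

print(total_max(criteria))
print(weighted_percentage(criteria, awarded))
```

Under this assumption the ten weights sum to 100, so each max_score reads directly as a percentage-point contribution; the three 12-point items (humor grading, session completion, rhetoric summary update) carry the most weight.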