Four-skill presentation system: ingest talks into a rhetoric vault, run interactive clarification, generate a speaker profile, then create new presentations that match your documented patterns. Includes an 88-entry Presentation Patterns taxonomy for scoring, brainstorming, and go-live preparation.
96
93%
Does it follow best practices?
Impact
97%
1.21x
Average score across 30 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent implements the skill's Known Pitfalls guidance: detecting wide-angle recording dedup failure, recommending hash threshold adjustments, handling multi-language transcripts, flagging Whisper hallucination, and detecting non-target speakers in playlist recordings.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Wide-angle detection",
"description": "The tool detects wide-angle room recordings by comparing the ratio of total extracted frames to unique slides — a ratio above 5:1 or 10:1 (or similar high threshold) triggers a warning about dedup failure due to speaker movement",
"max_score": 12
},
{
"name": "Hash threshold recommendation",
"description": "For wide-angle recordings with poor dedup, the tool recommends increasing the hash threshold to 14-18 (or a higher value than the default 8-10), not just reporting the problem",
"max_score": 10
},
{
"name": "Slide region crop suggestion",
"description": "For recordings where speaker movement dominates, the tool suggests manually specifying a slide region crop to isolate the projected screen area from the speaker",
"max_score": 8
},
{
"name": "Multi-language transcript handling",
"description": "The tool handles transcripts in languages other than the expected language — detects the actual language and does NOT flag a valid non-English transcript as an error when the talk was delivered in that language",
"max_score": 10
},
{
"name": "Whisper hallucination detection",
"description": "The tool identifies potential Whisper hallucination by checking for repetitive loops, implausible text patterns, or sections where transcript content doesn't match visible slide text",
"max_score": 10
},
{
"name": "Transcript source tracking",
"description": "The diagnostics output distinguishes between transcript sources — YouTube auto-captions vs Whisper transcription vs manual — using a field like 'transcript_source'",
"max_score": 8
},
{
"name": "Speaker identity verification",
"description": "The tool checks whether the detected speaker matches the expected speaker and flags mismatches — handling the case where playlist recordings include talks by other presenters",
"max_score": 10
},
{
"name": "Recording type classification",
"description": "The tool classifies recordings into at least 3 distinct types (e.g., fullscreen slides, picture-in-picture, wide-angle room) based on extraction metrics, not just pass/fail",
"max_score": 10
},
{
"name": "Structured diagnostics output",
"description": "The output is structured JSON with named fields for each diagnostic dimension (recording_type, dedup_quality, transcript_quality, speaker_match, recommendations) — not just prose",
"max_score": 10
},
{
"name": "Clean recording passes cleanly",
"description": "For a clean fullscreen recording with good dedup ratio and valid transcript, the tool produces no warnings and classifies it as healthy — it doesn't over-flag",
"max_score": 12
}
]
}
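As a rough illustration of what the checklist above asks for, here is a minimal Python sketch of a diagnostics function. It is not the skill's actual implementation. The function names, the 2:1 picture-in-picture cutoff, the "fair"/"poor" labels, the `healthy` field, and the n-gram repetition heuristic are all assumptions made for this example. Only the 5:1 ratio, the 14-18 hash threshold range, and the output field names come from the checklist.

```python
import json
from collections import Counter

# Illustrative thresholds (assumptions, except where the checklist names a range).
WIDE_ANGLE_RATIO = 5.0         # extracted frames per unique slide; checklist mentions 5:1
PIP_RATIO = 2.0                # hypothetical cutoff for picture-in-picture recordings
SUGGESTED_HASH_THRESHOLD = 16  # inside the 14-18 range the checklist names


def has_repetitive_loop(transcript, n=5, min_repeats=4):
    """Return True if any n-word phrase occurs at least min_repeats times."""
    words = transcript.lower().split()
    ngrams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return any(count >= min_repeats for count in ngrams.values())


def diagnose(extracted_frames, unique_slides, transcript, transcript_source,
             expected_speaker, detected_speaker):
    """Build a structured diagnostics dict for one recording."""
    ratio = extracted_frames / max(unique_slides, 1)
    recommendations = []

    # Classify into three recording types based on the dedup ratio.
    if ratio > WIDE_ANGLE_RATIO:
        recording_type, dedup_quality = "wide_angle_room", "poor"
        recommendations.append(
            f"Increase the hash threshold to about {SUGGESTED_HASH_THRESHOLD} (14-18 range).")
        recommendations.append(
            "Specify a slide region crop to isolate the projected screen.")
    elif ratio > PIP_RATIO:
        recording_type, dedup_quality = "picture_in_picture", "fair"
    else:
        recording_type, dedup_quality = "fullscreen_slides", "good"

    # Repeated phrases are one common symptom of transcription hallucination.
    transcript_quality = "ok"
    if has_repetitive_loop(transcript):
        transcript_quality = "possible_hallucination"
        recommendations.append("Review the transcript for repeated phrases.")

    # Playlist recordings may include talks by other presenters.
    speaker_match = expected_speaker.strip().lower() == detected_speaker.strip().lower()
    if not speaker_match:
        recommendations.append("Detected speaker differs from the expected speaker.")

    return {
        "recording_type": recording_type,
        "dedup_quality": dedup_quality,
        "transcript_quality": transcript_quality,
        "transcript_source": transcript_source,
        "speaker_match": speaker_match,
        "recommendations": recommendations,
        "healthy": not recommendations,
    }


if __name__ == "__main__":
    report = diagnose(600, 40, "welcome to the talk", "whisper", "Jane Doe", "Jane Doe")
    print(json.dumps(report, indent=2))
```

A clean fullscreen recording should come back with an empty `recommendations` list and `healthy` set to true, which mirrors the checklist's requirement not to over-flag. A language check is deliberately left out of this sketch; per the checklist, a valid non-English transcript should not be treated as an error.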