Four-skill presentation system: ingest talks into a rhetoric vault, run interactive clarification, generate a speaker profile, then create new presentations that match your documented patterns. Includes an 88-entry Presentation Patterns taxonomy for scoring, brainstorming, and go-live preparation.
96
93%
Does it follow best practices?
Impact
97%
1.21x
Average score across 30 eval scenarios
Advisory
Suggest reviewing before use
Six conference talk recordings have been processed through the video-slide-extraction pipeline. The extraction results vary widely — some are clean, others show signs of wide-angle recording dedup failure, missing transcripts, wrong languages, wrong speakers, or Whisper hallucination.
Analyze the extraction results and produce a structured diagnostics report for each case.
Download the fixed extraction results:
curl -sLO https://github.com/jbaruch/speaker-toolkit/raw/main/eval-resources/scenario-13/extraction_results.json
Analyze extraction_results.json (6 recordings) and produce diagnostics_report.json containing a per-recording diagnostic entry. Each entry must include:
Recording type classification — classify each recording into one of at least 3 categories (e.g., fullscreen slides, picture-in-picture, wide-angle room) based on extraction metrics (frame-to-unique ratio, slide region detection)
Dedup quality assessment — for case_wide_angle (1200 frames → 900 "unique"), detect that the 1.33:1 ratio indicates dedup failure from speaker movement, and recommend:
Transcript quality assessment for each case:
case_no_transcript: flag as missing, recommend Whisper fallback
case_wrong_language: detect Russian transcript when English was expected, but do NOT flag it as an error if the talk was actually delivered in Russian (the speaker is bilingual)
case_whisper_hallucination: detect the 0.45 repetition ratio and the repeated "Thank you for watching" loop as hallucination artifacts
Speaker identity verification: for case_wrong_speaker, flag that detected speaker "James Gosling" doesn't match expected "Baruch Sadogursky" (possible wrong recording from a playlist)
Clean pass-through — for case_clean (50 frames → 45 unique, valid transcript, correct speaker), produce NO warnings. Classify as healthy.
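The checks listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the 300-slide ceiling, and the definition of "repetition ratio" (share of transcript segments that repeat an earlier one) are hypothetical and are not taken from the toolkit or from extraction_results.json. Because case_clean (50 → 45) also yields a low frame-to-unique ratio (about 1.11, versus about 1.33 for case_wide_angle), the sketch pairs the ratio with an absolute count of "unique" frames; that pairing is a choice made for this example, not something the task specifies.

```python
def frame_to_unique_ratio(total_frames: int, unique_frames: int) -> float:
    """Frames extracted per frame kept after dedup, e.g. 1200 / 900."""
    if unique_frames <= 0:
        raise ValueError("unique_frames must be positive")
    return total_frames / unique_frames


def dedup_suspect(unique_frames: int, max_plausible_slides: int = 300) -> bool:
    """Hypothetical heuristic: more 'unique' frames than a talk could have slides.

    The default ceiling of 300 is an assumption made for this sketch.
    """
    return unique_frames > max_plausible_slides


def repetition_ratio(segments: list[str]) -> float:
    """Assumed definition: share of segments that repeat an earlier segment."""
    if not segments:
        return 0.0
    normalized = [s.strip().lower() for s in segments]
    repeats = len(normalized) - len(set(normalized))
    return repeats / len(normalized)


def speaker_matches(detected: str, expected: str) -> bool:
    """Case-insensitive, whitespace-tolerant name comparison."""
    return detected.strip().casefold() == expected.strip().casefold()


if __name__ == "__main__":
    print(round(frame_to_unique_ratio(1200, 900), 2))  # case_wide_angle
    print(round(frame_to_unique_ratio(50, 45), 2))     # case_clean
    print(dedup_suspect(900), dedup_suspect(45))
    print(speaker_matches("James Gosling", "Baruch Sadogursky"))
```

A real implementation would read these values from extraction_results.json, whose field names are not shown in this task description.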
Produce diagnostics_report.json with this structure per case:
recording_type: classification string
dedup_quality: assessment with optional recommended_threshold
transcript_quality: assessment with source, status, and optional warnings
speaker_match: boolean + details
recommendations: list of actionable strings (empty for clean recordings)
Also produce recommendations_log.txt, a human-readable summary of all flagged issues and recommendations, suitable for a production operator to review.
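One possible shape for a report entry and the operator log is sketched below. The five top-level field names come from the structure above; everything else is an assumption made for illustration, including keying the report by case name, the nested keys "assessment", "match", and "details", and the sample values such as "fullscreen_slides" and "ok".

```python
import json


def make_entry(recording_type, dedup_quality, transcript_quality,
               speaker_match, recommendations=()):
    """Assemble one per-recording entry using the five required fields."""
    return {
        "recording_type": recording_type,
        "dedup_quality": dedup_quality,
        "transcript_quality": transcript_quality,
        "speaker_match": speaker_match,
        "recommendations": list(recommendations),
    }


def render_log(report: dict) -> str:
    """Render a plain-text summary; clean recordings get a single line."""
    lines = []
    for case, entry in report.items():
        recs = entry["recommendations"]
        if not recs:
            lines.append(f"{case}: healthy, no action needed")
        else:
            lines.append(f"{case}:")
            lines.extend(f"  - {rec}" for rec in recs)
    return "\n".join(lines) + "\n"


# Illustrative values only; nested key names are assumptions of this sketch.
report = {
    "case_clean": make_entry(
        recording_type="fullscreen_slides",
        dedup_quality={"assessment": "ok"},
        transcript_quality={"source": "provided", "status": "ok"},
        speaker_match={"match": True, "details": "Baruch Sadogursky"},
    ),
    "case_wrong_speaker": make_entry(
        recording_type="fullscreen_slides",
        dedup_quality={"assessment": "ok"},
        transcript_quality={"source": "provided", "status": "ok"},
        speaker_match={
            "match": False,
            "details": "detected James Gosling, expected Baruch Sadogursky",
        },
        recommendations=["Verify the recording; it may be the wrong playlist item"],
    ),
}

if __name__ == "__main__":
    with open("diagnostics_report.json", "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, ensure_ascii=False)
    with open("recommendations_log.txt", "w", encoding="utf-8") as fh:
        fh.write(render_log(report))
```

Keeping recommendations as an empty list for case_clean, rather than omitting the key, lets the log renderer treat every entry uniformly.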