Two-skill presentation system: analyze your speaking style into a rhetoric knowledge vault, then create new presentations that match your documented patterns. Includes an 88-entry Presentation Patterns taxonomy for scoring, brainstorming, and go-live preparation.
Build a rhetoric and style knowledge base by analyzing presentation talks. Each run
processes unprocessed talks, extracts rhetoric/style observations, and updates the
running summary. The vault lives at ~/.claude/rhetoric-knowledge-vault/ (may be a
symlink to a custom location). All paths are relative to this vault root.
| File / Reference | Purpose |
|---|---|
| tracking-database.json | Source of truth — talks, status, config, confirmed intents |
| rhetoric-style-summary.md | Running rhetoric & style narrative (the constitution) |
| slide-design-spec.md | Visual design rules from PDF + PPTX analysis |
| speaker-profile.json | Machine-readable bridge to presentation-creator |
| sessions-catalog.md | Submission-ready titles, abstracts, and outlines for active talks |
| analyses/{talk_filename}.md | Per-talk rhetoric analysis (one file per processed talk) |
| transcripts/{youtube_id}.txt | Downloaded/cleaned transcripts |
| slides/{id}.pdf | Slide PDFs (from Google Drive, PPTX export, or video extraction) |
| references/schemas.md | DB + subagent schemas; full config field list |
| references/rhetoric-dimensions.md | 14 analysis dimensions |
| references/pptx-extraction.md | Visual extraction script |
| references/speaker-profile-schema.md | Profile JSON schema |
| references/download-commands.md | yt-dlp + gdown commands |
| references/video-slide-extraction.md | Extract slides from video when no PDF/PPTX exists |
A talk is processable when it has video_url. Slide sources, in order of preference:

1. pptx_path → richest data (exact colors, fonts, shapes via python-pptx)
2. slides_url → download PDF from Google Drive
3. video_url → extract slides from the video using ffmpeg + perceptual dedup (falls back to processed_partial)

The slide_source field tracks which path was used: "pptx", "pdf", "both", "video_extracted", or "none". The pptx_catalog array fuzzy-matches .pptx files to shownotes entries.
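The preference order can be sketched as a small helper (a sketch only; the function name is hypothetical, the field names are the tracking-DB fields described above):

```python
def resolve_slide_source(talk: dict) -> str:
    """Pick the richest available slide source for a talk entry.

    Preference: pptx_path > slides_url > video_url; "both" when a PPTX
    and a PDF source are both available.
    """
    has_pptx = bool(talk.get("pptx_path"))
    has_pdf = bool(talk.get("slides_url"))
    if has_pptx and has_pdf:
        return "both"
    if has_pptx:
        return "pptx"
    if has_pdf:
        return "pdf"
    if talk.get("video_url"):
        return "video_extracted"
    return "none"
```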
Vault discovery — the canonical vault path is always ~/.claude/rhetoric-knowledge-vault/.
On every run, check this path first:

1. If it exists: set vault_root, read tracking-database.json.
2. If missing (first run):
   a. Tell the user the vault lives at ~/.claude/rhetoric-knowledge-vault/ by default.
   b. Ask via AskUserQuestion: "Want a different location? (e.g., Google Drive for backup) Enter a custom path, or press Enter / say 'default' to use the default."
   c. Default chosen: mkdir -p ~/.claude/rhetoric-knowledge-vault
   d. Custom path chosen: mkdir -p {custom_path} then ln -s {custom_path} ~/.claude/rhetoric-knowledge-vault — the symlink makes the canonical path always work. Store vault_storage_path as the custom path in config (for display/debugging).
   e. Create an empty tracking-database.json with empty config, talks, pptx_catalog. Set vault_root to ~/.claude/rhetoric-knowledge-vault in config (always the canonical path).
Config bootstrapping — ask once per missing field and persist to the tracking database.
Remaining core fields: talks_source_dir, pptx_source_dir, python_path
(auto-detect: {vault_root}/.venv/bin/python3, then python3 on PATH),
template_skip_patterns (default: ["template"]).
See references/schemas.md for the full config field list (including speaker infrastructure fields).
Scan for new talks: Glob *.md in talks_source_dir. For each file not in the
talks array, parse and add (extract title, conference, date, URLs, IDs, status "pending").
Scan for .pptx files: Glob **/*.pptx in pptx_source_dir (skip *static*,
conflict copies, template matches). Fuzzy-match to talks[] entries. Report counts.
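The fuzzy match can be approximated with stdlib difflib (a sketch; `match_pptx` and the 0.6 cutoff are assumptions, not the skill's actual matcher):

```python
from difflib import SequenceMatcher
from pathlib import Path

def match_pptx(pptx_path: str, titles: list, cutoff: float = 0.6):
    """Return the talk title that best matches a .pptx filename, or None."""
    stem = Path(pptx_path).stem.lower().replace("-", " ").replace("_", " ")
    best, best_score = None, cutoff
    for title in titles:
        score = SequenceMatcher(None, stem, title.lower()).ratio()
        if score > best_score:
            best, best_score = title, score
    return best
```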
Pattern taxonomy migration: If the pattern taxonomy exists
(skills/presentation-creator/references/patterns/_index.md) but any talks with
status "processed" or "processed_partial" have no pattern_observations (or
pattern_observations.pattern_ids is empty), mark them "needs-reprocessing" with
reprocess_reason: "pattern_scoring_added". Report: "N talks need reprocessing for
pattern scoring."
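The migration check can be sketched as follows (field names follow the tracking-DB description above; the helper name is hypothetical):

```python
def needs_reprocessing(talk: dict) -> bool:
    """True for processed talks that predate pattern scoring."""
    if talk.get("status") not in ("processed", "processed_partial"):
        return False
    obs = talk.get("pattern_observations") or {}
    return not obs.get("pattern_ids")
```

Talks flagged this way get status "needs-reprocessing" and reprocess_reason "pattern_scoring_added".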
Read rhetoric-style-summary.md and slide-design-spec.md. Report state:
"X processed, Y remaining. PPTX: A cataloged, B matched, C extracted."
- Select talks with status pending or needs-reprocessing.
- Determine slide_source per the hierarchy above. Mark "skipped_no_sources" only if video_url is missing entirely.
- If $ARGUMENTS specifies a talk filename or title, process ONLY that one.
- Per batch: launch 5 subagents in parallel, wait, run Step 4, then the next batch.
Each subagent receives the talk's DB entry and current rhetoric-style-summary.md.
A. Download transcript and acquire slides:

YouTube talks (default — try ALL likely languages, not just English):

```shell
yt-dlp --write-auto-sub --sub-lang "en,ru,he,fr,de,es,ja" --skip-download --sub-format vtt \
  -o "{vault_root}/transcripts/{youtube_id}" "https://www.youtube.com/watch?v={youtube_id}"
```

Non-YouTube talks (InfoQ, conference platforms, etc.): yt-dlp supports many sites beyond YouTube. When video_url is not a YouTube link:

- Use `yt-dlp -f http_audio` to download audio (MP3/M4A)
- Transcribe locally with mlx_whisper:

```python
import mlx_whisper

result = mlx_whisper.transcribe(
    audio_path,
    path_or_hf_repo='mlx-community/whisper-large-v3-turbo',
    language='en')  # or 'ru', etc.
```

- Save the transcript to {vault_root}/transcripts/{talk_id}.txt
- Set transcript_source: "whisper" on the talk entry (vs "youtube_auto" for YouTube)

This enables ingestion from InfoQ, Vimeo, conference-hosted video, or any source yt-dlp supports. Falls back to processed_partial (slides only) if audio extraction fails.
Slide acquisition per slide_source (see hierarchy above):

- pptx/both: Run the references/pptx-extraction.md script. Store in structured_data.pptx_visual.
- pdf: Download via gdown or use a locally provided PDF.
- video_extracted: Run the references/video-slide-extraction.md pipeline (download 720p video → ffmpeg frames → auto-detect slide region → perceptual dedup → PDF). Delete the video after extraction; keep only the PDF. Analyze like any other slide PDF.
- none: Transcript-only analysis, processed_partial.

B. Analyze for Rhetoric & Style (NOT content). Apply all 14 dimensions (including dimension 14: Areas for Improvement).
Language policy — the vault is English-only. All analysis output, rhetoric summary updates, tracking DB entries, and profile data MUST be written in English regardless of the talk's delivery language. For non-English talks:
- Quote verbatim examples as: "English text" (оригинальный текст).
  Example: "That's the whole point" (В этом весь смысл) — NOT "В этом весь смысл" (That's the whole point).
- Tag language-specific verbal signatures with a language marker (e.g. [ru] "получается что") — do NOT merge them into the main English signature list.
- Record delivery_language in the tracking DB.

B2. Tag Presentation Patterns. Scan observations against the pattern taxonomy index at skills/presentation-creator/references/patterns/_index.md (path relative to the skill root). Skip patterns marked observable: false — these are pre-event logistics
and physical stage behaviors that cannot be detected from transcripts or slides. For each
observable pattern/antipattern, determine if the talk exhibits it (strong/moderate/weak
confidence), record evidence, and compute per-talk pattern score:
count(patterns) - count(antipatterns). Return in the pattern_observations field.
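The per-talk score is a simple tally (a sketch; the "kind" key is an assumption for illustration, since the actual observation shape is defined in references/schemas.md):

```python
def pattern_score(observations: list) -> int:
    """count(patterns) - count(antipatterns) over a talk's observations."""
    score = 0
    for obs in observations:
        if obs.get("kind") == "pattern":
            score += 1
        elif obs.get("kind") == "antipattern":
            score -= 1
    return score
```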
C. Return JSON per the subagent return schema (see references/schemas.md).
Process PPTX files not already extracted during Step 3: unmatched catalog entries,
talks that used PDF as primary source but have a PPTX available, or any entry with
pptx_visual_status: "pending". Skip if already "extracted".
After extraction: Store in structured_data.pptx_visual, set status to
"extracted". After 3+ extractions, fill slide-design-spec.md; after 5+, analyze
cross-talk patterns (colors, fonts, footers).
After each batch:

- Update each talk's DB entry: status, processed_date, all result fields.
- Backfill empty structured_data from earlier runs using rhetoric_notes.
- Persist pattern_observations IDs + score to each talk entry.
- Structured field extraction: When the analysis identifies co-presenters, delivery language, or other structured metadata, populate the corresponding DB fields (co_presenter, delivery_language, etc.) — do NOT leave structured data buried only in rhetoric_notes free text.
- Write {vault_root}/analyses/{talk_filename}.md containing the full rhetoric analysis (all 14 dimensions, structured data, verbatim examples, and a "Presentation Patterns Scoring" section listing detected patterns/antipatterns with confidence levels, evidence, and the per-talk pattern score). These files are read by the presentation-creator when adapting existing talks. Create the analyses/ directory if it doesn't exist.
- Apply each subagent's new_patterns and summary_updates to rhetoric-style-summary.md. Be additive; never delete. Sections 1-14 map to the 14 rhetoric dimensions; Section 15 aggregates areas for improvement; Section 16 captures speaker-confirmed intent.
Recount status from DB every time. The summary's ## Status block must be rewritten by counting the tracking DB — never increment manually, never trust the existing status line. Count: total talks, processed, skipped (by reason), languages, co-presenters. The DB is the source of truth; the summary is a derived view.

| Transcript | Slides (PPTX/PDF) | Video | Status | Action |
|---|---|---|---|---|
| OK | OK | — | processed | Full analysis (best quality) |
| OK | FAIL | OK | processed | Extract slides from video, then full analysis |
| OK | FAIL | FAIL | processed_partial | Transcript only (no visual analysis) |
| FAIL | OK | — | processed_partial | Slides only |
| FAIL | FAIL | OK | processed_partial | Extract slides from video, visual only |
| FAIL | FAIL | FAIL | skipped_download_failed | Skip, move on |
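The table above reduces to a small decision function (a sketch; the function name is hypothetical):

```python
def outcome(transcript_ok: bool, slides_ok: bool, video_ok: bool) -> str:
    """Map source availability to a talk status, mirroring the fallback table.

    A usable video counts toward full analysis because slides can be
    extracted from the recording.
    """
    if transcript_ok and (slides_ok or video_ok):
        return "processed"
    if transcript_ok or slides_ok or video_ok:
        return "processed_partial"
    return "skipped_download_failed"
```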
After all batches complete. Purpose: resolve ambiguities, validate findings, capture intent.
5A. Rhetoric Clarification: For each surprising, contradictory, or ambiguous observation from this run, ask one topic at a time via AskUserQuestion:
Update the summary and tracking DB after each answer.
5A-bis. Blind Spot Moments: The skill can only analyze transcripts (speech) and slides (visuals). It CANNOT observe audience reactions, physical performance, stage movement, costume/prop moments, room energy, or laughter/applause. During analysis, flag moments where the transcript or slides suggest something happened that the skill cannot measure — then ask the speaker about each one. Examples:
These blind spots are inherent to transcript+slides analysis. Asking about them captures
data that no amount of parsing can recover. Store responses as blind_spot_observations
in the talk's tracking DB entry and integrate into the rhetoric summary.
5A-ter. Humor Post-Mortem: The skill can identify jokes from transcripts and slides but CANNOT hear laughter. For every talk processed in this run, compile the humor beats detected in dimension 3 and walk through them with the speaker:
For each beat, record humor_grade: "hit" | "nod" | "flat" | "spontaneous_hit". Over time this builds a corpus-wide humor effectiveness map — which joke TYPES land (self-deprecating, industry snark, meme-as-punchline, callback) and which fall flat.

This is particularly important for recent talks where memory is fresh. For older talks (2+ years), compress to: "Any jokes you remember landing particularly well or badly?"
Store results in humor_postmortem on the talk's DB entry and update the rhetoric
summary Section 3 (Humor & Wit) with confirmed effectiveness data.
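Aggregating grades into the corpus-wide effectiveness map can be sketched as follows (the "joke_type" key is an assumption for illustration; only humor_grade is named above):

```python
from collections import Counter, defaultdict

def humor_map(beats: list) -> dict:
    """Tally humor_grade per joke type across all humor_postmortem beats."""
    by_type = defaultdict(Counter)
    for beat in beats:
        by_type[beat["joke_type"]][beat["humor_grade"]] += 1
    return dict(by_type)
```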
5B. Speaker Infrastructure (first session only): Ask for any empty config fields
(speaker_name through publishing_process.*). See references/schemas.md for the full field list.
5C. Structured Intent Capture: Compile confirmed intents from 5A into structured entries and store them in the confirmed_intents array in the tracking DB:

```json
{"pattern": "delayed_self_introduction", "intent": "deliberate",
 "rule": "Two-phase intro: brief bio slide 3, full re-intro mid-talk",
 "note": "Confirmed intentional rhetorical device"}
```

5D. Mark session complete: Increment config.clarification_sessions_completed in the tracking DB. This counter gates profile generation (Step 6).
When: 10+ talks parsed AND config.clarification_sessions_completed >= 1. Also on explicit request.
Process:

- Read rhetoric-style-summary.md, slide-design-spec.md, and confirmed_intents.
- Aggregate structured_data from processed talks (skip empty entries, fall back to prose).
- If template_pptx_path is set, extract layouts via python-pptx:

```python
from pptx import Presentation

prs = Presentation(template_path)
for i, layout in enumerate(prs.slide_layouts):
    print(f"{i}: {layout.name} — {[p.placeholder_format.type for p in layout.placeholders]}")
```

- Write speaker-profile.json per references/speaker-profile-schema.md. Map config → speaker/infrastructure, summary sections → instrument_catalog/presentation_modes, confirmed intents → rhetoric_defaults, aggregated data → pacing/guardrail_sources, pattern observations → pattern_profile.
- Save to {vault_root}/speaker-profile.json. Include pattern_profile data. Personalize to THIS speaker.

Auto-trigger: Step 4 calls this after every vault update (if profile exists).
Create the transcripts/, slides/, analyses/ dirs if missing.

Wide-angle room recordings defeat perceptual hash dedup. When the camera captures the full stage (speaker moving + slides projected on a screen behind them), every frame looks different because speaker position changes. The pipeline produces 800-1500 "unique" frames instead of 40-80 actual slides. Mitigation options:

- Raise hash_threshold to 14-16 (loose dedup tolerates speaker movement)
- Set slide_region crop coordinates to isolate the projected screen

The pipeline works best for recordings that show slides fullscreen (Devoxx, JFokus, most modern conference recordings). Wide-angle audience-camera recordings from meetups and DevOpsDays are the worst case.
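The threshold's effect can be illustrated with a toy version of the dedup loop (a sketch; real frames are compared via 64-bit perceptual hashes, represented here as plain ints):

```python
def hamming(h1: int, h2: int) -> int:
    """Bit distance between two perceptual hashes."""
    return bin(h1 ^ h2).count("1")

def dedup(frame_hashes: list, threshold: int = 10) -> list:
    """Keep a frame only when it differs enough from the last kept frame.

    Raising threshold toward 14-16 makes the dedup looser, so small
    hash changes caused by speaker movement no longer create "new" slides.
    """
    kept = []
    for h in frame_hashes:
        if not kept or hamming(kept[-1], h) > threshold:
            kept.append(h)
    return kept
```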
Whisper hallucination on bad audio. When conference recordings have poor audio (distant mics, room echo, music tags), Whisper large-v3-turbo recovers ~60% of speech but hallucinates through silent/noisy sections — generating plausible-sounding but fabricated text. Always:
- Set transcript_source: "whisper" so the analysis knows the source
- Mark degraded transcripts (transcript_quality: "partial")

Non-speaker talks slip into playlists. Conference playlists include ALL speakers,
not just the vault's target speaker. The subagent should verify speaker identity early
in analysis — check video frames for the expected speaker, check transcript for
self-identification. Flag is_baruch_talk: false and set status to skipped if the
speaker doesn't match.
Step 5 timing matters. Run the full clarification session (especially the humor post-mortem and blind spot moments) IMMEDIATELY for talks delivered within the past week. Memory is freshest right after delivery — room energy, audience reactions, and spontaneous moments fade fast. For older talks (2+ years), use the compressed version: "Any jokes you remember landing well or badly? Anything about the room context?"