Four-skill presentation system: ingest talks into a rhetoric vault, run interactive clarification, generate a speaker profile, then create new presentations that match your documented patterns. Includes a 102-entry Presentation Patterns taxonomy (91 observable, 11 unobservable go-live items) for scoring, brainstorming, and go-live preparation.
Process the steps below in order; each step's output (tracking DB state, batch results, per-talk artifacts) feeds the next. Do not skip ahead.
Build a rhetoric and style knowledge base by analyzing presentation talks. Each run
processes unprocessed talks, extracts rhetoric/style observations, and updates the
running summary. The vault lives at ~/.claude/rhetoric-knowledge-vault/ (may be a
symlink to a custom location). All paths are relative to this vault root.
| File / Reference | Purpose |
|---|---|
| tracking-database.json | Source of truth — talks, status, config, confirmed intents |
| rhetoric-style-summary.md | Running rhetoric & style narrative |
| slide-design-spec.md | Visual design rules from PDF + PPTX analysis |
| speaker-profile.json | Machine-readable bridge to presentation-creator |
| analyses/{talk_filename}.md | Per-talk rhetoric analysis (one file per processed talk) |
| transcripts/{youtube_id}.txt | Downloaded/cleaned transcripts |
| slides/{id}.pdf | Slide PDFs (from Google Drive, PPTX export, or video extraction) |
| references/schemas-db.md | DB + subagent schemas; extraction output schemas |
| references/rhetoric-dimensions.md | 14 analysis dimensions |
| references/video-slide-extraction.md | Video-to-slides pipeline — layout heuristics, tuning, limitations |
| references/processing-rules.md | Language policy, pattern migration logic, structured field rules |
| scripts/pptx-extraction.py | Extract visual design data from .pptx files |
| scripts/video-slide-extraction.py | Extract slides from video via ffmpeg + perceptual dedup |
| scripts/batch-download-videos.sh | Parallel video download for batch processing |
| scripts/vtt-cleanup.py | Clean VTT subtitles into plain transcript text |
A talk is processable when it has video_url. Slide sources, in order of preference:

1. pptx_path — richest data (exact colors, fonts, shapes via python-pptx)
2. slides_url — download PDF from Google Drive
3. video_url — extract slides from the video using ffmpeg + perceptual dedup

If no slide source is available, the talk is processed transcript-only (status processed_partial). The slide_source field tracks which path was used: "pptx", "pdf", "both", "video_extracted", or "none". The pptx_catalog array fuzzy-matches .pptx files to shownotes entries.
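The preference hierarchy above can be sketched as a small selector. This is an illustration only, assuming a tracking-database entry is a dict with the documented field names; choose_slide_source is not one of the vault scripts.

```python
def choose_slide_source(talk: dict) -> str:
    """Pick slide_source per the documented preference hierarchy.

    Illustrative helper — assumes the DB entry uses the field names
    pptx_path, slides_url, video_url as documented above.
    """
    has_pptx = bool(talk.get("pptx_path"))
    has_pdf = bool(talk.get("slides_url"))
    if has_pptx and has_pdf:
        return "both"
    if has_pptx:
        return "pptx"
    if has_pdf:
        return "pdf"
    if talk.get("video_url"):
        return "video_extracted"
    return "none"  # transcript-only path → processed_partial
```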
Vault discovery — the canonical path is always ~/.claude/rhetoric-knowledge-vault/. If the vault exists, resolve vault_root and read tracking-database.json. If it does not, use AskUserQuestion to choose a location, create the directory (and a symlink if a custom path is chosen), and initialize an empty tracking-database.json with empty config, talks, and pptx_catalog.

Config bootstrapping — ask once per missing field and persist to the tracking database. Core fields: shownotes (enabled, source.type, source.path_or_url, source.talks_subdir, url.base, url.template, thumbnail_path_template, slug_convention), pptx_source_dir, python_path, template_skip_patterns.
See references/schemas-db.md for the full schema
and ../vault-profile/references/schemas-config.md
for field-by-field semantics and migration notes.
Scan for new talks: Build the talks directory path as
{shownotes.source.path_or_url}/{shownotes.source.talks_subdir}; glob *.md
there; parse and add any file not yet in talks[] (title, conference, date,
URLs, status "pending"). For remote_url or none source types, skip the
scan — the vault ingests only the talks the speaker has already registered
elsewhere. Extract
video_url, slides_url from frontmatter/links. Parse IDs from URLs:

- youtube_id: extract the v= parameter from YouTube URLs (e.g., https://www.youtube.com/watch?v=aBcDeFg → youtube_id: "aBcDeFg")
- google_drive_id: extract the file ID from Google Drive URLs (e.g., https://drive.google.com/file/d/1AbCdEfGhIjK/view → google_drive_id: "1AbCdEfGhIjK")

Default status is always "pending" for new entries.
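Both ID extractions can be done with stdlib parsing. A minimal sketch — these helpers are illustrative, not part of the vault scripts:

```python
import re
from urllib.parse import urlparse, parse_qs

def parse_youtube_id(url: str):
    # youtube_id comes from the v= query parameter
    qs = parse_qs(urlparse(url).query)
    return qs.get("v", [None])[0]

def parse_google_drive_id(url: str):
    # the file ID sits between /file/d/ and the next slash
    m = re.search(r"/file/d/([^/]+)", url)
    return m.group(1) if m else None
```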
Scan for .pptx files: Recursively glob **/*.pptx in pptx_source_dir; fuzzy-match
to talks[] entries. Report counts. See references/schemas-db.md
for the PPTX extraction output schema (per-slide visual data, shape types, global design stats).
Run scripts/pptx-extraction.py for extraction.
Pattern taxonomy migration: See references/processing-rules.md for migration
logic. In brief: talks with status "processed" or "processed_partial" that
lack pattern_observations are marked "needs-reprocessing".
Read rhetoric-style-summary.md and slide-design-spec.md. Report:
"X processed, Y remaining. PPTX: A cataloged, B matched, C extracted."
Select talks with status pending or needs-reprocessing. Determine slide_source per the hierarchy above. Mark "skipped_no_sources" only if video_url is entirely absent — a talk with no video_url is not processable regardless of whether slides exist. If $ARGUMENTS specifies a talk filename or title, process ONLY that one. Per batch: launch 5 subagents in parallel, wait, run Step 4 (Persist Subagent Results), then run Step 5 (Update Rhetoric Summary), then move to the next batch. When all batches have finished, proceed to Step 6.
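The batch loop amounts to chunking the work list into groups of 5. A minimal sketch (the function name is hypothetical):

```python
def batches(talks: list, size: int = 5):
    # Yield successive batches; each batch launches its subagents in
    # parallel, then Steps 4 and 5 run before the next batch starts.
    for i in range(0, len(talks), size):
        yield talks[i:i + size]
```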
Each subagent receives the talk's DB entry and current rhetoric-style-summary.md.
A. Download transcript and acquire slides.
Transcript download:
```
yt-dlp --write-auto-sub --sub-lang "en,ru,he,fr,de,es,ja" --skip-download --sub-format vtt \
  -o "{vault_root}/transcripts/{youtube_id}" "https://www.youtube.com/watch?v={youtube_id}"
```

The downloaded subtitle file carries a language suffix (e.g., {youtube_id}.ru.vtt). Record the detected language as delivery_language. After download, clean the VTT:

```
python3 scripts/vtt-cleanup.py "{vault_root}/transcripts/{youtube_id}.{lang}.vtt"
```

The cleaned transcript lands at transcripts/{youtube_id}.txt. If captions are unavailable, fall back to youtube_transcript_api:

```
"{python_path}" -c "
from youtube_transcript_api import YouTubeTranscriptApi
transcript = YouTubeTranscriptApi.get_transcript('{youtube_id}', languages=['en','ru','he','fr','de'])
for entry in transcript:
    print(entry['text'])
" > "{vault_root}/transcripts/{youtube_id}.txt"
```

If that also fails, extract the audio and transcribe locally:

```
ffmpeg -i "{vault_root}/slides-rebuild/{youtube_id}/{youtube_id}.mp4" \
  -vn -acodec libmp3lame "{vault_root}/slides-rebuild/{youtube_id}/{youtube_id}.mp3"
```

Transcribe with MLX Whisper (mlx_whisper.transcribe()) or OpenAI Whisper, and set transcript_source: "whisper". Mark the talk processed_partial if audio extraction fails.

Slide acquisition per slide_source:
- pptx/both: run scripts/pptx-extraction.py <path.pptx>.
- pdf: download via gdown:

```
"{python_path}" -m gdown "https://drive.google.com/uc?id={google_drive_id}" \
  -O "{vault_root}/slides/{google_drive_id}.pdf"
```

- video_extracted: download the video at 720p, then run scripts/video-slide-extraction.py:

```
yt-dlp -f "bestvideo[height<=720][ext=mp4]+bestaudio[ext=m4a]/best[height<=720][ext=mp4]/best[height<=720]" \
  --merge-output-format mp4 \
  -o "{vault_root}/slides-rebuild/{youtube_id}/{youtube_id}.mp4" \
  "https://www.youtube.com/watch?v={youtube_id}"
python3 scripts/video-slide-extraction.py \
  "{vault_root}/slides-rebuild/{youtube_id}/{youtube_id}.mp4" \
  "{vault_root}/slides-rebuild/{youtube_id}" "{youtube_id}"
```

The extracted PDF lands at slides/{youtube_id}.pdf. Delete the video after extraction. For batch downloads, use scripts/batch-download-videos.sh <vault_root> ID1 ID2 ....

- none: transcript-only; status processed_partial. If video_url exists, fall back to video extraction — a talk can still reach "processed" status this way.

Set transcript_source on the talk entry: youtube_auto (yt-dlp captions), whisper (local transcription), or manual. This field is required — downstream tools use it to gauge transcript reliability.
B. Analyze for Rhetoric & Style (NOT content). Apply all 14 dimensions from
references/rhetoric-dimensions.md (including dimension 14: Areas for Improvement).
Follow language policy and verbatim-quote rules in references/processing-rules.md.
Key rule: all verbatim quotes must be English-first — "English translation" (original text). Never quote non-English text without an English translation preceding it.
B2. Tag Presentation Patterns. Scan observations against the pattern taxonomy
at skills/presentation-creator/references/patterns/_index.md. Skip patterns
marked observable: false. Record confidence (strong/moderate/weak) and evidence per
pattern. Compute per-talk score: count(patterns) − count(antipatterns). Store in
pattern_observations. See references/processing-rules.md for full tagging rules.
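The per-talk score is a simple count difference. A sketch, assuming the set of antipattern IDs is known from the taxonomy index (the index lookup itself is not shown):

```python
def talk_score(observations: list, antipattern_ids: set) -> int:
    """count(patterns) - count(antipatterns).

    Illustrative only — assumes each observation dict carries a
    pattern_id, and antipattern_ids comes from the pattern taxonomy.
    """
    antis = sum(1 for o in observations if o["pattern_id"] in antipattern_ids)
    pats = len(observations) - antis
    return pats - antis
```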
C. Return JSON per the subagent return schema in references/schemas-db.md. Minimal structure:
```
{
  "talk_id": "...",
  "status": "processed",
  "transcript_source": "youtube_auto",
  "slide_source": "pdf",
  "pattern_observations": [
    {"pattern_id": "...", "confidence": "strong", "evidence": "..."}
  ],
  "new_patterns": ["..."],
  "summary_updates": [{"section": 1, "content": "..."}],
  "structured_data": {"delivery_language": "en", "co_presenter": false}
}
```

Runs after each batch inside Step 3's loop (not as a separate post-loop phase). Mechanical persistence of the batch's subagent JSON returns:
status, processed_date, all result fields.
Persist pattern_observations IDs + score. Populate structured fields
(co_presenter, delivery_language, etc.) — do not leave structured data
buried in free-text prose. See references/processing-rules.md for field extraction rules.

Write {vault_root}/analyses/{talk_filename}.md for each processed talk: all 14 dimensions, structured data, verbatim examples, and a "Presentation Patterns Scoring" section. Create the analyses/ directory if missing. Proceed immediately to Step 5.
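Step 4's mechanical persistence can be sketched as follows. This is an illustration only — it assumes each talks[] entry carries an "id" field matching the subagent's talk_id; see references/schemas-db.md for the real schema.

```python
import json
from datetime import date
from pathlib import Path

def persist_batch(db_path: Path, results: list) -> None:
    """Merge a batch of subagent JSON returns into tracking-database.json.

    Sketch only — field names follow the minimal subagent structure
    shown above; the "id" key on talk entries is an assumption.
    """
    db = json.loads(db_path.read_text())
    by_id = {t["id"]: t for t in db["talks"]}
    for r in results:
        talk = by_id[r["talk_id"]]
        talk["status"] = r["status"]
        talk["processed_date"] = date.today().isoformat()
        talk["transcript_source"] = r["transcript_source"]
        talk["slide_source"] = r["slide_source"]
        talk["pattern_observations"] = [o["pattern_id"] for o in r["pattern_observations"]]
        # lift structured fields to the top level, not free-text prose
        talk.update(r.get("structured_data", {}))
    db_path.write_text(json.dumps(db, indent=2))
```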
Still per-batch (continues Step 3's loop). The summary update is a separate
step from Step 4's persistence because it requires a speaker-review gate —
unlike DB writes, edits to rhetoric-style-summary.md change the speaker's
ground-truth narrative and must not be applied silently.
Present summary_updates and new_patterns as a section-by-section diff and wait for explicit speaker confirmation. Silent application erodes the speaker's sense of ownership of their own style summary; pattern-taxonomy additions in particular drift if applied unreviewed. Only bypass the gate if the speaker pre-authorized this batch ("just apply everything, don't ask").

After confirmation, merge new_patterns and summary_updates into rhetoric-style-summary.md. Sections 1–14 map to the 14 dimensions; Section 15 aggregates improvement areas; Section 16 captures speaker-confirmed intent. Recount status from the DB every time — never increment manually.

When Step 3's batch loop finishes, proceed to Step 6.
Runs once after all Step 3 batches have completed.
Process PPTX files not yet extracted during Step 3: unmatched catalog entries, talks
that used PDF as primary but have a PPTX available, or entries with
pptx_visual_status: "pending". Skip if already "extracted".
Run scripts/pptx-extraction.py <path.pptx> for each file.
PPTX matching rules: The .pptx files are in Conference/Year/TalkName.pptx and
shownotes entries have conference and title fields. Fuzzy-match by: normalize
conference names (strip year, "Days", "Conference"), match by date proximity and title
substring. Skip files with "static" in name, conflict copies matching (N).pptx, and
files matching config.template_skip_patterns. Some talks have multiple .pptx files
(one per delivery) — match to the closest date.
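The normalization and fuzzy-match rules above can be sketched with stdlib helpers. Illustrative only — the real matching logic may differ, and the similarity cutoff here is an assumption:

```python
import re
from difflib import SequenceMatcher

def normalize_conference(name: str) -> str:
    # Strip year, "Days", "Conference" before comparing names
    name = re.sub(r"\b(19|20)\d{2}\b", "", name)
    name = re.sub(r"\b(days|conference)\b", "", name, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", name).strip().lower()

def title_matches(pptx_stem: str, talk_title: str, cutoff: float = 0.6) -> bool:
    # Substring match first, then fuzzy ratio as a fallback
    a, b = pptx_stem.lower(), talk_title.lower()
    return a in b or b in a or SequenceMatcher(None, a, b).ratio() >= cutoff

def should_skip(filename: str, skip_patterns: list) -> bool:
    # "static" files, conflict copies like "(2).pptx", and configured patterns
    if "static" in filename.lower():
        return True
    if re.search(r"\(\d+\)\.pptx$", filename):
        return True
    return any(re.search(p, filename) for p in skip_patterns)
```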
After 3+ extractions, populate slide-design-spec.md; after 5+, analyze cross-talk
patterns (colors, fonts, footers).
Proceed immediately to Step 7.
If {vault_root}/speaker-profile.json exists, invoke Skill(skill: "vault-profile")
with the updated tracking database. Report the diff of changes (added fields,
changed values) so the speaker can verify.
If the profile doesn't exist, skip this step silently.
Proceed immediately to Step 8.
If no talks were newly processed in this run, finish here without further action.
Otherwise, scan the newly-processed talks for delivery date. For any talk whose
date is within the past 7 days, explicitly recommend running
Skill(skill: "vault-clarification") NOW — memory of the delivery is freshest
right after the talk, and verbal beats that didn't appear in auto-captions
(bilingual jokes rendered in a non-primary language, improvised asides, fly-bys
that weren't in the deck) need speaker confirmation while they're still
recoverable.
Surface these as candidate clarification topics in the recommendation:

- each areas_for_improvement entry;
- pattern_observations the subagent flagged as unverifiable from the transcript alone (low confidence, heavy reliance on visual cues, non-English dialogue without captions).

For older talks (30+ days), recommend the compressed clarification session instead of the full one — memory has decayed and detailed recall is unreliable. For talks in the 7–30 day window, recommend the full session but note that some verbatim details may be lost.
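The recommendation windows can be expressed as a small classifier. A sketch only — the tier labels are hypothetical, not skill identifiers:

```python
from datetime import date

def clarification_tier(delivery: date, today: date) -> str:
    """Map talk age to a clarification recommendation tier."""
    age = (today - delivery).days
    if age <= 7:
        return "full-now"          # memory freshest: run the full session NOW
    if age < 30:
        return "full-with-caveat"  # full session; some verbatim detail may be lost
    return "compressed"            # 30+ days: memory decayed, compressed session
```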
| Transcript | Slides (PPTX/PDF) | Video | Status | Action |
|---|---|---|---|---|
| OK | OK | — | processed | Full analysis |
| OK | FAIL | OK | processed | Extract slides from video, then full analysis |
| OK | FAIL | FAIL | processed_partial | Transcript only (no visual analysis) |
| FAIL | OK | — | processed_partial | Slides only |
| FAIL | FAIL | OK | processed_partial | Extract slides from video, visual only |
| FAIL | FAIL | FAIL | skipped_download_failed | Skip, move on |
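The failure matrix above maps directly to a status function — a sketch of the table's logic:

```python
def outcome(transcript_ok: bool, slides_ok: bool, video_ok: bool) -> str:
    """Return the talk status per the transcript/slides/video matrix."""
    if transcript_ok and (slides_ok or video_ok):
        return "processed"              # full analysis (extract from video if needed)
    if transcript_ok:
        return "processed_partial"      # transcript only, no visual analysis
    if slides_ok or video_ok:
        return "processed_partial"      # slides or video-extracted slides only
    return "skipped_download_failed"    # nothing usable: skip, move on
```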
Create transcripts/, slides/, analyses/ dirs if missing.

If video slide extraction yields too many near-duplicate pages, try raising --threshold to 14-16, manually specifying slide_region crop coordinates, or accepting the bloated PDF and having the analysis subagent sample frames at intervals. Best results come from fullscreen slide recordings (Devoxx, JFokus); worst from meetup/DevOpsDays audience-camera recordings.

For Whisper transcripts, set transcript_source: "whisper", cross-reference against visible slide text, and note quality issues (e.g., transcript_quality: "partial").

If the video turns out to feature a different speaker, set is_baruch_talk: false and set status to skipped if the speaker doesn't match.