Two-skill presentation system: analyze your speaking style into a rhetoric knowledge vault, then create new presentations that match your documented patterns. Includes an 88-entry Presentation Patterns taxonomy for scoring, brainstorming, and go-live preparation.
Quality: 96% (does it follow best practices?) · Impact: 96% · 1.57× average score across 15 eval scenarios
Build a rhetoric and style knowledge base by analyzing presentation talks. Each run
processes unprocessed talks, extracts rhetoric/style observations, and updates the
running summary. All paths are relative to vault root (config.vault_root).
| File / Reference | Purpose |
|---|---|
| tracking-database.json | Source of truth — talks, status, config, confirmed intents |
| rhetoric-style-summary.md | Running rhetoric & style narrative (the constitution) |
| slide-design-spec.md | Visual design rules from PDF + PPTX analysis |
| speaker-profile.json | Machine-readable bridge to presentation-creator |
| analyses/{talk_filename}.md | Per-talk rhetoric analysis (one file per processed talk) |
| transcripts/{youtube_id}.txt | Downloaded/cleaned transcripts |
| slides/{id}.pdf | Slide PDFs (from Google Drive, PPTX export, or video extraction) |
| references/schemas.md | DB + subagent schemas; full config field list |
| references/rhetoric-dimensions.md | 14 analysis dimensions |
| references/pptx-extraction.md | Visual extraction script |
| references/speaker-profile-schema.md | Profile JSON schema |
| references/download-commands.md | yt-dlp + gdown commands |
| references/video-slide-extraction.md | Extract slides from video when no PDF/PPTX exists |
A talk is processable when it has video_url. Slide sources, in order of preference:
1. pptx_path → richest data (exact colors, fonts, shapes via python-pptx)
2. slides_url → download PDF from Google Drive
3. video_url → extract slides from the video using ffmpeg + perceptual dedup
4. none → transcript-only analysis (status processed_partial)

The slide_source field tracks which path was used: "pptx", "pdf", "both",
"video_extracted", or "none". The pptx_catalog array fuzzy-matches .pptx
files to shownotes entries.
Read tracking-database.json (create with empty config, talks, pptx_catalog if missing).
Config bootstrapping — ask once per missing field and persist to the tracking database.
Core fields: vault_root, talks_source_dir, pptx_source_dir, python_path
(auto-detect: {vault_root}/.venv/bin/python3, then python3 on PATH),
template_skip_patterns (default: ["template"]).
See references/schemas.md for the full config field list (including speaker infrastructure fields).
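The python_path auto-detection described above can be sketched as follows (a minimal sketch; detect_python_path is a hypothetical helper name, not part of the skill):

```python
import os
import shutil

def detect_python_path(vault_root):
    """Prefer the vault's venv interpreter, then fall back to python3 on PATH."""
    venv_python = os.path.join(vault_root, ".venv", "bin", "python3")
    if os.path.isfile(venv_python) and os.access(venv_python, os.X_OK):
        return venv_python
    return shutil.which("python3")  # None if nothing is found
```

If both probes fail, the bootstrapping step would fall back to asking the user, per the config rules above.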
Scan for new talks: Glob *.md in talks_source_dir. For each file not in the
talks array, parse and add (extract title, conference, date, URLs, IDs, status "pending").
Scan for .pptx files: Glob **/*.pptx in pptx_source_dir (skip *static*,
conflict copies, template matches). Fuzzy-match to talks[] entries. Report counts.
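The fuzzy match between catalog entries and talk titles could be done with stdlib difflib (a sketch; the 0.6 threshold and helper names are assumptions, not part of the skill):

```python
import difflib
import re

def normalize(name):
    """Lowercase, strip the .pptx extension and non-alphanumerics for comparison."""
    name = re.sub(r"\.pptx$", "", name.lower())
    return re.sub(r"[^a-z0-9]+", " ", name).strip()

def fuzzy_match(pptx_files, talks, threshold=0.6):
    """Pair each .pptx path with the best-matching talk title, if any clears the threshold."""
    matches = {}
    for path in pptx_files:
        filename = normalize(path.split("/")[-1])
        scored = [(difflib.SequenceMatcher(None, filename, normalize(t["title"])).ratio(),
                   t["title"]) for t in talks]
        best = max(scored, default=(0, None))
        if best[0] >= threshold:
            matches[path] = best[1]
    return matches
```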
Pattern taxonomy migration: If the pattern taxonomy exists
(skills/presentation-creator/references/patterns/_index.md) but any talks with
status "processed" or "processed_partial" have no pattern_observations (or
pattern_observations.pattern_ids is empty), mark them "needs-reprocessing" with
reprocess_reason: "pattern_scoring_added". Report: "N talks need reprocessing for
pattern scoring."
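A minimal sketch of this migration pass, assuming the DB shape described in references/schemas.md (the helper name is hypothetical):

```python
def mark_for_reprocessing(db, taxonomy_exists):
    """Flag processed talks that predate pattern scoring; return how many were flagged."""
    if not taxonomy_exists:
        return 0
    count = 0
    for talk in db["talks"]:
        obs = talk.get("pattern_observations") or {}
        if (talk.get("status") in ("processed", "processed_partial")
                and not obs.get("pattern_ids")):
            talk["status"] = "needs-reprocessing"
            talk["reprocess_reason"] = "pattern_scoring_added"
            count += 1
    return count
```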
Read rhetoric-style-summary.md and slide-design-spec.md. Report state:
"X processed, Y remaining. PPTX: A cataloged, B matched, C extracted."
- Select talks with status pending or needs-reprocessing.
- Determine slide_source per the hierarchy above. Mark "skipped_no_sources" only if video_url is missing entirely.
- If $ARGUMENTS specifies a talk filename or title, process ONLY that one.
- Per batch: launch 5 subagents in parallel, wait, run Step 4, then start the next batch.
Each subagent receives the talk's DB entry and current rhetoric-style-summary.md.
A. Download transcript and acquire slides:
YouTube talks (default):
```
yt-dlp --write-auto-sub --sub-lang en --skip-download --sub-format vtt \
  -o "{vault_root}/transcripts/{youtube_id}" "https://www.youtube.com/watch?v={youtube_id}"
```

Non-YouTube talks (InfoQ, conference platforms, etc.): yt-dlp supports many
sites beyond YouTube. When video_url is not a YouTube link:
1. Run yt-dlp -f http_audio to download audio (MP3/M4A).
2. Transcribe locally:

```python
import mlx_whisper
result = mlx_whisper.transcribe(audio_path,
    path_or_hf_repo='mlx-community/whisper-large-v3-turbo',
    language='en')  # or 'ru', etc.
```

3. Save the transcript to {vault_root}/transcripts/{talk_id}.txt.
4. Set transcript_source: "whisper" on the talk entry (vs "youtube_auto" for YouTube).

This enables ingestion from InfoQ, Vimeo, conference-hosted video, or any source
yt-dlp supports. Falls back to processed_partial (slides only) if audio extraction fails.
Slide acquisition per slide_source (see hierarchy above):
- pptx/both: Run the references/pptx-extraction.md script. Store results in structured_data.pptx_visual.
- pdf: Download via gdown or use a locally provided PDF.
- video_extracted: Run the references/video-slide-extraction.md pipeline (download 720p video → ffmpeg frames → auto-detect slide region → perceptual dedup → PDF). Delete the video after extraction; keep only the PDF. Analyze like any other slide PDF.
- none: Transcript-only analysis, status processed_partial.

B. Analyze for Rhetoric & Style (NOT content). Apply all 14 dimensions (including dimension 14: Areas for Improvement).
Language policy — the vault is English-only. All analysis output, rhetoric summary updates, tracking DB entries, and profile data MUST be written in English regardless of the talk's delivery language. For non-English talks:
- Translate verbatim quotes, keeping the original in parentheses: "That's the whole point" (В этом весь смысл).
- Tag language-specific signature phrases (e.g. [ru] "получается что") — do NOT merge them into the main English signature list.
- Record delivery_language in the tracking DB.

B2. Tag Presentation Patterns. Scan observations against the pattern taxonomy
index at skills/presentation-creator/references/patterns/_index.md (path relative
to skill root). Skip patterns marked observable: false — these are pre-event logistics
and physical stage behaviors that cannot be detected from transcripts or slides. For each
observable pattern/antipattern, determine if the talk exhibits it (strong/moderate/weak
confidence), record evidence, and compute per-talk pattern score:
count(patterns) - count(antipatterns). Return in the pattern_observations field.
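The per-talk score is a simple difference of counts; assuming each observation carries a "kind" field (an assumed shape, see references/schemas.md for the real one), it might look like:

```python
def pattern_score(observations):
    """Per-talk pattern score: count(patterns) - count(antipatterns)."""
    n_patterns = sum(1 for o in observations if o["kind"] == "pattern")
    n_antipatterns = sum(1 for o in observations if o["kind"] == "antipattern")
    return n_patterns - n_antipatterns
```

A negative score thus signals a talk dominated by antipatterns.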
C. Return JSON per the subagent return schema (see references/schemas.md).
Process PPTX files not already extracted during Step 3: unmatched catalog entries,
talks that used PDF as primary source but have a PPTX available, or any entry with
pptx_visual_status: "pending". Skip if already "extracted".
After extraction: Store in structured_data.pptx_visual, set status to
"extracted". After 3+ extractions, fill slide-design-spec.md; after 5+, analyze
cross-talk patterns (colors, fonts, footers).
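Cross-talk font analysis could be a frequency count over the extracted decks (a sketch; the fonts list inside pptx_visual is an assumed shape):

```python
from collections import Counter

def cross_talk_fonts(talks, min_talks=5):
    """Return fonts recurring across extracted decks, once enough talks are extracted."""
    extracted = [t for t in talks
                 if t.get("structured_data", {}).get("pptx_visual")]
    if len(extracted) < min_talks:
        return {}  # not enough data for cross-talk conclusions yet
    counts = Counter(f for t in extracted
                     for f in t["structured_data"]["pptx_visual"].get("fonts", []))
    return {font: n for font, n in counts.items() if n >= 2}
```

The same shape would work for colors and footers by swapping the key being counted.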
After each batch:
- Update each talk's status, processed_date, and all result fields.
- Backfill empty structured_data from earlier runs using rhetoric_notes.
- Persist pattern_observations IDs + score to each talk entry.
Structured field extraction: When the analysis identifies co-presenters,
delivery language, or other structured metadata, populate the corresponding
DB fields (co_presenter, delivery_language, etc.) — do NOT leave
structured data buried only in rhetoric_notes free text.
Write {vault_root}/analyses/{talk_filename}.md containing the full
rhetoric analysis (all 14 dimensions, structured data, verbatim examples, and
a "Presentation Patterns Scoring" section listing detected patterns/antipatterns
with confidence levels, evidence, and the per-talk pattern score).
These files are read by the presentation-creator when adapting existing talks.
Create the analyses/ directory if it doesn't exist.
Update the rhetoric summary with new_patterns and summary_updates.
Be additive; never delete. Sections 1-14 map to the 14 rhetoric dimensions; Section 15
aggregates areas for improvement; Section 16 captures speaker-confirmed intent.
Recount status from DB every time. The summary's ## Status block must be
rewritten by counting the tracking DB — never increment manually, never trust the
existing status line. Count: total talks, processed, skipped (by reason), languages,
co-presenters. The DB is the source of truth; the summary is a derived view.

| Transcript | Slides (PPTX/PDF) | Video | Status | Action |
|---|---|---|---|---|
| OK | OK | — | processed | Full analysis (best quality) |
| OK | FAIL | OK | processed | Extract slides from video, then full analysis |
| OK | FAIL | FAIL | processed_partial | Transcript only (no visual analysis) |
| FAIL | OK | — | processed_partial | Slides only |
| FAIL | FAIL | OK | processed_partial | Extract slides from video, visual only |
| FAIL | FAIL | FAIL | skipped_download_failed | Skip, move on |
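The recount-from-DB rule above can be sketched as follows (field names beyond those in the text are assumptions):

```python
from collections import Counter

def recount_status(db):
    """Rebuild the ## Status block counts from the tracking DB, never from the old summary."""
    statuses = Counter(t.get("status", "pending") for t in db["talks"])
    languages = Counter(t.get("delivery_language", "en") for t in db["talks"])
    return {
        "total": len(db["talks"]),
        "processed": statuses["processed"] + statuses["processed_partial"],
        "skipped": {s: n for s, n in statuses.items() if s.startswith("skipped")},
        "languages": dict(languages),
    }
```

Because the counts are recomputed from scratch on every run, a stale or hand-edited summary can never drift from the DB.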
After all batches complete. Purpose: resolve ambiguities, validate findings, capture intent.
5A. Rhetoric Clarification: For each surprising, contradictory, or ambiguous observation from this run, ask one topic at a time via AskUserQuestion:
Update the summary and tracking DB after each answer.
5A-bis. Blind Spot Moments: The skill can only analyze transcripts (speech) and slides (visuals). It CANNOT observe audience reactions, physical performance, stage movement, costume/prop moments, room energy, or laughter/applause. During analysis, flag moments where the transcript or slides suggest something happened that the skill cannot measure — then ask the speaker about each one. Examples:
These blind spots are inherent to transcript+slides analysis. Asking about them captures
data that no amount of parsing can recover. Store responses as blind_spot_observations
in the talk's tracking DB entry and integrate into the rhetoric summary.
5B. Speaker Infrastructure (first session only): Ask for any empty config fields
(speaker_name through publishing_process.*). See references/schemas.md for the full field list.
5C. Structured Intent Capture: Compile confirmed intents from 5A into structured entries and store in confirmed_intents array in the tracking DB:
```json
{"pattern": "delayed_self_introduction", "intent": "deliberate",
 "rule": "Two-phase intro: brief bio slide 3, full re-intro mid-talk",
 "note": "Confirmed intentional rhetorical device"}
```

5D. Mark session complete: Increment config.clarification_sessions_completed in
the tracking DB. This counter gates profile generation (Step 6).
When: 10+ talks parsed AND config.clarification_sessions_completed >= 1. Also on explicit request.
Process:
- Read rhetoric-style-summary.md, slide-design-spec.md, and confirmed_intents.
- Aggregate structured_data from processed talks (skip empty entries, fall back to prose).
- If template_pptx_path is set, extract layouts via python-pptx:
```python
from pptx import Presentation

prs = Presentation(template_path)
for i, layout in enumerate(prs.slide_layouts):
    print(f"{i}: {layout.name} — {[p.placeholder_format.type for p in layout.placeholders]}")
```

- Write speaker-profile.json per references/speaker-profile-schema.md. Map config → speaker/infrastructure, summary sections → instrument_catalog/presentation_modes, confirmed intents → rhetoric_defaults, aggregated data → pacing/guardrail_sources, pattern observations → pattern_profile.
- Save to {vault_root}/speaker-profile.json.
- Draw on pattern_profile data; personalize to THIS speaker.
- Auto-trigger: Step 4 calls this after every vault update (if profile exists).
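A minimal sketch of the profile assembly, with field names beyond those listed above being illustrative only:

```python
import json

def build_speaker_profile(config, summary_sections, confirmed_intents, pattern_obs):
    """Assemble speaker-profile.json from the mapping described above."""
    profile = {
        "speaker": {"name": config.get("speaker_name")},
        "infrastructure": {k: v for k, v in config.items()
                           if k.startswith("publishing_process")},
        "instrument_catalog": summary_sections.get("instruments", []),
        "rhetoric_defaults": [i["rule"] for i in confirmed_intents],
        "pattern_profile": pattern_obs,
    }
    return json.dumps(profile, indent=2, ensure_ascii=False)
```

The real schema lives in references/speaker-profile-schema.md; this only illustrates the direction of the mapping.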
Create transcripts/, slides/, analyses/ dirs if missing.

Install with Tessl CLI:

```
npx tessl i jbaruch/speaker-toolkit@0.6.2
```

Repository layout:

```
evals/
  scenario-1 … scenario-15
skills/
  presentation-creator/
    references/
      patterns/
    build
    deliver
    prepare
  rhetoric-knowledge-vault/
```