Summarize any video by analyzing both audio and visuals. Downloads via yt-dlp, extracts transcript (YouTube captions or Whisper), pulls scene-detected keyframes, and produces a multimodal summary with clickable timestamped YouTube links. Use this skill whenever the user wants to summarize a YouTube video, digest a talk or tutorial, get notes from a video, extract key points from a recording, or says things like "tl;dw", "summarize this video", "what's in this video", or pastes a YouTube URL and asks for a summary. Also triggers for non-YouTube URLs that yt-dlp supports.
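The description mentions "clickable timestamped YouTube links" without showing what one looks like. The sketch below is one plausible way to build such a link in Markdown, assuming the common `watch?v=<id>&t=<seconds>s` URL form. The function names `format_timestamp` and `timestamped_link` are hypothetical and are not taken from the skill itself.

```python
def format_timestamp(seconds: int) -> str:
    """Render a second offset as M:SS, or H:MM:SS for offsets of an hour or more."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def timestamped_link(video_id: str, seconds: int) -> str:
    """Build a Markdown link that opens the video at the given offset.

    Assumes the watch?v=<id>&t=<n>s URL form; the skill may use a different one.
    """
    url = f"https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s"
    return f"[{format_timestamp(seconds)}]({url})"


if __name__ == "__main__":
    # VIDEO_ID is a placeholder, not a real video.
    print(timestamped_link("VIDEO_ID", 95))
```

A summary bullet could then end with the returned link, so a reader can jump straight to the moment being described.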
93%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the marks. It provides specific concrete actions (downloading, transcript extraction, keyframe pulling, summary generation), includes a comprehensive set of natural trigger terms users would actually say, and clearly delineates both what the skill does and when it should be used. The description is distinctive enough to avoid conflicts with other skills while being thorough without being unnecessarily verbose.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: downloads via yt-dlp, extracts transcript (YouTube captions or Whisper), pulls scene-detected keyframes, produces multimodal summary with clickable timestamped YouTube links. | 3 / 3 |
| Completeness | Clearly answers both 'what' (downloads, extracts transcript, pulls keyframes, produces multimodal summary) and 'when' with an explicit 'Use this skill whenever...' clause listing multiple trigger scenarios and exact user phrases. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural user terms: 'summarize a YouTube video', 'digest a talk or tutorial', 'get notes from a video', 'extract key points from a recording', 'tl;dw', 'summarize this video', 'what's in this video', 'pastes a YouTube URL'. These are highly natural phrases users would actually say. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive niche focused on video summarization with specific tooling (yt-dlp, Whisper, keyframe extraction). The combination of video-specific triggers and YouTube URL mentions makes it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted, highly actionable skill for a genuinely complex multi-step pipeline. The workflow is clearly sequenced with appropriate validation gates, parallel execution is explicitly marked, and every step has concrete commands. The main weakness is that the skill is quite long for a single file — some sections (output template, asset preparation) could be extracted to reference documents to improve progressive disclosure and reduce token cost.
Suggestions
Extract the full markdown output template and depth-level variations into a separate reference file (e.g., references/OUTPUT-FORMAT.md) to reduce the main skill's token footprint.
Move the asset preparation details (thumbnail conversion, screenshot selection logic) into a reference file, keeping only a brief summary in the main skill.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly long but most content is necessary for the complex multi-step pipeline. Some sections could be tightened, e.g., the detailed explanation of contact sheet mapping, the auto-fallback behavior description, and the markdown template could be more compact. However, it avoids explaining basic concepts Claude already knows. | 2 / 3 |
| Actionability | Every step includes concrete, executable bash commands with the exact script invocations and arguments. The markdown output template is fully specified, YouTube deep link format is explicit, and the flag table provides clear defaults. The contact sheet logic, frame file naming convention, and asset preparation steps are all copy-paste actionable. | 3 / 3 |
| Workflow Clarity | The 8-step pipeline is clearly sequenced with explicit validation checkpoints: Step 0 blocks on dependency verification with re-run instruction, Step 3 suggests threshold adjustment if too few frames are detected, Step 5 handles subagent failures with re-run and gap flagging, and the short-video vs long-video branching logic is clearly specified. Parallel execution points are explicitly called out. | 3 / 3 |
| Progressive Disclosure | The skill references one external file (references/SUBAGENT-PROMPT.md) appropriately, and the pipeline overview provides a good high-level map. However, the main file is quite long (~200+ lines of detailed instructions) and some content (like the full markdown output template, the asset preparation details, or the depth-level variations) could be split into reference files to keep the main skill leaner. | 2 / 3 |
| Total | | 10 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
Reviewed