catalan-adobe/video-digest

Summarize any video by analyzing both audio and visuals. Downloads via yt-dlp, extracts transcript (YouTube captions or Whisper), pulls scene-detected keyframes, and produces a multimodal summary with clickable timestamped YouTube links. Use this skill whenever the user wants to summarize a YouTube video, digest a talk or tutorial, get notes from a video, extract key points from a recording, or says things like "tl;dw", "summarize this video", "what's in this video", or pastes a YouTube URL and asks for a summary. Also triggers for non-YouTube URLs that yt-dlp supports.
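The description mentions clickable timestamped YouTube links. As a minimal sketch of what building such a link could look like (the function names and the Markdown link shape are hypothetical, not taken from the skill itself), the standard `t=<seconds>s` query parameter on a watch URL can be used:

```python
from urllib.parse import urlencode


def youtube_deep_link(video_id: str, seconds: int) -> str:
    """Return a watch URL that starts playback at the given offset."""
    # `t=<N>s` is the usual YouTube start-time parameter.
    query = urlencode({"v": video_id, "t": f"{int(seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def format_timestamp(seconds: int) -> str:
    """Render seconds as M:SS, or H:MM:SS for offsets of an hour or more."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def markdown_timestamp_link(video_id: str, seconds: int) -> str:
    """Hypothetical helper: a Markdown link whose label is the timestamp."""
    return f"[{format_timestamp(seconds)}]({youtube_deep_link(video_id, seconds)})"
```

The skill's own link format may differ; this only illustrates the general idea of pairing a human-readable timestamp with a deep link.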

Quality: 93%. Does it follow best practices?

Impact: Pending. No eval scenarios have been run.

Security (by Snyk): Advisory. Suggest reviewing before use.

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that hits all the marks. It provides specific concrete actions (downloading, transcript extraction, keyframe pulling, summary generation), includes a comprehensive set of natural trigger terms users would actually say, and clearly delineates both what the skill does and when it should be used. The description is distinctive enough to avoid conflicts with other skills while being thorough without being unnecessarily verbose.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: downloads via yt-dlp, extracts transcript (YouTube captions or Whisper), pulls scene-detected keyframes, produces multimodal summary with clickable timestamped YouTube links. | 3 / 3 |
| Completeness | Clearly answers both 'what' (downloads, extracts transcript, pulls keyframes, produces multimodal summary) and 'when' with an explicit 'Use this skill whenever...' clause listing multiple trigger scenarios and exact user phrases. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural user terms: 'summarize a YouTube video', 'digest a talk or tutorial', 'get notes from a video', 'extract key points from a recording', 'tl;dw', 'summarize this video', 'what's in this video', 'pastes a YouTube URL'. These are highly natural phrases users would actually say. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive niche focused on video summarization with specific tooling (yt-dlp, Whisper, keyframe extraction). The combination of video-specific triggers and YouTube URL mentions makes it very unlikely to conflict with other skills. | 3 / 3 |

Total: 12 / 12 (Passed)

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-crafted, highly actionable skill for a genuinely complex multi-step pipeline. The workflow is clearly sequenced with appropriate validation gates, parallel execution is explicitly marked, and every step has concrete commands. The main weakness is that the skill is quite long for a single file: some sections (output template, asset preparation) could be extracted to reference documents to improve progressive disclosure and reduce token cost.

Suggestions

Extract the full markdown output template and depth-level variations into a separate reference file (e.g., references/OUTPUT-FORMAT.md) to reduce the main skill's token footprint.

Move the asset preparation details (thumbnail conversion, screenshot selection logic) into a reference file, keeping only a brief summary in the main skill.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is fairly long but most content is necessary for the complex multi-step pipeline. Some sections could be tightened (e.g., the detailed explanation of contact sheet mapping, the auto-fallback behavior description, and the markdown template could be more compact). However, it avoids explaining basic concepts Claude already knows. | 2 / 3 |
| Actionability | Every step includes concrete, executable bash commands with the exact script invocations and arguments. The markdown output template is fully specified, YouTube deep link format is explicit, and the flag table provides clear defaults. The contact sheet logic, frame file naming convention, and asset preparation steps are all copy-paste actionable. | 3 / 3 |
| Workflow Clarity | The 8-step pipeline is clearly sequenced with explicit validation checkpoints: Step 0 blocks on dependency verification with re-run instruction, Step 3 suggests threshold adjustment if too few frames are detected, Step 5 handles subagent failures with re-run and gap flagging, and the short-video vs long-video branching logic is clearly specified. Parallel execution points are explicitly called out. | 3 / 3 |
| Progressive Disclosure | The skill references one external file (references/SUBAGENT-PROMPT.md) appropriately, and the pipeline overview provides a good high-level map. However, the main file is quite long (~200+ lines of detailed instructions) and some content (like the full markdown output template, the asset preparation details, or the depth-level variations) could be split into reference files to keep the main skill leaner. | 2 / 3 |

Total: 10 / 12 (Passed)
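The Workflow Clarity reasoning mentions adjusting a threshold when too few keyframes are detected. The skill's actual scripts are not shown on this page, so the following is only a sketch under stated assumptions: it assumes scene detection is done with ffmpeg's `select` filter and its `scene` score, and the function names, default threshold, and `frame_%04d.jpg` naming are all hypothetical.

```python
from pathlib import Path


def scene_detect_cmd(video: str, out_dir: str, threshold: float = 0.3) -> list[str]:
    """Build an ffmpeg argument list that saves one JPEG per detected scene change.

    Assumes ffmpeg's `select` filter; `scene` is a 0..1 change score per frame.
    """
    pattern = str(Path(out_dir) / "frame_%04d.jpg")  # hypothetical naming scheme
    return [
        "ffmpeg", "-i", video,
        # Keep only frames whose scene-change score exceeds the threshold.
        "-vf", f"select='gt(scene,{threshold})'",
        # Emit frames at variable rate; older ffmpeg builds use `-vsync vfr` instead.
        "-fps_mode", "vfr",
        pattern,
    ]


def lower_threshold(threshold: float, step: float = 0.1, floor: float = 0.1) -> float:
    """Return a more sensitive threshold for a retry, never going below `floor`."""
    return max(floor, round(threshold - step, 2))
```

A caller could run the command with `subprocess.run(cmd, check=True)`, count the files written, and retry with `lower_threshold(...)` if too few frames came out.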

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.
