Implement specialized video understanding capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to analyze video content, understand motion and temporal sequences, extract information from video frames, describe video scenes, or perform video-based AI analysis. Optimized for MP4, AVI, MOV, and other common video formats.
57
66%
Does it follow best practices?
Impact
No eval scenarios have been run
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/video-understand/SKILL.md
Quality
Discovery
89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid skill description that clearly communicates its purpose and when to use it. The explicit 'Use this skill when...' clause with multiple trigger scenarios and the inclusion of specific file formats strengthen discoverability. The main weakness is that the capability descriptions are somewhat generic—terms like 'analyze video content' and 'understand motion' could be more concrete with specific operations.
Suggestions
Replace generic phrases like 'analyze video content' and 'understand motion and temporal sequences' with more concrete actions such as 'detect objects across frames', 'summarize video narratives', or 'generate frame-by-frame descriptions'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | It names the domain (video understanding) and mentions some actions like 'analyze video content', 'extract information from video frames', 'describe video scenes', but these are somewhat generic and not as concrete as listing truly specific operations (e.g., 'detect objects in frames', 'generate transcripts', 'track motion between frames'). | 2 / 3 |
| Completeness | Clearly answers both 'what' (video understanding capabilities including analyzing content, extracting frame info, describing scenes) and 'when' (explicit 'Use this skill when...' clause with multiple trigger scenarios). The 'Use when' clause is explicit and detailed. | 3 / 3 |
| Trigger Term Quality | Good coverage of natural terms users would say: 'video', 'analyze video', 'video frames', 'video scenes', 'MP4', 'AVI', 'MOV', 'motion', 'temporal sequences'. These are terms users would naturally use when requesting video analysis tasks. | 3 / 3 |
| Distinctiveness Conflict Risk | The combination of video-specific triggers, the named SDK (z-ai-web-dev-sdk), specific video formats (MP4, AVI, MOV), and temporal/motion analysis creates a clear niche that is unlikely to conflict with image analysis, audio processing, or general file handling skills. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation
42%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill is highly actionable with executable, complete code examples and clear CLI usage, but it is severely bloated with repetitive patterns—nearly every example is the same createVision call with a different prompt string. The content would benefit enormously from showing the pattern once and listing prompt variations concisely. The monolithic structure with no progressive disclosure compounds the token waste.
Suggestions
Consolidate the 10+ nearly-identical createVision examples into one canonical pattern, then provide a concise table or list of prompt templates for different use cases (sports, education, moderation, etc.)
Split advanced use cases (multi-turn conversation, batch processing, Express/Next.js integration) into separate referenced files to reduce the main SKILL.md to an overview with navigation
Remove the overview bullet list of capabilities and the 'Common Use Cases' numbered list—these describe what Claude already understands and add no actionable value
Add concrete validation steps for batch processing (e.g., verify response structure, implement retry logic for failed videos) to improve workflow clarity
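The first and last suggestions above can be sketched together: one shared call pattern, a small table of prompt templates, and a batch loop that validates each response and retries failures. This is a minimal sketch and not the skill's actual code. The `AnalyzeFn` signature is a hypothetical wrapper around whatever call the skill makes (the review names `createVision`, whose exact parameters are not shown here), and the prompt strings, retry count, and backoff delay are illustrative assumptions.

```typescript
// Hypothetical wrapper type: stands in for the skill's real video call,
// whose exact signature is not shown in this review.
type AnalyzeFn = (videoUrl: string, prompt: string) => Promise<string>;

// One prompt per use case instead of one full code example per use case.
// The prompt wording is illustrative only.
const PROMPTS = {
  sports: "Describe the key plays and player movements in this video.",
  education: "Summarize the main concepts taught in this video.",
  moderation: "List any content in this video that may violate a content policy.",
} as const;

type UseCase = keyof typeof PROMPTS;

interface BatchResult {
  videoUrl: string;
  ok: boolean;
  text?: string;
  error?: string;
  attempts: number;
}

// Validation step: an empty or whitespace-only answer counts as a failure.
function isValidAnswer(text: unknown): text is string {
  return typeof text === "string" && text.trim().length > 0;
}

// Calls `analyze` up to `maxAttempts` times, doubling the delay between tries.
async function analyzeWithRetry(
  analyze: AnalyzeFn,
  videoUrl: string,
  useCase: UseCase,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<BatchResult> {
  let lastError = "unknown error";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const text = await analyze(videoUrl, PROMPTS[useCase]);
      if (isValidAnswer(text)) {
        return { videoUrl, ok: true, text, attempts: attempt };
      }
      lastError = "empty response";
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
    if (attempt < maxAttempts) {
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return { videoUrl, ok: false, error: lastError, attempts: maxAttempts };
}

// Processes videos one at a time so a single failure never aborts the batch.
async function analyzeBatch(
  analyze: AnalyzeFn,
  videoUrls: string[],
  useCase: UseCase,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  for (const url of videoUrls) {
    results.push(await analyzeWithRetry(analyze, url, useCase, maxAttempts, baseDelayMs));
  }
  return results;
}
```

Passing the call in as a function keeps the pattern independent of the SDK and lets the retry logic be exercised with a stub. A caller can then filter the returned list on `ok` to report which videos still failed after all attempts.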
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~500+ lines. Massive repetition of the same API pattern (createVision call) across 10+ examples that differ only in the prompt string. The overview lists capabilities Claude already understands, and many use cases (sports analysis, educational summarization, content moderation) are just prompt variations wrapped in identical boilerplate. Could be reduced to ~20% of its size. | 1 / 3 |
| Actionability | All code examples are fully executable with proper imports, async/await patterns, error handling, and concrete usage examples. CLI commands are copy-paste ready with clear flag explanations. The Express.js and Next.js integration examples are complete and functional. | 3 / 3 |
| Workflow Clarity | The skill is mostly single-step (call the API), so complex workflows aren't strictly needed. However, the batch processing section lacks validation/verification steps (no check that results are valid, no retry logic for failures beyond catching errors). The recommended approach section mentions chunking long videos but doesn't provide a concrete workflow for it. | 2 / 3 |
| Progressive Disclosure | Monolithic wall of text with no references to external files despite mentioning a scripts directory. All content is inline: the advanced use cases, integration examples, and troubleshooting could easily be split into separate files. The reference to `{Skill Location}/scripts/video-understand.ts` is mentioned but no bundle files exist to support it. | 1 / 3 |
| Total | | 7 / 12 Passed |
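The Workflow Clarity row notes that chunking long videos is mentioned without a concrete workflow. A first step of such a workflow might look like the sketch below, which only plans the time windows. The function name `planChunks`, the `Chunk` shape, and the overlap parameter are hypothetical and do not come from the skill. Cutting the segments (for example with an external tool) and submitting each one for analysis are deliberately left out.

```typescript
// Hypothetical shape for one planned segment of a long video.
interface Chunk {
  index: number;
  startSec: number;
  endSec: number;
}

// Splits [0, durationSec] into windows of at most `chunkSec` seconds.
// Consecutive windows share `overlapSec` seconds so that an event on a
// boundary is fully visible in at least one window.
function planChunks(durationSec: number, chunkSec: number, overlapSec = 0): Chunk[] {
  if (!(durationSec > 0) || !(chunkSec > 0)) {
    throw new RangeError("durationSec and chunkSec must be positive");
  }
  if (overlapSec < 0 || overlapSec >= chunkSec) {
    throw new RangeError("overlapSec must be in [0, chunkSec)");
  }
  const step = chunkSec - overlapSec;
  const chunks: Chunk[] = [];
  for (let start = 0; start < durationSec; start += step) {
    const end = Math.min(start + chunkSec, durationSec);
    chunks.push({ index: chunks.length, startSec: start, endSec: end });
    if (end === durationSec) break; // last window reached the end of the video
  }
  return chunks;
}

// A 100-second video in 40-second windows with 5 seconds of overlap
// yields the windows 0-40, 35-75, and 70-100.
const plan = planChunks(100, 40, 5);
```

Each planned window can then be analyzed separately and the per-window answers merged in `index` order. Whether overlap helps depends on the content, so it defaults to zero here.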
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (917 lines); consider splitting into references/ and linking | Warning |
| Total | 10 / 11 Passed | |
52b2597