Use this skill when you need to read, inspect, or extract content from PDF files — especially when file content is NOT in your context and you need to read it from disk. Covers content inventory, text extraction, page rasterization for visual inspection, embedded image/attachment/table/form-field extraction, and choosing the right reading strategy for different document types (text-heavy, scanned, slide-decks, forms, data-heavy). Do NOT use this skill for PDF creation, form filling, merging, splitting, watermarking, or encryption — use the pdf skill instead.
90
88%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly defines its scope (reading and extracting from PDFs), provides rich trigger terms, and explicitly delineates boundaries with a related skill. The inclusion of specific document types (scanned, slide-decks, forms, data-heavy) and the negative boundary ('Do NOT use this skill for...') make it highly effective for skill selection. The only minor note is the use of second-person 'you' in the opening, but since it's addressed to Claude (the agent selecting the skill) rather than the user, this is contextually appropriate.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: content inventory, text extraction, page rasterization for visual inspection, embedded image/attachment/table/form-field extraction, and choosing reading strategies for different document types (text-heavy, scanned, slide-decks, forms, data-heavy). | 3 / 3 |
| Completeness | Clearly answers both 'what' (content inventory, text extraction, page rasterization, embedded content extraction, reading strategy selection) and 'when' ('when you need to read, inspect, or extract content from PDF files', especially 'when file content is NOT in your context and you need to read it from disk'). Also explicitly states when NOT to use it, which further clarifies scope. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms: 'read', 'inspect', 'extract content', 'PDF files', 'text extraction', 'scanned', 'forms', 'tables', 'images', 'attachments'. These cover many natural ways a user would phrase requests about reading PDFs. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive: explicitly differentiates itself from a sibling 'pdf skill' for creation/modification tasks. The focus on reading/extraction vs. creation/manipulation creates a clear boundary, and the 'Do NOT use this skill for...' clause directly addresses potential conflicts. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, highly actionable skill that provides comprehensive coverage of PDF reading operations with executable code and CLI commands throughout. Its main weakness is length — the body content is thorough but could benefit from moving some detailed sections (image extraction with PyMuPDF, rare media, font diagnostics) to REFERENCE.md to keep the main skill leaner. The decision-tree approach for choosing reading strategies and the token cost awareness section are particularly valuable additions.
Suggestions
Move detailed/edge-case sections (PyMuPDF image extraction, rare embedded media, font diagnostics) to REFERENCE.md and add brief pointers from the main skill to reduce body length by ~30%.
Trim explanatory context that Claude already knows (e.g., 'PDFs can contain embedded files — spreadsheets, data files, other documents' and the paragraph explaining two attachment mechanisms) to improve conciseness.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is generally well-written but includes some unnecessary explanations that Claude would already know (e.g., explaining what PDF attachments are, what vector graphics are, what CMYK is). The 'Two attachment mechanisms exist' paragraph and some contextual explanations could be trimmed. However, most content is practical and earns its place. | 2 / 3 |
| Actionability | Excellent actionability throughout: every section provides fully executable code snippets and CLI commands that are copy-paste ready. The content covers multiple tools with concrete examples for each use case, including specific flags, output handling, and edge cases like the pdftoppm filename padding gotcha. | 3 / 3 |
| Workflow Clarity | The skill provides a clear diagnostic-first workflow (content inventory → choose strategy → extract), with explicit decision trees for when to rasterize vs. text-extract. The 'Choosing your reading strategy' section acts as a clear routing guide. Validation is addressed through the content inventory step and font diagnostics for troubleshooting garbled output. | 3 / 3 |
| Progressive Disclosure | The skill references REFERENCE.md for advanced features (pypdfium2, OCR fallback, encrypted PDFs) and correctly points to the pdf skill for non-reading operations. However, the main file is quite long (200+ lines of detailed content), and some sections, such as embedded image extraction with PyMuPDF, rare media content, and font diagnostics, could be moved to the reference file. The quick reference table at the end is a nice touch, but the body could be leaner with more content offloaded. | 2 / 3 |
| Total | | 10 / 12 Passed |
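To make the diagnostic-first workflow praised in the Workflow Clarity row concrete, here is a minimal sketch of the "content inventory → choose strategy" routing step. It is not taken from the skill itself: the `PageInventory` fields, the `choose_strategy` function, the strategy labels, and the 200-character threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageInventory:
    """Hypothetical per-page summary produced by a content inventory step."""
    text_chars: int                # characters of extractable text on the page
    image_count: int               # embedded raster images on the page
    has_form_fields: bool = False  # page carries interactive form fields


def choose_strategy(page: PageInventory, min_text_chars: int = 200) -> str:
    """Route a page to a reading strategy (labels are illustrative assumptions)."""
    if page.has_form_fields:
        # Form values live in field data, so read the fields directly.
        return "form-fields"
    if page.text_chars < min_text_chars:
        # Little or no text layer: probably scanned or slide-like, so render it.
        return "rasterize"
    # A healthy text layer is cheaper to read than a rendered image.
    return "text-extract"


if __name__ == "__main__":
    pages = [
        PageInventory(text_chars=3000, image_count=0),
        PageInventory(text_chars=12, image_count=1),
        PageInventory(text_chars=500, image_count=0, has_form_fields=True),
    ]
    for number, page in enumerate(pages, start=1):
        print(number, choose_strategy(page))
```

Checking form fields before text density is a design choice in this sketch, not something the review specifies; a real skill would tune both the ordering and the threshold per document type.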
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
b27906e