Straightforward text extraction from document files (text-based PDF only for now, no OCR or docx). Use when you just need to read/extract text from binary documents.
Need to extract text from a PDF? Use this Python snippet:
from pypdf import PdfReader

reader = PdfReader("document.pdf")
# Join with newlines so text from adjacent pages does not run together
text = "\n".join(page.extract_text() for page in reader.pages)
print(text)

Or from the command line:
uvx --with pypdf python /path/to/extract_pdf_text.py document.pdf

from pypdf import PdfReader

# Read all pages
reader = PdfReader("file.pdf")
for page in reader.pages:
    text = page.extract_text()
    print(text)

from pypdf import PdfReader

reader = PdfReader("file.pdf")
# Get the first five pages: pages 1-5, which are indices 0-4
for page in reader.pages[0:5]:
    print(page.extract_text())

This skill includes scripts/extract_pdf_text.py for command-line extraction:
# Extract all pages to stdout
python extract_pdf_text.py document.pdf
# Extract to file
python extract_pdf_text.py document.pdf --output text.txt
# Extract specific pages
python extract_pdf_text.py document.pdf --pages 1-5
python extract_pdf_text.py document.pdf --pages 1,3,5

uvx --with pypdf python <script>

"No text extracted": The PDF may be scanned (image-based) and contain no text layer. Extracting text from such files requires additional OCR tools, which this skill does not include.
"Encoding errors": pypdf handles most encodings, but some PDFs still produce garbled text. In recent pypdf releases, page.extract_text(extraction_mode="layout") offers layout-aware extraction and may be worth trying.
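One way to spot the "No text extracted" case before reading the output is to check which pages came back empty. The sketch below is illustrative and not part of the bundled script. The helper names find_empty_pages and report_empty_pages are hypothetical, and the approach assumes that a page with no extractable text is probably scanned.

```python
def find_empty_pages(page_texts):
    """Return 1-based numbers of pages whose extracted text is empty or whitespace."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if not (text or "").strip()
    ]


def report_empty_pages(pdf_path):
    """Extract every page with pypdf and list the pages that yielded no text."""
    # Imported here so find_empty_pages stays usable without pypdf installed
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    texts = [page.extract_text() for page in reader.pages]
    return find_empty_pages(texts)
```

If report_empty_pages("document.pdf") lists every page, the file is likely image-based and needs OCR. A few empty pages in an otherwise readable file are more often blank pages or full-page figures.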
Future: Support for DOCX, XLSX, and other formats coming soon.
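For reference, the --pages values shown earlier (a range such as 1-5 or a list such as 1,3,5) could be mapped to pypdf page indices roughly as follows. This is a minimal sketch, not the bundled script's actual parser. The function name parse_page_spec is hypothetical, and it assumes page numbers are 1-based and that ranges running past the end of the document are clamped.

```python
def parse_page_spec(spec, page_count):
    """Turn a 1-based spec such as "1-5" or "1,3,5" into sorted 0-based indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Clamp to the document length and convert to 0-based indices
        indices.update(range(start - 1, min(end, page_count)))
    return sorted(indices)
```

The resulting indices can be used directly, for example reader.pages[i] for each i in parse_page_spec("1,3,5", len(reader.pages)).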