read-bin-docs

Straightforward text extraction from document files (text-based PDF only for now, no OCR or docx). Use when you just need to read/extract text from binary documents.

2.04x

Quality

87%

Does it follow best practices?

Impact

100%

2.04x

Average score across 3 eval scenarios

Securityby

Passed

No known issues

Doc Formats

Quick Start: Extract Text from PDF

Need to extract text from a PDF? Use this Python snippet:

from pypdf import PdfReader

reader = PdfReader("document.pdf")
text = "".join(page.extract_text() for page in reader.pages)
print(text)

Or from the command line:

uvx --with pypdf python /path/to/extract_pdf_text.py document.pdf

PDF Text Extraction

Basic Usage

from pypdf import PdfReader

# Read all pages
reader = PdfReader("file.pdf")
for page in reader.pages:
    text = page.extract_text()
    print(text)

Extract Specific Pages

from pypdf import PdfReader

reader = PdfReader("file.pdf")
# Get pages 1-5 (0-indexed)
for page in reader.pages[0:5]:
    print(page.extract_text())

Using the Script

This skill includes scripts/extract_pdf_text.py for command-line extraction:

# Extract all pages to stdout
python extract_pdf_text.py document.pdf

# Extract to file
python extract_pdf_text.py document.pdf --output text.txt

# Extract specific pages
python extract_pdf_text.py document.pdf --pages 1-5
python extract_pdf_text.py document.pdf --pages 1,3,5

Requirements

pypdf: uvx --with pypdf python <script>
Works with most text-based PDFs
Scanned PDFs without OCR won't extract text

Common Issues

"No text extracted": The PDF may be scanned (image-based) without OCR. OCR support requires additional tools.

"Encoding errors": pypdf handles most encodings, but some PDFs may have encoding issues. Use page.extract_text(layout=True) for layout-aware extraction if available.

Future: Support for DOCX, XLSX, and other formats coming soon.

Repository: YPares/agent-skills
Commit: ac633fa

Last updated: 5 days ago
Created: 5 days ago

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.