Guide for implementing Google Gemini API document processing - analyze PDFs with native vision to extract text, images, diagrams, charts, and tables. Use when processing documents, extracting structured data, summarizing PDFs, answering questions about document content, or converting documents to structured formats. (project)
Process and analyze PDF documents using Google Gemini's native vision capabilities. Extract structured information, summarize content, answer questions, and understand complex documents with text, images, diagrams, charts, and tables.
Use this skill when you need to:
The skill checks for GEMINI_API_KEY in this priority order:
.env file in skill directory (.claude/skills/gemini-document-processing/.env)
.env file in project root
Get your API key: https://aistudio.google.com/apikey
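The lookup order above can be sketched as a small resolver. This is a minimal illustration, not the skill's actual implementation: the function names `read_env_file` and `resolve_api_key` are hypothetical, the .env parser handles only simple KEY=VALUE lines, and it assumes the process environment is consulted before either .env file.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_env_file(path: Path, name: str) -> Optional[str]:
    """Return the value of `name` from a simple KEY=VALUE .env file, or None."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding quotes; treat an empty value as missing.
            return value.strip().strip("\"'") or None
    return None


def resolve_api_key(skill_dir: Path, project_root: Path,
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical resolver: environment first, then skill .env, then project .env."""
    # Assumption: an exported GEMINI_API_KEY wins over any .env file.
    value = environ.get("GEMINI_API_KEY")
    if value:
        return value
    # Skill directory takes priority over the project root.
    for directory in (skill_dir, project_root):
        value = read_env_file(directory / ".env", "GEMINI_API_KEY")
        if value:
            return value
    return None
```

Calling `resolve_api_key(Path(".claude/skills/gemini-document-processing"), Path("."))` returns the first key found, or `None` if no location defines one.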
Option A: Environment Variable (Recommended)
export GEMINI_API_KEY="your-api-key-here"
Option B: Skill Directory
cd .claude/skills/gemini-document-processing
echo "GEMINI_API_KEY=your-api-key-here" > .env
Option C: Project Root
echo "GEMINI_API_KEY=your-api-key-here" > .env
pip install google-genai python-dotenv
# Use the provided script
python .claude/skills/gemini-document-processing/scripts/process-document.py \
  --file invoice.pdf \
  --prompt "Extract invoice details as JSON" \
  --format json
# Process and summarize
python .claude/skills/gemini-document-processing/scripts/process-document.py \
  --file report.pdf \
  --prompt "Provide a concise executive summary"
# Q&A on document content
python .claude/skills/gemini-document-processing/scripts/process-document.py \
  --file contract.pdf \
  --prompt "What are the key terms and conditions?"

from google import genai
client = genai.Client()
# Read PDF
with open('document.pdf', 'rb') as f:
    pdf_data = f.read()
# Process document
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Extract key information from this document',
        genai.types.Part.from_bytes(
            data=pdf_data,
            mime_type='application/pdf'
        )
    ]
)
print(response.text)

from google import genai
from pydantic import BaseModel

# Schema the model's JSON output must follow
class InvoiceData(BaseModel):
    invoice_number: str
    date: str
    total: float
    vendor: str

client = genai.Client()
# Read the PDF with a context manager so the file handle is closed
with open('invoice.pdf', 'rb') as f:
    pdf_data = f.read()
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Extract invoice details',
        genai.types.Part.from_bytes(
            data=pdf_data,
            mime_type='application/pdf'
        )
    ],
    config=genai.types.GenerateContentConfig(
        response_mime_type='application/json',
        response_schema=InvoiceData
    )
)
# Validate the JSON response against the schema
invoice_data = InvoiceData.model_validate_json(response.text)

PDF < 20MB?
├─ Yes → Use inline base64 encoding
└─ No → Use File API
Need structured JSON output?
├─ Yes → Define response_schema with Pydantic
└─ No → Get text response
Multiple queries on same PDF?
├─ Yes → Use File API + Context Caching
└─ No → Inline encoding is sufficient

The skill includes a ready-to-use processing script:
# Basic usage
python scripts/process-document.py --file document.pdf --prompt "Your prompt"
# With JSON output
python scripts/process-document.py --file document.pdf --prompt "Extract data" --format json
# With File API (for large files)
python scripts/process-document.py --file large-document.pdf --prompt "Summarize" --use-file-api
# Multiple prompts
python scripts/process-document.py --file document.pdf --prompt "Question 1" --prompt "Question 2"

For comprehensive documentation, see:
references/gemini-document-processing-report.md - Complete API reference
references/quick-reference.md - Quick lookup guide
references/code-examples.md - Additional code patterns

API Key Not Found:
# Check API key is set
./scripts/check-api-key.sh

File Too Large:
Add the --use-file-api flag to the script

Vision Not Working: