Tessl Tile for pypi/pymupdf@1.26.0

# PyMuPDF

A high-performance Python library for data extraction, analysis, conversion, and manipulation of PDF (and other) documents. PyMuPDF provides comprehensive PDF processing capabilities built on top of the MuPDF C library, enabling developers to extract text, images, and metadata, manipulate document content, and render pages to various formats.

## Package Information

- **Package Name**: PyMuPDF
- **Language**: Python
- **Installation**: `pip install PyMuPDF`
- **Minimum Python Version**: 3.9

## Core Imports

```python
import pymupdf
```

Legacy compatibility (still supported):

```python
import fitz  # Maps to pymupdf
```

## Basic Usage

```python
import pymupdf

# Open a document
doc = pymupdf.open("document.pdf")  # Same as pymupdf.Document("document.pdf")

# Extract text from all pages with the Page.get_text() method
text = ""
for page in doc:
    text += page.get_text()

# Get document metadata
metadata = doc.metadata

# Save and close
doc.save("output.pdf")
doc.close()
```

## Architecture

PyMuPDF follows a hierarchical document model:

- **Document**: Top-level container representing the entire document (PDF, XPS, EPUB, etc.)
- **Page**: Individual pages containing content, annotations, and links
- **Pixmap**: Raster image representation for rendering and image processing
- **TextPage**: Text extraction and analysis with layout information
- **Geometry Classes**: Matrix, Rect, Point, Quad for coordinate transformations and positioning

The library provides both high-level convenience methods and low-level access to document structures, enabling everything from simple text extraction to complex document manipulation and rendering.

## Capabilities

### Document Operations

Core document handling including opening, saving, and metadata management. Supports PDF, XPS, EPUB, MOBI, CBZ, SVG and other formats with comprehensive document manipulation capabilities.

```python { .api }
# Note: open() is an alias for the Document constructor
open = Document

class Document:
    def __init__(self, filename: str = None, stream: bytes = None, filetype: str = None,
                 rect: Rect = None, width: int = 0, height: int = 0, fontsize: int = 11): ...
    def save(self, filename: str, **kwargs) -> None: ...
    def close(self) -> None: ...
    def load_page(self, page_num: int) -> Page: ...
    @property
    def page_count(self) -> int: ...
    @property
    def metadata(self) -> dict: ...
```

[Document Operations](./document-operations.md)

### Page Content Extraction

Text and image extraction from document pages with multiple output formats, search capabilities, and layout analysis. Includes support for structured text extraction with formatting information.

```python { .api }
# Text extraction functions; these are normally called as Page methods,
# e.g. page.get_text("text") or page.get_textbox(rect).
# The return type of get_text depends on the option (str, list, or dict).
def get_text(page: Page, option: str = "text", **kwargs) -> str: ...
def get_text_blocks(page: Page, **kwargs) -> list: ...
def get_text_words(page: Page, **kwargs) -> list: ...
def get_textbox(page: Page, rect: Rect, **kwargs) -> str: ...

class Page:
    def get_textpage(self, **kwargs) -> TextPage: ...
    def search_for(self, needle: str, **kwargs) -> list: ...
    def get_images(self, **kwargs) -> list: ...
    def get_links(self) -> list: ...
```

[Page Content Extraction](./page-content-extraction.md)

### Document Rendering

High-performance rendering of document pages to raster images such as PNG and JPEG. Supports custom resolutions, color spaces, and rendering options.

```python { .api }
class Page:
    def get_pixmap(self, **kwargs) -> Pixmap: ...

class Pixmap:
    def save(self, filename: str, **kwargs) -> None: ...
    def tobytes(self, output: str = "png") -> bytes: ...
    @property
    def width(self) -> int: ...
    @property
    def height(self) -> int: ...
```

[Document Rendering](./document-rendering.md)

### Annotations and Forms

Comprehensive annotation handling including creation, modification, and deletion of various annotation types. Support for interactive forms and form field manipulation.

```python { .api }
class Annot:
    def set_info(self, content: str = None, **kwargs) -> None: ...
    def set_rect(self, rect: Rect) -> None: ...
    def update(self) -> None: ...
    @property
    def type(self) -> list: ...

class Page:
    # Annotations are deleted through their page, not through the Annot object
    def delete_annot(self, annot: Annot) -> Annot: ...  # the annotation following the deleted one, if any
```

[Annotations and Forms](./annotations-forms.md)

### Geometry and Transformations

Coordinate system handling with matrices, rectangles, points, and quads for precise positioning and transformations. Essential for layout manipulation and coordinate calculations.

```python { .api }
class Matrix:
    # Matrix() without arguments is the zero matrix;
    # use Matrix(1, 1) or pymupdf.Identity for the identity.
    def __init__(self, a: float, b: float, c: float,
                 d: float, e: float, f: float): ...
    def prerotate(self, deg: float) -> Matrix: ...
    def prescale(self, sx: float, sy: float) -> Matrix: ...

class Rect:
    def __init__(self, x0: float, y0: float, x1: float, y1: float): ...
    def transform(self, matrix: Matrix) -> Rect: ...
    @property
    def width(self) -> float: ...
    @property
    def height(self) -> float: ...
```

[Geometry and Transformations](./geometry-transformations.md)

### Table Extraction

Table detection and extraction with support for table structure analysis, cell content extraction, and export to pandas DataFrames.

```python { .api }
class Table:
    def extract(self) -> list: ...
    def to_pandas(self) -> 'pandas.DataFrame': ...

class TableFinder:
    def __init__(self, page: Page): ...
    tables: list  # Table objects detected on the page

class Page:
    # Usual entry point for table detection
    def find_tables(self, **kwargs) -> TableFinder: ...
```

[Table Extraction](./table-extraction.md)

### Document Creation and Modification

Creating new documents and modifying existing ones including page insertion, deletion, and content manipulation. Support for adding text, images, and other content elements.

```python { .api }
class Document:
    # pno is the insertion position; -1 appends the page at the end
    def new_page(self, pno: int = -1, width: float = 595, height: float = 842) -> Page: ...
    def delete_page(self, pno: int) -> None: ...
    def insert_pdf(self, docsrc: Document, **kwargs) -> int: ...

class Page:
    def insert_text(self, point: Point, text: str, **kwargs) -> int: ...
    def insert_image(self, rect: Rect, **kwargs) -> None: ...
```

[Document Creation and Modification](./document-creation-modification.md)

## Types

```python { .api }
class Document:
    """Main document class for PDF and other document formats."""

class Page:
    """Represents a single page in a document."""

class Pixmap:
    """Raster image representation with pixel data."""

class TextPage:
    """Text extraction with layout and formatting information."""

class Annot:
    """Document annotation (note, highlight, etc.)."""

class Matrix:
    """2D transformation matrix for coordinate transformations."""

class Rect:
    """Rectangle defined by four coordinates (x0, y0, x1, y1)."""

class Point:
    """2D point with x and y coordinates."""

class Quad:
    """Quadrilateral defined by four corner points."""

class Font:
    """Font representation for text operations."""

class Archive:
    """Archive file handling for compressed documents."""

class TextWriter:
    """Utility for writing text with advanced formatting."""

class Shape:
    """Drawing operations for vector graphics."""

# Exception types
class FileDataError(RuntimeError):
    """Raised when file data is corrupted or invalid."""

class FileNotFoundError(RuntimeError):
    """Raised when requested file cannot be found."""

class EmptyFileError(FileDataError):
    """Raised when file is empty or contains no data."""
```
