# File Handling

File and resource management for content from local files, URLs, data URIs, and byte streams. The file handling system provides unified access to these sources, with automatic MIME type detection, caching, and path resolution.

## Capabilities

### Unified File Access

Main function for getting file objects from various sources with automatic type detection and path resolution.

```python { .api }
def getFile(*a, **kw):
    """
    Get file object from various sources (paths, URLs, data URIs).

    Args:
        *a: Positional arguments passed to pisaFileObject
        **kw: Keyword arguments passed to pisaFileObject

    Returns:
        pisaFileObject: Unified file object for resource access
    """
```

### File Object Handler

Unified file object that handles different types of URI sources with a consistent interface for content access and MIME type detection.

```python { .api }
class pisaFileObject:
    def __init__(self, uri, basepath=None, callback=None):
        """
        Initialize file object for various URI types.

        Args:
            uri (str): File URI - can be:
                - Local file path: "/path/to/file.jpg"
                - HTTP/HTTPS URL: "https://example.com/image.png"
                - Data URI: "..."
                - File URI: "file:///path/to/file.css"
            basepath (str): Base path for resolving relative paths
            callback (callable): Custom URI resolution callback
                Signature: callback(uri, rel) -> resolved_uri
        """

    def getFileContent(self):
        """
        Get raw file content as bytes.

        Returns:
            bytes: Raw file content

        Raises:
            IOError: If file cannot be accessed
            urllib.error.URLError: If URL cannot be fetched
        """

    def getNamedFile(self):
        """
        Get named file object for the resource.

        Returns:
            file-like object: Named file object with read() method
        """

    def getData(self):
        """
        Get file data with potential processing.

        Returns:
            bytes or str: Processed file data
        """

    def getFile(self):
        """
        Get file-like object for reading.

        Returns:
            file-like object: Object with read(), seek(), tell() methods
        """

    def getMimeType(self):
        """
        Get MIME type of the file content.

        Returns:
            str: MIME type (e.g., 'text/css', 'image/png', 'text/html')
        """

    def notFound(self):
        """
        Handle file not found cases.

        Returns:
            bool: True if file was not found
        """

    def getAbsPath(self):
        """
        Get absolute path for the file.

        Returns:
            str: Absolute file path (empty string for non-file URIs)
        """

    def getBytesIO(self):
        """
        Get BytesIO object containing file content.

        Returns:
            io.BytesIO: BytesIO object with file content
        """
```

#### Usage Examples

**Load local file:**

```python
from xhtml2pdf.files import pisaFileObject

# Load local CSS file
css_file = pisaFileObject("/path/to/styles.css")
content = css_file.getFileContent().decode('utf-8')
mime_type = css_file.getMimeType()  # 'text/css'
```

**Load from URL:**

```python
from xhtml2pdf.files import pisaFileObject

# Load image from URL
img_file = pisaFileObject("https://example.com/logo.png")
if not img_file.notFound():
    image_data = img_file.getFileContent()
    mime_type = img_file.getMimeType()  # 'image/png'
```

**Load data URI:**

```python
from xhtml2pdf.files import pisaFileObject

# Load embedded data
data_uri = "data:text/css;base64,Ym9keSB7IGZvbnQtZmFtaWx5OiBBcmlhbDsgfQ=="
css_file = pisaFileObject(data_uri)
content = css_file.getFileContent().decode('utf-8')  # "body { font-family: Arial; }"
```
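
To make the data URI case concrete, the sketch below decodes a data URI using only the standard library. It is an illustration, not xhtml2pdf's implementation: `decode_data_uri` is a hypothetical helper, and it assumes a well-formed `data:` URI.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Split a data URI into (mime_type, payload_bytes).

    Hypothetical helper for illustration; not part of xhtml2pdf.
    """
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no comma separator")
    parts = header.split(";")
    # RFC 2397: an omitted media type defaults to text/plain.
    mime_type = parts[0] or "text/plain"
    if "base64" in parts[1:]:
        data = base64.b64decode(payload)
    else:
        # Non-base64 payloads are percent-encoded.
        data = unquote_to_bytes(payload)
    return mime_type, data


mime_type, data = decode_data_uri(
    "data:text/css;base64,Ym9keSB7IGZvbnQtZmFtaWx5OiBBcmlhbDsgfQ=="
)
print(mime_type, data.decode("utf-8"))
```

The media type comes from the URI header rather than from the payload, which is why a data URI can report its MIME type without any content sniffing.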

**Custom callback for path resolution:**

```python
from xhtml2pdf.files import pisaFileObject


def resolve_path(uri, rel):
    """Custom resolution for application-specific paths."""
    if uri.startswith('app://'):
        return '/app/assets/' + uri[6:]  # Convert app:// to local path
    return uri


file_obj = pisaFileObject("app://images/logo.png", callback=resolve_path)
```

### Temporary File Management

Temporary file handler for managing intermediate files during PDF generation with automatic cleanup and memory management.

```python { .api }
class pisaTempFile:
    def __init__(self, buffer="", capacity=CAPACITY):
        """
        Initialize temporary file for PDF generation.

        Args:
            buffer (str): Initial buffer content
            capacity (int): Maximum memory capacity before switching to disk
        """

    def makeTempFile(self):
        """
        Create actual temporary file on disk.

        Returns:
            file object: Temporary file object
        """

    def getFileName(self):
        """
        Get temporary file name.

        Returns:
            str: Temporary file path
        """

    def fileno(self):
        """
        Get file descriptor number.

        Returns:
            int: File descriptor
        """

    def getvalue(self):
        """
        Get current file content as bytes.

        Returns:
            bytes: File content
        """

    def write(self, value):
        """
        Write data to temporary file.

        Args:
            value (str or bytes): Data to write
        """
```

### Specialized File Handlers

Base classes and specialized handlers for different types of file sources.

```python { .api }
class BaseFile:
    def __init__(self, path, basepath):
        """
        Base class for file handlers.

        Args:
            path (str): File path or URI
            basepath (str): Base path for resolution
        """

class B64InlineURI(BaseFile):
    """Handler for base64-encoded data URIs."""

class LocalProtocolURI(BaseFile):
    """Handler for local protocol URIs (file://)."""

class NetworkFileUri(BaseFile):
    """Handler for network URIs (http://, https://)."""

class LocalFileURI(BaseFile):
    """Handler for local file system paths."""

class BytesFileUri(BaseFile):
    """Handler for byte stream content."""

class LocalTmpFile(BaseFile):
    """Handler for local temporary files."""
```

### Network and File Management

Network manager and temporary file system for handling downloads and caching.

```python { .api }
class FileNetworkManager:
    """Manager for network file operations and caching."""

class TmpFiles(threading.local):
    """Thread-local temporary files manager with automatic cleanup."""
```

### Cleanup Utilities

Utility functions for cleaning up temporary files and resources.

```python { .api }
def cleanFiles():
    """
    Clean up temporary files created during processing.

    This function should be called after PDF generation is complete
    to free up disk space and system resources.
    """
```

#### Usage Example

```python
from xhtml2pdf.files import cleanFiles
from xhtml2pdf import pisa

# Hypothetical input files; replace with your own paths
html_files = ["report.html", "invoice.html"]

try:
    # Process multiple documents
    for html_file in html_files:
        with open(html_file) as source:
            with open(f"{html_file}.pdf", "wb") as dest:
                pisa.pisaDocument(source, dest)
finally:
    # Clean up all temporary files, even if a conversion failed
    cleanFiles()
```

## File Type Support

The file handling system automatically detects and processes various file types:

### Supported MIME Types

- **Text**: `text/html`, `text/css`, `text/plain`, `text/xml`
- **Images**: `image/png`, `image/jpeg`, `image/gif`, `image/bmp`, `image/svg+xml`
- **Fonts**: `font/ttf`, `font/otf`, `application/font-woff`, `font/woff2`
- **Data**: `application/pdf`, `application/octet-stream`
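
As a rough illustration of extension-based detection, the sketch below guesses a MIME type with the standard library's `mimetypes` module and falls back to `application/octet-stream`. Whether xhtml2pdf detects types this way is an assumption, and `guess_mime` is a hypothetical helper.

```python
import mimetypes


def guess_mime(name, default="application/octet-stream"):
    """Guess a MIME type from a file name or URL path (hypothetical helper)."""
    mime_type, _encoding = mimetypes.guess_type(name)
    # guess_type returns None for unknown extensions, so fall back.
    return mime_type or default


for name in ("styles.css", "logo.png", "blob.unknownext"):
    print(name, "->", guess_mime(name))
```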

### Path Resolution

The system supports various path formats:

```python
from xhtml2pdf.files import pisaFileObject

# Absolute paths
file_obj = pisaFileObject("/absolute/path/to/file.css")

# Relative paths (with basepath)
file_obj = pisaFileObject("styles/main.css", basepath="/project/assets")

# URLs
file_obj = pisaFileObject("https://cdn.example.com/font.ttf")

# Data URIs
file_obj = pisaFileObject("data:text/css;charset=utf-8,body{margin:0}")

# File URIs
file_obj = pisaFileObject("file:///local/path/image.png")
```
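
One plausible way to resolve a URI against a base path is sketched below with the standard library. This is not xhtml2pdf's actual resolution logic; `resolve` is a hypothetical helper, and the rule that URIs with an `http`, `https`, `data`, or `file` scheme pass through unchanged is an assumption made for the example.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve(uri, basepath=None):
    """Resolve uri against basepath (hypothetical helper for illustration)."""
    if urlparse(uri).scheme in ("http", "https", "data", "file"):
        return uri  # already absolute, nothing to resolve
    if not basepath:
        return os.path.abspath(uri)
    if urlparse(basepath).scheme in ("http", "https"):
        # A trailing slash makes urljoin treat basepath as a directory.
        return urljoin(basepath.rstrip("/") + "/", uri)
    # os.path.join discards basepath when uri is itself absolute.
    return os.path.normpath(os.path.join(basepath, uri))


print(resolve("styles/main.css", basepath="/project/assets"))
print(resolve("img/a.png", basepath="https://cdn.example.com/site"))
print(resolve("https://cdn.example.com/font.ttf", basepath="/project/assets"))
```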

## Error Handling

Check `notFound()` before reading, and wrap `getFileContent()` in a `try` block to catch I/O and network errors:

```python
import urllib.error

from xhtml2pdf.files import pisaFileObject

file_obj = pisaFileObject("https://example.com/missing.png")

if file_obj.notFound():
    print("File not found, using fallback")
    # Handle missing file case
else:
    try:
        content = file_obj.getFileContent()
        # Process file content
    except (IOError, urllib.error.URLError) as e:
        print(f"Error loading file: {e}")
        # Handle network or I/O errors
```

## Performance Considerations

### Caching

The file handling system caches network resources automatically:

- **Memory caching**: Small files cached in memory
- **Disk caching**: Large files cached on disk temporarily
- **Cache invalidation**: Automatic cleanup after processing
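
The memory-caching idea can be sketched as a dictionary keyed by URL. This is only an illustration of the pattern, not the library's cache: `FetchCache` and `fake_fetch` are hypothetical names, the fetch function is injected so the example runs without network access, and disk caching is left out.

```python
class FetchCache:
    """Tiny per-run cache keyed by URL (hypothetical; not an xhtml2pdf class)."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: url -> bytes
        self._store = {}

    def get(self, url):
        # Fetch only on the first request for a given URL.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]

    def clear(self):
        # Invalidate everything, e.g. once processing is finished.
        self._store.clear()


calls = []


def fake_fetch(url):
    """Stand-in for a network download; records each call."""
    calls.append(url)
    return b"payload for " + url.encode("ascii")


cache = FetchCache(fake_fetch)
cache.get("https://example.com/logo.png")
cache.get("https://example.com/logo.png")  # served from the cache
print(len(calls))
```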

### Memory Management

A temporary file keeps its content in memory until it exceeds `capacity`, then switches to disk:

```python
from xhtml2pdf.files import pisaTempFile

# Higher threshold: content up to 64 KB stays in memory
temp_file = pisaTempFile(capacity=64*1024)

# Lower threshold: content beyond 1 KB moves to disk
temp_file = pisaTempFile(capacity=1024)
```
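
The memory-to-disk switch can be sketched as a buffer that starts in an `io.BytesIO` and moves to a real temporary file once a write would exceed the threshold. `SpillBuffer` is a hypothetical class written for this illustration; the internals of `pisaTempFile` may differ.

```python
import io
import tempfile


class SpillBuffer:
    """Hold bytes in memory until `capacity` is exceeded, then move to disk.

    Hypothetical illustration; not the pisaTempFile implementation.
    """

    def __init__(self, capacity=10 * 1024):
        self.capacity = capacity
        self._file = io.BytesIO()
        self.on_disk = False

    def write(self, data):
        if not self.on_disk and self._file.tell() + len(data) > self.capacity:
            disk = tempfile.TemporaryFile()
            disk.write(self._file.getvalue())  # copy what is buffered so far
            self._file = disk
            self.on_disk = True
        self._file.write(data)

    def getvalue(self):
        # Read everything, then restore the write position.
        pos = self._file.tell()
        self._file.seek(0)
        data = self._file.read()
        self._file.seek(pos)
        return data

    def close(self):
        self._file.close()


buf = SpillBuffer(capacity=8)
buf.write(b"1234")    # 4 bytes: still in memory
in_memory_first = not buf.on_disk
buf.write(b"567890")  # 10 bytes total: exceeds capacity, moves to disk
print(in_memory_first, buf.on_disk)
```

The standard library's `tempfile.SpooledTemporaryFile` (with its `max_size` argument) offers the same behavior ready-made.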

## Types

```python { .api }
class pisaFileObject:
    """
    Unified file object for various URI types.

    Attributes:
        uri (str): Original URI string
        basepath (str): Base path for resolution
        callback (callable): Custom resolution callback

    Handles local files, URLs, data URIs, and byte streams
    with automatic MIME type detection and content processing.
    """

class pisaTempFile:
    """
    Temporary file handler for PDF generation.

    Attributes:
        capacity (int): Memory capacity threshold
        buffer (str): Current buffer content

    Manages temporary storage during conversion process
    with automatic cleanup and memory management.
    """
```