# File Handling

The file handling system manages content from several kinds of sources: local files, URLs, data URIs, and byte streams. It provides unified access to these resources with automatic MIME type detection, caching, and path resolution.

## Capabilities

### Unified File Access

`getFile` is the main entry point for obtaining a file object from a path, URL, or data URI, with automatic type detection and path resolution.

```python { .api }
def getFile(*a, **kw):
    """
    Get file object from various sources (paths, URLs, data URIs).

    Args:
        *a: Positional arguments passed to pisaFileObject
        **kw: Keyword arguments passed to pisaFileObject

    Returns:
        pisaFileObject: Unified file object for resource access
    """
```

### File Object Handler

`pisaFileObject` is a unified file object that handles the different URI source types behind one consistent interface for content access and MIME type detection.

```python { .api }
class pisaFileObject:
    def __init__(self, uri, basepath=None, callback=None):
        """
        Initialize file object for various URI types.

        Args:
            uri (str): File URI - can be:
                - Local file path: "/path/to/file.jpg"
                - HTTP/HTTPS URL: "https://example.com/image.png"
                - Data URI: "data:image/png;base64,iVBORw0KGgo..."
                - File URI: "file:///path/to/file.css"
            basepath (str): Base path for resolving relative paths
            callback (callable): Custom URI resolution callback
                Signature: callback(uri, rel) -> resolved_uri
        """

    def getFileContent(self):
        """
        Get raw file content as bytes.

        Returns:
            bytes: Raw file content

        Raises:
            IOError: If file cannot be accessed
            urllib.error.URLError: If URL cannot be fetched
        """

    def getNamedFile(self):
        """
        Get named file object for the resource.

        Returns:
            file-like object: Named file object with read() method
        """

    def getData(self):
        """
        Get file data with potential processing.

        Returns:
            bytes or str: Processed file data
        """

    def getFile(self):
        """
        Get file-like object for reading.

        Returns:
            file-like object: Object with read(), seek(), tell() methods
        """

    def getMimeType(self):
        """
        Get MIME type of the file content.

        Returns:
            str: MIME type (e.g., 'text/css', 'image/png', 'text/html')
        """

    def notFound(self):
        """
        Handle file not found cases.

        Returns:
            bool: True if file was not found
        """

    def getAbsPath(self):
        """
        Get absolute path for the file.

        Returns:
            str: Absolute file path (empty string for non-file URIs)
        """

    def getBytesIO(self):
        """
        Get BytesIO object containing file content.

        Returns:
            io.BytesIO: BytesIO object with file content
        """
```

#### Usage Examples

**Load local file:**

```python
from xhtml2pdf.files import pisaFileObject

# Load local CSS file
css_file = pisaFileObject("/path/to/styles.css")
content = css_file.getFileContent().decode('utf-8')
mime_type = css_file.getMimeType()  # 'text/css'
```

**Load from URL:**

```python
from xhtml2pdf.files import pisaFileObject

# Load image from URL
img_file = pisaFileObject("https://example.com/logo.png")
if not img_file.notFound():
    image_data = img_file.getFileContent()
    mime_type = img_file.getMimeType()  # 'image/png'
```

**Load data URI:**

```python
from xhtml2pdf.files import pisaFileObject

# Load embedded data
data_uri = "data:text/css;base64,Ym9keSB7IGZvbnQtZmFtaWx5OiBBcmlhbDsgfQ=="
css_file = pisaFileObject(data_uri)
content = css_file.getFileContent().decode('utf-8')  # "body { font-family: Arial; }"
```
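
To see what a base64 data URI like the one above carries, the payload can be decoded with the standard library alone. This is a minimal sketch of the URI format, not xhtml2pdf's own parsing: `decode_data_uri` is a hypothetical helper, and it assumes the URI uses the `;base64` form.

```python
import base64


def decode_data_uri(uri):
    """Split a base64 data URI into (mime_type, raw_bytes).

    Hypothetical helper for illustration; handles only ';base64' URIs.
    """
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    # Strip the 'data:' prefix and the ';base64' suffix to get the MIME type.
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


mime_type, raw = decode_data_uri(
    "data:text/css;base64,Ym9keSB7IGZvbnQtZmFtaWx5OiBBcmlhbDsgfQ=="
)
print(mime_type, raw.decode("utf-8"))
```

Decoding the payload by hand like this is a quick way to check that an embedded resource contains what you expect before handing the URI to the library.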

**Custom callback for path resolution:**

```python
from xhtml2pdf.files import pisaFileObject


def resolve_path(uri, rel):
    """Custom resolution for application-specific paths."""
    if uri.startswith('app://'):
        return '/app/assets/' + uri[6:]  # Convert app:// to local path
    return uri


file_obj = pisaFileObject("app://images/logo.png", callback=resolve_path)
```

### Temporary File Management

`pisaTempFile` is a temporary file handler that manages intermediate files during PDF generation, with automatic cleanup and memory management.

```python { .api }
class pisaTempFile:
    def __init__(self, buffer="", capacity=CAPACITY):
        """
        Initialize temporary file for PDF generation.

        Args:
            buffer (str): Initial buffer content
            capacity (int): Maximum memory capacity before switching to disk
        """

    def makeTempFile(self):
        """
        Create actual temporary file on disk.

        Returns:
            file object: Temporary file object
        """

    def getFileName(self):
        """
        Get temporary file name.

        Returns:
            str: Temporary file path
        """

    def fileno(self):
        """
        Get file descriptor number.

        Returns:
            int: File descriptor
        """

    def getvalue(self):
        """
        Get current file content as bytes.

        Returns:
            bytes: File content
        """

    def write(self, value):
        """
        Write data to temporary file.

        Args:
            value (str or bytes): Data to write
        """
```

### Specialized File Handlers

Base classes and specialized handlers for different types of file sources.

```python { .api }
class BaseFile:
    def __init__(self, path, basepath):
        """
        Base class for file handlers.

        Args:
            path (str): File path or URI
            basepath (str): Base path for resolution
        """

class B64InlineURI(BaseFile):
    """Handler for base64-encoded data URIs."""

class LocalProtocolURI(BaseFile):
    """Handler for local protocol URIs (file://)."""

class NetworkFileUri(BaseFile):
    """Handler for network URIs (http://, https://)."""

class LocalFileURI(BaseFile):
    """Handler for local file system paths."""

class BytesFileUri(BaseFile):
    """Handler for byte stream content."""

class LocalTmpFile(BaseFile):
    """Handler for local temporary files."""
```

### Network and File Management

Network manager and temporary file system for handling downloads and caching.

```python { .api }
class FileNetworkManager:
    """Manager for network file operations and caching."""

class TmpFiles(threading.local):
    """Thread-local temporary files manager with automatic cleanup."""
```
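
Because `TmpFiles` derives from `threading.local`, each thread sees its own copy of the manager's state. The sketch below shows that standard-library behavior in isolation; `PerThreadPaths` and the file names are hypothetical and do not reflect the internals of `TmpFiles`.

```python
import threading


class PerThreadPaths(threading.local):
    """Hypothetical stand-in: keeps a separate list of paths per thread."""

    def __init__(self):
        # threading.local runs __init__ again in each thread that touches
        # the object, so every thread starts with its own empty list.
        self.paths = []


registry = PerThreadPaths()
registry.paths.append("main.tmp")

seen_in_worker = []


def worker():
    registry.paths.append("worker.tmp")
    # Copy what this thread sees so the main thread can inspect it later.
    seen_in_worker.extend(registry.paths)


thread = threading.Thread(target=worker)
thread.start()
thread.join()

print("main thread sees:", registry.paths)
print("worker thread saw:", seen_in_worker)
```

The practical consequence of this design is that temporary files registered in one thread are invisible to other threads, so any cleanup presumably has to happen in the thread that created them.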

### Cleanup Utilities

Utility functions for cleaning up temporary files and resources.

```python { .api }
def cleanFiles():
    """
    Clean up temporary files created during processing.

    This function should be called after PDF generation is complete
    to free up disk space and system resources.
    """
```

#### Usage Example

```python
from xhtml2pdf import pisa
from xhtml2pdf.files import cleanFiles

# Hypothetical input paths; replace with your own HTML files
html_files = ["report.html", "invoice.html"]

try:
    # Process multiple documents
    for html_file in html_files:
        with open(html_file) as source:
            with open(f"{html_file}.pdf", "wb") as dest:
                pisa.pisaDocument(source, dest)
finally:
    # Clean up all temporary files, even if a conversion failed
    cleanFiles()
```
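
The `try`/`finally` shape used above guarantees that cleanup runs even when processing raises. As a library-independent sketch of the same pattern, the following tracks temporary files created with the standard `tempfile` module and removes them in the `finally` block; the `created` list is an illustrative assumption, not how `cleanFiles()` tracks its files.

```python
import os
import tempfile

created = []  # paths of temporary files to remove afterwards
try:
    for _ in range(3):
        # mkstemp returns an open descriptor and the file's path.
        fd, path = tempfile.mkstemp(suffix=".tmp")
        os.close(fd)
        created.append(path)
    existed_during = all(os.path.exists(p) for p in created)
finally:
    # Runs whether or not the loop above raised an exception.
    for path in created:
        os.remove(path)

print("files existed during processing:", existed_during)
print("files remaining:", [p for p in created if os.path.exists(p)])
```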

## File Type Support

The file handling system automatically detects and processes various file types:

### Supported MIME Types

- **Text**: `text/html`, `text/css`, `text/plain`, `text/xml`
- **Images**: `image/png`, `image/jpeg`, `image/gif`, `image/bmp`, `image/svg+xml`
- **Fonts**: `font/ttf`, `font/otf`, `application/font-woff`, `font/woff2`
- **Data**: `application/pdf`, `application/octet-stream`
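
Extension-based MIME detection of the kind listed above can be reproduced with Python's standard `mimetypes` module. This is a sketch of the general technique only; whether the library detects types exactly this way is an assumption, and the file names are made up for illustration.

```python
import mimetypes

names = ["styles.css", "logo.png", "page.html"]

# guess_type returns a (type, encoding) tuple; only the type is kept here.
detected = {name: mimetypes.guess_type(name)[0] for name in names}

for name, mime_type in detected.items():
    print(f"{name}: {mime_type}")
```

Note that `guess_type` looks only at the file name, so a mislabeled file gets the MIME type of its extension rather than of its actual content.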

### Path Resolution

The system supports various path formats:

```python
from xhtml2pdf.files import pisaFileObject

# Absolute paths
file_obj = pisaFileObject("/absolute/path/to/file.css")

# Relative paths (with basepath)
file_obj = pisaFileObject("styles/main.css", basepath="/project/assets")

# URLs
file_obj = pisaFileObject("https://cdn.example.com/font.ttf")

# Data URIs
file_obj = pisaFileObject("data:text/css;charset=utf-8,body{margin:0}")

# File URIs
file_obj = pisaFileObject("file:///local/path/image.png")
```
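
One way to picture how these formats can be told apart is to inspect the URI scheme. The sketch below uses `urllib.parse.urlsplit` from the standard library; `classify_uri` and its category names are hypothetical, and the library's actual dispatch to its handler classes may differ.

```python
from urllib.parse import urlsplit


def classify_uri(uri):
    """Return a rough source kind for a URI string (illustrative only)."""
    if uri.startswith("data:"):
        return "data"
    scheme = urlsplit(uri).scheme.lower()
    if scheme in ("http", "https"):
        return "network"
    if scheme == "file":
        return "file-uri"
    # No recognized scheme: treat as a local path (absolute or relative).
    return "local-path"


samples = [
    "/absolute/path/to/file.css",
    "styles/main.css",
    "https://cdn.example.com/font.ttf",
    "data:text/css;charset=utf-8,body{margin:0}",
    "file:///local/path/image.png",
]
kinds = [classify_uri(uri) for uri in samples]
for uri, kind in zip(samples, kinds):
    print(f"{kind:10} {uri}")
```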

## Error Handling

File operations report failures in two ways: `notFound()` signals a missing resource, and `getFileContent()` can raise I/O or network errors:

```python
import urllib.error

from xhtml2pdf.files import pisaFileObject

file_obj = pisaFileObject("https://example.com/missing.png")

if file_obj.notFound():
    print("File not found, using fallback")
    # Handle missing file case
else:
    try:
        content = file_obj.getFileContent()
        # Process file content
    except (IOError, urllib.error.URLError) as e:
        print(f"Error loading file: {e}")
        # Handle network or I/O errors
```
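
The "using fallback" branch above can be filled in with a small helper that substitutes placeholder content when a resource cannot be read. The sketch below does this for local files using only the standard library; `read_bytes_or_fallback` and the placeholder bytes are hypothetical. In Python 3, `IOError` is an alias of `OSError` and `urllib.error.URLError` is a subclass of it, so catching `OSError` covers both exception types named above.

```python
def read_bytes_or_fallback(path, fallback=b""):
    """Return the file's bytes, or `fallback` if it cannot be read."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        print(f"Could not read {path!r}: {exc}")
        return fallback


placeholder = b"/* missing stylesheet */"
result = read_bytes_or_fallback("/nonexistent/dir/styles.css", placeholder)
print(result)
```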

## Performance Considerations

### Caching

The file handling system caches network resources automatically:

- **Memory caching**: Small files cached in memory
- **Disk caching**: Large files cached on disk temporarily
- **Cache invalidation**: Automatic cleanup after processing
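
The basic idea behind caching fetched resources can be sketched as a dictionary keyed by URI that calls the fetcher only on a miss. Everything in this example is hypothetical: `FetchCache` and `fake_fetch` are illustrative names, the fetcher is a stub that makes no network request, and the library's own cache may be organized differently.

```python
class FetchCache:
    """Hypothetical in-memory cache keyed by URI (illustration only)."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable taking a URI and returning bytes
        self._store = {}

    def get(self, uri):
        # Fetch only on the first request for a given URI.
        if uri not in self._store:
            self._store[uri] = self._fetch(uri)
        return self._store[uri]

    def clear(self):
        # Invalidate everything, e.g. after a document has been processed.
        self._store.clear()


calls = []


def fake_fetch(uri):
    """Stub fetcher that records each call instead of using the network."""
    calls.append(uri)
    return b"payload for " + uri.encode("ascii")


cache = FetchCache(fake_fetch)
first = cache.get("https://example.com/logo.png")
second = cache.get("https://example.com/logo.png")
print("fetch calls:", len(calls))
```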

### Memory Management

Temporary files switch between memory and disk based on size:

```python
from xhtml2pdf.files import pisaTempFile

# Higher threshold: content up to 64 KB stays in memory
temp_file = pisaTempFile(capacity=64*1024)  # 64KB threshold

# Lower threshold: content beyond 1 KB is moved to disk
temp_file = pisaTempFile(capacity=1024)  # 1KB threshold
```
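
The memory-to-disk switch described above can be sketched with the standard library: buffer in an `io.BytesIO` until a write would exceed the capacity, then move the data into a real temporary file. `SpillBuffer` is a hypothetical class written for this illustration and is not the implementation of `pisaTempFile`.

```python
import io
import tempfile


class SpillBuffer:
    """Hypothetical illustration of a capacity-based memory/disk buffer."""

    def __init__(self, capacity=64 * 1024):
        self.capacity = capacity
        self._file = io.BytesIO()
        self.on_disk = False

    def write(self, value):
        if isinstance(value, str):
            value = value.encode("utf-8")
        # Spill to disk before the write that would exceed the capacity.
        if not self.on_disk and self._file.tell() + len(value) > self.capacity:
            disk_file = tempfile.TemporaryFile()
            disk_file.write(self._file.getvalue())
            self._file = disk_file
            self.on_disk = True
        self._file.write(value)

    def getvalue(self):
        # Read everything back, then restore the write position.
        position = self._file.tell()
        self._file.seek(0)
        data = self._file.read()
        self._file.seek(position)
        return data


buf = SpillBuffer(capacity=16)
buf.write("small")
in_memory_first = not buf.on_disk
buf.write(b"x" * 32)
print("in memory after first write:", in_memory_first)
print("on disk after second write:", buf.on_disk)
```

A low capacity trades memory for disk I/O, which suits large documents; a high capacity keeps small conversions entirely in memory.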

## Types

```python { .api }
class pisaFileObject:
    """
    Unified file object for various URI types.

    Attributes:
        uri (str): Original URI string
        basepath (str): Base path for resolution
        callback (callable): Custom resolution callback

    Handles local files, URLs, data URIs, and byte streams
    with automatic MIME type detection and content processing.
    """

class pisaTempFile:
    """
    Temporary file handler for PDF generation.

    Attributes:
        capacity (int): Memory capacity threshold
        buffer (str): Current buffer content

    Manages temporary storage during conversion process
    with automatic cleanup and memory management.
    """
```