# xhtml2pdf

xhtml2pdf is an HTML-to-PDF converter for Python that renders HTML and CSS content as PDF documents. Built on the ReportLab Toolkit, html5lib, and pypdf, it supports HTML5 and CSS 2.1 (with some CSS 3 features) and is written entirely in Python, so it runs on any platform Python supports.

## Package Information

- **Package Name**: xhtml2pdf
- **Package Type**: pypi
- **Language**: Python
- **Python Version**: 3.8+
- **License**: Apache 2.0
- **Installation**: `pip install xhtml2pdf`
- **Optional Dependencies**:
  - `pip install xhtml2pdf[pycairo]` (recommended for better graphics)
  - `pip install xhtml2pdf[renderpm]` (legacy rendering)
- **Documentation**: https://xhtml2pdf.readthedocs.io/

## Core Imports

Basic import for main functionality:

```python
from xhtml2pdf import pisa
```

Complete document processing import:

```python
from xhtml2pdf.document import pisaDocument
```

Backward compatibility import:

```python
from xhtml2pdf.pisa import CreatePDF  # Alias for pisaDocument
```

Advanced imports for specific features:

```python
from xhtml2pdf.context import pisaContext
from xhtml2pdf.files import getFile, pisaFileObject
from xhtml2pdf.pdf import pisaPDF
from xhtml2pdf.util import getColor, getSize, getBool
```

## Basic Usage

### Simple HTML to PDF Conversion

```python
from xhtml2pdf import pisa
import io

# HTML content
html_content = """
<html>
<head>
    <style>
        body { font-family: Arial, sans-serif; }
        h1 { color: #333; }
    </style>
</head>
<body>
    <h1>Hello World</h1>
    <p>This is a simple PDF generated from HTML.</p>
</body>
</html>
"""

# Create PDF
output = io.BytesIO()
result = pisa.pisaDocument(html_content, dest=output)

# Check for errors
if result.err:
    print("Error generating PDF")
else:
    # Save or use the PDF
    with open("output.pdf", "wb") as f:
        f.write(output.getvalue())
```

### File-to-File Conversion

```python
from xhtml2pdf import pisa

# Convert HTML file to PDF file
with open("input.html", "r") as source:
    with open("output.pdf", "wb") as dest:
        result = pisa.pisaDocument(source, dest)

if not result.err:
    print("PDF generated successfully")
```

## Architecture

xhtml2pdf operates through a multi-stage processing pipeline:

- **HTML Parser**: Uses html5lib for HTML5-compliant parsing
- **CSS Engine**: Complete CSS 2.1 cascade and processing system
- **Context Management**: pisaContext handles fonts, resources, and conversion state
- **ReportLab Bridge**: Converts parsed content to ReportLab document format
- **PDF Generation**: Creates final PDF using ReportLab's PDF engine

The library provides both high-level convenience functions and low-level APIs for advanced customization, making it suitable for simple conversions as well as complex document generation systems.

## Capabilities

### Core Document Processing

Main conversion functions for transforming HTML to PDF, including the primary pisaDocument function and lower-level story creation capabilities.

```python { .api }
def pisaDocument(
    src,
    dest=None,
    dest_bytes=False,
    path="",
    link_callback=None,
    debug=0,
    default_css=None,
    xhtml=False,
    encoding=None,
    xml_output=None,
    raise_exception=True,
    capacity=100 * 1024,
    context_meta=None,
    encrypt=None,
    signature=None,
    **kwargs
):
    """
    Convert HTML to PDF.

    Args:
        src: HTML source (string, file-like object, or filename)
        dest: Output destination (file-like object or filename)
        dest_bytes: Return PDF as bytes if True
        path: Base path for relative resources
        link_callback: Function to resolve URLs and file paths
        debug: Debug level (0-2)
        default_css: Custom default CSS string
        xhtml: Force XHTML parsing
        encoding: Character encoding for source
        xml_output: XML output options
        raise_exception: Raise exceptions on errors
        capacity: Memory capacity for temp files
        context_meta: Additional context metadata
        encrypt: PDF encryption settings
        signature: PDF signature settings

    Returns:
        pisaContext: Processing context with results and errors
    """
```

[Document Processing](./document-processing.md)
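
The `link_callback` parameter listed above is where an application decides how resource references in the HTML (images, stylesheets, fonts) are located. The following is a minimal sketch, not the library's own resolver: it assumes the callback is invoked as `link_callback(uri, rel)` and returns a path or URL that xhtml2pdf can load. `ASSET_ROOT` and `render` are hypothetical names introduced for illustration.

```python
import os
from urllib.parse import urlparse

# Hypothetical directory that holds images and stylesheets referenced by the HTML.
ASSET_ROOT = os.path.abspath("assets")


def link_callback(uri, rel):
    """Map a URI found in the HTML to a local path (assumed signature: uri, rel)."""
    scheme = urlparse(uri).scheme
    if scheme in ("http", "https", "data"):
        # Leave remote resources and data URIs untouched.
        return uri
    # Treat everything else as a path below ASSET_ROOT.
    candidate = os.path.normpath(os.path.join(ASSET_ROOT, uri.lstrip("/")))
    # Refuse paths that escape the asset directory (for example "../secret.txt").
    if os.path.commonpath([ASSET_ROOT, candidate]) != ASSET_ROOT:
        raise ValueError(f"resource outside asset root: {uri!r}")
    return candidate


def render(html, dest):
    """Convert html into dest, resolving resources through link_callback."""
    from xhtml2pdf import pisa  # imported lazily so the resolver can be tested alone

    return pisa.CreatePDF(html, dest=dest, link_callback=link_callback)
```

Keeping the resolver free of xhtml2pdf imports makes it easy to unit-test the path rules separately from PDF rendering.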

### Context and Configuration Management

Advanced processing context management for controlling fonts, CSS, resources, and conversion behavior throughout the HTML-to-PDF pipeline.

```python { .api }
class pisaContext:
    def __init__(self, path="", debug=0, capacity=-1): ...
    def addCSS(self, value): ...
    def parseCSS(self): ...
    def addFrag(self, text="", frag=None): ...
    def getFile(self, name, relative=None): ...
    def getFontName(self, names, default="helvetica"): ...
    def registerFont(self, fontname, alias=None): ...
```

[Context Management](./context-management.md)

### File and Resource Handling

Comprehensive file and resource management system supporting local files, URLs, data URIs, and various resource types with automatic MIME type detection.

```python { .api }
def getFile(*a, **kw): ...

class pisaFileObject:
    def __init__(self, uri, basepath=None, callback=None): ...
    def getFileContent(self): ...
    def getMimeType(self): ...
```

[File Handling](./file-handling.md)

### CSS Processing and Styling

Advanced CSS parsing, cascade processing, and style application system supporting CSS 2.1 and select CSS 3 features for precise document styling.

```python { .api }
class pisaCSSBuilder:
    def atFontFace(self, declarations): ...
    def atPage(self): ...
    def atFrame(self): ...

class pisaCSSParser:
    def parseExternal(self, cssResourceName): ...
```

[CSS Processing](./css-processing.md)

### Utility Functions and Helpers

Collection of utility functions for size conversion, color handling, coordinate calculation, text processing, and other common operations.

```python { .api }
def getColor(value, default=None): ...
def getSize(value, relative=0, base=None, default=0.0): ...
def getBool(s): ...
def getAlign(value, default=TA_LEFT): ...
def arabic_format(text, language): ...
```

[Utilities](./utilities.md)
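
To make "size conversion" concrete: PDF layout works in points, so CSS lengths have to be mapped to points at some stage. The sketch below shows only the arithmetic for absolute units under the standard CSS definitions (1in = 72pt = 2.54cm, 1px = 1/96in). It is a standalone illustration, not the implementation of `getSize`, and `css_length_to_points` is a hypothetical helper name.

```python
import re

# Points per unit for absolute CSS lengths (standard CSS definitions).
POINTS_PER_UNIT = {
    "pt": 1.0,
    "pc": 12.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
    "px": 72.0 / 96.0,
}

# A number followed by a unit, e.g. "2.5cm" or "16 px".
_LENGTH = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([a-z]+)\s*$")


def css_length_to_points(value):
    """Convert an absolute CSS length such as '2.5cm' to points (hypothetical helper)."""
    match = _LENGTH.match(value.lower())
    if match is None or match.group(2) not in POINTS_PER_UNIT:
        raise ValueError(f"unsupported CSS length: {value!r}")
    number, unit = match.groups()
    return float(number) * POINTS_PER_UNIT[unit]


print(css_length_to_points("1in"))   # 72.0
print(css_length_to_points("16px"))  # 12.0
```

Relative units such as `em` or `%` need a base size to resolve against, which is presumably what the `relative` and `base` parameters of `getSize` are for; the sketch deliberately rejects them.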

### PDF Manipulation and Advanced Features

PDF document manipulation, joining, encryption, digital signatures, and watermark capabilities for advanced PDF processing.

```python { .api }
class pisaPDF:
    def __init__(self, capacity=-1): ...
    def addFromURI(self, url, basepath=None): ...
    def join(self, file=None): ...

class PDFSignature:
    @staticmethod
    def sign(): ...
```

[PDF Features](./pdf-features.md)

### Command Line Interface

Complete command-line interface for batch processing and integration with shell scripts and automated workflows.

```python { .api }
def command(): ...
def execute(): ...
def usage(): ...
def showLogging(*, debug=False): ...
```

[Command Line](./command-line.md)

### WSGI Integration

WSGI middleware components for integrating PDF generation directly into web applications with automatic HTML-to-PDF conversion.

```python { .api }
class PisaMiddleware:
    def __init__(self, app): ...
    def __call__(self, environ, start_response): ...
```

[WSGI Integration](./wsgi-integration.md)

## Error Handling

xhtml2pdf reports problems through the `pisaContext` object returned by `pisaDocument`: `err` and `warn` count errors and warnings, and `log` collects the messages:

```python
import io

from xhtml2pdf import pisa

html_content = "<p>Hello World</p>"  # any HTML string
output = io.BytesIO()
result = pisa.pisaDocument(html_content, dest=output)

# Check for errors
if result.err:
    print(f"Errors occurred during conversion: {result.log}")

# Check for warnings
if result.warn:
    print(f"Warnings: {result.log}")
```

Common exceptions that may be raised:

- `IOError` (an alias of `OSError` in Python 3): File access issues when reading HTML files or writing PDF output
- `FileNotFoundError`: Missing HTML files, CSS files, or image resources
- `PermissionError`: Insufficient permissions to read/write files
- `UnicodeDecodeError`: Character encoding problems in HTML/CSS content
- `ImportError`: Missing optional dependencies (pycairo, renderpm, pyHanko)
- `ValueError`: Invalid configuration parameters or malformed HTML/CSS
- `MemoryError`: Insufficient memory for large document processing
- Various ReportLab exceptions:
  - `reportlab.platypus.doctemplate.LayoutError`: Page layout issues
  - `reportlab.lib.colors.ColorError`: Invalid color specifications
  - PDF generation and rendering errors

Network-related exceptions (for URL resources):

- `urllib.error.URLError`: Network connectivity issues
- `urllib.error.HTTPError`: HTTP errors when fetching remote resources
- `ssl.SSLError`: SSL certificate issues for HTTPS resources
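
The two failure channels described in this section (raised exceptions and the `err` counter on the returned context) can be folded into a single helper. This is a sketch under assumptions rather than a recommended API: `PdfConversionError` and `html_to_pdf` are hypothetical names, and the `convert` parameter exists only so the wrapper can be exercised with a stand-in converter.

```python
import io


class PdfConversionError(RuntimeError):
    """Hypothetical application-level error for failed conversions."""


def html_to_pdf(html, convert=None):
    """Return PDF bytes for html, raising PdfConversionError on any failure."""
    if convert is None:
        # Default to the real converter; imported lazily to keep the helper testable.
        from xhtml2pdf import pisa

        convert = pisa.CreatePDF
    buffer = io.BytesIO()
    try:
        result = convert(html, dest=buffer)
    except (OSError, ValueError) as exc:
        # OSError also covers FileNotFoundError, PermissionError and urllib's URLError.
        raise PdfConversionError(f"conversion failed: {exc}") from exc
    if result.err:
        raise PdfConversionError(f"{result.err} error(s) reported during conversion")
    return buffer.getvalue()
```

Callers then handle one exception type instead of checking `result.err` after every conversion.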

## Types

```python { .api }
class pisaContext:
    """
    Main processing context for HTML-to-PDF conversion.

    Attributes:
        err (int): Error count
        warn (int): Warning count
        log (list): Processing log messages
        cssText (str): Accumulated CSS text
        cssParser: CSS parser instance
        fontList (list): Available fonts
        path (str): Base path for resources
    """

class pisaFileObject:
    """
    Unified file object for various URI types.

    Handles local files, URLs, data URIs, and byte streams
    with automatic MIME type detection and content processing.
    """

class pisaTempFile:
    """
    Temporary file handler for PDF generation.

    Manages temporary storage during conversion process
    with automatic cleanup and memory management.
    """
```