# Document Processing

Core document processing functions for converting HTML and CSS content to PDF documents. These functions provide the main entry points for xhtml2pdf's conversion capabilities, handling everything from simple HTML strings to complex documents with external resources.

## Capabilities

### Main Document Conversion

The primary function for converting HTML to PDF with comprehensive configuration options for handling various input sources, output destinations, and processing parameters.

```python { .api }
def pisaDocument(
    src,
    dest=None,
    dest_bytes=False,
    path="",
    link_callback=None,
    debug=0,
    default_css=None,
    xhtml=False,
    encoding=None,
    xml_output=None,
    raise_exception=True,
    capacity=100 * 1024,
    context_meta=None,
    encrypt=None,
    signature=None,
    **kwargs
):
    """
    Convert HTML to PDF with full control over processing options.

    Args:
        src: HTML source - can be:
            - str: HTML content as string
            - file-like object: Open file or BytesIO
            - filename: Path to HTML file
        dest: Output destination - can be:
            - file-like object: Open file or BytesIO for writing
            - filename: Path for output PDF file
            - None: Return PDF content in context
        dest_bytes (bool): If True and dest is None, return bytes
        path (str): Base path for resolving relative URLs and file paths
        link_callback (callable): Custom function to resolve URLs and file paths
            Signature: callback(uri, rel) -> resolved_uri
        debug (int): Debug level 0-2, higher values provide more logging
        default_css (str): Custom default CSS to apply before document CSS
        xhtml (bool): Force XHTML parsing mode instead of HTML5
        encoding (str): Character encoding for source document
            If None, encoding is auto-detected from HTML meta tags
        xml_output: XML output configuration options
        raise_exception (bool): Raise exceptions on conversion errors
        capacity (int): Memory capacity in bytes for temporary files
        context_meta (dict): Additional metadata to add to PDF context
        encrypt (dict): PDF encryption settings with keys:
            - userPassword: User password for PDF
            - ownerPassword: Owner password for PDF
            - canPrint: Allow printing (bool)
            - canModify: Allow modifications (bool)
            - canCopy: Allow copying content (bool)
            - canAnnotate: Allow annotations (bool)
        signature (dict): PDF digital signature settings
        **kwargs: Additional processing options

    Returns:
        pisaContext: Processing context object with attributes:
            - err (int): Number of errors encountered
            - warn (int): Number of warnings encountered
            - log (list): List of log messages
            - dest: Output destination (if dest_bytes=True, contains PDF bytes)
    """
```

#### Usage Examples

**Basic HTML string to PDF file:**

```python
from xhtml2pdf import pisa

html = "<html><body><h1>Hello World</h1></body></html>"
with open("output.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest)

if result.err:
    print(f"Errors: {result.log}")
```

**Convert with custom CSS and base path:**

```python
from xhtml2pdf import pisa

custom_css = """
@page {
    size: A4;
    margin: 2cm;
}
body { font-family: Arial; }
"""

html = """
<html>
  <body>
    <h1>Report</h1>
    <img src="chart.png" />
  </body>
</html>
"""

with open("report.pdf", "wb") as dest:
    result = pisa.pisaDocument(
        html,
        dest,
        path="/path/to/resources/",  # Base path for resolving chart.png
        default_css=custom_css,
        debug=1,
    )
```
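
In the xhtml2pdf sources we have looked at, `pisaStory` falls back to the built-in stylesheet only when `default_css` is `None`. If that holds for your version, a custom string like the one above replaces those defaults instead of extending them. The sketch below prepends the built-in rules so your own rules refine them instead. `build_default_css` is a hypothetical helper, and importing the built-in stylesheet as `DEFAULT_CSS` from `xhtml2pdf.default` is an assumption worth checking against your installed version.

```python
"""Combine xhtml2pdf's built-in default CSS with project-specific rules."""

try:
    # Assumption: xhtml2pdf exposes its built-in stylesheet under this name.
    from xhtml2pdf.default import DEFAULT_CSS
except ImportError:  # xhtml2pdf not installed, or the name has moved
    DEFAULT_CSS = ""


def build_default_css(*extra_css: str, base_css: str = DEFAULT_CSS) -> str:
    """Return base_css followed by each extra stylesheet, in order.

    Under normal CSS cascade rules, later rules of equal specificity win,
    so project rules placed after the defaults override only what they name.
    """
    parts = [base_css, *extra_css]
    # Skip empty fragments so no stray blank separators are emitted.
    return "\n".join(part.strip() for part in parts if part and part.strip())


custom_css = """
@page { size: A4; margin: 2cm; }
body { font-family: Arial; }
"""

combined = build_default_css(custom_css)
# Usage (requires xhtml2pdf):
#     pisa.pisaDocument(html, dest, default_css=combined)
```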

**Convert with custom link callback:**

```python
import os

from xhtml2pdf import pisa


def link_callback(uri, rel):
    """
    Resolve relative URLs to absolute file paths.
    """
    if uri.startswith(('http://', 'https://')):
        return uri

    # Convert relative paths to absolute paths
    if not os.path.isabs(uri):
        return os.path.join('/path/to/assets/', uri)
    return uri


html = '<html><body><img src="images/logo.png" /></body></html>'
with open("output.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest, link_callback=link_callback)
```
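
The callback above joins every relative URI onto a fixed directory, so a URI such as `../../etc/passwd` resolves outside it. When the HTML can come from untrusted input, a stricter variant can confine local lookups to a single asset directory. The following is one possible sketch. `make_link_callback` and `PASSTHROUGH_SCHEMES` are hypothetical names. Passing remote and `data:` URIs through unchanged mirrors the example above; it is not an xhtml2pdf requirement.

```python
"""A stricter link_callback factory that keeps local lookups inside one directory."""

import os
from urllib.parse import unquote, urlparse

# Schemes handed back to xhtml2pdf untouched (an assumption of this sketch).
PASSTHROUGH_SCHEMES = ("http", "https", "data")


def make_link_callback(base_dir):
    """Return a callback(uri, rel) that resolves local URIs under base_dir.

    Bare paths and file:// URIs are resolved relative to base_dir; anything
    that would escape base_dir, or uses another scheme, raises ValueError.
    """
    root = os.path.realpath(base_dir)

    def link_callback(uri, rel):  # rel is accepted for signature compatibility
        parsed = urlparse(uri)
        if parsed.scheme in PASSTHROUGH_SCHEMES:
            return uri
        if parsed.scheme not in ("", "file"):
            raise ValueError(f"unsupported URI scheme in {uri!r}")

        # Decode %20 and friends; urlparse already dropped ?query and #fragment.
        # Leading slashes are stripped so "/img/a.png" means "<root>/img/a.png".
        relative = unquote(parsed.path).lstrip("/\\")
        candidate = os.path.realpath(os.path.join(root, relative))

        if os.path.commonpath([root, candidate]) != root:
            raise ValueError(f"refusing to load {uri!r} from outside {root!r}")
        return candidate

    return link_callback
```

It plugs in the same way as the simpler callback: `pisa.pisaDocument(html, dest, link_callback=make_link_callback("/path/to/assets"))`.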

**Return PDF as bytes:**

```python
import io

from xhtml2pdf import pisa

html = "<html><body><h1>Document</h1></body></html>"
output = io.BytesIO()
result = pisa.pisaDocument(html, dest=output)

if not result.err:
    pdf_bytes = output.getvalue()
    # Use pdf_bytes as needed
```
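
When several places need the same in-memory pattern (for example, a web view that streams the PDF), it can be wrapped in a small function that raises instead of returning partial output. The following is an illustrative sketch. `html_to_pdf_bytes` and `PdfConversionError` are hypothetical names. The `convert` parameter exists only so the function can be exercised without xhtml2pdf installed; by default it calls `pisa.pisaDocument`.

```python
"""Render HTML to PDF bytes in memory, raising on conversion errors."""

import io


class PdfConversionError(RuntimeError):
    """Raised when the converter reports one or more errors."""


def html_to_pdf_bytes(html, convert=None, **options):
    """Convert an HTML string to PDF bytes.

    convert defaults to xhtml2pdf's pisa.pisaDocument; any callable with the
    same (src, dest=..., **options) -> context shape can be injected instead.
    Extra keyword options (path, link_callback, ...) are passed through.
    """
    if convert is None:
        from xhtml2pdf import pisa  # imported lazily so a fake can be injected
        convert = pisa.pisaDocument

    buffer = io.BytesIO()
    result = convert(html, dest=buffer, **options)
    if result.err:
        raise PdfConversionError(f"{result.err} error(s) while converting HTML")
    return buffer.getvalue()
```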

### Document Story Creation

Lower-level function for creating ReportLab story objects from HTML content, providing more granular control over the conversion process.

```python { .api }
def pisaStory(
    src,
    path="",
    link_callback=None,
    debug=0,
    default_css=None,
    xhtml=False,
    encoding=None,
    context=None,
    xml_output=None,
    **kwargs
):
    """
    Create ReportLab story from HTML source without generating PDF.

    This function provides lower-level access to the conversion process,
    allowing you to work with the ReportLab story directly before PDF generation.

    Args:
        src: HTML source (string, file-like object, or filename)
        path (str): Base path for relative resource resolution
        link_callback (callable): Custom URL/file resolution function
        debug (int): Debug level for logging (0-2)
        default_css (str): Custom default CSS stylesheet
        xhtml (bool): Use XHTML parsing mode
        encoding (str): Character encoding for source
        context (pisaContext): Existing context to use (creates new if None)
        xml_output: XML output options
        **kwargs: Additional processing options

    Returns:
        pisaContext: Processing context with story in context.story attribute
    """
```

#### Usage Example

```python
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from xhtml2pdf.document import pisaStory

html = """
<html>
  <body>
    <h1>Chapter 1</h1>
    <p>Content here...</p>
  </body>
</html>
"""

# Create story from HTML
context = pisaStory(html, debug=1)

if not context.err:
    # Use the story with ReportLab directly
    pdf_canvas = canvas.Canvas("custom.pdf", pagesize=A4)
    # ... custom processing with context.story
    pdf_canvas.save()
```

### Error Document Generation

Utility function for generating error documents when conversion fails, providing user-friendly error reporting.

```python { .api }
def pisaErrorDocument(dest, c):
    """
    Generate a PDF document containing error information.

    Args:
        dest: Output destination for error PDF
        c (pisaContext): Context containing error information

    Returns:
        pisaContext: Updated context after error document generation
    """
```

### PDF Encryption Helper

Utility function for creating PDF encryption instances from encryption configuration data.

```python { .api }
def get_encrypt_instance(data):
    """
    Create PDF encryption instance from configuration data.

    Args:
        data (dict): Encryption configuration with keys:
            - userPassword (str): User password
            - ownerPassword (str): Owner password
            - canPrint (bool): Allow printing
            - canModify (bool): Allow modifications
            - canCopy (bool): Allow copying
            - canAnnotate (bool): Allow annotations

    Returns:
        Encryption instance for PDF generation
    """
```

#### Usage Example

```python
from xhtml2pdf import pisa

html = "<html><body><h1>Confidential</h1></body></html>"

encrypt_config = {
    'userPassword': 'user123',
    'ownerPassword': 'owner456',
    'canPrint': True,
    'canModify': False,
    'canCopy': False,
    'canAnnotate': False
}

with open("secure.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest, encrypt=encrypt_config)
```
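
The keys above match the keyword arguments of ReportLab's `StandardEncryption` class in `reportlab.lib.pdfencrypt`. You may want to catch misspelled keys early or build the encryption object yourself; the following is one possible sketch of both. `validate_encrypt_config`, `make_encryption`, and the key constants are hypothetical names. Whether `pisaDocument` accepts a prebuilt `StandardEncryption` object for `encrypt` is an assumption to confirm against your xhtml2pdf version.

```python
"""Validate an encryption config and build a ReportLab StandardEncryption from it."""

PERMISSION_KEYS = ("canPrint", "canModify", "canCopy", "canAnnotate")
ALLOWED_KEYS = frozenset(("userPassword", "ownerPassword", *PERMISSION_KEYS))


def validate_encrypt_config(config):
    """Return a cleaned copy of config, rejecting unknown keys and bad types."""
    unknown = set(config) - ALLOWED_KEYS
    if unknown:
        # Catches typos such as "canprint", which would otherwise be ignored.
        raise ValueError(f"unknown encryption keys: {sorted(unknown)}")
    for key in ("userPassword", "ownerPassword"):
        if key in config and not isinstance(config[key], str):
            raise TypeError(f"{key} must be a string")
    for key in PERMISSION_KEYS:
        if key in config and not isinstance(config[key], bool):
            raise TypeError(f"{key} must be True or False")
    cleaned = dict(config)
    # An empty user password still encrypts, but anyone can open the file.
    cleaned.setdefault("userPassword", "")
    return cleaned


def make_encryption(config):
    """Build reportlab.lib.pdfencrypt.StandardEncryption from a config dict."""
    from reportlab.lib.pdfencrypt import StandardEncryption  # requires reportlab

    cleaned = validate_encrypt_config(config)
    permissions = {key: int(cleaned[key]) for key in PERMISSION_KEYS if key in cleaned}
    return StandardEncryption(
        cleaned["userPassword"],
        ownerPassword=cleaned.get("ownerPassword"),
        **permissions,
    )
```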

## Advanced Processing Options

### Memory Management

The `capacity` parameter sets how many bytes of output xhtml2pdf buffers in memory before switching to a temporary file on disk:

- **Default**: 100KB (`100 * 1024` bytes), suitable for most documents
- **Large documents**: Increase to 1MB or more so the whole output stays in memory and avoids disk I/O
- **Memory-constrained**: Decrease to 50KB or less so output moves to disk sooner

```python
# For large documents
result = pisa.pisaDocument(html, dest, capacity=1024 * 1024)  # 1MB

# For memory-constrained environments
result = pisa.pisaDocument(html, dest, capacity=50 * 1024)  # 50KB
```
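
The threshold behaves like a spooled temporary file. Writes go to an in-memory buffer until the data would exceed `capacity`, and then everything is moved to a real temporary file. xhtml2pdf implements this in its own `pisaTempFile` class. The simplified `SpooledBuffer` below is a hypothetical stand-in that only illustrates the strategy. Python's standard `tempfile.SpooledTemporaryFile` offers the same behaviour ready-made.

```python
"""Minimal illustration of a memory-then-disk ("spooled") write buffer."""

import io
import tempfile


class SpooledBuffer:
    """Buffer bytes in memory until `capacity` would be exceeded, then use a temp file."""

    def __init__(self, capacity=100 * 1024):
        self.capacity = capacity
        self._file = io.BytesIO()
        self.in_memory = True

    def write(self, data):
        # Roll over before the write that would push the buffer past capacity.
        if self.in_memory and self._file.tell() + len(data) > self.capacity:
            self._roll_to_disk()
        return self._file.write(data)

    def _roll_to_disk(self):
        disk = tempfile.TemporaryFile()
        disk.write(self._file.getvalue())  # copy everything buffered so far
        self._file = disk
        self.in_memory = False

    def getvalue(self):
        """Return all bytes written so far, wherever they are stored."""
        if self.in_memory:
            return self._file.getvalue()
        position = self._file.tell()
        self._file.seek(0)
        data = self._file.read()
        self._file.seek(position)  # restore the write position
        return data

    def close(self):
        self._file.close()
```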

### Debug Levels

Debug levels provide different amounts of processing information:

- **0**: No debug output (default)
- **1**: Basic processing information and warnings
- **2**: Detailed processing steps and CSS parsing information

```python
result = pisa.pisaDocument(html, dest, debug=2)
for log_entry in result.log:
    print(log_entry)
```
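
Besides `result.log`, xhtml2pdf's modules report through Python's standard `logging` package. In the versions we have checked, they use loggers named under the `xhtml2pdf` namespace (for example via `logging.getLogger(__name__)`). If that holds for your version, the sketch below captures those messages in memory, for instance to attach them to a bug report. `capture_xhtml2pdf_logs` is a hypothetical helper.

```python
"""Capture xhtml2pdf log records in memory while a block of code runs."""

import contextlib
import logging


class _ListHandler(logging.Handler):
    """Logging handler that stores formatted records in a list."""

    def __init__(self, records):
        super().__init__()
        self.records = records

    def emit(self, record):
        self.records.append(self.format(record))


@contextlib.contextmanager
def capture_xhtml2pdf_logs(level=logging.DEBUG, logger_name="xhtml2pdf"):
    """Yield a list that fills with messages from logger_name and its children."""
    records = []
    handler = _ListHandler(records)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger(logger_name)
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        yield records
    finally:
        # Restore the logger so the capture does not leak into later code.
        logger.removeHandler(handler)
        logger.setLevel(previous_level)
```

Typical use would be `with capture_xhtml2pdf_logs() as messages: result = pisa.pisaDocument(html, dest, debug=2)`, followed by inspecting `messages`.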

### Context Metadata

Additional metadata can be embedded in the PDF:

```python
metadata = {
    'author': 'John Doe',
    'title': 'My Document',
    'subject': 'Sample PDF',
    'creator': 'My Application'
}

result = pisa.pisaDocument(html, dest, context_meta=metadata)
```

## Return Values and Error Handling

The conversion functions above (`pisaDocument`, `pisaStory`, and `pisaErrorDocument`) return a `pisaContext` object with these key attributes:

- **`err`** (int): Number of errors encountered (0 = success)
- **`warn`** (int): Number of warnings generated
- **`log`** (list): Detailed log messages for debugging
- **`dest`**: Output destination or PDF bytes (if dest_bytes=True)

```python
result = pisa.pisaDocument(html, dest)

# Check for success
if result.err:
    print(f"Conversion failed with {result.err} errors")
    for msg in result.log:
        # Match case-insensitively so lowercase "error" labels are found too
        if 'error' in str(msg).lower():
            print(f"Error: {msg}")
else:
    print("PDF generated successfully")

# Handle warnings
if result.warn:
    print(f"Generated with {result.warn} warnings")
```

## Backward Compatibility

The legacy `CreatePDF` alias is still available for backward compatibility:

```python { .api }
CreatePDF = pisaDocument  # Backward compatibility alias
```

```python
from xhtml2pdf.pisa import CreatePDF

# Alias usage: CreatePDF is the same function object as pisaDocument
result = CreatePDF(html, dest)
```

## Types

```python { .api }
class pisaContext:
    """
    Processing context returned by document processing functions.

    Attributes:
        err (int): Error count
        warn (int): Warning count
        log (list): Processing log messages
        dest: Output destination or PDF content
        story (list): ReportLab story elements (from pisaStory)
        cssText (str): Processed CSS content
        path (str): Base path for resources
    """
```
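
For application code that only reads a few of these attributes, a structural type makes helpers easy to type-check and to test with stand-in objects. The sketch below assumes you rely only on `err`, `warn`, and `log`. `ConversionResult` and `describe` are hypothetical names, not part of xhtml2pdf.

```python
"""Structural type for the pisaContext attributes that calling code usually reads."""

from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class ConversionResult(Protocol):
    """Hypothetical protocol matching pisaContext's err/warn/log attributes.

    isinstance() checks against it only verify that the attributes exist,
    not their types.
    """

    err: int
    warn: int
    log: List[Any]


def describe(result: ConversionResult) -> str:
    """Return a one-line summary suitable for application logs."""
    status = "failed" if result.err else "succeeded"
    return f"conversion {status}: {result.err} error(s), {result.warn} warning(s)"
```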