tessl/pypi-xhtml2pdf

PDF generator using HTML and CSS

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/xhtml2pdf@0.2.x

To install, run

npx @tessl/cli install tessl/pypi-xhtml2pdf@0.2.0

xhtml2pdf

xhtml2pdf is an HTML-to-PDF converter for Python that renders HTML and CSS content as PDF documents. Built on the ReportLab Toolkit, html5lib, and pypdf, it supports HTML5 and CSS 2.1 (with some CSS 3 features) and is written entirely in Python, so it runs on any platform Python supports.

Package Information

  • Package Name: xhtml2pdf
  • Package Type: pypi
  • Language: Python
  • Python Version: 3.8+
  • License: Apache 2.0
  • Installation: pip install xhtml2pdf
  • Optional Dependencies:
    • pip install xhtml2pdf[pycairo] (recommended for better graphics)
    • pip install xhtml2pdf[renderpm] (legacy rendering)
  • Documentation: https://xhtml2pdf.readthedocs.io/

Core Imports

Basic import for main functionality:

from xhtml2pdf import pisa

Complete document processing import:

from xhtml2pdf.document import pisaDocument

Backward compatibility import:

from xhtml2pdf.pisa import CreatePDF  # Alias for pisaDocument

Advanced imports for specific features:

from xhtml2pdf.context import pisaContext
from xhtml2pdf.files import getFile, pisaFileObject
from xhtml2pdf.pdf import pisaPDF
from xhtml2pdf.util import getColor, getSize, getBool

Basic Usage

Simple HTML to PDF Conversion

from xhtml2pdf import pisa
import io

# HTML content
html_content = """
<html>
    <head>
        <style>
            body { font-family: Arial, sans-serif; }
            h1 { color: #333; }
        </style>
    </head>
    <body>
        <h1>Hello World</h1>
        <p>This is a simple PDF generated from HTML.</p>
    </body>
</html>
"""

# Create PDF
output = io.BytesIO()
result = pisa.pisaDocument(html_content, dest=output)

# Check for errors
if result.err:
    print("Error generating PDF")
else:
    # Save or use the PDF
    with open("output.pdf", "wb") as f:
        f.write(output.getvalue())

File-to-File Conversion

from xhtml2pdf import pisa

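# Convert an HTML file to a PDF file (UTF-8 input is assumed here)
with open("input.html", "r", encoding="utf-8") as source, open("output.pdf", "wb") as dest:
    result = pisa.pisaDocument(source, dest)

# result.err counts the errors recorded during conversion
if not result.err:
    print("PDF generated successfully")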

Architecture

xhtml2pdf operates through a multi-stage processing pipeline:

  • HTML Parser: Uses html5lib for HTML5-compliant parsing
  • CSS Engine: Complete CSS 2.1 cascade and processing system
  • Context Management: pisaContext handles fonts, resources, and conversion state
  • ReportLab Bridge: Converts parsed content to ReportLab document format
  • PDF Generation: Creates final PDF using ReportLab's PDF engine

The library provides both high-level convenience functions and low-level APIs for advanced customization, making it suitable for simple conversions as well as complex document generation systems.

Capabilities

Core Document Processing

Main conversion functions for transforming HTML to PDF, including the primary pisaDocument function and lower-level story creation capabilities.

def pisaDocument(
    src,
    dest=None,
    dest_bytes=False,
    path="",
    link_callback=None,
    debug=0,
    default_css=None,
    xhtml=False,
    encoding=None,
    xml_output=None,
    raise_exception=True,
    capacity=100 * 1024,
    context_meta=None,
    encrypt=None,
    signature=None,
    **kwargs
):
    """
    Convert HTML to PDF.
    
    Args:
        src: HTML source (string, file-like object, or filename)
        dest: Output destination (file-like object or filename)
        dest_bytes: Return PDF as bytes if True
        path: Base path for relative resources
        link_callback: Function to resolve URLs and file paths
        debug: Debug level (0-2)
        default_css: Custom default CSS string
        xhtml: Force XHTML parsing
        encoding: Character encoding for source
        xml_output: XML output options
        raise_exception: Raise exceptions on errors
        capacity: Memory capacity for temp files
        context_meta: Additional context metadata
        encrypt: PDF encryption settings
        signature: PDF signature settings
    
    Returns:
        pisaContext: Processing context with results and errors
            (per the dest_bytes entry above, the PDF bytes when dest_bytes is True)
    """

Details: document-processing.md
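
The link_callback parameter above is the usual hook for telling the converter where images and stylesheets live. The following is a minimal sketch of such a callback, assuming it is called with the URI from the HTML and the current base path. The names resolve_uri, STATIC_URL, and STATIC_ROOT, and the ./assets directory, are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical layout: URLs under /static/ map to files in ./assets
STATIC_URL = "/static/"
STATIC_ROOT = os.path.abspath("assets")


def resolve_uri(uri, rel):
    """Map a URI found in the HTML to a location the converter can open.

    uri is the href/src value from the document; rel is the base path
    the converter is resolving against (assumed to be a directory).
    """
    # Leave remote and inline resources untouched.
    if uri.startswith(("http://", "https://", "data:")):
        return uri
    # Translate site-absolute static URLs into filesystem paths.
    if uri.startswith(STATIC_URL):
        return os.path.join(STATIC_ROOT, uri[len(STATIC_URL):])
    # Resolve anything else relative to the current base path.
    return os.path.normpath(os.path.join(rel or "", uri))
```

A function like this would be passed as pisaDocument(html, dest=output, link_callback=resolve_uri); whether the returned paths exist is left to the caller.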

Context and Configuration Management

Advanced processing context management for controlling fonts, CSS, resources, and conversion behavior throughout the HTML-to-PDF pipeline.

class pisaContext:
    def __init__(self, path="", debug=0, capacity=-1): ...
    def addCSS(self, value): ...
    def parseCSS(self): ...
    def addFrag(self, text="", frag=None): ...
    def getFile(self, name, relative=None): ...
    def getFontName(self, names, default="helvetica"): ...
    def registerFont(self, fontname, alias=None): ...

Details: context-management.md

File and Resource Handling

Comprehensive file and resource management system supporting local files, URLs, data URIs, and various resource types with automatic MIME type detection.

def getFile(*a, **kw): ...
class pisaFileObject:
    def __init__(self, uri, basepath=None, callback=None): ...
    def getFileContent(self): ...
    def getMimeType(self): ...

Details: file-handling.md
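
Because data URIs are among the supported resource types, an image can be embedded directly in the HTML instead of being fetched from disk or the network. The sketch below uses only the standard library to build such a URI; to_data_uri and img_tag are hypothetical helper names, and the MIME type is guessed from the filename rather than detected from the content.

```python
import base64
import mimetypes


def to_data_uri(data, filename):
    """Encode raw bytes as a data: URI, guessing the MIME type from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"  # generic fallback for unknown extensions
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def img_tag(data, filename):
    """Build an <img> element whose source is embedded in the HTML itself."""
    return f'<img src="{to_data_uri(data, filename)}">'
```

The resulting tag can be placed in the HTML string handed to pisaDocument, which keeps the conversion independent of any link_callback or base path.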

CSS Processing and Styling

Advanced CSS parsing, cascade processing, and style application system supporting CSS 2.1 and select CSS 3 features for precise document styling.

class pisaCSSBuilder:
    def atFontFace(self, declarations): ...
    def atPage(self): ...
    def atFrame(self): ...

class pisaCSSParser:
    def parseExternal(self, cssResourceName): ...

Details: css-processing.md

Utility Functions and Helpers

Collection of utility functions for size conversion, color handling, coordinate calculation, text processing, and other common operations.

def getColor(value, default=None): ...
def getSize(value, relative=0, base=None, default=0.0): ...
def getBool(s): ...
def getAlign(value, default=TA_LEFT): ...
def arabic_format(text, language): ...

Details: utilities.md
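
To make "size conversion" concrete, here is an illustrative standard-library sketch that converts CSS absolute lengths to PDF points. It is not getSize's implementation and covers only pt, pc, in, cm, and mm; the name css_length_to_points and the fallback behavior are assumptions made for this example.

```python
import re

# Points per unit for CSS absolute lengths (1in = 72pt = 2.54cm, 1pc = 12pt).
POINTS_PER_UNIT = {
    "pt": 1.0,
    "pc": 12.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}

_LENGTH = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([a-z]+)\s*$")


def css_length_to_points(value, default=0.0):
    """Convert a CSS absolute length such as '2.5cm' to points."""
    match = _LENGTH.match(str(value).lower())
    if match is None:
        return default  # not a "<number><unit>" string
    number, unit = match.groups()
    factor = POINTS_PER_UNIT.get(unit)
    if factor is None:
        return default  # unit outside this sketch's table (em, %, px, ...)
    return float(number) * factor
```

The library's own getSize accepts more inputs (relative sizes and a base value, per its signature above), so it is the function to call in real code; the sketch only shows the arithmetic behind absolute units.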

PDF Manipulation and Advanced Features

PDF document manipulation, joining, encryption, digital signatures, and watermark capabilities for advanced PDF processing.

class pisaPDF:
    def __init__(self, capacity=-1): ...
    def addFromURI(self, url, basepath=None): ...
    def join(self, file=None): ...

class PDFSignature:
    @staticmethod
    def sign(): ...

Details: pdf-features.md

Command Line Interface

Complete command-line interface for batch processing and integration with shell scripts and automated workflows.

def command(): ...
def execute(): ...
def usage(): ...
def showLogging(*, debug=False): ...

Details: command-line.md

WSGI Integration

WSGI middleware components for integrating PDF generation directly into web applications with automatic HTML-to-PDF conversion.

class PisaMiddleware:
    def __init__(self, app): ...
    def __call__(self, environ, start_response): ...

Details: wsgi-integration.md
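
The general pattern behind such middleware can be sketched with the standard library alone. The class below is a generic illustration, not PisaMiddleware's actual implementation: the name HtmlToPdfMiddleware, the injected convert callable, and the rule "convert when the path ends in .pdf" are all assumptions made for this example.

```python
class HtmlToPdfMiddleware:
    """Generic WSGI middleware sketch; not xhtml2pdf's PisaMiddleware."""

    def __init__(self, app, convert):
        self.app = app
        self.convert = convert  # hypothetical callable: HTML bytes -> PDF bytes

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold back the wrapped app's status and headers for inspection.
            # (Apps that rely on the legacy write() callable are not supported.)
            captured["status"] = status
            captured["headers"] = headers

        chunks = self.app(environ, capture)
        try:
            body = b"".join(chunks)  # buffer the whole response
        finally:
            if hasattr(chunks, "close"):
                chunks.close()

        headers = captured["headers"]
        content_type = {k.lower(): v for k, v in headers}.get("content-type", "")

        # Assumed trigger: an HTML response requested under a .pdf path.
        wants_pdf = environ.get("PATH_INFO", "").endswith(".pdf")
        if wants_pdf and content_type.startswith("text/html"):
            body = self.convert(body)
            headers = [(k, v) for k, v in headers
                       if k.lower() not in ("content-type", "content-length")]
            headers += [("Content-Type", "application/pdf"),
                        ("Content-Length", str(len(body)))]

        start_response(captured["status"], headers)
        return [body]
```

With xhtml2pdf installed, convert could be a small function that calls pisaDocument with an io.BytesIO destination and returns its contents. The packaged PisaMiddleware takes only the wrapped app, as its signature above shows.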

Error Handling

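xhtml2pdf reports problems through the returned pisaContext: err and warn are counters, and log collects the messages.

import io
from xhtml2pdf import pisa

html_content = "<p>Hello World</p>"
output = io.BytesIO()
result = pisa.pisaDocument(html_content, dest=output)

# err counts the errors recorded during conversion
if result.err:
    print(f"{result.err} error(s) during conversion: {result.log}")

# warn counts warnings; both kinds of message end up in result.log
if result.warn:
    print(f"{result.warn} warning(s): {result.log}")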

Common exceptions that may be raised:

  • IOError (an alias of OSError in Python 3): File access issues when reading HTML files or writing PDF output
  • FileNotFoundError: Missing HTML files, CSS files, or image resources
  • PermissionError: Insufficient permissions to read/write files
  • UnicodeDecodeError: Character encoding problems in HTML/CSS content
  • ImportError: Missing optional dependencies (pycairo, renderpm, pyHanko)
  • ValueError: Invalid configuration parameters or malformed HTML/CSS
  • MemoryError: Insufficient memory for large document processing
  • Various ReportLab exceptions:
    • reportlab.platypus.doctemplate.LayoutError: Page layout issues
    • ValueError raised by reportlab.lib.colors.toColor: Invalid color specifications
    • PDF generation and rendering errors

Network-related exceptions (for URL resources):

  • urllib.error.URLError: Network connectivity issues
  • urllib.error.HTTPError: HTTP errors when fetching remote resources
  • ssl.SSLError: SSL certificate issues for HTTPS resources
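
Several of the exceptions listed above are subclasses of one another, so the order of checks matters when handling them. The sketch below classifies an exception into a coarse category, testing subclasses before their parents; classify_failure and its category labels are hypothetical names for this example.

```python
import ssl
import urllib.error


def classify_failure(exc):
    """Return a coarse category for an exception raised during conversion."""
    # Subclasses first: HTTPError derives from URLError, and URLError,
    # ssl.SSLError, FileNotFoundError and PermissionError all derive from OSError.
    if isinstance(exc, urllib.error.HTTPError):
        return f"http {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        return "network"
    if isinstance(exc, ssl.SSLError):
        return "tls"
    if isinstance(exc, FileNotFoundError):
        return "missing file"
    if isinstance(exc, PermissionError):
        return "permission"
    if isinstance(exc, OSError):
        return "io"
    # UnicodeDecodeError is a ValueError, so it is checked first.
    if isinstance(exc, UnicodeDecodeError):
        return "encoding"
    if isinstance(exc, ValueError):
        return "invalid input"
    return "other"
```

A caller would wrap the pisaDocument call in try/except Exception and pass the caught exception to this function. The same ordering applies if separate except clauses are used instead.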

Types

class pisaContext:
    """
    Main processing context for HTML-to-PDF conversion.
    
    Attributes:
        err (int): Error count
        warn (int): Warning count  
        log (list): Processing log messages
        cssText (str): Accumulated CSS text
        cssParser: CSS parser instance
        fontList (list): Available fonts
        path (str): Base path for resources
    """

class pisaFileObject:
    """
    Unified file object for various URI types.
    
    Handles local files, URLs, data URIs, and byte streams
    with automatic MIME type detection and content processing.
    """

class pisaTempFile:
    """
    Temporary file handler for PDF generation.
    
    Manages temporary storage during conversion process
    with automatic cleanup and memory management.
    """