tessl/pypi-xhtml2pdf

PDF generator using HTML and CSS

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/xhtml2pdf@0.2.x

To install, run

npx @tessl/cli install tessl/pypi-xhtml2pdf@0.2.0

xhtml2pdf

xhtml2pdf is an HTML-to-PDF converter for Python that turns HTML and CSS content into PDF documents. It is built on the ReportLab Toolkit, html5lib, and pypdf, supports HTML5 and CSS 2.1 (plus some CSS 3 features), and is written entirely in Python, so it is platform independent.

Package Information

Package Name: xhtml2pdf
Package Type: pypi
Language: Python
Python Version: 3.8+
License: Apache 2.0
Installation: pip install xhtml2pdf
Optional Dependencies:
- pip install xhtml2pdf[pycairo] (recommended for better graphics)
- pip install xhtml2pdf[renderpm] (legacy rendering)
Documentation: https://xhtml2pdf.readthedocs.io/

Core Imports

Basic import for main functionality:

from xhtml2pdf import pisa

Complete document processing import:

from xhtml2pdf.document import pisaDocument

Backward compatibility import:

from xhtml2pdf.pisa import CreatePDF  # Alias for pisaDocument

Advanced imports for specific features:

from xhtml2pdf.context import pisaContext
from xhtml2pdf.files import getFile, pisaFileObject
from xhtml2pdf.pdf import pisaPDF
from xhtml2pdf.util import getColor, getSize, getBool

Basic Usage

Simple HTML to PDF Conversion

from xhtml2pdf import pisa
import io

# HTML content
html_content = """
<html>
    <head>
        <style>
            body { font-family: Arial, sans-serif; }
            h1 { color: #333; }
        </style>
    </head>
    <body>
        <h1>Hello World</h1>
        <p>This is a simple PDF generated from HTML.</p>
    </body>
</html>
"""

# Create PDF
output = io.BytesIO()
result = pisa.pisaDocument(html_content, dest=output)

# Check for errors
if result.err:
    print("Error generating PDF")
else:
    # Save or use the PDF
    with open("output.pdf", "wb") as f:
        f.write(output.getvalue())

File-to-File Conversion

from xhtml2pdf import pisa

# Convert an HTML file to a PDF file
# (encoding="utf-8" is an assumption; use the encoding the file is saved in)
with open("input.html", "r", encoding="utf-8") as source, open("output.pdf", "wb") as dest:
    result = pisa.pisaDocument(source, dest)

if not result.err:
    print("PDF generated successfully")

Architecture

xhtml2pdf operates through a multi-stage processing pipeline:

HTML Parser: Uses html5lib for HTML5-compliant parsing
CSS Engine: Complete CSS 2.1 cascade and processing system
Context Management: pisaContext handles fonts, resources, and conversion state
ReportLab Bridge: Converts parsed content to ReportLab document format
PDF Generation: Creates final PDF using ReportLab's PDF engine

The library exposes both high-level convenience functions and lower-level APIs, so it suits one-off conversions as well as larger document generation systems.

Capabilities

Core Document Processing

Main conversion functions for transforming HTML to PDF, including the primary pisaDocument function and lower-level story creation capabilities.

def pisaDocument(
    src,
    dest=None,
    dest_bytes=False,
    path="",
    link_callback=None,
    debug=0,
    default_css=None,
    xhtml=False,
    encoding=None,
    xml_output=None,
    raise_exception=True,
    capacity=100 * 1024,
    context_meta=None,
    encrypt=None,
    signature=None,
    **kwargs
):
    """
    Convert HTML to PDF.
    
    Args:
        src: HTML source (string, file-like object, or filename)
        dest: Output destination (file-like object or filename)
        dest_bytes: Return PDF as bytes if True
        path: Base path for relative resources
        link_callback: Function to resolve URLs and file paths
        debug: Debug level (0-2)
        default_css: Custom default CSS string
        xhtml: Force XHTML parsing
        encoding: Character encoding for source
        xml_output: XML output options
        raise_exception: Raise exceptions on errors
        capacity: Memory capacity for temp files
        context_meta: Additional context metadata
        encrypt: PDF encryption settings
        signature: PDF signature settings
    
    Returns:
        pisaContext: Processing context with results and errors
        (when dest_bytes is True, the PDF is returned as bytes instead)
    """

Full reference: docs/document-processing.md
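The `link_callback` parameter is the usual hook for mapping resource references in the HTML (stylesheets, images) to files on disk. The sketch below assumes the callback is invoked as `link_callback(uri, rel)` and returns a path or URL the library can open. `STATIC_URL`, `STATIC_ROOT`, and `resolve_resource` are hypothetical names chosen for illustration, not part of xhtml2pdf.

```python
import os

# Hypothetical settings: the URL prefix used in the HTML and the directory it maps to.
STATIC_URL = "/static/"
STATIC_ROOT = os.path.join(os.sep, "srv", "app", "static")


def resolve_resource(uri, rel):
    """Map a resource URI found in the HTML to something the converter can open.

    Assumes the (uri, rel) calling convention: `uri` is the reference as written
    in the document and `rel` is the base it is relative to.
    """
    if uri.startswith(STATIC_URL):
        # Strip the URL prefix and rebuild the path under the static directory.
        relative = uri[len(STATIC_URL):]
        return os.path.join(STATIC_ROOT, *relative.split("/"))
    # Leave absolute URLs, data URIs, and everything else untouched.
    return uri
```

It would then be passed along with the other arguments, for example `pisa.pisaDocument(html_content, dest=output, link_callback=resolve_resource)`. A production resolver should also reject paths that escape the static directory (for example via `..` segments).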

Context and Configuration Management

Advanced processing context management for controlling fonts, CSS, resources, and conversion behavior throughout the HTML-to-PDF pipeline.

class pisaContext:
    def __init__(self, path="", debug=0, capacity=-1): ...
    def addCSS(self, value): ...
    def parseCSS(self): ...
    def addFrag(self, text="", frag=None): ...
    def getFile(self, name, relative=None): ...
    def getFontName(self, names, default="helvetica"): ...
    def registerFont(self, fontname, alias=None): ...

Full reference: docs/context-management.md

File and Resource Handling

Comprehensive file and resource management system supporting local files, URLs, data URIs, and various resource types with automatic MIME type detection.

def getFile(*a, **kw): ...
class pisaFileObject:
    def __init__(self, uri, basepath=None, callback=None): ...
    def getFileContent(self): ...
    def getMimeType(self): ...

Full reference: docs/file-handling.md
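Because data URIs are among the supported resource types, small images can be embedded directly in the HTML instead of being fetched from disk or the network. The following is a minimal sketch using only the standard library; `to_data_uri` and `img_tag` are hypothetical helper names, not xhtml2pdf functions.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI (RFC 2397)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(data: bytes, mime_type: str = "image/png") -> str:
    """Build an <img> element whose source is embedded in the HTML itself."""
    return f'<img src="{to_data_uri(data, mime_type)}">'
```

The resulting tag can be placed anywhere in the HTML string handed to `pisaDocument`, which keeps the document self-contained at the cost of a larger source string.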

CSS Processing and Styling

Advanced CSS parsing, cascade processing, and style application system supporting CSS 2.1 and select CSS 3 features for precise document styling.

class pisaCSSBuilder:
    def atFontFace(self, declarations): ...
    def atPage(self): ...
    def atFrame(self): ...

class pisaCSSParser:
    def parseExternal(self, cssResourceName): ...

Full reference: docs/css-processing.md
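The `atPage` and `atFrame` hooks correspond to the `@page` and `@frame` rules that xhtml2pdf uses for page layout. The sketch below shows one plausible layout with a content frame and a static footer frame, assuming the `-pdf-frame-content` property binds a frame to the element with the matching `id`. Frame names, coordinates, and the `build_document` helper are illustrative assumptions.

```python
# Page layout for A4 (595 x 842 pt): a main content frame plus a static footer frame.
PAGE_CSS = """
@page {
    size: a4 portrait;
    @frame content_frame {
        left: 50pt; width: 495pt; top: 50pt; height: 700pt;
    }
    @frame footer_frame {
        -pdf-frame-content: footer_content;
        left: 50pt; width: 495pt; top: 770pt; height: 20pt;
    }
}
"""


def build_document(body_html: str, footer_text: str) -> str:
    """Wrap body markup in a page that uses the frame layout above."""
    return (
        f"<html><head><style>{PAGE_CSS}</style></head><body>"
        f'<div id="footer_content">{footer_text}</div>'
        f"{body_html}</body></html>"
    )
```

The string returned by `build_document` is ordinary HTML and can be passed to `pisaDocument` like any other source.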

Utility Functions and Helpers

Collection of utility functions for size conversion, color handling, coordinate calculation, text processing, and other common operations.

def getColor(value, default=None): ...
def getSize(value, relative=0, base=None, default=0.0): ...
def getBool(s): ...
def getAlign(value, default=TA_LEFT): ...
def arabic_format(text, language): ...

Full reference: docs/utilities.md
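To make the size conversion concrete, the sketch below illustrates the arithmetic behind turning absolute CSS lengths into PDF points (1 inch = 72 points). It is a standalone illustration and not the library's implementation: `getSize` accepts more units and relative values than this hypothetical `to_points` helper does.

```python
import re

# Points per unit; one inch is 72 PDF points.
POINTS_PER_UNIT = {
    "pt": 1.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
    "pc": 12.0,
}

_SIZE_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(pt|in|cm|mm|pc)\s*$")


def to_points(value: str) -> float:
    """Convert an absolute CSS length such as '2.5cm' to PDF points."""
    match = _SIZE_RE.match(value.lower())
    if match is None:
        raise ValueError(f"unsupported length: {value!r}")
    number, unit = match.groups()
    return float(number) * POINTS_PER_UNIT[unit]
```

Working in points throughout avoids mixing units when computing frame positions or margins for a page.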

PDF Manipulation and Advanced Features

PDF document manipulation, joining, encryption, digital signatures, and watermark capabilities for advanced PDF processing.

class pisaPDF:
    def __init__(self, capacity=-1): ...
    def addFromURI(self, url, basepath=None): ...
    def join(self, file=None): ...

class PDFSignature:
    @staticmethod
    def sign(): ...

Full reference: docs/pdf-features.md

Command Line Interface

Complete command-line interface for batch processing and integration with shell scripts and automated workflows.

def command(): ...
def execute(): ...
def usage(): ...
def showLogging(*, debug=False): ...

Full reference: docs/command-line.md
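For batch processing, the command-line tool can be driven from a script. The sketch below assumes the package installs an `xhtml2pdf` console script that accepts a source file followed by a destination file; `build_command` and `convert_directory` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def build_command(src: Path, dest: Path) -> list:
    """Argument list for converting one HTML file (assumes `xhtml2pdf` is on PATH)."""
    return ["xhtml2pdf", str(src), str(dest)]


def convert_directory(html_dir: Path, pdf_dir: Path) -> list:
    """Convert every *.html file in html_dir and return the requested PDF paths."""
    pdf_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(html_dir.glob("*.html")):
        dest = pdf_dir / src.with_suffix(".pdf").name
        # check=True raises CalledProcessError if the converter exits non-zero.
        subprocess.run(build_command(src, dest), check=True)
        outputs.append(dest)
    return outputs
```

Running each file in its own process keeps one failing document from affecting the rest of the batch, at the cost of interpreter start-up time per file.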

WSGI Integration

WSGI middleware components for integrating PDF generation directly into web applications with automatic HTML-to-PDF conversion.

class PisaMiddleware:
    def __init__(self, app): ...
    def __call__(self, environ, start_response): ...

Full reference: docs/wsgi-integration.md

Error Handling

xhtml2pdf reports problems through the returned context: err and warn hold the error and warning counts, and log holds the collected messages.

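import io
from xhtml2pdf import pisa

html_content = "<p>Hello</p>"  # any HTML string
output = io.BytesIO()
result = pisa.pisaDocument(html_content, dest=output)

# err is the number of errors recorded during conversion
if result.err:
    print(f"Errors occurred during conversion: {result.log}")

# warn is the number of warnings recorded during conversion
if result.warn:
    print(f"Warnings: {result.log}")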

Common exceptions that may be raised:

OSError (IOError is an alias for it in Python 3): File access issues when reading HTML files or writing PDF output
FileNotFoundError: Missing HTML files, CSS files, or image resources
PermissionError: Insufficient permissions to read/write files
UnicodeDecodeError: Character encoding problems in HTML/CSS content
ImportError: Missing optional dependencies (pycairo, renderpm, pyHanko)
ValueError: Invalid configuration parameters or malformed HTML/CSS
MemoryError: Insufficient memory for large document processing
Various ReportLab exceptions:
- reportlab.platypus.doctemplate.LayoutError: Page layout issues
- reportlab.lib.colors.ColorError: Invalid color specifications
- PDF generation and rendering errors

Network-related exceptions (for URL resources):

urllib.error.URLError: Network connectivity issues
urllib.error.HTTPError: HTTP errors when fetching remote resources
ssl.SSLError: SSL certificate issues for HTTPS resources
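The two reporting channels above (the counters on the returned context and raised exceptions) can be combined in one wrapper. This is a minimal sketch under stated assumptions: `convert_safely` is a hypothetical helper, and its `render` parameter exists only so the wrapper can be exercised without the library installed.

```python
import logging

logger = logging.getLogger(__name__)


def convert_safely(html, dest, render=None):
    """Return True if `render` produced a PDF without reported errors.

    `render` defaults to pisa.pisaDocument and must accept (html, dest=...).
    """
    if render is None:
        from xhtml2pdf import pisa  # imported lazily so the helper loads without it
        render = pisa.pisaDocument
    try:
        result = render(html, dest=dest)
    except (OSError, ValueError) as exc:
        # OSError covers FileNotFoundError, PermissionError, URLError and SSLError;
        # ValueError covers UnicodeDecodeError.
        logger.error("PDF conversion failed: %s", exc)
        return False
    if result.err:
        logger.error("PDF conversion reported %d error(s)", result.err)
        return False
    return True
```

Called as `convert_safely(html_content, output)`, it falls back to the library's own converter. ReportLab-specific exceptions and `MemoryError` are deliberately left to propagate here, since how to handle them depends on the application.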

Types

class pisaContext:
    """
    Main processing context for HTML-to-PDF conversion.
    
    Attributes:
        err (int): Error count
        warn (int): Warning count  
        log (list): Processing log messages
        cssText (str): Accumulated CSS text
        cssParser: CSS parser instance
        fontList (list): Available fonts
        path (str): Base path for resources
    """

class pisaFileObject:
    """
    Unified file object for various URI types.
    
    Handles local files, URLs, data URIs, and byte streams
    with automatic MIME type detection and content processing.
    """

class pisaTempFile:
    """
    Temporary file handler for PDF generation.
    
    Manages temporary storage during conversion process
    with automatic cleanup and memory management.
    """
