tessl/pypi-python-multipart

A streaming multipart parser for Python that enables efficient handling of file uploads and form data in web applications

High-Level Form Parsing

Name: tessl/pypi-python-multipart
Author: tessl

The high-level interface is a complete form parsing solution: it inspects the content type, creates the matching parser, and delivers results as Field and File objects through callbacks. It is the simplest way to handle form data, file uploads, and the supported content encodings.

Capabilities

Convenience Functions

parse_form

Convenience function that runs the complete form parsing workflow with minimal setup. It detects the content type from the headers, creates the appropriate parser, and feeds it the entire input stream in chunks of chunk_size bytes.

def parse_form(
    headers: dict[str, bytes],
    input_stream: SupportsRead,
    on_field: OnFieldCallback | None,
    on_file: OnFileCallback | None,
    chunk_size: int = 1048576
) -> None:
    """
    Parse a request body with minimal setup.
    
    Parameters:
    - headers: HTTP headers dict containing Content-Type
    - input_stream: Input stream to read form data from
    - on_field: Callback for each parsed field
    - on_file: Callback for each parsed file
    - chunk_size: Size of chunks to read from input stream
    """

Usage Example:

from python_multipart import parse_form

def handle_form_submission(environ):
    fields = []
    files = []
    
    def on_field(field):
        fields.append({
            'name': field.field_name.decode('utf-8'),
            'value': field.value.decode('utf-8') if field.value else None
        })
    
    def on_file(file):
        files.append({
            'field_name': file.field_name.decode('utf-8'),
            'file_name': file.file_name.decode('utf-8') if file.file_name else None,
            'size': file.size,
            'in_memory': file.in_memory
        })
        file.close()  # Important: close file when done
    
    headers = {'Content-Type': environ['CONTENT_TYPE'].encode()}
    parse_form(headers, environ['wsgi.input'], on_field, on_file)
    
    return {'fields': fields, 'files': files}
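
The chunk_size parameter controls how much is read from the input stream per call. The sketch below illustrates that general read loop using only the standard library. feed_in_chunks and its content_length argument are hypothetical names chosen for illustration, not part of python-multipart, and the library's own loop may differ in detail.

```python
import io
from typing import BinaryIO, Callable, Optional


def feed_in_chunks(
    stream: BinaryIO,
    write: Callable[[bytes], object],
    chunk_size: int = 1048576,
    content_length: Optional[int] = None,
) -> int:
    """Read `stream` piece by piece and hand each piece to `write`.

    Returns the total number of bytes read. Stops at end of stream or,
    when `content_length` is given, once that many bytes were consumed.
    """
    total = 0
    while content_length is None or total < content_length:
        # Never request more than the declared body length (if any).
        if content_length is None:
            want = chunk_size
        else:
            want = min(chunk_size, content_length - total)
        chunk = stream.read(want)
        if not chunk:
            break  # end of stream reached early
        write(chunk)
        total += len(chunk)
    return total


# Demonstration: collect the chunks instead of passing them to a parser.
received = []
body = io.BytesIO(b"a=1&b=2&c=3 trailing bytes")
count = feed_in_chunks(body, received.append, chunk_size=4, content_length=11)
print(count, received)
```

Passing a parser's write method as the write argument gives the same streaming shape as the manual loop in the FormParser usage example.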

create_form_parser

Factory function that creates a FormParser instance from HTTP headers, detecting the content type and configuring the parser automatically.

def create_form_parser(
    headers: dict[str, bytes],
    on_field: OnFieldCallback | None,
    on_file: OnFileCallback | None,
    trust_x_headers: bool = False,
    config: dict = {}
) -> FormParser:
    """
    Create FormParser instance from HTTP headers.
    
    Parameters:
    - headers: HTTP headers dict
    - on_field: Callback for each parsed field
    - on_file: Callback for each parsed file
    - trust_x_headers: Whether to trust X-File-Name header
    - config: Configuration options for parser
    
    Returns:
    FormParser instance ready for data processing
    """

FormParser Class

High-level parser that instantiates the appropriate underlying parser based on content type and manages the lifecycle of Field and File objects.

class FormParser:
    """
    All-in-one form parser that handles multiple content types.
    """
    
    DEFAULT_CONFIG: FormParserConfig = {
        "MAX_BODY_SIZE": float("inf"),
        "MAX_MEMORY_FILE_SIZE": 1 * 1024 * 1024,
        "UPLOAD_DIR": None,
        "UPLOAD_KEEP_FILENAME": False,
        "UPLOAD_KEEP_EXTENSIONS": False,
        "UPLOAD_ERROR_ON_BAD_CTE": False,
    }
    
    def __init__(
        self,
        content_type: str,
        on_field: OnFieldCallback | None,
        on_file: OnFileCallback | None,
        on_end: Callable[[], None] | None = None,
        boundary: bytes | str | None = None,
        file_name: bytes | None = None,
        FileClass: type[FileProtocol] = File,
        FieldClass: type[FieldProtocol] = Field,
        config: dict[Any, Any] = {}
    ):
        """
        Initialize FormParser.
        
        Parameters:
        - content_type: MIME type of request body
        - on_field: Callback for each parsed field
        - on_file: Callback for each parsed file  
        - on_end: Callback when parsing completes
        - boundary: Multipart boundary for multipart/form-data
        - file_name: File name for octet-stream content
        - FileClass: Class to use for file objects
        - FieldClass: Class to use for field objects
        - config: Configuration options
        """
    
    def write(self, data: bytes) -> int:
        """
        Write data to parser for processing.
        
        Parameters:
        - data: Bytes to process
        
        Returns:
        Number of bytes processed
        """
    
    def finalize(self) -> None:
        """
        Finalize parsing and flush any remaining data.
        Call when no more data will be written.
        """
    
    def close(self) -> None:
        """
        Close parser and clean up resources.
        """

Configuration Options:

UPLOAD_DIR: Directory for temporary file storage (default: None, which uses the system default)
UPLOAD_KEEP_FILENAME: Preserve original filenames when saving to disk (default: False)
UPLOAD_KEEP_EXTENSIONS: Preserve file extensions (default: False)
UPLOAD_ERROR_ON_BAD_CTE: Raise an error on a bad Content-Transfer-Encoding (default: False)
MAX_MEMORY_FILE_SIZE: Maximum file size to keep in memory, in bytes (default: 1 MiB)
MAX_BODY_SIZE: Maximum total request body size (default: unlimited)
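
To see how overrides relate to the defaults, the sketch below layers a partial config over a copy of the DEFAULT_CONFIG values listed in the class definition. It assumes the parser merges the supplied dict over its defaults. effective_config is a hypothetical helper, not a library function.

```python
# Values copied from the DEFAULT_CONFIG shown in the FormParser class.
DEFAULT_CONFIG = {
    "MAX_BODY_SIZE": float("inf"),
    "MAX_MEMORY_FILE_SIZE": 1 * 1024 * 1024,
    "UPLOAD_DIR": None,
    "UPLOAD_KEEP_FILENAME": False,
    "UPLOAD_KEEP_EXTENSIONS": False,
    "UPLOAD_ERROR_ON_BAD_CTE": False,
}


def effective_config(overrides: dict) -> dict:
    """Return the defaults with `overrides` applied on top (hypothetical helper)."""
    merged = dict(DEFAULT_CONFIG)  # copy, so the defaults stay untouched
    merged.update(overrides)
    return merged


config = effective_config({"MAX_MEMORY_FILE_SIZE": 5 * 1024 * 1024})
print(config["MAX_MEMORY_FILE_SIZE"], config["UPLOAD_KEEP_FILENAME"])
```

Only the keys that need to change have to appear in the config dict; everything else keeps its default.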

Usage Example:

import python_multipart
from python_multipart.multipart import parse_options_header

def handle_multipart_upload(request_headers, input_stream):
    # Parse Content-Type header
    content_type, params = parse_options_header(request_headers['Content-Type'])
    boundary = params.get(b'boundary')
    
    uploaded_files = []
    form_fields = []
    
    def on_field(field):
        form_fields.append({
            'name': field.field_name.decode('utf-8'),
            'value': field.value.decode('utf-8') if field.value else ''
        })
    
    def on_file(file):
        # Save file info and move to permanent location if needed
        file_info = {
            'field_name': file.field_name.decode('utf-8'),
            'file_name': file.file_name.decode('utf-8') if file.file_name else 'unnamed',
            'size': file.size,
            'content_type': getattr(file, 'content_type', 'application/octet-stream')
        }
        
        if file.in_memory:
            # File is in memory, can read directly
            file_info['data'] = file.file_object.getvalue()
        else:
            # File is on disk, need to move or copy
            file_info['temp_path'] = file.actual_file_name
        
        uploaded_files.append(file_info)
        file.close()
    
    # Configure parser
    config = {
        'MAX_MEMORY_FILE_SIZE': 5 * 1024 * 1024,  # 5MB
        'UPLOAD_DIR': '/tmp/uploads'  # directory must already exist
    }
    
    # Create and use parser
    parser = python_multipart.FormParser(
        content_type.decode('utf-8'),
        on_field,
        on_file,
        boundary=boundary,  # required for multipart/form-data
        config=config
    )
    
    # Process data in chunks
    while True:
        chunk = input_stream.read(8192)
        if not chunk:
            break
        parser.write(chunk)
    
    parser.finalize()
    parser.close()
    
    return {
        'fields': form_fields,
        'files': uploaded_files
    }

Content Type Support

FormParser selects its underlying parser from the content type it is given and supports the following types:

multipart/form-data: Uses MultipartParser for file uploads and form fields
application/x-www-form-urlencoded: Uses QuerystringParser for URL-encoded forms
application/octet-stream: Uses OctetStreamParser for binary data uploads

Each content type is processed with appropriate parsing logic while providing a unified interface through Field and File objects.
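
The mapping above can be pictured as a lookup keyed on the parsed Content-Type value. This is a standard-library sketch of that idea, not the library's implementation: describe_content_type is a hypothetical helper, it returns parser names as plain strings, and raising ValueError for unsupported types is an assumption of the sketch.

```python
from email.message import Message
from typing import Optional, Tuple

# Parser names taken from the list of supported content types.
PARSER_FOR_CONTENT_TYPE = {
    "multipart/form-data": "MultipartParser",
    "application/x-www-form-urlencoded": "QuerystringParser",
    "application/octet-stream": "OctetStreamParser",
}


def describe_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Return (parser name, boundary or None) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    content_type = msg.get_content_type()  # lower-cased, parameters stripped
    boundary = msg.get_param("boundary")   # None when the parameter is absent
    try:
        parser_name = PARSER_FOR_CONTENT_TYPE[content_type]
    except KeyError:
        raise ValueError(f"unsupported content type: {content_type}") from None
    return parser_name, boundary


print(describe_content_type("multipart/form-data; boundary=abc123"))
print(describe_content_type("application/x-www-form-urlencoded"))
```

Only multipart/form-data carries a boundary parameter, which is why the boundary argument of FormParser is optional for the other two content types.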
