tessl/pypi-treq

High-level Twisted HTTP Client API for asynchronous HTTP requests in Python

Response Content Processing

Name: tessl/pypi-treq
Author: tessl

Functions for processing HTTP response content with support for streaming, buffered access, automatic encoding detection, and JSON parsing. Each function returns a Deferred, because Twisted delivers the response body asynchronously.

Capabilities

Incremental Content Collection

Collects response body data incrementally as it arrives, useful for streaming large responses or processing data in chunks.

def collect(response, collector):
    """
    Incrementally collect the body of the response.
    
    This function may only be called once for a given response.
    If the collector raises an exception, it will be set as the error
    value on the response Deferred and the HTTP transport will be closed.
    
    Parameters:
    - response: IResponse - The HTTP response to collect body from
    - collector: callable - Function called with each data chunk (bytes)
    
    Returns:
    Deferred that fires with None when entire body has been read
    """
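
The exception behavior described above can be used to cap the size of a download. The following is a minimal sketch, not part of treq: `BodyTooLarge` and `make_bounded_collector` are hypothetical names introduced for illustration. The collector is an ordinary callable, so it can be exercised without a network connection.

```python
from typing import Callable, List


class BodyTooLarge(Exception):
    """Raised by the collector when the body exceeds the allowed size."""


def make_bounded_collector(limit: int, sink: List[bytes]) -> Callable[[bytes], None]:
    """Return a collector that appends chunks to `sink` until `limit` bytes are seen."""
    received = 0

    def collector(chunk: bytes) -> None:
        nonlocal received
        received += len(chunk)
        if received > limit:
            # Raising from the collector is what signals the error to the caller.
            raise BodyTooLarge(f"body exceeded {limit} bytes")
        sink.append(chunk)

    return collector
```

With a real response, the collector would be passed as `treq.collect(response, make_bounded_collector(1_000_000, chunks))`; the returned Deferred would then fail with `BodyTooLarge` if the limit is exceeded.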

Complete Content Retrieval

Gets the complete response content as bytes, caching the result for multiple calls.

def content(response):
    """
    Read the complete contents of an HTTP response.
    
    This function may be called multiple times for a response; it uses
    a WeakKeyDictionary to cache the contents of the response.
    
    Parameters:
    - response: IResponse - The HTTP response to get contents of
    
    Returns:
    Deferred that fires with complete content as bytes
    """

Text Content Decoding

Decodes response content as text. The charset declared in the Content-Type header takes precedence; the encoding argument is used only when the header declares none.

def text_content(response, encoding="ISO-8859-1"):
    """
    Read and decode HTTP response contents as text.
    
    The charset is automatically detected from the Content-Type header.
    If no charset is specified, the provided encoding is used as fallback.
    
    Parameters:
    - response: IResponse - The HTTP response to decode
    - encoding: str - Fallback encoding if none detected (default: ISO-8859-1)
    
    Returns:
    Deferred that fires with decoded text as str
    """
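
The choice of fallback matters when a server sends UTF-8 text without declaring a charset. The snippet below uses only built-in string methods to show the effect; in that situation, passing `encoding="utf-8"` to `text_content()` would be the corresponding fix.

```python
body = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Correct: the bytes are decoded with the encoding they were written in.
as_utf8 = body.decode("utf-8")

# ISO-8859-1 maps every byte to a character, so decoding never fails,
# but multi-byte UTF-8 sequences come out as mojibake.
as_latin1 = body.decode("ISO-8859-1")

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```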

JSON Content Parsing

Parses response content as JSON, decoding it as UTF-8 unless the response declares a different charset.

def json_content(response, **kwargs):
    """
    Read and parse HTTP response contents as JSON.
    
    This function relies on text_content() and may be called multiple
    times for a given response. When the response does not declare a
    charset, the body is decoded as UTF-8, the default JSON encoding
    per RFC 7159.
    
    Parameters:
    - response: IResponse - The HTTP response to parse
    - **kwargs: Additional keyword arguments for json.loads()
    
    Returns:
    Deferred that fires with parsed JSON data
    """
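
Because the extra keyword arguments are forwarded to `json.loads()`, any of its options apply. As a small illustration with the standard library alone (the sample payload is made up), `parse_float=Decimal` avoids binary floating-point rounding for monetary values:

```python
import json
from decimal import Decimal

text = '{"price": 19.99, "qty": 3}'

# Default parsing produces a binary float for "price".
default = json.loads(text)

# parse_float replaces the constructor used for JSON numbers with a fraction
# or exponent; integers such as "qty" are unaffected.
exact = json.loads(text, parse_float=Decimal)
```

The equivalent call on a response would be `treq.json_content(response, parse_float=Decimal)`.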

Usage Examples

Basic Content Access

import treq
from twisted.internet import defer

@defer.inlineCallbacks
def get_content():
    response = yield treq.get('https://httpbin.org/get')
    
    # Get raw bytes
    raw_data = yield treq.content(response)
    print(f"Raw data: {raw_data[:100]}...")
    
    # Get decoded text
    text_data = yield treq.text_content(response)
    print(f"Text data: {text_data[:100]}...")
    
    # Parse as JSON
    json_data = yield treq.json_content(response)
    print(f"JSON data: {json_data}")

Streaming Large Responses

import treq
from twisted.internet import defer

@defer.inlineCallbacks
def stream_large_file():
    response = yield treq.get('https://httpbin.org/bytes/10000')
    
    chunks = []
    def collector(data):
        chunks.append(data)
        print(f"Received chunk of {len(data)} bytes")
    
    yield treq.collect(response, collector)
    total_data = b''.join(chunks)
    print(f"Total received: {len(total_data)} bytes")
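
Accumulating every chunk in a list still holds the whole body in memory. When that is a concern, a collector can hand each chunk off as it arrives. The sketch below is one possible approach; `FileCollector` is a hypothetical helper, not part of treq, and the chunks are simulated so the example runs without a network connection.

```python
import hashlib
import io
from typing import BinaryIO


class FileCollector:
    """Collector that writes each chunk to a file object and hashes it on the way."""

    def __init__(self, fileobj: BinaryIO) -> None:
        self._file = fileobj
        self._hash = hashlib.sha256()
        self.size = 0

    def __call__(self, chunk: bytes) -> None:
        # Nothing is retained here beyond the running hash and byte count.
        self._file.write(chunk)
        self._hash.update(chunk)
        self.size += len(chunk)

    def hexdigest(self) -> str:
        return self._hash.hexdigest()


# Simulate the chunks that collect() would deliver.
buffer = io.BytesIO()
collector = FileCollector(buffer)
for chunk in (b"hello ", b"world"):
    collector(chunk)
```

With a real response and an open file, the same object would be passed as `treq.collect(response, FileCollector(f))`.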

Processing Different Content Types

import treq
from twisted.internet import defer

@defer.inlineCallbacks
def handle_different_types():
    # JSON API response  
    json_response = yield treq.get('https://httpbin.org/json')
    data = yield treq.json_content(json_response)
    
    # Plain text response
    text_response = yield treq.get('https://httpbin.org/robots.txt')
    text = yield treq.text_content(text_response)
    
    # Binary data (image, file, etc.)
    binary_response = yield treq.get('https://httpbin.org/bytes/1024')
    binary_data = yield treq.content(binary_response)
    
    # Custom JSON parsing: extra keyword arguments are passed to json.loads()
    json_response = yield treq.get('https://httpbin.org/json')
    data = yield treq.json_content(json_response, parse_float=float, parse_int=int)

Error Handling

import treq
from twisted.internet import defer

@defer.inlineCallbacks
def handle_content_errors():
    try:
        response = yield treq.get('https://httpbin.org/status/500')
        
        # Content functions work regardless of HTTP status
        content = yield treq.text_content(response)
        print(f"Error response content: {content}")
        
    except Exception as e:
        print(f"Request failed: {e}")
    
    # Request outside the try block so `response` is always bound in the handler
    response = yield treq.get('https://httpbin.org/html')
    try:
        # Raises ValueError (json.JSONDecodeError) if the body is not valid JSON
        json_data = yield treq.json_content(response)
        
    except ValueError as e:
        print(f"JSON parsing failed: {e}")
        # Fall back to text; the body is cached, so it can be read again
        text_data = yield treq.text_content(response)

Response Object Methods

The _Response object returned by treq's request functions also provides convenience methods for content access:

import treq
from twisted.internet import defer

@defer.inlineCallbacks
def use_response_methods():
    response = yield treq.get('https://httpbin.org/json')
    
    # These are equivalent to the module-level functions
    content_bytes = yield response.content()
    text_content = yield response.text()
    json_data = yield response.json()
    
    # Incremental collection (the default buffered response replays the body)
    chunks = []
    yield response.collect(chunks.append)

Types

Content-related types:

# Collector function type for incremental processing
CollectorFunction = Callable[[bytes], None]

# Encoding detection return type
Optional[str]  # Charset name or None if not detected

# Content function return types
Deferred[bytes]    # For content()
Deferred[str]      # For text_content()
Deferred[Any]      # For json_content()
Deferred[None]     # For collect()

Encoding Detection

treq automatically detects character encoding from HTTP headers:

Content-Type header parsing: Extracts charset parameter from Content-Type
JSON default: Uses UTF-8 for application/json responses that declare no charset, per RFC 7159
Fallback encoding: Uses provided encoding parameter (default: ISO-8859-1)
Charset validation: Validates charset names against RFC 2978 specification

The encoding detection handles edge cases like:

Multiple Content-Type headers (uses last one)
Quoted charset values (charset="utf-8")
Case-insensitive charset names
Invalid charset characters (falls back to default)
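
The rules above can be approximated with the standard library. This is a sketch of the described behavior under stated assumptions, not treq's code: `detect_encoding` is a hypothetical name, and the allowed-character pattern reflects one reading of the mime-charset rule in RFC 2978.

```python
import re
from email.message import Message
from typing import List

# Characters permitted in a charset name (assumed from RFC 2978, section 2.3).
_CHARSET_RE = re.compile(r"^[A-Za-z0-9!#$%&'+\-^_`{}~]+$")


def detect_encoding(content_types: List[str], fallback: str = "ISO-8859-1") -> str:
    """Pick a text encoding from the raw Content-Type header values."""
    if not content_types:
        return fallback

    # When the header is repeated, only the last value is considered.
    msg = Message()
    msg["Content-Type"] = content_types[-1]

    # get_param() strips surrounding quotes from the parameter value.
    charset = msg.get_param("charset")
    if isinstance(charset, str) and _CHARSET_RE.match(charset):
        return charset.lower()
    if charset is None and msg.get_content_type() == "application/json":
        return "utf-8"
    # Missing or invalid charset: use the caller's fallback.
    return fallback
```

For example, `detect_encoding(['text/html; charset="UTF-8"'])` yields `"utf-8"`, while `detect_encoding(["text/plain"])` yields the fallback.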

Performance Considerations

Buffering: By default, treq buffers the response body in memory so that it can be read more than once
Unbuffered responses: Pass unbuffered=True to the request function to skip this buffering when streaming large responses
Multiple access: Content functions cache results for repeated access to same response
Streaming: Use collect() for processing large responses incrementally
Memory usage: content(), text_content(), and json_content() hold the entire body in memory, so prefer collect() when a response may not fit