tessl/pypi-w3lib

Library of web-related functions for HTML manipulation, HTTP processing, URL handling, and encoding detection

HTTP Utilities

HTTP header utilities for converting between raw header blocks and dictionaries, plus generation of HTTP Basic Authentication header values. The conversion functions operate on byte strings and pass None through unchanged.

Capabilities

Header Format Conversion

Convert between raw HTTP headers (multi-line byte strings) and structured dictionaries for easy manipulation.

def headers_raw_to_dict(headers_raw):
    """
    Convert raw HTTP headers to dictionary format.
    
    Args:
        headers_raw (bytes|None): Raw headers as multi-line byte string
    
    Returns:
        HeadersDictOutput|None: Dictionary mapping header names to lists of values,
                               or None if input is None
    """

def headers_dict_to_raw(headers_dict):
    """
    Convert headers dictionary to raw HTTP format.
    
    Args:
        headers_dict (HeadersDictInput|None): Dictionary mapping header names to values
    
    Returns:
        bytes|None: Raw headers formatted for HTTP transmission,
                   or None if input is None
    """

Usage Examples:

from w3lib.http import headers_raw_to_dict, headers_dict_to_raw

# Parse raw headers
raw = b"Content-Type: text/html\r\nAccept: gzip\r\nAccept: deflate"
headers = headers_raw_to_dict(raw)
# Returns: {b'Content-Type': [b'text/html'], b'Accept': [b'gzip', b'deflate']}

# Convert back to raw format
headers_dict = {b'Content-Type': b'text/html', b'Accept': [b'gzip', b'deflate']}
raw_headers = headers_dict_to_raw(headers_dict)
# Returns: b'Content-Type: text/html\r\nAccept: gzip\r\nAccept: deflate'

# Handle None input gracefully
headers_raw_to_dict(None)  # Returns: None
headers_dict_to_raw(None)  # Returns: None

HTTP Basic Authentication

Generate HTTP Basic Authentication header values according to RFC 2617.

def basic_auth_header(username, password, encoding='ISO-8859-1'):
    """
    Generate HTTP Basic Authentication header value.
    
    Args:
        username (str|bytes): Username for authentication
        password (str|bytes): Password for authentication  
        encoding (str): Character encoding for credentials (default: 'ISO-8859-1')
    
    Returns:
        bytes: Authorization header value formatted as 'Basic <base64-encoded-credentials>'
    """

Usage Examples:

from w3lib.http import basic_auth_header

# Generate basic auth header
auth = basic_auth_header('user', 'password')
# Returns: b'Basic dXNlcjpwYXNzd29yZA=='

# Use in HTTP request
import requests
headers = {'Authorization': auth}
response = requests.get('https://api.example.com', headers=headers)

# Choose the credential encoding explicitly; the results differ only
# when the credentials contain non-ASCII characters
auth_latin1 = basic_auth_header('user', 'pässword', encoding='ISO-8859-1')
auth_utf8 = basic_auth_header('user', 'pässword', encoding='UTF-8')

Type Definitions

from typing import Any, Mapping, MutableMapping, Sequence, Union

# Input type for headers dictionary - flexible value types
HeadersDictInput = Mapping[bytes, Union[Any, Sequence[bytes]]]

# Output type for headers dictionary - normalized to lists
HeadersDictOutput = MutableMapping[bytes, list[bytes]]

Header Processing Details

Raw Header Format

Raw headers are expected as byte strings in which:

  • Header lines are separated by \r\n (CRLF)
  • Each header name is separated from its value by a colon, conventionally followed by a space
  • Multiple values for the same header appear on separate lines
  • Whitespace around header names and values is stripped during parsing
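
The parsing rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's own code: `parse_raw_headers` is a hypothetical name, and the use of `splitlines()` (which also accepts bare `\n`) is an assumption about how line separation is handled.

```python
def parse_raw_headers(headers_raw: bytes) -> dict[bytes, list[bytes]]:
    """Hypothetical sketch: parse a raw header block into name -> list of values."""
    result: dict[bytes, list[bytes]] = {}
    for line in headers_raw.splitlines():
        # Split on the first colon only, so values may themselves contain colons.
        parts = line.split(b":", 1)
        if len(parts) != 2:
            continue  # a line without a colon is skipped
        name, value = parts[0].strip(), parts[1].strip()
        # Repeated header names accumulate their values in order.
        result.setdefault(name, []).append(value)
    return result


parsed = parse_raw_headers(
    b"Content-Type: text/html\r\nAccept: gzip\r\nAccept: deflate\r\nnot a header"
)
```

Here the last line has no colon and is dropped, while the two Accept lines are collected under a single key.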

Dictionary Format

Header dictionaries use:

  • Byte string keys for header names (case-sensitive)
  • Lists of byte string values for each header
  • Multiple values stored as separate list items
  • Input may supply a single value or a sequence per header; output always uses lists
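
The reverse direction can be sketched the same way. This is an illustrative sketch only: `headers_to_raw` is a hypothetical name, and treating any non-bytes value as a sequence of byte strings is a simplifying assumption.

```python
def headers_to_raw(headers: dict) -> bytes:
    """Hypothetical sketch: serialize a headers dictionary to a raw header block."""
    lines = []
    for name, value in headers.items():
        # A single bytes value is treated as a one-item list (assumption).
        values = [value] if isinstance(value, bytes) else list(value)
        for item in values:
            lines.append(name + b": " + item)
    # Header lines are joined with CRLF, with no trailing separator.
    return b"\r\n".join(lines)


raw = headers_to_raw({b"Content-Type": b"text/html", b"Accept": [b"gzip", b"deflate"]})
```

Each value of a multi-valued header becomes its own line, which is why the output side of the conversion normalizes everything to lists.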

Authentication Header Encoding

The basic_auth_header function:

  • Concatenates username and password with : separator
  • Encodes the credentials using specified encoding (default ISO-8859-1)
  • Base64 encodes the result
  • Prepends Basic to create the complete header value
  • Returns bytes suitable for HTTP Authorization header
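
The steps above can be reproduced with the standard library alone. This is a minimal sketch, assuming standard Base64 via `base64.b64encode`; `make_basic_auth` is a hypothetical name and the library's own implementation may differ in detail.

```python
import base64


def make_basic_auth(username: str, password: str, encoding: str = "ISO-8859-1") -> bytes:
    """Hypothetical sketch of the steps listed above."""
    # 1. Join username and password with a colon.
    credentials = f"{username}:{password}"
    # 2. Encode the joined credentials with the chosen character encoding.
    credential_bytes = credentials.encode(encoding)
    # 3. Base64-encode the bytes and 4. prepend the scheme name.
    return b"Basic " + base64.b64encode(credential_bytes)


auth = make_basic_auth("user", "password")
```

For these ASCII credentials the result matches the value shown in the usage example, b'Basic dXNlcjpwYXNzd29yZA=='.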

Error Handling

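  • headers_raw_to_dict and headers_dict_to_raw return None when given None
  • Raw header lines without a colon are silently skipped rather than raising an exception
  • The header conversion functions expect byte string keys and values
  • basic_auth_header accepts str or bytes credentials and encodes the combined value with the encoding argument (default ISO-8859-1)
  • Credentials containing characters that the chosen encoding cannot represent raise an encoding error; failures in the Base64 step itself are very rare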
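
One way an exception can arise when building credentials is the character encoding step. The sketch below uses only Python's built-in `str.encode` to show the effect; `encode_credentials` is a hypothetical helper, and whether the library surfaces the same exception type is an assumption.

```python
def encode_credentials(username: str, password: str, encoding: str = "ISO-8859-1") -> bytes:
    """Hypothetical helper: join credentials with a colon and encode them."""
    return f"{username}:{password}".encode(encoding)


try:
    # U+20AC (euro sign) has no representation in ISO-8859-1.
    encode_credentials("user", "pass\u20ac")
    failed = False
except UnicodeEncodeError:
    failed = True

# The same credentials encode without error when UTF-8 is requested.
utf8_bytes = encode_credentials("user", "pass\u20ac", encoding="UTF-8")
```

If credentials may contain characters outside ISO-8859-1, passing an explicit encoding such as UTF-8 avoids this failure, provided the server expects the same encoding.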

Install with Tessl CLI

npx tessl i tessl/pypi-w3lib
