tessl/pypi-ijson

Iterative JSON parser with standard Python iterator interfaces for processing large JSON data streams without loading entire documents into memory

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/ijson@3.4.x

To install, run

npx @tessl/cli install tessl/pypi-ijson@3.4.0


ijson

An iterative JSON parser with standard Python iterator interfaces. ijson allows you to process large JSON data streams without loading entire documents into memory, making it ideal for handling massive JSON files, streaming APIs, and memory-constrained environments.

Package Information

  • Package Name: ijson
  • Language: Python
  • Installation: pip install ijson
  • Version: 3.4.0

Core Imports

import ijson

For specific parsing functions:

from ijson import parse, items, kvitems, basic_parse

For exceptions and utilities:

from ijson.common import JSONError, IncompleteJSONError, ObjectBuilder
from ijson.utils import coroutine, sendable_list
from ijson import __version__

Basic Usage

import ijson

# Parse a JSON file iteratively
with open('large_file.json', 'rb') as file:
    # Extract all items from an array under 'data'
    objects = ijson.items(file, 'data.item')
    for obj in objects:
        print(obj)

# Parse JSON that is already in memory (ijson reads bytes, so use a bytes literal)
json_data = b'{"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}'
users = ijson.items(json_data, 'users.item')
for user in users:
    print(f"Name: {user['name']}, Age: {user['age']}")

# Get key-value pairs from a JSON object
json_data = b'{"config": {"debug": true, "timeout": 30, "retries": 3}}'
config_items = ijson.kvitems(json_data, 'config')
for key, value in config_items:
    print(f"{key}: {value}")

Architecture

ijson uses a multi-backend architecture for optimal performance across different environments:

  • Backend System: Automatically selects the fastest available backend (yajl2_c, yajl2_cffi, yajl2, yajl, python)
  • Event-Driven Parsing: Low-level events bubble up through coroutine pipelines to higher-level interfaces
  • Coroutine Pipeline: Modular design allows chaining of parsing, filtering, and transformation coroutines
  • Memory Efficiency: Streaming approach processes JSON incrementally without loading full documents

The library provides multiple parsing levels from low-level events to high-level Python objects, supporting both synchronous and asynchronous operation modes.

Capabilities

High-Level Parsing

Core parsing functions that yield Python objects and key-value pairs from JSON streams. These functions cover the most common use cases and process data incrementally, so the full document never has to fit in memory.

def items(source, prefix, map_type=None, buf_size=64*1024, **config):
    """Yield complete Python objects found under specified prefix."""

def kvitems(source, prefix, map_type=None, buf_size=64*1024, **config):
    """Yield (key, value) pairs from JSON objects under prefix."""

def parse(source, buf_size=64*1024, **config):
    """Yield (prefix, event, value) tuples with path context."""

def basic_parse(source, buf_size=64*1024, **config):
    """Yield low-level (event, value) parsing events."""


Asynchronous Processing

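Async variants of all parsing functions for use with asyncio and async file objects (objects whose read() method is awaitable). Each call produces an asynchronous iterator that is consumed with async for, enabling non-blocking JSON processing in concurrent applications.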

async def items_async(source, prefix, map_type=None, buf_size=64*1024, **config):
    """Async version of items() for async file objects."""

async def kvitems_async(source, prefix, map_type=None, buf_size=64*1024, **config):
    """Async version of kvitems() for async file objects."""

async def parse_async(source, buf_size=64*1024, **config):
    """Async version of parse() for async file objects."""

async def basic_parse_async(source, buf_size=64*1024, **config):
    """Async version of basic_parse() for async file objects."""


Low-Level Coroutines

Coroutine-based parsing pipeline components for building custom JSON processing workflows. These provide maximum flexibility for advanced use cases.

def basic_parse_coro(target, **config):
    """Coroutine for low-level parsing events."""

def parse_coro(target, **config):
    """Coroutine for parsing with path context."""

def items_coro(target, prefix, map_type=None, **config):
    """Coroutine for extracting objects under prefix."""

def kvitems_coro(target, prefix, map_type=None, **config):
    """Coroutine for extracting key-value pairs under prefix."""


Backend Management

Backend selection and configuration utilities for optimizing performance based on available libraries and specific requirements.

def get_backend(backend):
    """Import and return specified backend module."""

ALL_BACKENDS: tuple  # All available backends in speed order
backend: object      # Currently selected backend module
backend_name: str    # Name of current backend


Types

class JSONError(Exception):
    """Base exception for all parsing errors."""

class IncompleteJSONError(JSONError):
    """Raised when parser can't read expected data from stream."""

class ObjectBuilder:
    """Incrementally builds objects from JSON parser events."""
    def __init__(self, map_type=None): ...
    def event(self, event, value): ...
    value: object  # The object being built

__version__: str  # Package version string (e.g., "3.4.0")

Configuration Options

Keyword arguments accepted by the parsing functions (map_type applies to the object-building functions such as items and kvitems):

  • buf_size (int): Buffer size for reading data (default: 64*1024)
  • multiple_values (bool): Allow multiple top-level JSON values
  • use_float (bool): Use float instead of Decimal for numbers (backend-dependent)
  • map_type (type): Custom mapping type for JSON objects (default: dict)

Prefix Syntax

Path expressions for targeting specific parts of JSON documents:

  • Root level: "" (empty string)
  • Object properties: "property"
  • Nested properties: "parent.child"
  • Array items: "array.item"
  • Complex paths: "data.users.item.address.street"

The prefix system enables precise extraction of data from deeply nested JSON structures without parsing unnecessary parts of the document.

Command-Line Utility

ijson includes a command-line utility for dumping JSON parsing events:

def dump():
    """Command-line utility entry point for dumping ijson events."""

Usage:

# Basic event dumping
python -m ijson.dump < data.json

# Parse with specific method and prefix
python -m ijson.dump -m items -p data.item < data.json

The utility supports:

  • Methods: basic_parse, parse, items, kvitems
  • Prefix filtering: For items and kvitems methods
  • Multiple values: Support for multiple top-level JSON values