# ijson

An iterative JSON parser with standard Python iterator interfaces. ijson allows you to process large JSON data streams without loading entire documents into memory, making it ideal for handling massive JSON files, streaming APIs, and memory-constrained environments.

## Package Information

- **Package Name**: ijson
- **Language**: Python
- **Installation**: `pip install ijson`
- **Version**: 3.4.0

## Core Imports

```python
import ijson
```

For specific parsing functions:

```python
from ijson import parse, items, kvitems, basic_parse
```

For exceptions and utilities:

```python
from ijson.common import JSONError, IncompleteJSONError, ObjectBuilder
from ijson.utils import coroutine, sendable_list
from ijson import __version__
```

## Basic Usage

```python
import ijson

# Parse a JSON file iteratively (open in binary mode, since ijson reads bytes)
with open('large_file.json', 'rb') as file:
    # Extract all items from an array under 'data'
    objects = ijson.items(file, 'data.item')
    for obj in objects:
        print(obj)

# Parse in-memory JSON data (bytes input avoids on-the-fly encoding of str)
json_data = b'{"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}'
users = ijson.items(json_data, 'users.item')
for user in users:
    print(f"Name: {user['name']}, Age: {user['age']}")

# Get key-value pairs from JSON objects
json_data = b'{"config": {"debug": true, "timeout": 30, "retries": 3}}'
config_items = ijson.kvitems(json_data, 'config')
for key, value in config_items:
    print(f"{key}: {value}")
```

## Architecture

ijson uses a multi-backend architecture for optimal performance across different environments:

- **Backend System**: Automatically selects the fastest available backend (yajl2_c, yajl2_cffi, yajl2, yajl, python)
- **Event-Driven Parsing**: Low-level events bubble up through coroutine pipelines to higher-level interfaces
- **Coroutine Pipeline**: Modular design allows chaining of parsing, filtering, and transformation coroutines
- **Memory Efficiency**: Streaming approach processes JSON incrementally without loading full documents

The library provides multiple parsing levels from low-level events to high-level Python objects, supporting both synchronous and asynchronous operation modes.
66
67
## Capabilities
68
69
### High-Level Parsing
70
71
Core parsing functions that yield Python objects and key-value pairs from JSON streams. These functions handle the most common use cases for processing JSON data without memory constraints.
72
73
```python { .api }
74
def items(source, prefix, map_type=None, buf_size=64*1024, **config):
75
"""Yield complete Python objects found under specified prefix."""
76
77
def kvitems(source, prefix, map_type=None, buf_size=64*1024, **config):
78
"""Yield (key, value) pairs from JSON objects under prefix."""
79
80
def parse(source, buf_size=64*1024, **config):
81
"""Yield (prefix, event, value) tuples with path context."""
82
83
def basic_parse(source, buf_size=64*1024, **config):
84
"""Yield low-level (event, value) parsing events."""
85
```
86
87
[High-Level Parsing](./high-level-parsing.md)

### Asynchronous Processing

Async variants of all parsing functions for use with asyncio and async file objects. Enables non-blocking JSON processing in concurrent applications.

```python { .api }
async def items_async(source, prefix, map_type=None, buf_size=64*1024, **config):
    """Async version of items() for async file objects."""

async def kvitems_async(source, prefix, map_type=None, buf_size=64*1024, **config):
    """Async version of kvitems() for async file objects."""

async def parse_async(source, buf_size=64*1024, **config):
    """Async version of parse() for async file objects."""

async def basic_parse_async(source, buf_size=64*1024, **config):
    """Async version of basic_parse() for async file objects."""
```

[Asynchronous Processing](./async-processing.md)

### Low-Level Coroutines

Coroutine-based parsing pipeline components for building custom JSON processing workflows. These provide maximum flexibility for advanced use cases.

```python { .api }
def basic_parse_coro(target, **config):
    """Coroutine for low-level parsing events."""

def parse_coro(target, **config):
    """Coroutine for parsing with path context."""

def items_coro(target, prefix, map_type=None, **config):
    """Coroutine for extracting objects under prefix."""

def kvitems_coro(target, prefix, map_type=None, **config):
    """Coroutine for extracting key-value pairs under prefix."""
```

[Low-Level Coroutines](./coroutines.md)

### Backend Management

Backend selection and configuration utilities for optimizing performance based on available libraries and specific requirements.

```python { .api }
def get_backend(backend):
    """Import and return specified backend module."""

ALL_BACKENDS: tuple  # All available backends in speed order
backend: object      # Currently selected backend instance
backend_name: str    # Name of current backend
```

[Backend Management](./backends.md)

## Types

```python { .api }
class JSONError(Exception):
    """Base exception for all parsing errors."""

class IncompleteJSONError(JSONError):
    """Raised when parser can't read expected data from stream."""

class ObjectBuilder:
    """Incrementally builds objects from JSON parser events."""
    def __init__(self, map_type=None): ...
    def event(self, event, value): ...
    value: object  # The object being built

__version__: str  # Package version string (e.g., "3.4.0")
```

## Configuration Options

Keyword arguments accepted by the parsing functions that affect parsing behavior:

- **buf_size** (int): Buffer size for reading data (default: 64*1024)
- **multiple_values** (bool): Allow multiple top-level JSON values (default: False)
- **use_float** (bool): Use float instead of Decimal for numbers (default: False; backend-dependent)
- **map_type** (type): Custom mapping type for JSON objects (default: dict)

## Prefix Syntax

Path expressions for targeting specific parts of JSON documents:

- **Root level**: `""` (empty string)
- **Object properties**: `"property"`
- **Nested properties**: `"parent.child"`
- **Array items**: `"array.item"`
- **Complex paths**: `"data.users.item.address.street"`

The prefix system enables precise extraction of data from deeply nested JSON structures without building Python objects for the rest of the document.

## Command-Line Utility

ijson includes a command-line utility for dumping JSON parsing events:

```python { .api }
def dump():
    """Command-line utility entry point for dumping ijson events."""
```

**Usage:**

```bash
# Basic event dumping
python -m ijson.dump < data.json

# Parse with specific method and prefix
python -m ijson.dump -m items -p data.item < data.json
```

The utility supports:

- **Methods**: `basic_parse`, `parse`, `items`, `kvitems`
- **Prefix filtering**: For `items` and `kvitems` methods
- **Multiple values**: Support for multiple top-level JSON values