# High-Level Parsing

Core parsing functions that provide the most common and convenient ways to extract data from JSON streams. They cover both object-level extraction (`items()`, `kvitems()`), which hides the low-level parsing details, and event-level access (`parse()`, `basic_parse()`) for finer control.

## Capabilities

### Object Extraction with items()

Extracts complete Python objects from JSON streams under a specified prefix path. This is the most commonly used function for processing JSON arrays and nested objects. In a prefix, object keys are joined with dots and each array element is addressed by the segment `item`, so `'products.item'` targets every element of the `products` array.

```python { .api }
def items(source, prefix, map_type=None, buf_size=64*1024, **config):
    """
    Yield complete Python objects found under specified prefix.

    Parameters:
    - source: File-like object, string, bytes, or iterable containing JSON data
    - prefix (str): JSON path prefix targeting the objects to extract
    - map_type (type, optional): Custom mapping type for objects (default: dict)
    - buf_size (int): Buffer size for reading file data (default: 64*1024)
    - **config: Backend-specific configuration options

    Returns:
    Generator yielding Python objects (dict, list, str, int, decimal.Decimal, bool, None);
    non-integer numbers are returned as decimal.Decimal by default

    Raises:
    - JSONError: For malformed JSON
    - IncompleteJSONError: For truncated JSON data
    """
```

**Usage Examples:**

```python
import ijson

# Extract array items
json_data = '{"products": [{"id": 1, "name": "Laptop"}, {"id": 2, "name": "Phone"}]}'
products = ijson.items(json_data, 'products.item')
for product in products:
    print(f"Product {product['id']}: {product['name']}")

# Extract nested objects
json_data = '{"data": {"users": {"alice": {"age": 30}, "bob": {"age": 25}}}}'
user_data = ijson.items(json_data, 'data.users')
for users_dict in user_data:
    for name, info in users_dict.items():
        print(f"{name}: {info['age']} years old")

# Process large JSON files
def process_record(record):
    # Placeholder: replace with your own per-record handling
    print(record)

with open('large_dataset.json', 'rb') as file:
    records = ijson.items(file, 'records.item')
    for record in records:
        process_record(record)
```
### Key-Value Extraction with kvitems()

Extracts key-value pairs from JSON objects under a specified prefix. Each value is built and yielded on its own, so you can iterate over an object's properties without loading the entire object into memory.

```python { .api }
def kvitems(source, prefix, map_type=None, buf_size=64*1024, **config):
    """
    Yield (key, value) pairs from JSON objects under prefix.

    Parameters:
    - source: File-like object, string, bytes, or iterable containing JSON data
    - prefix (str): JSON path prefix targeting the objects to extract pairs from
    - map_type (type, optional): Custom mapping type for nested objects (default: dict)
    - buf_size (int): Buffer size for reading file data (default: 64*1024)
    - **config: Backend-specific configuration options

    Returns:
    Generator yielding (key, value) tuples where key is str and value is a Python object

    Raises:
    - JSONError: For malformed JSON
    - IncompleteJSONError: For truncated JSON data
    """
```

**Usage Examples:**

```python
import ijson

# Extract configuration key-value pairs
json_data = '{"config": {"debug": true, "timeout": 30, "max_retries": 3}}'
config_items = ijson.kvitems(json_data, 'config')
for key, value in config_items:
    print(f"Config {key}: {value}")

# Process object properties from large files
def apply_setting(name, value):
    # Placeholder: replace with your own handling of each setting
    print(f"{name} = {value}")

with open('settings.json', 'rb') as file:
    settings = ijson.kvitems(file, 'application.settings')
    for setting_name, setting_value in settings:
        apply_setting(setting_name, setting_value)
```
### Event-Level Parsing with parse()

Yields every parsing event together with the path (prefix) at which it occurred, giving you complete control over JSON processing while keeping memory usage low.

```python { .api }
def parse(source, buf_size=64*1024, **config):
    """
    Yield (prefix, event, value) tuples with path context.

    Parameters:
    - source: File-like object, string, bytes, or iterable containing JSON data
    - buf_size (int): Buffer size for reading file data (default: 64*1024)
    - **config: Backend-specific configuration options

    Returns:
    Generator yielding (prefix, event, value) tuples where:
    - prefix (str): JSON path to current location
    - event (str): Event type ('null', 'boolean', 'number', 'string', 'map_key', 'start_map', 'end_map', 'start_array', 'end_array')
    - value: The key name for 'map_key', the parsed value for 'null', 'boolean', 'number' and 'string', and None for start/end events

    Raises:
    - JSONError: For malformed JSON
    - IncompleteJSONError: For truncated JSON data
    """
```

**Usage Examples:**

```python
import ijson

json_data = '{"users": [{"name": "Alice", "active": true}, {"name": "Bob", "active": false}]}'
for prefix, event, value in ijson.parse(json_data):
    if event == 'string' and prefix.endswith('.name'):
        print(f"Found user name: {value}")
    elif event == 'boolean' and prefix.endswith('.active'):
        print(f"Active status: {value}")
```
### Low-Level Events with basic_parse()

Provides the lowest-level parsing interface, yielding raw JSON events without path context. Most efficient for custom parsing logic that doesn't need path information.

```python { .api }
def basic_parse(source, buf_size=64*1024, **config):
    """
    Yield low-level (event, value) parsing events.

    Parameters:
    - source: File-like object, string, bytes, or iterable containing JSON data
    - buf_size (int): Buffer size for reading file data (default: 64*1024)
    - **config: Backend-specific configuration options

    Returns:
    Generator yielding (event, value) tuples where:
    - event (str): Event type ('null', 'boolean', 'number', 'string', 'map_key', 'start_map', 'end_map', 'start_array', 'end_array')
    - value: None for start/end events, the key name for 'map_key', and the parsed value for 'null', 'boolean', 'number' and 'string'

    Raises:
    - JSONError: For malformed JSON
    - IncompleteJSONError: For truncated JSON data
    """
```

**Usage Examples:**

```python
import ijson
from ijson.common import ObjectBuilder

# Build custom objects from events
json_data = '{"name": "Alice", "age": 30, "active": true}'
builder = ObjectBuilder()
for event, value in ijson.basic_parse(json_data):
    builder.event(event, value)
result = builder.value
print(result)  # {'name': 'Alice', 'age': 30, 'active': True}

# Custom event processing
for event, value in ijson.basic_parse(json_data):
    if event == 'string':
        print(f"String value: {value}")
    elif event == 'number':
        print(f"Number value: {value}")
```
## Input Source Types

All parsing functions accept multiple input source types:

- **File objects**: Opened with `open()`; binary mode (`'rb'`, as in the examples above) is preferred over text mode
- **String data**: JSON as a Python `str`
- **Bytes data**: JSON as a `bytes` object
- **Iterables**: Any iterable yielding string or bytes chunks
- **Async files**: File objects with an async `read()` method (requires the async variants)
## Error Handling

```python
import ijson
from ijson.common import JSONError, IncompleteJSONError

# Example input: the array is cut off before its closing bracket
malformed_json = '{"data": [1, 2'

def process(item):
    # Placeholder: replace with your own per-item handling
    print(item)

try:
    data = ijson.items(malformed_json, 'data.item')
    for item in data:
        process(item)
except IncompleteJSONError:  # catch first: it is a subclass of JSONError
    print("JSON data was truncated or incomplete")
except JSONError as e:
    print(f"JSON parsing error: {e}")
```
## Number Conversion Utilities

Utility functions in `ijson.common` for converting JSON number strings to Python numeric types.

```python { .api }
def integer_or_decimal(str_value):
    """
    Convert string to int or Decimal, preserving decimal precision.

    Parameters:
    - str_value (str): String representation of a number

    Returns:
    int or decimal.Decimal: int for integer literals, Decimal when the value has a fraction or exponent
    """

def integer_or_float(str_value):
    """
    Convert string to int or float.

    Parameters:
    - str_value (str): String representation of a number

    Returns:
    int or float: int for integer literals, float when the value has a fraction or exponent
    """

def number(str_value):
    """
    DEPRECATED: Convert string to int or Decimal.
    Use integer_or_decimal() instead.

    Parameters:
    - str_value (str): String representation of a number

    Returns:
    int or decimal.Decimal: Parsed number value

    Warns:
    DeprecationWarning: Issued when called; the function will be removed in a future release
    """
```

**Usage Examples:**

```python
from ijson.common import integer_or_decimal, integer_or_float

# Convert JSON number strings
result1 = integer_or_decimal("42")       # 42 (int)
result2 = integer_or_decimal("3.14159")  # Decimal('3.14159')
result3 = integer_or_float("42")         # 42 (int)
result4 = integer_or_float("3.14159")    # 3.14159 (float)
```
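## Performance Considerations

- **Buffer Size**: `buf_size` is the number of bytes requested per read; a larger value means fewer reads on large files, at the cost of a larger buffer
- **Backend Selection**: Backends built on the yajl C library (yajl2_c, yajl2_cffi) are significantly faster than the pure-Python backend
- **Memory Usage**: Functions process data incrementally, so memory usage does not grow with the size of the whole document; `items()` and `kvitems()` still hold each object or value they build in memory until it is yielded
- **Prefix Targeting**: A specific prefix avoids building Python objects for parts of the document you don't need, although the parser still tokenizes the skipped parts of the stream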