# High-Level Parsing

Core parsing functions that provide the most common and convenient ways to extract data from JSON streams. These functions handle JSON parsing at the object and key-value level, abstracting away low-level parsing details.

## Capabilities

### Object Extraction with items()

Extracts complete Python objects from JSON streams under a specified prefix path. This is the most commonly used function for processing JSON arrays and nested objects.

```python { .api }
def items(source, prefix, map_type=None, buf_size=64*1024, **config):
    """
    Yield complete Python objects found under specified prefix.

    Parameters:
    - source: File-like object, string, bytes, or iterable containing JSON data
    - prefix (str): JSON path prefix targeting the objects to extract
    - map_type (type, optional): Custom mapping type for objects (default: dict)
    - buf_size (int): Buffer size for reading file data (default: 64*1024)
    - **config: Backend-specific configuration options

    Returns:
    Generator yielding Python objects (dict, list, str, int, float, bool, None)

    Raises:
    - JSONError: For malformed JSON
    - IncompleteJSONError: For truncated JSON data
    """
```

**Usage Examples:**

```python
import ijson

# Extract array items ('item' addresses each element of the array)
json_data = '{"products": [{"id": 1, "name": "Laptop"}, {"id": 2, "name": "Phone"}]}'
products = ijson.items(json_data, 'products.item')
for product in products:
    print(f"Product {product['id']}: {product['name']}")

# Extract nested objects (the whole "users" object is yielded once)
json_data = '{"data": {"users": {"alice": {"age": 30}, "bob": {"age": 25}}}}'
user_data = ijson.items(json_data, 'data.users')
for users_dict in user_data:
    for name, info in users_dict.items():
        print(f"{name}: {info['age']} years old")

# Process large JSON files
with open('large_dataset.json', 'rb') as file:
    records = ijson.items(file, 'records.item')
    for record in records:
        process_record(record)  # placeholder for your own handler
```

### Key-Value Extraction with kvitems()

Extracts key-value pairs from JSON objects under a specified prefix. Useful when you need to iterate over object properties without loading the entire object into memory.

```python { .api }
def kvitems(source, prefix, map_type=None, buf_size=64*1024, **config):
    """
    Yield (key, value) pairs from JSON objects under prefix.

    Parameters:
    - source: File-like object, string, bytes, or iterable containing JSON data
    - prefix (str): JSON path prefix targeting the objects to extract pairs from
    - map_type (type, optional): Custom mapping type for nested objects (default: dict)
    - buf_size (int): Buffer size for reading file data (default: 64*1024)
    - **config: Backend-specific configuration options

    Returns:
    Generator yielding (key, value) tuples where key is str and value is Python object

    Raises:
    - JSONError: For malformed JSON
    - IncompleteJSONError: For truncated JSON data
    """
```

**Usage Examples:**

```python
import ijson

# Extract configuration key-value pairs
json_data = '{"config": {"debug": true, "timeout": 30, "max_retries": 3}}'
config_items = ijson.kvitems(json_data, 'config')
for key, value in config_items:
    print(f"Config {key}: {value}")

# Process object properties from large files
with open('settings.json', 'rb') as file:
    settings = ijson.kvitems(file, 'application.settings')
    for setting_name, setting_value in settings:
        apply_setting(setting_name, setting_value)  # placeholder for your own handler
```

### Event-Level Parsing with parse()

Provides parsing events with full path context, giving you complete control over JSON processing while maintaining memory efficiency.

```python { .api }
def parse(source, buf_size=64*1024, **config):
    """
    Yield (prefix, event, value) tuples with path context.

    Parameters:
    - source: File-like object, string, bytes, or iterable containing JSON data
    - buf_size (int): Buffer size for reading file data (default: 64*1024)
    - **config: Backend-specific configuration options

    Returns:
    Generator yielding (prefix, event, value) tuples where:
    - prefix (str): JSON path to current location
    - event (str): Event type ('null', 'boolean', 'number', 'string', 'map_key', 'start_map', 'end_map', 'start_array', 'end_array')
    - value: Event value (varies by event type)

    Raises:
    - JSONError: For malformed JSON
    - IncompleteJSONError: For truncated JSON data
    """
```

**Usage Examples:**

```python
import ijson

json_data = '{"users": [{"name": "Alice", "active": true}, {"name": "Bob", "active": false}]}'
for prefix, event, value in ijson.parse(json_data):
    if event == 'string' and prefix.endswith('.name'):
        print(f"Found user name: {value}")
    elif event == 'boolean' and prefix.endswith('.active'):
        print(f"Active status: {value}")
```

### Low-Level Events with basic_parse()

Provides the lowest-level parsing interface, yielding raw JSON events without path context. Most efficient for custom parsing logic that doesn't need path information.

```python { .api }
def basic_parse(source, buf_size=64*1024, **config):
    """
    Yield low-level (event, value) parsing events.

    Parameters:
    - source: File-like object, string, bytes, or iterable containing JSON data
    - buf_size (int): Buffer size for reading file data (default: 64*1024)
    - **config: Backend-specific configuration options

    Returns:
    Generator yielding (event, value) tuples where:
    - event (str): Event type ('null', 'boolean', 'number', 'string', 'map_key', 'start_map', 'end_map', 'start_array', 'end_array')
    - value: Event value (None for structural events, actual value for data events)

    Raises:
    - JSONError: For malformed JSON
    - IncompleteJSONError: For truncated JSON data
    """
```

**Usage Examples:**

```python
import ijson
from ijson.common import ObjectBuilder

# Build custom objects from events
json_data = '{"name": "Alice", "age": 30, "active": true}'
builder = ObjectBuilder()
for event, value in ijson.basic_parse(json_data):
    builder.event(event, value)
result = builder.value
print(result)  # {'name': 'Alice', 'age': 30, 'active': True}

# Custom event processing
for event, value in ijson.basic_parse(json_data):
    if event == 'string':
        print(f"String value: {value}")
    elif event == 'number':
        print(f"Number value: {value}")
```

## Input Source Types

All parsing functions accept multiple input source types:

- **File objects**: Opened with `open()` in binary or text mode
- **String data**: JSON as Python string
- **Bytes data**: JSON as bytes object
- **Iterables**: Any iterable yielding string or bytes chunks
- **Async files**: File objects with async `read()` method (requires async variants)

## Error Handling

```python
import ijson
from ijson.common import JSONError, IncompleteJSONError

# malformed_json and process() are placeholders for your own input and handler.
# Errors surface while iterating, so the loop must be inside the try block.
try:
    data = ijson.items(malformed_json, 'data.item')
    for item in data:
        process(item)
except IncompleteJSONError:
    # Listed first so the more specific error gets its own message
    print("JSON data was truncated or incomplete")
except JSONError as e:
    print(f"JSON parsing error: {e}")
```

## Number Conversion Utilities

Utility functions for converting JSON number strings to Python numeric types.

```python { .api }
def integer_or_decimal(str_value):
    """
    Convert string to int or Decimal for precision.

    Parameters:
    - str_value (str): String representation of a number

    Returns:
    int or decimal.Decimal: Parsed number value
    """

def integer_or_float(str_value):
    """
    Convert string to int or float.

    Parameters:
    - str_value (str): String representation of a number

    Returns:
    int or float: Parsed number value
    """

def number(str_value):
    """
    DEPRECATED: Convert string to int or Decimal.
    Use integer_or_decimal() instead.

    Parameters:
    - str_value (str): String representation of a number

    Returns:
    int or decimal.Decimal: Parsed number value

    Warns:
    - DeprecationWarning: Function will be removed in future release
    """
```

**Usage Examples:**

```python
from ijson.common import integer_or_decimal, integer_or_float

# Convert JSON number strings
result1 = integer_or_decimal("42")       # 42 (int)
result2 = integer_or_decimal("3.14159")  # Decimal('3.14159')
result3 = integer_or_float("42")         # 42 (int)
result4 = integer_or_float("3.14159")    # 3.14159 (float)
```
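
The practical difference between the two return types shows up in arithmetic. The following sketch uses only the standard library (it does not call ijson) to illustrate why a `Decimal` result can be preferable to a `float` when exact decimal values matter, for example with monetary amounts.

```python
from decimal import Decimal

text = "0.1"  # a number as it would appear in JSON text

as_decimal = Decimal(text)
as_float = float(text)

# Decimal arithmetic is exact for decimal fractions.
decimal_sum_is_exact = as_decimal + Decimal("0.2") == Decimal("0.3")

# Binary floating point cannot represent 0.1 exactly, so the sum drifts.
float_sum_is_exact = as_float + 0.2 == 0.3

print(decimal_sum_is_exact, float_sum_is_exact)  # True False
```

Floats are usually faster and are sufficient when small rounding differences are acceptable.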

## Performance Considerations

- **Buffer Size**: A larger `buf_size` reduces the number of read calls and can improve throughput on large files
- **Backend Selection**: The C-based backends (yajl2_c, yajl2_cffi) are significantly faster than the pure-Python backend
- **Memory Usage**: Functions process data incrementally, so memory usage depends on the size of the largest object yielded rather than on the size of the whole document
- **Prefix Targeting**: Specific prefixes are more efficient than processing entire documents, because only the matching values are built into Python objects