# Awkward Array

Awkward Array is a Python library for manipulating JSON-like data with NumPy-like idioms. It enables efficient processing of the nested, variable-sized data structures common in scientific computing, particularly high-energy physics. It combines NumPy-like performance with the flexibility to handle complex, heterogeneous data that does not fit into regular arrays.

## Package Information

- **Package Name**: awkward
- **Language**: Python
- **Installation**: `pip install awkward`

## Core Imports

```python
import awkward as ak
```

For behavior customization:

```python
import awkward as ak
ak.behavior  # global dict of behavior overrides (an attribute, not an importable submodule)
```

For integration with specific frameworks:

```python
import awkward.numba  # Numba JIT compilation
import awkward.jax    # JAX automatic differentiation
```

## Basic Usage

```python
import awkward as ak
import numpy as np

# Create arrays from Python data
nested_list = [[1, 2, 3], [], [4, 5]]
array = ak.Array(nested_list)
print(array)
# [[1, 2, 3], [], [4, 5]]

# Mathematical operations work element-wise
squared = array ** 2
print(squared)
# [[1, 4, 9], [], [16, 25]]

# Reduction operations handle variable-length structure
sums = ak.sum(array, axis=1)
print(sums)
# [6, 0, 9]

# Complex nested structures
records = ak.Array([
    {"x": [1, 2], "y": {"a": 10, "b": 20}},
    {"x": [3], "y": {"a": 30, "b": 40}}
])
print(records.x)
# [[1, 2], [3]]
print(records.y.a)
# [10, 30]
```

## Architecture

Awkward Array's layered architecture provides both performance and flexibility:

- **High-Level Interface** (`Array`, `Record`, `ArrayBuilder`): User-friendly containers that provide NumPy-like behavior for complex data structures
- **Operations Layer**: 180+ functions implementing mathematical, statistical, structural, and I/O operations that work consistently across all data types
- **Content Layouts**: Efficient low-level representations (17 content types) that optimize memory usage and computational performance for different data patterns
- **Type System**: Rich type information (9 type classes, 13 form classes) enabling static analysis and cross-language interoperability
- **Behavior System**: Extensible framework allowing domain-specific customization and method injection
- **Backend Integration**: Unified interface supporting CPU, GPU (via CuPy/JAX), and JIT compilation (via Numba)

This design lets Awkward Array bridge irregular scientific data and the NumPy ecosystem, providing the performance needed for large-scale scientific computing while keeping the expressiveness required for complex data analysis workflows.

## Capabilities

### Array Creation and Construction

Comprehensive functions for creating arrays from various data sources including Python iterables, NumPy arrays, JSON data, and binary formats. Supports incremental building through `ArrayBuilder` for complex nested structures.

```python { .api }
def from_iter(iterable, *, allow_record=True, highlevel=True, behavior=None, attrs=None, initial=1024, resize=8): ...
def from_numpy(array, highlevel=True, behavior=None): ...
def from_json(source, highlevel=True, behavior=None): ...
def from_arrow(array, highlevel=True, behavior=None): ...
def from_parquet(path, **kwargs): ...

class ArrayBuilder:
    def null(self): ...
    def boolean(self, x): ...
    def integer(self, x): ...
    def real(self, x): ...
    def complex(self, x): ...
    def string(self, x): ...
    def bytestring(self, x): ...
    def datetime(self, x): ...
    def timedelta(self, x): ...
    def append(self, x): ...
    def extend(self, iterable): ...
    def begin_list(self): ...
    def end_list(self): ...
    def begin_tuple(self, numfields): ...
    def end_tuple(self): ...
    def begin_record(self, name=None): ...
    def end_record(self): ...
    def field(self, key): ...
    def index(self, i): ...
```

[Array Creation →](./array-creation.md)

### Array Manipulation and Transformation

Structural operations for reshaping, filtering, combining, and transforming arrays while preserving type information and handling variable-length data gracefully.

```python { .api }
def concatenate(arrays, axis=0): ...
def zip(arrays, depth_limit=None): ...
def flatten(array, axis=1): ...
def unflatten(array, counts, axis=0): ...
def mask(array, mask, *, valid_when=True): ...
def combinations(array, n, axis=1): ...
def cartesian(arrays, axis=1): ...
def with_field(array, what, where=None): ...
def without_field(array, where): ...
```

[Array Manipulation →](./array-manipulation.md)

### Mathematical and Statistical Operations

Mathematical operations including reductions, element-wise functions, linear fits, and statistical analysis, all of which handle missing data and nested structures appropriately.

```python { .api }
def sum(array, axis=None, *, keepdims=False, mask_identity=False, highlevel=True, behavior=None, attrs=None): ...
def mean(x, weight=None, ddof=0, axis=None, *, keepdims=False): ...
def var(x, weight=None, ddof=0, axis=None, *, keepdims=False): ...
def std(x, weight=None, ddof=0, axis=None, *, keepdims=False): ...
def min(array, axis=None, *, keepdims=False): ...
def max(array, axis=None, *, keepdims=False): ...
def argmin(array, axis=None, *, keepdims=False): ...
def argmax(array, axis=None, *, keepdims=False): ...
def linear_fit(x, y, weight=None, axis=None): ...
def corr(x, y, weight=None, axis=None): ...
```

[Mathematical Operations →](./mathematical-operations.md)

### Data Conversion and I/O

Extensive support for reading from and writing to various data formats including Arrow, Parquet, JSON, NumPy, and integration with popular frameworks like PyTorch, TensorFlow, and JAX.

```python { .api }
def to_arrow(array): ...
def to_parquet(array, destination, **kwargs): ...
def to_numpy(array): ...
def to_json(array, **kwargs): ...
def to_list(array): ...
def from_torch(array): ...
def to_torch(array): ...
def from_tensorflow(array): ...
def to_tensorflow(array): ...
def to_dataframe(array): ...
```

[Data Conversion →](./data-conversion.md)

### String Operations

Comprehensive string processing capabilities modeled after Apache Arrow's compute functions, providing efficient operations on arrays of strings including pattern matching, transformations, and analysis.

```python { .api }
# All functions below live in the ak.str namespace, e.g. ak.str.length(array)
def length(array): ...
def lower(array): ...
def upper(array): ...
def split_pattern(array, pattern): ...
def replace_substring(array, pattern, replacement): ...
def match_substring_regex(array, pattern): ...
def starts_with(array, pattern): ...
def extract_regex(array, pattern): ...
```

[String Operations →](./string-operations.md)

### Type System and Metadata

Rich type system providing precise descriptions of nested data structures, enabling static analysis, optimization, and cross-language interoperability. Includes schema management and metadata handling.

```python { .api }
def type(array): ...
def typeof(array): ...
class ArrayType: ...
class ListType: ...
class RecordType: ...
class OptionType: ...
def with_parameter(array, key, value): ...
def parameters(array): ...
def validity_error(array): ...
```

[Type System →](./type-system.md)

### Integration Modules

Seamless integration with high-performance computing frameworks including Numba JIT compilation, JAX automatic differentiation, and specialized backends for GPU computing and scientific workflows.

```python { .api }
import awkward.numba
import awkward.jax
import awkward.typetracer

def to_backend(array, backend): ...
def backend(array): ...
```

[Integration →](./integration.md)

## Core Classes

### Array

The primary user-facing class representing a multi-dimensional, possibly nested array with variable-length sublists and heterogeneous data types.

```python { .api }
class Array:
    def __init__(self, data, behavior=None): ...
    def to_list(self): ...
    def to_numpy(self): ...
    @property
    def type(self): ...
    @property
    def layout(self): ...
    def show(self, limit_rows=20): ...
```

### ArrayBuilder

Incremental array construction with support for complex nested structures and mixed data types.

```python { .api }
class ArrayBuilder:
    def __init__(self, behavior=None): ...
    def snapshot(self): ...
    def list(self): ...  # Context manager
    def record(self, name=None): ...  # Context manager
```

### Record

Single record (row) extracted from an Array, providing dict-like access to fields while maintaining type information.

```python { .api }
class Record:
    def __init__(self, data, behavior=None): ...
    def to_list(self): ...
    @property
    def fields(self): ...
```