# Awkward Array

A comprehensive Python library for manipulating JSON-like data with NumPy-like idioms. Awkward Array enables efficient processing of nested, variable-sized data structures commonly found in scientific computing, particularly high-energy physics applications. It provides the performance of NumPy with the flexibility to handle complex, heterogeneous data that doesn't fit into regular arrays.

## Package Information

- **Package Name**: awkward
- **Language**: Python
- **Installation**: `pip install awkward`

## Core Imports

```python
import awkward as ak
```

For behavior customization:

```python
import awkward as ak
behavior = ak.behavior  # global dict of behavior overrides (an attribute, not a submodule)
```

For integration with specific frameworks:

```python
import awkward.numba  # Numba JIT compilation
import awkward.jax  # JAX automatic differentiation
```

## Basic Usage

```python
import awkward as ak
import numpy as np

# Create arrays from Python data
nested_list = [[1, 2, 3], [], [4, 5]]
array = ak.Array(nested_list)
print(array)
# [[1, 2, 3], [], [4, 5]]

# Mathematical operations work element-wise
squared = array ** 2
print(squared)
# [[1, 4, 9], [], [16, 25]]

# Reduction operations handle variable-length structure
sums = ak.sum(array, axis=1)
print(sums)
# [6, 0, 9]

# Complex nested structures
records = ak.Array([
    {"x": [1, 2], "y": {"a": 10, "b": 20}},
    {"x": [3], "y": {"a": 30, "b": 40}}
])
print(records.x)
# [[1, 2], [3]]
print(records.y.a)
# [10, 30]
```

## Architecture

Awkward Array's layered architecture provides both performance and flexibility:

- **High-Level Interface** (`Array`, `Record`, `ArrayBuilder`): User-friendly containers that provide NumPy-like behavior for complex data structures
- **Operations Layer**: 180+ functions implementing mathematical, statistical, structural, and I/O operations that work consistently across all data types
- **Content Layouts**: Efficient low-level representations (17 content types) that optimize memory usage and computational performance for different data patterns
- **Type System**: Rich type information (9 type classes, 13 form classes) enabling static analysis and cross-language interoperability
- **Behavior System**: Extensible framework allowing domain-specific customization and method injection
- **Backend Integration**: Unified interface supporting CPU, GPU (via CuPy/JAX), and JIT compilation (via Numba)

This design enables awkward to serve as a bridge between irregular scientific data and the NumPy ecosystem, providing the performance needed for large-scale scientific computing while maintaining the expressiveness required for complex data analysis workflows.

## Capabilities

### Array Creation and Construction

Comprehensive functions for creating arrays from various data sources including Python iterables, NumPy arrays, JSON data, and binary formats. Supports incremental building through ArrayBuilder for complex nested structures.

```python { .api }
def from_iter(iterable, *, allow_record=True, highlevel=True, behavior=None, attrs=None, initial=1024, resize=8): ...
def from_numpy(array, highlevel=True, behavior=None): ...
def from_json(source, highlevel=True, behavior=None): ...
def from_arrow(array, highlevel=True, behavior=None): ...
def from_parquet(path, **kwargs): ...
class ArrayBuilder:
    def null(self): ...
    def boolean(self, x): ...
    def integer(self, x): ...
    def real(self, x): ...
    def complex(self, real, imag=0): ...
    def string(self, x): ...
    def bytestring(self, x): ...
    def datetime(self, x): ...
    def timedelta(self, x): ...
    def append(self, x): ...
    def extend(self, iterable): ...
    def begin_list(self): ...
    def end_list(self): ...
    def begin_tuple(self, numfields): ...
    def end_tuple(self): ...
    def begin_record(self, name=None): ...
    def end_record(self): ...
    def field(self, key): ...
    def index(self, i): ...
```

[Array Creation →](./array-creation.md)

### Array Manipulation and Transformation

Structural operations for reshaping, filtering, combining, and transforming arrays while preserving type information and handling variable-length data gracefully.

```python { .api }
def concatenate(arrays, axis=0): ...
def zip(arrays, depth_limit=None): ...
def flatten(array, axis=1): ...
def unflatten(array, counts, axis=0): ...
def mask(array, selection): ...
def combinations(array, n, axis=1): ...
def cartesian(arrays, axis=1): ...
def with_field(array, what, where): ...
def without_field(array, where): ...
```

[Array Manipulation →](./array-manipulation.md)

### Mathematical and Statistical Operations

Full suite of mathematical operations including reductions, element-wise functions, linear algebra, and statistical analysis that handle missing data and nested structures appropriately.

```python { .api }
def sum(array, axis=None, *, keepdims=False, mask_identity=False, highlevel=True, behavior=None, attrs=None): ...
def mean(array, axis=None, keepdims=False): ...
def var(array, axis=None, ddof=0, keepdims=False): ...
def std(array, axis=None, ddof=0, keepdims=False): ...
def min(array, axis=None, keepdims=False): ...
def max(array, axis=None, keepdims=False): ...
def argmin(array, axis=None, keepdims=False): ...
def argmax(array, axis=None, keepdims=False): ...
def linear_fit(x, y, axis=None): ...
def corr(x, y, axis=None): ...
```

[Mathematical Operations →](./mathematical-operations.md)

### Data Conversion and I/O

Extensive support for reading from and writing to various data formats including Arrow, Parquet, JSON, NumPy, and integration with popular frameworks like PyTorch, TensorFlow, and JAX.

```python { .api }
def to_arrow(array): ...
def to_parquet(array, destination, **kwargs): ...
def to_numpy(array): ...
def to_json(array, **kwargs): ...
def to_list(array): ...
def from_torch(array): ...
def to_torch(array): ...
def from_tensorflow(array): ...
def to_tensorflow(array): ...
def to_dataframe(array): ...
```

[Data Conversion →](./data-conversion.md)

### String Operations

Comprehensive string processing capabilities modeled after Apache Arrow's compute functions, providing efficient operations on arrays of strings including pattern matching, transformations, and analysis.

```python { .api }
def str.length(array): ...
def str.lower(array): ...
def str.upper(array): ...
def str.split_pattern(array, pattern): ...
def str.replace_substring(array, pattern, replacement): ...
def str.match_substring_regex(array, pattern): ...
def str.starts_with(array, pattern): ...
def str.extract_regex(array, pattern): ...
```

[String Operations →](./string-operations.md)

### Type System and Metadata

Rich type system providing precise descriptions of nested data structures, enabling static analysis, optimization, and cross-language interoperability. Includes schema management and metadata handling.

```python { .api }
def type(array): ...
def typeof(array): ...
class ArrayType: ...
class ListType: ...
class RecordType: ...
class OptionType: ...
def with_parameter(array, key, value): ...
def parameters(array): ...
def validity_error(array): ...
```

[Type System →](./type-system.md)

### Integration Modules

Seamless integration with high-performance computing frameworks including Numba JIT compilation, JAX automatic differentiation, and specialized backends for GPU computing and scientific workflows.

```python { .api }
import awkward.numba
import awkward.jax
import awkward.typetracer
def to_backend(array, backend): ...
def backend(array): ...
```

[Integration →](./integration.md)

## Core Classes

### Array

The primary user-facing class representing a multi-dimensional, possibly nested array with variable-length sublists and heterogeneous data types.

```python { .api }
class Array:
    def __init__(self, data, behavior=None): ...
    def to_list(self): ...
    def to_numpy(self): ...
    @property
    def type(self): ...
    @property
    def layout(self): ...
    def show(self, limit_rows=20): ...
```

### ArrayBuilder

Incremental array construction with support for complex nested structures and mixed data types.

```python { .api }
class ArrayBuilder:
    def __init__(self, behavior=None): ...
    def snapshot(self): ...
    def list(self): ...  # Context manager
    def record(self): ...  # Context manager
```

### Record

Single record (row) extracted from an Array, providing dict-like access to fields while maintaining type information.

```python { .api }
class Record:
    def __init__(self, array, at): ...
    def to_list(self): ...
    @property
    def fields(self): ...
```