# Integration Modules

Seamless integration with high-performance computing frameworks, including Numba JIT compilation, JAX automatic differentiation, and specialized backends for GPU computing and scientific workflows. These integrations enable awkward arrays to participate in high-performance computing pipelines while maintaining their flexible data model.

## Capabilities

### Numba Integration

Just-in-time compilation support for high-performance computing with awkward arrays, enabling compiled functions that work directly with nested data structures.

```python { .api }
import numba
import awkward.numba

def enable_numba():
    """
    Enable Numba integration for awkward arrays.

    This function registers awkward array types with Numba's type system,
    allowing awkward arrays to be used in @numba.jit-decorated functions.
    """

# Numba-compilable operations
@numba.jit
def compute_with_awkward(array):
    """
    Example of a Numba-compiled function working with awkward arrays.

    Parameters:
    - array: Awkward array processed inside the compiled function

    Returns:
    Computed result with full JIT performance
    """
```

The Numba integration provides:

- **Type registration**: Awkward array types are registered with Numba's type-inference system
- **Layout support**: All awkward layout types can be used in compiled functions
- **Memory management**: Proper memory handling for nested structures in compiled code
- **Performance**: Near-C-speed execution for complex data-processing pipelines
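
Beyond reading nested data, compiled functions can also build new awkward arrays. Below is a minimal sketch using `ak.ArrayBuilder`, which the Numba integration supports inside compiled functions; the function name and filtering logic are illustrative:

```python
import awkward as ak
import numba

@numba.jit(nopython=True)
def keep_positive(array, builder):
    # Iterate the nested structure in compiled code and
    # assemble a filtered copy with the builder.
    for sublist in array:
        builder.begin_list()
        for value in sublist:
            if value > 0.0:
                builder.real(value)
        builder.end_list()
    return builder

data = ak.Array([[1.0, -2.0, 3.0], [], [-4.0, 5.0]])
builder = keep_positive(data, ak.ArrayBuilder())
print(builder.snapshot().to_list())  # [[1.0, 3.0], [], [5.0]]
```

The builder is created and snapshotted outside the compiled function, which keeps the compiled region free of Python-object handling.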

### JAX Integration

Automatic differentiation and GPU computing support through JAX integration, enabling machine learning and scientific computing workflows.

```python { .api }
import awkward.jax

def register_jax():
    """
    Register awkward arrays with the JAX transformation system.

    Enables awkward arrays to participate in JAX transformations like
    jit, grad, vmap, and pmap for automatic differentiation and
    parallelization.
    """

# JAX transformation support
def jax_compatible_function(array):
    """
    Function that can be transformed by JAX (jit, grad, etc.).

    Parameters:
    - array: Awkward array compatible with JAX transformations

    Returns:
    Result that supports automatic differentiation
    """
```

JAX integration features:

- **Automatic differentiation**: Compute gradients through nested data operations
- **JIT compilation**: Compile functions involving awkward arrays for GPU execution
- **Vectorization**: Apply functions across batches of nested data
- **Parallelization**: Multi-device execution for large-scale computations
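
As a brief sketch of the differentiation feature, assuming registration has been performed (the `ak.jax.register()` call used in the usage examples below) and that arrays are first moved to the `"jax"` backend:

```python
import awkward as ak
import jax

ak.jax.register()  # registration call as described above

# Differentiable arrays live on the "jax" backend.
data = ak.to_backend(ak.Array([[1.0, 2.0, 3.0], [4.0]]), "jax")

def sum_of_squares(array):
    # A reduction over the nested structure, returning a scalar.
    return ak.sum(array * array)

# The gradient has the same nested shape as the input.
grads = jax.grad(sum_of_squares)(data)
```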

### Backend Management

Unified interface for managing computational backends and moving arrays between different execution environments.

```python { .api }
def backend(array):
    """
    Get the computational backend currently used by the array.

    Parameters:
    - array: Array whose backend to report

    Returns:
    str indicating the current backend ("cpu", "cuda", "jax", etc.)
    """

def to_backend(array, backend, highlevel=True, behavior=None):
    """
    Move an array to the specified computational backend.

    Parameters:
    - array: Array to move
    - backend: str, target backend name
      - "cpu": Standard CPU backend using NumPy
      - "cuda": CUDA backend using CuPy
      - "jax": JAX backend for automatic differentiation
      - "typetracer": Type-inference backend without data
    - highlevel: bool, if True return an Array, if False return a Content layout
    - behavior: dict, custom behavior for the result

    Returns:
    Array moved to the target backend
    """

def copy_to(array, backend):
    """
    Copy array data to a different backend.

    Parameters:
    - array: Array to copy
    - backend: str, destination backend

    Returns:
    Array copy on the target backend
    """
```
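
A minimal sketch of this interface, using the `"typetracer"` backend listed above to check a pipeline without materializing data:

```python
import awkward as ak

array = ak.Array([[1, 2, 3], [4, 5]])
print(ak.backend(array))   # "cpu"

# A typetracer array carries type and shape information but no buffers,
# so downstream operations can be validated without touching data.
traced = ak.to_backend(array, "typetracer")
print(ak.backend(traced))  # "typetracer"
```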

### Type Tracer Integration

Lazy type-inference system that analyzes array operations without materializing data, enabling static analysis and optimization.

```python { .api }
import awkward.typetracer

class TypeTracer:
    """
    Lazy evaluation system for type inference without data materialization.

    TypeTracer arrays track type information and operations without
    storing actual data, enabling:
    - Static type checking
    - Memory usage analysis
    - Operation optimization
    - Schema inference
    """

    def touch_data(self, recursive=True):
        """
        Mark data as accessed for dependency tracking.

        Parameters:
        - recursive: bool, if True mark nested data as touched
        """

    def touch_shape(self, recursive=True):
        """
        Mark shape information as accessed.

        Parameters:
        - recursive: bool, if True mark nested shapes as touched
        """

def typetracer_with_report(form):
    """
    Create a type tracer that generates buffer-access reports.

    Parameters:
    - form: Form object describing the array structure

    Returns:
    tuple of (TypeTracer array, report object)
    """

def typetracer_from_form(form):
    """
    Create a type tracer directly from a Form description.

    Parameters:
    - form: Form object describing the array structure

    Returns:
    TypeTracer array matching the form
    """
```
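
A short sketch of report generation; the `highlevel=True` argument and the `data_touched` attribute are assumptions based on the current awkward v2 interface:

```python
import awkward as ak

# Form captured from an existing array (it could also be built directly).
form = ak.Array([{"x": [1.0], "y": 2}]).layout.form

# The report records which buffers an operation touches; this is the
# basis for column pruning in lazy, out-of-core workflows.
tracer, report = ak.typetracer.typetracer_with_report(form, highlevel=True)
_ = tracer["x"]             # only the "x" field is accessed
print(report.data_touched)  # keys of the buffers that were read
```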

### CppYY Integration

C++ interoperability through cppyy, enabling integration with C++ libraries and the ROOT ecosystem common in high-energy physics.

```python { .api }
import awkward.cppyy

def register_cppyy():
    """
    Register awkward types with cppyy for C++ interoperability.

    Enables:
    - Passing awkward arrays to C++ functions
    - Converting C++ containers to awkward arrays
    - Integration with the ROOT data-analysis framework
    - Zero-copy data sharing where possible
    """

def cpp_interface(array):
    """
    Create a C++-compatible interface for an array.

    Parameters:
    - array: Awkward array to create a C++ interface for

    Returns:
    C++-compatible proxy object
    """
```

### GPU Computing Support

Functions for GPU-accelerated computing using CUDA and related frameworks.

```python { .api }
def to_cuda(array):
    """
    Move an array to CUDA GPU memory.

    Parameters:
    - array: Array to move to the GPU

    Returns:
    Array with data in GPU memory
    """

def from_cuda(array):
    """
    Move an array from GPU to CPU memory.

    Parameters:
    - array: GPU array to move to the CPU

    Returns:
    Array with data in CPU memory
    """

def is_cuda(array):
    """
    Test whether array data resides in GPU memory.

    Parameters:
    - array: Array to test

    Returns:
    bool indicating whether the array is on the GPU
    """
```
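
These helpers correspond to backend moves through the interface shown earlier; a hedged sketch that degrades gracefully when CuPy or a CUDA device is missing:

```python
import awkward as ak

array = ak.Array([[1.0, 2.0], [3.0, 4.0, 5.0]])

try:
    gpu = ak.to_backend(array, "cuda")  # buffers copied into GPU memory
    print(ak.backend(gpu) == "cuda")    # True: equivalent to the is_cuda check
    back = ak.to_backend(gpu, "cpu")    # equivalent to from_cuda
except Exception as err:                # CuPy or a CUDA device is missing
    print(f"CUDA backend unavailable: {err}")
```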

### Framework-Specific Integration Utilities

Helper functions for specific integration scenarios and framework compatibility.

```python { .api }
def numba_array_typer(array_type):
    """
    Create a Numba type signature for an awkward array type.

    Parameters:
    - array_type: Awkward array type

    Returns:
    Numba type signature for compilation
    """

def jax_pytree_flatten(array):
    """
    Flatten an awkward array for JAX pytree operations.

    Parameters:
    - array: Array to flatten

    Returns:
    tuple of (leaves, tree_def) for JAX pytree operations
    """

def jax_pytree_unflatten(tree_def, leaves):
    """
    Reconstruct an awkward array from JAX pytree components.

    Parameters:
    - tree_def: Tree definition from the flatten operation
    - leaves: Leaf values from the flatten operation

    Returns:
    Reconstructed awkward array
    """

def dispatch_map():
    """
    Get the mapping of operations to backend-specific implementations.

    Returns:
    dict mapping operation names to backend implementations
    """
```
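
Once awkward arrays are registered as pytrees, JAX's own tree utilities exercise this flatten/unflatten machinery directly; a sketch assuming registration as in the JAX section:

```python
import awkward as ak
import jax

ak.jax.register()  # as in the JAX integration section

data = ak.to_backend(ak.Array([[1.0, 2.0], [3.0]]), "jax")

# tree_flatten/tree_unflatten drive the pytree hooks described above:
# the leaves are flat buffers, the treedef encodes the nested structure.
leaves, treedef = jax.tree_util.tree_flatten(data)
rebuilt = jax.tree_util.tree_unflatten(treedef, leaves)
print(rebuilt.to_list())  # [[1.0, 2.0], [3.0]]
```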

### Performance Optimization Utilities

Tools for analyzing and optimizing performance across different backends and integration scenarios.

```python { .api }
def benchmark_backends(array, operation, backends=None):
    """
    Benchmark operation performance across different backends.

    Parameters:
    - array: Array to benchmark with
    - operation: Function to benchmark
    - backends: list of str, backends to test (None for all available)

    Returns:
    dict mapping backend names to timing results
    """

def memory_usage(array, backend=None):
    """
    Analyze memory usage of an array on the specified backend.

    Parameters:
    - array: Array to analyze
    - backend: str, backend to check (None for the current one)

    Returns:
    dict with memory-usage statistics
    """

def optimize_for_backend(array, backend, operation_hint=None):
    """
    Optimize array layout for a specific backend and operation.

    Parameters:
    - array: Array to optimize
    - backend: str, target backend
    - operation_hint: str, hint about the intended operations

    Returns:
    Array optimized for the target backend
    """
```
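
For a quick look at memory footprint without these helpers, the high-level `nbytes` property reports the bytes held by an array's buffers; a minimal sketch:

```python
import awkward as ak

ragged = ak.Array([[1.5, 2.5, 3.5], [], [4.5]])

# nbytes sums the sizes of all buffers backing the layout
# (offsets as well as numeric data), on whatever backend holds them.
print(ragged.nbytes)

# Packing drops unreachable buffer regions, which can shrink nbytes
# ahead of a backend transfer.
print(ak.to_packed(ragged).nbytes)
```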

## Usage Examples

### Numba JIT Compilation

```python
import awkward as ak
import numba

# Enable numba integration
ak.numba.register()

@numba.jit
def fast_computation(events):
    """JIT-compiled function working with nested data."""
    total = 0.0
    for event in events:
        for particle in event.particles:
            if particle.pt > 10.0:
                total += particle.pt * particle.pt
    return total

# Use with nested data
events = ak.Array([
    {"particles": [{"pt": 15.0}, {"pt": 5.0}]},
    {"particles": [{"pt": 25.0}, {"pt": 12.0}]}
])

result = fast_computation(events)  # Runs at compiled speed
```

### JAX Integration

```python
import awkward as ak
import jax
import jax.numpy as jnp

# Register awkward arrays as JAX pytrees
ak.jax.register()

def physics_calculation(events):
    """Function that can be JAX-transformed."""
    pts = events.particles.pt
    return ak.sum(pts * pts, axis=1)

# Apply JAX transformations
jit_calc = jax.jit(physics_calculation)
vectorized_calc = jax.vmap(physics_calculation)

# Automatic differentiation
def loss_function(events, weights):
    result = physics_calculation(events)
    return jnp.sum(result * weights)

gradient_fn = jax.grad(loss_function, argnums=1)
```

### Backend Management

```python
import awkward as ak
import cupy as cp

# Create an array on the CPU
cpu_array = ak.Array([[1, 2, 3], [4, 5]])
print(ak.backend(cpu_array))  # "cpu"

# Check that CUDA is available before moving data
if cp.cuda.is_available():
    # Move to the GPU
    gpu_array = ak.to_backend(cpu_array, "cuda")
    print(ak.backend(gpu_array))  # "cuda"

    # Perform a GPU computation
    gpu_result = ak.sum(gpu_array * gpu_array, axis=1)

    # Move the result back to the CPU
    cpu_result = ak.to_backend(gpu_result, "cpu")
```

### Type Tracing for Optimization

```python
import awkward as ak

# Create a type tracer for schema analysis
form = ak.forms.RecordForm([
    ak.forms.ListForm("i64", "i64", ak.forms.NumpyForm("float64")),
    ak.forms.NumpyForm("int32")
], ["particles", "event_id"])

tracer = ak.typetracer.typetracer_from_form(form)

def analyze_operation(data):
    """Function to analyze without data."""
    return ak.sum(data.particles, axis=1) + data.event_id

# Trace the operation to understand access patterns
traced_result = analyze_operation(tracer)
print(f"Result type: {ak.type(traced_result)}")
```

### C++ Integration via CppYY

```python
import awkward as ak
import cppyy

# Register awkward arrays with cppyy
ak.cppyy.register()

# Define a C++ function (example)
cppyy.cppdef("""
#include <cmath>

double compute_mass(const std::vector<double>& pt,
                    const std::vector<double>& eta) {
    double total = 0.0;
    for (size_t i = 0; i < pt.size(); ++i) {
        total += pt[i] * std::cosh(eta[i]);
    }
    return total;
}
""")

# Use with awkward arrays
particles = ak.Array({
    "pt": [[10.0, 20.0], [15.0]],
    "eta": [[1.0, 0.5], [1.2]]
})

# Convert to a C++-compatible format and call
for event in particles:
    mass = cppyy.gbl.compute_mass(event.pt, event.eta)
    print(f"Event mass: {mass}")
```

### Performance Benchmarking

```python
import importlib.util
import time

import awkward as ak

# Create test data
large_array = ak.Array([
    [i + j for j in range(1000)]
    for i in range(1000)
])

def benchmark_operation(array, backend_name):
    """Benchmark an array operation on a specific backend."""
    # Move to the backend
    backend_array = ak.to_backend(array, backend_name)

    # Time the operation
    start = time.time()
    result = ak.sum(backend_array * backend_array, axis=1)
    end = time.time()

    return end - start

# Compare whichever backends are installed
backends = ["cpu"]
if importlib.util.find_spec("cupy") is not None:
    backends.append("cuda")
if importlib.util.find_spec("jax") is not None:
    backends.append("jax")

for backend in backends:
    duration = benchmark_operation(large_array, backend)
    print(f"{backend}: {duration:.4f} seconds")
```

### Integration Best Practices

```python
import awkward as ak

def optimize_for_computation(array, target_backend="cpu", operation="reduction"):
    """Optimize an array for a specific computation pattern."""

    # Pack the array for a better memory layout
    packed = ak.to_packed(array)

    # Move to the target backend
    backend_array = ak.to_backend(packed, target_backend)

    # Apply operation-specific optimizations
    if operation == "reduction" and target_backend == "cuda":
        # Tag the array; "gpu_optimized" is an illustrative parameter name
        return ak.with_parameter(backend_array, "gpu_optimized", True)

    return backend_array

# Example usage
data = ak.Array([[1, 2, 3], [4, 5, 6, 7], [8, 9]])
optimized = optimize_for_computation(data, "cuda", "reduction")
result = ak.sum(optimized, axis=1)  # Runs with optimizations
```