# Reduction and Aggregation Operations

Functions for computing statistics and aggregations along specified axes, including standard reductions and NaN-aware variants. These operations work directly on the stored elements of a sparse array, so their cost scales with the number of stored values rather than with the full array size.

## Capabilities

### Standard Reduction Operations

Core statistical functions that operate along specified axes or across entire arrays.

```python { .api }
def sum(a, axis=None, keepdims=False):
    """
    Compute the sum of array elements along the specified axis.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to sum (None for all elements)
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse array or scalar with sum of elements
    """

def prod(a, axis=None, keepdims=False):
    """
    Compute the product of array elements along the specified axis.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to compute the product (None for all elements)
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse array or scalar with product of elements
    """

def mean(a, axis=None, keepdims=False):
    """
    Compute the arithmetic mean along the specified axis.
    Unstored elements (implicit zeros) count toward the divisor.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to compute the mean (None for all elements)
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse array or scalar with mean values
    """

def var(a, axis=None, keepdims=False, ddof=0):
    """
    Compute the variance along the specified axis.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to compute the variance (None for all elements)
    - keepdims: bool, whether to preserve dimensions in result
    - ddof: int, delta degrees of freedom; the divisor is N - ddof (ddof=1 gives the sample variance)

    Returns:
    Sparse array or scalar with variance values
    """

def std(a, axis=None, keepdims=False, ddof=0):
    """
    Compute the standard deviation along the specified axis.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to compute the standard deviation (None for all elements)
    - keepdims: bool, whether to preserve dimensions in result
    - ddof: int, delta degrees of freedom; the divisor is N - ddof (ddof=1 gives the sample standard deviation)

    Returns:
    Sparse array or scalar with standard deviation values
    """
```
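
To make the "stored elements only" idea concrete, here is a minimal NumPy-only sketch (not the library's implementation) of an axis sum and mean over a COO-style `(coords, data, shape)` triple whose fill value is 0. The helper `coo_axis_sum` is hypothetical. Only stored values are scattered into the output; the mean still divides by the full axis length, so implicit zeros count toward it.

```python
import numpy as np

def coo_axis_sum(coords, data, shape, axis):
    """Hypothetical helper: sum a 2-D COO triple (fill value 0) along `axis`."""
    keep = 1 - axis                      # the axis that survives the reduction
    out = np.zeros(shape[keep], dtype=data.dtype)
    # Scatter-add each stored value into its output slot; unstored zeros add nothing
    np.add.at(out, coords[keep], data)
    return out

# Stored entries of [[1, 0, 3, 0], [5, 2, 0, 4], [0, 0, 6, 1]]
coords = np.array([[0, 0, 1, 1, 1, 2, 2],   # row indices
                   [0, 2, 0, 1, 3, 2, 3]])  # column indices
data = np.array([1, 3, 5, 2, 4, 6, 1])
shape = (3, 4)

row_sums = coo_axis_sum(coords, data, shape, axis=1)
row_means = row_sums / shape[1]  # divide by the full row length, implicit zeros included
print(row_sums)   # [ 4 11  7]
print(row_means)  # [1.   2.75 1.75]
```

The work is proportional to the number of stored entries (seven here), not to the twelve positions of the dense shape.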

### Min/Max Operations

Functions for finding minimum and maximum values and their locations.

```python { .api }
def max(a, axis=None, keepdims=False):
    """
    Find the maximum values along the specified axis.
    Unstored elements take the fill value (usually 0) and are included.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to find the maximum (None for all elements)
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse array or scalar with maximum values
    """

def min(a, axis=None, keepdims=False):
    """
    Find the minimum values along the specified axis.
    Unstored elements take the fill value (usually 0) and are included.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to find the minimum (None for all elements)
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse array or scalar with minimum values
    """

def argmax(a, axis=None, keepdims=False):
    """
    Find the indices of the maximum values along an axis.

    Parameters:
    - a: sparse array, input array
    - axis: int, axis along which to find the argmax (None to search the flattened array)
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Array with indices of maximum values
    """

def argmin(a, axis=None, keepdims=False):
    """
    Find the indices of the minimum values along an axis.

    Parameters:
    - a: sparse array, input array
    - axis: int, axis along which to find the argmin (None to search the flattened array)
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Array with indices of minimum values
    """
```
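
Because unstored elements equal the fill value, a row's maximum is the larger of its stored maximum and the fill value whenever the row has at least one unstored entry. A NumPy-only sketch of that rule, assuming a fill value of 0 (`coo_row_max` is a hypothetical helper, not a library function):

```python
import numpy as np

def coo_row_max(coords, data, shape, fill_value=0):
    """Hypothetical helper: per-row maximum of a 2-D COO triple."""
    n_rows, n_cols = shape
    out = np.full(n_rows, -np.inf)
    np.maximum.at(out, coords[0], data)           # max over stored values only
    counts = np.bincount(coords[0], minlength=n_rows)
    has_fill = counts < n_cols                    # rows with at least one unstored entry
    out[has_fill] = np.maximum(out[has_fill], fill_value)
    return out

# Row 0 stores only negative values, so its maximum is an implicit zero
coords = np.array([[0, 0, 1, 1, 1],
                   [0, 1, 0, 1, 2]])
data = np.array([-3.0, -1.0, 2.0, 5.0, 4.0])
print(coo_row_max(coords, data, shape=(2, 3)))  # [0. 5.]
```

The same reasoning explains why `argmax` or `argmin` can return the position of an element that is not stored at all.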

### Boolean Reductions

Logical reduction operations for boolean arrays and conditions.

```python { .api }
def all(a, axis=None, keepdims=False):
    """
    Test whether all array elements along axis evaluate to True.

    Parameters:
    - a: sparse array, input array (typically boolean)
    - axis: int or tuple, axis/axes along which to test
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse boolean array or scalar, True where all elements are True
    """

def any(a, axis=None, keepdims=False):
    """
    Test whether any array element along axis evaluates to True.

    Parameters:
    - a: sparse array, input array (typically boolean)
    - axis: int or tuple, axis/axes along which to test
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse boolean array or scalar, True where any element is True
    """
```
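
For a boolean sparse array whose fill value is `False`, `any` depends only on the stored values, while `all` can be `True` for a row only if every position in that row is stored and `True`. A NumPy-only sketch of that rule (`coo_row_any_all` is a hypothetical helper; it assumes no duplicate coordinates):

```python
import numpy as np

def coo_row_any_all(coords, data, shape):
    """Hypothetical helper: per-row any/all of a 2-D boolean COO triple, fill False."""
    n_rows, n_cols = shape
    true_counts = np.bincount(coords[0], weights=data.astype(int), minlength=n_rows)
    row_any = true_counts > 0
    # all() needs every position stored and True; any unstored entry is False
    row_all = true_counts == n_cols
    return row_any, row_all

# True positions of [[1, 0, 3, 0], [5, 2, 0, 4], [0, 0, 6, 1]] > 2
coords = np.array([[0, 1, 1, 2],
                   [2, 0, 3, 2]])
data = np.array([True, True, True, True])
row_any, row_all = coo_row_any_all(coords, data, shape=(3, 4))
print(row_any)  # [ True  True  True]
print(row_all)  # [False False False]
```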

### NaN-Aware Reductions

Specialized reduction functions that ignore NaN values in computations.

```python { .api }
def nansum(a, axis=None, keepdims=False):
    """
    Compute the sum along axis, ignoring NaN values.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to sum
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse array or scalar with sum ignoring NaN values
    """

def nanprod(a, axis=None, keepdims=False):
    """
    Compute the product along axis, ignoring NaN values.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to compute the product
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse array or scalar with product ignoring NaN values
    """

def nanmean(a, axis=None, keepdims=False):
    """
    Compute the mean along axis, ignoring NaN values.
    Only NaN entries are skipped; unstored elements (implicit zeros) still count as values.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to compute the mean
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse array or scalar with mean ignoring NaN values
    """

def nanmax(a, axis=None, keepdims=False):
    """
    Find the maximum along axis, ignoring NaN values.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to find the maximum
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse array or scalar with maximum ignoring NaN values
    """

def nanmin(a, axis=None, keepdims=False):
    """
    Find the minimum along axis, ignoring NaN values.

    Parameters:
    - a: sparse array, input array
    - axis: int or tuple, axis/axes along which to find the minimum
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Sparse array or scalar with minimum ignoring NaN values
    """

def nanreduce(a, method, identity=None, axis=None, keepdims=False):
    """
    Generic reduction that ignores NaN values: NaN entries are replaced by
    the reduction identity, then the ufunc's reduce is applied.

    Parameters:
    - a: sparse array, input array
    - method: NumPy ufunc (e.g. np.add, np.multiply) whose reduce performs the reduction; arbitrary Python callables are not accepted
    - identity: scalar, value substituted for NaN (defaults to method.identity)
    - axis: int or tuple, axis/axes along which to reduce
    - keepdims: bool, whether to preserve dimensions in result

    Returns:
    Result of reducing with method along axis, with NaN values treated as the identity
    """
```
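
The NaN-skipping idea behind `nanreduce` can be sketched with plain NumPy: substitute the reduction's identity for each NaN, then apply the ufunc's `reduce`. This illustration works on dense arrays and is not the library's code; `nan_skipping_reduce` is a hypothetical name.

```python
import numpy as np

def nan_skipping_reduce(a, ufunc, identity=None, axis=None):
    """Hypothetical helper: reduce with `ufunc`, treating NaN as the identity."""
    if identity is None:
        identity = ufunc.identity
    if identity is None:
        raise ValueError(f"{ufunc.__name__} has no identity; pass one explicitly")
    cleaned = np.where(np.isnan(a), identity, a)  # NaN -> identity, others unchanged
    return ufunc.reduce(cleaned, axis=axis)

a = np.array([[1.0, np.nan, 3.0], [4.0, 2.0, np.nan]])
print(nan_skipping_reduce(a, np.add, axis=1))                         # [4. 6.]
print(nan_skipping_reduce(a, np.multiply, axis=1))                    # [3. 8.]
print(nan_skipping_reduce(a, np.maximum, identity=-np.inf, axis=0))  # [4. 2. 3.]
```

`np.maximum.identity` is `None`, so the last call passes `-np.inf` explicitly; with an identity of 0 or 1, NaN entries simply drop out of a sum or product.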

## Usage Examples

### Basic Reductions

```python
import sparse
import numpy as np

# Create test array
test_array = sparse.COO.from_numpy(
    np.array([[1, 0, 3, 0], [5, 2, 0, 4], [0, 0, 6, 1]])
)
print(f"Test array shape: {test_array.shape}")  # (3, 4)
print(f"Test array nnz: {test_array.nnz}")  # 7

# Global reductions (entire array). A full reduction may return a NumPy
# scalar rather than a sparse array, so convert with int()/float() to print.
total_sum = sparse.sum(test_array)
mean_value = sparse.mean(test_array)
max_value = sparse.max(test_array)
min_value = sparse.min(test_array)

print(f"Total sum: {int(total_sum)}")  # 22
print(f"Mean: {float(mean_value):.2f}")  # 1.83
print(f"Max: {int(max_value)}")  # 6
print(f"Min: {int(min_value)}")  # 0 (implicit zeros are included)
```

### Axis-Specific Reductions

```python
# Row-wise reductions (axis=1)
row_sums = sparse.sum(test_array, axis=1)
row_means = sparse.mean(test_array, axis=1)
row_max = sparse.max(test_array, axis=1)

print(f"Row sums shape: {row_sums.shape}")  # (3,)
print(f"Row sums: {row_sums.todense()}")  # [4, 11, 7]
print(f"Row means: {row_means.todense()}")  # [1.0, 2.75, 1.75]

# Column-wise reductions (axis=0)
col_sums = sparse.sum(test_array, axis=0)
col_means = sparse.mean(test_array, axis=0)

print(f"Column sums shape: {col_sums.shape}")  # (4,)
print(f"Column sums: {col_sums.todense()}")  # [6, 2, 9, 5]
```
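
The expected values in the comments of the two examples above can be cross-checked with plain NumPy on the same dense data; since the fill value is 0, the sparse and dense reductions should agree.

```python
import numpy as np

dense = np.array([[1, 0, 3, 0], [5, 2, 0, 4], [0, 0, 6, 1]])

print(dense.sum(), round(dense.mean(), 2), dense.max(), dense.min())  # 22 1.83 6 0
print(dense.sum(axis=1))   # [ 4 11  7]
print(dense.mean(axis=1))  # [1.   2.75 1.75]
print(dense.sum(axis=0))   # [6 2 9 5]
```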

### Keepdims Parameter

```python
# Compare results with and without keepdims
row_sums_keepdims = sparse.sum(test_array, axis=1, keepdims=True)
row_sums_no_keepdims = sparse.sum(test_array, axis=1, keepdims=False)

print(f"With keepdims: {row_sums_keepdims.shape}")  # (3, 1)
print(f"Without keepdims: {row_sums_no_keepdims.shape}")  # (3,)

# keepdims enables broadcasting: (3, 4) / (3, 1) divides each row by its own sum
normalized = test_array / row_sums_keepdims
print(f"Normalized array shape: {normalized.shape}")  # (3, 4)
```
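
Why `keepdims` matters, shown with dense NumPy arrays (the sparse example above relies on the same broadcasting rules): a `(3,)` vector is aligned with the last axis of a `(3, 4)` array and fails to broadcast, while the `(3, 1)` column is stretched across each row.

```python
import numpy as np

dense = np.array([[1, 0, 3, 0], [5, 2, 0, 4], [0, 0, 6, 1]], dtype=float)

row_sums = dense.sum(axis=1)                    # shape (3,)
row_sums_kd = dense.sum(axis=1, keepdims=True)  # shape (3, 1)

try:
    dense / row_sums                            # trailing dims 4 and 3 do not match
except ValueError as err:
    print("Without keepdims:", err)

normalized = dense / row_sums_kd                # each row divided by its own sum
print(normalized.sum(axis=1))                   # [1. 1. 1.]
```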

### Multiple Axis Reductions

```python
# Create 3D array for multi-axis reductions
array_3d = sparse.random((4, 5, 6), density=0.2)

# Reduce along multiple axes
sum_axes_01 = sparse.sum(array_3d, axis=(0, 1))  # Sum over the first two axes
mean_axes_02 = sparse.mean(array_3d, axis=(0, 2))  # Mean over the first and last axes

print(f"Original shape: {array_3d.shape}")  # (4, 5, 6)
print(f"Sum axes (0,1): {sum_axes_01.shape}")  # (6,)
print(f"Mean axes (0,2): {mean_axes_02.shape}")  # (5,)

# Reducing over all axes is equivalent to a global reduction
# (convert with float(), since a full reduction may return a NumPy scalar)
sum_all_axes = sparse.sum(array_3d, axis=(0, 1, 2))
sum_global = sparse.sum(array_3d)
print(f"All axes equal global: {np.isclose(float(sum_all_axes), float(sum_global))}")
```
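
A multi-axis reduction gives the same result as reducing one axis at a time, but in a single call with no intermediate array. A dense NumPy check of that identity, assuming the sparse functions follow the same axis semantics (as the shape comments above indicate):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((4, 5, 6))

multi = a.sum(axis=(0, 1))            # shape (6,)
chained = a.sum(axis=0).sum(axis=0)   # after the first sum, old axis 1 becomes axis 0
print(multi.shape, np.allclose(multi, chained))  # (6,) True

# Reduced axes disappear; the surviving axes keep their order
print(a.mean(axis=(0, 2)).shape)      # (5,)
```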

### Statistical Measures

```python
# Variance and standard deviation
data = sparse.random((100, 50), density=0.1)

variance = sparse.var(data, axis=0)  # Column-wise population variance (ddof=0)
std_dev = sparse.std(data, axis=0)  # Column-wise population standard deviation
# Sample standard deviation (divisor N - 1); the COO.std method accepts ddof directly
std_sample = data.std(axis=0, ddof=1)

print("Population std vs sample std:")
print(f"Population: {float(sparse.mean(std_dev)):.4f}")
print(f"Sample: {float(sparse.mean(std_sample)):.4f}")  # Larger by a factor of sqrt(100/99)

# Verify relationship: std = sqrt(var)
print(f"Std² ≈ Var: {np.allclose((std_dev ** 2).todense(), variance.todense())}")
```
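
Variance can be computed without touching the implicit zeros: they add nothing to the sum or to the sum of squares, so only the stored values and the total count N are needed, via var = (Σx² - (Σx)²/N) / (N - ddof). A NumPy-only sketch of this formula (`sparse_var` is a hypothetical helper; this is one possible approach, not necessarily how the library computes it):

```python
import numpy as np

def sparse_var(stored_values, n_total, ddof=0):
    """Hypothetical helper: variance of n_total elements, of which only
    `stored_values` are nonzero (the rest are implicit zeros)."""
    s1 = stored_values.sum()          # zeros contribute nothing to the sum...
    s2 = (stored_values ** 2).sum()   # ...or to the sum of squares
    return (s2 - s1 ** 2 / n_total) / (n_total - ddof)

stored = np.array([1.0, 5.0])         # column 0 of the 3x4 example array is [1, 5, 0]
dense_col = np.array([1.0, 5.0, 0.0])

print(sparse_var(stored, 3), np.var(dense_col))                  # both about 4.6667
print(sparse_var(stored, 3, ddof=1), np.var(dense_col, ddof=1))  # 7.0 7.0
```

This one-pass formula can lose precision when the mean is large relative to the spread, which is why two-pass formulations are common in numerical code.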

### Index Finding Operations

```python
# Find locations of extreme values
large_array = sparse.random((20, 30), density=0.05)

# Global argmax/argmin: with axis=None the index refers to the flattened array
global_max_idx = sparse.argmax(large_array)
global_min_idx = sparse.argmin(large_array)  # Usually an implicit zero: ~95% of entries are unstored

print(f"Global max index: {global_max_idx}")
print(f"Global min index: {global_min_idx}")

# Axis-specific argmax/argmin
row_max_indices = sparse.argmax(large_array, axis=1)  # Column index of the max in each row
col_max_indices = sparse.argmax(large_array, axis=0)  # Row index of the max in each column

print(f"Row max indices shape: {row_max_indices.shape}")  # (20,)
print(f"Column max indices shape: {col_max_indices.shape}")  # (30,)
```
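
A global `argmax`/`argmin` result is an index into the flattened array, following NumPy's convention (worth confirming against your installed version); `np.unravel_index` turns it back into coordinates. A dense NumPy illustration:

```python
import numpy as np

a = np.array([[1, 0, 3, 0], [5, 2, 0, 4], [0, 0, 6, 1]])

flat = np.argmax(a)                            # index into a.ravel()
row, col = np.unravel_index(flat, a.shape)
print(flat, (int(row), int(col)), a[row, col])  # 10 (2, 2) 6

# Ties resolve to the first occurrence, so argmin lands on the first zero
first_zero = np.unravel_index(np.argmin(a), a.shape)  # row 0, column 1
print(first_zero)
```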

### Boolean Reductions

```python
# Create boolean conditions (elementwise comparison returns a sparse boolean array)
condition_array = test_array > 2
print("Elements > 2:")
print(condition_array.todense())

# Boolean reductions (full reductions may return NumPy scalars, so convert with bool())
any_gt_2 = sparse.any(condition_array)  # Any element > 2?
all_gt_2 = sparse.all(condition_array)  # All elements > 2?

any_rows = sparse.any(condition_array, axis=1)  # Any > 2 in each row?
all_cols = sparse.all(condition_array, axis=0)  # All > 2 in each column?

print(f"Any > 2: {bool(any_gt_2)}")  # True
print(f"All > 2: {bool(all_gt_2)}")  # False
print(f"Any per row: {any_rows.todense()}")  # [True, True, True]
print(f"All per column: {all_cols.todense()}")  # [False, False, False, False]
```

### NaN-Aware Reductions

```python
# Create array with NaN values
array_with_nan = sparse.COO.from_numpy(
    np.array([[1.0, np.nan, 3.0], [4.0, 2.0, np.nan], [np.nan, 5.0, 6.0]])
)

# Compare standard vs NaN-aware reductions
regular_sum = sparse.sum(array_with_nan, axis=1)
nan_aware_sum = sparse.nansum(array_with_nan, axis=1)

regular_mean = sparse.mean(array_with_nan, axis=1)
nan_aware_mean = sparse.nanmean(array_with_nan, axis=1)

print("Regular vs NaN-aware reductions:")
print(f"Regular sum: {regular_sum.todense()}")  # [nan, nan, nan]: every row contains a NaN
print(f"NaN-aware sum: {nan_aware_sum.todense()}")  # [4, 6, 11]
print(f"Regular mean: {regular_mean.todense()}")  # [nan, nan, nan]
print(f"NaN-aware mean: {nan_aware_mean.todense()}")  # [2.0, 3.0, 5.5]
```
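
One subtlety: NaN-aware reductions skip only NaN entries. Implicit zeros are ordinary values, so they still count toward the divisor of a NaN-aware mean. In dense NumPy terms (an illustration, assuming the sparse result matches `np.nanmean` on the densified array):

```python
import numpy as np

row = np.array([0.0, np.nan, 4.0, 0.0])  # the zeros would be unstored in a sparse array

print(np.nanmean(row))                    # (0 + 4 + 0) / 3, about 1.3333

# Averaging only the stored, non-NaN values is a different statistic
stored_non_nan = np.array([4.0])
print(stored_non_nan.mean())              # 4.0
```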

### Custom Reductions

```python
# nanreduce expects a NumPy ufunc (it calls the ufunc's reduce), not an
# arbitrary Python function. NaN entries are replaced by the ufunc's identity
# before reducing (0 for np.add, 1 for np.multiply).
values = sparse.COO.from_numpy(
    np.array([[2.0, np.nan, 3.0], [4.0, 5.0, np.nan]])
)

# Column products ignoring NaN
col_prod = sparse.nanreduce(values, np.multiply, axis=0)
print(f"Column products: {col_prod.todense()}")  # [8, 5, 3]

# Row sums ignoring NaN (the same result as sparse.nansum)
row_sum = sparse.nanreduce(values, np.add, axis=1)
print(f"Row sums: {row_sum.todense()}")  # [5, 9]
```
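
A geometric mean is not a single ufunc reduction, so it does not fit `nanreduce` directly, but it can be assembled from a NaN-skipping sum of logarithms: exp(Σ log xᵢ / n), where n counts only the non-NaN entries. A dense NumPy sketch (`nan_geometric_mean` is a hypothetical helper; every non-NaN input must be positive, which rules out arrays whose implicit zeros count as values):

```python
import numpy as np

def nan_geometric_mean(a, axis=None):
    """Hypothetical helper: geometric mean of positive values, ignoring NaN."""
    log_sum = np.nansum(np.log(a), axis=axis)  # log(NaN) stays NaN and is skipped
    count = np.sum(~np.isnan(a), axis=axis)    # number of non-NaN entries
    return np.exp(log_sum / count)

a = np.array([[2.0, np.nan, 3.0],
              [8.0, 5.0, np.nan]])
print(nan_geometric_mean(a, axis=0))  # approximately [4. 5. 3.]
```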

### Large-Scale Reductions

```python
# Reductions on a large, very sparse array
large_sparse = sparse.random((10000, 5000), density=0.001)  # ~50,000 stored values

# These axis reductions visit only the stored values; the
# 50-million-element dense array is never materialized
row_sums_large = sparse.sum(large_sparse, axis=1)
col_means_large = sparse.mean(large_sparse, axis=0)

print(f"Large array: {large_sparse.shape}, density: {large_sparse.density:.4%}")
print(f"Row sums nnz: {row_sums_large.nnz} / {row_sums_large.size}")
print(f"Col means nnz: {col_means_large.nnz} / {col_means_large.size}")

# Global statistics are single values (convert with float(), since a full
# reduction may return a NumPy scalar rather than a sparse array)
global_stats = {
    'sum': float(sparse.sum(large_sparse)),
    'mean': float(sparse.mean(large_sparse)),
    'std': float(sparse.std(large_sparse)),
    'max': float(sparse.max(large_sparse)),
    'min': float(sparse.min(large_sparse)),
}

print("Global statistics:", global_stats)
```
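
A back-of-the-envelope memory comparison for the `(10000, 5000)` array above at density 0.001, assuming COO storage of one int64 coordinate per dimension plus one float64 value per stored element (the index dtype a given `sparse` version uses may be smaller, so treat this as an upper-bound estimate):

```python
import math

def coo_bytes(shape, density, index_bytes=8, value_bytes=8):
    """Estimate COO memory: per stored element, one index per axis plus one value."""
    nnz = int(math.prod(shape) * density)
    return nnz * (len(shape) * index_bytes + value_bytes)

def dense_bytes(shape, value_bytes=8):
    """Memory for a dense array holding every element."""
    return math.prod(shape) * value_bytes

shape = (10000, 5000)
print(coo_bytes(shape, 0.001) / 1e6, "MB")  # 1.2 MB
print(dense_bytes(shape) / 1e6, "MB")       # 400.0 MB
```

The roughly 300x gap is why reducing the sparse array directly is preferable to calling `todense()` first.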

### Performance Considerations for Sparse Reductions

```python
# How reductions change density
original = sparse.random((1000, 1000), density=0.01)
print(f"Original density: {original.density:.2%}")

# Each row and column holds about 10 stored values on average, so nearly every
# axis sum is nonzero and the 1-D results are close to fully dense
axis0_reduction = sparse.sum(original, axis=0)  # One value per column
axis1_reduction = sparse.sum(original, axis=1)  # One value per row
global_reduction = sparse.sum(original)  # Single value

print(f"Axis-0 reduction nnz: {axis0_reduction.nnz} / {axis0_reduction.size}")
print(f"Axis-1 reduction nnz: {axis1_reduction.nnz} / {axis1_reduction.size}")
print(f"Global reduction: {float(global_reduction)}")
```
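
Why axis reductions come out nearly dense: if stored entries are scattered uniformly at density d, a slice of length n has no stored entry with probability about (1 - d)^n, so the expected fraction of nonzero slice sums is 1 - (1 - d)^n. Treating entries as independent is an approximation, and it ignores stored values that happen to cancel, but it is close for large arrays. A small sketch (`expected_nonzero_fraction` is a hypothetical helper):

```python
def expected_nonzero_fraction(n, density):
    """Approximate fraction of length-n slices holding at least one stored entry."""
    return 1.0 - (1.0 - density) ** n

print(f"{expected_nonzero_fraction(1000, 0.01):.5f}")    # 0.99996
print(f"{expected_nonzero_fraction(1000, 0.0001):.3f}")  # 0.095
```

At density 0.01 almost every column sum is nonzero, whereas at density 0.0001 fewer than one in ten is, so only very sparse inputs keep their reductions sparse.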
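
## Performance and Memory Considerations

### Computational Efficiency

- **Sparse structure**: Reductions visit the stored elements and account for the unstored ones through the fill value (usually 0), so their cost scales with `nnz` rather than with the full array size
- **Axis selection**: Depending on the axes, the stored coordinates may first need to be reordered, so the cost of a reduction can differ between axes
- **Memory usage**: Reductions typically produce denser results than their inputs, since an output element is usually nonzero as soon as any element contributing to it is stored
- **Keepdims**: Keeping reduced axes as length-1 dimensions lets the result broadcast directly against the original array

### Optimization Tips

- Reduce the sparse array directly, over only the axes you need, instead of densifying it with `todense()` first
- Use `keepdims=True` when the result will be broadcast against the original array (for example, to normalize rows)
- NaN-aware functions cost an extra pass to mask or replace NaN values, but handle missing data correctly
- Boolean reductions (`any`, `all`) follow the same stored-elements approach as the other reductions, so their cost also scales with `nnz`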