# Reduction Functions

Statistical and aggregation functions that reduce arrays along a specified axis. They handle NaNs in optimized code, cover the common reductions (sum, mean, standard deviation, variance, minimum, maximum, and median), and are typically faster than the equivalent NumPy functions.

## Capabilities

### Sum Functions

Compute sums of array elements, ignoring NaNs, over the whole array or along one axis.

```python { .api }
def nansum(a, axis=None):
    """
    Sum of array elements over given axis, ignoring NaNs.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to sum

    Returns:
    ndarray or scalar, sum of array elements
    """
```
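To make "ignoring NaNs" concrete, here is a minimal pure-Python model of the reduction for a one-dimensional input. It is only a sketch of the semantics, not Bottleneck's implementation (which works on arrays in compiled code), and `nansum_model` is a hypothetical name used purely for illustration.

```python
import math


def nansum_model(values):
    """Illustrative model (hypothetical helper): add the elements, skipping NaNs."""
    total = 0.0
    for x in values:
        if not math.isnan(x):
            total += x
    return total


print(nansum_model([1.0, float("nan"), 2.5]))      # 3.5
print(nansum_model([float("nan"), float("nan")]))  # 0.0, nothing left to add
```

In this model NaNs contribute nothing to the total, so an input made up entirely of NaNs sums to 0.0.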

### Mean Functions

Calculate arithmetic means with NaN-aware implementations.

```python { .api }
def nanmean(a, axis=None):
    """
    Compute arithmetic mean along specified axis, ignoring NaNs.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to compute mean

    Returns:
    ndarray or scalar, arithmetic mean
    """
```

### Standard Deviation and Variance

Measure dispersion while ignoring NaNs. The `ddof` (delta degrees of freedom) argument sets the divisor to N - ddof, where N is the number of non-NaN elements.

```python { .api }
def nanstd(a, axis=None, ddof=0):
    """
    Compute standard deviation along specified axis, ignoring NaNs.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to compute std
    - ddof: int, delta degrees of freedom (default 0)

    Returns:
    ndarray or scalar, standard deviation
    """

def nanvar(a, axis=None, ddof=0):
    """
    Compute variance along specified axis, ignoring NaNs.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to compute variance
    - ddof: int, delta degrees of freedom (default 0)

    Returns:
    ndarray or scalar, variance
    """
```
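The effect of `ddof` is easiest to see in a small reference computation. The sketch below models the calculation for a one-dimensional input in pure Python; `nanvar_model` and `nanstd_model` are hypothetical names, and returning NaN when the divisor is not positive is an assumption of this sketch rather than documented library behavior.

```python
import math


def nanvar_model(values, ddof=0):
    """Illustrative model: variance of the non-NaN elements, divisor N - ddof."""
    kept = [x for x in values if not math.isnan(x)]
    n = len(kept)
    if n - ddof <= 0:
        # Assumption of this sketch: report NaN when the divisor is not positive.
        return float("nan")
    mean = sum(kept) / n
    return sum((x - mean) ** 2 for x in kept) / (n - ddof)


def nanstd_model(values, ddof=0):
    """Illustrative model: standard deviation as the square root of the variance."""
    return math.sqrt(nanvar_model(values, ddof))


sample = [1.0, 2.0, float("nan"), 4.0]
print(nanvar_model(sample))          # population variance, divisor 3
print(nanvar_model(sample, ddof=1))  # sample variance, divisor 2
```

With `ddof=0` the result describes the data as a complete population; `ddof=1` gives the usual unbiased sample estimate.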

### Minimum and Maximum Functions

Find minimum and maximum values, or the indices where they occur, while ignoring NaNs.

```python { .api }
def nanmin(a, axis=None):
    """
    Minimum values along axis, ignoring NaNs.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to find minimum

    Returns:
    ndarray or scalar, minimum values
    """

def nanmax(a, axis=None):
    """
    Maximum values along axis, ignoring NaNs.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to find maximum

    Returns:
    ndarray or scalar, maximum values
    """

def nanargmin(a, axis=None):
    """
    Indices of minimum values along axis, ignoring NaNs.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to find indices (None indexes the flattened array)

    Returns:
    ndarray or scalar, indices of minimum values
    """

def nanargmax(a, axis=None):
    """
    Indices of maximum values along axis, ignoring NaNs.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to find indices (None indexes the flattened array)

    Returns:
    ndarray or scalar, indices of maximum values
    """
```

### Median Functions

Compute medians, a robust measure of central tendency. `nanmedian` ignores NaNs.

```python { .api }
def median(a, axis=None):
    """
    Compute median along specified axis.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to compute median

    Returns:
    ndarray or scalar, median values
    """

def nanmedian(a, axis=None):
    """
    Compute median along specified axis, ignoring NaNs.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to compute median

    Returns:
    ndarray or scalar, median values
    """
```
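The two functions differ only in how they treat NaNs. The sketch below models that difference for a one-dimensional input using the standard library. It assumes that `median` propagates NaN the way `np.median` does, and `median_model` and `nanmedian_model` are hypothetical names, not part of the library.

```python
import math
import statistics


def median_model(values):
    """Illustrative model: median that yields NaN if any element is NaN (assumed)."""
    if any(math.isnan(x) for x in values):
        return float("nan")
    return statistics.median(values)


def nanmedian_model(values):
    """Illustrative model: median of the non-NaN elements only."""
    kept = [x for x in values if not math.isnan(x)]
    return statistics.median(kept) if kept else float("nan")


data = [3.0, float("nan"), 1.0, 2.0]
print(median_model(data))     # nan
print(nanmedian_model(data))  # 2.0
```

When missing values are expected in the input, `nanmedian` is usually the function to reach for.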

### Utility Functions

Sum of squares and NaN-presence tests.

```python { .api }
def ss(a, axis=None):
    """
    Sum of squares of array elements.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to sum squares

    Returns:
    ndarray or scalar, sum of squares
    """

def anynan(a, axis=None):
    """
    Test whether any array element along axis is NaN.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to test

    Returns:
    ndarray or bool, True if any element is NaN
    """

def allnan(a, axis=None):
    """
    Test whether all array elements along axis are NaN.

    Parameters:
    - a: array_like, input array
    - axis: None or int, axis along which to test

    Returns:
    ndarray or bool, True if all elements are NaN
    """
```
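As a rough guide to what these three reductions compute, the following pure-Python sketch models them for a one-dimensional input. The `*_model` names are hypothetical and exist only for this illustration; the library functions themselves operate on arrays.

```python
import math


def ss_model(values):
    """Illustrative model: sum of the squared elements."""
    return sum(x * x for x in values)


def anynan_model(values):
    """Illustrative model: True if at least one element is NaN."""
    return any(math.isnan(x) for x in values)


def allnan_model(values):
    """Illustrative model: True if every element is NaN."""
    return all(math.isnan(x) for x in values)


print(ss_model([1.0, 2.0, 3.0]))            # 14.0
print(anynan_model([1.0, float("nan")]))    # True
print(allnan_model([1.0, float("nan")]))    # False
```

Note that the sum of squares in this model does not skip NaNs, so a single NaN in the input makes the result NaN.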

## Usage Examples

### Basic Statistical Analysis

```python
import bottleneck as bn
import numpy as np

# Create data with missing values
data = np.array([[1.0, 2.0, np.nan],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])

# Compute statistics ignoring NaN
mean_val = bn.nanmean(data)            # Overall mean: 37 / 7 ≈ 5.2857
row_means = bn.nanmean(data, axis=1)   # Per-row means: [1.5, 5.0, 8.0]
col_means = bn.nanmean(data, axis=0)   # Per-column means: [4.0, 5.0, 7.5]

# Find extremes with their locations
min_val = bn.nanmin(data)              # 1.0
min_idx = bn.nanargmin(data)           # 0 (flattened index)
max_val = bn.nanmax(data)              # 9.0
max_idx = bn.nanargmax(data)           # 8 (flattened index)
```
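With `axis=None`, `nanargmin` and `nanargmax` return positions in the flattened array. For a row-major (C-ordered) two-dimensional array, a flat index maps back to a (row, column) pair through integer division by the number of columns. The sketch below shows the arithmetic with the standard library only; `flat_to_row_col` is a hypothetical helper name. For NumPy arrays, `np.unravel_index(index, data.shape)` performs the same conversion for any number of dimensions.

```python
def flat_to_row_col(flat_index, n_cols):
    """Convert a row-major flat index into a (row, column) pair (hypothetical helper)."""
    return divmod(flat_index, n_cols)


# The 3x3 example above has three columns.
print(flat_to_row_col(0, 3))  # (0, 0): position of the minimum, 1.0
print(flat_to_row_col(8, 3))  # (2, 2): position of the maximum, 9.0
```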

### Checking for Missing Data

```python
import bottleneck as bn
import numpy as np

# Sample data with various NaN patterns
complete_row = np.array([1, 2, 3, 4, 5])
partial_nans = np.array([1, np.nan, 3, np.nan, 5])
all_nans = np.array([np.nan, np.nan, np.nan])

# Test for any NaN presence
bn.anynan(complete_row)  # False
bn.anynan(partial_nans)  # True
bn.anynan(all_nans)      # True

# Test if all values are NaN
bn.allnan(complete_row)  # False
bn.allnan(partial_nans)  # False
bn.allnan(all_nans)      # True
```

### Robust Statistical Measures

```python
import bottleneck as bn
import numpy as np

# Time series data with an outlier (100) and missing values
timeseries = np.array([10, 12, np.nan, 15, 100, 11, 13, np.nan, 14])

# The median is far less affected by the outlier than the mean
median_val = bn.nanmedian(timeseries)  # 13.0 (robust central tendency)
mean_val = bn.nanmean(timeseries)      # 25.0 (affected by outlier 100)

# Dispersion measures (default ddof=0)
std_val = bn.nanstd(timeseries)  # Standard deviation: ≈ 30.66
var_val = bn.nanvar(timeseries)  # Variance: ≈ 940.0

# Population vs sample statistics (using ddof parameter)
pop_std = bn.nanstd(timeseries, ddof=0)     # Population standard deviation: ≈ 30.66
sample_std = bn.nanstd(timeseries, ddof=1)  # Sample standard deviation: ≈ 33.12
```