# Array Manipulation Functions

Utilities for array transformation, ranking, and data manipulation that preserve array structure while modifying values or order. They target data preprocessing and analysis workflows.

## Capabilities

### Value Replacement

In-place replacement of array values, with no new array allocated.

```python { .api }
def replace(a, old, new):
    """
    Replace values in array in-place.

    Replaces all occurrences of 'old' value with 'new' value in array 'a'.
    Supports NaN replacement and handles type casting for integer arrays.

    Parameters:
    - a: numpy.ndarray, input array to modify (modified in-place)
    - old: scalar, value to replace (can be NaN for float arrays)
    - new: scalar, replacement value

    Returns:
    None (array is modified in-place)

    Raises:
    TypeError: if 'a' is not a numpy array
    ValueError: if type casting is not safe for integer arrays
    """
```
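
The NaN case is worth spelling out: NaN never compares equal to itself, so a plain equality test cannot find it. The pure-Python sketch below illustrates the matching rule that a NaN-aware replacement needs. The helper `replace_in_place` is hypothetical, works on a list rather than an array, and only mirrors the semantics described above; it is not Bottleneck's implementation.

```python
import math


def replace_in_place(values, old, new):
    """Replace every occurrence of `old` in the list `values`, in place.

    Hypothetical helper for illustration only.
    """
    old_is_nan = isinstance(old, float) and math.isnan(old)
    for i, v in enumerate(values):
        if old_is_nan:
            # NaN != NaN, so equality cannot find it; test explicitly.
            match = isinstance(v, float) and math.isnan(v)
        else:
            match = v == old
        if match:
            values[i] = new
    # Falls through and returns None, like the in-place API above.


data = [1.0, float("nan"), 3.0, float("nan")]
replace_in_place(data, float("nan"), 0.0)
print(data)  # [1.0, 0.0, 3.0, 0.0]
```

Returning `None` makes the mutation explicit: the caller keeps using the original object, which is the convention the signature above documents.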

### Ranking Functions

Assign ranks to array elements with support for ties and missing values.

```python { .api }
def rankdata(a, axis=None):
    """
    Assign ranks to data, dealing with ties appropriately.

    Returns the ranks of the elements in the array. Ranks begin at 1.
    Ties are resolved by averaging the ranks of tied elements.

    Parameters:
    - a: array_like, input array to rank
    - axis: None or int, axis along which to rank (None for flattened array)

    Returns:
    ndarray, array of ranks (float64 dtype)
    """

def nanrankdata(a, axis=None):
    """
    Assign ranks to data, ignoring NaN values.

    Similar to rankdata but ignores NaN values in the ranking process.
    NaN values in the output array correspond to NaN values in the input.

    Parameters:
    - a: array_like, input array to rank
    - axis: None or int, axis along which to rank (None for flattened array)

    Returns:
    ndarray, array of ranks with NaN preserved (float64 dtype)
    """
```
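
To make the tie-averaging rule concrete, here is a plain-Python reference sketch of the ranking described above. The helper name `average_ranks` and its `ignore_nan` flag are hypothetical and exist only for illustration; this is a one-dimensional sketch of the semantics, not the library's implementation.

```python
import math


def average_ranks(values, ignore_nan=False):
    """Return 1-based ranks; tied values share the mean of their ranks.

    Hypothetical reference helper. With ignore_nan=True, NaN inputs are
    left out of the ranking and produce NaN in the output. With
    ignore_nan=False the input is assumed to contain no NaN.
    """
    ranks = [math.nan] * len(values)
    # Indices of the values that take part in the ranking, sorted by value.
    order = sorted(
        (i for i, v in enumerate(values)
         if not (ignore_nan and math.isnan(v))),
        key=lambda i: values[i],
    )
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with order[start].
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; average them.
        mean_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


print(average_ranks([85, 92, 78, 92, 88]))
# [2.0, 4.5, 1.0, 4.5, 3.0]
print(average_ranks([85.0, math.nan, 78.0, 92.0, 88.0], ignore_nan=True))
# [2.0, nan, 1.0, 4.0, 3.0]
```

The two 92s occupy sorted positions 4 and 5, so each receives (4 + 5) / 2 = 4.5. In the NaN-aware call, the NaN entry is skipped and the remaining four values are ranked 1 through 4.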

### Partitioning Functions

Partial sorting operations for efficient selection of order statistics.

```python { .api }
def partition(a, kth, axis=-1):
    """
    Partial sort array along given axis.

    Rearranges array elements such that the k-th element is in its final
    sorted position. Elements before it are less than or equal to it and
    elements after it are greater than or equal to it; neither side is sorted.
    The call signature follows numpy.partition.

    Parameters:
    - a: array_like, input array
    - kth: int, index of the element to place in its sorted position
    - axis: int, axis along which to partition (default: -1)

    Returns:
    ndarray, partitioned array
    """

def argpartition(a, kth, axis=-1):
    """
    Indices that would partition array along given axis.

    Returns indices that would partition the array, similar to partition
    but returning indices rather than the partitioned array.
    The call signature follows numpy.argpartition.

    Parameters:
    - a: array_like, input array
    - kth: int, index of the element to place in its sorted position
    - axis: int, axis along which to find partition indices (default: -1)

    Returns:
    ndarray, indices that would partition the array
    """
```
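
The partition guarantee is weaker than a full sort, which is where the speed comes from. As a minimal sketch of that guarantee, the quickselect-style helper below (hypothetical name `partition_sketch`, plain Python lists, one dimension only) places the k-th smallest element at index `kth` without sorting either side. It illustrates the contract only and makes no claim about how the library does it.

```python
import random


def partition_sketch(values, kth):
    """Return a copy of `values` with the kth smallest element at index kth.

    Hypothetical quickselect-style helper: everything before index kth
    is <= that element and everything after is >= it, but neither side
    is sorted.
    """
    out = list(values)
    lo, hi = 0, len(out) - 1
    while lo < hi:
        pivot = out[random.randint(lo, hi)]
        i, j = lo, hi
        # Hoare-style pass: move small values left and large values right.
        while i <= j:
            while out[i] < pivot:
                i += 1
            while out[j] > pivot:
                j -= 1
            if i <= j:
                out[i], out[j] = out[j], out[i]
                i += 1
                j -= 1
        # Continue only in the part that still contains index kth.
        if kth <= j:
            hi = j
        elif kth >= i:
            lo = i
        else:
            break
    return out


data = [7, 2, 9, 4, 1, 8, 3]
k = 3
result = partition_sketch(data, k)
print(result[k])  # 4, the value a full sort would place at index 3
```

Only `result[k]` is deterministic here. The order of the elements on either side depends on the random pivots, which is why code that consumes a partitioned array should never rely on those sides being sorted.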

### Forward Fill Function

Propagate valid values forward to fill missing data gaps.

```python { .api }
def push(a, n=None, axis=-1):
    """
    Fill NaN values by pushing forward the last valid value.

    Forward-fills NaN values with the most recent non-NaN value along the
    specified axis. Optionally limits the number of consecutive fills.

    Parameters:
    - a: array_like, input array
    - n: int or None, maximum number of consecutive NaN values to fill
      (None for unlimited filling, default: None)
    - axis: int, axis along which to push values (default: -1)

    Returns:
    ndarray, array with NaN values forward-filled
    """
```
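
The interaction between the fill and the `n` limit is easiest to see in a small reference loop. The sketch below is a one-dimensional, plain-Python illustration of the behavior described above; the helper name `forward_fill` is hypothetical and the code is not taken from the library.

```python
import math


def forward_fill(values, n=None):
    """Return a copy of `values` with NaNs replaced by the last valid value.

    Hypothetical 1-D helper. `n` caps how many consecutive NaNs are
    filled after each valid value; None means no cap.
    """
    out = list(values)
    last_valid = math.nan  # Leading NaNs have nothing to copy and stay NaN
    gap = 0  # NaNs seen since the last valid value
    for i, v in enumerate(out):
        if math.isnan(v):
            gap += 1
            if n is None or gap <= n:
                out[i] = last_valid
        else:
            last_valid = v
            gap = 0
    return out


timeseries = [1.0, 2.0, math.nan, math.nan, 5.0, math.nan, 7.0]
print(forward_fill(timeseries))       # [1.0, 2.0, 2.0, 2.0, 5.0, 5.0, 7.0]
print(forward_fill(timeseries, n=1))  # [1.0, 2.0, 2.0, nan, 5.0, 5.0, 7.0]
```

With `n=1`, the second NaN in the first gap is two positions away from the last valid value, so it is left as NaN, while the single-NaN gap before 7.0 is filled.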

## Usage Examples

### Data Cleaning and Preprocessing

```python
import bottleneck as bn
import numpy as np

# Replace missing value indicators
data = np.array([1.0, -999.0, 3.0, -999.0, 5.0])
bn.replace(data, -999.0, np.nan)  # In-place replacement
print("After replacement:", data)  # [1.0, nan, 3.0, nan, 5.0]

# Replace NaN values with zero
data_with_nans = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
bn.replace(data_with_nans, np.nan, 0.0)
print("NaNs replaced:", data_with_nans)  # [1.0, 0.0, 3.0, 0.0, 5.0]

# Handle integer arrays (requires compatible types)
int_data = np.array([1, -1, 3, -1, 5])
bn.replace(int_data, -1, 0)  # Replace -1 with 0
print("Integer replacement:", int_data)  # [1, 0, 3, 0, 5]
```

### Ranking and Percentile Analysis

```python
import bottleneck as bn
import numpy as np

# Basic ranking
scores = np.array([85, 92, 78, 92, 88])
ranks = bn.rankdata(scores)
print("Scores:", scores)  # [85, 92, 78, 92, 88]
print("Ranks:", ranks)    # [2.0, 4.5, 1.0, 4.5, 3.0]

# Ranking with missing values
scores_with_nan = np.array([85, np.nan, 78, 92, 88])
nan_ranks = bn.nanrankdata(scores_with_nan)
print("Scores with NaN:", scores_with_nan)
print("NaN-aware ranks:", nan_ranks)  # [2.0, nan, 1.0, 4.0, 3.0]

# Multi-dimensional ranking
matrix = np.array([[3, 1, 4],
                   [1, 5, 9],
                   [2, 6, 5]])

# Rank along rows (axis=1)
row_ranks = bn.rankdata(matrix, axis=1)
print("Row-wise ranks:")
print(row_ranks)

# Rank entire array (flattened)
flat_ranks = bn.rankdata(matrix, axis=None)
print("Flattened ranks:", flat_ranks)
```

### Forward Filling Time Series

```python
import bottleneck as bn
import numpy as np

# Time series with missing values
timeseries = np.array([1.0, 2.0, np.nan, np.nan, 5.0, np.nan, 7.0])

# Unlimited forward fill
filled_unlimited = bn.push(timeseries.copy())
print("Original:  ", timeseries)
print("Unlimited: ", filled_unlimited)  # [1.0, 2.0, 2.0, 2.0, 5.0, 5.0, 7.0]

# Limited forward fill (max 1 consecutive fill)
filled_limited = bn.push(timeseries.copy(), n=1)
print("Limited(1):", filled_limited)  # [1.0, 2.0, 2.0, nan, 5.0, 5.0, 7.0]

# Multi-dimensional forward fill
matrix_ts = np.array([[1.0, np.nan, 3.0],
                      [np.nan, 2.0, np.nan],
                      [4.0, np.nan, np.nan]])

# Fill along columns (axis=0)
filled_cols = bn.push(matrix_ts.copy(), axis=0)
print("Original matrix:")
print(matrix_ts)
print("Column-wise filled:")
print(filled_cols)

# Fill along rows (axis=1)
filled_rows = bn.push(matrix_ts.copy(), axis=1)
print("Row-wise filled:")
print(filled_rows)
```

### Efficient Selection with Partitioning

```python
import bottleneck as bn
import numpy as np

# Large array where we need the top-k elements without a full sort
large_array = np.random.randn(10000)
n = len(large_array)

# Find the 10 largest elements using partition (cheaper than a full sort)
k = 10
# Use the non-negative index n - k so the call does not depend on
# negative-kth support; the k largest values land in the last k slots
partitioned = bn.partition(large_array, n - k)
top_10 = partitioned[-k:]  # The 10 largest values, in no particular order

# Get indices of top 10 elements
top_10_indices = bn.argpartition(large_array, n - k)[-k:]
top_10_values = large_array[top_10_indices]

print("Top 10 values:", top_10_values)
print("Their indices:", top_10_indices)

# Middle element via partition. For even n this is the upper of the two
# middle values; the conventional median averages both.
median_idx = n // 2
partitioned_for_median = bn.partition(large_array.copy(), median_idx)
middle_value = partitioned_for_median[median_idx]
print(f"Middle value (upper median for even n): {middle_value}")
```

### Ranking for Data Analysis

```python
import bottleneck as bn
import numpy as np

# Student scores across multiple subjects
students = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
math_scores = np.array([85, 92, 78, 96, 88])
science_scores = np.array([90, 85, 92, 88, 95])

# Convert scores to ranks (rank 1 = lowest score, higher score = higher rank)
math_ranks = bn.rankdata(math_scores)
science_ranks = bn.rankdata(science_scores)

# Overall ranking based on each student's mean score
combined_scores = np.column_stack([math_scores, science_scores])
overall_ranks = bn.rankdata(combined_scores.mean(axis=1))

print("Student Rankings:")
for i, student in enumerate(students):
    print(f"{student}: Math={math_ranks[i]:.1f}, Science={science_ranks[i]:.1f}, Overall={overall_ranks[i]:.1f}")

# Map ranks onto a 0-100 percentile scale (lowest rank -> 0, highest -> 100)
percentiles = ((math_ranks - 1) / (len(math_ranks) - 1)) * 100
print("\nMath Score Percentiles:")
for i, student in enumerate(students):
    print(f"{student}: {percentiles[i]:.1f}th percentile")
```

## Performance Notes

Array manipulation functions provide the following performance benefits:

- **replace()**: Operates in place, so no output array is allocated
- **rankdata/nanrankdata**: 2x to 50x faster than equivalent SciPy functions; the actual speedup depends on array size, dtype, and axis
- **partition/argpartition**: Follow the calling convention of `numpy.partition` and `numpy.argpartition`, included for API completeness
- **push()**: Optimized forward-fill algorithm, faster than pandas equivalents; benchmark on representative data to quantify the gain

These functions are optimized for:
- Large arrays with frequent manipulation operations
- Time series data preprocessing pipelines
- Statistical analysis workflows requiring ranking operations
- Memory-constrained environments where in-place operations are preferred