Tessl Tile for pypi/bottleneck@1.5.0

# Array Manipulation Functions

Utilities for transforming, ranking, and rearranging array data. These functions keep the array's shape while changing its values or element order, which makes them useful in data preprocessing and analysis workflows.

## Capabilities

### Value Replacement

In-place replacement of one scalar value with another throughout an array.

```python { .api }
def replace(a, old, new):
    """
    Replace values in array in-place.

    Replaces all occurrences of 'old' value with 'new' value in array 'a'.
    Supports NaN replacement and handles type casting for integer arrays.

    Parameters:
    - a: numpy.ndarray, input array to modify (modified in-place)
    - old: scalar, value to replace (can be NaN for float arrays)
    - new: scalar, replacement value

    Returns:
    None (array is modified in-place)

    Raises:
    TypeError: if 'a' is not a numpy array
    ValueError: if type casting is not safe for integer arrays
    """
```
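
The replacement semantics above, including the special handling that NaN needs, can be sketched in plain NumPy. The helper name `replace_reference` is hypothetical and is not part of Bottleneck; it only illustrates the behaviour described in the docstring and omits the integer casting checks.

```python
import numpy as np

def replace_reference(a, old, new):
    """Hypothetical NumPy-only stand-in for the in-place replacement described above."""
    if not isinstance(a, np.ndarray):
        raise TypeError("`a` must be a numpy array")
    if isinstance(old, float) and np.isnan(old):
        # NaN never compares equal to itself, so an isnan mask is needed.
        if np.issubdtype(a.dtype, np.floating):
            a[np.isnan(a)] = new
        # Integer arrays cannot hold NaN, so there is nothing to replace.
    else:
        a[a == old] = new

values = np.array([1.0, np.nan, 3.0, -999.0])
replace_reference(values, np.nan, 0.0)     # NaN -> 0.0
replace_reference(values, -999.0, -1.0)    # sentinel -> -1.0
print(values.tolist())  # [1.0, 0.0, 3.0, -1.0]
```

Because the array is modified in place, pass a copy if the original values are still needed.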

### Ranking Functions

Assign ranks to array elements with support for ties and missing values.

```python { .api }
def rankdata(a, axis=None):
    """
    Assign ranks to data, dealing with ties appropriately.

    Returns the ranks of the elements in the array. Ranks begin at 1.
    Ties are resolved by averaging the ranks of tied elements.

    Parameters:
    - a: array_like, input array to rank
    - axis: None or int, axis along which to rank (None for flattened array)

    Returns:
    ndarray, array of ranks (float64 dtype)
    """

def nanrankdata(a, axis=None):
    """
    Assign ranks to data, ignoring NaN values.

    Similar to rankdata but ignores NaN values in the ranking process.
    NaN values in the output array correspond to NaN values in the input.

    Parameters:
    - a: array_like, input array to rank
    - axis: None or int, axis along which to rank (None for flattened array)

    Returns:
    ndarray, array of ranks with NaN preserved (float64 dtype)
    """
```
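
To make the tie-averaging rule concrete, here is a minimal NumPy-only sketch for 1-D input. The helper `average_ranks_1d` is hypothetical and is not Bottleneck's implementation; it assumes the behaviour stated in the docstrings above (ranks start at 1, ties share their average rank, NaN stays NaN).

```python
import numpy as np

def average_ranks_1d(values):
    """Hypothetical helper: 1-based ranks, ties share their average rank, NaN stays NaN."""
    values = np.asarray(values, dtype=np.float64)
    ranks = np.full(values.shape, np.nan)
    valid = ~np.isnan(values)
    # np.unique gives, for the sorted distinct values, the position of each
    # input in that sorted list and how often each distinct value occurs.
    _, inverse, counts = np.unique(values[valid], return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)          # highest ordinal rank in each tie group
    first_rank = last_rank - counts + 1    # lowest ordinal rank in each tie group
    ranks[valid] = ((first_rank + last_rank) / 2.0)[inverse]
    return ranks

print(average_ranks_1d([85, 92, 78, 92, 88]).tolist())      # [2.0, 4.5, 1.0, 4.5, 3.0]
print(average_ranks_1d([85, np.nan, 78, 92, 88]).tolist())  # [2.0, nan, 1.0, 4.0, 3.0]
```

The two tied scores of 92 would occupy ordinal ranks 4 and 5, so each receives 4.5.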

### Partitioning Functions

Partial sorting operations for efficient selection of order statistics.

```python { .api }
def partition(a, kth, axis=-1):
    """
    Partition array elements along given axis.

    Rearranges elements so that the element at index kth is in its sorted
    position. Elements before it are less than or equal to it and elements
    after it are greater than or equal to it; neither side is otherwise
    ordered. Bottleneck provides its own implementation, analogous to
    numpy.partition.

    Parameters:
    - a: array_like, input array
    - kth: int, index that defines the partition
    - axis: int, axis along which to partition (default: -1)

    Returns:
    ndarray, partitioned copy of the input array
    """

def argpartition(a, kth, axis=-1):
    """
    Indices that would partition array along given axis.

    Returns indices that would partition the array, similar to partition
    but returning indices rather than the partitioned array.
    Bottleneck provides its own implementation, analogous to
    numpy.argpartition.

    Parameters:
    - a: array_like, input array
    - kth: int, index that defines the partition
    - axis: int, axis along which to find partition indices (default: -1)

    Returns:
    ndarray, indices that would partition the array
    """
```
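
The partition guarantee described in the docstring can be checked directly. The sketch below uses `np.partition` as a stand-in, on the assumption that `bn.partition` satisfies the same invariant; `is_partitioned_at` is a hypothetical helper written for this illustration.

```python
import numpy as np

def is_partitioned_at(b, kth):
    """Check the partition invariant for a 1-D array at index kth."""
    pivot = b[kth]
    left_ok = np.all(b[:kth] <= pivot)        # nothing larger before the pivot
    right_ok = np.all(b[kth + 1:] >= pivot)   # nothing smaller after the pivot
    return bool(left_ok and right_ok)

rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=20)
kth = 5

# Stand-in call; swap in bn.partition(data, kth) to check Bottleneck itself.
result = np.partition(data, kth)

print(is_partitioned_at(result, kth))             # True
print(bool(result[kth] == np.sort(data)[kth]))    # True
```

Only the element at `kth` is guaranteed to be in its sorted position; the elements on either side remain unordered.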

### Forward Fill Function

Propagate valid values forward to fill missing data gaps.

```python { .api }
def push(a, n=None, axis=-1):
    """
    Fill NaN values by pushing forward the last valid value.

    Forward-fills NaN values with the most recent non-NaN value along the
    specified axis. Optionally limits the number of consecutive fills.

    Parameters:
    - a: array_like, input array
    - n: int or None, maximum number of consecutive NaN values to fill
         (None for unlimited filling, default: None)
    - axis: int, axis along which to push values (default: -1)

    Returns:
    ndarray, array with NaN values forward-filled
    """
```
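
The effect of the `n` limit is easiest to see in a small reference implementation. The following NumPy-only sketch handles 1-D input; `push_reference_1d` is a hypothetical helper, not Bottleneck's code, and it assumes the semantics described in the docstring above.

```python
import numpy as np

def push_reference_1d(a, n=None):
    """Hypothetical NumPy-only forward fill for 1-D float input; returns a new array."""
    a = np.asarray(a, dtype=np.float64)
    positions = np.arange(a.size)
    # Index of the most recent non-NaN element at or before each position (-1 if none yet).
    last_valid = np.maximum.accumulate(np.where(np.isnan(a), -1, positions))
    out = a.copy()
    fillable = last_valid >= 0
    if n is not None:
        # Only fill when the last valid value is at most n positions back.
        fillable &= (positions - last_valid) <= n
    out[fillable] = a[last_valid[fillable]]
    return out

ts = [1.0, 2.0, np.nan, np.nan, 5.0, np.nan, 7.0]
print(push_reference_1d(ts).tolist())       # [1.0, 2.0, 2.0, 2.0, 5.0, 5.0, 7.0]
print(push_reference_1d(ts, n=1).tolist())  # [1.0, 2.0, 2.0, nan, 5.0, 5.0, 7.0]
```

With `n=1`, the second NaN in the run is two positions away from the last valid value, so it is left unfilled. Leading NaN values are never filled because there is nothing to push forward.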

## Usage Examples

### Data Cleaning and Preprocessing

```python
import bottleneck as bn
import numpy as np

# Replace missing value indicators
data = np.array([1.0, -999.0, 3.0, -999.0, 5.0])
bn.replace(data, -999.0, np.nan)  # In-place replacement
print("After replacement:", data)  # [1.0, nan, 3.0, nan, 5.0]

# Replace NaN values with zero
data_with_nans = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
bn.replace(data_with_nans, np.nan, 0.0)
print("NaNs replaced:", data_with_nans)  # [1.0, 0.0, 3.0, 0.0, 5.0]

# Handle integer arrays (requires compatible types)
int_data = np.array([1, -1, 3, -1, 5])
bn.replace(int_data, -1, 0)  # Replace -1 with 0
print("Integer replacement:", int_data)  # [1, 0, 3, 0, 5]
```

### Ranking and Percentile Analysis

```python
import bottleneck as bn
import numpy as np

# Basic ranking
scores = np.array([85, 92, 78, 92, 88])
ranks = bn.rankdata(scores)
print("Scores:", scores)         # [85, 92, 78, 92, 88]
print("Ranks:", ranks)           # [2.0, 4.5, 1.0, 4.5, 3.0]

# Ranking with missing values
scores_with_nan = np.array([85, np.nan, 78, 92, 88])
nan_ranks = bn.nanrankdata(scores_with_nan)
print("Scores with NaN:", scores_with_nan)
print("NaN-aware ranks:", nan_ranks)  # [2.0, nan, 1.0, 4.0, 3.0]

# Multi-dimensional ranking
matrix = np.array([[3, 1, 4],
                   [1, 5, 9],
                   [2, 6, 5]])

# Rank along rows (axis=1)
row_ranks = bn.rankdata(matrix, axis=1)
print("Row-wise ranks:")
print(row_ranks)

# Rank entire array (flattened)
flat_ranks = bn.rankdata(matrix, axis=None)
print("Flattened ranks:", flat_ranks)
```

### Forward Filling Time Series

```python
import bottleneck as bn
import numpy as np

# Time series with missing values
timeseries = np.array([1.0, 2.0, np.nan, np.nan, 5.0, np.nan, 7.0])

# Unlimited forward fill
filled_unlimited = bn.push(timeseries.copy())
print("Original:  ", timeseries)
print("Unlimited: ", filled_unlimited)  # [1.0, 2.0, 2.0, 2.0, 5.0, 5.0, 7.0]

# Limited forward fill (max 1 consecutive fill)
filled_limited = bn.push(timeseries.copy(), n=1)
print("Limited(1):", filled_limited)    # [1.0, 2.0, 2.0, nan, 5.0, 5.0, 7.0]

# Multi-dimensional forward fill
matrix_ts = np.array([[1.0, np.nan, 3.0],
                      [np.nan, 2.0, np.nan],
                      [4.0, np.nan, np.nan]])

# Fill along columns (axis=0)
filled_cols = bn.push(matrix_ts.copy(), axis=0)
print("Original matrix:")
print(matrix_ts)
print("Column-wise filled:")
print(filled_cols)

# Fill along rows (axis=1)
filled_rows = bn.push(matrix_ts.copy(), axis=1)
print("Row-wise filled:")
print(filled_rows)
```

### Efficient Selection with Partitioning

```python
import bottleneck as bn
import numpy as np

# Large array where we need the top-k elements without a full sort
large_array = np.random.randn(10000)

# Find the 10 largest elements with a partial sort
k = 10
kth = large_array.size - k  # non-negative index where the top-k block starts
partitioned = bn.partition(large_array, kth)
top_10 = partitioned[kth:]  # the 10 largest values, in no particular order

# Get indices of the top 10 elements
top_10_indices = bn.argpartition(large_array, kth)[kth:]
top_10_values = large_array[top_10_indices]

print("Top 10 values:", top_10_values)
print("Their indices:", top_10_indices)

# Order statistic at the middle index. For even n this is the upper of the
# two middle values, not their average.
n = len(large_array)
median_idx = n // 2
partitioned_for_median = bn.partition(large_array.copy(), median_idx)
median_value = partitioned_for_median[median_idx]
print(f"Upper median value: {median_value}")
```
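
The top-k values selected by a partition come back in arbitrary order. If a ranked list is needed, only the k selected candidates have to be sorted. The sketch below uses `np.argpartition` as a stand-in (the assumption is that `bn.argpartition` can be swapped in with the same arguments); `top_k_sorted` is a hypothetical helper.

```python
import numpy as np

def top_k_sorted(values, k):
    """Hypothetical helper: indices and values of the k largest entries, largest first."""
    values = np.asarray(values)
    kth = values.size - k                               # non-negative partition index
    candidates = np.argpartition(values, kth)[kth:]     # k largest, arbitrary order
    order = np.argsort(values[candidates])[::-1]        # sort only the k candidates
    top_idx = candidates[order]
    return top_idx, values[top_idx]

idx, vals = top_k_sorted(np.array([5.0, 1.0, 9.0, 3.0, 7.0, 2.0]), k=3)
print(idx.tolist(), vals.tolist())  # [2, 4, 0] [9.0, 7.0, 5.0]
```

Sorting k candidates costs O(k log k), so the total stays close to the cost of the partition itself when k is much smaller than the array length.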

### Ranking for Data Analysis

```python
import bottleneck as bn
import numpy as np

# Student scores across multiple subjects
students = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
math_scores = np.array([85, 92, 78, 96, 88])
science_scores = np.array([90, 85, 92, 88, 95])

# Convert scores to ranks (rank 1 = lowest score, rank 5 = highest)
math_ranks = bn.rankdata(math_scores)
science_ranks = bn.rankdata(science_scores)

# Overall ranking from each student's mean score across subjects
combined_scores = np.column_stack([math_scores, science_scores])
overall_ranks = bn.rankdata(combined_scores.mean(axis=1))

print("Student Rankings:")
for i, student in enumerate(students):
    print(f"{student}: Math={math_ranks[i]:.1f}, Science={science_ranks[i]:.1f}, Overall={overall_ranks[i]:.1f}")

# Map ranks onto a 0-100 percentile scale (tied scores share a percentile)
percentiles = ((math_ranks - 1) / (len(math_ranks) - 1)) * 100
print("\nMath Score Percentiles:")
for i, student in enumerate(students):
    print(f"{student}: {percentiles[i]:.1f}th percentile")
```

## Performance Notes

How the functions in this group relate to performance:

- **replace()**: Works in place, so no new array is allocated
- **rankdata/nanrankdata**: 2x to 50x faster than the equivalent SciPy functions, depending on array size, dtype, and axis
- **partition/argpartition**: Partial sorts that avoid the cost of a full sort when only order statistics are needed
- **push()**: Optimized forward-fill algorithm; the gain over pandas-based filling depends on the workload

These functions are optimized for:
- Large arrays with frequent manipulation operations
- Time series data preprocessing pipelines
- Statistical analysis workflows requiring ranking operations
- Memory-constrained environments where in-place operations are preferred
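
Speedups like the ones listed above vary with array size, dtype, and hardware, so they are worth measuring on your own data. The following is a minimal timing sketch, not a rigorous benchmark; `best_time` is a hypothetical helper, and the Bottleneck call is only timed when the package is installed.

```python
import timeit
import numpy as np

def best_time(func, *args, repeat=5, number=10):
    """Best per-call wall time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number

rng = np.random.default_rng(0)
data = rng.standard_normal(100_000)
kth = data.size // 2

timings = {
    "np.sort": best_time(np.sort, data),
    "np.partition": best_time(np.partition, data, kth),
}

try:
    import bottleneck as bn  # optional: only timed when installed
    timings["bn.partition"] = best_time(bn.partition, data, kth)
except ImportError:
    pass

for name, seconds in timings.items():
    print(f"{name:>14}: {seconds * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes. Bottleneck also ships its own benchmark runner, `bn.bench()`, for a broader comparison.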
