# Array Manipulation Functions

Utilities for array transformation, ranking, and data manipulation operations that maintain array structure while modifying values or order. These functions provide specialized operations for data preprocessing and analysis workflows.

## Capabilities

### Value Replacement

In-place replacement of array values with optimized performance.

```python { .api }
def replace(a, old, new):
    """
    Replace values in array in-place.

    Replaces all occurrences of 'old' value with 'new' value in array 'a'.
    Supports NaN replacement and handles type casting for integer arrays.

    Parameters:
    - a: numpy.ndarray, input array to modify (modified in-place)
    - old: scalar, value to replace (can be NaN for float arrays)
    - new: scalar, replacement value

    Returns:
    None (array is modified in-place)

    Raises:
    TypeError: if 'a' is not a numpy array
    ValueError: if type casting is not safe for integer arrays
    """
```
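
NaN replacement needs special handling because NaN never compares equal to itself, so a plain equality test cannot find it. The following is a minimal plain-Python sketch of the replacement semantics described above, working on a list rather than a NumPy array. `replace_reference` is a hypothetical name used for illustration only; it is not Bottleneck's implementation.

```python
import math


def replace_reference(values, old, new):
    """Replace every occurrence of `old` with `new`, modifying `values` in place.

    Hypothetical reference helper mirroring the documented semantics on a
    plain list of floats. Returns None, as an in-place operation would.
    """
    old_is_nan = isinstance(old, float) and math.isnan(old)
    for i, value in enumerate(values):
        if old_is_nan:
            # NaN != NaN, so NaN has to be detected explicitly.
            match = isinstance(value, float) and math.isnan(value)
        else:
            match = value == old
        if match:
            values[i] = new


readings = [1.0, float("nan"), 3.0, float("nan"), 5.0]
result = replace_reference(readings, float("nan"), 0.0)
print(readings)  # [1.0, 0.0, 3.0, 0.0, 5.0]
print(result)    # None
```

The sketch leaves out the dtype checks that the docstring mentions for integer arrays; those apply only to real NumPy arrays.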

### Ranking Functions

Assign ranks to array elements with support for ties and missing values.

```python { .api }
def rankdata(a, axis=None):
    """
    Assign ranks to data, dealing with ties appropriately.

    Returns the ranks of the elements in the array. Ranks begin at 1.
    Ties are resolved by averaging the ranks of tied elements.

    Parameters:
    - a: array_like, input array to rank
    - axis: None or int, axis along which to rank (None for flattened array)

    Returns:
    ndarray, array of ranks (float64 dtype)
    """

def nanrankdata(a, axis=None):
    """
    Assign ranks to data, ignoring NaN values.

    Similar to rankdata but ignores NaN values in the ranking process.
    NaN values in the output array correspond to NaN values in the input.

    Parameters:
    - a: array_like, input array to rank
    - axis: None or int, axis along which to rank (None for flattened array)

    Returns:
    ndarray, array of ranks with NaN preserved (float64 dtype)
    """
```
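
To make the tie-averaging rule concrete, here is a minimal pure-Python sketch for a 1-D sequence. `average_ranks` is a hypothetical helper, not part of Bottleneck, and it treats NaN the way `nanrankdata` is described above: NaN entries are skipped when ranking and come back as NaN. For NaN-free input it follows the rule stated for `rankdata`.

```python
import math


def average_ranks(values):
    """Rank `values` starting at 1; tied values share the mean of their ranks.

    Hypothetical pure-Python reference. NaN entries are ignored while ranking
    and stay NaN in the result.
    """
    ranks = [math.nan] * len(values)
    # Positions of the non-NaN entries, ordered by value.
    order = sorted(
        (i for i, v in enumerate(values) if not math.isnan(v)),
        key=lambda i: values[i],
    )
    start = 0
    while start < len(order):
        # Extend `end` over the run of entries tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Ordinal ranks start+1 .. end+1 are averaged across the tie group.
        mean_rank = (start + 1 + end + 1) / 2
        for position in order[start:end + 1]:
            ranks[position] = mean_rank
        start = end + 1
    return ranks


print(average_ranks([85, 92, 78, 92, 88]))        # [2.0, 4.5, 1.0, 4.5, 3.0]
print(average_ranks([85, math.nan, 78, 92, 88]))  # [2.0, nan, 1.0, 4.0, 3.0]
```

In the first call the two scores of 92 would occupy ranks 4 and 5, so each receives 4.5.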

### Partitioning Functions

Partial sorting operations for efficient selection of order statistics.

```python { .api }
def partition(a, kth, axis=-1):
    """
    Partial sort array along given axis.

    Rearranges array elements such that the k-th element is in its final
    sorted position. Elements smaller than k-th are before it, larger after.
    Neither side is guaranteed to be sorted.
    Follows the calling convention of numpy.partition.

    Parameters:
    - a: array_like, input array
    - kth: int or sequence of ints, indices that define the partition
    - axis: int, axis along which to partition (default: -1)

    Returns:
    ndarray, partitioned array
    """

def argpartition(a, kth, axis=-1):
    """
    Indices that would partition array along given axis.

    Returns indices that would partition the array, similar to partition
    but returning indices rather than the partitioned array.
    Follows the calling convention of numpy.argpartition.

    Parameters:
    - a: array_like, input array
    - kth: int or sequence of ints, indices that define the partition
    - axis: int, axis along which to find partition indices (default: -1)

    Returns:
    ndarray, indices that would partition the array
    """
```
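
The partition guarantee is weaker than a full sort, which is easy to overlook. The sketch below is a small pure-Python checker for that guarantee on a 1-D sequence; `satisfies_partition` is a hypothetical name introduced here for illustration.

```python
def satisfies_partition(values, kth):
    """Return True if `values` meets the partition guarantee at index `kth`.

    Hypothetical checker: everything left of `kth` must be <= values[kth] and
    everything right of it must be >= values[kth]. Neither side has to be sorted.
    """
    pivot = values[kth]
    left_ok = all(v <= pivot for v in values[:kth])
    right_ok = all(v >= pivot for v in values[kth + 1:])
    return left_ok and right_ok


print(satisfies_partition([2, 1, 3, 5, 4], 2))  # True: 3 is in its sorted slot
print(satisfies_partition([3, 1, 2, 5, 4], 2))  # False: 3 sits left of the smaller 2
```

The first list passes even though `[2, 1]` and `[5, 4]` are unsorted, which is exactly what allows partitioning to do less work than sorting.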

### Forward Fill Function

Propagate valid values forward to fill missing data gaps.

```python { .api }
def push(a, n=None, axis=-1):
    """
    Fill NaN values by pushing forward the last valid value.

    Forward-fills NaN values with the most recent non-NaN value along the
    specified axis. Optionally limits the number of consecutive fills.

    Parameters:
    - a: array_like, input array
    - n: int or None, maximum number of consecutive NaN values to fill
      (None for unlimited filling, default: None)
    - axis: int, axis along which to push values (default: -1)

    Returns:
    ndarray, array with NaN values forward-filled
    """
```
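
The effect of the `n` limit is easiest to see in a reference implementation. What follows is a minimal pure-Python sketch of forward filling along a single 1-D sequence, assuming the semantics described in the docstring above. `push_reference` is a hypothetical helper and says nothing about how Bottleneck implements `push` internally.

```python
import math


def push_reference(values, n=None):
    """Forward-fill NaNs in a 1-D sequence and return a new list.

    Hypothetical pure-Python reference. `n` caps how many consecutive NaNs
    after a valid value are filled; None means no cap.
    """
    filled = []
    last_valid = math.nan  # leading NaNs have nothing to copy from
    gap = 0                # number of NaNs seen since the last valid value
    for value in values:
        if math.isnan(value):
            gap += 1
            within_limit = n is None or gap <= n
            filled.append(last_valid if within_limit else math.nan)
        else:
            last_valid = value
            gap = 0
            filled.append(value)
    return filled


series = [1.0, 2.0, math.nan, math.nan, 5.0, math.nan, 7.0]
print(push_reference(series))       # [1.0, 2.0, 2.0, 2.0, 5.0, 5.0, 7.0]
print(push_reference(series, n=1))  # [1.0, 2.0, 2.0, nan, 5.0, 5.0, 7.0]
```

With `n=1` only the first NaN of each gap is filled, so the second NaN after `2.0` stays missing.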

## Usage Examples

### Data Cleaning and Preprocessing

```python
import bottleneck as bn
import numpy as np

# Replace missing value indicators
data = np.array([1.0, -999.0, 3.0, -999.0, 5.0])
bn.replace(data, -999.0, np.nan)  # In-place replacement
print("After replacement:", data)  # [1.0, nan, 3.0, nan, 5.0]

# Replace NaN values with zero
data_with_nans = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
bn.replace(data_with_nans, np.nan, 0.0)
print("NaNs replaced:", data_with_nans)  # [1.0, 0.0, 3.0, 0.0, 5.0]

# Handle integer arrays (requires compatible types)
int_data = np.array([1, -1, 3, -1, 5])
bn.replace(int_data, -1, 0)  # Replace -1 with 0
print("Integer replacement:", int_data)  # [1, 0, 3, 0, 5]
```

### Ranking and Percentile Analysis

```python
import bottleneck as bn
import numpy as np

# Basic ranking
scores = np.array([85, 92, 78, 92, 88])
ranks = bn.rankdata(scores)
print("Scores:", scores)  # [85, 92, 78, 92, 88]
print("Ranks:", ranks)  # [2.0, 4.5, 1.0, 4.5, 3.0]

# Ranking with missing values
scores_with_nan = np.array([85, np.nan, 78, 92, 88])
nan_ranks = bn.nanrankdata(scores_with_nan)
print("Scores with NaN:", scores_with_nan)
print("NaN-aware ranks:", nan_ranks)  # [2.0, nan, 1.0, 4.0, 3.0]

# Multi-dimensional ranking
matrix = np.array([[3, 1, 4],
                   [1, 5, 9],
                   [2, 6, 5]])

# Rank along rows (axis=1)
row_ranks = bn.rankdata(matrix, axis=1)
print("Row-wise ranks:")
print(row_ranks)

# Rank entire array (flattened)
flat_ranks = bn.rankdata(matrix, axis=None)
print("Flattened ranks:", flat_ranks)
```

### Forward Filling Time Series

```python
import bottleneck as bn
import numpy as np

# Time series with missing values
timeseries = np.array([1.0, 2.0, np.nan, np.nan, 5.0, np.nan, 7.0])

# Unlimited forward fill
filled_unlimited = bn.push(timeseries.copy())
print("Original: ", timeseries)
print("Unlimited: ", filled_unlimited)  # [1.0, 2.0, 2.0, 2.0, 5.0, 5.0, 7.0]

# Limited forward fill (max 1 consecutive fill)
filled_limited = bn.push(timeseries.copy(), n=1)
print("Limited(1):", filled_limited)  # [1.0, 2.0, 2.0, nan, 5.0, 5.0, 7.0]

# Multi-dimensional forward fill
matrix_ts = np.array([[1.0, np.nan, 3.0],
                      [np.nan, 2.0, np.nan],
                      [4.0, np.nan, np.nan]])

# Fill along columns (axis=0)
filled_cols = bn.push(matrix_ts.copy(), axis=0)
print("Original matrix:")
print(matrix_ts)
print("Column-wise filled:")
print(filled_cols)

# Fill along rows (axis=1)
filled_rows = bn.push(matrix_ts.copy(), axis=1)
print("Row-wise filled:")
print(filled_rows)
```

### Efficient Selection with Partitioning

```python
import bottleneck as bn
import numpy as np

# Large array where we need to find top-k elements efficiently
large_array = np.random.randn(10000)

# Find the 10 largest elements using partition (avoids a full sort)
k = 10
# kth is given as a non-negative index, so this does not rely on
# negative-index support; the 10 largest values land in the last k slots
kth = large_array.size - k
partitioned = bn.partition(large_array, kth)
top_10 = partitioned[-k:]  # The 10 largest values, in no particular order

# Get indices of top 10 elements
top_10_indices = bn.argpartition(large_array, kth)[-k:]
top_10_values = large_array[top_10_indices]

print("Top 10 values:", top_10_values)
print("Their indices:", top_10_indices)

# For finding median efficiently
n = len(large_array)
median_idx = n // 2
partitioned_for_median = bn.partition(large_array.copy(), median_idx)
# For even n this is the upper of the two middle values, not their average
median_value = partitioned_for_median[median_idx]
print(f"Median value: {median_value}")
```

### Ranking for Data Analysis

```python
import bottleneck as bn
import numpy as np

# Student scores across multiple subjects
students = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
math_scores = np.array([85, 92, 78, 96, 88])
science_scores = np.array([90, 85, 92, 88, 95])

# Convert scores to ranks (rank 1 = lowest score, rank 5 = highest score)
math_ranks = bn.rankdata(math_scores)
science_ranks = bn.rankdata(science_scores)

# Overall ranking based on each student's mean score
combined_scores = np.column_stack([math_scores, science_scores])
overall_ranks = bn.rankdata(combined_scores.mean(axis=1))

print("Student Rankings:")
for i, student in enumerate(students):
    print(f"{student}: Math={math_ranks[i]:.1f}, Science={science_ranks[i]:.1f}, Overall={overall_ranks[i]:.1f}")

# Rescale ranks to a 0-100 percentile scale (lowest rank -> 0, highest -> 100)
percentiles = ((math_ranks - 1) / (len(math_ranks) - 1)) * 100
print("\nMath Score Percentiles:")
for i, student in enumerate(students):
    print(f"{student}: {percentiles[i]:.1f}th percentile")
```

## Performance Notes

Performance characteristics of the array manipulation functions:

- **replace()**: In-place operations avoid memory allocation overhead
- **rankdata/nanrankdata**: 2x to 50x faster than equivalent SciPy functions
- **partition/argpartition**: Same calling convention as the NumPy functions of the same name; a partial sort avoids the cost of a full sort when only an order statistic or the top-k elements are needed
- **push()**: Optimized forward-fill algorithm significantly faster than pandas equivalents

These functions are optimized for:

- Large arrays with frequent manipulation operations
- Time series data preprocessing pipelines
- Statistical analysis workflows requiring ranking operations
- Memory-constrained environments where in-place operations are preferred