# Data Preprocessing

Data transformation utilities including scaling, encoding, and array manipulation functions compatible with scikit-learn pipelines.

## Capabilities

### Mean Centering

Center each feature by subtracting its mean, so that every column of the transformed data has zero mean.

```python { .api }
class MeanCenterer:
    def __init__(self):
        """Mean centering transformer"""

    def fit(self, X, y=None):
        """Compute the mean to be used for centering"""

    def transform(self, X):
        """Center data around the mean"""

    def fit_transform(self, X, y=None):
        """Fit and transform data"""

    mean_: # Computed mean values
```
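The fit/transform split can be illustrated without the library. The following is a minimal NumPy-only sketch of column-wise mean centering; it is not mlxtend's implementation, and the variable names (`means`, `X_centered`) are chosen here purely for illustration.

```python
import numpy as np

# Small feature matrix: 3 samples, 2 features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# "fit" step: learn the per-column means.
means = X.mean(axis=0)

# "transform" step: subtract the learned means from every row
# (NumPy broadcasts the 1-D means across the rows of X).
X_centered = X - means
```

Keeping the learned means separate from the subtraction is what lets the same centering be applied later to new data, such as a test set.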

### Transaction Encoding

Encode transaction data for frequent pattern mining algorithms.

```python { .api }
class TransactionEncoder:
    def __init__(self):
        """Encode transaction data to binary matrix format"""

    def fit(self, X):
        """Learn the unique items in the transaction dataset"""

    def transform(self, X):
        """Transform transactions to binary matrix"""

    def fit_transform(self, X):
        """Fit and transform transactions"""

    columns_: # Column names (unique items)
```
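To make the "binary matrix format" concrete, here is a small sketch of the encoding written with plain Python and NumPy. It does not call `TransactionEncoder`; the sorted column order and the names `columns`, `col_index`, and `encoded` are assumptions of this sketch.

```python
import numpy as np

transactions = [['bread', 'milk'], ['bread', 'beer'], ['milk', 'beer']]

# "fit" step: collect the unique items and give each one a column position.
columns = sorted({item for t in transactions for item in t})
col_index = {item: j for j, item in enumerate(columns)}

# "transform" step: one row per transaction, True where the item occurs.
encoded = np.zeros((len(transactions), len(columns)), dtype=bool)
for i, t in enumerate(transactions):
    for item in t:
        encoded[i, col_index[item]] = True
```

Each row then marks which of the known items a transaction contains, which is the shape of input that frequent pattern mining algorithms typically expect.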

### Scaling Functions

Scaling and standardization utilities for feature normalization.

```python { .api }
def standardize(array, columns=None, ddof=0):
    """
    Z-score standardization of features.

    Parameters:
    - array: array-like, input data
    - columns: list, columns to standardize (all if None)
    - ddof: int, degrees of freedom for standard deviation

    Returns:
    - standardized_array: array-like, standardized data
    """

def minmax_scaling(array, columns=None, min_val=0, max_val=1):
    """
    Min-max feature scaling to specified range.

    Parameters:
    - array: array-like, input data
    - columns: list, columns to scale (all if None)
    - min_val: float, minimum value of scaled range
    - max_val: float, maximum value of scaled range

    Returns:
    - scaled_array: array-like, scaled data
    """
```
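The two formulas behind these functions are short enough to write out directly. The sketch below computes them with NumPy only, as an illustration of the arithmetic rather than a reproduction of the library code; `ddof`, `min_val`, and `max_val` mirror the parameter names above, and everything else is illustrative.

```python
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Z-score: subtract the column mean, divide by the column standard deviation.
# ddof=0 uses the population standard deviation, ddof=1 the sample one.
ddof = 0
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=ddof)

# Min-max: rescale each column linearly onto [min_val, max_val].
min_val, max_val = 0.0, 1.0
col_min, col_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - col_min) / (col_max - col_min) * (max_val - min_val) + min_val
```

Note that this sketch divides by zero for a constant column (zero standard deviation or zero range), so such columns would need separate handling.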

### Additional Transformers

Utility transformers for pipeline integration, plus helper functions for one-hot encoding labels and shuffling arrays together.

```python { .api }
class CopyTransformer:
    def __init__(self):
        """Identity transformer that copies input data"""

    def fit(self, X, y=None):
        """Fit transformer (no-op)"""

    def transform(self, X):
        """Return copy of input data"""

class DenseTransformer:
    def __init__(self):
        """Convert sparse matrices to dense format"""

    def fit(self, X, y=None):
        """Fit transformer (no-op)"""

    def transform(self, X):
        """Convert sparse matrix to dense"""

def one_hot(y, dtype=int):
    """
    One-hot encode categorical labels.

    Parameters:
    - y: array-like, categorical labels
    - dtype: data type for output array

    Returns:
    - encoded: array, one-hot encoded matrix
    """

def shuffle_arrays_unison(*arrays, random_seed=None):
    """
    Shuffle multiple arrays in unison.

    Parameters:
    - arrays: array-like objects to shuffle together
    - random_seed: int, random seed for reproducibility

    Returns:
    - shuffled_arrays: tuple of shuffled arrays
    """
```
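As a rough illustration of what one-hot encoding and shuffling "in unison" mean, the sketch below implements both ideas with NumPy alone. It does not call `one_hot` or `shuffle_arrays_unison`, it assumes the labels are integers starting at 0, and all variable names are illustrative.

```python
import numpy as np

# One-hot encoding: integer class labels select rows of an identity matrix.
y = np.array([0, 1, 2, 1])
n_classes = int(y.max()) + 1
encoded = np.eye(n_classes, dtype=int)[y]

# Unison shuffle: draw a single permutation and apply it to every array,
# so each row of X stays paired with its original label in y.
X = np.arange(8).reshape(4, 2)
rng = np.random.RandomState(0)  # fixed seed for reproducibility
perm = rng.permutation(len(y))
X_shuffled, y_shuffled = X[perm], y[perm]
```

Using one shared permutation is the important design point: shuffling each array independently would break the correspondence between samples and labels.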

## Usage Examples

```python
from mlxtend.preprocessing import TransactionEncoder, MeanCenterer, standardize
import pandas as pd
import numpy as np

# Transaction encoding example
transactions = [['bread', 'milk'], ['bread', 'beer'], ['milk', 'beer']]
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Mean centering example
X = np.random.randn(100, 5)
mc = MeanCenterer()
X_centered = mc.fit_transform(X)

# Standardization example
X_std = standardize(X)
```