# Stateful Transforms

Transform functions that maintain state across data-processing operations. These transforms remember characteristics of the training data and apply consistent transformations to new data, which is essential for preprocessing in statistical modeling.

## Capabilities

### Stateful Transform Decorator

Creates stateful transform callable objects from classes implementing the stateful transform protocol.

```python { .api }
def stateful_transform(class_):
    """
    Create a stateful transform callable from a class implementing the
    stateful transform protocol.

    Parameters:
    - class_: A class implementing the stateful transform protocol with methods:
      - __init__(): Initialize the transform
      - memorize_chunk(input_data): Process data during the learning phase
      - memorize_finish(): Finalize the learning phase
      - transform(input_data): Apply the transformation to data

    Returns:
    Callable transform object that can be used in formulas
    """
```
#### Usage Examples

```python
import patsy
import numpy as np

# Define a custom stateful transform class that rescales data by the
# largest absolute value seen during training
class CustomScale:
    def __init__(self):
        self._max = None
        self.scale_factor = None

    def memorize_chunk(self, input_data):
        # Accumulate data statistics during training
        chunk_max = np.max(np.abs(input_data))
        if self._max is None or chunk_max > self._max:
            self._max = chunk_max

    def memorize_finish(self):
        # Finalize computation after seeing all training data
        self.scale_factor = 1.0 / self._max

    def transform(self, input_data):
        # Apply the learned scaling consistently to new data
        return np.asarray(input_data) * self.scale_factor

# Create the stateful transform
custom_scale = patsy.stateful_transform(CustomScale)

# Use in formulas (conceptually)
# design = patsy.dmatrix("custom_scale(x)", data)
```
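The memorize/transform split exists so statistics can be gathered incrementally before any transformation happens. A minimal sketch of the same protocol outside patsy (the `ChunkedMean` class and its chunk sizes are illustrative, not part of patsy's API):

```python
# ChunkedMean mimics center(): it learns the mean incrementally across
# chunks, then subtracts it from any data it is given.
class ChunkedMean:
    def __init__(self):
        self._total = 0.0
        self._count = 0
        self.mean = None

    def memorize_chunk(self, input_data):
        # Accumulate sufficient statistics from each training chunk
        self._total += sum(input_data)
        self._count += len(input_data)

    def memorize_finish(self):
        # Finalize the learned parameter once all chunks are seen
        self.mean = self._total / self._count

    def transform(self, input_data):
        # Apply the learned mean consistently to any data
        return [v - self.mean for v in input_data]

t = ChunkedMean()
t.memorize_chunk([1, 2, 3])
t.memorize_chunk([4, 5])
t.memorize_finish()
print(t.mean)               # 3.0
print(t.transform([3, 6]))  # [0.0, 3.0]
```

Note that the result is identical no matter how the training data is split into chunks, which is exactly the property patsy relies on.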
### Centering Transform

Subtracts the mean from data, centering it around zero while preserving the scale.

```python { .api }
def center(x):
    """
    Stateful transform that centers input data by subtracting the mean.

    Parameters:
    - x: Array-like data to center

    Returns:
    Array with same shape as input, centered around zero

    Notes:
    - For multi-column input, centers each column separately
    - Equivalent to standardize(x, rescale=False)
    - State: remembers the mean of the training data
    """
```
#### Usage Examples

```python
import patsy
import numpy as np
import pandas as pd

# Sample data
data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'y': [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
})

# Center a variable in a formula
design = patsy.dmatrix("center(x)", data)
print(f"Original mean: {np.mean(data['x'])}")
# Column 0 is the intercept, so look at column 1 for center(x)
print(f"Centered mean: {np.mean(design[:, 1])}")  # Should be close to 0

# Center multiple variables
design = patsy.dmatrix("center(x) + center(y)", data)

# Complete model with centering
y_matrix, X_matrix = patsy.dmatrices("y ~ center(x)", data)

# Centering preserves relationships but changes the intercept's interpretation
print("Design matrix mean by column:", np.mean(X_matrix, axis=0))
```
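The intercept-interpretation point can be made concrete: when the predictor is centered, the fitted intercept equals the mean of the response. A plain-NumPy check (using `np.polyfit` rather than patsy):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Least-squares line on centered x: the intercept becomes mean(y)
slope, intercept = np.polyfit(x - x.mean(), y, 1)
print(round(float(intercept), 6))  # 6.0, which equals y.mean()
```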
### Standardization Transform

Centers data and scales it to unit variance (z-score standardization).

```python { .api }
def standardize(x, center=True, rescale=True, ddof=0):
    """
    Stateful transform that standardizes input data (z-score normalization).

    Parameters:
    - x: Array-like data to standardize
    - center (bool): Whether to subtract the mean (default: True)
    - rescale (bool): Whether to divide by the standard deviation (default: True)
    - ddof (int): Delta degrees of freedom for the standard deviation (default: 0)

    Returns:
    Array with same shape as input, standardized

    Notes:
    - ddof=0 gives the maximum likelihood estimate (divides by n)
    - ddof=1 gives the unbiased estimate (divides by n - 1)
    - For multi-column input, standardizes each column separately
    - State: remembers the mean and standard deviation of the training data
    """
```
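To make the ddof options concrete, here is the arithmetic standardize() performs for a single column, in plain NumPy (illustrative only; patsy also handles multi-column and chunked input):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# ddof=0 divides by n; ddof=1 divides by n - 1
z_mle = (x - x.mean()) / x.std(ddof=0)
z_unbiased = (x - x.mean()) / x.std(ddof=1)

print(round(float(x.std(ddof=0)), 4))      # 1.4142 (sqrt of 2)
print(round(float(x.std(ddof=1)), 4))      # 1.5811 (sqrt of 2.5)
print(round(float(z_mle.std(ddof=0)), 4))  # 1.0
```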
#### Usage Examples

```python
import patsy
import numpy as np
import pandas as pd

# Sample data with different scales
data = pd.DataFrame({
    'small': [0.1, 0.2, 0.3, 0.4, 0.5],
    'large': [100, 200, 300, 400, 500],
    'y': [1, 2, 3, 4, 5]
})

# Standardize variables to have mean 0, std 1
design = patsy.dmatrix("standardize(small) + standardize(large)", data)
print("Column means:", np.mean(design, axis=0))  # ~0 for each column after the intercept
print("Column stds:", np.std(design, axis=0))    # ~1 for each column after the intercept

# Only center, without rescaling
design = patsy.dmatrix("standardize(small, rescale=False)", data)

# Only rescale, without centering
design = patsy.dmatrix("standardize(small, center=False)", data)

# Use the unbiased standard deviation (ddof=1)
design = patsy.dmatrix("standardize(small, ddof=1)", data)

# Complete model with standardization
y_matrix, X_matrix = patsy.dmatrices("y ~ standardize(small) + standardize(large)", data)
```
### Scale Transform

Alias for the standardize function, providing the same functionality.

```python { .api }
def scale(x, ddof=0):
    """
    Alias for the standardize() function.

    Equivalent to standardize(x, center=True, rescale=True, ddof=ddof).

    Parameters:
    - x: Array-like data to scale
    - ddof (int): Delta degrees of freedom for the standard deviation

    Returns:
    Standardized array (mean 0, standard deviation 1)
    """
```
#### Usage Examples

```python
import patsy
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'x': [10, 20, 30, 40, 50],
    'y': [1, 4, 9, 16, 25]
})

# scale() is equivalent to standardize()
design1 = patsy.dmatrix("scale(x)", data)
design2 = patsy.dmatrix("standardize(x)", data)
print("Designs are equal:", np.allclose(design1, design2))

# Complete model using scale
y_matrix, X_matrix = patsy.dmatrices("y ~ scale(x)", data)
```
## Transform Behavior and State

### Stateful Nature

Stateful transforms work in two phases:

1. **Learning phase** (during initial matrix construction):
   - `memorize_chunk()`: Process training data chunks
   - `memorize_finish()`: Finalize parameter computation

2. **Transform phase** (during application to new data):
   - `transform()`: Apply the learned parameters to new data
### Consistent Application

```python
import patsy

# Training data
train_data = {'x': [1, 2, 3, 4, 5]}
train_design = patsy.dmatrix("standardize(x)", train_data)

# The standardize transform has memorized the mean and std of the training
# data; that state lives in the design matrix's DesignInfo. Apply it
# consistently to new data with build_design_matrices:
test_data = {'x': [1.5, 2.5, 3.5]}
(test_design,) = patsy.build_design_matrices([train_design.design_info], test_data)
# test_design uses the training mean/std, not statistics from test_data
```
### Integration with Incremental Processing

Stateful transforms work with Patsy's incremental processing for large datasets:

```python
import patsy

def data_chunks():
    # Generator yielding data chunks
    for i in range(0, 10000, 1000):
        yield {'x': list(range(i, i + 1000))}

# Learn transform state incrementally; incr_dbuilder iterates over the
# chunks without loading all the data at once
builder = patsy.incr_dbuilder("standardize(x)", data_chunks)

# Apply to new data using the learned parameters
new_data = {'x': [5000, 5001, 5002]}
(design,) = patsy.build_design_matrices([builder], new_data)
```
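A single pass over the chunks suffices here because the mean and (ddof=0) standard deviation can be recovered from running sums. The sketch below shows that idea; it is an assumption about the concept, not patsy's actual implementation, which uses a more numerically stable update:

```python
import numpy as np

# Accumulate count, sum, and sum of squares chunk by chunk
count, total, total_sq = 0, 0.0, 0.0
chunks = [np.arange(0, 1000), np.arange(1000, 2000), np.arange(2000, 3000)]
for chunk in chunks:
    count += chunk.size
    total += chunk.sum()
    total_sq += (chunk.astype(float) ** 2).sum()

mean = total / count
std = np.sqrt(total_sq / count - mean ** 2)  # ddof=0

# Matches a single pass over the full data
full = np.concatenate(chunks)
print(bool(np.isclose(mean, full.mean())))  # True
print(bool(np.isclose(std, full.std())))    # True
```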
## Advanced Transform Usage

### Multiple Transforms

```python
# Nest transforms (note: centering already-standardized data is redundant,
# since standardized data is already centered)
design = patsy.dmatrix("center(standardize(x))", data)

# Apply different transforms to different variables
design = patsy.dmatrix("center(x1) + standardize(x2) + scale(x3)", data)
```
### Custom Transform Development

```python
import numpy as np
import patsy

class RobustScale:
    """Custom stateful transform using the median and MAD instead of mean and std."""

    def __init__(self):
        self.median = None
        self.mad = None

    def memorize_chunk(self, input_data):
        # Simplified: uses only the first chunk. In practice you would
        # accumulate the raw data (or sufficient statistics) across chunks
        # and compute the median/MAD in memorize_finish().
        data = np.asarray(input_data)
        if self.median is None:
            self.median = np.median(data)
            self.mad = np.median(np.abs(data - self.median))

    def memorize_finish(self):
        # Finalize computation if needed
        pass

    def transform(self, input_data):
        data = np.asarray(input_data)
        return (data - self.median) / (1.4826 * self.mad)  # 1.4826 for consistency with a normal std

# Create the transform
robust_scale = patsy.stateful_transform(RobustScale)
```
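Why prefer the median and MAD? A quick numeric check, independent of patsy, of how a single outlier affects the learned location and scale:

```python
import numpy as np

def mad_center(data):
    # Median and MAD-based scale (1.4826 for consistency with a normal std)
    med = np.median(data)
    mad = np.median(np.abs(data - med))
    return float(med), float(1.4826 * mad)

clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
dirty = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])

print(mad_center(clean))  # (3.0, 1.4826)
print(mad_center(dirty))  # (3.0, 1.4826) -- unchanged by the outlier
print(dirty.mean())       # 202.0 -- the mean is pulled hard
```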
### Transform with Model Fitting

```python
import patsy
from sklearn.linear_model import LinearRegression

# Create standardized design matrices
data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 6, 8, 10]}
y, X = patsy.dmatrices("y ~ standardize(x)", data)

# Fit model
model = LinearRegression(fit_intercept=False)
model.fit(X, y.ravel())

# Reuse the memorized standardization parameters for new predictions.
# (Calling patsy.dmatrix on new_data directly would re-learn the mean/std
# from new_data, giving inconsistent predictions.)
new_data = {'x': [1.5, 2.5, 3.5]}
(X_new,) = patsy.build_design_matrices([X.design_info], new_data)
predictions = model.predict(X_new)
```
```