Tessl Tile for pypi/patsy@1.0.0

# Contrast Coding

Classes implementing different contrast coding schemes for categorical variables. These coding schemes determine how categorical factors are represented in design matrices, which in turn affects the interpretation of model coefficients.

## Capabilities

### Contrast Matrix Base Class

The result type shared by all contrast coding schemes: a container holding the actual coding matrix and the column naming information.

```python { .api }
class ContrastMatrix:
    """
    Container for a matrix used for coding categorical factors.

    Attributes:
    - matrix: 2d ndarray where each column corresponds to one design matrix column
             and each row contains entries for a single categorical level
    - column_suffixes: List of strings appended to factor names for column names
    """
    def __init__(self, matrix, column_suffixes):
        """
        Create a contrast matrix.

        Parameters:
        - matrix: 2d array-like coding matrix
        - column_suffixes: List of suffix strings for column naming
        """
```

### Treatment Contrasts (Dummy Coding)

The default contrast coding scheme, comparing each level to a reference level.

```python { .api }
class Treatment:
    """
    Treatment coding (dummy coding) - the default contrast scheme.

    For reduced-rank coding, one level is the reference (represented by intercept),
    and each column represents the difference between a level and the reference.
    For full-rank coding, classic dummy coding with each level having its own column.
    """
    def __init__(self, reference=None):
        """
        Parameters:
        - reference: Level to use as reference (default: first level)
        """
```

#### Usage Examples

```python
import patsy
from patsy import Treatment
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'y': [1, 2, 3, 1.5, 2.5, 3.5]
})

# Default treatment contrasts (first level as reference)
y, X = patsy.dmatrices("y ~ C(group)", data)
print(X.design_info.column_names)  # ['Intercept', 'C(group)[T.B]', 'C(group)[T.C]']

# Specify reference level; the column names carry the full factor expression
y, X = patsy.dmatrices("y ~ C(group, Treatment(reference='B'))", data)
print(X.design_info.column_names)
# ['Intercept', "C(group, Treatment(reference='B'))[T.A]", "C(group, Treatment(reference='B'))[T.C]"]
```
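
To make the reference-level behaviour concrete, here is a minimal plain-Python sketch of the coding matrices described above. The helper `treatment_matrix` is hypothetical and not part of patsy; it only illustrates the layout, with one row per level and one column per design matrix column. The suffix style is assumed to follow the column names shown in the usage example.

```python
def treatment_matrix(levels, reference=None, full_rank=False):
    """Hypothetical helper: treatment (dummy) coding as nested lists."""
    if full_rank:
        # One indicator column per level (an identity matrix).
        kept, prefix = list(levels), ""
    else:
        # Drop the reference level; the intercept represents it.
        ref = levels[0] if reference is None else reference
        kept, prefix = [lvl for lvl in levels if lvl != ref], "T."
    matrix = [[1 if row == col else 0 for col in kept] for row in levels]
    suffixes = ["[%s%s]" % (prefix, lvl) for lvl in kept]
    return matrix, suffixes


levels = ["A", "B", "C"]
reduced, reduced_names = treatment_matrix(levels)
by_b, by_b_names = treatment_matrix(levels, reference="B")
full, full_names = treatment_matrix(levels, full_rank=True)

print(reduced_names, reduced)  # ['[T.B]', '[T.C]'] [[0, 0], [1, 0], [0, 1]]
print(by_b_names, by_b)        # ['[T.A]', '[T.C]'] [[1, 0], [0, 0], [0, 1]]
print(full)                    # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

The reference level is the row of zeros: its fitted mean is the intercept alone, and every other coefficient is read as a difference from it.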

### Sum-to-Zero Contrasts (Deviation Coding)

Compares each level to the grand mean, with coefficients that sum to zero.

```python { .api }
class Sum:
    """
    Deviation coding (sum-to-zero coding).

    Compares the mean of each level to the mean-of-means (overall mean in balanced designs).
    Coefficients sum to zero, making interpretation relative to the grand mean.
    """
    def __init__(self, omit=None):
        """
        Parameters:
        - omit: Level to omit to avoid redundancy (default: last level)
        """
```

#### Usage Examples

```python
import patsy
from patsy import Sum
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'y': [1, 2, 3, 1.5, 2.5, 3.5]
})

# Sum-to-zero contrasts (last level omitted by default)
y, X = patsy.dmatrices("y ~ C(group, Sum)", data)
print(X.design_info.column_names)  # ['Intercept', 'C(group, Sum)[S.A]', 'C(group, Sum)[S.B]']

# Specify which level to omit
y, X = patsy.dmatrices("y ~ C(group, Sum(omit='A'))", data)
```
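
The following plain-Python sketch shows what a reduced-rank sum-to-zero coding matrix looks like and why the name fits. The helper `sum_matrix` is hypothetical, written for illustration only, and is not patsy's implementation.

```python
def sum_matrix(levels, omit=None):
    """Hypothetical helper: reduced-rank sum-to-zero coding as nested lists."""
    omitted = levels[-1] if omit is None else omit
    kept = [lvl for lvl in levels if lvl != omitted]
    matrix = []
    for row in levels:
        if row == omitted:
            # The omitted level is coded as -1 in every column.
            matrix.append([-1] * len(kept))
        else:
            matrix.append([1 if row == col else 0 for col in kept])
    return matrix, ["[S.%s]" % lvl for lvl in kept]


matrix, names = sum_matrix(["A", "B", "C"])
print(names)   # ['[S.A]', '[S.B]']
print(matrix)  # [[1, 0], [0, 1], [-1, -1]]

# Each column sums to zero across the levels.
column_sums = [sum(col) for col in zip(*matrix)]
print(column_sums)  # [0, 0]
```

Because of the row of -1 entries, the effect of the omitted level is the negative of the sum of the other coefficients, so it can be recovered even though it has no column of its own.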

### Helmert Contrasts

Compares each level with the average of all preceding levels.

```python { .api }
class Helmert:
    """
    Helmert contrasts.

    Compares the second level with the first, the third with the average of
    the first two, and so on. Useful for ordered factors.

    Warning: Multiple definitions of 'Helmert coding' exist. Verify this matches
    your expected interpretation.
    """
```

#### Usage Examples

```python
import patsy
from patsy import Helmert
import pandas as pd

# Helmert contrasts for ordered factors
data = pd.DataFrame({
    'dose': ['low', 'medium', 'high', 'low', 'medium', 'high'],
    'response': [1, 2, 4, 1.2, 2.1, 3.8]
})

# Pass levels explicitly so the order is low < medium < high, not alphabetical
y, X = patsy.dmatrices("response ~ C(dose, Helmert, levels=['low', 'medium', 'high'])", data)
print(X.design_info.column_names)
```
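
As a rough illustration of the "each level versus the average of the preceding levels" idea, the sketch below builds one common unnormalized form of the Helmert columns in plain Python. The helper `helmert_matrix` is hypothetical; other definitions scale or reverse these columns, which is the ambiguity the warning above refers to.

```python
def helmert_matrix(n_levels):
    """Hypothetical helper: unnormalized Helmert columns for n_levels levels."""
    matrix = [[0] * (n_levels - 1) for _ in range(n_levels)]
    for j in range(1, n_levels):
        for i in range(j):
            matrix[i][j - 1] = -1  # every preceding level
        matrix[j][j - 1] = j       # the level being compared
    return matrix


helmert = helmert_matrix(4)
for row in helmert:
    print(row)
# [-1, -1, -1]
# [1, -1, -1]
# [0, 2, -1]
# [0, 0, 3]
```

Each column sums to zero and weights the preceding levels equally, so the contrast is between one level and the unweighted average of those before it.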

### Polynomial Contrasts

Treats categorical levels as ordered samples for polynomial trend analysis.

```python { .api }
class Poly:
    """
    Orthogonal polynomial contrast coding.

    Treats levels as ordered samples from an underlying continuous scale,
    decomposing effects into linear, quadratic, cubic, etc. components.
    Useful for ordered factors with potentially nonlinear relationships.
    """
```

#### Usage Examples

```python
import patsy
from patsy import Poly
import pandas as pd

# Polynomial contrasts for dose-response analysis
data = pd.DataFrame({
    'dose': [1, 2, 3, 4, 1, 2, 3, 4],  # Numeric levels
    'response': [1, 1.8, 3.2, 4.5, 1.1, 1.9, 3.1, 4.6]
})

y, X = patsy.dmatrices("response ~ C(dose, Poly)", data)
print(X.design_info.column_names)  # Intercept plus .Linear, .Quadratic and .Cubic columns
```
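
The decomposition into linear, quadratic and higher-order components can be sketched with a small Gram-Schmidt routine in plain Python. This is an illustration under stated assumptions, not patsy's implementation: the helper `poly_contrasts` is hypothetical, and it assumes equally meaningful numeric scores for the levels and unit-length columns.

```python
import math


def poly_contrasts(scores):
    """Hypothetical helper: orthonormal polynomial columns via Gram-Schmidt."""
    n = len(scores)
    mean = sum(scores) / n
    centered = [s - mean for s in scores]
    basis = [[1.0] * n]  # constant column, used only for orthogonalization
    columns = []
    for degree in range(1, n):
        vec = [c ** degree for c in centered]
        # Remove the components along every lower-degree column.
        for b in basis:
            scale = sum(v * x for v, x in zip(vec, b)) / sum(x * x for x in b)
            vec = [v - scale * x for v, x in zip(vec, b)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        basis.append(vec)
        columns.append(vec)
    return columns  # columns[0] is linear, columns[1] is quadratic, ...


linear, quadratic = poly_contrasts([1, 2, 3])
print([round(v, 3) for v in linear])     # [-0.707, 0.0, 0.707]
print([round(v, 3) for v in quadratic])  # [0.408, -0.816, 0.408]
```

Because the columns are mutually orthogonal, the linear and quadratic coefficients can be interpreted separately when the design is balanced.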

### Difference Contrasts (Backward Difference)

Compares each level with the immediately preceding level, useful for ordered factors.

```python { .api }
class Diff:
    """
    Backward difference coding.

    Compares each level with the preceding level: second minus first,
    third minus second, etc. Useful for ordered factors to examine
    step-wise changes between adjacent levels.
    """
```

#### Usage Examples

```python
import patsy
from patsy import Diff
import pandas as pd

# Difference contrasts for time periods
data = pd.DataFrame({
    'period': ['pre', 'during', 'post', 'pre', 'during', 'post'],
    'measurement': [10, 15, 12, 9, 16, 13]
})

# Pass levels explicitly so the order is chronological, not alphabetical
y, X = patsy.dmatrices("measurement ~ C(period, Diff, levels=['pre', 'during', 'post'])", data)
print(X.design_info.column_names)  # Columns encode the steps: during - pre, post - during
```
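
The sketch below shows, in plain Python, one way to write a reduced-rank backward difference matrix and to check that it encodes adjacent-level steps. The helper `backward_diff_matrix` is hypothetical and is offered as an illustration of the scheme, not as patsy's code.

```python
from fractions import Fraction


def backward_diff_matrix(n_levels):
    """Hypothetical helper: reduced-rank backward difference coding."""
    n = n_levels
    matrix = []
    for i in range(n):
        row = []
        for j in range(1, n):
            # Column j contrasts levels j..n-1 against levels 0..j-1.
            row.append(Fraction(j, n) if i >= j else Fraction(j - n, n))
        matrix.append(row)
    return matrix


m = backward_diff_matrix(3)
for row in m:
    print([str(v) for v in row])
# ['-2/3', '-1/3']
# ['1/3', '-1/3']
# ['1/3', '2/3']

# Moving from one level to the next changes exactly one column by 1,
# so each coefficient is the difference between adjacent level means.
steps = [[int(b - a) for a, b in zip(m[i], m[i + 1])] for i in range(2)]
print(steps)  # [[1, 0], [0, 1]]
```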

## Contrast Coding Concepts

### Full-Rank vs Reduced-Rank Coding

- **Reduced-rank coding**: Produces one column fewer than the number of levels. Used when the model already includes an intercept, so one level is omitted to avoid multicollinearity
- **Full-rank coding**: Produces one column per level. Used when there is no intercept to absorb the omitted level, which is useful for certain modeling approaches
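
The multicollinearity mentioned above can be seen directly in a small plain-Python sketch. The variable names are illustrative only; the point is that the full set of indicator columns already adds up to the intercept column.

```python
levels = ["A", "B", "C"]
observations = ["A", "B", "C", "A", "B", "C"]

intercept = [1 for _ in observations]
# Full-rank dummy columns: one indicator per level.
dummies = {lvl: [1 if obs == lvl else 0 for obs in observations] for lvl in levels}

# The indicator columns add up to the intercept column, so keeping all of
# them alongside an intercept makes the design matrix rank-deficient.
row_sums = [sum(dummies[lvl][i] for lvl in levels) for i in range(len(observations))]
print(row_sums == intercept)  # True

# Reduced-rank coding drops one column (here the first level) to break the tie.
reduced = {lvl: col for lvl, col in dummies.items() if lvl != levels[0]}
print(sorted(reduced))  # ['B', 'C']
```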

### Choosing Contrast Schemes

| Contrast Type | Best For | Interpretation |
|---------------|----------|----------------|
| Treatment | General categorical factors | Difference from reference level |
| Sum | Balanced designs, ANOVA-style analysis | Deviation from grand mean |
| Helmert | Ordered factors, progressive comparisons | Difference from the mean of preceding levels |
| Polynomial | Ordered factors, trend analysis | Linear, quadratic, cubic trends |
| Diff | Ordered factors, adjacent comparisons | Step-wise changes |

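### Custom Contrast Matrices

A `ContrastMatrix` can be passed directly as the contrast argument of `C()`. It needs one row per level and one column suffix per column.

```python
import numpy as np
import patsy
from patsy import ContrastMatrix

# Create custom contrast matrix: one row per level, one column per contrast
custom_matrix = np.array([[1, 0], [0, 1], [-1, -1]])
custom_contrasts = ContrastMatrix(custom_matrix, ["[custom.1]", "[custom.2]"])

# Use in formula: patsy looks up custom_contrasts in the calling environment
data = {'factor': ['A', 'B', 'C'] * 2}
X = patsy.dmatrix("C(factor, custom_contrasts)", data)
print(X.design_info.column_names)
```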

## Integration with Categorical Variables

A contrast scheme is passed as the second argument to `C()` and can be combined with its other arguments, such as `levels`:

```python
import patsy
from patsy import C, Treatment, Sum

data = {'factor': ['A', 'B', 'C'] * 10, 'y': range(30)}

# Combine C() with contrast specification
designs = [
    patsy.dmatrix("C(factor, Treatment)", data),
    patsy.dmatrix("C(factor, Sum)", data),
    patsy.dmatrix("C(factor, levels=['C', 'B', 'A'])", data)  # Custom ordering; 'C' becomes the reference
]
```
