# Contrast Coding

Classes implementing different contrast coding schemes for categorical variables. These coding schemes determine how categorical factors are represented in design matrices, affecting the interpretation of model coefficients.

## Capabilities

### Contrast Matrix Base Class

The container class that every contrast coding scheme produces, holding the actual coding matrix and column naming information.

```python { .api }
class ContrastMatrix:
    """
    Container for a matrix used for coding categorical factors.

    Attributes:
    - matrix: 2d ndarray where each column corresponds to one design matrix column
      and each row contains entries for a single categorical level
    - column_suffixes: List of strings appended to factor names for column names
    """
    def __init__(self, matrix, column_suffixes):
        """
        Create a contrast matrix.

        Parameters:
        - matrix: 2d array-like coding matrix
        - column_suffixes: List of suffix strings for column naming
        """
```
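
As a rough illustration of the two attributes, the sketch below builds a small `ContrastMatrix` by hand for a three-level factor. The suffix strings (`[x.1]`, `[x.2]`) are arbitrary labels chosen for this example, not names patsy generates.

```python
import numpy as np
from patsy import ContrastMatrix

# One row per level (3 levels), one column per design matrix column (2 columns).
cm = ContrastMatrix(np.array([[1, 0], [0, 1], [-1, -1]]), ["[x.1]", "[x.2]"])

print(cm.matrix.shape)     # (3, 2)
print(cm.column_suffixes)  # ['[x.1]', '[x.2]']
```

The number of suffixes has to match the number of matrix columns.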

### Treatment Contrasts (Dummy Coding)

The default contrast coding scheme, comparing each level to a reference level.

```python { .api }
class Treatment:
    """
    Treatment coding (dummy coding) - the default contrast scheme.

    For reduced-rank coding, one level is the reference (represented by intercept),
    and each column represents the difference between a level and the reference.
    For full-rank coding, classic dummy coding with each level having its own column.
    """
    def __init__(self, reference=None):
        """
        Parameters:
        - reference: Level to use as reference (default: first level)
        """
```

#### Usage Examples

```python
import pandas as pd
import patsy
from patsy import Treatment

data = pd.DataFrame({
    'group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'y': [1, 2, 3, 1.5, 2.5, 3.5]
})

# Default treatment contrasts (first level as reference)
y, X = patsy.dmatrices("y ~ C(group)", data)
print(X.design_info.column_names)  # ['Intercept', 'C(group)[T.B]', 'C(group)[T.C]']

# Specify reference level; column names embed the full C(...) expression
y, X = patsy.dmatrices("y ~ C(group, Treatment(reference='B'))", data)
print(X.design_info.column_names)
# ['Intercept', "C(group, Treatment(reference='B'))[T.A]", "C(group, Treatment(reference='B'))[T.C]"]
```
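
To see the coding itself rather than only the column names, one option is to call the scheme's `code_without_intercept` and `code_with_intercept` methods directly. The sketch below assumes a three-level factor with levels `A`, `B`, `C` and the default reference level.

```python
import numpy as np
from patsy import Treatment

levels = ["A", "B", "C"]

# Reduced-rank: the reference level "A" is the all-zero row.
reduced = Treatment().code_without_intercept(levels)
print(reduced.matrix)           # rows: A=[0, 0], B=[1, 0], C=[0, 1]
print(reduced.column_suffixes)  # ['[T.B]', '[T.C]']

# Full-rank: one indicator column per level.
full = Treatment().code_with_intercept(levels)
print(full.matrix)              # 3x3 identity matrix
print(full.column_suffixes)     # ['[A]', '[B]', '[C]']
```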

### Sum-to-Zero Contrasts (Deviation Coding)

Compares each level to the grand mean, with coefficients that sum to zero.

```python { .api }
class Sum:
    """
    Deviation coding (sum-to-zero coding).

    Compares the mean of each level to the mean-of-means (overall mean in balanced designs).
    Coefficients sum to zero, making interpretation relative to the grand mean.
    """
    def __init__(self, omit=None):
        """
        Parameters:
        - omit: Level to omit to avoid redundancy (default: last level)
        """
```

#### Usage Examples

```python
import pandas as pd
import patsy
from patsy import Sum

data = pd.DataFrame({
    'group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'y': [1, 2, 3, 1.5, 2.5, 3.5]
})

# Sum-to-zero contrasts (last level omitted by default)
y, X = patsy.dmatrices("y ~ C(group, Sum)", data)
print(X.design_info.column_names)  # ['Intercept', 'C(group, Sum)[S.A]', 'C(group, Sum)[S.B]']

# Specify which level to omit
y, X = patsy.dmatrices("y ~ C(group, Sum(omit='A'))", data)
```
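
The "deviation from the mean-of-means" reading can be checked numerically. The sketch below fits ordinary least squares with `numpy.linalg.lstsq` (a choice made for this illustration; any OLS routine would do) on a small balanced data set, then recovers the omitted level's deviation as the negative sum of the others.

```python
import numpy as np
import pandas as pd
import patsy

data = pd.DataFrame({
    "group": ["A", "B", "C", "A", "B", "C"],
    "y": [1, 2, 3, 1.5, 2.5, 3.5],
})

y, X = patsy.dmatrices("y ~ C(group, Sum)", data)
beta, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
intercept, dev_a, dev_b = beta.ravel()

# The omitted level (C) is implied by the sum-to-zero constraint.
dev_c = -(dev_a + dev_b)

level_means = data.groupby("group")["y"].mean()
print(level_means.mean(), intercept)  # both are the mean of the level means
print(dev_a, dev_b, dev_c)            # deviations of A, B, C from that mean
```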

### Helmert Contrasts

Compares each level with the average of all preceding levels.

```python { .api }
class Helmert:
    """
    Helmert contrasts.

    Compares the second level with the first, the third with the average of
    the first two, and so on. Useful for ordered factors.

    Warning: Multiple definitions of 'Helmert coding' exist. Verify this matches
    your expected interpretation.
    """
```

#### Usage Examples

```python
import pandas as pd
import patsy
from patsy import Helmert

# Helmert contrasts for ordered factors
data = pd.DataFrame({
    'dose': ['low', 'medium', 'high', 'low', 'medium', 'high'],
    'response': [1, 2, 4, 1.2, 2.1, 3.8]
})

# Pass levels explicitly; otherwise string levels are sorted alphabetically
y, X = patsy.dmatrices("response ~ C(dose, Helmert, levels=['low', 'medium', 'high'])", data)
print(X.design_info.column_names)
```
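
Because several conventions go by the name "Helmert", it can help to inspect the matrix and the fitted coefficients directly. The sketch below uses `numpy.linalg.lstsq` for the fit (an assumption of this example) and suggests that, with this coding, each coefficient is the stated comparison divided by the number of levels involved, not the raw difference.

```python
import numpy as np
import pandas as pd
import patsy
from patsy import Helmert

levels = ["low", "medium", "high"]
cm = Helmert().code_without_intercept(levels)
print(cm.matrix)           # rows: low=[-1, -1], medium=[1, -1], high=[0, 2]
print(cm.column_suffixes)  # ['[H.medium]', '[H.high]']

data = pd.DataFrame({
    "dose": ["low", "medium", "high", "low", "medium", "high"],
    "response": [1, 2, 4, 1.2, 2.1, 3.8],
})
y, X = patsy.dmatrices(
    "response ~ C(dose, Helmert, levels=['low', 'medium', 'high'])", data
)
beta, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
beta = beta.ravel()

m = data.groupby("dose")["response"].mean()
# Scaled comparisons implied by the matrix above.
medium_vs_low = (m["medium"] - m["low"]) / 2
high_vs_earlier = (m["high"] - (m["low"] + m["medium"]) / 2) / 3
print(beta[1], medium_vs_low)
print(beta[2], high_vs_earlier)
```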

### Polynomial Contrasts

Treats categorical levels as ordered samples for polynomial trend analysis.

```python { .api }
class Poly:
    """
    Orthogonal polynomial contrast coding.

    Treats levels as ordered samples from an underlying continuous scale,
    decomposing effects into linear, quadratic, cubic, etc. components.
    Useful for ordered factors with potentially nonlinear relationships.
    """
```

#### Usage Examples

```python
import pandas as pd
import patsy
from patsy import Poly

# Polynomial contrasts for dose-response analysis
data = pd.DataFrame({
    'dose': [1, 2, 3, 4, 1, 2, 3, 4],  # Numeric levels
    'response': [1, 1.8, 3.2, 4.5, 1.1, 1.9, 3.1, 4.6]
})

y, X = patsy.dmatrices("response ~ C(dose, Poly)", data)
print(X.design_info.column_names)  # Intercept plus .Linear, .Quadratic, .Cubic columns
```
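
A quick way to confirm the "orthogonal" part is to look at the coding matrix for a four-level factor. The sketch below assumes the default, equally spaced level scores and checks that the trend columns are mutually orthogonal and each sums to zero.

```python
import numpy as np
from patsy import Poly

levels = [1, 2, 3, 4]
cm = Poly().code_without_intercept(levels)
print(cm.column_suffixes)  # ['.Linear', '.Quadratic', '.Cubic']

m = cm.matrix
gram = m.T @ m              # identity matrix if the columns are orthonormal
col_sums = m.sum(axis=0)    # zeros if each column is orthogonal to a constant
print(np.round(gram, 10))
print(np.round(col_sums, 10))
```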

### Difference Contrasts (Backward Difference)

Compares each level with the immediately preceding level, useful for ordered factors.

```python { .api }
class Diff:
    """
    Backward difference coding.

    Compares each level with the preceding level: second minus first,
    third minus second, etc. Useful for ordered factors to examine
    step-wise changes between adjacent levels.
    """
```

#### Usage Examples

```python
import pandas as pd
import patsy
from patsy import Diff

# Difference contrasts for time periods
data = pd.DataFrame({
    'period': ['pre', 'during', 'post', 'pre', 'during', 'post'],
    'measurement': [10, 15, 12, 9, 16, 13]
})

y, X = patsy.dmatrices("measurement ~ C(period, Diff, levels=['pre', 'during', 'post'])", data)
print(X.design_info.column_names)  # Intercept plus two columns: during - pre, post - during
```
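
The step-wise interpretation can be verified by fitting the model and comparing coefficients with differences of level means. This is a minimal sketch that uses `numpy.linalg.lstsq` for the fit; that choice is an assumption of the example.

```python
import numpy as np
import pandas as pd
import patsy

data = pd.DataFrame({
    "period": ["pre", "during", "post", "pre", "during", "post"],
    "measurement": [10, 15, 12, 9, 16, 13],
})

y, X = patsy.dmatrices(
    "measurement ~ C(period, Diff, levels=['pre', 'during', 'post'])", data
)
beta, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
beta = beta.ravel()

m = data.groupby("period")["measurement"].mean()
print(beta[0], m.mean())                # intercept: mean of the level means
print(beta[1], m["during"] - m["pre"])  # first step
print(beta[2], m["post"] - m["during"]) # second step
```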

## Contrast Coding Concepts

### Full-Rank vs Reduced-Rank Coding

Patsy chooses between the two codings automatically, depending on which other terms are in the model:

- **Reduced-rank coding**: Used when the model already includes an intercept. A factor with k levels gets k - 1 columns, which avoids perfect multicollinearity with the intercept.
- **Full-rank coding**: Used when there is no intercept. A factor with k levels gets k columns.
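
The difference is easy to see by building the same factor with and without an intercept. The sketch below uses the default treatment coding; `0 +` in the formula removes the intercept.

```python
import patsy

data = {"group": ["A", "B", "C", "A", "B", "C"]}

# Intercept present: reduced-rank coding, k - 1 = 2 factor columns.
reduced = patsy.dmatrix("C(group)", data)
print(reduced.design_info.column_names)
# ['Intercept', 'C(group)[T.B]', 'C(group)[T.C]']

# Intercept removed: full-rank coding, k = 3 factor columns.
full = patsy.dmatrix("0 + C(group)", data)
print(full.design_info.column_names)
# ['C(group)[A]', 'C(group)[B]', 'C(group)[C]']
```

Both design matrices have three columns and span the same space; only the parameterization differs.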

### Choosing Contrast Schemes

| Contrast Type | Best For | Interpretation |
|---------------|----------|----------------|
| Treatment | General categorical factors | Difference from reference level |
| Sum | Balanced designs, ANOVA-style analysis | Deviation from grand mean |
| Helmert | Ordered factors, progressive comparisons | Each level vs. the mean of preceding levels |
| Poly | Ordered factors, trend analysis | Linear, quadratic, cubic trends |
| Diff | Ordered factors, adjacent comparisons | Step-wise changes |
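
Whichever scheme is chosen, a single-factor model with an intercept fits the same values; only the meaning of the coefficients changes. The sketch below illustrates this on a small balanced data set, again using `numpy.linalg.lstsq` as an assumed stand-in for a regression routine.

```python
import numpy as np
import pandas as pd
import patsy

data = pd.DataFrame({
    "group": ["A", "B", "C", "A", "B", "C"],
    "y": [1, 2, 3, 1.5, 2.5, 3.5],
})

fits = {}
for scheme in ["Treatment", "Sum", "Helmert", "Diff", "Poly"]:
    y, X = patsy.dmatrices(f"y ~ C(group, {scheme})", data)
    X_arr = np.asarray(X)
    beta, *_ = np.linalg.lstsq(X_arr, np.asarray(y), rcond=None)
    fits[scheme] = {"coef": beta.ravel(), "fitted": (X_arr @ beta).ravel()}
    # Coefficients differ by scheme; fitted values do not.
    print(scheme, np.round(fits[scheme]["coef"], 3))
```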
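
### Custom Contrast Matrices

A `ContrastMatrix` can be built by hand and passed to `C()` in place of a built-in scheme.

```python
import numpy as np
import patsy
from patsy import ContrastMatrix

data = {'factor': ['A', 'B', 'C'] * 2}

# Create custom contrast matrix: one row per level, one column per coefficient
custom_matrix = np.array([[1, 0], [0, 1], [-1, -1]])
custom_contrasts = ContrastMatrix(custom_matrix, ["[custom.1]", "[custom.2]"])

# Use in formula: refer to the object by name; patsy looks it up in the calling scope
X = patsy.dmatrix("C(factor, custom_contrasts)", data)
print(X.design_info.column_names)
```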
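
A fixed matrix only fits one number of levels. For a reusable scheme, patsy's documentation describes passing an object that provides `code_with_intercept(levels)` and `code_without_intercept(levels)`, each returning a `ContrastMatrix`. The class below is a minimal sketch of that protocol; `LastReference` and its `[L.…]` suffixes are hypothetical names made up for this example.

```python
import numpy as np
import patsy
from patsy import ContrastMatrix

class LastReference:
    """Hypothetical scheme: treatment-style coding with the last level as reference."""

    def code_with_intercept(self, levels):
        # Full-rank: one indicator column per level.
        return ContrastMatrix(np.eye(len(levels)), ["[%s]" % (lv,) for lv in levels])

    def code_without_intercept(self, levels):
        # Reduced-rank: drop the last level's indicator column.
        matrix = np.eye(len(levels))[:, :-1]
        return ContrastMatrix(matrix, ["[L.%s]" % (lv,) for lv in levels[:-1]])

data = {"factor": ["A", "B", "C"] * 2}
X = patsy.dmatrix("C(factor, LastReference)", data)
names = X.design_info.column_names
print(names)
print(np.asarray(X)[:3])  # rows for A, B, C
```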

## Integration with Categorical Variables

A contrast scheme is passed as the second argument to `C()` and can be combined with its `levels` argument to control level ordering:

```python
import patsy
from patsy import C, Treatment, Sum

data = {'factor': ['A', 'B', 'C'] * 10, 'y': range(30)}

# Combine C() with contrast specification
designs = [
    patsy.dmatrix("C(factor, Treatment)", data),
    patsy.dmatrix("C(factor, Sum)", data),
    patsy.dmatrix("C(factor, levels=['C', 'B', 'A'])", data)  # Custom ordering
]
```