# Contrast Coding

Classes implementing different contrast coding schemes for categorical variables. These coding schemes determine how categorical factors are represented in design matrices, affecting the interpretation of model coefficients.

## Capabilities

### Contrast Matrix Base Class

The foundation class for all contrast coding schemes, containing the actual coding matrix and column naming information.

```python { .api }
class ContrastMatrix:
    """
    Container for a matrix used for coding categorical factors.

    Attributes:
    - matrix: 2d ndarray where each column corresponds to one design matrix column
      and each row contains entries for a single categorical level
    - column_suffixes: List of strings appended to factor names for column names
    """
    def __init__(self, matrix, column_suffixes):
        """
        Create a contrast matrix.

        Parameters:
        - matrix: 2d array-like coding matrix
        - column_suffixes: List of suffix strings for column naming
        """
```
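
To make the two attributes concrete, the following dependency-free sketch mimics how a coding matrix and its suffixes could be turned into named design columns. `expand_factor` is a hypothetical helper written for illustration, not part of Patsy, and the naming rule (factor name followed by suffix) is taken from the attribute description above.

```python
# Hypothetical illustration; this is not Patsy's implementation.
def expand_factor(factor_name, levels, matrix, column_suffixes, observations):
    """Map each observed level to its row of the coding matrix."""
    if len(matrix) != len(levels):
        raise ValueError("matrix needs one row per level")
    if any(len(row) != len(column_suffixes) for row in matrix):
        raise ValueError("matrix needs one column per suffix")
    row_for_level = dict(zip(levels, matrix))
    column_names = [factor_name + suffix for suffix in column_suffixes]
    rows = [list(row_for_level[obs]) for obs in observations]
    return column_names, rows


names, rows = expand_factor(
    "group",
    levels=["A", "B", "C"],
    matrix=[[0, 0], [1, 0], [0, 1]],
    column_suffixes=["[T.B]", "[T.C]"],
    observations=["A", "C", "B", "C"],
)
print(names)  # ['group[T.B]', 'group[T.C]']
print(rows)   # [[0, 0], [0, 1], [1, 0], [0, 1]]
```

Each row of the coding matrix belongs to one level, so an observation contributes exactly the row of its level to the design matrix.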

### Treatment Contrasts (Dummy Coding)

The default contrast coding scheme, comparing each level to a reference level.

```python { .api }
class Treatment:
    """
    Treatment coding (dummy coding) - the default contrast scheme.

    For reduced-rank coding, one level is the reference (represented by intercept),
    and each column represents the difference between a level and the reference.
    For full-rank coding, classic dummy coding with each level having its own column.
    """
    def __init__(self, reference=None):
        """
        Parameters:
        - reference: Level to use as reference (default: first level)
        """
```

#### Usage Examples

```python
import patsy
from patsy import Treatment
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'y': [1, 2, 3, 1.5, 2.5, 3.5]
})

# Default treatment contrasts (first level as reference)
y, X = patsy.dmatrices("y ~ C(group)", data)
print(X.design_info.column_names)  # ['Intercept', 'C(group)[T.B]', 'C(group)[T.C]']

# Specify reference level; column names carry the full C(...) expression
y, X = patsy.dmatrices("y ~ C(group, Treatment(reference='B'))", data)
print(X.design_info.column_names)
# ['Intercept', "C(group, Treatment(reference='B'))[T.A]", "C(group, Treatment(reference='B'))[T.C]"]
```
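
The coding behind these columns can be written out by hand. The sketch below builds the reduced-rank treatment matrix in plain Python; `treatment_matrix` is a hypothetical helper for illustration, not a Patsy function, and it assumes the reference level is simply dropped from the columns.

```python
# Hypothetical helper for illustration; Patsy builds this matrix internally.
def treatment_matrix(levels, reference=None):
    """Reduced-rank treatment coding: one indicator column per non-reference level."""
    ref = levels[0] if reference is None else reference
    if ref not in levels:
        raise ValueError(f"unknown reference level: {ref!r}")
    kept = [lvl for lvl in levels if lvl != ref]
    matrix = [[1 if lvl == col else 0 for col in kept] for lvl in levels]
    suffixes = [f"[T.{lvl}]" for lvl in kept]
    return matrix, suffixes


matrix, suffixes = treatment_matrix(["A", "B", "C"], reference="B")
print(suffixes)  # ['[T.A]', '[T.C]']
for level, row in zip(["A", "B", "C"], matrix):
    print(level, row)
# A [1, 0]
# B [0, 0]
# C [0, 1]
```

The reference level gets an all-zero row, so its mean is carried by the intercept and every other coefficient reads as a difference from it.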

### Sum-to-Zero Contrasts (Deviation Coding)

Compares each level to the grand mean, with coefficients that sum to zero.

```python { .api }
class Sum:
    """
    Deviation coding (sum-to-zero coding).

    Compares the mean of each level to the mean-of-means (overall mean in balanced designs).
    Coefficients sum to zero, making interpretation relative to the grand mean.
    """
    def __init__(self, omit=None):
        """
        Parameters:
        - omit: Level to omit to avoid redundancy (default: last level)
        """
```

#### Usage Examples

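```python
import patsy
from patsy import Sum
import pandas as pd

# Same example data as in the treatment-coding example
data = pd.DataFrame({
    'group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'y': [1, 2, 3, 1.5, 2.5, 3.5]
})

# Sum-to-zero contrasts (last level omitted by default)
y, X = patsy.dmatrices("y ~ C(group, Sum)", data)
print(X.design_info.column_names)  # ['Intercept', 'C(group, Sum)[S.A]', 'C(group, Sum)[S.B]']

# Specify which level to omit
y, X = patsy.dmatrices("y ~ C(group, Sum(omit='A'))", data)
```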
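
As a rough illustration of what "sum to zero" means at the matrix level, the sketch below constructs the reduced-rank deviation coding in plain Python. `sum_matrix` is a hypothetical helper, not a Patsy function; it assumes the omitted level is coded as -1 in every column, which is the textbook form of this scheme.

```python
# Hypothetical helper for illustration; not Patsy's implementation.
def sum_matrix(levels, omit=None):
    """Reduced-rank sum-to-zero coding with one omitted level."""
    omitted = levels[-1] if omit is None else omit
    if omitted not in levels:
        raise ValueError(f"unknown level: {omitted!r}")
    kept = [lvl for lvl in levels if lvl != omitted]
    matrix = []
    for lvl in levels:
        if lvl == omitted:
            # The omitted level is the negative sum of all the others.
            matrix.append([-1] * len(kept))
        else:
            matrix.append([1 if lvl == col else 0 for col in kept])
    suffixes = [f"[S.{lvl}]" for lvl in kept]
    return matrix, suffixes


matrix, suffixes = sum_matrix(["A", "B", "C"])
print(suffixes)  # ['[S.A]', '[S.B]']
print(matrix)    # [[1, 0], [0, 1], [-1, -1]]

column_sums = [sum(col) for col in zip(*matrix)]
print(column_sums)  # [0, 0]
```

Because every column sums to zero, the effect implied for the omitted level is minus the sum of the fitted coefficients, so the level effects add up to zero around the intercept.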

### Helmert Contrasts

Compares each level with the average of all preceding levels.

```python { .api }
class Helmert:
    """
    Helmert contrasts.

    Compares the second level with the first, the third with the average of
    the first two, and so on. Useful for ordered factors.

    Warning: Multiple definitions of 'Helmert coding' exist. Verify this matches
    your expected interpretation.
    """
```

#### Usage Examples

```python
import patsy
from patsy import Helmert
import pandas as pd

# Helmert contrasts for ordered factors
data = pd.DataFrame({
    'dose': ['low', 'medium', 'high', 'low', 'medium', 'high'],
    'response': [1, 2, 4, 1.2, 2.1, 3.8]
})

# levels= fixes the ordering; without it, string levels are sorted alphabetically
y, X = patsy.dmatrices("response ~ C(dose, Helmert, levels=['low', 'medium', 'high'])", data)
print(X.design_info.column_names)
```
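
Since several definitions of Helmert coding are in circulation, it can help to write down the variant described in the docstring above. The sketch below is a plain-Python construction under the assumption that column `j` contrasts level `j + 1` with all preceding levels; `helmert_matrix` is a hypothetical helper, and the result should be compared against the actual design matrix before relying on it.

```python
# Hypothetical helper for illustration; one common Helmert convention.
def helmert_matrix(n_levels):
    """Column j puts -1 on levels 0..j and j + 1 on level j + 1."""
    matrix = [[0] * (n_levels - 1) for _ in range(n_levels)]
    for col in range(n_levels - 1):
        for row in range(col + 1):
            matrix[row][col] = -1
        matrix[col + 1][col] = col + 1
    return matrix


matrix = helmert_matrix(4)
for row in matrix:
    print(row)
# [-1, -1, -1]
# [1, -1, -1]
# [0, 2, -1]
# [0, 0, 3]

# Every column sums to zero and the columns are mutually orthogonal.
columns = list(zip(*matrix))
print([sum(col) for col in columns])  # [0, 0, 0]
print(sum(a * b for a, b in zip(columns[0], columns[1])))  # 0
```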

### Polynomial Contrasts

Treats categorical levels as ordered samples for polynomial trend analysis.

```python { .api }
class Poly:
    """
    Orthogonal polynomial contrast coding.

    Treats levels as ordered samples from an underlying continuous scale,
    decomposing effects into linear, quadratic, cubic, etc. components.
    Useful for ordered factors with potentially nonlinear relationships.
    """
```

#### Usage Examples

```python
import patsy
from patsy import Poly
import pandas as pd

# Polynomial contrasts for dose-response analysis
data = pd.DataFrame({
    'dose': [1, 2, 3, 4, 1, 2, 3, 4],  # Numeric levels
    'response': [1, 1.8, 3.2, 4.5, 1.1, 1.9, 3.1, 4.6]
})

y, X = patsy.dmatrices("response ~ C(dose, Poly)", data)
print(X.design_info.column_names)  # Linear, quadratic, cubic terms
```
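
The idea of decomposing an ordered factor into orthogonal trend components can be sketched without any library. The example below orthogonalizes powers of centered level scores with Gram-Schmidt; it assumes equally spaced levels scored 0, 1, 2, ..., and `poly_contrasts` is a hypothetical helper rather than Patsy's own routine, so values may differ from Patsy's output in scaling or sign.

```python
import math


# Hypothetical helper for illustration; assumes equally spaced levels.
def poly_contrasts(n_levels):
    """Return orthonormal trend columns (linear, quadratic, ...) as lists."""
    mean = (n_levels - 1) / 2
    scores = [i - mean for i in range(n_levels)]
    # The constant column is only used to orthogonalize against.
    basis = [[1.0] * n_levels]
    for degree in range(1, n_levels):
        col = [s ** degree for s in scores]
        for prev in basis:
            weight = sum(c * p for c, p in zip(col, prev)) / sum(p * p for p in prev)
            col = [c - weight * p for c, p in zip(col, prev)]
        norm = math.sqrt(sum(c * c for c in col))
        basis.append([c / norm for c in col])
    return basis[1:]  # drop the constant column


linear, quadratic = poly_contrasts(3)
print([round(v, 3) for v in linear])     # [-0.707, 0.0, 0.707]
print([round(v, 3) for v in quadratic])  # [0.408, -0.816, 0.408]
```

The linear column rises steadily across the levels, while the quadratic column weights the middle level against the two ends; the two are orthogonal, so each coefficient captures a separate trend shape.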

### Difference Contrasts (Backward Difference)

Compares each level with the immediately preceding level, useful for ordered factors.

```python { .api }
class Diff:
    """
    Backward difference coding.

    Compares each level with the preceding level: second minus first,
    third minus second, etc. Useful for ordered factors to examine
    step-wise changes between adjacent levels.
    """
```

#### Usage Examples

```python
import patsy
from patsy import Diff
import pandas as pd

# Difference contrasts for time periods
data = pd.DataFrame({
    'period': ['pre', 'during', 'post', 'pre', 'during', 'post'],
    'measurement': [10, 15, 12, 9, 16, 13]
})

y, X = patsy.dmatrices("measurement ~ C(period, Diff, levels=['pre', 'during', 'post'])", data)
# Two contrast columns encode the step-wise differences (during - pre, post - during)
print(X.design_info.column_names)
```
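
Why such a coding yields adjacent differences can be checked by hand. The sketch below builds the textbook backward difference matrix with exact fractions; `backward_diff_matrix` is a hypothetical helper, not a Patsy function, and it is assumed (not verified here) that Patsy's `Diff` uses the same values.

```python
from fractions import Fraction


# Hypothetical helper for illustration; textbook backward difference coding.
def backward_diff_matrix(n_levels):
    """Row = level, column j = contrast between level j + 1 and level j."""
    k = n_levels
    matrix = []
    for row in range(k):
        matrix.append([
            Fraction(col + 1, k) if row > col else Fraction(-(k - col - 1), k)
            for col in range(k - 1)
        ])
    return matrix


D = backward_diff_matrix(3)
for row in D:
    print([str(v) for v in row])
# ['-2/3', '-1/3']
# ['1/3', '-1/3']
# ['1/3', '2/3']

# Moving from one level to the next changes exactly one column by 1,
# so coefficient j equals the change in fitted mean from level j to level j + 1.
steps = [[b - a for a, b in zip(D[i], D[i + 1])] for i in range(len(D) - 1)]
print(steps == [[1, 0], [0, 1]])  # True
```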

## Contrast Coding Concepts

### Full-Rank vs Reduced-Rank Coding

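- **Reduced-rank coding**: Used when the model includes an intercept term; one level is omitted (k levels yield k - 1 columns) to avoid multicollinearity
- **Full-rank coding**: Used when the model has no intercept; every level gets its own column (k levels yield k columns), useful for certain modeling approaches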
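
The distinction can be illustrated with plain indicator columns. The sketch below uses made-up observations and treatment-style indicators; it is an illustration of the concept, not of how Patsy decides which coding to apply.

```python
levels = ["A", "B", "C"]
observations = ["A", "B", "C", "A", "B", "C"]

# Full-rank coding: one indicator column per level, no intercept.
full = [[1 if obs == lvl else 0 for lvl in levels] for obs in observations]

# Reduced-rank coding: intercept plus indicators for all but the first level.
reduced = [[1] + [1 if obs == lvl else 0 for lvl in levels[1:]] for obs in observations]

# Adding an intercept to the full-rank columns would be redundant:
# in every row the indicators already sum to 1, i.e. to the intercept column.
print(all(sum(row) == 1 for row in full))  # True

# Both designs use three columns for three levels.
print(len(full[0]), len(reduced[0]))  # 3 3
```

Both designs describe the same three group means; they differ only in whether one column is the intercept, which is why an intercept plus all three indicators would be collinear.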

### Choosing Contrast Schemes

| Contrast Type | Best For | Interpretation |
|---------------|----------|----------------|
| Treatment | General categorical factors | Difference from reference level |
| Sum | Balanced designs, ANOVA-style analysis | Deviation from grand mean |
| Helmert | Ordered factors, progressive comparisons | Cumulative effects |
| Polynomial | Ordered factors, trend analysis | Linear, quadratic, cubic trends |
| Diff | Ordered factors, adjacent comparisons | Step-wise changes |

### Custom Contrast Matrices

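```python
import numpy as np
import patsy
from patsy import ContrastMatrix

# Create a custom contrast matrix: one row per level, one column per contrast
custom_matrix = np.array([[1, 0], [0, 1], [-1, -1]])
custom_contrasts = ContrastMatrix(custom_matrix, ["[custom.1]", "[custom.2]"])

# Use in a formula by passing the object as the contrast argument to C()
data = {'group': ['A', 'B', 'C', 'A', 'B', 'C']}
X = patsy.dmatrix("C(group, custom_contrasts)", data)
print(X.design_info.column_names)
```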

## Integration with Categorical Variables

A contrast scheme is passed as the second argument to `C()` and can be combined with other categorical options such as `levels`:

```python
import patsy
from patsy import C, Treatment, Sum

data = {'factor': ['A', 'B', 'C'] * 10, 'y': range(30)}

# Combine C() with contrast specification
designs = [
    patsy.dmatrix("C(factor, Treatment)", data),
    patsy.dmatrix("C(factor, Sum)", data),
    patsy.dmatrix("C(factor, levels=['C', 'B', 'A'])", data)  # Custom ordering
]
```