tessl/pypi-patsy

A Python package for describing statistical models and for building design matrices.

Contrast Coding

Name: tessl/pypi-patsy
Author: tessl

Classes implementing different contrast coding schemes for categorical variables. These coding schemes determine how categorical factors are represented in design matrices, affecting the interpretation of model coefficients.

Capabilities

Contrast Matrix Base Class

The foundation class for all contrast coding schemes, containing the actual coding matrix and column naming information.

class ContrastMatrix:
    """
    Container for a matrix used for coding categorical factors.
    
    Attributes:
    - matrix: 2d ndarray where each column corresponds to one design matrix column
             and each row contains entries for a single categorical level
    - column_suffixes: List of strings appended to factor names for column names
    """
    def __init__(self, matrix, column_suffixes):
        """
        Create a contrast matrix.
        
        Parameters:
        - matrix: 2d array-like coding matrix
        - column_suffixes: List of suffix strings for column naming
        """

Treatment Contrasts (Dummy Coding)

The default contrast coding scheme, comparing each level to a reference level.

class Treatment:
    """
    Treatment coding (dummy coding) - the default contrast scheme.
    
    For reduced-rank coding, one level is the reference (represented by intercept),
    and each column represents the difference between a level and the reference.
    For full-rank coding, classic dummy coding with each level having its own column.
    """
    def __init__(self, reference=None):
        """
        Parameters:
        - reference: Level to use as reference (default: first level)
        """

Usage Examples

import patsy
from patsy import Treatment
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'y': [1, 2, 3, 1.5, 2.5, 3.5]
})

# Default treatment contrasts (first level as reference)
y, X = patsy.dmatrices("y ~ C(group)", data)
print(X.design_info.column_names)  # ['Intercept', 'C(group)[T.B]', 'C(group)[T.C]']

# Specify reference level
y, X = patsy.dmatrices("y ~ C(group, Treatment(reference='B'))", data)
print(X.design_info.column_names)
# ['Intercept', "C(group, Treatment(reference='B'))[T.A]", "C(group, Treatment(reference='B'))[T.C]"]
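To see the coding matrices behind these column names, the contrast object can be asked for them directly. This is a minimal sketch that assumes the code_without_intercept and code_with_intercept methods of Patsy's built-in contrast classes (reduced-rank and full-rank coding respectively); the level names are illustrative.

```python
import numpy as np
from patsy import Treatment

levels = ["A", "B", "C"]

# Reduced-rank coding: the reference level (first by default) gets no column.
reduced = Treatment().code_without_intercept(levels)
# Full-rank coding: one indicator column per level.
full = Treatment().code_with_intercept(levels)
# Reduced-rank coding with 'B' as the reference level.
ref_b = Treatment(reference="B").code_without_intercept(levels)

for name, cm in [("reduced", reduced), ("full", full), ("ref_b", ref_b)]:
    print(name, cm.column_suffixes)
    print(cm.matrix)
```

In each reduced-rank matrix the row belonging to the reference level is all zeros, which is why its mean is carried by the intercept.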

Sum-to-Zero Contrasts (Deviation Coding)

Compares each level to the grand mean, with coefficients that sum to zero.

class Sum:
    """
    Deviation coding (sum-to-zero coding).
    
    Compares the mean of each level to the mean-of-means (overall mean in balanced designs).
    Coefficients sum to zero, making interpretation relative to the grand mean.
    """
    def __init__(self, omit=None):
        """
        Parameters:
        - omit: Level to omit to avoid redundancy (default: last level)
        """

Usage Examples

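import patsy
from patsy import Sum
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'y': [1, 2, 3, 1.5, 2.5, 3.5]
})

# Sum-to-zero contrasts (last level omitted by default)
y, X = patsy.dmatrices("y ~ C(group, Sum)", data)
print(X.design_info.column_names)  # ['Intercept', 'C(group, Sum)[S.A]', 'C(group, Sum)[S.B]']

# Specify which level to omit
y, X = patsy.dmatrices("y ~ C(group, Sum(omit='A'))", data)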
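The effect of omit is easiest to see on the coding matrix itself. The sketch below assumes the code_without_intercept method of Patsy's contrast classes and uses illustrative level names; the omitted level is the one whose row is filled with -1.

```python
import numpy as np
from patsy import Sum

levels = ["A", "B", "C"]

# Default: the last level ('C') is omitted and coded as -1 in every column.
default = Sum().code_without_intercept(levels)
# Omitting 'A' instead moves the row of -1 entries to the first level.
omit_a = Sum(omit="A").code_without_intercept(levels)

print(default.column_suffixes)
print(default.matrix)
print(omit_a.column_suffixes)
print(omit_a.matrix)

# Every column sums to zero, which is what gives the scheme its name.
print(default.matrix.sum(axis=0), omit_a.matrix.sum(axis=0))
```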

Helmert Contrasts

Compares each level with the average of all preceding levels.

class Helmert:
    """
    Helmert contrasts.
    
    Compares the second level with the first, the third with the average of
    the first two, and so on. Useful for ordered factors.
    
    Warning: Multiple definitions of 'Helmert coding' exist. Verify this matches
    your expected interpretation.
    """

Usage Examples

import patsy
from patsy import Helmert
import pandas as pd

# Helmert contrasts for ordered factors
data = pd.DataFrame({
    'dose': ['low', 'medium', 'high', 'low', 'medium', 'high'],
    'response': [1, 2, 4, 1.2, 2.1, 3.8]
})

y, X = patsy.dmatrices("response ~ C(dose, Helmert, levels=['low', 'medium', 'high'])", data)
print(X.design_info.column_names)
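Because several conventions go by the name "Helmert", it can help to print the matrix Patsy actually uses before interpreting coefficients. This sketch assumes the code_without_intercept method of the built-in contrast classes; the orthogonality check is an extra illustration, not something the text above requires.

```python
import numpy as np
from patsy import Helmert

levels = ["low", "medium", "high"]
cm = Helmert().code_without_intercept(levels)

print(cm.column_suffixes)
print(cm.matrix)

# The Gram matrix shows whether the contrast columns are mutually orthogonal
# (off-diagonal entries equal to zero).
gram = cm.matrix.T @ cm.matrix
print(gram)
```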

Polynomial Contrasts

Treats categorical levels as ordered samples for polynomial trend analysis.

class Poly:
    """
    Orthogonal polynomial contrast coding.
    
    Treats levels as ordered samples from an underlying continuous scale,
    decomposing effects into linear, quadratic, cubic, etc. components.
    Useful for ordered factors with potentially nonlinear relationships.
    """

Usage Examples

import patsy
from patsy import Poly
import pandas as pd

# Polynomial contrasts for dose-response analysis
data = pd.DataFrame({
    'dose': [1, 2, 3, 4, 1, 2, 3, 4],  # Numeric levels
    'response': [1, 1.8, 3.2, 4.5, 1.1, 1.9, 3.1, 4.6]
})

y, X = patsy.dmatrices("response ~ C(dose, Poly)", data)
print(X.design_info.column_names)  # Linear, quadratic, cubic terms
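The "orthogonal" part of the description can be checked numerically. The following sketch assumes the code_without_intercept method of Patsy's contrast classes and default, equally spaced scores for the four levels.

```python
import numpy as np
from patsy import Poly

levels = [1, 2, 3, 4]
cm = Poly().code_without_intercept(levels)

print(cm.column_suffixes)
print(np.round(cm.matrix, 4))

# Orthonormal columns give an identity Gram matrix; columns that sum to zero
# are also orthogonal to the intercept.
gram = cm.matrix.T @ cm.matrix
column_sums = cm.matrix.sum(axis=0)
print(np.round(gram, 10))
print(np.round(column_sums, 10))
```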

Difference Contrasts (Backward Difference)

Compares each level with the immediately preceding level, useful for ordered factors.

class Diff:
    """
    Backward difference coding.
    
    Compares each level with the preceding level: second minus first,
    third minus second, etc. Useful for ordered factors to examine
    step-wise changes between adjacent levels.
    """

Usage Examples

import patsy
from patsy import Diff
import pandas as pd

# Difference contrasts for time periods
data = pd.DataFrame({
    'period': ['pre', 'during', 'post', 'pre', 'during', 'post'],
    'measurement': [10, 15, 12, 9, 16, 13]
})

y, X = patsy.dmatrices("measurement ~ C(period, Diff, levels=['pre', 'during', 'post'])", data)
print(X.design_info.column_names)  # Shows differences: during-pre, post-during
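To confirm that the coefficients really are adjacent differences, the design matrix can be fitted by ordinary least squares. This is a sketch only: it uses NumPy's lstsq as a stand-in for whatever modeling library would be used in practice, and the variable names are chosen for this example.

```python
import numpy as np
import patsy

data = {
    "period": ["pre", "during", "post", "pre", "during", "post"],
    "measurement": [10, 15, 12, 9, 16, 13],
}

y, X = patsy.dmatrices(
    "measurement ~ C(period, Diff, levels=['pre', 'during', 'post'])", data
)

# Ordinary least squares on the design matrix.
beta, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
intercept, during_minus_pre, post_minus_during = beta.ravel()

# Level means: pre = 9.5, during = 15.5, post = 12.5
print(intercept)          # mean of the three level means
print(during_minus_pre)   # during mean minus pre mean
print(post_minus_during)  # post mean minus during mean
```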

Contrast Coding Concepts

Full-Rank vs Reduced-Rank Coding

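Reduced-rank coding: Used when the model already contains an intercept (or a lower-order term that covers the factor); one level is dropped to avoid perfect multicollinearity
Full-rank coding: Used when there is no intercept; every level gets its own column
Patsy chooses between the two automatically, based on the other terms in the formula.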
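The two cases can be compared by building the same factor with and without an intercept. A minimal sketch, using a made-up three-level factor:

```python
import numpy as np
import patsy

data = {"group": ["A", "B", "C", "A", "B", "C"]}

# Intercept present: the factor is coded reduced-rank (two columns).
with_intercept = patsy.dmatrix("C(group)", data)
# Intercept removed with "0 +": the factor is coded full-rank (three columns).
without_intercept = patsy.dmatrix("0 + C(group)", data)

print(with_intercept.design_info.column_names)
# ['Intercept', 'C(group)[T.B]', 'C(group)[T.C]']
print(without_intercept.design_info.column_names)
# ['C(group)[A]', 'C(group)[B]', 'C(group)[C]']

# Both designs have the same rank, so they describe the same model.
rank_with = np.linalg.matrix_rank(np.asarray(with_intercept))
rank_without = np.linalg.matrix_rank(np.asarray(without_intercept))
print(rank_with, rank_without)
```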

Choosing Contrast Schemes

Contrast Type	Best For	Interpretation
Treatment	General categorical factors	Difference from reference level
Sum	Balanced designs, ANOVA-style analysis	Deviation from grand mean
Helmert	Ordered factors, progressive comparisons	Each level vs. the mean of the preceding levels
Poly	Ordered factors, trend analysis	Linear, quadratic, cubic trends
Diff	Ordered factors, adjacent comparisons	Step-wise changes
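Whichever scheme is chosen, the fitted values of a one-factor model stay the same; only the meaning of the individual coefficients changes. The sketch below illustrates this with a small made-up dataset and NumPy's least-squares solver, which stands in here for a real modeling library.

```python
import numpy as np
import patsy

data = {
    "group": ["A", "B", "C", "A", "B", "C"],
    "y": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
}

fitted = {}
for scheme in ["Treatment", "Sum", "Helmert", "Diff"]:
    y, X = patsy.dmatrices(f"y ~ C(group, {scheme})", data)
    X_arr, y_arr = np.asarray(X), np.asarray(y)
    beta, *_ = np.linalg.lstsq(X_arr, y_arr, rcond=None)
    fitted[scheme] = (X_arr @ beta).ravel()
    # Coefficients differ from scheme to scheme.
    print(scheme, np.round(beta.ravel(), 3))

# Fitted values are the group means under every scheme.
for scheme, values in fitted.items():
    print(scheme, np.round(values, 3))
```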

Custom Contrast Matrices

import numpy as np
from patsy import ContrastMatrix

# Create custom contrast matrix
custom_matrix = np.array([[1, 0], [0, 1], [-1, -1]])
custom_contrasts = ContrastMatrix(custom_matrix, ["[custom.1]", "[custom.2]"])

# Pass the object as the second argument to C() in a formula,
# e.g. "C(group, custom_contrasts)"
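A custom matrix of this kind can be used in a formula by referring to the variable inside C(). The sketch below assumes the ContrastMatrix object is visible in the scope that calls dmatrix, since Patsy resolves formula names from the calling environment; the factor name group and its levels are made up for the example.

```python
import numpy as np
import patsy
from patsy import ContrastMatrix

# One row per level (A, B, C), one column per design-matrix column.
custom_contrasts = ContrastMatrix(
    np.array([[1, 0], [0, 1], [-1, -1]]),
    ["[custom.1]", "[custom.2]"],
)

data = {"group": ["A", "B", "C"]}

# Patsy looks up `custom_contrasts` in the calling environment.
X = patsy.dmatrix("C(group, custom_contrasts)", data)

print(X.design_info.column_names)
print(np.asarray(X))
```

The matrix must have exactly one row per level of the factor, in level order.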

Integration with Categorical Variables

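A contrast scheme is selected by passing it as the second argument to C() inside a formula; the levels argument controls level order and therefore the default reference level: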

import patsy
from patsy import C, Treatment, Sum

data = {'factor': ['A', 'B', 'C'] * 10, 'y': range(30)}

# Combine C() with contrast specification
designs = [
    patsy.dmatrix("C(factor, Treatment)", data),
    patsy.dmatrix("C(factor, Sum)", data),
    patsy.dmatrix("C(factor, levels=['C', 'B', 'A'])", data)  # Custom ordering
]
