tessl/pypi-patsy

A Python package for describing statistical models and for building design matrices.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/patsy@1.0.x

To install, run

npx @tessl/cli install tessl/pypi-patsy@1.0.0

# Patsy

Patsy is a Python package for describing statistical models (especially linear models or models with linear components) and building design matrices. It brings R-style formulas to Python, so a model can be specified with a string such as `"y ~ x + I(x**2)"`. Patsy turns data into design matrices suitable for statistical analysis and handles categorical variables, interactions, transformations, and spline bases.

## Package Information

- **Package Name**: patsy
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install patsy`

## Core Imports

```python
import patsy
```

The most common pattern imports the high-level functions directly:

```python
from patsy import dmatrix, dmatrices, C
```

## Basic Usage

```python
import patsy
import pandas as pd
import numpy as np

# Create some sample data
data = pd.DataFrame({
    'y': [1, 2, 3, 4, 5, 6],
    'x1': [1, 2, 3, 4, 5, 6],
    'x2': [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
    'group': ['A', 'A', 'B', 'B', 'C', 'C']
})

# Build a single design matrix (predictors only)
design_matrix = patsy.dmatrix("x1 + x2 + C(group)", data)
print(design_matrix)

# Build both outcome and predictor matrices
y, X = patsy.dmatrices("y ~ x1 + x2 + C(group)", data)
print("Outcome:", y)
print("Predictors:", X)

# Using interactions and transformations
design_matrix = patsy.dmatrix("x1 + I(x1**2) + x1:x2", data)
print(design_matrix)
```

## Architecture

Patsy is built around several key architectural components:

- **Formula Language**: R-style formulas describing model structure
- **Term System**: Internal representation of model terms and their relationships
- **Factor System**: Evaluation and encoding of individual variables
- **Design Matrix Builders**: Objects that construct design matrices from data
- **Transform System**: Stateful transformations for centering, scaling, and custom operations
- **Categorical Handling**: Automatic detection and coding of categorical variables

This design enables flexible model specification while providing efficient matrix construction for statistical computing.
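
As a rough illustration of how these components fit together, the sketch below parses a formula into terms without any data, then builds a matrix and inspects the metadata attached to it. The variable names and data values are made up for the example.

```python
from patsy import ModelDesc, dmatrix

# Formula language -> term system: parse a formula without touching any data.
desc = ModelDesc.from_formula("y ~ x1 + x1:x2")
rhs_names = [term.name() for term in desc.rhs_termlist]
print(rhs_names)

# Factor system + design matrix builder: evaluate the factors against data.
# The data below is purely illustrative.
data = {"x1": [1.0, 2.0, 3.0], "x2": [0.5, 1.5, 2.5]}
mat = dmatrix("x1 + x1:x2", data)

# The resulting matrix carries a DesignInfo object describing its columns.
info = mat.design_info
print(info.column_names)
print(info.term_names)
```

Parsing and building are separate steps, which is why a formula can be checked for structure before any data is available.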

## Capabilities

### High-Level Interface

The main entry points for creating design matrices from formula strings. These functions handle the complete workflow from formula parsing to matrix construction.

```python { .api }
def dmatrix(formula_like, data={}, eval_env=0, NA_action="drop", return_type="matrix"): ...
def dmatrices(formula_like, data={}, eval_env=0, NA_action="drop", return_type="matrix"): ...
def incr_dbuilder(formula_like, data_iter_maker, eval_env=0, NA_action="drop"): ...
def incr_dbuilders(formula_like, data_iter_maker, eval_env=0, NA_action="drop"): ...
```

[High-Level Interface](./high-level.md)
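
A minimal sketch of two options that the signatures above hint at: `return_type="dataframe"` for labelled pandas output, and `incr_dbuilder` for data that arrives in chunks. The data frame and the `iter_chunks` helper are hypothetical, chosen only for illustration.

```python
import pandas as pd
from patsy import dmatrices, incr_dbuilder, build_design_matrices

# Illustrative data only.
data = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0],
    "x": [0.1, 0.2, 0.3, 0.4],
    "g": ["a", "b", "a", "b"],
})

# return_type="dataframe" yields pandas DataFrames with labelled columns.
y, X = dmatrices("y ~ x + g", data, return_type="dataframe")
print(list(X.columns))

# Incremental building: data_iter_maker is a zero-argument callable that
# returns a fresh iterator over data chunks each time it is called.
def iter_chunks():
    yield data.iloc[:2]
    yield data.iloc[2:]

design_info = incr_dbuilder("x + g", iter_chunks)

# The resulting design information can then be applied to any chunk.
(chunk_matrix,) = build_design_matrices([design_info], data.iloc[:2])
print(chunk_matrix.shape)
```

The incremental form is mainly useful when the full data set does not fit in memory, since categorical levels and transform state are gathered across all chunks before any matrix is built.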

### Categorical Variables

Functions and classes for handling categorical data, including automatic detection, manual specification, and conversion utilities.

```python { .api }
def C(data, contrast=None, levels=None): ...
def guess_categorical(data): ...
def categorical_to_int(data, levels, NA_action, origin=None): ...
class CategoricalSniffer: ...
```

[Categorical Variables](./categorical.md)

### Contrast Coding

Classes implementing different contrast coding schemes for categorical variables, essential for statistical modeling.

```python { .api }
class ContrastMatrix: ...
class Treatment: ...
class Sum: ...
class Helmert: ...
class Poly: ...
class Diff: ...
```

[Contrast Coding](./contrasts.md)
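
To make the difference between schemes concrete, here is a small sketch that codes the same three-level factor with treatment and sum contrasts, and also asks a contrast object for its coding matrix directly. The factor `g` and its levels are assumptions for the example.

```python
import numpy as np
from patsy import dmatrix, Sum

# Illustrative three-level factor.
data = {"g": ["a", "b", "c", "a", "b", "c"]}

# Treatment (dummy) coding with "b" as the reference level.
treat = dmatrix("C(g, Treatment(reference='b'))", data)
print(np.asarray(treat))

# Sum (deviation) coding: the last level is coded as -1 in every column.
summed = dmatrix("C(g, Sum)", data)
print(np.asarray(summed))

# A contrast object can also produce its coding matrix directly.
cm = Sum().code_without_intercept(["a", "b", "c"])
print(cm.matrix)
print(cm.column_suffixes)
```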

### Spline Functions

B-splines and cubic regression splines for modeling non-linear relationships, compatible with R's `bs` and the mgcv package respectively.

```python { .api }
def bs(x, df=None, knots=None, degree=3, include_intercept=False, lower_bound=None, upper_bound=None): ...
def cr(x, df=None, knots=None, lower_bound=None, upper_bound=None, constraints=None): ...
def cc(x, df=None, knots=None, lower_bound=None, upper_bound=None, constraints=None): ...
def te(*args, **kwargs): ...
```

[Spline Functions](./splines.md)
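
A short sketch of spline terms inside a formula, assuming SciPy is installed (Patsy's spline functions rely on it). The input grid is arbitrary; the point is only that `df` controls how many basis columns are produced.

```python
import numpy as np
from patsy import dmatrix

# Arbitrary evaluation grid for illustration.
x = np.linspace(0.0, 1.0, 50)

# Cubic B-spline basis with 4 degrees of freedom; "- 1" drops the intercept.
bs_basis = dmatrix("bs(x, df=4) - 1", {"x": x})
print(bs_basis.shape)

# Natural cubic regression spline basis with 5 degrees of freedom.
cr_basis = dmatrix("cr(x, df=5) - 1", {"x": x})
print(cr_basis.shape)
```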

### Stateful Transforms

Transform functions that memorize state from the original data (for example a mean or standard deviation) and reuse it on new data, useful for centering, standardization, and custom transformations.

```python { .api }
def stateful_transform(class_): ...
def center(x): ...
def standardize(x, center=True, rescale=True, ddof=0): ...
scale = standardize  # alias
```

[Stateful Transforms](./transforms.md)
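
The following sketch illustrates what "stateful" means in practice: `center` memorizes the mean of the original data, and that same mean is applied when the design is rebuilt for new data. The `train` and `new` dictionaries are made-up inputs.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

# Illustrative data: the training mean of x is 2.5.
train = {"x": [1.0, 2.0, 3.0, 4.0]}
new = {"x": [10.0]}

mat = dmatrix("center(x)", train)
centered_train = np.asarray(mat)[:, 1]
print(centered_train)

# Reusing design_info applies the memorized mean instead of recomputing it
# from the new data, so the new value becomes 10.0 - 2.5.
(new_mat,) = build_design_matrices([mat.design_info], new)
centered_new = np.asarray(new_mat)[:, 1]
print(centered_new)
```

Had the transform been recomputed on `new` alone, the single value would have been centered to zero, which is usually not what is wanted when predicting.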

### Design Matrix Building

Lower-level functions for constructing design matrices from parsed terms, providing more control over the matrix building process.

```python { .api }
def design_matrix_builders(termlists, data_iter_maker, eval_env, NA_action="drop"): ...
def build_design_matrices(design_infos, data, NA_action="drop", return_type="matrix", dtype=np.dtype(float)): ...
```

[Design Matrix Building](./matrix-building.md)
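
As a sketch of the lower-level path, the example below does by hand roughly what `dmatrices` does in one call: parse the formula, build design information from an iterator over the data, then construct the matrices. The data and the `iter_data` helper are hypothetical.

```python
from patsy import (
    EvalEnvironment,
    ModelDesc,
    build_design_matrices,
    design_matrix_builders,
)

# Illustrative data only.
data = {"y": [1.0, 2.0, 3.0], "x": [0.0, 1.0, 2.0]}

# Step 1: parse the formula into left- and right-hand term lists.
desc = ModelDesc.from_formula("y ~ x")

# Step 2: gather design information. data_iter_maker must return a fresh
# iterator over the data each time it is called.
def iter_data():
    yield data

lhs_info, rhs_info = design_matrix_builders(
    [desc.lhs_termlist, desc.rhs_termlist],
    iter_data,
    EvalEnvironment.capture(),
)

# Step 3: build the actual matrices from the design information.
y, X = build_design_matrices([lhs_info, rhs_info], data)
print(y.shape, X.shape)
```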

### Built-in Functions

Special functions available in formula namespaces for escaping arithmetic operations and handling variable names with special characters.

```python { .api }
def I(x): ...
def Q(name): ...
```

[Built-in Functions](./builtins.md)
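
A brief sketch of both helpers in one formula: `I()` protects `x**2` so it is evaluated as ordinary arithmetic rather than as a formula operator, and `Q()` looks up a column whose name is not a valid Python identifier. The column name `weight.kg` is invented for the example.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Illustrative data; "weight.kg" cannot be written as a bare name in a formula.
data = pd.DataFrame({"x": [1.0, 2.0, 3.0], "weight.kg": [60.0, 72.5, 81.0]})

mat = dmatrix("x + I(x**2) + Q('weight.kg')", data)
values = np.asarray(mat)
print(mat.design_info.column_names)
print(values)
```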

### Utility Functions

Helper functions for generating test data, creating balanced designs, and other common tasks.

```python { .api }
def balanced(**kwargs): ...  # factor_name=num_levels pairs, plus optional repeat=1
def demo_data(*names, **kwargs): ...  # optional nlevels=2, min_rows=5
class LookupFactor: ...
```

[Utility Functions](./utilities.md)
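
A small sketch of the two data helpers. It relies on the `demo_data` naming convention in which names starting with the letters a to m become categorical factors and names starting with p to z become numeric variables; the factor names below are arbitrary.

```python
from patsy import balanced, demo_data, dmatrix

# A balanced design: every combination of 2 levels of "a" and 3 of "b".
design = balanced(a=2, b=3)
print(design["a"])
print(design["b"])

# Demo data with one categorical factor ("a") and one numeric variable ("x").
data = demo_data("a", "x", nlevels=3, min_rows=6)
mat = dmatrix("a + x", data)
print(mat.design_info.column_names)
```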

## Core Types

```python { .api }
class PatsyError(Exception):
    """Main exception class for Patsy-specific errors."""
    def __init__(self, message, origin=None): ...
    def set_origin(self, origin): ...

class ModelDesc:
    """Describes the overall structure of a statistical model."""
    @classmethod
    def from_formula(cls, tree_or_string): ...

class Term:
    """Represents a term in a statistical model."""
    def __init__(self, factors): ...

class DesignInfo:
    """Information about the structure of a design matrix."""
    def __init__(self, column_names, factor_infos=None, term_codings=None): ...

class DesignMatrix(numpy.ndarray):
    """numpy array subclass with design matrix metadata."""
    @property
    def design_info(self): ...

class LinearConstraint:
    """Class for representing linear constraints on design matrices."""
    def __init__(self, variable_names, coefs, constants=None): ...

class NAAction:
    """Defines strategy for handling missing data."""
    def __init__(self, on_NA="drop", NA_types=["None", "NaN"]): ...
    def is_numerical_NA(self, arr): ...
    def is_categorical_NA(self, obj): ...

class EvalEnvironment:
    """Captures the environment for evaluating formulas."""
    def __init__(self, namespaces, flags=0): ...
    @classmethod
    def capture(cls, eval_env=0, reference=0): ...
    def eval(self, expr, source_name="<string>", inner_namespace={}): ...
    @property
    def namespace(self): ...

class EvalFactor:
    """Factor that evaluates arbitrary Python code in a given environment."""
    def __init__(self, code, origin=None): ...
    def eval(self, memorize_state, data): ...
    def name(self): ...

class Origin:
    """Tracks the origin of objects in strings for error reporting."""
    def __init__(self, code, start, end): ...
    @classmethod
    def combine(cls, origin_objs): ...
    def caretize(self, indent=0): ...
```
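
A minimal sketch of two of these types in everyday use: reading `DesignInfo` from a built matrix, and catching `PatsyError` when a formula cannot be parsed. The data and the deliberately malformed formula are made up for the example.

```python
from patsy import PatsyError, dmatrix

# Illustrative data only.
data = {"x": [1.0, 2.0, 3.0]}

mat = dmatrix("x", data)
info = mat.design_info
print(info.column_names)

# slice() maps a term or column name to its column range in the matrix.
x_slice = info.slice("x")
print(x_slice)

# A malformed formula (trailing operator) raises PatsyError.
message = None
try:
    dmatrix("x +", data)
except PatsyError as err:
    message = str(err)
print(message)
```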

## Constants

```python { .api }
INTERCEPT: Term  # Special constant representing the intercept term
```
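
One way to use this constant, sketched under the assumption that a parsed formula is being inspected, is to check whether a model includes an intercept. The formulas are illustrative.

```python
from patsy import INTERCEPT, ModelDesc

# An intercept is included by default and removed with "- 1".
with_intercept = ModelDesc.from_formula("y ~ x")
without_intercept = ModelDesc.from_formula("y ~ x - 1")

has_default = INTERCEPT in with_intercept.rhs_termlist
has_removed = INTERCEPT in without_intercept.rhs_termlist
print(has_default, has_removed)
```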