A Python package for describing statistical models and for building design matrices.
npx @tessl/cli install tessl/pypi-patsy@1.0.00
# Patsy

A Python package for describing statistical models (especially linear models or models with linear components) and building design matrices. Patsy brings the convenience of R-style formulas to Python: a model is specified with a compact string such as `"y ~ x + I(x**2)"`. The library turns data into design matrices suitable for statistical analysis and handles categorical variables, interactions, transformations, and spline bases.

## Package Information

- **Package Name**: patsy
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install patsy`

## Core Imports

```python
import patsy
```

The most common pattern imports the high-level functions directly:

```python
from patsy import dmatrix, dmatrices, C
```

## Basic Usage

```python
import patsy
import pandas as pd

# Create some sample data
data = pd.DataFrame({
    'y': [1, 2, 3, 4, 5, 6],
    'x1': [1, 2, 3, 4, 5, 6],
    'x2': [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
    'group': ['A', 'A', 'B', 'B', 'C', 'C']
})

# Build a single design matrix (predictors only)
design_matrix = patsy.dmatrix("x1 + x2 + C(group)", data)
print(design_matrix)

# Build both outcome and predictor matrices
y, X = patsy.dmatrices("y ~ x1 + x2 + C(group)", data)
print("Outcome:", y)
print("Predictors:", X)

# Using interactions and transformations
design_matrix = patsy.dmatrix("x1 + I(x1**2) + x1:x2", data)
print(design_matrix)
```

## Architecture

Patsy is built around several key architectural components:

- **Formula Language**: R-style formulas describing model structure
- **Term System**: Internal representation of model terms and their relationships
- **Factor System**: Evaluation and encoding of individual variables
- **Design Matrix Builders**: Objects that construct design matrices from data
- **Transform System**: Stateful transformations for centering, scaling, and custom operations
- **Categorical Handling**: Automatic detection and coding of categorical variables

This design enables flexible model specification while providing efficient matrix construction for statistical computing.

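
As a rough illustration of how these components fit together, the sketch below parses a formula into terms without touching any data, then builds a matrix and inspects the metadata attached to it. The column names and sample values are made up for the example.

```python
import pandas as pd
from patsy import ModelDesc, dmatrix

# Illustrative data; the names "x1" and "group" are arbitrary.
data = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0],
                     "group": ["A", "A", "B", "B"]})

# Formula language -> term system: parsing needs no data.
desc = ModelDesc.from_formula("y ~ x1 + C(group)")
rhs_term_names = [term.name() for term in desc.rhs_termlist]
print(rhs_term_names)

# Factor system + builders: evaluate the right-hand side against the data.
X = dmatrix("x1 + C(group)", data)
column_names = X.design_info.column_names
print(column_names)
```

The term list describes the model abstractly, while `design_info` records how each term was expanded into concrete columns for this particular data set.
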
## Capabilities

### High-Level Interface

The main entry points for creating design matrices from formula strings. These functions handle the complete workflow from formula parsing to matrix construction.

```python { .api }
def dmatrix(formula_like, data={}, eval_env=0, NA_action="drop", return_type="matrix"): ...
def dmatrices(formula_like, data={}, eval_env=0, NA_action="drop", return_type="matrix"): ...
def incr_dbuilder(formula_like, data_iter_maker, eval_env=0, NA_action="drop"): ...
def incr_dbuilders(formula_like, data_iter_maker, eval_env=0, NA_action="drop"): ...
```

[High-Level Interface](./high-level.md)

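
The `return_type` and `NA_action` parameters are easiest to understand by example. The sketch below, using made-up data, asks for pandas output and relies on the default `NA_action="drop"` to discard the row whose outcome is missing.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

# Illustrative data with one missing outcome value.
data = pd.DataFrame({"y": [1.0, 2.0, np.nan, 4.0],
                     "x": [0.5, 1.5, 2.5, 3.5]})

# return_type="dataframe" yields pandas DataFrames instead of DesignMatrix
# objects; the row with the NaN outcome is dropped from both outputs.
y, X = dmatrices("y ~ x", data, return_type="dataframe")

print(list(X.columns))
print(len(y), len(X))
```

Passing `NA_action="raise"` instead would turn the missing value into a `PatsyError` rather than silently shrinking the matrices.
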
### Categorical Variables

Functions and classes for handling categorical data, including automatic detection, manual specification, and conversion utilities.

```python { .api }
def C(data, contrast=None, levels=None): ...
def guess_categorical(data): ...
def categorical_to_int(data, levels, NA_action, origin=None): ...
class CategoricalSniffer: ...
```

[Categorical Variables](./categorical.md)

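
A short sketch of manual specification with `C()`: by default the first level becomes the reference under treatment coding, and passing a `Treatment` contrast with an explicit `reference` changes that. The data here is illustrative.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

data = pd.DataFrame({"group": ["A", "A", "B", "B", "C", "C"]})

# Default treatment coding: level "A" is the reference (no column of its own).
X_default = dmatrix("C(group)", data)
default_names = X_default.design_info.column_names
print(default_names)

# Make "C" the reference level instead.
X_ref_c = dmatrix("C(group, Treatment(reference='C'))", data)
ref_c_rows = np.asarray(X_ref_c)
print(ref_c_rows)
```

Rows belonging to the reference level contain only the intercept, so changing the reference changes which group the other coefficients are compared against.
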
### Contrast Coding

Classes implementing different contrast coding schemes for categorical variables, essential for statistical modeling.

```python { .api }
class ContrastMatrix: ...
class Treatment: ...
class Sum: ...
class Helmert: ...
class Poly: ...
class Diff: ...
```

[Contrast Coding](./contrasts.md)

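
To see what a coding scheme actually produces, a contrast object can be asked for its matrix directly. The sketch below compares treatment and sum-to-zero coding for three arbitrary level names, using the `code_without_intercept` method that patsy's contrast classes provide.

```python
from patsy import Sum, Treatment

levels = ["A", "B", "C"]  # illustrative level names

# Each row corresponds to a level, each column to a generated predictor.
treatment = Treatment().code_without_intercept(levels)
sum_to_zero = Sum().code_without_intercept(levels)

print(treatment.column_suffixes)
print(treatment.matrix)
print(sum_to_zero.column_suffixes)
print(sum_to_zero.matrix)
```

Under treatment coding the reference level is a row of zeros; under sum coding the last level is a row of minus ones, so the coefficients are deviations from the grand mean.
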
### Spline Functions

B-splines and cubic regression splines for modeling non-linear relationships. `bs` is compatible with R's `bs` function; `cr`, `cc`, and `te` are compatible with the R package mgcv.

```python { .api }
def bs(x, df=None, knots=None, degree=3, include_intercept=False, lower_bound=None, upper_bound=None): ...
def cr(x, df=None, knots=None, lower_bound=None, upper_bound=None, constraints=None): ...
def cc(x, df=None, knots=None, lower_bound=None, upper_bound=None, constraints=None): ...
def te(*args, **kwargs): ...
```

[Spline Functions](./splines.md)

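
A minimal sketch of a B-spline basis inside a formula. It assumes SciPy is installed, since patsy's spline code relies on it; the grid of `x` values is arbitrary.

```python
import numpy as np
from patsy import dmatrix

x = np.linspace(0.0, 1.0, 20)  # illustrative predictor values

# df=4 with the default cubic degree gives four basis columns;
# "- 1" removes the model intercept so only the basis remains.
basis = dmatrix("bs(x, df=4) - 1", {"x": x})

print(basis.shape)
print(basis.design_info.column_names)
```

Because `bs` is a stateful transform, the knots chosen from this data are remembered in `basis.design_info` and reused when the same design is applied to new observations.
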
### Stateful Transforms

Transform functions that maintain state across data processing, useful for centering, standardization, and custom transformations.

```python { .api }
def stateful_transform(class_): ...
def center(x): ...
def standardize(x, center=True, rescale=True, ddof=0): ...
def scale(x, center=True, rescale=True, ddof=0): ...  # alias for standardize
```

[Stateful Transforms](./transforms.md)

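
The point of a stateful transform is that quantities learned from the original data (here, a mean) are reused on new data. The following sketch, with made-up numbers, centers a training set and then applies the same centering to a new observation.

```python
import numpy as np
from patsy import build_design_matrices, dmatrix

train = {"x": np.array([1.0, 2.0, 3.0])}  # mean is 2.0
new = {"x": np.array([10.0])}

# center() memorizes the training mean while the matrix is built.
X_train = dmatrix("center(x) - 1", train)

# Reusing the stored design_info applies the memorized mean, not the new one.
(X_new,) = build_design_matrices([X_train.design_info], new)

train_values = np.asarray(X_train).ravel().tolist()
new_values = np.asarray(X_new).ravel().tolist()
print(train_values, new_values)
```

Calling `dmatrix("center(x) - 1", new)` directly would instead center the new data around its own mean, which is rarely what a prediction step wants.
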
### Design Matrix Building

Lower-level functions for constructing design matrices from parsed terms, providing more control over the matrix building process.

```python { .api }
def design_matrix_builders(termlists, data_iter_maker, eval_env, NA_action="drop"): ...
def build_design_matrices(design_infos, data, NA_action="drop", return_type="matrix", dtype=float): ...
```

[Design Matrix Building](./matrix-building.md)

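
The two-step workflow can be sketched as follows: term lists and a data iterator go in, reusable design descriptions come out, and those are then applied to data. The helper name `data_iter_maker` and the sample values are illustrative; the environment is captured explicitly with `EvalEnvironment.capture()`.

```python
import numpy as np
from patsy import (EvalEnvironment, ModelDesc, build_design_matrices,
                   design_matrix_builders)

data = {"y": np.array([1.0, 2.0, 3.0]), "x": np.array([0.0, 1.0, 2.0])}

def data_iter_maker():
    # Called whenever patsy needs a fresh pass over the data;
    # here the whole data set is a single chunk.
    yield data

desc = ModelDesc.from_formula("y ~ x")
eval_env = EvalEnvironment.capture()

# Step 1: inspect the data and produce one design description per term list.
lhs_info, rhs_info = design_matrix_builders(
    [desc.lhs_termlist, desc.rhs_termlist], data_iter_maker, eval_env
)

# Step 2: apply the descriptions to concrete data.
y, X = build_design_matrices([lhs_info, rhs_info], data)
print(X.design_info.column_names, X.shape)
```

Separating the steps makes it possible to build the description once, for example from chunks of a large file, and apply it repeatedly.
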
### Built-in Functions

Special functions available in formula namespaces for escaping arithmetic operations and handling variable names with special characters.

```python { .api }
def I(x): ...
def Q(name): ...
```

[Built-in Functions](./builtins.md)

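
A brief sketch of both escapes. Inside a formula, `**` is a formula operator rather than arithmetic, so `I()` is needed to square a variable, and `Q()` lets a column whose name is not a valid Python identifier be referenced. The column name `weight.in.kg` is invented for the example.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

data = pd.DataFrame({"x": [1.0, 2.0, 3.0],
                     "weight.in.kg": [60.0, 70.0, 80.0]})

# I(...) evaluates its argument as ordinary Python arithmetic;
# Q(...) looks the variable up by its quoted name.
X = dmatrix("I(x**2) + Q('weight.in.kg')", data)

values = np.asarray(X)
print(X.design_info.column_names)
print(values)
```
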
### Utility Functions

Helper functions for generating test data, creating balanced designs, and other common tasks.

```python { .api }
def balanced(**kwargs): ...  # factor_name=num_levels pairs, optional repeat=1
def demo_data(*names, **kwargs): ...  # optional nlevels=2, min_rows=5
class LookupFactor: ...
```

[Utility Functions](./utilities.md)

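
A small sketch of the two data helpers, using the keyword and positional forms shown in patsy's own documentation (`balanced(a=2, b=3)` and `demo_data("a", "b", "x")`). In `demo_data`, names starting with the letters a through m produce categorical variables and names starting with p through z produce numeric ones.

```python
from patsy import balanced, demo_data

# Every combination of a 2-level factor "a" and a 3-level factor "b".
design = balanced(a=2, b=3)
print(design["a"])
print(design["b"])

# Two categorical variables and one numeric variable, generated
# deterministically so repeated calls give the same values.
demo = demo_data("a", "b", "x")
print(sorted(demo.keys()), len(demo["x"]))
```
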
## Core Types

```python { .api }
class PatsyError(Exception):
    """Main exception class for Patsy-specific errors."""
    def __init__(self, message, origin=None): ...
    def set_origin(self, origin): ...

class ModelDesc:
    """Describes the overall structure of a statistical model."""
    @classmethod
    def from_formula(cls, tree_or_string): ...

class Term:
    """Represents a term in a statistical model."""
    def __init__(self, factors): ...

class DesignInfo:
    """Information about the structure of a design matrix."""
    def __init__(self, column_names, factor_infos=None, term_codings=None): ...

class DesignMatrix(numpy.ndarray):
    """numpy array subclass with design matrix metadata."""
    @property
    def design_info(self): ...

class LinearConstraint:
    """Class for representing linear constraints on design matrices."""
    def __init__(self, variable_names, coefs, constants=None): ...

class NAAction:
    """Defines strategy for handling missing data."""
    def __init__(self, on_NA="drop", NA_types=["None", "NaN"]): ...
    def is_numerical_NA(self, array): ...
    def is_categorical_NA(self, array): ...

class EvalEnvironment:
    """Captures the environment for evaluating formulas."""
    def __init__(self, namespaces, flags=0): ...
    @classmethod
    def capture(cls, eval_env=0, reference=0): ...
    def eval(self, expr, source_name="<string>", inner_namespace={}): ...
    @property
    def namespace(self): ...

class EvalFactor:
    """Factor that evaluates arbitrary Python code in a given environment."""
    def __init__(self, code, origin=None): ...
    def eval(self, memorize_state, data): ...
    def name(self): ...

class Origin:
    """Tracks the origin of objects in strings for error reporting."""
    def __init__(self, code, start, end): ...
    @classmethod
    def combine(cls, origin_objs): ...
    def caretize(self, indent=0): ...
```

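
As a small illustration of how `PatsyError` surfaces in practice, the sketch below feeds a deliberately malformed formula to `dmatrix` and catches the resulting exception. The formula and data are invented for the example.

```python
from patsy import PatsyError, dmatrix

caught = None
try:
    # The formula ends with a dangling "+", so parsing fails.
    dmatrix("x +", {"x": [1.0, 2.0, 3.0]})
except PatsyError as err:
    caught = err

# The message typically includes the formula with the problem spot marked.
print(caught)
```

Catching `PatsyError` specifically, rather than a bare `Exception`, separates formula and data problems reported by patsy from unrelated failures.
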
## Constants

```python { .api }
INTERCEPT: Term  # Special constant representing the intercept term
```
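
One way to use `INTERCEPT` is to check whether a parsed formula includes an intercept. The sketch below compares a formula with the implicit intercept against one where it is removed with `- 1`.

```python
from patsy import INTERCEPT, ModelDesc

with_intercept = ModelDesc.from_formula("y ~ x")
without_intercept = ModelDesc.from_formula("y ~ x - 1")

# Terms compare equal when they contain the same factors, so a plain
# membership test against the term list is enough.
has_intercept = INTERCEPT in with_intercept.rhs_termlist
lacks_intercept = INTERCEPT not in without_intercept.rhs_termlist
print(has_intercept, lacks_intercept)
```
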