# High-Level Interface

These functions are the main entry points for creating design matrices from formula strings. They handle the complete workflow from formula parsing to matrix construction and are the most convenient interface for typical statistical modeling tasks.

## Capabilities

### Single Design Matrix Construction

Constructs a single design matrix from a formula specification, commonly used for creating predictor matrices in regression models.

```python { .api }
def dmatrix(formula_like, data={}, eval_env=0, NA_action="drop", return_type="matrix"):
    """
    Construct a single design matrix given a formula_like and data.

    Parameters:
    - formula_like: Formula string, ModelDesc, DesignInfo, explicit matrix, or object with __patsy_get_model_desc__ method
    - data (dict-like): Dict-like object to look up variables referenced in formula
    - eval_env (int or EvalEnvironment): Environment for variable lookup (0=caller frame, 1=caller's caller, etc.)
    - NA_action (str or NAAction): Strategy for handling missing data ("drop", "raise", or NAAction object)
    - return_type (str): "matrix" for numpy arrays or "dataframe" for pandas DataFrames

    Returns:
    DesignMatrix (numpy.ndarray subclass with metadata) or pandas DataFrame
    """
```

#### Usage Examples

```python
import numpy as np
import patsy

# Simple linear terms
data = {'x': np.array([1, 2, 3, 4]), 'y': np.array([2, 4, 6, 8])}
design = patsy.dmatrix("x", data)

# Polynomial terms with I() function
# (x**2 is evaluated as Python code, so x must be an array, not a plain list)
design = patsy.dmatrix("x + I(x**2)", data)

# Categorical variables
data = {
    'x': np.array([1, 2, 3, 4]),
    'treatment': ['A', 'B', 'A', 'B'],
    'response': [1, 2, 3, 4],
}
design = patsy.dmatrix("C(treatment)", data)

# Interactions (data must contain both x and treatment)
design = patsy.dmatrix("x * C(treatment)", data)
```
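
The `eval_env` parameter decides where names that are missing from `data` are looked up. The following is a minimal sketch of that behaviour, assuming NumPy is installed; `build_from_caller` and `analysis` are hypothetical helpers introduced only for illustration.

```python
import numpy as np
import patsy

# No data argument: x and np are resolved in the calling scope (eval_env=0).
x = np.array([1.0, 2.0, 3.0])
design = patsy.dmatrix("x + np.log(x)")
print(design.design_info.column_names)


def build_from_caller(formula):
    # Hypothetical helper: eval_env=1 skips this function's own frame and
    # resolves names in the scope of whoever called build_from_caller().
    return patsy.dmatrix(formula, eval_env=1)


def analysis():
    z = np.array([10.0, 20.0, 30.0])  # local to analysis()
    return build_from_caller("z")


wrapped = analysis()
print(np.asarray(wrapped))
```

Wrapper functions that forward a formula to `dmatrix` usually want `eval_env=1` (or higher), so that users can keep referring to variables from their own scope.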

### Dual Design Matrix Construction

Constructs both outcome and predictor design matrices from a formula specification, the standard approach for regression modeling.

```python { .api }
def dmatrices(formula_like, data={}, eval_env=0, NA_action="drop", return_type="matrix"):
    """
    Construct two design matrices given a formula_like and data.

    This function requires a two-sided formula (outcome ~ predictors) and returns
    two matrices: the outcome (y) and predictor (X) matrices.

    Parameters:
    - formula_like: Two-sided formula string or equivalent (must specify both outcome and predictors)
    - data (dict-like): Dict-like object to look up variables referenced in formula
    - eval_env (int or EvalEnvironment): Environment for variable lookup
    - NA_action (str or NAAction): Strategy for handling missing data
    - return_type (str): "matrix" for numpy arrays or "dataframe" for pandas DataFrames

    Returns:
    Tuple of (outcome_matrix, predictor_matrix) - both DesignMatrix objects or DataFrames
    """
```

#### Usage Examples

```python
import patsy
import pandas as pd

# Basic regression model
data = pd.DataFrame({
    'y': [1, 2, 3, 4, 5],
    'x1': [1, 2, 3, 4, 5],
    'x2': [2, 4, 6, 8, 10]
})

# Two-sided formula
y, X = patsy.dmatrices("y ~ x1 + x2", data)
print("Outcome shape:", y.shape)
print("Predictors shape:", X.shape)

# More complex model with interactions and transformations
y, X = patsy.dmatrices("y ~ x1 * x2 + I(x1**2)", data)

# Categorical predictors
data = pd.DataFrame({
    'y': [1, 2, 3, 4, 5, 6],
    'x': [1, 2, 3, 4, 5, 6],
    'group': ['A', 'A', 'B', 'B', 'C', 'C']
})
y, X = patsy.dmatrices("y ~ x + C(group)", data)
```
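
To show how the two matrices feed into a regression, here is a small sketch that fits ordinary least squares with `numpy.linalg.lstsq`. The data are made up so that `y = 1 + 2*x` holds exactly; the choice of NumPy's solver is an assumption for illustration, and any fitting routine that accepts arrays could be used instead.

```python
import numpy as np
import patsy

# Synthetic data constructed so that y = 1 + 2*x exactly.
data = {
    "x": np.array([0.0, 1.0, 2.0, 3.0]),
    "y": np.array([1.0, 3.0, 5.0, 7.0]),
}

# y is the outcome matrix; X holds an Intercept column plus x.
y, X = patsy.dmatrices("y ~ x", data)

# Solve the least-squares problem X @ coef = y.
coef, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)

# Pair each coefficient with the column name stored in the matrix metadata.
fitted = dict(zip(X.design_info.column_names, coef.ravel()))
print(fitted)
```

The column names come from `X.design_info`, which is the metadata that distinguishes a `DesignMatrix` from a plain array.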

### Incremental Design Matrix Builders

For large datasets that don't fit in memory, these functions create builders that can process data incrementally.

```python { .api }
def incr_dbuilder(formula_like, data_iter_maker, eval_env=0, NA_action="drop"):
    """
    Construct a design matrix builder incrementally from a large data set.

    Parameters:
    - formula_like: Formula string, ModelDesc, DesignInfo, or object with __patsy_get_model_desc__ method (explicit matrices not allowed)
    - data_iter_maker: Zero-argument callable returning iterator over dict-like data objects
    - eval_env (int or EvalEnvironment): Environment for variable lookup
    - NA_action (str or NAAction): Strategy for handling missing data

    Returns:
    DesignInfo object describing the design matrix; pass it to
    build_design_matrices() to build matrices from data
    (patsy releases before 0.4.0 returned a DesignMatrixBuilder)
    """

def incr_dbuilders(formula_like, data_iter_maker, eval_env=0, NA_action="drop"):
    """
    Construct two design matrix builders incrementally from a large data set.

    This is the incremental version of dmatrices(), for processing large datasets
    that require multiple passes or don't fit in memory.

    Parameters:
    - formula_like: Two-sided formula string or equivalent
    - data_iter_maker: Zero-argument callable returning iterator over dict-like data objects
    - eval_env (int or EvalEnvironment): Environment for variable lookup
    - NA_action (str or NAAction): Strategy for handling missing data

    Returns:
    Tuple of (outcome_builder, predictor_builder) - both DesignInfo objects
    (DesignMatrixBuilder objects in patsy releases before 0.4.0)
    """
```

#### Usage Examples

```python
import numpy as np
import patsy

# Function that returns an iterator over data chunks
def data_chunks():
    # This could read from a database, files, etc.
    for i in range(0, 10000, 1000):
        x = np.arange(i, i + 1000)  # arrays, so that x**2 works inside I()
        yield {'x': x, 'y': 2 * x}

# Build incremental design matrix builder
builder = patsy.incr_dbuilder("x + I(x**2)", data_chunks)

# Use the builder to process new data
new_data = {'x': np.array([1, 2, 3]), 'y': np.array([2, 4, 6])}
(design_matrix,) = patsy.build_design_matrices([builder], new_data)

# For two-sided formulas
y_builder, X_builder = patsy.incr_dbuilders("y ~ x + I(x**2)", data_chunks)
```
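
The incremental interface matters most for stateful transforms, whose parameters must be learned from the whole data set. The sketch below uses patsy's built-in `center()` with two tiny made-up chunks to illustrate that the mean is computed across all chunks; the `chunks` function name and the values are illustrative assumptions.

```python
import numpy as np
import patsy


def chunks():
    # Two chunks whose overall mean is (1 + 2 + 3 + 6) / 4 = 3.
    yield {"x": np.array([1.0, 2.0])}
    yield {"x": np.array([3.0, 6.0])}


# center() memorizes the mean of x over every chunk produced by chunks().
info = patsy.incr_dbuilder("center(x)", chunks)

# New data is centered with the memorized mean, not with its own mean.
(matrix,) = patsy.build_design_matrices([info], {"x": np.array([3.0])})
print(np.asarray(matrix))
```

Because patsy may iterate over the data more than once, `data_iter_maker` has to be a callable that returns a fresh iterator on each call, as the generator function does here.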

## Formula Types

The `formula_like` parameter accepts several types:

- **String formulas**: R-style formula strings like `"y ~ x1 + x2"`
- **ModelDesc objects**: Parsed formula representations
- **DesignInfo objects**: Metadata about matrix structure
- **Explicit matrices**: numpy arrays or pandas DataFrames (`dmatrix` takes a single matrix and `dmatrices` a tuple of two; not accepted by the incremental builders)
- **Objects with a `__patsy_get_model_desc__` method**: Custom formula-like objects
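
As a rough illustration of three of these types, the sketch below builds the same design matrix from a string and from a `ModelDesc`, then passes an explicit array straight through. The data values are made up for the example.

```python
import numpy as np
import patsy

data = {"x": np.array([1.0, 2.0, 3.0])}

# 1. Formula string
from_string = patsy.dmatrix("x", data)

# 2. ModelDesc: the parsed representation of the same formula
desc = patsy.ModelDesc.from_formula("x")
from_desc = patsy.dmatrix(desc, data)

# 3. Explicit matrix: wrapped as a DesignMatrix without any formula parsing
explicit = patsy.dmatrix(np.array([[1.0, 2.0], [3.0, 4.0]]))

print(np.array_equal(np.asarray(from_string), np.asarray(from_desc)))
print(explicit.shape)
```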

## Return Types

Functions support two return types via the `return_type` parameter:

- **"matrix"** (default): Returns DesignMatrix objects (numpy.ndarray subclasses with metadata)
- **"dataframe"**: Returns pandas DataFrames (requires pandas installation)
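
A short sketch comparing the two return types on the same formula, assuming pandas is installed; the data values are illustrative.

```python
import numpy as np
import pandas as pd
import patsy

data = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

# Default: a DesignMatrix, which is a numpy.ndarray subclass.
as_matrix = patsy.dmatrix("x", data)
print(type(as_matrix).__name__, as_matrix.design_info.column_names)

# return_type="dataframe": a pandas DataFrame with named columns.
as_frame = patsy.dmatrix("x", data, return_type="dataframe")
print(type(as_frame).__name__, list(as_frame.columns))
```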

## Missing Data Handling

The `NA_action` parameter controls missing data handling:

- **"drop"** (default): Remove rows with any missing values
- **"raise"**: Raise an exception if missing values are encountered
- **NAAction object**: Custom missing data handling strategy
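
The three options can be compared on a single column containing one `NaN`. This is a minimal sketch with made-up data; the custom `NAAction` shown here, which treats only `None` as missing, is just one possible configuration.

```python
import numpy as np
import patsy

data = {"x": np.array([1.0, np.nan, 3.0])}

# "drop" (default): the row containing NaN is removed.
dropped = patsy.dmatrix("x", data)
print(dropped.shape)

# "raise": missing values trigger a PatsyError instead.
raised = False
try:
    patsy.dmatrix("x", data, NA_action="raise")
except patsy.PatsyError:
    raised = True
print(raised)

# NAAction object: here only None counts as missing, so the NaN row is kept.
keep_nan = patsy.NAAction(on_NA="drop", NA_types=["None"])
kept = patsy.dmatrix("x", data, NA_action=keep_nan)
print(kept.shape)
```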