# Spline Functions

B-splines and cubic regression splines for modeling non-linear relationships in statistical models. Patsy's implementations are compatible with R's `bs` and with mgcv's `cr`, `cc`, and `te` smooths, so smooth terms can be written directly in formulas.

## Capabilities

### B-Splines

Generates a B-spline basis for non-linear curve fitting. A linear model fitted on the basis columns yields a smooth piecewise-polynomial function of `x`.

```python { .api }
def bs(x, df=None, knots=None, degree=3, include_intercept=False, lower_bound=None, upper_bound=None):
    """
    Generate B-spline basis for x, allowing non-linear fits.

    Parameters:
    - x: Array-like data to create spline basis for
    - df (int or None): Number of degrees of freedom (columns in output)
    - knots (array-like or None): Interior knot locations (default: equally spaced quantiles of x)
    - degree (int): Degree of the spline (default: 3 for cubic)
    - include_intercept (bool): Whether basis spans intercept term (default: False)
    - lower_bound (float or None): Lower boundary knot (default: minimum of x)
    - upper_bound (float or None): Upper boundary knot (default: maximum of x)

    Returns:
    2D array with basis functions as columns

    Note: Must specify at least one of df and knots
    """
```

#### Usage Examples

```python
import patsy
import numpy as np
import pandas as pd

# Sample data with non-linear relationship
x = np.linspace(0, 10, 100)
y = 2 * np.sin(x) + np.random.normal(0, 0.1, 100)
data = pd.DataFrame({'x': x, 'y': y})

# Basic B-spline with 4 degrees of freedom
design = patsy.dmatrix("bs(x, df=4)", data)
print(f"B-spline basis shape: {design.shape}")

# B-spline with custom knots; `knots` is not a column of `data`,
# so the formula looks it up in the calling scope
knots = [2, 4, 6, 8]
design = patsy.dmatrix("bs(x, knots=knots)", data)

# Higher degree spline
design = patsy.dmatrix("bs(x, df=6, degree=5)", data)

# Include intercept in basis; "- 1" drops the formula's own intercept,
# which would otherwise be collinear with the basis
design = patsy.dmatrix("bs(x, df=4, include_intercept=True) - 1", data)

# Complete model with B-splines
y_matrix, X_matrix = patsy.dmatrices("y ~ bs(x, df=5)", data)
```
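
To make the roles of `df` and `include_intercept` concrete, the following minimal sketch checks the shape of two bases and the row sums of the intercept-spanning one. It assumes only NumPy and patsy; the variable names are illustrative.

```python
import numpy as np
import patsy

x = np.linspace(0, 10, 100)
data = {"x": x}

# With the formula intercept removed, the matrix has exactly `df` columns.
basis = np.asarray(patsy.dmatrix("bs(x, df=4) - 1", data))

# With include_intercept=True the basis columns add up to 1 in every row,
# so the basis already spans the constant term.
full_basis = np.asarray(
    patsy.dmatrix("bs(x, df=4, include_intercept=True) - 1", data)
)
row_sums = full_basis.sum(axis=1)

print(basis.shape, full_basis.shape)
print(np.allclose(row_sums, 1.0))
```

Because the rows sum to one, combining `include_intercept=True` with the default formula intercept gives a rank-deficient design, which is why the intercept is dropped here.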

### Cubic Regression Splines

Natural cubic splines with optional constraints, compatible with mgcv's `cr` basis.

```python { .api }
def cr(x, df=None, knots=None, lower_bound=None, upper_bound=None, constraints=None):
    """
    Generate natural cubic spline basis for x with optional constraints.

    Parameters:
    - x: Array-like data to create spline basis for
    - df (int or None): Number of degrees of freedom (columns in output)
    - knots (array-like or None): Interior knot locations
    - lower_bound (float or None): Lower boundary for spline
    - upper_bound (float or None): Upper boundary for spline
    - constraints (str, 2D array, or None): 'center' for a centering constraint,
      or a 2D array of general linear constraints on the coefficients

    Returns:
    2D array with natural cubic spline basis functions

    Note: Must specify at least one of df and knots
    """
```

#### Usage Examples

```python
import patsy
import numpy as np

# Sample data with a cubic trend
x = np.linspace(-2, 2, 50)
y = x**3 + 0.5 * x + np.random.normal(0, 0.2, 50)
data = {'x': x, 'y': y}

# Natural cubic spline with 5 degrees of freedom; the unconstrained basis
# spans the constant, so the formula intercept is dropped with "- 1"
design = patsy.dmatrix("cr(x, df=5) - 1", data)

# With centering constraint (identifiable alongside the intercept)
design = patsy.dmatrix("cr(x, df=5, constraints='center')", data)

# Complete model
y_matrix, X_matrix = patsy.dmatrices("y ~ cr(x, df=6, constraints='center')", data)
```

### Cyclic Cubic Splines

Cubic splines with cyclic boundary conditions, compatible with mgcv's `cc` basis and useful for periodic data.

```python { .api }
def cc(x, df=None, knots=None, lower_bound=None, upper_bound=None, constraints=None):
    """
    Generate cyclic cubic spline basis for x with optional constraints.

    Parameters:
    - x: Array-like data to create spline basis for
    - df (int or None): Number of degrees of freedom (columns in output)
    - knots (array-like or None): Interior knot locations
    - lower_bound (float or None): Lower boundary for cyclic period
    - upper_bound (float or None): Upper boundary for cyclic period
    - constraints (str, 2D array, or None): 'center' for a centering constraint,
      or a 2D array of general linear constraints on the coefficients

    Returns:
    2D array with cyclic cubic spline basis functions

    Note: Must specify at least one of df and knots
    """
```

#### Usage Examples

```python
import patsy
import numpy as np

# Cyclic data (e.g., seasonal patterns, angles)
t = np.linspace(0, 2*np.pi, 100)
y = np.sin(2*t) + 0.5*np.cos(3*t) + np.random.normal(0, 0.1, 100)
data = {'t': t, 'y': y}

# Cyclic cubic spline; "- 1" drops the formula intercept, which the basis already spans
design = patsy.dmatrix("cc(t, df=8) - 1", data)

# With explicit boundaries for the cyclic period (exactly one period, 0 to 2*pi)
design = patsy.dmatrix("cc(t, df=8, lower_bound=0, upper_bound=2*np.pi) - 1", data)

# Complete model for seasonal data
y_matrix, X_matrix = patsy.dmatrices("y ~ cc(t, df=10, constraints='center')", data)
```
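
The cyclic boundary condition can be checked numerically. The sketch below is an illustration rather than a definitive recipe: it builds a `cc` basis with an explicit period, then re-evaluates the stored basis at both ends of the period and at a point shifted by one full period. The probe values and variable names are assumptions chosen for the example.

```python
import numpy as np
import patsy

t = np.linspace(0, 2 * np.pi, 100)
train = {"t": t}

# Fix the period explicitly so it does not depend on the observed data range.
design = patsy.dmatrix(
    "cc(t, df=6, lower_bound=0, upper_bound=2 * np.pi) - 1", train
)

# Re-evaluate the remembered basis at the two ends of the period
# and at a point shifted by exactly one period.
probe = {"t": np.array([0.0, 2 * np.pi, 0.5, 0.5 + 2 * np.pi])}
(rows,) = patsy.build_design_matrices([design.design_info], probe)
rows = np.asarray(rows)

ends_match = np.allclose(rows[0], rows[1])   # basis at 0 vs. basis at 2*pi
wrap_match = np.allclose(rows[2], rows[3])   # basis at 0.5 vs. 0.5 + 2*pi
print(ends_match, wrap_match)
```

Any fitted curve built from these columns therefore joins up with itself at the period boundary.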

### Tensor Product Smooths

Multi-dimensional smooth terms built as tensor products of univariate smooths, compatible with mgcv's `te`, for modeling interactions between smooth functions.

```python { .api }
def te(*args, constraints=None):
    """
    Generate tensor product smooth of several covariates.

    Parameters:
    - *args: Multiple smooth terms (s1, s2, ..., sn) as marginal univariate smooths
    - constraints (str, 2D array, or None): 'center' for a centering constraint,
      or a 2D array of general linear constraints on the coefficients

    Returns:
    2D array with tensor product basis functions

    Note: Marginal smooths must transform data into basis function arrays.
    Without constraints, the resulting basis dimension is the product of
    marginal basis dimensions.
    """
```

#### Usage Examples

```python
import patsy
import numpy as np

# Two-dimensional smooth surface
x1 = np.random.uniform(-2, 2, 100)
x2 = np.random.uniform(-2, 2, 100)
y = x1**2 + x2**2 + x1*x2 + np.random.normal(0, 0.5, 100)
data = {'x1': x1, 'x2': x2, 'y': y}

# Tensor product of cubic regression splines
# Note: each argument must itself be a smooth term, such as cr(...) or cc(...);
# "- 1" drops the formula intercept, which the tensor basis already spans
design = patsy.dmatrix("te(cr(x1, df=5), cr(x2, df=5)) - 1", data)

# Three-dimensional tensor product (4 * 4 * 3 = 48 columns)
x3 = np.random.uniform(-1, 1, 100)
data['x3'] = x3
design = patsy.dmatrix("te(cr(x1, df=4), cr(x2, df=4), cr(x3, df=3)) - 1", data)

# Complete model with tensor product smooth
y_matrix, X_matrix = patsy.dmatrices(
    "y ~ te(cr(x1, df=5), cr(x2, df=5), constraints='center')", data)
```
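
As a small check of the dimension rule stated in the `te` docstring, this sketch compares the column counts of two marginal `cr` bases with that of their tensor product. The sizes (3 and 4) and the seeded random data are arbitrary choices for illustration.

```python
import numpy as np
import patsy

rng = np.random.default_rng(0)
data = {"x1": rng.uniform(-2, 2, 200), "x2": rng.uniform(-2, 2, 200)}

# Marginal bases, built without the formula intercept.
m1 = np.asarray(patsy.dmatrix("cr(x1, df=3) - 1", data))
m2 = np.asarray(patsy.dmatrix("cr(x2, df=4) - 1", data))

# Unconstrained tensor product of the same two marginals.
tensor = np.asarray(patsy.dmatrix("te(cr(x1, df=3), cr(x2, df=4)) - 1", data))

print(m1.shape[1], m2.shape[1], tensor.shape[1])

# Each row of the tensor basis sums to 1, so the basis spans the constant
# and is collinear with an explicit intercept column.
row_sums = tensor.sum(axis=1)
print(np.allclose(row_sums, 1.0))
```

The column count grows multiplicatively with each added marginal, so small `df` values per margin are usually enough to produce a large basis.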

## Spline Usage Patterns

### Choosing Spline Types

| Spline Type | Best For | Characteristics |
|-------------|----------|-----------------|
| B-splines (`bs`) | General smooth curves | Flexible, local support, compatible with R |
| Cubic regression (`cr`) | Natural smooth curves | Natural boundary conditions, MGCV compatible |
| Cyclic cubic (`cc`) | Periodic/seasonal data | Cyclic boundary conditions |
| Tensor products (`te`) | Multi-dimensional smooths | Interaction of smooth terms |
204
### Integration with Linear Models
205
206
```python
207
import patsy
208
import numpy as np
209
from sklearn.linear_model import LinearRegression
210
211
# Generate sample data
212
np.random.seed(42)
213
x = np.linspace(0, 10, 100)
214
y = 2*np.sin(x) + 0.5*x + np.random.normal(0, 0.3, 100)
215
data = {'x': x, 'y': y}
216
217
# Create spline design matrix
218
y_matrix, X_matrix = patsy.dmatrices("y ~ bs(x, df=6)", data)
219
220
# Fit with scikit-learn
221
model = LinearRegression(fit_intercept=False) # Intercept already in design matrix
222
model.fit(X_matrix, y_matrix.ravel())
223
224
# Predict on new data
225
x_new = np.linspace(0, 10, 50)
226
data_new = {'x': x_new}
227
X_new = patsy.dmatrix("bs(x, df=6)", data_new)
228
y_pred = model.predict(X_new)
229
```
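
### Combining Splines with Other Terms

```python
import numpy as np
import patsy

# Illustrative data: two numeric covariates and a categorical grouping factor
rng = np.random.default_rng(0)
data = {'x1': rng.uniform(0, 1, 100), 'x2': rng.uniform(0, 1, 100),
        'group': rng.choice(['a', 'b', 'c'], 100)}
data['y'] = data['x1'] + np.sin(6 * data['x2']) + rng.normal(0, 0.1, 100)

# Splines combined with linear and categorical terms
y, X = patsy.dmatrices("y ~ x1 + bs(x2, df=4) + C(group)", data)

# Multiple spline terms
y, X = patsy.dmatrices("y ~ bs(x1, df=3) + bs(x2, df=5)", data)

# Spline interactions
y, X = patsy.dmatrices("y ~ bs(x1, df=3) * bs(x2, df=3)", data)
```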
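
## Advanced Spline Features

### Boundary Handling

Each spline type treats the edges of the data range differently:
- **B-splines** (`bs`): `lower_bound` and `upper_bound` set the boundary knots (by default the minimum and maximum of the data); evaluating points outside them raises an error
- **Natural splines** (`cr`): Linear extrapolation beyond the boundary knots
- **Cyclic splines** (`cc`): Periodic boundary conditions; the basis takes the same values at `lower_bound` and `upper_bound`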
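
#### Usage Examples

The linear extrapolation of natural splines can be observed directly. The sketch below is a hedged illustration that assumes `cr` accepts points beyond the training range when the stored basis is re-evaluated: it samples the basis on an equally spaced grid past the upper boundary and checks that the second differences vanish, as they do for any linear function. The grid values are arbitrary.

```python
import numpy as np
import patsy

train = {"x": np.linspace(0, 1, 50)}
design = patsy.dmatrix("cr(x, df=4) - 1", train)

# Equally spaced points starting at the upper boundary knot (x = 1) and beyond.
beyond = {"x": np.array([1.0, 1.1, 1.2, 1.3])}
(rows,) = patsy.build_design_matrices([design.design_info], beyond)
rows = np.asarray(rows)

# A linear function sampled on an equally spaced grid has zero second
# differences; check this for every basis column.
second_diff = np.diff(rows, n=2, axis=0)
print(np.allclose(second_diff, 0.0))
```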
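
### Constraint Options

The `constraints` argument of `cr`, `cc`, and `te` accepts:
- **Centering constraints** (`constraints='center'`): Remove the overall mean from the spline basis, so each column averages to zero over the training data and the term is identifiable alongside an intercept
- **Custom constraints**: A 2D array of general linear constraints on the spline coefficients, where each row `c` requires `c @ beta == 0`
- **Integration constraints**: No dedicated option; an integral condition that is linear in the coefficients would have to be supplied as a row of a custom constraint array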
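
#### Usage Examples

The effect of the centering constraint can be seen by comparing column means with and without it. This is a minimal sketch on illustrative data, not a statement about how the constraint is computed internally.

```python
import numpy as np
import patsy

data = {"x": np.linspace(-2, 2, 50)}

# Same df with and without centering; "- 1" keeps only the spline columns.
plain = np.asarray(patsy.dmatrix("cr(x, df=4) - 1", data))
centered = np.asarray(
    patsy.dmatrix("cr(x, df=4, constraints='center') - 1", data)
)

# Unconstrained columns have non-zero means over the training data;
# centered columns average to zero (up to floating-point error).
plain_means = plain.mean(axis=0)
centered_means = centered.mean(axis=0)
print(plain_means)
print(np.allclose(centered_means, 0.0))
```

Both matrices have `df` columns, but the centered columns are combinations of the original basis functions and are no longer directly interpretable as that basis.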
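
### Stateful Transform Nature

All spline functions are stateful transforms, meaning:
- They remember characteristics of the training data (knot locations, boundaries, and any centering constraint)
- They can be applied consistently to new data by reusing the stored `design_info`, for example with `patsy.build_design_matrices`
- They integrate with Patsy's incremental processing system, so the state can be collected from data supplied in chunks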
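
#### Usage Examples

A short sketch of what the remembered state buys in practice: rows rebuilt from the stored `design_info` match the training rows for the same `x` values, whereas calling `dmatrix` again on the new data places fresh knots and gives a different basis. The slice of training data used as "new" data is an arbitrary choice for the demonstration.

```python
import numpy as np
import patsy

x_train = np.linspace(0, 10, 100)
design = patsy.dmatrix("bs(x, df=6)", {"x": x_train})

# "New" data covering only part of the training range.
x_new = x_train[20:40]

# Reuse the knots and bounds memorized during training.
(reused,) = patsy.build_design_matrices([design.design_info], {"x": x_new})

# Naive alternative: a fresh dmatrix call recomputes knots from x_new alone.
refit = patsy.dmatrix("bs(x, df=6)", {"x": x_new})

training_rows = np.asarray(design)[20:40]
same_as_training = np.allclose(np.asarray(reused), training_rows)
differs_when_refit = not np.allclose(np.asarray(refit), training_rows)
print(same_as_training, differs_when_refit)
```

Coefficients fitted on the training matrix are only meaningful for matrices built the first way.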