# OLS Models

Ordinary least squares regression with comprehensive spatial and non-spatial diagnostic capabilities. spreg provides both base OLS estimation (`BaseOLS`) and a full diagnostic model (`OLS`) with extensive testing options.

## Capabilities

### Base OLS Estimation

Core OLS estimation without diagnostics, providing the regression coefficients and variance-covariance matrix with optional robust standard error corrections.
```python { .api }
class BaseOLS:
    def __init__(self, y, x, robust=None, gwk=None, sig2n_k=False):
        """
        Ordinary least squares estimation (no diagnostics or constant added).

        Parameters:
        - y (array): nx1 dependent variable
        - x (array): nxk independent variables; no constant is added, so
          include a column of ones if the model needs an intercept
        - robust (str, optional): 'white' for White correction, 'hac' for HAC correction
        - gwk (pysal W object, optional): Kernel spatial weights for HAC estimation
          (the kernel matrix must have ones on its main diagonal)
        - sig2n_k (bool): If True, use n-k for sigma^2 estimation; if False, use n

        Attributes:
        - betas (array): kx1 estimated coefficients
        - u (array): nx1 residuals
        - predy (array): nx1 predicted values
        - vm (array): kxk variance-covariance matrix
        - sig2 (float): Sigma squared
        - n (int): Number of observations
        - k (int): Number of parameters
        """
```
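
To show what `BaseOLS` computes, here is a minimal NumPy sketch of the same estimator, assuming the classical (non-robust) variance `vm = sig2 * (X'X)^-1`. It is not spreg's source, and `base_ols_sketch` is a hypothetical name. As with `BaseOLS`, no constant is added, so the caller supplies the column of ones.

```python
import numpy as np


def base_ols_sketch(y, x, sig2n_k=False):
    """Plain OLS with no constant added (hypothetical illustration).

    Returns a dict with the main quantities BaseOLS exposes as attributes.
    """
    n, k = x.shape
    xtxi = np.linalg.inv(x.T @ x)
    betas = xtxi @ (x.T @ y)          # kx1 coefficients
    predy = x @ betas                 # nx1 fitted values
    u = y - predy                     # nx1 residuals
    denom = n - k if sig2n_k else n   # divisor chosen by sig2n_k
    sig2 = float((u.T @ u).item() / denom)
    vm = sig2 * xtxi                  # classical kxk variance-covariance matrix
    return {"betas": betas, "predy": predy, "u": u,
            "sig2": sig2, "vm": vm, "n": n, "k": k}


# Usage: prepend the constant column yourself, as BaseOLS expects
rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal((n, 2))
X = np.hstack((np.ones((n, 1)), x))
y = X @ np.array([[1.0], [2.0], [-0.5]]) + 0.1 * rng.standard_normal((n, 1))
res = base_ols_sketch(y, X, sig2n_k=True)
print(res["betas"].ravel())
```

The `sig2n_k` switch only changes the divisor of the residual sum of squares, so it rescales `sig2` and `vm` but leaves `betas` unchanged.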

### Full OLS with Diagnostics

Complete OLS implementation with spatial and non-spatial diagnostic tests, supporting SLX specifications and regime-based analysis.
```python { .api }
class OLS:
    def __init__(self, y, x, w=None, robust=None, gwk=None, sig2n_k=False,
                 nonspat_diag=True, spat_diag=False, moran=False,
                 white_test=False, vif=False, slx_lags=0, slx_vars='All',
                 regimes=None, vm=False, constant_regi='one', cols2regi='all',
                 regime_err_sep=False, cores=False, name_y=None, name_x=None,
                 name_w=None, name_ds=None, latex=False):
        """
        Ordinary least squares with extensive diagnostics.

        Parameters:
        - y (array): nx1 dependent variable
        - x (array): nxk independent variables (constant added automatically)
        - w (pysal W object, optional): Spatial weights for spatial diagnostics
        - robust (str, optional): 'white' or 'hac' for robust standard errors
        - gwk (pysal W object, optional): Kernel weights for HAC estimation
        - sig2n_k (bool): Use n-k for sigma^2 estimation
        - nonspat_diag (bool): Compute non-spatial diagnostics (default True)
        - spat_diag (bool): Compute spatial diagnostics (requires w)
        - moran (bool): Compute Moran's I test on residuals (requires spat_diag=True)
        - white_test (bool): Compute White's heteroskedasticity test
        - vif (bool): Compute variance inflation factors
        - slx_lags (int): Number of spatial lags of X to include
        - slx_vars (str/list): Variables to be spatially lagged ('All' or list)
        - regimes (list/Series, optional): Regime identifier for each observation
        - vm (bool): Include variance-covariance matrix in output
        - constant_regi (str): 'one' (single constant across regimes) or 'many'
          (one constant per regime)
        - cols2regi (str/list): Variables whose coefficients vary by regime ('all' or list)
        - regime_err_sep (bool): Run separate regressions for each regime
        - cores (bool): Use multiprocessing for regime estimation
        - name_y, name_x, name_w, name_ds (str): Variable and dataset names
        - latex (bool): Format output for LaTeX

        Attributes:
        - All BaseOLS attributes plus:
        - r2 (float): R-squared
        - ar2 (float): Adjusted R-squared
        - f_stat (tuple): F-statistic (value, p-value)
        - t_stat (list): (t-statistic, p-value) tuple for each coefficient
        - jarque_bera (dict): Jarque-Bera normality test results
        - breusch_pagan (dict): Breusch-Pagan heteroskedasticity test
        - white (dict): White heteroskedasticity test (if white_test=True)
        - koenker_bassett (dict): Koenker-Bassett test results
        - lm_error (tuple): LM test for spatial error, (statistic, p-value) (if spat_diag=True)
        - lm_lag (tuple): LM test for spatial lag, (statistic, p-value) (if spat_diag=True)
        - rlm_error (tuple): Robust LM test for spatial error, (statistic, p-value)
        - rlm_lag (tuple): Robust LM test for spatial lag, (statistic, p-value)
        - lm_sarma (tuple): LM test for SARMA specification, (statistic, p-value)
        - moran_res (tuple): Moran's I on residuals, (I, z-value, p-value) (if moran=True)
        - vif (dict): Variance inflation factors (if vif=True)
        - summary (str): Comprehensive formatted results
        - output (DataFrame): Formatted results table
        """
```

## Usage Examples

### Basic OLS Regression
```python
import numpy as np
import spreg

# Prepare data
n = 100
y = np.random.randn(n, 1)
x = np.random.randn(n, 3)

# Basic OLS without diagnostics: BaseOLS does not add a constant,
# so prepend a column of ones to estimate an intercept
X = np.hstack((np.ones((n, 1)), x))
base_ols = spreg.BaseOLS(y, X)
print("Coefficients:", base_ols.betas.flatten())

# BaseOLS reports no R-squared; compute it from the residuals
r2 = 1 - (base_ols.u ** 2).sum() / ((y - y.mean()) ** 2).sum()
print("R-squared (manual):", r2)

# Full OLS with non-spatial diagnostics (constant added automatically)
ols_model = spreg.OLS(y, x, nonspat_diag=True, name_y='y',
                      name_x=['x1', 'x2', 'x3'])
print(ols_model.summary)
print("R-squared:", ols_model.r2)
print("F-statistic (value, p-value):", ols_model.f_stat)
```

### OLS with Spatial Diagnostics
```python
import numpy as np
import spreg
from libpysal import weights

# Create spatial data
n = 49  # 7x7 grid
y = np.random.randn(n, 1)
x = np.random.randn(n, 2)
w = weights.lat2W(7, 7)  # rook contiguity on a 7x7 lattice
w.transform = 'r'        # row-standardize the weights

# OLS with spatial diagnostics (moran=True requires spat_diag=True)
spatial_ols = spreg.OLS(y, x, w=w, spat_diag=True, moran=True,
                        name_y='y', name_x=['x1', 'x2'])

print(spatial_ols.summary)
print("LM Error test:", spatial_ols.lm_error)
print("LM Lag test:", spatial_ols.lm_lag)
print("Moran's I on residuals:", spatial_ols.moran_res)

# LM results are (statistic, p-value) tuples
if spatial_ols.lm_error[1] < 0.05:
    print("Spatial error dependence detected")
if spatial_ols.lm_lag[1] < 0.05:
    print("Spatial lag dependence detected")
```

### OLS with SLX Specification
```python
import numpy as np
import spreg
from libpysal import weights

# Spatial lag of X (SLX) model
n = 100
y = np.random.randn(n, 1)
x = np.random.randn(n, 2)
w = weights.KNN.from_array(np.random.randn(n, 2), k=5)

# Include spatial lags of X variables
slx_model = spreg.OLS(y, x, w=w, slx_lags=1, slx_vars='All',
                      spat_diag=True, name_y='y', name_x=['x1', 'x2'])

print(slx_model.summary)
print("Number of coefficients (includes spatial lags):", slx_model.k)
```
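
With `slx_lags=1`, an SLX model is still estimated by OLS; the spatially lagged regressors `WX` are appended to the design matrix. The NumPy sketch below illustrates that idea under two assumptions: a dense, row-standardized k-nearest-neighbor W built by the hypothetical helper `knn_weights_dense` (a stand-in for libpysal's `KNN` plus row standardization), and a column order `[1, X, WX]` chosen for illustration rather than taken from spreg.

```python
import numpy as np


def knn_weights_dense(coords, k):
    """Hypothetical helper: dense row-standardized k-nearest-neighbor W."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]        # indices of the k closest points
    w = np.zeros_like(d)
    rows = np.repeat(np.arange(len(coords)), k)
    w[rows, nn.ravel()] = 1.0
    return w / w.sum(axis=1, keepdims=True)  # each row sums to one


def slx_design(x, w):
    """Stack [1, X, WX]: the regressors of an SLX model with one lag."""
    n = x.shape[0]
    return np.hstack((np.ones((n, 1)), x, w @ x))


rng = np.random.default_rng(1)
n = 100
coords = rng.standard_normal((n, 2))
x = rng.standard_normal((n, 2))
w = knn_weights_dense(coords, k=5)
Z = slx_design(x, w)
# Noise-free outcome with known coefficients on the constant, X and WX
true_b = np.array([[0.5], [1.0], [-2.0], [0.8], [0.3]])
y = Z @ true_b
b, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(Z.shape, b.ravel().round(6))
```

Because the outcome is noise-free, plain least squares on the augmented matrix recovers the coefficients on both `X` and `WX` exactly.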

### OLS with Robust Standard Errors
```python
import numpy as np
import spreg
from libpysal import weights

# OLS with White robust standard errors
n = 100
y = np.random.randn(n, 1)
x = np.random.randn(n, 2)

# White correction for heteroskedasticity
white_ols = spreg.OLS(y, x, robust='white', nonspat_diag=True,
                      name_y='y', name_x=['x1', 'x2'])
print(white_ols.summary)  # reports White-corrected standard errors

# HAC correction requires kernel weights with ones on the main diagonal
coords = np.random.randn(n, 2)
w_kernel = weights.Kernel.from_array(coords, k=10, fixed=False,
                                     function='triangular', diagonal=True)
hac_ols = spreg.OLS(y, x, robust='hac', gwk=w_kernel,
                    name_y='y', name_x=['x1', 'x2'])
print(hac_ols.summary)  # reports HAC-corrected standard errors
```
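
The White correction replaces the classical `sig2 * (X'X)^-1` with a heteroskedasticity-consistent "sandwich" estimator. Below is a NumPy sketch of its basic HC0 form, `(X'X)^-1 X' diag(u^2) X (X'X)^-1`. Whether spreg applies any small-sample adjustment is not covered here, and `white_vm_sketch` is a hypothetical name.

```python
import numpy as np


def white_vm_sketch(X, u):
    """HC0 sandwich (X'X)^-1 X' diag(u^2) X (X'X)^-1 (hypothetical helper)."""
    xtxi = np.linalg.inv(X.T @ X)
    meat = (X * u ** 2).T @ X     # X' diag(u^2) X without building an n x n matrix
    return xtxi @ meat @ xtxi


rng = np.random.default_rng(2)
n = 500
x = rng.standard_normal((n, 1))
X = np.hstack((np.ones((n, 1)), x))
# Error spread grows with |x|: a heteroskedastic design
y = 1.0 + 2.0 * x + np.abs(x) * rng.standard_normal((n, 1))
betas = np.linalg.solve(X.T @ X, X.T @ y)
u = y - X @ betas
vm_white = white_vm_sketch(X, u)
vm_classic = float((u.T @ u).item() / (n - 2)) * np.linalg.inv(X.T @ X)
print("classical s.e.:", np.sqrt(np.diag(vm_classic)))
print("White s.e.:    ", np.sqrt(np.diag(vm_white)))
```

Scaling each row of `X` by its squared residual keeps memory linear in n, which matters for large samples.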

### Regime-Based OLS
```python
import numpy as np
import spreg

# OLS with regimes
n = 100
y = np.random.randn(n, 1)
x = np.random.randn(n, 2)
regimes = np.random.choice(['A', 'B', 'C'], n)

# Different intercepts and slopes by regime
regime_ols = spreg.OLS(y, x, regimes=regimes, constant_regi='many',
                       cols2regi='all', name_y='y', name_x=['x1', 'x2'],
                       name_regimes='region')

print(regime_ols.summary)
print("Number of regimes:", regime_ols.nr)
print("Chow test results:", regime_ols.chow)

# Separate regression for each regime
separate_ols = spreg.OLS(y, x, regimes=regimes, regime_err_sep=True,
                         name_y='y', name_x=['x1', 'x2'])
print("Individual regime results:", separate_ols.multi.keys())
```
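
With `constant_regi='many'` and `cols2regi='all'`, every coefficient may differ by regime. One way to picture this is a block design matrix that interacts each regime dummy with the constant and all X columns. The NumPy sketch below is an illustration, not spreg's internals, and `regime_design` is a hypothetical helper. It checks that the pooled coefficients equal those from fitting each regime on its own. The standard errors differ, because the pooled fit shares one error variance while separate fits estimate one per regime.

```python
import numpy as np


def regime_design(x, regimes):
    """Block design [D_r * (1, X)] for each regime r (hypothetical helper)."""
    n = x.shape[0]
    x1 = np.hstack((np.ones((n, 1)), x))     # constant plus regressors
    labels = np.unique(regimes)              # sorted regime labels
    blocks = [x1 * (regimes == r)[:, None] for r in labels]
    return np.hstack(blocks), labels


rng = np.random.default_rng(3)
n = 150
x = rng.standard_normal((n, 2))
regimes = rng.choice(['A', 'B', 'C'], n)
y = rng.standard_normal((n, 1))

Z, labels = regime_design(x, regimes)
pooled, *_ = np.linalg.lstsq(Z, y, rcond=None)
pooled = pooled.reshape(len(labels), -1)    # one row of (const, b1, b2) per regime

# Fit each regime on its own subset for comparison
separate = []
for r in labels:
    m = regimes == r
    xr = np.hstack((np.ones((m.sum(), 1)), x[m]))
    br, *_ = np.linalg.lstsq(xr, y[m], rcond=None)
    separate.append(br.ravel())
separate = np.array(separate)
print(labels)
print(pooled.round(4))
```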

## Common Diagnostic Interpretations

### R-squared and Model Fit
- `r2`: Proportion of variance explained by the model
- `ar2`: Adjusted R-squared, penalized for the number of parameters
- `f_stat`: Overall significance test (all slope coefficients jointly zero)
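
These fit statistics follow the textbook definitions. The sketch below computes them with NumPy for a design matrix that already contains a constant, where `k` counts the constant. `fit_stats` is a hypothetical helper, and we assume spreg uses the same definitions. The F-test p-value needs an F distribution and is omitted.

```python
import numpy as np


def fit_stats(y, X):
    """R^2, adjusted R^2 and overall F for OLS with a constant in X."""
    n, k = X.shape                            # k counts the constant
    betas = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ betas
    ssr = float((u ** 2).sum())               # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    r2 = 1.0 - ssr / sst
    ar2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    f = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))
    return r2, ar2, f


rng = np.random.default_rng(4)
n = 100
x = rng.standard_normal((n, 3))
X = np.hstack((np.ones((n, 1)), x))
y = X @ np.array([[1.0], [0.5], [0.0], [-0.3]]) + rng.standard_normal((n, 1))
r2, ar2, f = fit_stats(y, X)
print(f"R2={r2:.3f}  adj. R2={ar2:.3f}  F={f:.2f}")
```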

### Heteroskedasticity Tests
- `breusch_pagan`: Tests for heteroskedasticity related to the explanatory variables
- `white`: General heteroskedasticity test (if requested)
- `koenker_bassett`: Studentized version of Breusch-Pagan that does not rely on normally distributed errors
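
A NumPy sketch of the textbook Breusch-Pagan and Koenker-Bassett statistics, assuming the variance drivers `Z` are the regressors themselves (including the constant). Both are compared with a chi-squared distribution whose degrees of freedom equal the number of non-constant columns in `Z`. The helpers `_ols_resid` and `bp_kb_sketch` are hypothetical, and spreg's implementation may differ in details.

```python
import numpy as np


def _ols_resid(y, Z):
    """Residuals and fitted values of OLS of y on Z (Z includes a constant)."""
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    fitted = Z @ b
    return y - fitted, fitted


def bp_kb_sketch(u, Z):
    """Return (Breusch-Pagan, Koenker-Bassett, degrees of freedom)."""
    n, p = Z.shape
    e2 = u.ravel() ** 2
    sig2 = e2.mean()                     # ML variance estimate e'e/n
    # Breusch-Pagan: half the explained sum of squares of e^2/sig2 on Z
    g = e2 / sig2
    _, g_hat = _ols_resid(g, Z)
    bp = 0.5 * float(((g_hat - g.mean()) ** 2).sum())
    # Koenker-Bassett (studentized): n * R^2 of e^2 regressed on Z
    resid, _ = _ols_resid(e2, Z)
    r2 = 1.0 - float((resid ** 2).sum()) / float(((e2 - e2.mean()) ** 2).sum())
    return bp, n * r2, p - 1


rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0.0, 3.0, size=(n, 1))
X = np.hstack((np.ones((n, 1)), x))
# Error standard deviation increases with x
y = 1.0 + 2.0 * x + (0.5 + x) * rng.standard_normal((n, 1))
u, _ = _ols_resid(y, X)
bp, kb, df = bp_kb_sketch(u, X)
print(f"BP={bp:.2f}  KB={kb:.2f}  df={df}")  # 5% chi-squared(1) critical value is about 3.84
```

The two statistics differ only by the factor `var(e^2) / (2 * sig2^2)`, which equals 1 in expectation under normal errors. Koenker-Bassett drops that normality-based scaling.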

### Spatial Dependence Tests
- `lm_error`: Tests for spatial error dependence
- `lm_lag`: Tests for spatial lag dependence
- `rlm_error`, `rlm_lag`: Robust versions, each remaining valid when the other type of dependence is locally present
- `lm_sarma`: Joint test for both error and lag dependence
- `moran_res`: Moran's I test on regression residuals
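
To make these statistics concrete, here is a dense NumPy sketch of Moran's I on residuals and the LM family. It follows the standard formulas of Anselin, Bera, Florax and Yoon (1996) for a row-standardized W. It illustrates those assumptions rather than reproducing spreg's implementation, which works with sparse weights and also reports p-values: chi-squared with 1 degree of freedom for the single tests and 2 for `lm_sarma`. `rook_lattice_w` and `lm_tests_sketch` are hypothetical helpers.

```python
import numpy as np


def rook_lattice_w(nrows, ncols):
    """Dense row-standardized rook contiguity matrix for a regular grid."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[r * ncols + c, rr * ncols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)


def lm_tests_sketch(y, X, w):
    """Moran's I on OLS residuals plus LM error/lag, robust LM and SARMA."""
    n = X.shape[0]
    xtxi = np.linalg.inv(X.T @ X)
    b = xtxi @ X.T @ y
    e = y - X @ b
    ee = float((e.T @ e).item())
    s2 = ee / n
    ewe = float((e.T @ (w @ e)).item())
    moran_i = ewe / ee                              # S0 = n for row-standardized W
    d_err = ewe / s2                                # score for the error parameter
    d_lag = float((e.T @ (w @ y)).item()) / s2      # score for the lag parameter
    t = np.trace(w.T @ w + w @ w)
    wxb = w @ X @ b
    m = np.eye(n) - X @ xtxi @ X.T
    nj = float((wxb.T @ m @ wxb).item()) / s2 + t
    return {
        "moran_res": moran_i,
        "lm_error": d_err ** 2 / t,
        "lm_lag": d_lag ** 2 / nj,
        "rlm_error": (d_err - t * d_lag / nj) ** 2 / (t - t ** 2 / nj),
        "rlm_lag": (d_lag - d_err) ** 2 / (nj - t),
        "lm_sarma": (d_lag - d_err) ** 2 / (nj - t) + d_err ** 2 / t,
    }


rng = np.random.default_rng(6)
w = rook_lattice_w(7, 7)
n = w.shape[0]
X = np.hstack((np.ones((n, 1)), rng.standard_normal((n, 2))))
# Spatially autocorrelated errors: e = (I - 0.6 W)^-1 eps
err = np.linalg.solve(np.eye(n) - 0.6 * w, rng.standard_normal((n, 1)))
y = X @ np.array([[1.0], [0.5], [-0.5]]) + err
stats = lm_tests_sketch(y, X, w)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

By construction, `lm_sarma` equals `rlm_lag + lm_error`. Algebraically, it also equals `lm_lag + rlm_error`, which gives a handy consistency check.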

### Multicollinearity
- `vif`: Variance inflation factors for detecting multicollinearity

A VIF above 10 is a common rule-of-thumb threshold for problematic multicollinearity.
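
Each VIF comes from an auxiliary regression: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing regressor j on the other regressors plus a constant. A NumPy sketch using a hypothetical `vif_sketch` helper:

```python
import numpy as np


def vif_sketch(x):
    """VIF for each column of x (x excludes the constant)."""
    n, k = x.shape
    out = []
    for j in range(k):
        target = x[:, j]
        others = np.hstack((np.ones((n, 1)), np.delete(x, j, axis=1)))
        b = np.linalg.lstsq(others, target, rcond=None)[0]
        resid = target - others @ b
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


rng = np.random.default_rng(7)
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x3 = x1 + x2 + 0.1 * rng.standard_normal(n)  # nearly a linear combination of x1 and x2
x = np.column_stack((x1, x2, x3))
vifs = vif_sketch(x)
print(vifs.round(1))
```

Because `x3` is almost exactly `x1 + x2`, every column can be predicted from the other two. All three VIFs therefore land far above 10.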

## Model Selection Guidelines

1. **Start with `OLS`** and its non-spatial diagnostics
2. **Add spatial diagnostics** (`w`, `spat_diag=True`) when working with spatial data
3. **Check for spatial dependence**:
   - If only LM Error is significant → consider a spatial error model
   - If only LM Lag is significant → consider a spatial lag model
   - If both are significant → use the robust tests (`rlm_error`, `rlm_lag`) to distinguish between them
4. **Check for heteroskedasticity**: use robust standard errors if it is detected
5. **Consider an SLX specification** when neighbors' explanatory variables may affect the outcome
6. **Use regime models** when parameters vary systematically across groups
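
The spatial-dependence decision in step 3 can be written as a small function. This hypothetical helper is not part of spreg and takes p-values from the four LM tests. It encodes one common convention: when both robust tests are significant, it picks the model with the smaller robust p-value, which for these one-degree-of-freedom tests is the larger statistic.

```python
def suggest_spatial_model(p_lm_error, p_lm_lag, p_rlm_error, p_rlm_lag,
                          alpha=0.05):
    """Hypothetical helper: map LM-test p-values to a suggested model."""
    err_sig = p_lm_error < alpha
    lag_sig = p_lm_lag < alpha
    if not (err_sig or lag_sig):
        return "OLS"                  # no evidence of spatial dependence
    if err_sig != lag_sig:            # exactly one simple test is significant
        return "spatial error" if err_sig else "spatial lag"
    # Both simple tests significant: let the robust versions decide
    rerr_sig = p_rlm_error < alpha
    rlag_sig = p_rlm_lag < alpha
    if not (rerr_sig or rlag_sig):
        return "inconclusive"
    if rerr_sig and rlag_sig:
        return "spatial error" if p_rlm_error < p_rlm_lag else "spatial lag"
    return "spatial error" if rerr_sig else "spatial lag"


print(suggest_spatial_model(0.01, 0.01, 0.20, 0.03))
```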