
# Two-Stage Least Squares Models

Two-stage least squares (TSLS) estimation for handling endogenous variables in regression models, with spatial diagnostic capabilities and regime-based analysis options.

## Capabilities

### Base TSLS Estimation

Core two-stage least squares estimation without diagnostics, providing instrumental variable estimation for models with endogenous regressors.

```python { .api }
class BaseTSLS:
    def __init__(self, y, x, yend, q=None, h=None, robust=None, gwk=None, sig2n_k=False):
        """
        Two-stage least squares estimation (no diagnostics).

        Parameters:
        - y (array): nx1 dependent variable
        - x (array): nxk exogenous independent variables, including the constant
          (unlike TSLS, BaseTSLS does not add a constant for you)
        - yend (array): nxp endogenous variables
        - q (array, optional): nxq external instruments; must not repeat columns of x (cannot be used with h)
        - h (array, optional): nxl matrix of all instruments, typically x plus the external instruments (cannot be used with q)
        - robust (str, optional): 'white' or 'hac' for robust standard errors
        - gwk (pysal W object, optional): Kernel weights for HAC estimation
        - sig2n_k (bool): If True, use n-k to estimate sigma^2; if False, use n

        Attributes:
        - betas (array): kx1 estimated coefficients (for x and yend combined)
        - u (array): nx1 residuals
        - predy (array): nx1 predicted values
        - z (array): nxk combined exogenous and endogenous variables
        - h (array): nxl all instruments
        - vm (array): kxk variance-covariance matrix
        - sig2 (float): Sigma squared
        - n (int): Number of observations
        - k (int): Number of parameters
        - kstar (int): Number of endogenous variables
        """
```
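
The NumPy sketch below shows the estimator that `BaseTSLS` is built around, without calling spreg. It projects the regressors Z = [1, x, yend] onto the instrument space H = [1, x, q], then regresses y on that projection. This is algebraically the same as betas = (Z'PZ)^-1 Z'Py with P = H(H'H)^-1 H'. Several pieces are assumptions for illustration: the helper name `two_stage_ls`, the n-denominator for sigma^2 (matching `sig2n_k=False`), and the simulated data. spreg's internals may differ in detail.

```python
import numpy as np


def two_stage_ls(y, z, h):
    """Textbook 2SLS: returns betas, residuals, sigma^2 (n denominator) and vm."""
    n = y.shape[0]
    # First stage: fitted values of every column of z from the instruments h
    z_hat = h @ np.linalg.solve(h.T @ h, h.T @ z)
    # Second stage: OLS of y on the fitted regressors
    betas = np.linalg.solve(z_hat.T @ z_hat, z_hat.T @ y)
    # Residuals use the ORIGINAL regressors z, not the fitted ones
    u = y - z @ betas
    sig2 = float((u ** 2).sum()) / n
    vm = sig2 * np.linalg.inv(z_hat.T @ z_hat)
    return betas, u, sig2, vm


rng = np.random.default_rng(0)
n = 2000
e1 = rng.standard_normal((n, 1))  # structural error
x = rng.standard_normal((n, 2))
q = rng.standard_normal((n, 1))  # external instrument
yend = 2 * q + 0.5 * e1 + rng.standard_normal((n, 1))  # endogenous: shares e1
y = 1 + 2 * x[:, 0:1] + 3 * x[:, 1:2] + 1.5 * yend + e1

const = np.ones((n, 1))
z = np.hstack([const, x, yend])  # constant included explicitly, as BaseTSLS expects
h = np.hstack([const, x, q])

betas, u, sig2, vm = two_stage_ls(y, z, h)
betas_ols = np.linalg.lstsq(z, y, rcond=None)[0]
print("2SLS yend coefficient:", betas[3, 0])
print("OLS  yend coefficient:", betas_ols[3, 0])
```

The 2SLS coefficient on `yend` lands near the true 1.5. OLS on the same `z` is pulled upward because `yend` and `y` share the error `e1`.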

### Full TSLS with Diagnostics

Complete TSLS implementation with spatial diagnostics, endogeneity tests, and comprehensive output formatting.

```python { .api }
class TSLS:
    def __init__(self, y, x, yend, q, h=None, robust=None, gwk=None, sig2n_k=False,
                 nonspat_diag=True, spat_diag=False, w=None, slx_lags=0,
                 slx_vars='All', regimes=None, vm=False, constant_regi='one',
                 cols2regi='all', regime_err_sep=False, regime_lag_sep=False,
                 cores=False, name_y=None, name_x=None, name_yend=None,
                 name_q=None, name_h=None, name_w=None, name_ds=None, latex=False):
        """
        Two-stage least squares with diagnostics.

        Parameters:
        - y (array): nx1 dependent variable
        - x (array): nxk exogenous independent variables (constant added automatically)
        - yend (array): nxp endogenous variables
        - q (array): nxq external instruments
        - h (array, optional): nxl all instruments (alternative to q)
        - robust (str, optional): 'white' or 'hac' for robust standard errors
        - gwk (pysal W object, optional): Kernel weights for HAC
        - sig2n_k (bool): Use n-k for sigma^2 estimation
        - nonspat_diag (bool): Compute non-spatial diagnostics
        - spat_diag (bool): Compute Anselin-Kelejian test (requires w)
        - w (pysal W object, optional): Spatial weights for spatial diagnostics and SLX lags
        - slx_lags (int): Number of spatial lags of X to include (requires w)
        - slx_vars (str/list): Variables to be spatially lagged ('All' by default)
        - regimes (list/Series, optional): Regime identifier for each observation
        - vm (bool): Include the variance-covariance matrix in the summary
        - constant_regi (str): 'one' for a single constant across regimes, 'many' for one constant per regime
        - cols2regi (str/list): 'all', or a list of booleans marking which columns of x vary by regime
        - regime_err_sep (bool): Estimate a separate regression (and error variance) for each regime
        - regime_lag_sep (bool): Separate spatial lag by regime
        - cores (bool): Use multiprocessing
        - name_y, name_x, name_yend, name_q, name_h, name_w, name_ds (str or list of str): Variable names
        - latex (bool): LaTeX formatting

        Attributes:
        - All BaseTSLS attributes, plus:
        - pr2 (float): Pseudo R-squared (squared correlation between y and predy)
        - z_stat (list): (z-statistic, p-value) pair for each coefficient
        - ak_test (tuple): Anselin-Kelejian test for spatial dependence as (statistic, p-value) (if spat_diag=True)
        - dwh (tuple): Durbin-Wu-Hausman endogeneity test as (statistic, p-value)
        - summary (str): Comprehensive formatted results
        - output (DataFrame): Formatted results table
        """
```
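
The `z_stat` attribute pairs each coefficient's z-statistic with a p-value. The sketch below reproduces such pairs from a coefficient vector and its variance-covariance matrix. It assumes the usual form z = beta / se with a two-sided standard-normal p-value, computed here with the standard library's `math.erfc`. The helper `z_stats` is hypothetical, and the numbers are made up rather than taken from spreg output.

```python
import math

import numpy as np


def z_stats(betas, vm):
    """Return a list of (z-statistic, two-sided p-value) pairs, one per coefficient."""
    se = np.sqrt(np.diag(vm))
    pairs = []
    for b, s in zip(betas.ravel(), se):
        z = b / s
        # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
        p = math.erfc(abs(z) / math.sqrt(2.0))
        pairs.append((float(z), float(p)))
    return pairs


# Illustrative coefficients and variance-covariance matrix (not real model output)
betas = np.array([[1.0], [0.5], [0.0]])
vm = np.diag([0.25, 0.25, 0.04])
for stat, p in z_stats(betas, vm):
    print(f"z = {stat:6.3f}, p = {p:.4f}")
```

For example, a coefficient of 1.0 with standard error 0.5 gives z = 2 and p of about 0.046.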

## Usage Examples

### Basic TSLS Estimation

```python
import numpy as np
import spreg

# Generate data with endogeneity
n = 100
# Structural error and first-stage error
e1 = np.random.randn(n, 1)  # structural error
e2 = np.random.randn(n, 1)  # first-stage error in the endogenous variable

# Exogenous variables and instruments
x = np.random.randn(n, 2)
z = np.random.randn(n, 1)  # external instrument

# Endogenous variable (correlated with the structural error through e1)
yend = 2 * z + 0.5 * e1 + e2

# Dependent variable
y = 1 + 2 * x[:, 0:1] + 3 * x[:, 1:2] + 1.5 * yend + e1

# TSLS estimation (TSLS adds the constant automatically)
tsls_model = spreg.TSLS(y, x, yend, z, name_y='y',
                        name_x=['x1', 'x2'], name_yend=['yend'],
                        name_q=['instrument'])

print(tsls_model.summary)
print("Pseudo R-squared:", tsls_model.pr2)
print("Durbin-Wu-Hausman test (statistic, p-value):", tsls_model.dwh)
```

### TSLS with Multiple Instruments

```python
import numpy as np
import spreg

# Multiple endogenous variables and instruments
n = 100
x = np.random.randn(n, 2)
z1 = np.random.randn(n, 1)  # instrument for first endogenous var
z2 = np.random.randn(n, 1)  # instrument for second endogenous var
z3 = np.random.randn(n, 1)  # additional instrument (overidentification)
e = np.random.randn(n, 1)  # structural error shared with the endogenous variables

# Two endogenous variables
yend1 = 1.5 * z1 + 0.3 * z3 + 0.5 * e + np.random.randn(n, 1)
yend2 = 2.0 * z2 + 0.4 * z3 + 0.5 * e + np.random.randn(n, 1)
yend = np.hstack([yend1, yend2])

# All external instruments
q = np.hstack([z1, z2, z3])

# Dependent variable
y = 1 + x[:, 0:1] + 2 * x[:, 1:2] + 0.5 * yend1 + 1.2 * yend2 + e

# TSLS with multiple endogenous variables
multi_tsls = spreg.TSLS(y, x, yend, q,
                        name_y='y', name_x=['x1', 'x2'],
                        name_yend=['yend1', 'yend2'],
                        name_q=['z1', 'z2', 'z3'])

print(multi_tsls.summary)

# Order condition: compare external instruments with endogenous variables.
# multi_tsls.h also holds the constant and x, so it is not compared with kstar.
n_external = q.shape[1]
status = "over" if n_external > multi_tsls.kstar else "just"
print(f"Model is {status}-identified ({n_external} instruments, {multi_tsls.kstar} endogenous)")
```

### TSLS with Spatial Diagnostics

```python
import numpy as np
import spreg
from libpysal import weights

# Spatial TSLS on a 7x7 lattice
n = 49  # 7x7 grid
x = np.random.randn(n, 1)
z = np.random.randn(n, 1)  # instrument
w = weights.lat2W(7, 7)  # rook contiguity weights
w.transform = 'r'  # row-standardize

# Endogenous variable
yend = 1.5 * z + np.random.randn(n, 1)

# Dependent variable with a spatially autocorrelated error: u = (I - 0.6 W)^-1 e
W = w.sparse.toarray()
u = np.linalg.solve(np.eye(n) - 0.6 * W, np.random.randn(n, 1))
y = 1 + 2 * x + 1.5 * yend + u

# TSLS with Anselin-Kelejian test
spatial_tsls = spreg.TSLS(y, x, yend, z, w=w, spat_diag=True,
                          name_y='y', name_x=['x1'],
                          name_yend=['yend'], name_q=['instrument'])

print(spatial_tsls.summary)
print("Anselin-Kelejian test (statistic, p-value):", spatial_tsls.ak_test)

if spatial_tsls.ak_test[1] < 0.05:
    print("Spatial dependence detected in TSLS residuals")
```

### TSLS with SLX Specification

```python
import numpy as np
import spreg
from libpysal import weights

# TSLS with spatial lag of X
n = 100
x = np.random.randn(n, 2)
z = np.random.randn(n, 1)
w = weights.KNN.from_array(np.random.randn(n, 2), k=5)

# Endogenous variable
yend = 2 * z + np.random.randn(n, 1)

# Dependent variable
y = 1 + x.sum(axis=1, keepdims=True) + 0.8 * yend + np.random.randn(n, 1)

# Include spatial lags of exogenous variables
slx_tsls = spreg.TSLS(y, x, yend, z, w=w, slx_lags=1, slx_vars='All',
                      name_y='y', name_x=['x1', 'x2'],
                      name_yend=['yend'], name_q=['instrument'])

print(slx_tsls.summary)
print("Includes spatial lags of X variables")
```

### TSLS with Robust Standard Errors

```python
import numpy as np
import spreg

# TSLS with heteroskedasticity-robust standard errors
n = 100
x = np.random.randn(n, 2)
z = np.random.randn(n, 2)  # two instruments
e = np.random.randn(n, 1) * (1 + np.abs(x[:, 0:1]))  # heteroskedastic structural error
yend = z @ np.array([[1.0], [0.5]]) + 0.5 * e + np.random.randn(n, 1)
y = 1 + x.sum(axis=1, keepdims=True) + 1.2 * yend + e

# White-robust TSLS
robust_tsls = spreg.TSLS(y, x, yend, z, robust='white',
                         name_y='y', name_x=['x1', 'x2'],
                         name_yend=['yend'], name_q=['z1', 'z2'])

print(robust_tsls.summary)
print("Uses White-robust standard errors")
```
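
A White-type correction typically leaves the coefficient estimates unchanged and replaces only the variance-covariance matrix with a heteroskedasticity-consistent sandwich. The NumPy sketch below uses the textbook HC0 form for 2SLS, (Z_hat'Z_hat)^-1 Z_hat' diag(u^2) Z_hat (Z_hat'Z_hat)^-1, where Z_hat holds the first-stage fitted regressors. It applies no small-sample correction and is not taken from spreg's source, so reported values may differ. `white_vm_2sls` is a hypothetical helper.

```python
import numpy as np


def white_vm_2sls(y, z, h):
    """2SLS betas with classical and White (HC0) variance-covariance matrices."""
    n = y.shape[0]
    z_hat = h @ np.linalg.solve(h.T @ h, h.T @ z)  # first-stage fitted regressors
    bread = np.linalg.inv(z_hat.T @ z_hat)
    betas = bread @ (z_hat.T @ y)
    u = y - z @ betas  # structural residuals (original regressors)
    vm_classic = float((u ** 2).sum()) / n * bread
    meat = (z_hat * u ** 2).T @ z_hat  # sum over i of u_i^2 * zhat_i zhat_i'
    vm_white = bread @ meat @ bread
    return betas, vm_classic, vm_white


rng = np.random.default_rng(2)
n = 1000
x = rng.standard_normal((n, 2))
q = rng.standard_normal((n, 2))
e = rng.standard_normal((n, 1)) * (1 + np.abs(x[:, 0:1]))  # heteroskedastic error
yend = q @ np.array([[1.0], [0.5]]) + 0.5 * e + rng.standard_normal((n, 1))
y = 1 + x.sum(axis=1, keepdims=True) + 1.2 * yend + e

const = np.ones((n, 1))
z = np.hstack([const, x, yend])
h = np.hstack([const, x, q])
betas, vm_classic, vm_white = white_vm_2sls(y, z, h)
print("classical SEs:", np.sqrt(np.diag(vm_classic)).round(3))
print("White SEs:    ", np.sqrt(np.diag(vm_white)).round(3))
```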

### Regime-Based TSLS

```python
import numpy as np
import spreg

# TSLS with regimes
n = 150
x = np.random.randn(n, 2)
z = np.random.randn(n, 2)
yend = np.random.randn(n, 1)
y = np.random.randn(n, 1)
regimes = np.random.choice(['North', 'South', 'East'], n)

# TSLS allowing coefficients to vary by regime
regime_tsls = spreg.TSLS(y, x, yend, z, regimes=regimes,
                         constant_regi='many', cols2regi='all',
                         name_y='y', name_x=['x1', 'x2'],
                         name_yend=['yend'], name_q=['z1', 'z2'],
                         name_regimes='region')

print(regime_tsls.summary)
print("Coefficients vary by regime")
print("Chow test:", regime_tsls.chow)
```

## Key Diagnostic Tests

### Endogeneity Testing

- `dwh`: Durbin-Wu-Hausman test for endogeneity
- Tests whether OLS and TSLS estimates differ significantly
- Significant result indicates endogeneity is present
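
One common way to compute a Durbin-Wu-Hausman-type statistic is the regression (control-function) form:

1. Add the first-stage residuals of the endogenous variables to the structural equation.
2. Estimate the augmented equation by OLS.
3. Test whether the residual coefficients are jointly zero.

The sketch below uses a Wald statistic that is approximately chi-squared with p degrees of freedom (p = number of endogenous variables; the 5% critical value for p = 1 is 3.84). It assumes homoskedastic errors and is not necessarily how spreg computes `dwh`. `dwh_regression` is a hypothetical name.

```python
import numpy as np


def dwh_regression(y, x, yend, q):
    """Control-function DWH statistic: Wald test that first-stage residuals drop out.

    Returns (statistic, df); under exogeneity the statistic is approximately
    chi-squared with df = number of endogenous variables.
    """
    n = y.shape[0]
    const = np.ones((n, 1))
    h = np.hstack([const, x, q])
    # First stage: residuals of each endogenous variable on all instruments
    v_hat = yend - h @ np.linalg.lstsq(h, yend, rcond=None)[0]
    # Augmented structural equation y ~ [1, x, yend, v_hat], estimated by OLS
    X = np.hstack([const, x, yend, v_hat])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = float((e ** 2).sum()) / (n - X.shape[1])
    vm = s2 * np.linalg.inv(X.T @ X)
    p = yend.shape[1]
    g = b[-p:]  # coefficients on the first-stage residuals
    V = vm[-p:, -p:]  # their variance-covariance block
    return (g.T @ np.linalg.solve(V, g)).item(), p


rng = np.random.default_rng(1)
n = 1000
e1 = rng.standard_normal((n, 1))  # structural error
x = rng.standard_normal((n, 1))
q = rng.standard_normal((n, 1))  # external instrument
yend_endog = q + 0.8 * e1 + rng.standard_normal((n, 1))  # shares e1 with y
yend_exog = q + rng.standard_normal((n, 1))  # independent of e1

stat_endog, df = dwh_regression(1 + x + yend_endog + e1, x, yend_endog, q)
stat_exog, _ = dwh_regression(1 + x + yend_exog + e1, x, yend_exog, q)
print(f"endogenous yend: {stat_endog:.1f} (df={df}); exogenous yend: {stat_exog:.1f}")
```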

### Spatial Dependence in TSLS

- `ak_test`: Anselin-Kelejian test for spatial dependence in TSLS residuals
- Robust to heteroskedasticity and endogeneity
- Significant result suggests spatial error dependence

### Model Fit

- `pr2`: Pseudo R-squared for TSLS models
- The conventional R-squared (1 - SSR/SST) is not bounded below by zero under two-stage estimation, so the pseudo R-squared is computed as the squared correlation between `y` and `predy`
- Measures explained variation accounting for instrumentation
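
The gap between the two fit measures shows up in a small NumPy sketch. For 2SLS, `1 - SSR/SST` computed from the structural residuals can be negative. The squared correlation between `y` and `predy` always lies in [0, 1]. The simulated data use a negative coefficient on a strongly endogenous regressor, chosen to make the negative value visible. The helpers `pseudo_r2` and `conventional_r2` are illustrative, not spreg functions.

```python
import numpy as np


def pseudo_r2(y, predy):
    """Squared correlation between observed and predicted values."""
    return float(np.corrcoef(y.ravel(), predy.ravel())[0, 1] ** 2)


def conventional_r2(y, predy):
    """1 - SSR/SST using the residuals y - predy."""
    ssr = float(((y - predy) ** 2).sum())
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ssr / sst


rng = np.random.default_rng(3)
n = 2000
e1 = rng.standard_normal((n, 1))  # structural error
q = rng.standard_normal((n, 1))  # instrument
yend = 0.5 * q + 2.0 * e1 + rng.standard_normal((n, 1))  # strongly endogenous
y = 1 - 0.4 * yend + e1

const = np.ones((n, 1))
z = np.hstack([const, yend])
h = np.hstack([const, q])
z_hat = h @ np.linalg.solve(h.T @ h, h.T @ z)
betas = np.linalg.solve(z_hat.T @ z_hat, z_hat.T @ y)
predy = z @ betas  # predictions use the original regressors

print("pseudo R2:      ", round(pseudo_r2(y, predy), 3))
print("conventional R2:", round(conventional_r2(y, predy), 3))
```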

## Instrument Quality Guidelines

### Instrument Relevance

- Instruments must be strongly correlated with the endogenous variables, even after controlling for the exogenous variables in x
- Check first-stage F-statistics (common rule of thumb: instruments are weak if F < 10)
- Use multiple instruments when available for overidentification tests
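
If the estimation output does not include a first-stage F-statistic, it can be computed directly. The sketch below fits two regressions for one endogenous variable: one on the exogenous variables only, and one that also includes the excluded instruments. It then forms the usual F-test that the instrument coefficients are jointly zero. The helper `first_stage_f` is hypothetical. The F < 10 threshold is only a rule of thumb for a single endogenous regressor.

```python
import numpy as np


def first_stage_f(yend_col, x, q):
    """F-statistic for H0: all excluded instruments q have zero first-stage coefficients."""
    n = yend_col.shape[0]
    const = np.ones((n, 1))
    restricted = np.hstack([const, x])  # exogenous regressors only
    unrestricted = np.hstack([const, x, q])  # plus the excluded instruments

    def ssr(X):
        b = np.linalg.lstsq(X, yend_col, rcond=None)[0]
        return float(((yend_col - X @ b) ** 2).sum())

    ssr_r, ssr_u = ssr(restricted), ssr(unrestricted)
    m = q.shape[1]  # number of excluded instruments
    df_resid = n - unrestricted.shape[1]
    return ((ssr_r - ssr_u) / m) / (ssr_u / df_resid)


rng = np.random.default_rng(4)
n = 500
x = rng.standard_normal((n, 2))
q = rng.standard_normal((n, 2))
strong = q @ np.array([[1.0], [0.5]]) + rng.standard_normal((n, 1))
weak = q @ np.array([[0.03], [0.0]]) + rng.standard_normal((n, 1))

for label, yend_col in [("strong", strong), ("weak", weak)]:
    f = first_stage_f(yend_col, x, q)
    print(f"{label}: F = {f:.1f} -> {'weak' if f < 10 else 'ok'} by the F < 10 rule")
```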

### Instrument Exogeneity

- Instruments must be uncorrelated with the structural error
- Exogeneity cannot be tested directly; it has to be argued from economic reasoning
- Overidentification tests can detect some violations
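
A common overidentification check is Sargan's test. It regresses the 2SLS residuals on all instruments and compares n * R^2 with a chi-squared distribution. The degrees of freedom equal the number of external instruments minus the number of endogenous variables. The NumPy sketch below assumes homoskedastic errors, and `sargan` is a hypothetical helper. The test has nothing to test when the model is exactly identified (df = 0). It can flag a bad instrument only when other instruments are valid.

```python
import numpy as np


def sargan(y, x, yend, q):
    """Sargan statistic n*R^2 and its degrees of freedom for a 2SLS model."""
    n = y.shape[0]
    const = np.ones((n, 1))
    z = np.hstack([const, x, yend])
    h = np.hstack([const, x, q])
    # 2SLS estimates and structural residuals
    z_hat = h @ np.linalg.solve(h.T @ h, h.T @ z)
    betas = np.linalg.solve(z_hat.T @ z_hat, z_hat.T @ y)
    u = y - z @ betas
    # Auxiliary regression of the residuals on all instruments
    g = np.linalg.lstsq(h, u, rcond=None)[0]
    resid = u - h @ g
    r2 = 1.0 - float((resid ** 2).sum()) / float(((u - u.mean()) ** 2).sum())
    df = q.shape[1] - yend.shape[1]
    return n * r2, df


rng = np.random.default_rng(5)
n = 1000
e1 = rng.standard_normal((n, 1))  # structural error
x = rng.standard_normal((n, 1))
z1 = rng.standard_normal((n, 1))
z2 = rng.standard_normal((n, 1))

# Valid instruments: both independent of e1
yend = z1 + z2 + 0.5 * e1 + rng.standard_normal((n, 1))
stat_valid, df = sargan(1 + x + yend + e1, x, yend, np.hstack([z1, z2]))

# Invalid second instrument: z_bad is correlated with e1
z_bad = z2 + 0.5 * e1
yend_b = z1 + z_bad + rng.standard_normal((n, 1))
stat_invalid, _ = sargan(1 + x + yend_b + e1, x, yend_b, np.hstack([z1, z_bad]))
print(f"valid: {stat_valid:.2f}, invalid: {stat_invalid:.2f} (df={df}, 5% critical value 3.84)")
```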

### Identification Requirements

- At least as many external instruments as endogenous variables are required (the order condition)
- Having more instruments than endogenous variables allows overidentification testing
- Quality is more important than quantity
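
The counting rule for identification is easy to check before estimation. The function below is a hypothetical helper, not part of spreg. The order condition it checks is only necessary: the instruments must also be relevant for the model to be identified.

```python
def identification_status(n_external_instruments, n_endogenous):
    """Classify a linear IV model by the order condition.

    n_external_instruments: columns in q (excluded instruments only, not x)
    n_endogenous: columns in yend
    """
    if n_endogenous < 1:
        raise ValueError("TSLS needs at least one endogenous variable")
    if n_external_instruments < n_endogenous:
        return "under-identified"
    if n_external_instruments == n_endogenous:
        return "just-identified"
    return "over-identified"


print(identification_status(3, 2))  # over-identified
print(identification_status(1, 1))  # just-identified
print(identification_status(1, 2))  # under-identified
```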

## Model Selection Strategy

1. **Identify endogenous variables** through economic theory and testing
2. **Find valid instruments** that are relevant and exogenous
3. **Estimate the TSLS model** and check the Durbin-Wu-Hausman test
4. **Test for spatial dependence** with the Anselin-Kelejian test if the data are spatial
5. **Consider spatial error models** if spatial dependence is detected
6. **Use robust standard errors** if heteroskedasticity is suspected
7. **Apply regime analysis** if parameters vary systematically across groups
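
Steps 3 to 6 can be turned into a simple checklist. The sketch below takes diagnostic p-values as plain numbers, for example the p-value element of the `dwh` and `ak_test` results, and returns suggested next steps. Several pieces are assumptions for illustration, not spreg functionality: the function `next_steps`, its 0.05 default threshold, and the separate heteroskedasticity p-value argument.

```python
def next_steps(dwh_p, ak_p=None, het_p=None, alpha=0.05):
    """Map diagnostic p-values to model-selection steps (illustrative only)."""
    steps = []
    if dwh_p < alpha:
        steps.append("endogeneity detected: keep TSLS")
    else:
        steps.append("no evidence of endogeneity: OLS may be adequate")
    if ak_p is not None and ak_p < alpha:
        steps.append("spatial dependence in residuals: consider spatial error models")
    if het_p is not None and het_p < alpha:
        steps.append("heteroskedasticity suspected: use robust='white'")
    return steps


for step in next_steps(dwh_p=0.01, ak_p=0.20, het_p=0.03):
    print("-", step)
```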