tessl/pypi-spreg

Spatial econometric regression models for analyzing geographically-related data interactions.

Spatial Error Models

Name: tessl/pypi-spreg
Author: tessl

GMM estimation of spatial error models with options for heteroskedasticity assumptions, endogenous variables, and combined spatial lag-error specifications (SARAR models).

Capabilities

Heteroskedastic Spatial Error Models

GMM spatial error models that allow for heteroskedasticity, using the Arraiz et al. methodology for robust estimation.

class GM_Error_Het:
    def __init__(self, y, x, w, max_iter=1, epsilon=0.0000001, step1c=False, 
                 inv_method='power_exp', hard_bound=False, vm=False, name_y=None,
                 name_x=None, name_w=None, name_ds=None, latex=False):
        """
        GMM spatial error model with heteroskedasticity.

        Parameters:
        - y (array): nx1 dependent variable
        - x (array): nxk independent variables (constant added automatically)
        - w (libpysal.weights.W): nxn spatial weights object (its sparse form is used internally)
        - max_iter (int): Maximum iterations for steps 2a and 2b (default 1)
        - epsilon (float): Convergence criterion (default 1e-7)
        - step1c (bool): Include Step 1c from Arraiz et al. methodology
        - inv_method (str): 'power_exp' (default) or 'true_inv' for matrix inversion
        - hard_bound (bool): Raise exception if lambda outside [-1,1]
        - vm (bool): Include variance-covariance matrix in output
        - name_y, name_x, name_w, name_ds (str): Variable and dataset names
        - latex (bool): Format output for LaTeX

        Attributes:
        - betas (array): kx1 estimated coefficients (includes lambda)
        - u (array): nx1 residuals
        - predy (array): nx1 predicted values
        - e_filtered (array): nx1 spatially filtered residuals
        - vm (array): kxk variance-covariance matrix (if requested)
        - sig2 (float): Sigma squared
        - pr2 (float): Pseudo R-squared
        - iteration (int): Number of iterations performed
        - iter_stop (str): Convergence criterion reached
        - summary (str): Formatted results
        - output (DataFrame): Results table
        """

class GM_Endog_Error_Het:
    def __init__(self, y, x, yend, q, w, max_iter=1, epsilon=0.0000001, 
                 step1c=False, inv_method='power_exp', hard_bound=False, 
                 vm=False, name_y=None, name_x=None, name_yend=None, 
                 name_q=None, name_w=None, name_ds=None, latex=False):
        """
        GMM spatial error model with heteroskedasticity and endogenous variables.

        Parameters:
        - y (array): nx1 dependent variable
        - x (array): nxk exogenous independent variables
        - yend (array): nxp endogenous variables
        - q (array): nxq external instruments
        - w (libpysal.weights.W): nxn spatial weights object
        - Additional parameters same as GM_Error_Het

        Attributes:
        - All GM_Error_Het attributes plus:
        - z (array): Combined exogenous and endogenous variables
        - h (array): All instruments (x and q combined)
        """

class GM_Combo_Het:
    def __init__(self, y, x, yend=None, q=None, w=None, w_lags=1, lag_q=True,
                 max_iter=1, epsilon=0.0000001, step1c=False,
                 inv_method='power_exp', hard_bound=False, vm=False,
                 name_y=None, name_x=None, name_yend=None, name_q=None,
                 name_w=None, name_ds=None, latex=False):
        """
        GMM spatial lag and error model (SARAR) with heteroskedasticity.

        Parameters:
        - y (array): nx1 dependent variable
        - x (array): nxk exogenous independent variables
        - yend (array, optional): nxp additional endogenous variables (Wy is added automatically)
        - q (array, optional): nxq external instruments for yend
        - w (libpysal.weights.W): nxn spatial weights object
        - w_lags (int): Orders of W to include as instruments (default 1)
        - lag_q (bool): Include spatial lags of additional instruments
        - Additional parameters same as GM_Error_Het

        Attributes:
        - All GM_Endog_Error_Het attributes plus:
        - rho: Spatial lag parameter (coefficient on Wy)
        - betas holds both spatial parameters: rho (lag) second to last, lambda (error) last
        """

Homoskedastic Spatial Error Models

GMM spatial error models assuming homoskedasticity, using the Drukker et al. methodology for efficient estimation.

class GM_Error_Hom:
    def __init__(self, y, x, w, hard_bound=False, vm=False, name_y=None,
                 name_x=None, name_w=None, name_ds=None, latex=False):
        """
        GMM spatial error model assuming homoskedasticity.

        Parameters:
        - y (array): nx1 dependent variable
        - x (array): nxk independent variables (constant added automatically)
        - w (libpysal.weights.W): nxn spatial weights object
        - hard_bound (bool): Raise exception if lambda outside [-1,1]
        - vm (bool): Include variance-covariance matrix
        - name_y, name_x, name_w, name_ds (str): Variable and dataset names
        - latex (bool): LaTeX formatting

        Attributes:
        - betas (array): kx1 estimated coefficients (includes lambda)
        - u (array): nx1 residuals
        - predy (array): nx1 predicted values
        - e_filtered (array): nx1 spatially filtered residuals
        - vm (array): kxk variance-covariance matrix (if requested)
        - sig2 (float): Sigma squared
        - pr2 (float): Pseudo R-squared
        - summary (str): Formatted results
        - output (DataFrame): Results table
        """

class GM_Endog_Error_Hom:
    def __init__(self, y, x, yend, q, w, hard_bound=False, vm=False, 
                 name_y=None, name_x=None, name_yend=None, name_q=None,
                 name_w=None, name_ds=None, latex=False):
        """
        GMM spatial error model with homoskedasticity and endogenous variables.
        
        Parameters and attributes similar to GM_Endog_Error_Het but with 
        homoskedasticity assumption for more efficient estimation.
        """

class GM_Combo_Hom:
    def __init__(self, y, x, yend=None, q=None, w=None, w_lags=1, lag_q=True,
                 hard_bound=False, vm=False, name_y=None, name_x=None,
                 name_yend=None, name_q=None, name_w=None, name_ds=None,
                 latex=False):
        """
        GMM spatial lag and error model (SARAR) assuming homoskedasticity.
        
        Parameters and attributes similar to GM_Combo_Het but with
        homoskedasticity assumption.
        """

Wrapper Classes

Convenient wrapper classes that automatically select appropriate spatial error models based on specification.

class GMM_Error:
    def __init__(self, y, x, w, yend=None, q=None, estimator='het', 
                 add_wy=False, slx_lags=0, slx_vars='All', vm=False,
                 name_y=None, name_x=None, name_yend=None, name_q=None,
                 name_w=None, name_ds=None, latex=False, **kwargs):
        """
        Comprehensive wrapper for GMM spatial error models.

        Parameters:
        - y, x, w: Standard regression variables and spatial weights
        - yend (array, optional): Endogenous variables
        - q (array, optional): External instruments
        - estimator (str): 'het' (heteroskedastic), 'hom' (homoskedastic), 
                          or 'kp98' (Kelejian-Prucha 1998)
        - add_wy (bool): Include spatial lag of y (creates SARAR model)
        - slx_lags (int): Number of spatial lags of X to include
        - slx_vars (str/list): Variables to spatially lag
        - Additional naming and formatting parameters
        - **kwargs: Additional parameters passed to underlying estimator

        The wrapper automatically instantiates the appropriate model class
        based on the estimator choice and presence of endogenous variables.
        """
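
The dispatch rule described in the docstring can be pictured as a small lookup. The sketch below is an illustration only: `pick_model_class` is a hypothetical helper, not part of spreg, and the mapping of `'kp98'` to the unsuffixed class names (`GM_Error`, `GM_Endog_Error`, `GM_Combo`) is an assumption based on spreg's naming rather than something stated above.

```python
def pick_model_class(estimator="het", has_endog=False, add_wy=False):
    """Return the name of the class the wrapper would be expected to use.

    Hypothetical helper for illustration; it mirrors the rule described
    in the text and does not call spreg.
    """
    suffix = {"het": "_Het", "hom": "_Hom", "kp98": ""}
    if estimator not in suffix:
        raise ValueError(f"unknown estimator: {estimator!r}")
    if add_wy:
        base = "GM_Combo"        # spatial lag + spatial error (SARAR)
    elif has_endog:
        base = "GM_Endog_Error"  # spatial error + endogenous regressors
    else:
        base = "GM_Error"        # pure spatial error
    return base + suffix[estimator]


print(pick_model_class("het"))
print(pick_model_class("hom", has_endog=True))
print(pick_model_class("het", add_wy=True))
```

In this sketch `add_wy=True` takes precedence over the presence of `yend`, since a SARAR specification can also carry additional endogenous variables.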

Usage Examples

Basic Spatial Error Model

import numpy as np
import spreg
from libpysal import weights

# Generate spatial data on a 7x7 lattice
n = 49
x = np.random.randn(n, 2)
w = weights.lat2W(7, 7)
w.transform = 'r'  # row-standardize so that |lambda| < 1 gives a stable process

# Create spatial error structure
lambda_true = 0.5
e = np.random.randn(n, 1)
# Spatial error: v = λWv + e, so v = (I - λW)^(-1)e
I_lW_inv = np.linalg.inv(np.eye(n) - lambda_true * w.full()[0])
v = I_lW_inv @ e

# Dependent variable with spatial error
y = 1 + 2 * x[:, 0:1] + 3 * x[:, 1:2] + v

# Estimate spatial error model (heteroskedastic); pass the W object itself
error_model = spreg.GM_Error_Het(y, x, w, name_y='y',
                                 name_x=['x1', 'x2'])

print(error_model.summary)
print(f"Estimated lambda: {error_model.betas[-1][0]:.3f} (true: {lambda_true})")
print(f"Pseudo R-squared: {error_model.pr2:.3f}")

Spatial Error Model with Homoskedasticity

import numpy as np
import spreg
from libpysal import weights

# Same data setup as above
n = 49
x = np.random.randn(n, 2)
w = weights.lat2W(7, 7)
y = np.random.randn(n, 1)  # simplified for demonstration

# Homoskedastic spatial error model (more efficient if assumption holds)
hom_error = spreg.GM_Error_Hom(y, x, w, name_y='y',
                               name_x=['x1', 'x2'])

print(hom_error.summary)
print("Assumes homoskedastic errors for efficiency")

Spatial Error with Endogenous Variables

import numpy as np
import spreg
from libpysal import weights

# Data with an endogenous regressor
n = 100
x = np.random.randn(n, 2)
z = np.random.randn(n, 2)  # instruments
w = weights.KNN.from_array(np.random.randn(n, 2), k=5)
w.transform = 'r'  # row-standardize

# Endogenous variable: driven by the instruments plus a shock shared with the error
common = np.random.randn(n, 1)
yend = 1.5 * z[:, 0:1] + 0.8 * z[:, 1:2] + common + np.random.randn(n, 1)

# Dependent variable; `common` makes yend correlated with the error term
# (for brevity, no spatial dependence is simulated in the error)
error_term = common + np.random.randn(n, 1)
y = 1 + x[:, 0:1] + 2 * x[:, 1:2] + 1.2 * yend + error_term

# Spatial error model with endogenous variables; pass the W object itself
endog_error = spreg.GM_Endog_Error_Het(y, x, yend, z, w,
                                       name_y='y', name_x=['x1', 'x2'],
                                       name_yend=['yend'],
                                       name_q=['z1', 'z2'])

print(endog_error.summary)
print("Handles both endogeneity and spatial error dependence")

SARAR Model (Spatial Lag and Error)

import numpy as np
import spreg
from libpysal import weights

# SARAR model: y = ρWy + Xβ + u, u = λWu + ε
n = 100
x = np.random.randn(n, 2)
w = weights.KNN.from_array(np.random.randn(n, 2), k=5)
w.transform = 'r'  # row-standardize
W = w.full()[0]
I = np.eye(n)

# Simulate from the reduced form:
#   u = (I - λW)^(-1)ε,  y = (I - ρW)^(-1)(Xβ + u)
rho_true, lambda_true = 0.4, 0.3
u = np.linalg.solve(I - lambda_true * W, np.random.randn(n, 1))
y = np.linalg.solve(I - rho_true * W, 1 + x[:, 0:1] + 2 * x[:, 1:2] + u)

# SARAR model (spatial lag + spatial error).
# GM_Combo_Het adds Wy as an endogenous regressor and instruments it with
# spatial lags of X (up to order w_lags), so yend and q are only needed
# for additional endogenous variables.
sarar_model = spreg.GM_Combo_Het(y, x, yend=None, q=None, w=w, w_lags=1,
                                 name_y='y', name_x=['x1', 'x2'])

print(sarar_model.summary)
print("Estimates both spatial lag (rho) and spatial error (lambda) parameters")
print(f"Rho (spatial lag): {sarar_model.betas[-2][0]:.3f} (true: {rho_true})")
print(f"Lambda (spatial error): {sarar_model.betas[-1][0]:.3f} (true: {lambda_true})")

Using the GMM_Error Wrapper

import numpy as np
import spreg
from libpysal import weights

# Data setup
n = 100
x = np.random.randn(n, 2)
y = np.random.randn(n, 1)
w = weights.KNN.from_array(np.random.randn(n, 2), k=5)

# Use wrapper for automatic model selection
# Heteroskedastic spatial error
het_wrapper = spreg.GMM_Error(y, x, w, estimator='het', 
                              name_y='y', name_x=['x1', 'x2'])

# Homoskedastic spatial error  
hom_wrapper = spreg.GMM_Error(y, x, w, estimator='hom',
                              name_y='y', name_x=['x1', 'x2'])

# SARAR model using wrapper
sarar_wrapper = spreg.GMM_Error(y, x, w, add_wy=True, estimator='het',
                                name_y='y', name_x=['x1', 'x2'])

print("Wrapper automatically selects appropriate model class")

Convergence and Iteration Control

import numpy as np
import spreg
from libpysal import weights

# Control iteration for heteroskedastic models
n = 100
x = np.random.randn(n, 2)
y = np.random.randn(n, 1)
w = weights.KNN.from_array(np.random.randn(n, 2), k=5)

# Multiple iterations for better convergence
multi_iter = spreg.GM_Error_Het(y, x, w, max_iter=3,
                                epsilon=1e-8, step1c=True,
                                name_y='y', name_x=['x1', 'x2'])

print(multi_iter.summary)
print(f"Converged after {multi_iter.iteration} iterations")
print(f"Convergence criterion: {multi_iter.iter_stop}")

# Alternative inversion method for numerical stability
true_inv = spreg.GM_Error_Het(y, x, w, inv_method='true_inv',
                              name_y='y', name_x=['x1', 'x2'])
print("Uses true matrix inversion instead of power expansion")

Model Selection Guidelines

Heteroskedastic vs Homoskedastic

Use heteroskedastic models (GM_Error_Het) when:
- Error variance is not constant across observations
- Robust estimation is preferred (default choice)
- Working with diverse spatial units (e.g., different sized regions)

Use homoskedastic models (GM_Error_Hom) when:
- Confident that error variance is constant
- Seeking more efficient estimation
- Working with regular spatial grids
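
One informal way to choose between the two families is to check whether the squared residuals from a preliminary OLS fit vary systematically with the regressors. The sketch below computes a Koenker-style studentized Breusch-Pagan statistic (n times the R² from regressing squared residuals on X) with NumPy only. It ignores spatial dependence, `bp_statistic` is a hypothetical helper rather than a spreg function, and the simulated data are assumptions chosen for illustration.

```python
import numpy as np

def bp_statistic(y, x):
    """n * R^2 from regressing squared OLS residuals on x (plus a constant)."""
    n = y.shape[0]
    X = np.hstack([np.ones((n, 1)), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e2 = (y - X @ beta) ** 2                      # squared OLS residuals
    gamma = np.linalg.lstsq(X, e2, rcond=None)[0]
    ss_res = ((e2 - X @ gamma) ** 2).sum()
    ss_tot = ((e2 - e2.mean()) ** 2).sum()
    return n * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal((n, 1))
e = rng.standard_normal((n, 1))
y_hom = 1 + 2 * x + e                    # constant error variance
y_het = 1 + 2 * x + np.exp(0.5 * x) * e  # error variance grows with x

stat_hom = bp_statistic(y_hom, x)
stat_het = bp_statistic(y_het, x)
print(f"homoskedastic data:   {stat_hom:.1f}")
print(f"heteroskedastic data: {stat_het:.1f}")
```

With one regressor the statistic is compared against a chi-squared distribution with one degree of freedom (5% critical value 3.84); values far above that point toward the heteroskedastic estimators.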

Basic vs Endogenous vs SARAR

- Basic spatial error (GM_Error_*): Pure spatial error dependence
- Endogenous spatial error (GM_Endog_Error_*): Spatial error + endogenous variables
- SARAR models (GM_Combo_*): Both spatial lag and spatial error dependence

Iteration and Convergence

- max_iter: Start with 1; increase it if the lambda estimate is still changing between iterations
- epsilon: The default of 1e-7 is usually sufficient
- step1c: Set to True to include Step 1c of the Arraiz et al. methodology
- inv_method: Use 'true_inv' if the power expansion fails
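
The difference between the two `inv_method` options can be illustrated with plain NumPy. The sketch below approximates (I - λW)^(-1) by a truncated power series and compares it with a direct inverse. It shows the general idea only and is not spreg's internal implementation; the ring-shaped weights matrix and both helper functions are assumptions made for the example.

```python
import numpy as np

def row_standardized_ring(n):
    """Row-standardized weights where each unit neighbors the units on either side."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def power_expansion_inverse(W, lam, tol=1e-10, max_terms=1000):
    """Approximate (I - lam*W)^-1 by the series I + lam*W + (lam*W)^2 + ..."""
    term = np.eye(W.shape[0])
    total = term.copy()
    for _ in range(max_terms):
        term = lam * (term @ W)   # next power of lam*W
        total += term
        if np.abs(term).max() < tol:
            break
    return total

W = row_standardized_ring(10)
lam = 0.5
approx = power_expansion_inverse(W, lam)
exact = np.linalg.inv(np.eye(10) - lam * W)
max_abs_diff = np.abs(approx - exact).max()
print(f"max abs difference: {max_abs_diff:.2e}")
```

The series converges only when |λ| times the spectral radius of W is below 1, and it needs many terms as that product approaches 1, which is the situation where a direct inverse is the safer choice.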

Instrument Selection for SARAR

- Use spatial lags of the X variables (WX) as instruments for Wy
- Add higher-order spatial lags (W²X, W³X) through w_lags when more instruments are needed
- Ensure the instruments are relevant (strongly correlated with Wy)
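
The instrument guidance above can be checked numerically before estimating anything. The following sketch builds WX and W²X by hand with NumPy and reports their correlation with Wy on simulated spatial lag data. The ring-shaped weights matrix, the parameter values, and the use of a simple correlation as a relevance check are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Row-standardized ring weights (an assumption, to keep the example dependency-free)
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

x = rng.standard_normal((n, 2))
beta = np.array([[1.0], [2.0]])
rho = 0.4

# Reduced form of the spatial lag model: y = (I - rho*W)^-1 (X beta + e)
y = np.linalg.solve(np.eye(n) - rho * W, x @ beta + rng.standard_normal((n, 1)))
wy = W @ y

# Candidate instruments: first- and second-order spatial lags of X
wx = W @ x
w2x = W @ wx
instruments = np.hstack([wx, w2x])

# Relevance check: correlation of each instrument column with Wy
corr = [np.corrcoef(instruments[:, j], wy[:, 0])[0, 1]
        for j in range(instruments.shape[1])]
print(np.round(corr, 2))
```

A correlation check of this kind is only a first screen; a first-stage regression of Wy on all instruments gives a fuller picture of instrument strength.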

Diagnostic Interpretation

Spatial Parameters

- Lambda (λ): Spatial error parameter; with row-standardized weights it should lie in (-1, 1)
- Positive λ indicates positive spatial error correlation
- λ near ±1 may indicate model misspecification
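
The admissible range for λ follows from the eigenvalues of W: (I - λW) is invertible for 1/ω_min < λ < 1/ω_max, where ω_min and ω_max are the smallest and largest eigenvalues. The sketch below computes that interval for a small hand-built weights matrix. `admissible_range` is a hypothetical helper and the four-unit neighbor structure is an assumption chosen for illustration.

```python
import numpy as np

def admissible_range(W):
    """Return (1/omega_min, 1/omega_max) from the real parts of W's eigenvalues."""
    eig = np.linalg.eigvals(W).real
    return 1.0 / eig.min(), 1.0 / eig.max()

# Four units: 0, 1, 2 are mutual neighbors and unit 3 neighbors unit 2
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
W = A / A.sum(axis=1, keepdims=True)  # row-standardize

lower, upper = admissible_range(W)
print(f"lambda should lie in ({lower:.3f}, {upper:.3f})")
```

For row-standardized weights the upper bound is 1, while the lower bound is at or below -1, so restricting λ to (-1, 1) is a safe, slightly conservative rule.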

Model Fit

- Pseudo R-squared (pr2): Reported in place of the standard R², which loses its usual interpretation once the model includes spatial terms
- Compare it only across spatial models fitted to the same data
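
A common definition of the pseudo R-squared is the squared correlation between the observed and predicted values. The sketch below computes it that way with NumPy. It assumes this is the definition behind the `pr2` attribute; `pseudo_r2` is a hypothetical helper and the numbers are made up for illustration.

```python
import numpy as np

def pseudo_r2(y, predy):
    """Squared Pearson correlation between observed and predicted values."""
    r = np.corrcoef(y.ravel(), predy.ravel())[0, 1]
    return r ** 2

y = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
predy = np.array([[1.2], [1.9], [3.1], [3.8], [5.3]])
value = pseudo_r2(y, predy)
print(f"pseudo R-squared: {value:.3f}")
```

Because it is a squared correlation, this measure is unaffected by rescaling or shifting the predictions, so it describes how closely they track the data rather than the share of variance explained.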

Convergence

- Check the iteration and iter_stop attributes
- Non-convergence may indicate identification problems
- Try a larger max_iter, a different epsilon, or inv_method='true_inv'
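
The interplay of `max_iter` and `epsilon` is that of a generic iterate-until-stable loop with two stopping conditions. The sketch below shows that pattern in plain Python. It is not spreg's code: `iterate_until_stable`, the `update` callable, and the `reason` labels are hypothetical stand-ins for one re-estimation pass and for whatever `iter_stop` reports.

```python
def iterate_until_stable(update, start, max_iter=1, epsilon=1e-7):
    """Apply `update` repeatedly; stop on a small change or at max_iter.

    Returns (value, iterations, reason).
    """
    value = start
    iterations = 0
    reason = "max_iter"
    while iterations < max_iter:
        new_value = update(value)
        iterations += 1
        change = abs(new_value - value)
        value = new_value
        if change < epsilon:
            reason = "epsilon"
            break
    return value, iterations, reason

# Toy update with fixed point 0.5: each pass halves the distance to it
def step(lam):
    return 0.5 * lam + 0.25

single = iterate_until_stable(step, 0.0, max_iter=1)
many = iterate_until_stable(step, 0.0, max_iter=100)
print(single)
print(many)
```

With `max_iter=1` the loop stops after a single pass regardless of `epsilon`, which is why raising `max_iter` is the first thing to try when the estimate has not settled.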

Install with Tessl CLI

npx tessl i tessl/pypi-spreg
