tessl/pypi-spreg

Spatial econometric regression models for analyzing geographically-related data interactions.

Overall score: 87%

OLS Models

Name: tessl/pypi-spreg
Author: tessl

Ordinary least squares regression with comprehensive spatial and non-spatial diagnostics. spreg offers two entry points: BaseOLS, a bare estimator, and OLS, which adds a constant and a wide range of optional diagnostic tests.

Capabilities

Base OLS Estimation

Core OLS estimation without diagnostics. It returns the regression coefficients and the variance-covariance matrix, with optional robust (White or HAC) standard errors. No constant is added, so x must already contain a column of ones if an intercept is wanted.

class BaseOLS:
    def __init__(self, y, x, robust=None, gwk=None, sig2n_k=False):
        """
        Ordinary least squares estimation (no diagnostics or constant added).

        Parameters:
        - y (array): nx1 dependent variable
        - x (array): nxk independent variables, including the constant if one is wanted (BaseOLS does not add it)
        - robust (str, optional): 'white' for White correction, 'hac' for HAC correction
        - gwk (pysal W object, optional): Kernel spatial weights for HAC estimation
        - sig2n_k (bool): If True, use n-k for sigma^2 estimation; if False, use n

        Attributes:
        - betas (array): kx1 estimated coefficients
        - u (array): nx1 residuals
        - predy (array): nx1 predicted values
        - vm (array): kxk variance-covariance matrix
        - sig2 (float): Sigma squared
        - n (int): Number of observations
        - k (int): Number of parameters
        """

Full OLS with Diagnostics

Complete OLS implementation with spatial and non-spatial diagnostic tests, supporting SLX specifications and regime-based analysis.

class OLS:
    def __init__(self, y, x, w=None, robust=None, gwk=None, sig2n_k=False,
                 nonspat_diag=True, spat_diag=False, moran=False, 
                 white_test=False, vif=False, slx_lags=0, slx_vars='All',
                 regimes=None, vm=False, constant_regi='one', cols2regi='all',
                 regime_err_sep=False, cores=False, name_y=None, name_x=None,
                 name_w=None, name_ds=None, latex=False):
        """
        Ordinary least squares with extensive diagnostics.

        Parameters:
        - y (array): nx1 dependent variable
        - x (array): nxk independent variables (constant added automatically)
        - w (pysal W object, optional): Spatial weights for spatial diagnostics
        - robust (str, optional): 'white' or 'hac' for robust standard errors
        - gwk (pysal W object, optional): Kernel weights for HAC estimation
        - sig2n_k (bool): Use n-k for sigma^2 estimation
        - nonspat_diag (bool): Compute non-spatial diagnostics (default True)
        - spat_diag (bool): Compute spatial diagnostics (requires w)
        - moran (bool): Compute Moran's I test on residuals (requires spat_diag=True)
        - white_test (bool): Compute White's heteroskedasticity test
        - vif (bool): Compute variance inflation factors
        - slx_lags (int): Number of spatial lags of X to include
        - slx_vars (str/list): Variables to be spatially lagged ('All' or list)
        - regimes (list/Series, optional): Regime identifier for observations
        - vm (bool): Include variance-covariance matrix in output
        - constant_regi (str): 'one' (constant across regimes) or 'many'
        - cols2regi (str/list): Variables that vary by regime ('all' or list)
        - regime_err_sep (bool): Run separate regressions for each regime
        - cores (bool): Use multiprocessing for regime estimation
        - name_y, name_x, name_w, name_ds (str): Variable and dataset names
        - latex (bool): Format output for LaTeX

        Attributes:
        - All BaseOLS attributes plus:
        - r2 (float): R-squared
        - ar2 (float): Adjusted R-squared
        - f_stat (tuple): F-statistic (value, p-value)
        - t_stat (list): t-statistics with p-values for each coefficient
        - jarque_bera (dict): Jarque-Bera normality test results
        - breusch_pagan (dict): Breusch-Pagan heteroskedasticity test
        - white (dict): White heteroskedasticity test (if white_test=True)
        - koenker_bassett (dict): Koenker-Bassett test results
        - lm_error (tuple): LM test for spatial error, (statistic, p-value) (if spat_diag=True)
        - lm_lag (tuple): LM test for spatial lag, (statistic, p-value) (if spat_diag=True)
        - rlm_error (tuple): Robust LM test for spatial error, (statistic, p-value)
        - rlm_lag (tuple): Robust LM test for spatial lag, (statistic, p-value)
        - lm_sarma (tuple): LM test for SARMA specification, (statistic, p-value)
        - moran_res (tuple): Moran's I on residuals, (I, standardized I, p-value) (if moran=True)
        - vif (list): Variance inflation factor for each variable (if vif=True)
        - summary (str): Comprehensive formatted results
        - output (DataFrame): Formatted results table
        """

Usage Examples

Basic OLS Regression

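import numpy as np
import spreg

# Prepare data
n = 100
y = np.random.randn(n, 1)
x = np.random.randn(n, 3)

# Basic OLS without diagnostics: BaseOLS does not add a constant,
# so prepend a column of ones to estimate an intercept
x_const = np.hstack((np.ones((n, 1)), x))
base_ols = spreg.BaseOLS(y, x_const)
print("Coefficients:", base_ols.betas.flatten())

# BaseOLS reports no fit statistics; compute R-squared from the residuals
r2 = 1 - (base_ols.u ** 2).sum() / ((y - y.mean()) ** 2).sum()
print("R-squared:", r2)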

# Full OLS with non-spatial diagnostics
ols_model = spreg.OLS(y, x, nonspat_diag=True, name_y='y', 
                      name_x=['x1', 'x2', 'x3'])
print(ols_model.summary)
print("R-squared:", ols_model.r2)
print("F-statistic:", ols_model.f_stat)

OLS with Spatial Diagnostics

import numpy as np
import spreg
from libpysal import weights

# Create spatial data
n = 49  # 7x7 grid
y = np.random.randn(n, 1)
x = np.random.randn(n, 2)
w = weights.lat2W(7, 7)  # 7x7 lattice weights

# OLS with spatial diagnostics
spatial_ols = spreg.OLS(y, x, w=w, spat_diag=True, moran=True, 
                        name_y='y', name_x=['x1', 'x2'])

print(spatial_ols.summary)
print("LM Error test:", spatial_ols.lm_error)
print("LM Lag test:", spatial_ols.lm_lag)
print("Moran's I on residuals:", spatial_ols.moran_res)

# Check whether spatial dependence is detected; each LM result
# is a (statistic, p-value) tuple
if spatial_ols.lm_error[1] < 0.05:
    print("Spatial error dependence detected")
if spatial_ols.lm_lag[1] < 0.05:
    print("Spatial lag dependence detected")

OLS with SLX Specification

import numpy as np
import spreg
from libpysal import weights

# Spatial lag of X (SLX) model
n = 100
y = np.random.randn(n, 1)
x = np.random.randn(n, 2)
w = weights.KNN.from_array(np.random.randn(n, 2), k=5)

# Include spatial lags of X variables
slx_model = spreg.OLS(y, x, w=w, slx_lags=1, slx_vars='All',
                      spat_diag=True, name_y='y', name_x=['x1', 'x2'])

print(slx_model.summary)
print("Number of coefficients (includes spatial lags):", slx_model.k)

OLS with Robust Standard Errors

import numpy as np
import spreg

# OLS with White robust standard errors
n = 100
y = np.random.randn(n, 1)
x = np.random.randn(n, 2)

# White correction for heteroskedasticity
white_ols = spreg.OLS(y, x, robust='white', nonspat_diag=True,
                      name_y='y', name_x=['x1', 'x2'])

print(white_ols.summary)
print("Uses White-corrected standard errors")

# HAC correction requires kernel spatial weights with ones on the main diagonal
from libpysal import weights
coords = np.random.randn(n, 2)
w_kernel = weights.Kernel(coords, k=10, fixed=False, function='triangular',
                          diagonal=True)
hac_ols = spreg.OLS(y, x, robust='hac', gwk=w_kernel,
                    name_y='y', name_x=['x1', 'x2'])
print(hac_ols.summary)

Regime-Based OLS

import numpy as np
import spreg

# OLS with regimes
n = 100
y = np.random.randn(n, 1)
x = np.random.randn(n, 2)
regimes = np.random.choice(['A', 'B', 'C'], n)

# Different intercepts and slopes by regime
regime_ols = spreg.OLS(y, x, regimes=regimes, constant_regi='many',
                       cols2regi='all', name_y='y', name_x=['x1', 'x2'],
                       name_regimes='region')

print(regime_ols.summary)
print("Number of regimes:", regime_ols.nr)
print("Chow test results:", regime_ols.chow)

# Separate regression for each regime
separate_ols = spreg.OLS(y, x, regimes=regimes, regime_err_sep=True,
                         name_y='y', name_x=['x1', 'x2'])
print("Individual regime results:", separate_ols.multi.keys())

Common Diagnostic Interpretations

R-squared and Model Fit

r2: Proportion of the variance of y explained by the model
ar2: R-squared adjusted (penalized) for the number of parameters k
f_stat: F-test of the joint null hypothesis that all slope coefficients are zero
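These three measures follow from the residual and total sums of squares. A minimal sketch using the textbook definitions (an assumption about spreg's exact formulas; `fit_stats` is a hypothetical helper, and the F p-value is omitted because it needs the F distribution):

```python
import numpy as np


def fit_stats(y, x, betas):
    """R-squared, adjusted R-squared and overall F-statistic; x includes the constant."""
    n, k = x.shape
    u = y - x @ betas
    ssr = float(np.sum(u ** 2))                  # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))     # total sum of squares
    r2 = 1.0 - ssr / sst
    ar2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)   # penalized for k parameters
    f = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))  # H0: all slopes are zero
    return r2, ar2, f


rng = np.random.default_rng(5)
n = 100
x = np.hstack((np.ones((n, 1)), rng.standard_normal((n, 3))))
y = x @ np.array([[1.0], [0.8], [0.0], [-0.3]]) + rng.standard_normal((n, 1))
betas = np.linalg.lstsq(x, y, rcond=None)[0]
r2, ar2, f = fit_stats(y, x, betas)
print(r2, ar2, f)
```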

Heteroskedasticity Tests

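breusch_pagan: Tests whether the error variance depends on the explanatory variables
white: General heteroskedasticity test based on the regressors, their squares, and cross-products (only if white_test=True)
koenker_bassett: Studentized version of Breusch-Pagan, robust to non-normal errors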
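The Koenker-Bassett statistic can be sketched as n times the R-squared of an auxiliary regression of the squared residuals on the regressors, compared with a chi-squared distribution with k - 1 degrees of freedom. This assumes the textbook form; `koenker_bassett` here is a hypothetical helper, not spreg's function, and the p-value is omitted.

```python
import numpy as np


def koenker_bassett(x, u):
    """Studentized BP: n * R^2 from regressing squared residuals on x (constant included)."""
    n, k = x.shape
    e2 = u.flatten() ** 2
    g = np.linalg.lstsq(x, e2, rcond=None)[0]
    resid = e2 - x @ g
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2, k - 1          # statistic and chi-squared degrees of freedom


rng = np.random.default_rng(6)
n = 500
x1 = rng.uniform(0.0, 3.0, n)
x = np.column_stack((np.ones(n), x1, rng.standard_normal(n)))
# Error standard deviation proportional to x1: heteroskedastic by construction
y = (x @ np.array([1.0, 0.5, -0.5]) + x1 * rng.standard_normal(n)).reshape(-1, 1)
betas = np.linalg.lstsq(x, y, rcond=None)[0]
u = y - x @ betas
stat, df = koenker_bassett(x, u)
print(stat, df)
```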

Spatial Dependence Tests

lm_error: Tests for spatial error dependence
lm_lag: Tests for spatial lag dependence
rlm_error, rlm_lag: Robust versions that remain valid when the other form of dependence is present (local misspecification)
lm_sarma: Joint test for both error and lag dependence
moran_res: Moran's I test on regression residuals
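Both Moran's I and the LM error test are built from the cross-product u'Wu. A minimal sketch with a dense weights matrix, assuming the standard definitions I = (n / S0) u'Wu / u'u and LM_error = (u'Wu / s^2)^2 / T with s^2 = u'u / n and T = tr(W'W + WW). `rook_weights` and `moran_and_lm_error` are hypothetical helpers; the standardized Moran value and p-values that spreg reports are not reproduced here.

```python
import numpy as np


def rook_weights(rows, cols):
    """Hypothetical helper: dense, row-standardized rook contiguity matrix for a grid."""
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[r * cols + c, rr * cols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)


def moran_and_lm_error(u, w):
    """Moran's I of residuals and the LM error statistic for a dense weights matrix w."""
    u = u.flatten()
    n = u.size
    utu = u @ u
    uwu = u @ w @ u
    moran_i = (n / w.sum()) * uwu / utu        # S0 = sum of all weights
    t = np.trace(w.T @ w + w @ w)
    lm_error = (uwu / (utu / n)) ** 2 / t      # chi-squared with 1 df under H0
    return moran_i, lm_error


w = rook_weights(7, 7)
rows, cols = np.divmod(np.arange(49), 7)
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)  # neighbors always differ in sign
trend = rows - rows.mean()                              # smooth north-south gradient
print(moran_and_lm_error(checker, w))
print(moran_and_lm_error(trend, w))
```

The checkerboard pattern gives perfect negative autocorrelation (I = -1), while the smooth gradient gives a strongly positive I.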

Multicollinearity

vif: Variance inflation factors for detecting multicollinearity

A VIF > 10 typically indicates problematic multicollinearity.
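The VIF of regressor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the remaining regressors plus a constant. A minimal sketch of that definition (`vif` here is a hypothetical helper, not spreg's diagnostics function):

```python
import numpy as np


def vif(x):
    """VIF_j = 1 / (1 - R_j^2); x holds the regressors without the constant."""
    n, p = x.shape
    out = []
    for j in range(p):
        target = x[:, j]
        # Regress column j on a constant plus all other columns
        others = np.column_stack((np.ones(n), np.delete(x, j, axis=1)))
        coef = np.linalg.lstsq(others, target, rcond=None)[0]
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out


rng = np.random.default_rng(7)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)   # nearly a copy of x1
x3 = rng.standard_normal(n)               # unrelated regressor
print(vif(np.column_stack((x1, x2, x3))))
```

The two near-duplicate columns get very large VIFs, while the unrelated regressor stays close to 1.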

Model Selection Guidelines

1. Start with basic OLS with non-spatial diagnostics.
2. Add spatial diagnostics (w and spat_diag=True) when working with spatial data.
3. Check for spatial dependence:
   - If only LM Error is significant → consider a spatial error model
   - If only LM Lag is significant → consider a spatial lag model
   - If both are significant → use the robust tests (rlm_error, rlm_lag) to distinguish
4. Check for heteroskedasticity: use robust standard errors if it is detected.
5. Consider an SLX specification when spatially lagged independent variables may affect y.
6. Use regime models when parameters vary systematically across groups.
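The spatial-dependence steps above can be written as a small decision function. This is a hypothetical helper, not part of spreg; the 0.05 threshold and the message for the case where the robust tests do not separate the two models are assumptions.

```python
def suggest_spatial_model(p_lm_error, p_lm_lag, p_rlm_error, p_rlm_lag, alpha=0.05):
    """Hypothetical helper encoding the LM-test decision steps."""
    err, lag = p_lm_error < alpha, p_lm_lag < alpha
    if not err and not lag:
        return "no spatial dependence detected: keep OLS"
    if err and not lag:
        return "spatial error model"
    if lag and not err:
        return "spatial lag model"
    # Both simple LM tests are significant: let the robust versions decide
    r_err, r_lag = p_rlm_error < alpha, p_rlm_lag < alpha
    if r_err and not r_lag:
        return "spatial error model"
    if r_lag and not r_err:
        return "spatial lag model"
    return "inconclusive: inspect both robust tests and lm_sarma"


print(suggest_spatial_model(0.01, 0.02, 0.60, 0.001))
```

With a model fitted using spat_diag=True, the p-values would be the second element of each (statistic, p-value) tuple, for example spatial_ols.lm_error[1].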

Install with Tessl CLI

npx tessl i tessl/pypi-spreg