
tessl/pypi-spreg

Spatial econometric regression models for analyzing geographically-related data interactions.

Overall score: 87%



Regional Housing Price Analysis with Separate Regime Modeling

Build a spatial econometric analysis tool that examines housing price determinants across different geographic regions, where each region may have fundamentally different market dynamics requiring independent regression models.

Context

You are analyzing housing prices in a metropolitan area divided into three distinct regions (urban core, suburban, and rural zones). Economic theory suggests that the relationship between housing characteristics and prices may differ across these regions not only in coefficient magnitudes but also in error structure and spatial dependence. You therefore need to estimate a completely separate regression model for each region rather than constraining the regions to share an error variance or spatial parameters.

Requirements

Your implementation must:

  1. Load and prepare the data

    • Read a CSV file of housing price data with the columns price (dependent variable); sqft, bedrooms, and age (independent variables); lat and lon (coordinates); and region (categorical: 0, 1, or 2)
    • Convert the data into appropriate numpy arrays for analysis
  2. Create spatial weights matrix

    • Construct a spatial weights matrix based on geographic coordinates using k-nearest neighbors (k=5)
    • Row-standardize the weights matrix
  3. Estimate separate regime models

    • Run ordinary least squares regression models where each region gets its own completely independent regression
    • Allow each region to have its own coefficients, error variance, and spatial structure
    • Include a constant term in each regression
  4. Compare with pooled model

    • Also estimate a single pooled OLS model (ignoring regions) for comparison
    • Store both the separate regime results and pooled results
  5. Extract and report results

    • For the regime-separated model, extract and display:
      • The number of separate regressions run
      • The coefficient estimates for each regime
      • The R-squared value for each regime
    • For the pooled model, display the overall R-squared
    • Print a summary showing the model comparison
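To make requirements 3 to 5 concrete, the following is a minimal sketch in plain NumPy of what "separate regressions per regime" computes compared with a pooled fit. It is not the spreg API: the helper names `ols_fit` and `fit_by_regime` and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np

def ols_fit(y, X):
    """Fit OLS with a constant term; return (betas, r2). Hypothetical helper."""
    y = np.asarray(y, dtype=float).ravel()
    Xc = np.column_stack([np.ones(len(y)), X])      # prepend the constant
    betas, *_ = np.linalg.lstsq(Xc, y, rcond=None)  # least-squares solution
    resid = y - Xc @ betas
    r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
    return betas, r2

def fit_by_regime(y, X, regimes):
    """Run one fully independent OLS per regime label. Hypothetical helper."""
    y = np.asarray(y, dtype=float).ravel()
    X = np.asarray(X, dtype=float)
    regimes = np.asarray(regimes)
    results = {}
    for r in np.unique(regimes):
        mask = regimes == r
        results[r.item()] = ols_fit(y[mask], X[mask])
    return results

# Synthetic illustration: the slope on the first regressor differs by regime.
rng = np.random.default_rng(0)
n = 300
regimes = np.repeat([0, 1, 2], 100)
X = rng.normal(size=(n, 3))
slopes = np.array([3.0, 2.0, 1.0])[regimes]
y = 1.0 + slopes * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

by_regime = fit_by_regime(y, X, regimes)
pooled_betas, pooled_r2 = ols_fit(y, X)

print("separate regressions:", len(by_regime))
for r, (betas, r2) in by_regime.items():
    print(f"regime {r}: betas={np.round(betas, 3)}, R2={r2:.3f}")
print(f"pooled: R2={pooled_r2:.3f}")
```

Because each regime is fitted on its own subset, nothing (coefficients or residual variance) is shared across regimes, which is the property the task asks the spreg-based implementation to reproduce.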

Test Cases

  • Given a dataset with 300 observations (100 per region), when running separate regime regressions, the output should indicate 3 separate regressions were estimated @test

  • Given sample housing data where region 0 is urban (high density, small lots) and region 2 is rural (low density, large lots), the coefficient on sqft should be different across regimes, demonstrating heterogeneous spatial markets @test

  • Given the same dataset, when comparing the regime-separated model to a pooled model, the regime-separated approach should allow each region to have its own independent error structure @test
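The test cases above assume a dataset with 100 observations per region and a sqft effect that differs by region. The task does not supply such a file, so the sketch below generates one. The function name `make_synthetic_housing`, the region centers, and every numeric value are illustrative assumptions, not data from the task.

```python
import numpy as np
import pandas as pd

def make_synthetic_housing(n_per_region=100, seed=0):
    """Build a synthetic housing table; all values are made up for testing."""
    rng = np.random.default_rng(seed)
    # region -> (lat center, lon center, price per sqft); arbitrary values
    settings = {0: (0.0, 0.0, 400.0), 1: (1.0, 1.0, 250.0), 2: (2.0, 2.0, 120.0)}
    frames = []
    for region, (lat0, lon0, per_sqft) in settings.items():
        sqft = rng.uniform(600, 3500, n_per_region)
        bedrooms = rng.integers(1, 6, n_per_region)
        age = rng.uniform(0, 80, n_per_region)
        noise = rng.normal(0, 10_000, n_per_region)
        price = 50_000 + per_sqft * sqft + 15_000 * bedrooms - 500 * age + noise
        frames.append(pd.DataFrame({
            "price": price,
            "sqft": sqft,
            "bedrooms": bedrooms,
            "age": age,
            "lat": lat0 + rng.normal(0, 0.05, n_per_region),
            "lon": lon0 + rng.normal(0, 0.05, n_per_region),
            "region": region,
        }))
    return pd.concat(frames, ignore_index=True)

df = make_synthetic_housing()
print(df.groupby("region").size())
# df.to_csv("housing.csv", index=False)  # uncomment to write a test fixture
```

Keeping the regions spatially separated means each observation's nearest neighbors fall in its own region, which keeps the weights well-behaved when they are later subset by regime.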

Implementation

@generates

API

import numpy as np
import pandas as pd

def load_housing_data(filepath):
    """
    Load housing data from CSV file.

    Parameters
    ----------
    filepath : str
        Path to CSV file containing columns: price, sqft, bedrooms, age, lat, lon, region

    Returns
    -------
    tuple
        (y, X, coords, regimes) where:
        - y: (n,1) array of prices
        - X: (n,k) array of independent variables (sqft, bedrooms, age)
        - coords: (n,2) array of lat/lon coordinates
        - regimes: (n,) array of region identifiers
    """
    pass

def create_spatial_weights(coords, k=5):
    """
    Create a row-standardized k-nearest neighbors spatial weights matrix.

    Parameters
    ----------
    coords : array-like
        (n,2) array of coordinates
    k : int
        Number of nearest neighbors

    Returns
    -------
    W : libpysal.weights.W
        Spatial weights matrix
    """
    pass

def estimate_regime_separated_model(y, X, regimes, w):
    """
    Estimate OLS with separate regressions per regime.

    Parameters
    ----------
    y : array-like
        (n,1) dependent variable
    X : array-like
        (n,k) independent variables
    regimes : array-like
        (n,) regime identifiers
    w : libpysal.weights.W
        Spatial weights matrix

    Returns
    -------
    model : spreg model object
        Fitted regime model with separate regressions
    """
    pass

def estimate_pooled_model(y, X, w):
    """
    Estimate pooled OLS model ignoring regimes.

    Parameters
    ----------
    y : array-like
        (n,1) dependent variable
    X : array-like
        (n,k) independent variables
    w : libpysal.weights.W
        Spatial weights matrix

    Returns
    -------
    model : spreg model object
        Fitted pooled OLS model
    """
    pass

def compare_models(regime_model, pooled_model):
    """
    Compare regime-separated and pooled models.

    Parameters
    ----------
    regime_model : spreg model object
        Fitted regime model
    pooled_model : spreg model object
        Fitted pooled model

    Returns
    -------
    dict
        Comparison metrics including number of regimes, R-squared values, etc.
    """
    pass

Dependencies { .dependencies }

spreg { .dependency }

Provides spatial econometric regression models with regime separation capabilities.

@satisfied-by

libpysal { .dependency }

Provides spatial weights matrix construction and manipulation.

@satisfied-by
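For readers unfamiliar with the weights object, the following plain NumPy sketch shows what a row-standardized k-nearest-neighbor matrix contains. It builds a dense matrix for illustration only and is not how libpysal stores weights; the function name `knn_row_standardized` is a hypothetical helper.

```python
import numpy as np

def knn_row_standardized(coords, k=5):
    """Dense (n, n) kNN weights with rows summing to 1. Illustrative only."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances between all points
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]      # indices of the k closest points
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return W / W.sum(axis=1, keepdims=True)     # row-standardize

rng = np.random.default_rng(0)
coords = rng.uniform(size=(20, 2))
W = knn_row_standardized(coords, k=5)
print(W.sum(axis=1)[:3])
```

After row standardization each of an observation's k neighbors carries weight 1/k, so multiplying `W` by a variable yields the average of that variable over the neighbors.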

pandas { .dependency }

Provides data loading and manipulation support.

@satisfied-by

numpy { .dependency }

Provides numerical array operations.

@satisfied-by

Install with Tessl CLI

npx tessl i tessl/pypi-spreg
