tessl/pypi-kdepy

Kernel Density Estimation in Python with three high-performance algorithms through a unified API.

KDE Estimators

Name: tessl/pypi-kdepy
Author: tessl

Three high-performance kernel density estimation algorithms behind a unified API. Each is optimized for a different use case, and all share the same interface for fitting data and evaluating probability densities.

Capabilities

NaiveKDE

Direct-computation KDE with maximum flexibility for bandwidth, weights, norms, and grids. Suited to datasets of fewer than about 1000 points, where flexibility matters more than speed.

class NaiveKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2):
        """
        Initialize naive KDE estimator.
        
        Parameters:
        - kernel: str or callable, kernel function name or custom function
        - bw: float, str, or array-like, bandwidth specification
        - norm: float, p-norm for distance computation (default: 2)
        """
    
    def fit(self, data, weights=None):
        """
        Fit KDE to data.
        
        Parameters:
        - data: array-like, shape (obs,) or (obs, dims), input data
        - weights: array-like or None, optional weights for data points
        
        Returns:
        - self: NaiveKDE instance for method chaining
        """
    
    def evaluate(self, grid_points=None):
        """
        Evaluate KDE on grid points.
        
        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
        
        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """
    
    def __call__(self, grid_points=None):
        """
        Callable interface (equivalent to evaluate).
        
        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
        
        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """

Usage Example:

import numpy as np
from KDEpy import NaiveKDE

# Sample data
data = np.random.randn(500)
weights = np.random.exponential(1, 500)

# Variable bandwidth per point
bw_array = np.random.uniform(0.1, 1.0, 500)

# Flexible KDE with custom parameters
kde = NaiveKDE(kernel='triweight', bw=bw_array, norm=1.5)
kde.fit(data, weights=weights)
x, y = kde.evaluate()

# Custom grid evaluation
custom_grid = np.linspace(-4, 4, 200)
y_custom = kde.evaluate(custom_grid)
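
To make "direct computation" concrete, the following is a minimal standard-library sketch of a weighted 1D estimate with one bandwidth per data point. It is not KDEpy's implementation: the names naive_kde_1d and gaussian_kernel are hypothetical, and only a Gaussian kernel is assumed. Each grid point sums a contribution from every observation, which is why the cost grows with the product of data size and grid size.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def naive_kde_1d(data, grid, bw=1.0, weights=None):
    """Weighted 1D KDE by direct summation (hypothetical helper, O(n * m)).

    `bw` may be a single number or one bandwidth per data point.
    """
    n = len(data)
    if isinstance(bw, (int, float)):
        bws = [float(bw)] * n
    else:
        bws = [float(b) for b in bw]
    if weights is None:
        weights = [1.0] * n
    total = float(sum(weights))  # weights are normalised to sum to 1

    density = []
    for x in grid:
        s = 0.0
        for xi, h, w in zip(data, bws, weights):
            s += w * gaussian_kernel((x - xi) / h) / h
        density.append(s / total)
    return density


data = [-1.2, -0.4, 0.0, 0.3, 1.5]
grid = [-6.0 + 12.0 * i / 400 for i in range(401)]
density = naive_kde_1d(
    data, grid, bw=[0.3, 0.5, 0.5, 0.4, 0.6], weights=[1, 2, 1, 1, 3]
)

# A Riemann sum over the grid should be close to 1 for a valid density.
step = grid[1] - grid[0]
area = sum(density) * step
```

Because nothing is precomputed or approximated, this form places no restrictions on weights, bandwidths, or grid layout, which matches the flexibility described for NaiveKDE.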

TreeKDE

Tree-based KDE that uses a k-d tree for efficient nearest-neighbor queries. Provides a good balance between speed and flexibility for medium-sized datasets.

class TreeKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2.0):
        """
        Initialize tree-based KDE estimator.
        
        Parameters:
        - kernel: str or callable, kernel function name or custom function  
        - bw: float, str, or array-like, bandwidth specification
        - norm: float, p-norm for distance computation (default: 2.0)
        """
    
    def fit(self, data, weights=None):
        """
        Fit KDE to data and build k-d tree structure.
        
        Parameters:
        - data: array-like, shape (obs,) or (obs, dims), input data
        - weights: array-like or None, optional weights for data points
        
        Returns:
        - self: TreeKDE instance for method chaining
        """
    
    def evaluate(self, grid_points=None, eps=10e-4):
        """
        Evaluate KDE using tree-based queries.
        
        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
        - eps: float, numerical precision parameter (default: 10e-4, i.e. 1e-3)
        
        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """
    
    def __call__(self, grid_points=None, eps=10e-4):
        """
        Callable interface (equivalent to evaluate).
        
        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
        - eps: float, numerical precision parameter (default: 10e-4, i.e. 1e-3)
        
        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """

Usage Example:

import numpy as np
from KDEpy import TreeKDE

# Multi-dimensional data
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 1000)

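# Tree-based KDE with a fixed bandwidth
# (automatic selectors such as 'ISJ' are implemented for 1D data only)
kde = TreeKDE(kernel='gaussian', bw=0.5)
kde.fit(data)

# Evaluate on 2D grid
x, y = kde.evaluate((64, 64))  # 64x64 grid; x has shape (4096, 2)

# High precision evaluation on the same grid points
grid_points = x
y_precise = kde.evaluate(grid_points, eps=1e-6)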
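
The benefit of a spatial index comes from kernels that vanish beyond a finite radius: only observations within that radius of a grid point can contribute. The sketch below illustrates the idea in 1D, using a sorted list and the bisect module as a stand-in for a k-d tree. It is an assumption-laden illustration, not KDEpy's code: the helper names are hypothetical, and the textbook Epanechnikov kernel with support [-1, 1] is used, which may be scaled differently from KDEpy's 'epa' kernel.

```python
import bisect
import random


def epanechnikov(u):
    """Textbook Epanechnikov kernel, zero outside [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde_brute(data, x, bw):
    """Reference: sum over every observation."""
    return sum(epanechnikov((x - xi) / bw) for xi in data) / (len(data) * bw)


def kde_range_query(sorted_data, x, bw):
    """Sum only over observations within one bandwidth of x.

    Two binary searches locate the neighbours, playing the role that a
    ball query on a k-d tree plays in higher dimensions.
    """
    lo = bisect.bisect_left(sorted_data, x - bw)
    hi = bisect.bisect_right(sorted_data, x + bw)
    nearby = sorted_data[lo:hi]
    total = sum(epanechnikov((x - xi) / bw) for xi in nearby)
    return total / (len(sorted_data) * bw)


rng = random.Random(0)
data = sorted(rng.gauss(0.0, 1.0) for _ in range(2000))
grid = [-3.0 + 6.0 * i / 50 for i in range(51)]

fast = [kde_range_query(data, x, 0.3) for x in grid]
slow = [kde_brute(data, x, 0.3) for x in grid]
max_diff = max(abs(a - b) for a, b in zip(fast, slow))
```

Both functions return the same values up to floating-point rounding, but the range query touches only a small slice of the data per grid point. Kernels with infinite support, such as the Gaussian, need a truncation radius for this to work, which is where a tolerance like eps plausibly enters.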

FFTKDE

FFT-based convolution KDE for ultra-fast computation on equidistant grids. Scales to millions of data points but requires constant bandwidth and equidistant evaluation grids.

class FFTKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2):
        """
        Initialize FFT-based KDE estimator.
        
        Parameters:
        - kernel: str or callable, kernel function name or custom function
        - bw: float or str, bandwidth (must be constant) or selection method
        - norm: float, p-norm for distance computation (default: 2)
        """
    
    def fit(self, data, weights=None):
        """
        Fit KDE to data for FFT computation.
        
        Parameters:
        - data: array-like, shape (obs,) or (obs, dims), input data
        - weights: array-like or None, optional weights for data points
        
        Returns:
        - self: FFTKDE instance for method chaining
        """
    
    def evaluate(self, grid_points=None):
        """
        Evaluate KDE using FFT convolution on equidistant grid.
        
        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification (a user-supplied array must be equidistant)
        
        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        
        Note: User-supplied grids must be equidistant for FFT computation
        """
    
    def __call__(self, grid_points=None):
        """
        Callable interface (equivalent to evaluate).
        
        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification (a user-supplied array must be equidistant)
        
        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        
        Note: User-supplied grids must be equidistant for FFT computation
        """

Usage Example:

import numpy as np
from KDEpy import FFTKDE

# Large dataset
data = np.random.randn(100000)

# Ultra-fast FFT-based KDE
kde = FFTKDE(kernel='gaussian', bw='scott')
kde.fit(data)

# Fast evaluation on fine grid
x, y = kde.evaluate(2048)  # 2048 equidistant points

# Weighted data
weights = np.random.exponential(1, 100000)
kde_weighted = FFTKDE(bw=0.5).fit(data, weights)
x, y = kde_weighted.evaluate()
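
The equidistant-grid and constant-bandwidth requirements follow from the way a convolution-based estimate is built: the data are first binned onto the grid, then the binned mass is convolved with one kernel sampled at the grid spacing. The sketch below shows these two steps with the standard library only. It is an illustration under stated assumptions, not KDEpy's implementation: the function names are hypothetical, linear binning is assumed, and the convolution is written as a plain loop where an FFT would be used for speed.

```python
import math
import random


def linear_binning(data, grid_min, step, num_points):
    """Split each observation's mass between its two neighbouring grid points."""
    counts = [0.0] * num_points
    for xi in data:
        pos = (xi - grid_min) / step  # fractional grid index
        left = int(math.floor(pos))
        frac = pos - left
        counts[left] += 1.0 - frac
        if frac > 0.0:
            counts[left + 1] += frac
    n = len(data)
    return [c / n for c in counts]


def binned_kde(data, grid, bw):
    """Gaussian KDE on an equidistant grid via binning and convolution.

    Assumes every observation lies strictly inside the grid range.
    """
    step = grid[1] - grid[0]
    m = len(grid)
    mass = linear_binning(data, grid[0], step, m)

    # One kernel, sampled at multiples of the grid spacing, truncated at 5 bw.
    half = int(math.ceil(5.0 * bw / step))
    norm = bw * math.sqrt(2.0 * math.pi)
    kernel = [
        math.exp(-0.5 * (k * step / bw) ** 2) / norm
        for k in range(-half, half + 1)
    ]

    # Discrete convolution of binned mass with the sampled kernel.
    density = [0.0] * m
    for j in range(m):
        if mass[j] == 0.0:
            continue
        for k in range(-half, half + 1):
            i = j + k
            if 0 <= i < m:
                density[i] += mass[j] * kernel[k + half]
    return density


rng = random.Random(42)
data = [rng.uniform(-2.0, 2.0) for _ in range(300)]
grid = [-5.0 + 0.05 * i for i in range(201)]
density = binned_kde(data, grid, bw=0.5)
area = sum(density) * (grid[1] - grid[0])
```

After binning, the cost of the convolution depends on the grid size rather than the number of observations, which is why this approach scales to very large datasets. The single sampled kernel array is also why a per-point bandwidth cannot be used here.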

Common Usage Patterns

Method Chaining

All KDE estimators support method chaining for concise usage:

# Concise single-line KDE
x, y = FFTKDE(bw='ISJ').fit(data).evaluate(512)

# With weights
x, y = NaiveKDE(kernel='epa').fit(data, weights).evaluate()

# Custom evaluation
result = TreeKDE(bw=1.5).fit(data).evaluate(custom_grid)

Callable Interface

KDE instances can be called directly (equivalent to evaluate):

kde = TreeKDE().fit(data)
y = kde(grid_points)  # Same as kde.evaluate(grid_points)
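
Both patterns rest on two small conventions: fit returns the instance itself, and __call__ forwards to evaluate. The toy class below sketches those conventions in isolation. TinyEstimator is hypothetical and not part of KDEpy, and its evaluate method only counts nearby points rather than computing a density.

```python
class TinyEstimator:
    """Hypothetical stand-in for an estimator; not part of KDEpy."""

    def __init__(self, bw=1.0):
        self.bw = bw
        self.data = None

    def fit(self, data):
        self.data = list(data)
        return self  # returning self is what makes chaining possible

    def evaluate(self, grid_points):
        # Placeholder computation: count observations within bw of each point.
        return [
            sum(abs(x - xi) <= self.bw for xi in self.data)
            for x in grid_points
        ]

    def __call__(self, grid_points):
        # Calling the instance simply forwards to evaluate.
        return self.evaluate(grid_points)


est = TinyEstimator(bw=0.5).fit([0.0, 0.2, 1.0])
same = est([0.1, 1.1]) == est.evaluate([0.1, 1.1])
```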

Grid Specifications

All estimators accept flexible grid specifications:

# Integer: number of equidistant points
x, y = kde.evaluate(256)

# Tuple: points per dimension for multi-dimensional data  
x, y = kde.evaluate((64, 64, 32))

# Array: explicit grid points
grid = np.linspace(-3, 3, 100)
y = kde.evaluate(grid)

# None: automatic grid generation
x, y = kde.evaluate()
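
Since the return shape depends on the grid specification, calling code that accepts any specification has to handle both cases. The helpers below sketch one way to do that, following the conventions stated in this document. Both function names are hypothetical, and the classification rules are an assumption about how the four forms can be told apart, not a description of KDEpy's internal logic.

```python
from numbers import Integral


def classify_grid_spec(grid_points):
    """Name the kind of grid specification (hypothetical helper)."""
    if grid_points is None:
        return "auto"
    if isinstance(grid_points, Integral):
        return "count"
    if isinstance(grid_points, tuple) and all(
        isinstance(g, Integral) for g in grid_points
    ):
        return "count-per-dimension"
    return "explicit"


def split_result(result):
    """Normalise an evaluate() result to an (x, y) pair.

    x is None when only density values came back, which this document
    describes as the behaviour for user-supplied grids.
    """
    if isinstance(result, tuple):
        x, y = result
        return x, y
    return None, result


kinds = [
    classify_grid_spec(spec)
    for spec in (None, 256, (64, 64, 32), [0.0, 0.5, 1.0])
]
```

One limitation of this sketch: an explicit grid passed as a tuple of integers would be read as a per-dimension count, so explicit grids are better passed as lists or arrays.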

Types

from typing import Callable, Optional, Sequence, Tuple, Union
import numpy as np

# Constructor parameter types
KernelSpec = Union[str, Callable]
BandwidthSpec = Union[float, str, np.ndarray, Sequence]
NormSpec = float

# Method parameter types  
DataSpec = Union[np.ndarray, Sequence]
WeightsSpec = Optional[Union[np.ndarray, Sequence]]
GridSpec = Optional[Union[int, Tuple[int, ...], np.ndarray, Sequence]]

# Return types
GridResult = Tuple[np.ndarray, np.ndarray]  # (x, y) 
ValueResult = np.ndarray                    # y values only
EvaluateResult = Union[GridResult, ValueResult]
