Kernel Density Estimation in Python with three high-performance algorithms through a unified API.
Three high-performance kernel density estimation algorithms with a unified API. Each is optimized for a different use case while providing a consistent interface for fitting data and evaluating probability densities.
Direct computation KDE with maximum flexibility for bandwidth, weights, norms, and grids. Suitable for datasets under 1000 points where flexibility is more important than speed.
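As a rough illustration of what direct computation means, the sketch below evaluates a weighted 1D Gaussian KDE by summing one kernel per data point at every grid point. It is a minimal NumPy sketch, not KDEpy's implementation; the function name `naive_gaussian_kde` and its signature are hypothetical.

```python
import numpy as np

def naive_gaussian_kde(data, grid, bw, weights=None):
    """Direct-sum 1D Gaussian KDE; bw is a scalar or one value per data point."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    bw = np.broadcast_to(np.asarray(bw, dtype=float), data.shape)
    if weights is None:
        weights = np.full(data.shape, 1.0 / data.size)
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()  # normalize so the density integrates to 1
    # Pairwise scaled distances, shape (len(grid), len(data)).
    z = (grid[:, None] - data[None, :]) / bw[None, :]
    kernel_vals = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * bw[None, :])
    # Weighted sum over data points for every grid point.
    return kernel_vals @ weights

rng = np.random.default_rng(0)
data = rng.normal(size=200)
grid = np.linspace(-6.0, 6.0, 601)
density = naive_gaussian_kde(data, grid, bw=rng.uniform(0.2, 0.6, size=200))
# Riemann-sum approximation of the integral of the estimate.
area = float(np.sum(density) * (grid[1] - grid[0]))
```

The cost grows with the product of the number of grid points and the number of data points, which is why this approach suits small datasets. In exchange, nothing constrains the bandwidths, weights, or grid.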
class NaiveKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2):
        """
        Initialize naive KDE estimator.

        Parameters:
        - kernel: str or callable, kernel function name or custom function
        - bw: float, str, or array-like, bandwidth specification
        - norm: float, p-norm for distance computation (default: 2)
        """

    def fit(self, data, weights=None):
        """
        Fit KDE to data.

        Parameters:
        - data: array-like, shape (obs,) or (obs, dims), input data
        - weights: array-like or None, optional weights for data points

        Returns:
        - self: NaiveKDE instance for method chaining
        """

    def evaluate(self, grid_points=None):
        """
        Evaluate KDE on grid points.

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """

    def __call__(self, grid_points=None):
        """
        Callable interface (equivalent to evaluate).

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """

Usage Example:
import numpy as np
from KDEpy import NaiveKDE
# Sample data
data = np.random.randn(500)
weights = np.random.exponential(1, 500)
# Variable bandwidth per point
bw_array = np.random.uniform(0.1, 1.0, 500)
# Flexible KDE with custom parameters
kde = NaiveKDE(kernel='triweight', bw=bw_array, norm=1.5)
kde.fit(data, weights=weights)
x, y = kde.evaluate()
# Custom grid evaluation
custom_grid = np.linspace(-4, 4, 200)
y_custom = kde.evaluate(custom_grid)

Tree-based KDE using a k-d tree data structure for efficient nearest neighbor queries. Provides a good balance between speed and flexibility for medium-sized datasets.
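The benefit of neighbor queries can be sketched in one dimension: for a kernel with finite support, only the data points inside the kernel window around a grid point contribute to the estimate there. The example below is an illustration under assumptions, not KDEpy's implementation. A sorted array with `np.searchsorted` stands in for the k-d tree, and `epa_kernel` and `windowed_kde` are hypothetical names.

```python
import numpy as np

def epa_kernel(z):
    """Epanechnikov kernel, which is zero outside [-1, 1]."""
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z**2), 0.0)

def windowed_kde(data, grid, bw):
    """Evaluate a 1D KDE using only the data points inside each kernel window."""
    data = np.sort(np.asarray(data, dtype=float))
    grid = np.asarray(grid, dtype=float)
    # Range query: index bounds of the data inside [g - bw, g + bw] for each grid point g.
    lo = np.searchsorted(data, grid - bw, side="left")
    hi = np.searchsorted(data, grid + bw, side="right")
    out = np.empty(grid.shape)
    for i, g in enumerate(grid):
        neighbors = data[lo[i]:hi[i]]
        out[i] = epa_kernel((g - neighbors) / bw).sum() / (data.size * bw)
    return out

rng = np.random.default_rng(1)
data = rng.normal(size=2000)
grid = np.linspace(-4.0, 4.0, 81)
fast = windowed_kde(data, grid, bw=0.4)
# Brute-force reference over all (grid, data) pairs, for comparison.
brute = epa_kernel((grid[:, None] - data[None, :]) / 0.4).sum(axis=1) / (data.size * 0.4)
```

Both results agree up to floating-point rounding, but the windowed version touches only a fraction of the data per grid point. A k-d tree generalizes the same range query to several dimensions.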
class TreeKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2.0):
        """
        Initialize tree-based KDE estimator.

        Parameters:
        - kernel: str or callable, kernel function name or custom function
        - bw: float, str, or array-like, bandwidth specification
        - norm: float, p-norm for distance computation (default: 2.0)
        """

    def fit(self, data, weights=None):
        """
        Fit KDE to data and build k-d tree structure.

        Parameters:
        - data: array-like, shape (obs,) or (obs, dims), input data
        - weights: array-like or None, optional weights for data points

        Returns:
        - self: TreeKDE instance for method chaining
        """

    def evaluate(self, grid_points=None, eps=10e-4):
        """
        Evaluate KDE using tree-based queries.

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
        - eps: float, numerical precision parameter (default: 10e-4)

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """

    def __call__(self, grid_points=None, eps=10e-4):
        """
        Callable interface (equivalent to evaluate).

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
        - eps: float, numerical precision parameter (default: 10e-4)

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """

Usage Example:
import numpy as np
from KDEpy import TreeKDE
# Multi-dimensional data
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 1000)
# Tree-based KDE with a fixed bandwidth (automatic selectors such as 'ISJ' target 1D data)
kde = TreeKDE(kernel='gaussian', bw=0.5)
kde.fit(data)
# Evaluate on 2D grid
x, y = kde.evaluate((64, 64)) # 64x64 grid
# Higher-precision evaluation on the same grid
y_precise = kde.evaluate(x, eps=1e-6)

FFT-based convolution KDE for very fast computation on equidistant grids. Scales to millions of data points but requires a constant bandwidth and an equidistant evaluation grid.
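The constant-bandwidth and equidistant-grid requirements follow from how convolution-based estimation works: the data are binned onto the grid, and the binned masses are convolved with a single kernel sampled at the grid spacing. The sketch below shows this in 1D with NumPy. It is not KDEpy's implementation; `fft_gaussian_kde`, the 5-bandwidth padding, and the kernel truncation are hypothetical choices for illustration.

```python
import numpy as np

def fft_gaussian_kde(data, bw, num_points=1024):
    """1D Gaussian KDE via linear binning and FFT convolution on an equidistant grid."""
    data = np.asarray(data, dtype=float)
    pad = 5.0 * bw  # assumed padding so the kernel tails fit inside the grid
    grid = np.linspace(data.min() - pad, data.max() + pad, num_points)
    dx = grid[1] - grid[0]

    # Linear binning: split each point's mass between its two neighboring grid nodes.
    pos = (data - grid[0]) / dx
    left = np.floor(pos).astype(int)
    frac = pos - left
    counts = np.bincount(left, weights=1.0 - frac, minlength=num_points)
    counts += np.bincount(left + 1, weights=frac, minlength=num_points)[:num_points]
    counts /= data.size

    # One kernel, sampled at the grid spacing and truncated at 5 bandwidths.
    half = int(np.ceil(5.0 * bw / dx))
    offsets = np.arange(-half, half + 1) * dx
    kernel = np.exp(-0.5 * (offsets / bw) ** 2) / (np.sqrt(2.0 * np.pi) * bw)

    # Linear convolution via zero-padded FFTs; keep the part aligned with the grid.
    n = num_points + kernel.size - 1
    full = np.fft.irfft(np.fft.rfft(counts, n) * np.fft.rfft(kernel, n), n)
    return grid, full[half:half + num_points]

rng = np.random.default_rng(2)
data = rng.normal(size=5000)
grid, density = fft_gaussian_kde(data, bw=0.3)
```

After binning, the cost depends mainly on the grid size rather than the number of data points. A per-point bandwidth would need a different kernel for each point, which a single convolution cannot express.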
class FFTKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2):
        """
        Initialize FFT-based KDE estimator.

        Parameters:
        - kernel: str or callable, kernel function name or custom function
        - bw: float or str, bandwidth (must be constant) or selection method
        - norm: float, p-norm for distance computation (default: 2)
        """

    def fit(self, data, weights=None):
        """
        Fit KDE to data for FFT computation.

        Parameters:
        - data: array-like, shape (obs,) or (obs, dims), input data
        - weights: array-like or None, optional weights for data points

        Returns:
        - self: FFTKDE instance for method chaining
        """

    def evaluate(self, grid_points=None):
        """
        Evaluate KDE using FFT convolution on equidistant grid.

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
          (array-like grids must be equidistant)

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid

        Note: User-supplied grids must be equidistant for FFT computation
        """

    def __call__(self, grid_points=None):
        """
        Callable interface (equivalent to evaluate).

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
          (array-like grids must be equidistant)

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid

        Note: User-supplied grids must be equidistant for FFT computation
        """

Usage Example:
import numpy as np
from KDEpy import FFTKDE
# Large dataset
data = np.random.randn(100000)
# Ultra-fast FFT-based KDE
kde = FFTKDE(kernel='gaussian', bw='scott')
kde.fit(data)
# Fast evaluation on fine grid
x, y = kde.evaluate(2048) # 2048 equidistant points
# Weighted data
weights = np.random.exponential(1, 100000)
kde_weighted = FFTKDE(bw=0.5).fit(data, weights)
x, y = kde_weighted.evaluate()

All KDE estimators support method chaining for concise usage:
# Concise single-line KDE
x, y = FFTKDE(bw='ISJ').fit(data).evaluate(512)
# With weights
x, y = NaiveKDE(kernel='epa').fit(data, weights).evaluate()
# Custom evaluation
result = TreeKDE(bw=1.5).fit(data).evaluate(custom_grid)

KDE instances can be called directly (equivalent to evaluate):
kde = TreeKDE().fit(data)
y = kde(grid_points)  # Same as kde.evaluate(grid_points)

All estimators accept flexible grid specifications:
# Integer: number of equidistant points
x, y = kde.evaluate(256)
# Tuple: points per dimension for multi-dimensional data
x, y = kde.evaluate((64, 64, 32))
# Array: explicit grid points
grid = np.linspace(-3, 3, 100)
y = kde.evaluate(grid)
# None: automatic grid generation
x, y = kde.evaluate()

from typing import Callable, Union, Optional, Sequence, Tuple
import numpy as np
# Constructor parameter types
KernelSpec = Union[str, Callable]
BandwidthSpec = Union[float, str, np.ndarray, Sequence]
NormSpec = float
# Method parameter types
DataSpec = Union[np.ndarray, Sequence]
WeightsSpec = Optional[Union[np.ndarray, Sequence]]
GridSpec = Optional[Union[int, Tuple[int, ...], np.ndarray, Sequence]]
# Return types
GridResult = Tuple[np.ndarray, np.ndarray] # (x, y)
ValueResult = np.ndarray # y values only
EvaluateResult = Union[GridResult, ValueResult]

Install with Tessl CLI
npx tessl i tessl/pypi-kdepy