Kernel Density Estimation in Python with three high-performance algorithms through a unified API.
Automatic bandwidth selection methods for optimal kernel density estimation without manual parameter tuning. These methods analyze data distribution to determine the bandwidth that minimizes estimation error.
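To make concrete why the bandwidth choice matters, here is a minimal, dependency-free sketch of a Gaussian kernel density estimate evaluated with a narrow and a wide bandwidth. The helper `gaussian_kde_at` and the sample values are illustrative assumptions, not part of KDEpy.

```python
import math


def gaussian_kde_at(x, data, bw):
    """Evaluate a Gaussian kernel density estimate at a single point x.

    Hypothetical helper for illustration; not part of KDEpy.
    """
    norm = 1.0 / (len(data) * bw * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bw) ** 2) for xi in data)


# Two well-separated clusters, around -2 and +2.
sample = [-2.1, -1.9, -2.0, 1.8, 2.0, 2.2]

# A narrow bandwidth keeps the valley between the clusters close to zero,
# while a wide bandwidth smears both clusters into a single bump.
narrow = gaussian_kde_at(0.0, sample, bw=0.3)
wide = gaussian_kde_at(0.0, sample, bw=3.0)
print(f"density at 0 with bw=0.3: {narrow:.6f}")
print(f"density at 0 with bw=3.0: {wide:.6f}")
```

The automatic methods described in this section aim to pick a value between these extremes from the data alone.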
Advanced bandwidth selection method using plug-in estimation with improved accuracy over traditional methods. Recommended default choice for most applications.
def improved_sheather_jones(data, weights=None):
    """
    Improved Sheather-Jones bandwidth selection method.

    Uses plug-in approach with improved functional estimation for
    optimal bandwidth selection in kernel density estimation.

    Parameters:
    - data: array-like, shape (obs, dims), input data for bandwidth estimation
    - weights: array-like or None, optional weights for data points

    Returns:
    - float: Optimal bandwidth value

    Raises:
    - ValueError: If data is empty or has invalid shape
    """

Usage Example:
import numpy as np
from KDEpy import FFTKDE
from KDEpy.bw_selection import improved_sheather_jones
# Sample data
data = np.random.gamma(2, 1, 1000).reshape(-1, 1)
# Calculate optimal bandwidth
optimal_bw = improved_sheather_jones(data)
print(f"Optimal bandwidth: {optimal_bw:.4f}")
# Use in KDE
kde = FFTKDE(bw=optimal_bw).fit(data)
x, y = kde.evaluate()
# Or use directly in constructor
kde_auto = FFTKDE(bw='ISJ').fit(data)  # Same result

Simple bandwidth selection based on data standard deviation and sample size. Fast computation with reasonable results for most distributions.
def scotts_rule(data, weights=None):
    """
    Scott's rule for bandwidth selection.

    Computes bandwidth as 1.06 * std * n^(-1/5) where std is the
    standard deviation and n is the sample size.

    Parameters:
    - data: array-like, shape (obs, dims), input data for bandwidth estimation
    - weights: array-like or None, optional weights for data points

    Returns:
    - float: Bandwidth estimate using Scott's rule

    Raises:
    - ValueError: If data is empty or has invalid shape
    """

Usage Example:
import numpy as np
from KDEpy import TreeKDE
from KDEpy.bw_selection import scotts_rule
# Multi-modal data
data1 = np.random.normal(-2, 0.5, 500)
data2 = np.random.normal(2, 0.8, 500)
data = np.concatenate([data1, data2]).reshape(-1, 1)
# Scott's rule bandwidth
scott_bw = scotts_rule(data)
print(f"Scott's bandwidth: {scott_bw:.4f}")
# Apply to KDE
kde = TreeKDE(bw=scott_bw).fit(data)
x, y = kde.evaluate()
# Or use string identifier
kde_auto = TreeKDE(bw='scott').fit(data)

Classic bandwidth selection rule similar to Scott's but with a different scaling factor. Works well for normal-like distributions.
def silvermans_rule(data, weights=None):
    """
    Silverman's rule for bandwidth selection.

    Computes bandwidth using Silverman's rule of thumb:
    0.9 * min(std, IQR/1.34) * n^(-1/5)

    Parameters:
    - data: array-like, shape (obs, 1), input data (1D only)
    - weights: array-like or None, optional weights (currently ignored)

    Returns:
    - float: Bandwidth estimate using Silverman's rule

    Raises:
    - ValueError: If data is not 1-dimensional or empty

    Note: Currently only supports 1D data, weights are ignored
    """

Usage Example:
import numpy as np
from KDEpy import NaiveKDE
from KDEpy.bw_selection import silvermans_rule
# 1D data (required for Silverman's rule)
data = np.random.lognormal(0, 1, 800)
# Silverman's rule bandwidth
silverman_bw = silvermans_rule(data.reshape(-1, 1))
print(f"Silverman's bandwidth: {silverman_bw:.4f}")
# Use in KDE estimation
kde = NaiveKDE(bw=silverman_bw).fit(data)
x, y = kde.evaluate()
# String identifier usage
kde_auto = NaiveKDE(bw='silverman').fit(data)

All bandwidth selection methods can be used via string identifiers:
from KDEpy import FFTKDE, TreeKDE, NaiveKDE
# String identifiers for automatic selection
kde_isj = FFTKDE(bw='ISJ') # Improved Sheather-Jones
kde_scott = TreeKDE(bw='scott') # Scott's rule
kde_silver = NaiveKDE(bw='silverman') # Silverman's rule
# Fit and evaluate
kde_isj.fit(data)
x, y = kde_isj.evaluate()

Calculate bandwidth values explicitly for inspection or custom usage:
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule
# Calculate bandwidth values
isj_bw = improved_sheather_jones(data)
scott_bw = scotts_rule(data)
silver_bw = silvermans_rule(data)
print(f"ISJ: {isj_bw:.4f}")
print(f"Scott: {scott_bw:.4f}")
print(f"Silverman: {silver_bw:.4f}")
# Use explicit values
kde = FFTKDE(bw=isj_bw).fit(data)

ISJ and Scott's rule support weighted data:
import numpy as np
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule
# Weighted data
data = np.random.randn(1000).reshape(-1, 1)
weights = np.random.exponential(1, 1000)
# Weighted bandwidth selection
isj_weighted = improved_sheather_jones(data, weights=weights)
scott_weighted = scotts_rule(data, weights=weights)
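# Note: Silverman's rule currently ignores weights

Improved Sheather-Jones (ISJ): plug-in method with improved accuracy over traditional rules; the recommended default for most applications.
Scott's Rule: based on the data's standard deviation and sample size; fast, with reasonable results for most distributions.
Silverman's Rule: works well for normal-like distributions; currently supports 1D data only and ignores weights.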
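The two rule-of-thumb formulas quoted in the docstrings earlier can be compared directly. The sketch below transcribes them with the standard library only; `scott_formula` and `silverman_formula` are hypothetical names, and the use of the sample standard deviation and of `statistics.quantiles` for the IQR are assumptions, so the values may differ slightly from what KDEpy's own functions return.

```python
import random
import statistics


def scott_formula(values):
    """1.06 * std * n^(-1/5), as quoted in the scotts_rule docstring."""
    n = len(values)
    return 1.06 * statistics.stdev(values) * n ** (-1 / 5)


def silverman_formula(values):
    """0.9 * min(std, IQR / 1.34) * n^(-1/5), as quoted for silvermans_rule."""
    n = len(values)
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = min(statistics.stdev(values), (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)


random.seed(0)  # fixed seed so repeated runs give the same numbers
values = [random.gauss(0.0, 1.0) for _ in range(1000)]
print(f"Scott formula:     {scott_formula(values):.4f}")
print(f"Silverman formula: {silverman_formula(values):.4f}")
```

Because 0.9 * min(std, IQR/1.34) can never exceed 1.06 * std, the Silverman expression always yields the smaller bandwidth of the two for the same data.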
import numpy as np
import time
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule
# Large dataset for timing comparison
large_data = np.random.randn(10000).reshape(-1, 1)
# Time ISJ method (perf_counter is the clock intended for measuring intervals)
start = time.perf_counter()
isj_bw = improved_sheather_jones(large_data)
isj_time = time.perf_counter() - start
# Time Scott's rule
start = time.perf_counter()
scott_bw = scotts_rule(large_data)
scott_time = time.perf_counter() - start
# Time Silverman's rule
start = time.perf_counter()
silver_bw = silvermans_rule(large_data)
silver_time = time.perf_counter() - start
print(f"ISJ: {isj_bw:.4f} ({isj_time:.4f}s)")
print(f"Scott: {scott_bw:.4f} ({scott_time:.4f}s)")
print(f"Silverman: {silver_bw:.4f} ({silver_time:.4f}s)")

from typing import Callable, Optional, Union
import numpy as np
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule
# Input types
DataType = Union[np.ndarray, list]  # Shape (obs, dims)
WeightsType = Optional[Union[np.ndarray, list]]  # Shape (obs,) or None
# Function signatures (typing.Callable; the builtin callable() is not subscriptable)
BandwidthFunction = Callable[[DataType, WeightsType], float]
# Available methods mapping
BandwidthMethods = dict[str, BandwidthFunction]
AVAILABLE_METHODS = {
    "ISJ": improved_sheather_jones,
    "scott": scotts_rule,
    "silverman": silvermans_rule,
}

Install with Tessl CLI
npx tessl i tessl/pypi-kdepy
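The string identifiers used throughout this page ('ISJ', 'scott', 'silverman') imply a lookup step that turns a `bw` argument into a number. A rough sketch of that pattern follows; `resolve_bandwidth` and `toy_rule` are hypothetical stand-ins so the example runs without KDEpy, and this is not a description of how KDEpy implements the lookup internally.

```python
from typing import Callable, Dict, Sequence, Union


def resolve_bandwidth(
    bw: Union[str, float],
    data: Sequence[float],
    methods: Dict[str, Callable[[Sequence[float]], float]],
) -> float:
    """Turn a `bw` argument (name or number) into a positive float."""
    if isinstance(bw, str):
        if bw not in methods:
            raise ValueError(
                f"Unknown bandwidth method {bw!r}; choose from {sorted(methods)}"
            )
        return float(methods[bw](data))
    if bw <= 0:
        raise ValueError("Bandwidth must be positive")
    return float(bw)


def toy_rule(data: Sequence[float]) -> float:
    """Stand-in selector (range divided by sample size); not a real rule."""
    return (max(data) - min(data)) / len(data)


methods = {"toy": toy_rule}
print(resolve_bandwidth("toy", [0.0, 1.0, 4.0], methods))
print(resolve_bandwidth(0.5, [0.0, 1.0, 4.0], methods))
```

Keeping the name-to-function mapping in one dictionary means an unknown identifier fails early with a message listing the valid choices.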