# KDEpy

KDEpy is a kernel density estimation library for Python that implements three high-performance algorithms behind a unified API: NaiveKDE for accurate estimation on d-dimensional data with variable bandwidth support, TreeKDE for fast tree-based computation with arbitrary grid evaluation, and FFTKDE for very fast convolution-based computation on equidistant grids.

## Package Information

- **Package Name**: KDEpy
- **Language**: Python
- **Installation**: `pip install KDEpy`
- **Requires**: numpy>=1.14.2, scipy>=1.0.1

## Core Imports

```python
import KDEpy
```

Import specific estimators:

```python
from KDEpy import FFTKDE, NaiveKDE, TreeKDE
```

## Basic Usage

```python
import numpy as np
from KDEpy import FFTKDE

# Generate sample data
data = np.random.randn(1000)

# Create and fit KDE with automatic bandwidth selection
kde = FFTKDE(kernel='gaussian', bw='ISJ')
kde.fit(data)

# Evaluate on automatic grid
x, y = kde.evaluate()

# Or evaluate on a custom grid; for FFTKDE it must be
# equidistant and contain every data point
grid_points = np.linspace(data.min() - 1, data.max() + 1, 100)
y_custom = kde.evaluate(grid_points)

# Chain operations for concise usage
x, y = FFTKDE(bw='scott').fit(data).evaluate(256)
```

## Architecture

KDEpy provides three complementary algorithms optimized for different use cases:

- **NaiveKDE**: Direct computation with maximum flexibility for bandwidth, weights, norms, and grids. Suitable for <1000 data points.
- **TreeKDE**: k-d tree-based computation using scipy's cKDTree for efficient nearest neighbor queries. Good balance of speed and flexibility.
- **FFTKDE**: FFT-based convolution for ultra-fast computation on equidistant grids. Requires constant bandwidth but scales to millions of points.

All estimators inherit from BaseKDE, providing a consistent API while allowing algorithm-specific optimizations. The modular design enables easy bandwidth selection method integration and kernel function customization.

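To make the "direct computation" behind NaiveKDE concrete, here is a minimal NumPy-only sketch of a one-dimensional Gaussian KDE that sums one kernel per data point. The function name `naive_gaussian_kde` is hypothetical and this is not KDEpy's implementation; it only illustrates why the cost grows with the product of data size and grid size.

```python
import numpy as np

def naive_gaussian_kde(data, grid, bw):
    """Direct-sum Gaussian KDE (illustrative helper, not part of KDEpy)."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point (rows) and data point (columns)
    z = (grid[:, None] - data[None, :]) / bw
    kernel_values = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels over the data and rescale by the bandwidth
    return kernel_values.mean(axis=1) / bw

rng = np.random.default_rng(0)
data = rng.normal(size=500)
grid = np.linspace(-6.0, 6.0, 601)
density = naive_gaussian_kde(data, grid, bw=0.4)

# Riemann-sum check that the estimate integrates to roughly one
area = density.sum() * (grid[1] - grid[0])
```

The intermediate matrix holds one entry per (grid point, data point) pair, which is the reason a direct approach is best kept to small data sets.
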
## Capabilities

### KDE Estimators

Three high-performance kernel density estimation algorithms with a unified API for fitting data and evaluating probability densities.

```python { .api }
class NaiveKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2): ...
    def fit(self, data, weights=None): ...
    def evaluate(self, grid_points=None): ...
    def __call__(self, grid_points=None): ...

class TreeKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2.0): ...
    def fit(self, data, weights=None): ...
    def evaluate(self, grid_points=None, eps=10e-4): ...
    def __call__(self, grid_points=None): ...

class FFTKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2): ...
    def fit(self, data, weights=None): ...
    def evaluate(self, grid_points=None): ...
    def __call__(self, grid_points=None): ...
```

[KDE Estimators](./kde-estimators.md)

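The idea behind a convolution-based estimator on an equidistant grid can be sketched with NumPy alone. The example below assumes one-dimensional data, a Gaussian kernel and a fixed bandwidth. It bins each point to the nearest grid point and uses `np.convolve` instead of an FFT, so it is a simplified illustration and not a description of what `FFTKDE` does internally.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=2000)
bw = 0.3

# Equidistant grid wide enough to hold the data plus the kernel tails
grid = np.linspace(-7.0, 7.0, 1401)
dx = grid[1] - grid[0]

# Step 1: bin the data to the nearest grid point (a simplification)
idx = np.rint((data - grid[0]) / dx).astype(int)
counts = np.bincount(idx, minlength=grid.size) / data.size

# Step 2: sample the kernel on the same spacing, out to about 5 bandwidths
half = int(np.ceil(5 * bw / dx))
offsets = np.arange(-half, half + 1) * dx
kernel = np.exp(-0.5 * (offsets / bw) ** 2) / (bw * np.sqrt(2.0 * np.pi))

# Step 3: a single convolution replaces the pairwise sum
binned = np.convolve(counts, kernel, mode="same")

# Direct sum over all (grid point, data point) pairs, for comparison
z = (grid[:, None] - data[None, :]) / bw
exact = (np.exp(-0.5 * z**2) / (bw * np.sqrt(2.0 * np.pi))).mean(axis=1)
max_err = np.abs(binned - exact).max()
```

The convolution only works because the grid spacing is constant and the kernel has the same width everywhere, which matches the constraints listed for the FFT-based estimator: an equidistant grid and a constant bandwidth.
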
### Bandwidth Selection

Automatic bandwidth selection methods for optimal kernel density estimation without manual parameter tuning.

```python { .api }
def improved_sheather_jones(data, weights=None): ...
def scotts_rule(data, weights=None): ...
def silvermans_rule(data, weights=None): ...
```

[Bandwidth Selection](./bandwidth-selection.md)

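As a rough illustration of what a rule-of-thumb selector computes, the sketch below implements the textbook form of Silverman's rule for one-dimensional data and a Gaussian kernel. The helper `silverman_bandwidth` is hypothetical; the library's own `silvermans_rule` may differ in constants and in the input shape it expects.

```python
import numpy as np

def silverman_bandwidth(data):
    """Textbook Silverman rule of thumb (illustrative, not KDEpy's code)."""
    data = np.asarray(data, dtype=float).ravel()
    n = data.size
    std = data.std(ddof=1)
    q75, q25 = np.percentile(data, [75, 25])
    # Robust spread: the smaller of the standard deviation and the scaled IQR
    spread = min(std, (q75 - q25) / 1.34)
    return 0.9 * spread * n ** (-0.2)

rng = np.random.default_rng(42)
small_sample_bw = silverman_bandwidth(rng.normal(size=100))
large_sample_bw = silverman_bandwidth(rng.normal(size=100_000))
```

The bandwidth shrinks as the sample grows, proportionally to n^(-1/5), so larger data sets are smoothed less.
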
### Kernel Functions

Built-in kernel functions with finite and infinite support for probability density estimation.

```python { .api }
# Available kernel names for use in KDE constructors
AVAILABLE_KERNELS = [
    "gaussian", "exponential", "box", "tri", "epa",
    "biweight", "triweight", "tricube", "cosine"
]

class Kernel:
    def __init__(self, function, var=1, support=3): ...
    def evaluate(self, x, bw=1, norm=2): ...
```

[Kernel Functions](./kernel-functions.md)

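The difference between finite and infinite support can be seen in a small NumPy sketch. It uses the standard textbook forms of the Gaussian and Epanechnikov kernels; the function names are hypothetical, and the scaling may not match how the library normalizes its built-in `"gaussian"` and `"epa"` kernels.

```python
import numpy as np

def gaussian(x):
    # Infinite support: strictly positive for every real x
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def epanechnikov(x):
    # Finite support: exactly zero outside the interval [-1, 1]
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

x = np.linspace(-8.0, 8.0, 16001)
dx = x[1] - x[0]

# Both kernels are probability densities, so each should integrate to one
gauss_area = gaussian(x).sum() * dx
epa_area = epanechnikov(x).sum() * dx

# Outside the support the finite kernel contributes nothing at all
outside = np.array([1.5])
epa_outside = epanechnikov(outside)[0]
gauss_outside = gaussian(outside)[0]
```

A finite-support kernel lets an estimator ignore data points that lie farther away than the support, which is what makes tree-based neighbor queries worthwhile.
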
### Utility Functions

Helper functions for grid generation, array manipulation, and data processing in kernel density estimation workflows.

```python { .api }
def autogrid(data, boundary_abs=3, num_points=None, boundary_rel=0.05): ...
def cartesian(arrays): ...
def linear_binning(data, grid_points, weights=None): ...
```

[Utilities](./utilities.md)

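Linear binning is easiest to understand in one dimension. The following sketch, with the hypothetical helper `linear_binning_1d`, assumes an equidistant grid that contains every data point and splits each point's mass between its two neighbouring grid points. It illustrates the technique only and is not the library's `linear_binning` implementation.

```python
import numpy as np

def linear_binning_1d(data, grid):
    """Share each point's mass between its two nearest grid points."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    dx = grid[1] - grid[0]
    # Fractional position of every data point measured in grid steps
    pos = (data - grid[0]) / dx
    left = np.floor(pos).astype(int)
    frac = pos - left
    # Keep the right neighbour in range for points on the last grid point
    right = np.minimum(left + 1, grid.size - 1)
    weights = np.zeros(grid.size)
    # np.add.at accumulates correctly when indices repeat
    np.add.at(weights, left, 1.0 - frac)
    np.add.at(weights, right, frac)
    return weights / data.size

grid = np.linspace(0.0, 4.0, 5)  # grid points 0, 1, 2, 3, 4
# 0.25 is split 75/25 between grid points 0 and 1,
# 2.0 lands entirely on grid point 2,
# 3.5 is split evenly between grid points 3 and 4
binned = linear_binning_1d([0.25, 2.0, 3.5], grid)
```

Compared with assigning each point to its nearest grid point, this keeps more information about where the data sits between grid points while still producing one weight per grid point.
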
## Types

```python { .api }
from typing import Callable, Optional, Sequence, Tuple, Union
import numpy as np

# Data types
DataType = Union[np.ndarray, Sequence]
WeightsType = Optional[Union[np.ndarray, Sequence]]
GridType = Union[int, tuple, np.ndarray, Sequence]

# Bandwidth specification
BandwidthType = Union[
    float,       # Explicit bandwidth value
    str,         # Selection method: "ISJ", "scott", "silverman"
    np.ndarray,  # Per-point bandwidth array
    Sequence     # Per-point bandwidth sequence
]

# Kernel specification
KernelType = Union[str, Callable]  # Kernel name or custom function

# Return types
EvaluationResult = Union[
    Tuple[np.ndarray, np.ndarray],  # (x, y) for auto-generated grid
    np.ndarray                      # y values for user-supplied grid
]
```