# KDE Estimators

Three high-performance kernel density estimation algorithms share a unified API. Each is optimized for a different use case, and all provide the same interface for fitting data and evaluating probability densities.

## Capabilities

### NaiveKDE

Direct-computation KDE with maximum flexibility for bandwidth, weights, norms, and grids. Suitable for datasets under 1000 points, where flexibility matters more than speed.

```python { .api }
class NaiveKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2):
        """
        Initialize naive KDE estimator.

        Parameters:
        - kernel: str or callable, kernel function name or custom function
        - bw: float, str, or array-like, bandwidth specification
        - norm: float, p-norm for distance computation (default: 2)
        """

    def fit(self, data, weights=None):
        """
        Fit KDE to data.

        Parameters:
        - data: array-like, shape (obs,) or (obs, dims), input data
        - weights: array-like or None, optional weights for data points

        Returns:
        - self: NaiveKDE instance for method chaining
        """

    def evaluate(self, grid_points=None):
        """
        Evaluate KDE on grid points.

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """

    def __call__(self, grid_points=None):
        """
        Callable interface (equivalent to evaluate).

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """
```

**Usage Example:**

```python
import numpy as np
from KDEpy import NaiveKDE

# Sample data
data = np.random.randn(500)
weights = np.random.exponential(1, 500)

# Variable bandwidth per point
bw_array = np.random.uniform(0.1, 1.0, 500)

# Flexible KDE with custom parameters
kde = NaiveKDE(kernel='triweight', bw=bw_array, norm=1.5)
kde.fit(data, weights=weights)
x, y = kde.evaluate()

# Custom grid evaluation
custom_grid = np.linspace(-4, 4, 200)
y_custom = kde.evaluate(custom_grid)
```

### TreeKDE

Tree-based KDE that uses a k-d tree for efficient nearest-neighbor queries. It offers a good balance between speed and flexibility for medium-sized datasets.

```python { .api }
class TreeKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2.0):
        """
        Initialize tree-based KDE estimator.

        Parameters:
        - kernel: str or callable, kernel function name or custom function
        - bw: float, str, or array-like, bandwidth specification
        - norm: float, p-norm for distance computation (default: 2.0)
        """

    def fit(self, data, weights=None):
        """
        Fit KDE to data and build k-d tree structure.

        Parameters:
        - data: array-like, shape (obs,) or (obs, dims), input data
        - weights: array-like or None, optional weights for data points

        Returns:
        - self: TreeKDE instance for method chaining
        """

    def evaluate(self, grid_points=None, eps=10e-4):
        """
        Evaluate KDE using tree-based queries.

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
        - eps: float, numerical precision parameter (default: 10e-4, i.e. 1e-3)

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """

    def __call__(self, grid_points=None, eps=10e-4):
        """
        Callable interface (equivalent to evaluate).

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
        - eps: float, numerical precision parameter (default: 10e-4, i.e. 1e-3)

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid
        """
```

**Usage Example:**

```python
import numpy as np
from KDEpy import TreeKDE

# Multi-dimensional data
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 1000)

# Tree-based KDE with a fixed bandwidth
# (assumption: string rules such as 'ISJ' apply to 1D data only)
kde = TreeKDE(kernel='gaussian', bw=0.5)
kde.fit(data)

# Evaluate on an auto-generated 2D grid
x, y = kde.evaluate((64, 64))  # 64x64 grid

# High-precision evaluation on explicit grid points (here, the grid above)
grid_points = x
y_precise = kde.evaluate(grid_points, eps=1e-6)
```

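The idea behind a tree-based estimator can be sketched with SciPy's `cKDTree`: for a kernel with bounded support, only data points within the support radius of a grid point contribute, and the tree finds them without scanning the whole dataset. The helper `epanechnikov_tree_kde` below is hypothetical and not the library's implementation. It treats `bw` as the half-width of the kernel's support, which is an assumption of this sketch and may differ from how the library scales bandwidths.

```python
import numpy as np
from scipy.spatial import cKDTree

def epanechnikov_tree_kde(data, grid, bw):
    """Hypothetical helper: 1D Epanechnikov KDE using k-d tree range queries."""
    data = np.asarray(data, dtype=float).reshape(-1, 1)
    grid = np.asarray(grid, dtype=float).reshape(-1, 1)

    # Build the tree once; each query then only visits nearby points
    tree = cKDTree(data)

    # For every grid point, indices of data points within distance bw
    neighbours = tree.query_ball_point(grid, r=bw)

    density = np.zeros(len(grid))
    for i, idx in enumerate(neighbours):
        if idx:
            u = (grid[i, 0] - data[idx, 0]) / bw
            # Epanechnikov kernel: 0.75 * (1 - u^2) on |u| <= 1
            density[i] = np.sum(0.75 * (1.0 - u**2))

    # Average over all data points and rescale by the bandwidth
    return density / (len(data) * bw)

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
grid = np.linspace(-4, 4, 200)
density = epanechnikov_tree_kde(sample, grid, bw=0.5)
```

For kernels with unbounded support, such as the Gaussian, a precision parameter like `eps` plausibly decides where the kernel is cut off, trading accuracy for fewer neighbours per query.
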
### FFTKDE

FFT-based convolution KDE for ultra-fast computation on equidistant grids. It scales to millions of data points but requires a constant bandwidth and an equidistant evaluation grid.

```python { .api }
class FFTKDE:
    def __init__(self, kernel="gaussian", bw=1, norm=2):
        """
        Initialize FFT-based KDE estimator.

        Parameters:
        - kernel: str or callable, kernel function name or custom function
        - bw: float or str, bandwidth (must be constant) or selection method
        - norm: float, p-norm for distance computation (default: 2)
        """

    def fit(self, data, weights=None):
        """
        Fit KDE to data for FFT computation.

        Parameters:
        - data: array-like, shape (obs,) or (obs, dims), input data
        - weights: array-like or None, optional weights for data points

        Returns:
        - self: FFTKDE instance for method chaining
        """

    def evaluate(self, grid_points=None):
        """
        Evaluate KDE using FFT convolution on equidistant grid.

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
          (a user-supplied grid must be equidistant)

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid

        Note: User-supplied grids must be equidistant for FFT computation
        """

    def __call__(self, grid_points=None):
        """
        Callable interface (equivalent to evaluate).

        Parameters:
        - grid_points: int, tuple, array-like, or None, grid specification
          (a user-supplied grid must be equidistant)

        Returns:
        - tuple (x, y) for auto-generated grid, or array y for user grid

        Note: User-supplied grids must be equidistant for FFT computation
        """
```

**Usage Example:**

```python
import numpy as np
from KDEpy import FFTKDE

# Large dataset
data = np.random.randn(100000)

# Ultra-fast FFT-based KDE
kde = FFTKDE(kernel='gaussian', bw='scott')
kde.fit(data)

# Fast evaluation on fine grid
x, y = kde.evaluate(2048)  # 2048 equidistant points

# Weighted data
weights = np.random.exponential(1, 100000)
kde_weighted = FFTKDE(bw=0.5).fit(data, weights)
x, y = kde_weighted.evaluate()
```

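The constant-bandwidth and equidistant-grid requirements follow from how convolution-based estimation works, which the NumPy sketch below illustrates in 1D. The data are first binned onto an equidistant grid, then convolved with a single sampled kernel via the FFT. The function `fft_kde_1d` is hypothetical, and it uses simple histogram binning with a Gaussian kernel truncated at four bandwidths; both are simplifications chosen for brevity, not a description of the library's internals.

```python
import numpy as np

def fft_kde_1d(data, bw, num_points=1024):
    """Hypothetical helper: 1D Gaussian KDE via binning and FFT convolution."""
    data = np.asarray(data, dtype=float)

    # Equidistant grid that extends 4 bandwidths beyond the data
    lo, hi = data.min() - 4 * bw, data.max() + 4 * bw
    counts, edges = np.histogram(data, bins=num_points, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    dx = centers[1] - centers[0]

    # Probability mass per bin (simple binning)
    mass = counts / counts.sum()

    # One kernel, sampled with the grid spacing; this is why bw must be constant
    half = int(np.ceil(4 * bw / dx))
    offsets = np.arange(-half, half + 1) * dx
    kernel = np.exp(-0.5 * (offsets / bw) ** 2) / (bw * np.sqrt(2.0 * np.pi))

    # Linear convolution through zero-padded FFTs
    size = num_points + kernel.size - 1
    conv = np.fft.irfft(np.fft.rfft(mass, size) * np.fft.rfft(kernel, size), size)

    # Drop the padding so the result aligns with the bin centers
    density = conv[half:half + num_points]
    return centers, density

rng = np.random.default_rng(1)
sample = rng.normal(size=100_000)
x, y = fft_kde_1d(sample, bw=0.2)
```

After binning, the cost depends mainly on the grid size rather than on the number of data points, which is what lets this approach scale to very large datasets.
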
## Common Usage Patterns

### Method Chaining

Because `fit` returns the estimator itself, all KDE estimators support method chaining for concise usage:

```python
import numpy as np
from KDEpy import FFTKDE, NaiveKDE, TreeKDE

# Example inputs
data = np.random.randn(1000)
weights = np.random.exponential(1, 1000)
custom_grid = np.linspace(-4, 4, 200)

# Concise single-line KDE
x, y = FFTKDE(bw='ISJ').fit(data).evaluate(512)

# With weights
x, y = NaiveKDE(kernel='epa').fit(data, weights).evaluate()

# Custom evaluation
result = TreeKDE(bw=1.5).fit(data).evaluate(custom_grid)
```

### Callable Interface

KDE instances can be called directly, which is equivalent to calling `evaluate`:

```python
# Assumes `data` and `grid_points` are already defined
kde = TreeKDE().fit(data)
y = kde(grid_points)  # Same as kde.evaluate(grid_points)
```

### Grid Specifications

All estimators accept the following grid specifications (for FFTKDE, an explicit grid must be equidistant):

```python
# `kde` is any fitted estimator

# Integer: number of equidistant points
x, y = kde.evaluate(256)

# Tuple: points per dimension for multi-dimensional data (here, 3D data)
x, y = kde.evaluate((64, 64, 32))

# Array: explicit grid points
grid = np.linspace(-3, 3, 100)
y = kde.evaluate(grid)

# None: automatic grid generation
x, y = kde.evaluate()
```

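Since FFT-based evaluation requires an equidistant grid, it can help to validate an explicit grid before passing it in. The helper below is a hypothetical pre-check for 1D grids, written with NumPy only; the library's own validation may use different tolerances or rules.

```python
import numpy as np

def is_equidistant(grid, rtol=1e-9):
    """Hypothetical helper: True if a 1D grid is increasing with constant spacing."""
    grid = np.asarray(grid, dtype=float)
    if grid.ndim != 1 or grid.size < 2:
        return False
    steps = np.diff(grid)
    # All steps must be positive and equal to the first step within tolerance
    return bool(steps[0] > 0 and np.allclose(steps, steps[0], rtol=rtol, atol=0.0))

even_grid = np.linspace(-3, 3, 100)      # constant spacing
uneven_grid = np.array([0.0, 1.0, 3.0])  # spacing changes from 1 to 2
```

A grid produced by `np.linspace` passes this check, whereas a grid assembled from arbitrary points, such as quantiles of the data, generally does not.
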
## Types

```python { .api }
from typing import Callable, Optional, Sequence, Tuple, Union
import numpy as np

# Constructor parameter types
KernelSpec = Union[str, Callable]
BandwidthSpec = Union[float, str, np.ndarray, Sequence]
NormSpec = float

# Method parameter types
DataSpec = Union[np.ndarray, Sequence]
WeightsSpec = Optional[Union[np.ndarray, Sequence]]
GridSpec = Optional[Union[int, Tuple[int, ...], np.ndarray, Sequence]]

# Return types
GridResult = Tuple[np.ndarray, np.ndarray]  # (x, y)
ValueResult = np.ndarray  # y values only
EvaluateResult = Union[GridResult, ValueResult]
```