# Bandwidth Selection

Automatic bandwidth selection methods for kernel density estimation. Each method analyzes the data to choose a bandwidth that approximately minimizes the estimation error, so no manual parameter tuning is needed.
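
To make the role of the bandwidth concrete, here is a minimal sketch of a Gaussian kernel density estimate evaluated at a single point, using only the standard library. The helper name `gaussian_kde_at` is hypothetical and is not part of KDEpy; it only illustrates how the bandwidth `bw` scales the kernel around each observation.

```python
import math


def gaussian_kde_at(x, data, bw):
    """Evaluate a Gaussian KDE at point x (illustrative helper, not a KDEpy function)."""
    if bw <= 0:
        raise ValueError("bw must be positive")
    n = len(data)
    if n == 0:
        raise ValueError("data must be non-empty")
    # Each observation contributes a normal density centred on it with scale bw
    norm = 1.0 / (n * bw * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bw) ** 2) for xi in data)


data = [-1.0, 0.0, 1.0]
narrow = gaussian_kde_at(0.5, data, bw=0.1)  # small bandwidth: little density between points
wide = gaussian_kde_at(0.5, data, bw=1.0)    # large bandwidth: smoother estimate
print(narrow, wide)
```

A bandwidth that is too small produces a spiky estimate, while one that is too large oversmooths; the selection methods in this document automate that trade-off.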

## Capabilities

### Improved Sheather-Jones Method

Plug-in bandwidth selection method that is more accurate than the simple rules of thumb and handles a wider range of distribution shapes. Recommended default choice for most applications.

```python { .api }
def improved_sheather_jones(data, weights=None):
    """
    Improved Sheather-Jones bandwidth selection method.

    Uses a plug-in approach with improved functional estimation for
    optimal bandwidth selection in kernel density estimation.

    Parameters:
    - data: array-like, shape (obs, dims), input data for bandwidth estimation
    - weights: array-like or None, optional weights for data points

    Returns:
    - float: Optimal bandwidth value

    Raises:
    - ValueError: If data is empty or has invalid shape
    """
```

**Usage Example:**

```python
import numpy as np
from KDEpy import FFTKDE
from KDEpy.bw_selection import improved_sheather_jones

# Sample data
data = np.random.gamma(2, 1, 1000).reshape(-1, 1)

# Calculate optimal bandwidth
optimal_bw = improved_sheather_jones(data)
print(f"Optimal bandwidth: {optimal_bw:.4f}")

# Use in KDE
kde = FFTKDE(bw=optimal_bw).fit(data)
x, y = kde.evaluate()

# Or use directly in constructor
kde_auto = FFTKDE(bw='ISJ').fit(data)  # Same result
```

### Scott's Rule

Simple rule of thumb based on the standard deviation of the data and the sample size. Fast to compute, with reasonable results for unimodal, approximately normal data.

```python { .api }
def scotts_rule(data, weights=None):
    """
    Scott's rule for bandwidth selection.

    Computes bandwidth as 1.06 * std * n^(-1/5) where std is the
    standard deviation and n is the sample size.

    Parameters:
    - data: array-like, shape (obs, dims), input data for bandwidth estimation
    - weights: array-like or None, optional weights for data points

    Returns:
    - float: Bandwidth estimate using Scott's rule

    Raises:
    - ValueError: If data is empty or has invalid shape
    """
```
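
The formula in the docstring can be checked by hand. The sketch below uses only the standard library; `scott_bandwidth_sketch` is a hypothetical helper that applies the formula as written, with the sample standard deviation as an assumption. It may differ in detail from what `scotts_rule` computes internally.

```python
import statistics


def scott_bandwidth_sketch(values):
    """Apply 1.06 * std * n^(-1/5) to a 1D sequence (illustrative, not KDEpy code)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    std = statistics.stdev(values)  # sample standard deviation (ddof=1); an assumption
    return 1.06 * std * n ** (-1 / 5)


values = [1.0, 2.0, 3.0, 4.0, 5.0]
bw = scott_bandwidth_sketch(values)
print(f"Sketch bandwidth: {bw:.4f}")
```

Because of the `n^(-1/5)` factor, the bandwidth shrinks slowly as the sample grows: multiplying the sample size by 32 halves the bandwidth for the same spread.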

**Usage Example:**

```python
import numpy as np
from KDEpy import TreeKDE
from KDEpy.bw_selection import scotts_rule

# Multi-modal data
data1 = np.random.normal(-2, 0.5, 500)
data2 = np.random.normal(2, 0.8, 500)
data = np.concatenate([data1, data2]).reshape(-1, 1)

# Scott's rule bandwidth
scott_bw = scotts_rule(data)
print(f"Scott's bandwidth: {scott_bw:.4f}")

# Apply to KDE
kde = TreeKDE(bw=scott_bw).fit(data)
x, y = kde.evaluate()

# Or use string identifier
kde_auto = TreeKDE(bw='scott').fit(data)
```

### Silverman's Rule

Classic bandwidth selection rule, similar to Scott's but with a different scaling factor. Works well for normal-like distributions.

```python { .api }
def silvermans_rule(data, weights=None):
    """
    Silverman's rule for bandwidth selection.

    Computes bandwidth using Silverman's rule of thumb:
    0.9 * min(std, IQR/1.34) * n^(-1/5)

    Parameters:
    - data: array-like, shape (obs, 1), input data (1D only)
    - weights: array-like or None, optional weights (currently ignored)

    Returns:
    - float: Bandwidth estimate using Silverman's rule

    Raises:
    - ValueError: If data is not 1-dimensional or empty

    Note: Currently only supports 1D data, weights are ignored
    """
```
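
As a rough illustration of the formula in the docstring, the following standard-library sketch computes `0.9 * min(std, IQR/1.34) * n^(-1/5)`. The helper `silverman_bandwidth_sketch` is hypothetical, and the choice of the sample standard deviation and of `method="inclusive"` quartiles are assumptions, so results may not match `silvermans_rule` exactly.

```python
import statistics


def silverman_bandwidth_sketch(values):
    """Apply 0.9 * min(std, IQR/1.34) * n^(-1/5) (illustrative, not KDEpy code)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    std = statistics.stdev(values)  # sample standard deviation; an assumption
    # Quartiles with linear interpolation between order statistics; an assumption
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = min(std, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)


clean = [float(v) for v in range(1, 11)]
with_outlier = [1.0, 2.0, 3.0, 4.0, 100.0]
bw_clean = silverman_bandwidth_sketch(clean)
bw_outlier = silverman_bandwidth_sketch(with_outlier)
print(f"clean: {bw_clean:.4f}, with outlier: {bw_outlier:.4f}")
```

Taking the minimum of the two spread estimates keeps a single extreme value from inflating the bandwidth: in `with_outlier` the standard deviation is dominated by `100.0`, so the IQR term is used instead.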

**Usage Example:**

```python
import numpy as np
from KDEpy import NaiveKDE
from KDEpy.bw_selection import silvermans_rule

# 1D data (required for Silverman's rule)
data = np.random.lognormal(0, 1, 800)

# Silverman's rule bandwidth
silverman_bw = silvermans_rule(data.reshape(-1, 1))
print(f"Silverman's bandwidth: {silverman_bw:.4f}")

# Use in KDE estimation
kde = NaiveKDE(bw=silverman_bw).fit(data)
x, y = kde.evaluate()

# String identifier usage
kde_auto = NaiveKDE(bw='silverman').fit(data)
```

## Using Bandwidth Methods

### In KDE Constructors

All bandwidth selection methods can be used via string identifiers passed as the `bw` argument:

```python
import numpy as np
from KDEpy import FFTKDE, TreeKDE, NaiveKDE

# Sample data, shape (obs, 1)
data = np.random.randn(1000).reshape(-1, 1)

# String identifiers for automatic selection
kde_isj = FFTKDE(bw='ISJ')  # Improved Sheather-Jones
kde_scott = TreeKDE(bw='scott')  # Scott's rule
kde_silver = NaiveKDE(bw='silverman')  # Silverman's rule

# Fit and evaluate
kde_isj.fit(data)
x, y = kde_isj.evaluate()
```

### Direct Function Calls

Calculate bandwidth values explicitly for inspection or custom usage:

```python
import numpy as np
from KDEpy import FFTKDE
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule

# Sample data, shape (obs, 1)
data = np.random.randn(1000).reshape(-1, 1)

# Calculate bandwidth values
isj_bw = improved_sheather_jones(data)
scott_bw = scotts_rule(data)
silver_bw = silvermans_rule(data)

print(f"ISJ: {isj_bw:.4f}")
print(f"Scott: {scott_bw:.4f}")
print(f"Silverman: {silver_bw:.4f}")

# Use explicit values
kde = FFTKDE(bw=isj_bw).fit(data)
```

### With Weighted Data

ISJ and Scott's rule support weighted data:

```python
import numpy as np
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule

# Weighted data
data = np.random.randn(1000).reshape(-1, 1)
weights = np.random.exponential(1, 1000)

# Weighted bandwidth selection
isj_weighted = improved_sheather_jones(data, weights=weights)
scott_weighted = scotts_rule(data, weights=weights)

# Note: Silverman's rule currently ignores weights
```
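
How weights enter a bandwidth calculation depends on the method and its implementation. One simple possibility, shown here purely as an assumption and not as a description of KDEpy's internals, is to replace the standard deviation in a rule of thumb with a weighted standard deviation. The helper `weighted_std` is hypothetical.

```python
import math


def weighted_std(values, weights):
    """Weighted standard deviation (illustrative helper, not a KDEpy function)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return math.sqrt(variance)


values = [0.0, 1.0, 2.0, 10.0]
uniform = weighted_std(values, [1.0, 1.0, 1.0, 1.0])
downweighted = weighted_std(values, [1.0, 1.0, 1.0, 0.01])  # last point nearly ignored
print(uniform, downweighted)
```

Down-weighting the extreme value reduces the measured spread, which in turn would give a smaller bandwidth from any rule that scales with the standard deviation.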

## Method Comparison

### When to Use Each Method

**Improved Sheather-Jones (ISJ)**:
- Recommended default for most applications
- More accurate than simple rules of thumb
- Handles various distribution shapes well
- Supports weighted data
- Higher computational cost than the simple rules

**Scott's Rule**:
- Fast computation, good for large datasets
- Works well for approximately normal distributions
- Simple and interpretable
- Supports weighted data
- May not be optimal for multi-modal or skewed data

**Silverman's Rule**:
- Classic method, widely used as a reference
- Similar to Scott's rule but with a different scaling factor
- Only supports 1D data currently
- Ignores weights currently
- Fast computation
- Best for normal-like distributions
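
The guidance above can be encoded as a small lookup. The function below is a hypothetical sketch, not part of KDEpy: it returns the string identifiers from this document in order of preference, using only the constraints stated in the lists (ISJ as the default, the rules of thumb for roughly normal data, Silverman's rule limited to unweighted 1D data).

```python
def candidate_bw_methods(n_dims, roughly_normal, weighted):
    """Return bandwidth identifiers to try, most preferred first (illustrative only)."""
    if n_dims < 1:
        raise ValueError("n_dims must be at least 1")
    candidates = ["ISJ"]  # recommended default for most applications
    if roughly_normal:
        candidates.append("scott")  # fast, suited to approximately normal data
        if n_dims == 1 and not weighted:
            candidates.append("silverman")  # 1D only, and weights are ignored
    return candidates


print(candidate_bw_methods(n_dims=1, roughly_normal=True, weighted=False))
print(candidate_bw_methods(n_dims=1, roughly_normal=False, weighted=True))
```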
230
231
### Performance Characteristics
232
233
```python
234
import numpy as np
235
import time
236
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule
237
238
# Large dataset for timing comparison
239
large_data = np.random.randn(10000).reshape(-1, 1)
240
241
# Time ISJ method
242
start = time.time()
243
isj_bw = improved_sheather_jones(large_data)
244
isj_time = time.time() - start
245
246
# Time Scott's rule
247
start = time.time()
248
scott_bw = scotts_rule(large_data)
249
scott_time = time.time() - start
250
251
# Time Silverman's rule
252
start = time.time()
253
silver_bw = silvermans_rule(large_data)
254
silver_time = time.time() - start
255
256
print(f"ISJ: {isj_bw:.4f} ({isj_time:.4f}s)")
257
print(f"Scott: {scott_bw:.4f} ({scott_time:.4f}s)")
258
print(f"Silverman: {silver_bw:.4f} ({silver_time:.4f}s)")
259
```

## Types

```python { .api }
from typing import Callable, Dict, Optional, Union
import numpy as np

from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule

# Input types
DataType = Union[np.ndarray, list]  # Shape (obs, dims)
WeightsType = Optional[Union[np.ndarray, list]]  # Shape (obs,) or None

# Function signatures
BandwidthFunction = Callable[[DataType, WeightsType], float]

# Available methods mapping
BandwidthMethods = Dict[str, BandwidthFunction]
AVAILABLE_METHODS: BandwidthMethods = {
    "ISJ": improved_sheather_jones,
    "scott": scotts_rule,
    "silverman": silvermans_rule,
}
```
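
A mapping like the one above lends itself to a simple dispatch: a `bw` argument that is a string is looked up and called, while a number is used directly. The sketch below is an assumption about how such a lookup could work, not KDEpy's actual code; `resolve_bandwidth`, `SKETCH_METHODS`, and the stand-in rule `_constant_rule` are hypothetical names so that the example runs without KDEpy installed.

```python
from typing import Callable, Dict, Sequence, Union


def _constant_rule(values: Sequence[float]) -> float:
    """Stand-in for a real bandwidth rule; always returns 1.0."""
    return 1.0


# Hypothetical registry mirroring the shape of AVAILABLE_METHODS
SKETCH_METHODS: Dict[str, Callable[[Sequence[float]], float]] = {
    "constant": _constant_rule,
}


def resolve_bandwidth(
    bw: Union[str, float],
    values: Sequence[float],
    methods: Dict[str, Callable[[Sequence[float]], float]] = SKETCH_METHODS,
) -> float:
    """Turn a string identifier or a number into a positive float bandwidth."""
    if isinstance(bw, str):
        if bw not in methods:
            raise ValueError(f"Unknown bandwidth method {bw!r}; expected one of {sorted(methods)}")
        return float(methods[bw](values))
    bw = float(bw)
    if bw <= 0:
        raise ValueError("bandwidth must be positive")
    return bw


print(resolve_bandwidth("constant", [1.0, 2.0, 3.0]))
print(resolve_bandwidth(0.25, [1.0, 2.0, 3.0]))
```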