Tessl Tile for pypi/kdepy@1.1.0

# Bandwidth Selection

Automatic bandwidth selection methods for kernel density estimation, so the bandwidth does not have to be tuned by hand. Each method estimates a bandwidth from the data itself, aiming to keep the estimation error of the resulting density low.

## Capabilities

### Improved Sheather-Jones Method

A plug-in bandwidth selector that is typically more accurate than the simple rules of thumb described in this document. Recommended default choice for most applications.

```python { .api }
def improved_sheather_jones(data, weights=None):
    """
    Improved Sheather-Jones bandwidth selection method.

    Uses plug-in approach with improved functional estimation for
    optimal bandwidth selection in kernel density estimation.

    Parameters:
    - data: array-like, shape (obs, dims), input data for bandwidth estimation
    - weights: array-like or None, optional weights for data points

    Returns:
    - float: Optimal bandwidth value

    Raises:
    - ValueError: If data is empty or has invalid shape
    """
```

**Usage Example:**

```python
import numpy as np
from KDEpy import FFTKDE
from KDEpy.bw_selection import improved_sheather_jones

# Sample data
data = np.random.gamma(2, 1, 1000).reshape(-1, 1)

# Calculate optimal bandwidth
optimal_bw = improved_sheather_jones(data)
print(f"Optimal bandwidth: {optimal_bw:.4f}")

# Use in KDE
kde = FFTKDE(bw=optimal_bw).fit(data)
x, y = kde.evaluate()

# Or use directly in constructor
kde_auto = FFTKDE(bw='ISJ').fit(data)  # Same result
```
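
To make the role of the bandwidth concrete, here is a minimal NumPy-only sketch of a one-dimensional Gaussian KDE. It is an illustration written for this document, not KDEpy's implementation; the helper name `gaussian_kde_1d` and the two bandwidth values are assumptions chosen for the demonstration.

```python
import numpy as np


def gaussian_kde_1d(data, grid, bw):
    """Hypothetical helper: evaluate a Gaussian KDE with bandwidth `bw` on `grid`."""
    data = np.asarray(data, dtype=float).ravel()
    grid = np.asarray(grid, dtype=float).ravel()
    # Scaled distance between every grid point and every observation
    z = (grid[:, None] - data[None, :]) / bw
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels over the observations and rescale by the bandwidth
    return kernel.mean(axis=1) / bw


rng = np.random.default_rng(0)
sample = rng.gamma(2.0, 1.0, size=500)
grid = np.linspace(sample.min() - 3.0, sample.max() + 3.0, 2000)
dx = grid[1] - grid[0]

narrow = gaussian_kde_1d(sample, grid, bw=0.05)  # small bandwidth: spiky estimate
wide = gaussian_kde_1d(sample, grid, bw=1.0)     # large bandwidth: smooth estimate

# Both estimates should integrate to roughly one on this grid
print(f"integral (bw=0.05): {narrow.sum() * dx:.3f}")
print(f"integral (bw=1.00): {wide.sum() * dx:.3f}")
print(f"peak height: narrow={narrow.max():.3f}, wide={wide.max():.3f}")
```

A small bandwidth follows individual observations closely, while a large one smooths over structure. Automatic selectors such as ISJ aim to pick a value between these extremes.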

### Scott's Rule

A simple rule of thumb based on the standard deviation of the data and the sample size. It is fast to compute and gives reasonable results for unimodal, approximately normal data.

```python { .api }
def scotts_rule(data, weights=None):
    """
    Scott's rule for bandwidth selection.

    Computes bandwidth as 1.06 * std * n^(-1/5) where std is the
    standard deviation and n is the sample size.

    Parameters:
    - data: array-like, shape (obs, dims), input data for bandwidth estimation
    - weights: array-like or None, optional weights for data points

    Returns:
    - float: Bandwidth estimate using Scott's rule

    Raises:
    - ValueError: If data is empty or has invalid shape
    """
```
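
The formula quoted in the docstring can be written out directly in NumPy. The sketch below is only an illustration of that formula; the helper name `scott_formula` is hypothetical, the use of the sample standard deviation (`ddof=1`) is an assumption, and the value returned by `scotts_rule` itself may differ in detail.

```python
import numpy as np


def scott_formula(data):
    """Hypothetical helper: 1.06 * std * n^(-1/5), as stated in the docstring."""
    values = np.asarray(data, dtype=float).ravel()
    if values.size == 0:
        raise ValueError("data must not be empty")
    std = values.std(ddof=1)  # sample standard deviation (an assumption)
    return 1.06 * std * values.size ** (-1 / 5)


rng = np.random.default_rng(0)
small = rng.normal(0.0, 1.0, size=100)
large = rng.normal(0.0, 1.0, size=10_000)

# The bandwidth shrinks slowly as the sample grows (factor n^(-1/5))
print(f"n=100:    {scott_formula(small):.4f}")
print(f"n=10000:  {scott_formula(large):.4f}")

# Rescaling the data rescales the bandwidth by the same factor
print(f"2x scale: {scott_formula(2.0 * small):.4f}")
```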

**Usage Example:**

```python
import numpy as np
from KDEpy import TreeKDE
from KDEpy.bw_selection import scotts_rule

# Multi-modal data
data1 = np.random.normal(-2, 0.5, 500)
data2 = np.random.normal(2, 0.8, 500)
data = np.concatenate([data1, data2]).reshape(-1, 1)

# Scott's rule bandwidth
scott_bw = scotts_rule(data)
print(f"Scott's bandwidth: {scott_bw:.4f}")

# Apply to KDE
kde = TreeKDE(bw=scott_bw).fit(data)
x, y = kde.evaluate()

# Or use string identifier
kde_auto = TreeKDE(bw='scott').fit(data)
```

### Silverman's Rule

A classic rule of thumb, similar to Scott's rule but with a different scaling factor. Works well for approximately normal distributions.

```python { .api }
def silvermans_rule(data, weights=None):
    """
    Silverman's rule for bandwidth selection.

    Computes bandwidth using Silverman's rule of thumb:
    0.9 * min(std, IQR/1.34) * n^(-1/5)

    Parameters:
    - data: array-like, shape (obs, 1), input data (1D only)
    - weights: array-like or None, optional weights (currently ignored)

    Returns:
    - float: Bandwidth estimate using Silverman's rule

    Raises:
    - ValueError: If data is not 1-dimensional or empty

    Note: Currently only supports 1D data, weights are ignored
    """
```
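
The `min(std, IQR/1.34)` term in the docstring formula makes the rule less sensitive to outliers than a rule based on the standard deviation alone. The NumPy sketch below illustrates that effect; the helper name `silverman_formula` is hypothetical, the choice of `ddof=1` and the percentile method are assumptions, and the code is not taken from the library.

```python
import numpy as np


def silverman_formula(data):
    """Hypothetical helper: 0.9 * min(std, IQR/1.34) * n^(-1/5)."""
    values = np.asarray(data, dtype=float).ravel()
    if values.size == 0:
        raise ValueError("data must not be empty")
    std = values.std(ddof=1)  # sample standard deviation (an assumption)
    q25, q75 = np.percentile(values, [25, 75])
    spread = min(std, (q75 - q25) / 1.34)  # robust spread estimate
    return 0.9 * spread * values.size ** (-1 / 5)


rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=1000)
contaminated = np.concatenate([clean, np.full(5, 50.0)])  # add five far outliers

# A std-only variant of the rule, for comparison
n = contaminated.size
std_only = 0.9 * contaminated.std(ddof=1) * n ** (-1 / 5)

print(f"clean data:         {silverman_formula(clean):.4f}")
print(f"with outliers:      {silverman_formula(contaminated):.4f}")
print(f"std-only, outliers: {std_only:.4f}")
```

The outliers inflate the standard deviation, but the interquartile range barely moves, so the bandwidth stays close to the value for the clean data.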

**Usage Example:**

```python
import numpy as np
from KDEpy import NaiveKDE
from KDEpy.bw_selection import silvermans_rule

# 1D data (required for Silverman's rule)
data = np.random.lognormal(0, 1, 800)

# Silverman's rule bandwidth
silverman_bw = silvermans_rule(data.reshape(-1, 1))
print(f"Silverman's bandwidth: {silverman_bw:.4f}")

# Use in KDE estimation
kde = NaiveKDE(bw=silverman_bw).fit(data)
x, y = kde.evaluate()

# String identifier usage
kde_auto = NaiveKDE(bw='silverman').fit(data)
```

## Using Bandwidth Methods

### In KDE Constructors

All bandwidth selection methods can be used via string identifiers:

```python
import numpy as np
from KDEpy import FFTKDE, TreeKDE, NaiveKDE

# Example 1D data
data = np.random.randn(1000)

# String identifiers for automatic selection
kde_isj = FFTKDE(bw='ISJ')        # Improved Sheather-Jones
kde_scott = TreeKDE(bw='scott')   # Scott's rule
kde_silver = NaiveKDE(bw='silverman')  # Silverman's rule

# Fit and evaluate
kde_isj.fit(data)
x, y = kde_isj.evaluate()
```

### Direct Function Calls

Calculate bandwidth values explicitly for inspection or custom usage:

```python
import numpy as np
from KDEpy import FFTKDE
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule

# Example data with shape (obs, 1)
data = np.random.randn(1000).reshape(-1, 1)

# Calculate bandwidth values
isj_bw = improved_sheather_jones(data)
scott_bw = scotts_rule(data)
silver_bw = silvermans_rule(data)

print(f"ISJ: {isj_bw:.4f}")
print(f"Scott: {scott_bw:.4f}")
print(f"Silverman: {silver_bw:.4f}")

# Use explicit values
kde = FFTKDE(bw=isj_bw).fit(data)
```

### With Weighted Data

ISJ and Scott's rule support weighted data:

```python
import numpy as np
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule

# Weighted data
data = np.random.randn(1000).reshape(-1, 1)
weights = np.random.exponential(1, 1000)

# Weighted bandwidth selection
isj_weighted = improved_sheather_jones(data, weights=weights)
scott_weighted = scotts_rule(data, weights=weights)

# Note: Silverman's rule currently ignores weights
```
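
One common way for weights to enter a standard-deviation-based rule is through a weighted standard deviation. The sketch below shows that idea only; it is an assumption about how such a rule could treat weights, not a description of what KDEpy does internally, and `weighted_std` is a hypothetical helper.

```python
import numpy as np


def weighted_std(data, weights):
    """Hypothetical helper: standard deviation with per-observation weights."""
    values = np.asarray(data, dtype=float).ravel()
    w = np.asarray(weights, dtype=float).ravel()
    if values.shape != w.shape:
        raise ValueError("data and weights must have the same length")
    w = w / w.sum()  # normalize so the weights sum to one
    mean = np.sum(w * values)
    return np.sqrt(np.sum(w * (values - mean) ** 2))


values = np.array([1.0, 2.0, 4.0, 7.0])

# Uniform weights reproduce the ordinary (population) standard deviation
uniform = weighted_std(values, np.ones(4))
print(f"uniform weights: {uniform:.4f}  np.std: {values.std():.4f}")

# Integer weights behave like repeating observations
counts = np.array([3, 1, 1, 2])
repeated = np.repeat(values, counts)
print(f"weighted: {weighted_std(values, counts):.4f}  repeated: {repeated.std():.4f}")
```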

## Method Comparison

### When to Use Each Method

**Improved Sheather-Jones (ISJ)**:
- Recommended default for most applications
- More accurate than simple rules of thumb
- Handles various distribution shapes well
- Supports weighted data
- Computational cost higher than simple rules

**Scott's Rule**:
- Fast computation, good for large datasets
- Works well for approximately normal distributions
- Simple and interpretable
- Supports weighted data
- May not be optimal for multi-modal or skewed data

**Silverman's Rule**:
- Classic method, widely used reference
- Similar to Scott's rule but different scaling
- Only supports 1D data currently
- Fast computation
- Best for normal-like distributions
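
The caveat about multi-modal data can be seen with plain NumPy. In a well-separated mixture, the overall standard deviation reflects the distance between the modes rather than the width of each mode, so a rule driven by the standard deviation picks a bandwidth wider than either mode would get on its own. The sketch below uses the `1.06 * std * n^(-1/5)` formula quoted earlier; `std_rule` is a hypothetical stand-in, not the library function.

```python
import numpy as np


def std_rule(values):
    """Hypothetical stand-in: 1.06 * std * n^(-1/5)."""
    values = np.asarray(values, dtype=float).ravel()
    return 1.06 * values.std(ddof=1) * values.size ** (-1 / 5)


rng = np.random.default_rng(0)
left = rng.normal(-2.0, 0.5, size=500)
right = rng.normal(2.0, 0.8, size=500)
mixture = np.concatenate([left, right])

bw_left = std_rule(left)
bw_right = std_rule(right)
bw_mixture = std_rule(mixture)

print(f"left mode only:  {bw_left:.4f}")
print(f"right mode only: {bw_right:.4f}")
print(f"full mixture:    {bw_mixture:.4f}")
```

The mixture bandwidth comes out larger than both single-mode bandwidths even though the mixture has twice as many points, which tends to blur the two peaks together.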
230

231
### Performance Characteristics
232

233
```python
234
import numpy as np
235
import time
236
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule
237

238
# Large dataset for timing comparison
239
large_data = np.random.randn(10000).reshape(-1, 1)
240

241
# Time ISJ method
242
start = time.time()
243
isj_bw = improved_sheather_jones(large_data)
244
isj_time = time.time() - start
245

246
# Time Scott's rule
247
start = time.time()  
248
scott_bw = scotts_rule(large_data)
249
scott_time = time.time() - start
250

251
# Time Silverman's rule
252
start = time.time()
253
silver_bw = silvermans_rule(large_data)  
254
silver_time = time.time() - start
255

256
print(f"ISJ: {isj_bw:.4f} ({isj_time:.4f}s)")
257
print(f"Scott: {scott_bw:.4f} ({scott_time:.4f}s)")  
258
print(f"Silverman: {silver_bw:.4f} ({silver_time:.4f}s)")
259
```
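
Single timings of fast functions are noisy. A small helper that repeats the call and keeps the fastest run gives steadier numbers. The sketch below is one possible approach; `best_time` and `std_rule` are hypothetical names, and `std_rule` is only a stand-in so that the example runs without KDEpy installed.

```python
import time

import numpy as np


def best_time(func, *args, repeats=5):
    """Hypothetical helper: return (result, fastest wall-clock seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best


def std_rule(values):
    """Stand-in bandwidth rule used only to have something to time."""
    values = np.asarray(values, dtype=float).ravel()
    return 1.06 * values.std(ddof=1) * values.size ** (-1 / 5)


rng = np.random.default_rng(0)
large_sample = rng.normal(size=10_000)

bw, seconds = best_time(std_rule, large_sample, repeats=5)
print(f"bandwidth: {bw:.4f} (best of 5: {seconds:.6f}s)")
```

Any of the bandwidth functions can be passed in place of `std_rule`, for example `best_time(improved_sheather_jones, large_data)` with the names from the snippet above.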

## Types

```python { .api }
from typing import Callable, Optional, Union
import numpy as np
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule

# Input types
DataType = Union[np.ndarray, list]      # Shape (obs, dims)
WeightsType = Optional[Union[np.ndarray, list]]  # Shape (obs,) or None

# Function signatures
BandwidthFunction = Callable[[DataType, WeightsType], float]

# Available methods mapping
BandwidthMethods = dict[str, BandwidthFunction]
AVAILABLE_METHODS = {
    "ISJ": improved_sheather_jones,
    "scott": scotts_rule,
    "silverman": silvermans_rule
}
```
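
A mapping like the one above lets a `bw` argument be given either as a number or as a method name. The sketch below shows one way such a lookup could work; `resolve_bandwidth`, `std_rule` and `DEMO_METHODS` are hypothetical names for illustration and are not part of KDEpy. A stand-in rule is used so that the example runs without the library.

```python
from typing import Callable, Dict, Union

import numpy as np


def std_rule(data, weights=None):
    """Stand-in rule with the same (data, weights) signature as above."""
    values = np.asarray(data, dtype=float).ravel()
    return 1.06 * values.std(ddof=1) * values.size ** (-1 / 5)


# Hypothetical registry; a real one would map names to the library functions
DEMO_METHODS: Dict[str, Callable] = {"std_rule": std_rule}


def resolve_bandwidth(bw: Union[str, float], data, methods, weights=None) -> float:
    """Turn a bandwidth spec (method name or positive number) into a float."""
    if isinstance(bw, str):
        try:
            func = methods[bw]
        except KeyError:
            known = ", ".join(sorted(methods))
            raise ValueError(f"unknown method {bw!r}; expected one of: {known}") from None
        return float(func(data, weights))
    bw = float(bw)
    if bw <= 0:
        raise ValueError("bandwidth must be positive")
    return bw


rng = np.random.default_rng(0)
demo_data = rng.normal(size=(200, 1))

print(resolve_bandwidth(0.5, demo_data, DEMO_METHODS))         # explicit value
print(resolve_bandwidth("std_rule", demo_data, DEMO_METHODS))  # looked up by name
```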
