# Bandwidth Selection

Automatic bandwidth selection methods for kernel density estimation, so that the bandwidth does not have to be tuned by hand. Each method derives a bandwidth from the data, aiming to keep the estimation error low.

## Capabilities


### Improved Sheather-Jones Method

A plug-in bandwidth selector that is generally more accurate than the rule-of-thumb methods. It is the recommended default for most applications.

```python { .api }
def improved_sheather_jones(data, weights=None):
    """
    Improved Sheather-Jones bandwidth selection method.

    Uses plug-in approach with improved functional estimation for
    optimal bandwidth selection in kernel density estimation.

    Parameters:
    - data: array-like, shape (obs, dims), input data for bandwidth estimation
    - weights: array-like or None, optional weights for data points

    Returns:
    - float: Optimal bandwidth value

    Raises:
    - ValueError: If data is empty or has invalid shape
    """
```

**Usage Example:**

```python
import numpy as np
from KDEpy import FFTKDE
from KDEpy.bw_selection import improved_sheather_jones

# Sample data
data = np.random.gamma(2, 1, 1000).reshape(-1, 1)

# Calculate optimal bandwidth
optimal_bw = improved_sheather_jones(data)
print(f"Optimal bandwidth: {optimal_bw:.4f}")

# Use in KDE
kde = FFTKDE(bw=optimal_bw).fit(data)
x, y = kde.evaluate()

# Or use directly in constructor
kde_auto = FFTKDE(bw='ISJ').fit(data)  # Same result
```


### Scott's Rule

A simple rule of thumb based on the standard deviation of the data and the sample size. It is fast to compute and gives reasonable results when the data are approximately normal.

```python { .api }
def scotts_rule(data, weights=None):
    """
    Scott's rule for bandwidth selection.

    Computes bandwidth as 1.06 * std * n^(-1/5) where std is the
    standard deviation and n is the sample size.

    Parameters:
    - data: array-like, shape (obs, dims), input data for bandwidth estimation
    - weights: array-like or None, optional weights for data points

    Returns:
    - float: Bandwidth estimate using Scott's rule

    Raises:
    - ValueError: If data is empty or has invalid shape
    """
```
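The formula quoted in the docstring above can be written out in a few lines. This is a minimal sketch of the rule as stated there, not KDEpy's implementation: the name `scott_bandwidth_sketch` is hypothetical, it takes a flat sequence of 1D samples rather than an `(obs, dims)` array, and using the sample standard deviation (`ddof=1`) is an assumption.

```python
import statistics


def scott_bandwidth_sketch(samples):
    """Rule of thumb as quoted above: 1.06 * std * n^(-1/5), for 1D samples.

    Hypothetical helper for illustration only; not part of KDEpy.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a standard deviation")
    # Assumption: sample standard deviation (ddof=1).
    std = statistics.stdev(samples)
    return 1.06 * std * n ** (-1 / 5)
```

Because the result is proportional to the standard deviation, rescaling the data rescales the bandwidth by the same factor, and the `n^(-1/5)` term shrinks the bandwidth slowly as more samples arrive.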

**Usage Example:**

```python
import numpy as np
from KDEpy import TreeKDE
from KDEpy.bw_selection import scotts_rule

# Multi-modal data
data1 = np.random.normal(-2, 0.5, 500)
data2 = np.random.normal(2, 0.8, 500)
data = np.concatenate([data1, data2]).reshape(-1, 1)

# Scott's rule bandwidth
scott_bw = scotts_rule(data)
print(f"Scott's bandwidth: {scott_bw:.4f}")

# Apply to KDE
kde = TreeKDE(bw=scott_bw).fit(data)
x, y = kde.evaluate()

# Or use string identifier
kde_auto = TreeKDE(bw='scott').fit(data)
```


### Silverman's Rule

A classic rule of thumb, similar to Scott's rule but with a different scaling factor and a spread estimate that also takes the interquartile range into account. It works well for approximately normal data and currently supports 1D data only.

```python { .api }
def silvermans_rule(data, weights=None):
    """
    Silverman's rule for bandwidth selection.

    Computes bandwidth using Silverman's rule of thumb:
    0.9 * min(std, IQR/1.34) * n^(-1/5)

    Parameters:
    - data: array-like, shape (obs, 1), input data (1D only)
    - weights: array-like or None, optional weights (currently ignored)

    Returns:
    - float: Bandwidth estimate using Silverman's rule

    Raises:
    - ValueError: If data is not 1-dimensional or empty

    Note: Currently only supports 1D data, weights are ignored
    """
```
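To make the role of the interquartile range concrete, the formula in the docstring above can be sketched with the standard library. This is an illustration of the rule as stated there, not KDEpy's implementation: `silverman_bandwidth_sketch` is a hypothetical name, the input is a flat sequence of 1D samples, and both the sample standard deviation (`ddof=1`) and the quartile convention (the default of `statistics.quantiles`) are assumptions that a library may choose differently.

```python
import statistics


def silverman_bandwidth_sketch(samples):
    """Rule of thumb as quoted above: 0.9 * min(std, IQR / 1.34) * n^(-1/5).

    Hypothetical helper for illustration only; not part of KDEpy.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Assumption: sample standard deviation (ddof=1).
    std = statistics.stdev(samples)
    # Assumption: quartiles from statistics.quantiles with its default method.
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    # Taking the smaller spread estimate limits the influence of outliers.
    spread = min(std, iqr / 1.34)
    return 0.9 * spread * n ** (-1 / 5)
```

With a single extreme outlier in the sample, the standard deviation inflates while the interquartile range barely moves, so the `min` picks the IQR-based spread and the bandwidth stays small.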

**Usage Example:**

```python
import numpy as np
from KDEpy import NaiveKDE
from KDEpy.bw_selection import silvermans_rule

# 1D data (required for Silverman's rule)
data = np.random.lognormal(0, 1, 800)

# Silverman's rule bandwidth
silverman_bw = silvermans_rule(data.reshape(-1, 1))
print(f"Silverman's bandwidth: {silverman_bw:.4f}")

# Use in KDE estimation
kde = NaiveKDE(bw=silverman_bw).fit(data)
x, y = kde.evaluate()

# String identifier usage
kde_auto = NaiveKDE(bw='silverman').fit(data)
```


## Using Bandwidth Methods


### In KDE Constructors

Each bandwidth selection method can be requested by passing its string identifier as `bw`:

```python
import numpy as np
from KDEpy import FFTKDE, TreeKDE, NaiveKDE

# Example 1D data, shape (obs, 1)
data = np.random.randn(500).reshape(-1, 1)

# String identifiers for automatic selection
kde_isj = FFTKDE(bw='ISJ')  # Improved Sheather-Jones
kde_scott = TreeKDE(bw='scott')  # Scott's rule
kde_silver = NaiveKDE(bw='silverman')  # Silverman's rule

# Fit and evaluate
kde_isj.fit(data)
x, y = kde_isj.evaluate()
```


### Direct Function Calls

Call the functions directly to inspect a bandwidth value or to reuse it elsewhere:

```python
import numpy as np
from KDEpy import FFTKDE
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule

# Example 1D data, shape (obs, 1)
data = np.random.randn(500).reshape(-1, 1)

# Calculate bandwidth values
isj_bw = improved_sheather_jones(data)
scott_bw = scotts_rule(data)
silver_bw = silvermans_rule(data)

print(f"ISJ: {isj_bw:.4f}")
print(f"Scott: {scott_bw:.4f}")
print(f"Silverman: {silver_bw:.4f}")

# Use explicit values
kde = FFTKDE(bw=isj_bw).fit(data)
```


### With Weighted Data

ISJ and Scott's rule accept per-observation weights; Silverman's rule currently ignores them:

```python
import numpy as np
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule

# Weighted data
data = np.random.randn(1000).reshape(-1, 1)
weights = np.random.exponential(1, 1000)

# Weighted bandwidth selection
isj_weighted = improved_sheather_jones(data, weights=weights)
scott_weighted = scotts_rule(data, weights=weights)

# Note: Silverman's rule currently ignores weights
```


## Method Comparison


### When to Use Each Method

**Improved Sheather-Jones (ISJ)**:
- Recommended default for most applications
- More accurate than simple rules of thumb
- Handles various distribution shapes well
- Supports weighted data
- Computational cost higher than simple rules

**Scott's Rule**:
- Fast computation, good for large datasets
- Works well for approximately normal distributions
- Simple and interpretable
- Supports weighted data
- May not be optimal for multi-modal or skewed data

**Silverman's Rule**:
- Classic method, widely used reference
- Similar to Scott's rule but different scaling
- Only supports 1D data currently
- Fast computation
- Best for normal-like distributions
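The guidance above can be encoded as a small pre-flight check that runs before a method name is passed to an estimator. This is only a sketch: `check_bw_method` and `KNOWN_METHODS` are hypothetical names, not part of KDEpy, and the rules simply restate the bullet points in this section.

```python
KNOWN_METHODS = ("ISJ", "scott", "silverman")


def check_bw_method(method, dims=1, has_weights=False):
    """Return a list of caveats for a bandwidth method identifier.

    Hypothetical helper that restates the guidance in this section.
    """
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown bandwidth method: {method!r}")
    caveats = []
    if method == "silverman":
        if dims != 1:
            raise ValueError("Silverman's rule supports 1D data only")
        if has_weights:
            caveats.append("weights are ignored by Silverman's rule")
        caveats.append("best suited to normal-like distributions")
    elif method == "scott":
        caveats.append("may not be optimal for multi-modal or skewed data")
    else:  # "ISJ"
        caveats.append("higher computational cost than the rules of thumb")
    return caveats
```

For example, `check_bw_method("silverman", has_weights=True)` reports that the weights would be ignored, which is easy to miss when switching between methods.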


### Performance Characteristics

```python
import numpy as np
import time
from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule

# Large dataset for timing comparison
large_data = np.random.randn(10000).reshape(-1, 1)

# Time ISJ method (perf_counter is a monotonic, high-resolution clock)
start = time.perf_counter()
isj_bw = improved_sheather_jones(large_data)
isj_time = time.perf_counter() - start

# Time Scott's rule
start = time.perf_counter()
scott_bw = scotts_rule(large_data)
scott_time = time.perf_counter() - start

# Time Silverman's rule
start = time.perf_counter()
silver_bw = silvermans_rule(large_data)
silver_time = time.perf_counter() - start

print(f"ISJ: {isj_bw:.4f} ({isj_time:.4f}s)")
print(f"Scott: {scott_bw:.4f} ({scott_time:.4f}s)")
print(f"Silverman: {silver_bw:.4f} ({silver_time:.4f}s)")
```
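A single timed call can be noisy, so a comparison like the one above is often more stable when each function is run several times and the best time is kept. The helper below is a sketch under that assumption; `time_call` is a hypothetical name, not part of KDEpy.

```python
import time


def time_call(func, *args, repeats=5, **kwargs):
    """Call func several times; return (last result, best elapsed seconds).

    Hypothetical helper for illustration only.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        # Keep the fastest run, which is least affected by background load.
        best = min(best, time.perf_counter() - start)
    return result, best
```

With the data from the timing example, a call such as `time_call(scotts_rule, large_data)` would return the bandwidth together with the fastest of five runs.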


## Types

```python { .api }
from typing import Callable, Optional, Union
import numpy as np

from KDEpy.bw_selection import improved_sheather_jones, scotts_rule, silvermans_rule

# Input types
DataType = Union[np.ndarray, list]  # Shape (obs, dims)
WeightsType = Optional[Union[np.ndarray, list]]  # Shape (obs,) or None

# Function signatures (typing.Callable; the builtin `callable` is not subscriptable)
BandwidthFunction = Callable[[DataType, WeightsType], float]

# Available methods mapping
BandwidthMethods = dict[str, BandwidthFunction]
AVAILABLE_METHODS: BandwidthMethods = {
    "ISJ": improved_sheather_jones,
    "scott": scotts_rule,
    "silverman": silvermans_rule,
}
```