# Nearest Neighbors

Nearest-neighbor algorithms for classification, regression, outlier detection, and manifold learning. They rest on the assumption that points close to each other in feature space tend to have similar labels or values.

## Classification

### KNeighborsClassifier

Classifier that predicts by a vote among the k nearest training samples.

```python { .api }
from sklearn.neighbors import KNeighborsClassifier

KNeighborsClassifier(
    n_neighbors: int = 5,
    weights: str | callable = "uniform",
    algorithm: str = "auto",
    leaf_size: int = 30,
    p: int = 2,
    metric: str | callable = "minkowski",
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

### RadiusNeighborsClassifier

Classifier that predicts by a vote among all training samples within a fixed radius of the query point.

```python { .api }
from sklearn.neighbors import RadiusNeighborsClassifier

RadiusNeighborsClassifier(
    radius: float = 1.0,
    weights: str | callable = "uniform",
    algorithm: str = "auto",
    leaf_size: int = 30,
    p: int = 2,
    metric: str | callable = "minkowski",
    metric_params: dict | None = None,
    outlier_label: int | str | None = None,
    n_jobs: int | None = None
)
```
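
The `outlier_label` parameter matters when a query point has no training samples inside the radius. The sketch below illustrates this on toy data; the data, the radius of 0.5, and the label value `-1` are arbitrary choices made for this example.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

# Two small, well-separated clusters (toy data for illustration only).
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# outlier_label is assigned to query points that have no training
# neighbors within the radius; -1 is an arbitrary value chosen here.
clf = RadiusNeighborsClassifier(radius=0.5, outlier_label=-1)
clf.fit(X_train, y_train)

X_query = np.array([[0.05, 0.05],   # close to the first cluster
                    [5.05, 5.05],   # close to the second cluster
                    [20.0, 20.0]])  # far from every training point
labels = clf.predict(X_query)
```

With `outlier_label=None`, predicting a point that has no neighbors within the radius raises an error instead. scikit-learn may also emit a warning when the outlier label is not one of the training classes.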

## Regression

### KNeighborsRegressor

Regressor that predicts by local interpolation of the targets of the k nearest training samples.

```python { .api }
from sklearn.neighbors import KNeighborsRegressor

KNeighborsRegressor(
    n_neighbors: int = 5,
    weights: str | callable = "uniform",
    algorithm: str = "auto",
    leaf_size: int = 30,
    p: int = 2,
    metric: str | callable = "minkowski",
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

### RadiusNeighborsRegressor

Regression based on neighbors within a fixed radius.

```python { .api }
from sklearn.neighbors import RadiusNeighborsRegressor

RadiusNeighborsRegressor(
    radius: float = 1.0,
    weights: str | callable = "uniform",
    algorithm: str = "auto",
    leaf_size: int = 30,
    p: int = 2,
    metric: str | callable = "minkowski",
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

## Unsupervised Learning

### NearestNeighbors

Unsupervised learner for implementing neighbor searches.

```python { .api }
from sklearn.neighbors import NearestNeighbors

NearestNeighbors(
    n_neighbors: int = 5,
    radius: float = 1.0,
    algorithm: str = "auto",
    leaf_size: int = 30,
    metric: str | callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

### NearestCentroid

Supervised classifier that represents each class by the centroid of its members and assigns a sample to the class with the nearest centroid.

```python { .api }
from sklearn.neighbors import NearestCentroid

NearestCentroid(
    metric: str = "euclidean",
    shrink_threshold: float | None = None
)
```
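
A minimal sketch of how the centroid rule works, using made-up 2D points. Note that `fit` requires class labels.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

# Toy data: three points per class (values chosen for illustration).
X = np.array([[-1.0, -1.0], [-2.0, -1.0], [-3.0, -2.0],
              [1.0, 1.0], [2.0, 1.0], [3.0, 2.0]])
y = np.array([1, 1, 1, 2, 2, 2])

clf = NearestCentroid()
clf.fit(X, y)

# One centroid per class, in the same order as clf.classes_.
centroids = clf.centroids_

# The query point is closer to the centroid of class 1.
pred = clf.predict([[-0.8, -1.0]])
```

Each row of `centroids_` is the per-feature mean of the training samples of one class, so prediction only needs one distance computation per class.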

## Transformers

### KNeighborsTransformer

Transform X into a (weighted) sparse graph of its k nearest neighbors.

```python { .api }
from sklearn.neighbors import KNeighborsTransformer

KNeighborsTransformer(
    mode: str = "distance",
    n_neighbors: int = 5,
    algorithm: str = "auto",
    leaf_size: int = 30,
    metric: str | callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```
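
One common use of this transformer is to precompute the neighbors graph once and feed it to an estimator that accepts `metric="precomputed"`. The following is a sketch of that pattern on the iris dataset; the neighbor counts (10 and 5) are illustrative choices, with the transformer asked for at least as many neighbors as the classifier uses.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier, KNeighborsTransformer
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# mode="distance" stores neighbor distances in a sparse matrix of
# shape (n_samples, n_samples).
transformer = KNeighborsTransformer(n_neighbors=10, mode="distance")
graph = transformer.fit_transform(X)

# The classifier consumes the sparse distance graph instead of raw features.
pipe = make_pipeline(
    KNeighborsTransformer(n_neighbors=10, mode="distance"),
    KNeighborsClassifier(n_neighbors=5, metric="precomputed"),
)
pipe.fit(X, y)
accuracy = pipe.score(X, y)  # training accuracy, for illustration only
```

Splitting the neighbor search from the estimator this way lets the graph be cached or reused when several downstream estimators need the same neighbors.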

### RadiusNeighborsTransformer

Transform X into a (weighted) sparse graph of the neighbors within a fixed radius.

```python { .api }
from sklearn.neighbors import RadiusNeighborsTransformer

RadiusNeighborsTransformer(
    mode: str = "distance",
    radius: float = 1.0,
    algorithm: str = "auto",
    leaf_size: int = 30,
    metric: str | callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

## Outlier Detection

### LocalOutlierFactor

Unsupervised outlier detection using the Local Outlier Factor (LOF), which compares the local density of each sample with that of its neighbors.

```python { .api }
from sklearn.neighbors import LocalOutlierFactor

LocalOutlierFactor(
    n_neighbors: int = 20,
    algorithm: str = "auto",
    leaf_size: int = 30,
    metric: str | callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    contamination: str | float = "auto",
    novelty: bool = False,
    n_jobs: int | None = None
)
```
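
The `novelty` flag changes how the estimator is used: with `novelty=True`, the model is fitted on reference data and then applied to new, unseen points through `predict` and `decision_function`. A minimal sketch, assuming synthetic Gaussian training data and two hand-picked query points:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Reference data assumed to be free of outliers (synthetic, for illustration).
rng = np.random.RandomState(42)
X_train = rng.normal(size=(200, 2))

# novelty=True enables predict/decision_function on unseen data.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_train)

X_new = np.array([[0.0, 0.0],   # in the dense center of the training data
                  [8.0, 8.0]])  # far away from all training points
pred = lof.predict(X_new)               # 1 = inlier, -1 = outlier
scores = lof.decision_function(X_new)   # negative values indicate outliers
```

In this mode, `predict` is meant for new data only; applying it to the training samples themselves would give results that differ from `fit_predict` with `novelty=False`.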

## Dimensionality Reduction

### NeighborhoodComponentsAnalysis

Neighborhood Components Analysis (NCA): a supervised linear transformation, learned to improve nearest-neighbor classification accuracy, that can also be used for dimensionality reduction.

```python { .api }
from sklearn.neighbors import NeighborhoodComponentsAnalysis

NeighborhoodComponentsAnalysis(
    n_components: int | None = None,
    init: str | ndarray = "auto",
    warm_start: bool = False,
    max_iter: int = 50,
    tol: float = 1e-5,
    callback: callable | None = None,
    verbose: int = 0,
    random_state: int | RandomState | None = None
)
```

## Density Estimation

### KernelDensity

Kernel Density Estimation using various kernel functions.

```python { .api }
from sklearn.neighbors import KernelDensity

KernelDensity(
    bandwidth: float | str = 1.0,
    algorithm: str = "auto",
    kernel: str = "gaussian",
    metric: str | callable = "euclidean",
    atol: float = 0,
    rtol: float = 0,
    breadth_first: bool = True,
    leaf_size: int = 40,
    metric_params: dict | None = None
)
```
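
A short sketch of fitting a density model and evaluating it. The synthetic data and the bandwidth of 0.5 are assumptions made for this example; in practice the bandwidth usually needs tuning.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic 1D sample drawn from a standard normal distribution.
rng = np.random.RandomState(0)
X = rng.normal(loc=0.0, scale=1.0, size=(200, 1))

kde = KernelDensity(kernel="gaussian", bandwidth=0.5)
kde.fit(X)

# score_samples returns the log-density, so exponentiate to get densities.
grid = np.array([[0.0], [5.0]])
log_density = kde.score_samples(grid)
density = np.exp(log_density)

# Draw new points from the fitted density estimate.
new_points = kde.sample(n_samples=3, random_state=0)
```

Working with log-densities avoids numerical underflow in regions where the estimated density is very small, such as the point at 5.0 here.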

## Tree Data Structures

### KDTree

KD-tree (k-dimensional tree) for fast nearest-neighbor queries.

```python { .api }
from sklearn.neighbors import KDTree

KDTree(
    X: ArrayLike,
    leaf_size: int = 40,
    metric: str | callable = "minkowski",
    **kwargs
)
```

### BallTree

Ball tree for fast nearest-neighbor queries; it often copes better than a KD-tree with higher-dimensional data.

```python { .api }
from sklearn.neighbors import BallTree

BallTree(
    X: ArrayLike,
    leaf_size: int = 40,
    metric: str | callable = "minkowski",
    **kwargs
)
```
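
Both tree classes can be queried directly, without going through an estimator. The sketch below builds each tree on the same random points (synthetic data, chosen for illustration) and runs a k-nearest query and a radius query.

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

# 50 random points in 3 dimensions.
rng = np.random.RandomState(0)
X = rng.random_sample((50, 3))

kd = KDTree(X, leaf_size=40)
ball = BallTree(X, leaf_size=40)

# k-nearest query: distances and indices of the 3 closest points to X[0].
# X[0] is part of the tree, so its nearest neighbor is itself.
dist_kd, ind_kd = kd.query(X[:1], k=3)
dist_ball, ind_ball = ball.query(X[:1], k=3)

# Radius query: indices of all points within distance 0.3 of X[0],
# or only their count when count_only=True.
ind_radius = kd.query_radius(X[:1], r=0.3)
counts = kd.query_radius(X[:1], r=0.3, count_only=True)
```

Both trees return the same distances for the same query and metric; they differ in how the space is partitioned, which affects build and query speed rather than results.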

## Graph Construction Functions

### kneighbors_graph

Compute the (weighted) graph of k-neighbors for the points in X.

```python { .api }
from sklearn.neighbors import kneighbors_graph

def kneighbors_graph(
    X: ArrayLike,
    n_neighbors: int,
    mode: str = "connectivity",
    metric: str | callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    include_self: bool = False,
    n_jobs: int | None = None
) -> csr_matrix: ...
```

### radius_neighbors_graph

Compute the (weighted) graph of neighbors within a fixed radius for the points in X.

```python { .api }
from sklearn.neighbors import radius_neighbors_graph

def radius_neighbors_graph(
    X: ArrayLike,
    radius: float,
    mode: str = "connectivity",
    metric: str | callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    include_self: bool = False,
    n_jobs: int | None = None
) -> csr_matrix: ...
```

## Utility Functions

### sort_graph_by_row_values

Sort a sparse graph so that each row is stored with increasing values. The input graph is modified in place unless `copy=True`.

```python { .api }
from sklearn.neighbors import sort_graph_by_row_values

def sort_graph_by_row_values(
    graph: csr_matrix,
    copy: bool = False,
    warn_when_not_sorted: bool = True
) -> csr_matrix: ...
```

## Usage Examples

### Classification

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data and hold out a test set (random_state makes the split reproducible)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Train k-NN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict labels and compute mean accuracy on the test set
y_pred = knn.predict(X_test)
accuracy = knn.score(X_test, y_test)
```

### Regression

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load data and hold out a test set
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, random_state=0
)

# Train k-NN regressor; weights='distance' gives closer neighbors more influence
knn_reg = KNeighborsRegressor(n_neighbors=5, weights='distance')
knn_reg.fit(X_train, y_train)

# Predict targets and compute R^2 on the test set
y_pred = knn_reg.predict(X_test)
score = knn_reg.score(X_test, y_test)
```

### Outlier Detection

```python
from sklearn.neighbors import LocalOutlierFactor
import numpy as np

# Generate sample data: 100 Gaussian points plus 20 uniformly scattered points
np.random.seed(42)
X = np.random.randn(100, 2)
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[X, X_outliers]

# Detect outliers: fit_predict returns -1 for outliers and 1 for inliers
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
y_pred = lof.fit_predict(X)

# Negated LOF scores: the lower the value, the more abnormal the sample
outlier_scores = lof.negative_outlier_factor_
```
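
### Nearest Neighbor Search

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

# Split the data into an indexed set and a query set
X_train, X_test = train_test_split(load_iris().data, random_state=0)

# Fit nearest neighbors
nbrs = NearestNeighbors(n_neighbors=5, algorithm='ball_tree')
nbrs.fit(X_train)

# Find the 5 nearest indexed points for each query point
distances, indices = nbrs.kneighbors(X_test)
```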

### Graph Construction

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

# Sample data: 100 points in 2D
rng = np.random.RandomState(42)
X = rng.randn(100, 2)

# Create k-neighbors graph (sparse matrix with 1 for each neighbor link)
knn_graph = kneighbors_graph(X, n_neighbors=5, mode='connectivity')

# Create radius neighbors graph (sparse matrix holding neighbor distances)
radius_graph = radius_neighbors_graph(X, radius=1.0, mode='distance')
```

### Dimensionality Reduction with NCA

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.preprocessing import StandardScaler

# Load labeled data (NCA is supervised and needs class labels)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)

# Apply NCA
nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=42)
X_nca = nca.fit_transform(X_scaled, y_train)

# Use with k-NN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_nca, y_train)
```

## Constants

```python { .api }
from sklearn.neighbors import VALID_METRICS, VALID_METRICS_SPARSE

VALID_METRICS: dict  # Maps each algorithm name to its valid metrics for dense input
VALID_METRICS_SPARSE: dict  # Maps each algorithm name to its valid metrics for sparse input
```
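
These dictionaries can be used to check a metric before building an estimator. The sketch below assumes the keys are the algorithm names `"ball_tree"`, `"kd_tree"`, and `"brute"`; the `supports` helper is hypothetical and not part of scikit-learn.

```python
from sklearn.neighbors import VALID_METRICS, VALID_METRICS_SPARSE

# Algorithm names for which metric lists are available.
algorithms = sorted(VALID_METRICS)


def supports(metric, algorithm, sparse=False):
    """Hypothetical helper: is `metric` listed for `algorithm`?"""
    table = VALID_METRICS_SPARSE if sparse else VALID_METRICS
    return metric in table[algorithm]


# Cosine distance is listed for brute-force search but not for KD-trees.
cosine_brute = supports("cosine", "brute")
cosine_kd_tree = supports("cosine", "kd_tree")

# Tree-based algorithms list no metrics for sparse input.
euclidean_sparse_kd_tree = supports("euclidean", "kd_tree", sparse=True)
```

The exact contents of each list can vary between scikit-learn versions, so a lookup like this is safer than hard-coding metric names.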