Tessl Tile for pypi/scikit-learn-intelex@2024.7.0

or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

advanced.md clustering.md daal4py-mb.md decomposition.md ensemble.md index.md linear-models.md metrics-model-selection.md neighbors.md patching-config.md stats-manifold.md svm.md

neighbors.mddocs/

0
# Nearest Neighbors
1

2
High-performance implementations of k-nearest neighbors algorithms with Intel hardware acceleration. These algorithms provide significant speedups for classification, regression, and unsupervised neighbor searches on large datasets.
3

4
## Capabilities
5

6
### K-Nearest Neighbors Classifier
7

8
Intel-accelerated k-nearest neighbors classifier with optimized distance calculations and neighbor search algorithms.
9

10
```python { .api }
11
class KNeighborsClassifier:
12
    """
13
    K-nearest neighbors classifier with Intel optimization.
14
    
15
    Provides significant speedup over standard scikit-learn implementation
16
    through vectorized distance computations and Intel hardware acceleration.
17
    """
18
    
19
    def __init__(
20
        self,
21
        n_neighbors=5,
22
        weights='uniform',
23
        algorithm='auto',
24
        leaf_size=30,
25
        p=2,
26
        metric='minkowski',
27
        metric_params=None,
28
        n_jobs=None
29
    ):
30
        """
31
        Initialize k-nearest neighbors classifier.
32
        
33
        Parameters:
34
            n_neighbors (int): Number of neighbors to use
35
            weights (str): Weight function ('uniform', 'distance')
36
            algorithm (str): Algorithm to compute nearest neighbors
37
            leaf_size (int): Leaf size for tree algorithms
38
            p (int): Power parameter for Minkowski metric
39
            metric (str): Distance metric to use
40
            metric_params (dict): Additional parameters for distance metric
41
            n_jobs (int): Number of parallel jobs
42
        """
43
    
44
    def fit(self, X, y):
45
        """
46
        Fit the k-nearest neighbors classifier.
47
        
48
        Parameters:
49
            X (array-like): Training data of shape (n_samples, n_features)
50
            y (array-like): Target values of shape (n_samples,)
51
            
52
        Returns:
53
            self: Fitted estimator
54
        """
55
    
56
    def predict(self, X):
57
        """
58
        Predict class labels for samples.
59
        
60
        Parameters:
61
            X (array-like): Test samples of shape (n_samples, n_features)
62
            
63
        Returns:
64
            array: Predicted class labels
65
        """
66
    
67
    def predict_proba(self, X):
68
        """
69
        Return probability estimates for test data.
70
        
71
        Parameters:
72
            X (array-like): Test samples
73
            
74
        Returns:
75
            array: Class probabilities of shape (n_samples, n_classes)
76
        """
77
    
78
    def kneighbors(self, X=None, n_neighbors=None, return_distance=True):
79
        """
80
        Find k-neighbors of a point.
81
        
82
        Parameters:
83
            X (array-like): Query points, or None for training data
84
            n_neighbors (int): Number of neighbors to get
85
            return_distance (bool): Whether to return distances
86
            
87
        Returns:
88
            tuple: (distances, indices) or just indices
89
        """
90
    
91
    def score(self, X, y, sample_weight=None):
92
        """
93
        Return mean accuracy on given test data and labels.
94
        
95
        Parameters:
96
            X (array-like): Test samples
97
            y (array-like): True labels
98
            sample_weight (array-like): Sample weights
99
            
100
        Returns:
101
            float: Mean accuracy score
102
        """
103
    
104
    # Attributes available after fitting
105
    classes_: ...           # Unique class labels
106
    effective_metric_: ...  # Effective distance metric used
107
    effective_metric_params_: ... # Effective additional metric parameters
108
    n_features_in_: ...     # Number of features in training data
109
    n_samples_fit_: ...     # Number of samples in training data
110
```
111

112
### K-Nearest Neighbors Regressor
113

114
Intel-accelerated k-nearest neighbors regressor for continuous target prediction.
115

116
```python { .api }
117
class KNeighborsRegressor:
118
    """
119
    K-nearest neighbors regressor with Intel optimization.
120
    
121
    Efficient regression based on k-nearest neighbors with optimized
122
    distance calculations and neighbor averaging.
123
    """
124
    
125
    def __init__(
126
        self,
127
        n_neighbors=5,
128
        weights='uniform',
129
        algorithm='auto',
130
        leaf_size=30,
131
        p=2,
132
        metric='minkowski',
133
        metric_params=None,
134
        n_jobs=None
135
    ):
136
        """
137
        Initialize k-nearest neighbors regressor.
138
        
139
        Parameters:
140
            n_neighbors (int): Number of neighbors to use
141
            weights (str): Weight function ('uniform', 'distance')
142
            algorithm (str): Algorithm to compute nearest neighbors
143
            leaf_size (int): Leaf size for tree algorithms
144
            p (int): Power parameter for Minkowski metric
145
            metric (str): Distance metric to use
146
            metric_params (dict): Additional parameters for distance metric
147
            n_jobs (int): Number of parallel jobs
148
        """
149
    
150
    def fit(self, X, y):
151
        """
152
        Fit the k-nearest neighbors regressor.
153
        
154
        Parameters:
155
            X (array-like): Training data of shape (n_samples, n_features)
156
            y (array-like): Target values of shape (n_samples,) or (n_samples, n_outputs)
157
            
158
        Returns:
159
            self: Fitted estimator
160
        """
161
    
162
    def predict(self, X):
163
        """
164
        Predict target values for samples.
165
        
166
        Parameters:
167
            X (array-like): Test samples of shape (n_samples, n_features)
168
            
169
        Returns:
170
            array: Predicted target values
171
        """
172
    
173
    def kneighbors(self, X=None, n_neighbors=None, return_distance=True):
174
        """
175
        Find k-neighbors of a point.
176
        
177
        Parameters:
178
            X (array-like): Query points, or None for training data
179
            n_neighbors (int): Number of neighbors to get
180
            return_distance (bool): Whether to return distances
181
            
182
        Returns:
183
            tuple: (distances, indices) or just indices
184
        """
185
    
186
    def score(self, X, y, sample_weight=None):
187
        """
188
        Return coefficient of determination R^2 of prediction.
189
        
190
        Parameters:
191
            X (array-like): Test samples
192
            y (array-like): True values
193
            sample_weight (array-like): Sample weights
194
            
195
        Returns:
196
            float: R^2 score
197
        """
198
    
199
    # Attributes available after fitting
200
    effective_metric_: ...  # Effective distance metric used
201
    effective_metric_params_: ... # Effective additional metric parameters
202
    n_features_in_: ...     # Number of features in training data
203
    n_samples_fit_: ...     # Number of samples in training data
204
```
205

206
### Nearest Neighbors (Unsupervised)
207

208
Intel-accelerated unsupervised nearest neighbors for neighbor queries without target labels.
209

210
```python { .api }
211
class NearestNeighbors:
212
    """
213
    Unsupervised nearest neighbors with Intel optimization.
214
    
215
    Efficient neighbor search algorithm for finding nearest neighbors
216
    without requiring target labels.
217
    """
218
    
219
    def __init__(
220
        self,
221
        n_neighbors=5,
222
        radius=1.0,
223
        algorithm='auto',
224
        leaf_size=30,
225
        metric='minkowski',
226
        p=2,
227
        metric_params=None,
228
        n_jobs=None
229
    ):
230
        """
231
        Initialize nearest neighbors searcher.
232
        
233
        Parameters:
234
            n_neighbors (int): Number of neighbors to use
235
            radius (float): Range of parameter space for radius neighbors
236
            algorithm (str): Algorithm to compute nearest neighbors
237
            leaf_size (int): Leaf size for tree algorithms
238
            metric (str): Distance metric to use
239
            p (int): Power parameter for Minkowski metric
240
            metric_params (dict): Additional parameters for distance metric
241
            n_jobs (int): Number of parallel jobs
242
        """
243
    
244
    def fit(self, X, y=None):
245
        """
246
        Fit the nearest neighbors estimator.
247
        
248
        Parameters:
249
            X (array-like): Training data of shape (n_samples, n_features)
250
            y: Ignored, present for API consistency
251
            
252
        Returns:
253
            self: Fitted estimator
254
        """
255
    
256
    def kneighbors(self, X=None, n_neighbors=None, return_distance=True):
257
        """
258
        Find k-neighbors of a point.
259
        
260
        Parameters:
261
            X (array-like): Query points, or None for training data
262
            n_neighbors (int): Number of neighbors to get
263
            return_distance (bool): Whether to return distances
264
            
265
        Returns:
266
            tuple: (distances, indices) or just indices
267
        """
268
    
269
    def radius_neighbors(self, X=None, radius=None, return_distance=True, sort_results=False):
270
        """
271
        Find neighbors within given radius.
272
        
273
        Parameters:
274
            X (array-like): Query points, or None for training data
275
            radius (float): Limiting distance of neighbors to return
276
            return_distance (bool): Whether to return distances
277
            sort_results (bool): Whether to sort results by distance
278
            
279
        Returns:
280
            tuple: (distances, indices) arrays of neighbors
281
        """
282
    
283
    def kneighbors_graph(self, X=None, n_neighbors=None, mode='connectivity'):
284
        """
285
        Compute k-neighbors connectivity graph.
286
        
287
        Parameters:
288
            X (array-like): Query points, or None for training data
289
            n_neighbors (int): Number of neighbors for each sample
290
            mode (str): Type of returned matrix ('connectivity', 'distance')
291
            
292
        Returns:
293
            sparse matrix: Graph representing k-neighbors connectivity
294
        """
295
    
296
    def radius_neighbors_graph(self, X=None, radius=None, mode='connectivity', sort_results=False):
297
        """
298
        Compute radius-based neighbors connectivity graph.
299
        
300
        Parameters:
301
            X (array-like): Query points, or None for training data
302
            radius (float): Radius of neighborhoods
303
            mode (str): Type of returned matrix ('connectivity', 'distance')
304
            sort_results (bool): Whether to sort results by distance
305
            
306
        Returns:
307
            sparse matrix: Graph representing radius neighbors connectivity
308
        """
309
    
310
    # Attributes available after fitting
311
    effective_metric_: ...  # Effective distance metric used
312
    effective_metric_params_: ... # Effective additional metric parameters
313
    n_features_in_: ...     # Number of features in training data
314
    n_samples_fit_: ...     # Number of samples in training data
315
```
316

317
### Local Outlier Factor
318

319
Intel-accelerated Local Outlier Factor for unsupervised outlier detection.
320

321
```python { .api }
322
class LocalOutlierFactor:
323
    """
324
    Local Outlier Factor with Intel optimization.
325
    
326
    Efficient unsupervised outlier detection using local density
327
    deviation of given data point with respect to its neighbors.
328
    """
329
    
330
    def __init__(
331
        self,
332
        n_neighbors=20,
333
        algorithm='auto',
334
        leaf_size=30,
335
        metric='minkowski',
336
        p=2,
337
        metric_params=None,
338
        contamination='auto',
339
        novelty=False,
340
        n_jobs=None
341
    ):
342
        """
343
        Initialize Local Outlier Factor.
344
        
345
        Parameters:
346
            n_neighbors (int): Number of neighbors to use
347
            algorithm (str): Algorithm to compute nearest neighbors
348
            leaf_size (int): Leaf size for tree algorithms
349
            metric (str): Distance metric to use
350
            p (int): Power parameter for Minkowski metric
351
            metric_params (dict): Additional parameters for distance metric
352
            contamination (str or float): Proportion of outliers in data set
353
            novelty (bool): Whether to use LOF for novelty detection
354
            n_jobs (int): Number of parallel jobs
355
        """
356
    
357
    def fit(self, X, y=None):
358
        """
359
        Fit the Local Outlier Factor detector.
360
        
361
        Parameters:
362
            X (array-like): Training data of shape (n_samples, n_features)
363
            y: Ignored, present for API consistency
364
            
365
        Returns:
366
            self: Fitted estimator
367
        """
368
    
369
    def fit_predict(self, X, y=None):
370
        """
371
        Fit detector and return binary labels (1 for inliers, -1 for outliers).
372
        
373
        Parameters:
374
            X (array-like): Training data
375
            y: Ignored
376
            
377
        Returns:
378
            array: Binary labels for each sample
379
        """
380
    
381
    def decision_function(self, X):
382
        """
383
        Shifted opposite of Local Outlier Factor of X.
384
        
385
        Parameters:
386
            X (array-like): Query samples
387
            
388
        Returns:
389
            array: Shifted opposite of LOF scores
390
        """
391
    
392
    def score_samples(self, X):
393
        """
394
        Opposite of Local Outlier Factor of X.
395
        
396
        Parameters:
397
            X (array-like): Query samples
398
            
399
        Returns:
400
            array: Opposite of LOF scores
401
        """
402
    
403
    def predict(self, X):
404
        """
405
        Predict raw anomaly score of X using fitted detector.
406
        
407
        Only available when novelty=True.
408
        
409
        Parameters:
410
            X (array-like): Query samples
411
            
412
        Returns:
413
            array: Anomaly scores for each sample
414
        """
415
    
416
    def kneighbors(self, X=None, n_neighbors=None, return_distance=True):
417
        """
418
        Find k-neighbors of a point.
419
        
420
        Parameters:
421
            X (array-like): Query points, or None for training data
422
            n_neighbors (int): Number of neighbors to get
423
            return_distance (bool): Whether to return distances
424
            
425
        Returns:
426
            tuple: (distances, indices) or just indices
427
        """
428
    
429
    # Attributes available after fitting
430
    negative_outlier_factor_: ... # Opposite of LOF of training samples
431
    n_neighbors_: ...             # Actual number of neighbors used
432
    offset_: ...                  # Offset used to obtain binary labels
433
    effective_metric_: ...        # Effective distance metric used
434
    effective_metric_params_: ... # Effective additional metric parameters
435
    n_features_in_: ...          # Number of features in training data
436
    n_samples_fit_: ...          # Number of samples in training data
437
```
438

439
## Usage Examples
440

441
### K-Nearest Neighbors Classification
442

443
```python
444
import numpy as np
445
from sklearnex.neighbors import KNeighborsClassifier
446
from sklearn.datasets import make_classification
447
from sklearn.model_selection import train_test_split
448

449
# Generate sample data
450
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
451
                          n_redundant=10, n_classes=3, random_state=42)
452
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
453

454
# Create and fit KNN classifier
455
knn = KNeighborsClassifier(n_neighbors=5, weights='distance')
456
knn.fit(X_train, y_train)
457

458
# Make predictions
459
y_pred = knn.predict(X_test)
460
y_proba = knn.predict_proba(X_test)
461

462
print(f"Accuracy: {knn.score(X_test, y_test):.3f}")
463
print(f"Classes: {knn.classes_}")
464
print(f"Prediction shape: {y_pred.shape}")
465
print(f"Probability shape: {y_proba.shape}")
466

467
# Find neighbors for specific points
468
distances, indices = knn.kneighbors(X_test[:5], n_neighbors=3)
469
print(f"Neighbor distances shape: {distances.shape}")
470
print(f"Neighbor indices shape: {indices.shape}")
471
```
472

473
### K-Nearest Neighbors Regression
474

475
```python
476
import numpy as np
477
from sklearnex.neighbors import KNeighborsRegressor
478
from sklearn.datasets import make_regression
479
from sklearn.model_selection import train_test_split
480
from sklearn.metrics import mean_squared_error, r2_score
481

482
# Generate sample data
483
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
484
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
485

486
# Create and fit KNN regressor
487
knn_reg = KNeighborsRegressor(n_neighbors=5, weights='distance')
488
knn_reg.fit(X_train, y_train)
489

490
# Make predictions
491
y_pred = knn_reg.predict(X_test)
492

493
print(f"R² Score: {knn_reg.score(X_test, y_test):.3f}")
494
print(f"MSE: {mean_squared_error(y_test, y_pred):.3f}")
495
print(f"Features in training: {knn_reg.n_features_in_}")
496
print(f"Samples in training: {knn_reg.n_samples_fit_}")
497

498
# Multi-output regression example
499
y_multi = np.column_stack([y, y * 0.5 + np.random.normal(0, 0.1, len(y))])
500
X_train, X_test, y_train_multi, y_test_multi = train_test_split(X, y_multi, test_size=0.2, random_state=42)
501

502
knn_multi = KNeighborsRegressor(n_neighbors=3)
503
knn_multi.fit(X_train, y_train_multi)
504
y_pred_multi = knn_multi.predict(X_test)
505

506
print(f"Multi-output prediction shape: {y_pred_multi.shape}")
507
```
508

509
### Unsupervised Nearest Neighbors
510

511
```python
512
import numpy as np
513
from sklearnex.neighbors import NearestNeighbors
514
from sklearn.datasets import make_blobs
515

516
# Generate sample data
517
X, _ = make_blobs(n_samples=500, centers=5, n_features=10, random_state=42)
518

519
# Create and fit nearest neighbors model
520
nn = NearestNeighbors(n_neighbors=10, metric='euclidean')
521
nn.fit(X)
522

523
# Find k-nearest neighbors
524
distances, indices = nn.kneighbors(X[:5])
525
print(f"Distances shape: {distances.shape}")
526
print(f"Indices shape: {indices.shape}")
527

528
# Find radius neighbors
529
radius_distances, radius_indices = nn.radius_neighbors(X[:3], radius=2.0)
530
print(f"Radius neighbors found for first 3 points:")
531
for i, (dists, idxs) in enumerate(zip(radius_distances, radius_indices)):
532
    print(f"  Point {i}: {len(idxs)} neighbors within radius 2.0")
533

534
# Create connectivity graphs
535
knn_graph = nn.kneighbors_graph(X[:10], n_neighbors=3, mode='connectivity')
536
print(f"KNN graph shape: {knn_graph.shape}")
537
print(f"KNN graph density: {knn_graph.nnz / (knn_graph.shape[0] * knn_graph.shape[1]):.3f}")
538

539
radius_graph = nn.radius_neighbors_graph(X[:10], radius=1.5, mode='distance')
540
print(f"Radius graph shape: {radius_graph.shape}")
541
print(f"Radius graph density: {radius_graph.nnz / (radius_graph.shape[0] * radius_graph.shape[1]):.3f}")
542
```
543

544
### Local Outlier Factor for Outlier Detection
545

546
```python
547
import numpy as np
548
from sklearnex.neighbors import LocalOutlierFactor
549
from sklearn.datasets import make_blobs
550

551
# Generate normal data with some outliers
552
X_inliers, _ = make_blobs(n_samples=200, centers=1, cluster_std=0.5, random_state=42)
553
X_outliers = np.random.uniform(low=-6, high=6, size=(20, 2))
554
X = np.concatenate([X_inliers, X_outliers])
555

556
# Outlier detection (unsupervised)
557
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
558
y_pred = lof.fit_predict(X)
559

560
# Count inliers and outliers
561
n_outliers = np.sum(y_pred == -1)
562
n_inliers = np.sum(y_pred == 1)
563

564
print(f"Outliers detected: {n_outliers}")
565
print(f"Inliers detected: {n_inliers}")
566
print(f"Outlier factor shape: {lof.negative_outlier_factor_.shape}")
567

568
# Get outlier scores
569
outlier_scores = lof.negative_outlier_factor_
570
print(f"Score range: [{outlier_scores.min():.3f}, {outlier_scores.max():.3f}]")
571

572
# Novelty detection example
573
X_train = X_inliers  # Train only on inliers
574
X_test = np.concatenate([X_inliers[:50], X_outliers[:10]])  # Test on mix
575

576
lof_novelty = LocalOutlierFactor(n_neighbors=20, novelty=True, contamination=0.1)
577
lof_novelty.fit(X_train)
578

579
# Predict on new data
580
y_pred_novelty = lof_novelty.predict(X_test)
581
decision_scores = lof_novelty.decision_function(X_test)
582

583
print(f"Novelty detection - Outliers: {np.sum(y_pred_novelty == -1)}")
584
print(f"Decision scores range: [{decision_scores.min():.3f}, {decision_scores.max():.3f}]")
585
```
586

587
### Performance Comparison
588

589
```python
590
import time
591
import numpy as np
592
from sklearn.datasets import make_classification
593

594
# Generate large dataset
595
X, y = make_classification(n_samples=50000, n_features=20, n_informative=15,
596
                          n_redundant=5, n_classes=5, random_state=42)
597
X_train, X_test = X[:40000], X[40000:]
598
y_train, y_test = y[:40000], y[40000:]
599

600
# Intel-optimized version
601
from sklearnex.neighbors import KNeighborsClassifier as IntelKNN
602

603
start_time = time.time()
604
intel_knn = IntelKNN(n_neighbors=5)
605
intel_knn.fit(X_train, y_train)
606
intel_pred = intel_knn.predict(X_test)
607
intel_time = time.time() - start_time
608

609
print(f"Intel KNN time: {intel_time:.2f} seconds")
610
print(f"Intel KNN accuracy: {intel_knn.score(X_test, y_test):.3f}")
611

612
# Standard scikit-learn version (for comparison)
613
from sklearn.neighbors import KNeighborsClassifier as StandardKNN
614

615
start_time = time.time()
616
standard_knn = StandardKNN(n_neighbors=5)
617
standard_knn.fit(X_train, y_train)
618
standard_pred = standard_knn.predict(X_test)
619
standard_time = time.time() - start_time
620

621
print(f"Standard KNN time: {standard_time:.2f} seconds")
622
print(f"Standard KNN accuracy: {standard_knn.score(X_test, y_test):.3f}")
623
print(f"Speedup: {standard_time / intel_time:.1f}x")
624

625
# Verify results are identical
626
print(f"Predictions identical: {np.allclose(intel_pred, standard_pred)}")
627
```
628

629
## Performance Notes
630

631
- K-nearest neighbors algorithms show significant speedups on datasets with >5000 samples
632
- Distance calculations benefit most from Intel optimization, especially with high-dimensional data
633
- LOF shows substantial improvements on datasets with >10000 samples
634
- Memory usage is comparable to standard scikit-learn versions
635
- All algorithms maintain identical results to scikit-learn implementations

Version

Tile

Files

neighbors.md.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}docs/

neighbors.mddocs/