# Unsupervised Learning

This document covers scikit-learn's unsupervised learning APIs, including clustering, dimensionality reduction, manifold learning, mixture models, and covariance estimation.

## Clustering

### Core Clustering Algorithms

#### KMeans { .api }
```python
from sklearn.cluster import KMeans

KMeans(
    n_clusters: int = 8,
    init: str | ArrayLike | Callable = "k-means++",
    n_init: int | str = "auto",
    max_iter: int = 300,
    tol: float = 0.0001,
    verbose: int = 0,
    random_state: int | RandomState | None = None,
    copy_x: bool = True,
    algorithm: str = "lloyd"
)
```
K-Means clustering.
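
A minimal usage sketch for `KMeans`, assuming two synthetic, well-separated blobs (the data and variable names are illustrative and not part of the API). `n_init` is passed explicitly so the result does not depend on the installed version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: two tight blobs centered at (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_            # cluster index of each training sample
centers = kmeans.cluster_centers_  # shape (n_clusters, n_features)

# Assign previously unseen points to the nearest learned centroid.
new_labels = kmeans.predict(np.array([[0.1, -0.2], [4.9, 5.2]]))
```

The numeric label given to each blob is arbitrary, so downstream code should compare labels to each other rather than to fixed values.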

#### MiniBatchKMeans { .api }
```python
from sklearn.cluster import MiniBatchKMeans

MiniBatchKMeans(
    n_clusters: int = 8,
    init: str | ArrayLike | Callable = "k-means++",
    max_iter: int = 100,
    batch_size: int = 1024,
    verbose: int = 0,
    compute_labels: bool = True,
    random_state: int | RandomState | None = None,
    tol: float = 0.0,
    max_no_improvement: int = 10,
    init_size: int | None = None,
    n_init: int | str = 3,
    reassignment_ratio: float = 0.01
)
```
Mini-Batch K-Means clustering.

#### BisectingKMeans { .api }
```python
from sklearn.cluster import BisectingKMeans

BisectingKMeans(
    n_clusters: int = 8,
    init: str | Callable = "random",
    n_init: int = 1,
    random_state: int | RandomState | None = None,
    max_iter: int = 300,
    verbose: int = 0,
    tol: float = 0.0001,
    copy_x: bool = True,
    algorithm: str = "lloyd",
    bisecting_strategy: str = "biggest_inertia"
)
```
Bisecting K-Means clustering.

#### DBSCAN { .api }
```python
from sklearn.cluster import DBSCAN

DBSCAN(
    eps: float = 0.5,
    min_samples: int = 5,
    metric: str | Callable = "euclidean",
    metric_params: dict | None = None,
    algorithm: str = "auto",
    leaf_size: int = 30,
    p: float | None = None,
    n_jobs: int | None = None
)
```
Perform DBSCAN clustering from vector array or distance matrix.
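
A short sketch of how `DBSCAN` separates dense regions from noise, assuming synthetic data with two dense groups and one far-away point (all data here is illustrative). Samples that belong to no cluster receive the label `-1`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative data: two dense groups plus a single isolated point.
rng = np.random.default_rng(0)
dense_a = rng.normal(loc=0.0, scale=0.1, size=(30, 2))
dense_b = rng.normal(loc=3.0, scale=0.1, size=(30, 2))
outlier = np.array([[10.0, 10.0]])
X = np.vstack([dense_a, dense_b, outlier])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)
labels = db.labels_

# Noise is labeled -1, so it is excluded from the cluster count.
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
```

In practice `eps` and `min_samples` depend on the scale and density of the data, so the values above are only reasonable for this toy input.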

#### HDBSCAN { .api }
```python
from sklearn.cluster import HDBSCAN

HDBSCAN(
    min_cluster_size: int = 5,
    min_samples: int | None = None,
    cluster_selection_epsilon: float = 0.0,
    max_cluster_size: int | None = None,
    metric: str | Callable = "euclidean",
    metric_params: dict | None = None,
    alpha: float = 1.0,
    algorithm: str = "auto",
    leaf_size: int = 40,
    n_jobs: int | None = None,
    cluster_selection_method: str = "eom",
    allow_single_cluster: bool = False,
    store_centers: str | None = None,
    copy: bool = True
)
```
Perform HDBSCAN clustering from vector array or distance matrix.

#### OPTICS { .api }
```python
from sklearn.cluster import OPTICS

OPTICS(
    min_samples: int = 5,
    max_eps: float = ...,
    metric: str | Callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    cluster_method: str = "xi",
    eps: float | None = None,
    xi: float = 0.05,
    predecessor_correction: bool = True,
    min_cluster_size: int | float | None = None,
    algorithm: str = "auto",
    leaf_size: int = 30,
    memory: str | object | None = None,
    n_jobs: int | None = None
)
```
Estimate clustering structure from vector array.

#### MeanShift { .api }
```python
from sklearn.cluster import MeanShift

MeanShift(
    bandwidth: float | None = None,
    seeds: ArrayLike | None = None,
    bin_seeding: bool = False,
    min_bin_freq: int = 1,
    cluster_all: bool = True,
    n_jobs: int | None = None,
    max_iter: int = 300
)
```
Mean shift clustering using a flat kernel.

#### AgglomerativeClustering { .api }
```python
from sklearn.cluster import AgglomerativeClustering

AgglomerativeClustering(
    n_clusters: int | None = 2,
    metric: str | Callable | None = None,
    memory: str | object | None = None,
    connectivity: ArrayLike | Callable | None = None,
    compute_full_tree: bool | str = "auto",
    linkage: str = "ward",
    distance_threshold: float | None = None,
    compute_distances: bool = False
)
```
Agglomerative Clustering.
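
The interplay between `n_clusters` and `distance_threshold` is easy to miss: exactly one of them should be set. The sketch below, on a tiny hand-made dataset (illustrative only), lets the threshold decide how many clusters remain.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Illustrative data: two tight groups of three points each.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])

# With n_clusters=None, merging stops once the linkage distance
# reaches distance_threshold.
model = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=5.0,
    linkage="ward",
).fit(X)

n_found = model.n_clusters_  # number of clusters implied by the threshold
labels = model.labels_
```

Passing a fixed `n_clusters` instead (and leaving `distance_threshold=None`) is the usual choice when the number of groups is known in advance.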

#### FeatureAgglomeration { .api }
```python
from sklearn.cluster import FeatureAgglomeration

FeatureAgglomeration(
    n_clusters: int | None = 2,
    metric: str | Callable | None = None,
    memory: str | object | None = None,
    connectivity: ArrayLike | Callable | None = None,
    compute_full_tree: bool | str = "auto",
    linkage: str = "ward",
    pooling_func: Callable = ...,
    distance_threshold: float | None = None,
    compute_distances: bool = False
)
```
Agglomerate features.

#### Birch { .api }
```python
from sklearn.cluster import Birch

Birch(
    n_clusters: int | None = 3,
    threshold: float = 0.5,
    branching_factor: int = 50,
    compute_labels: bool = True,
    copy: bool = True
)
```
Implements the BIRCH clustering algorithm.

#### AffinityPropagation { .api }
```python
from sklearn.cluster import AffinityPropagation

AffinityPropagation(
    damping: float = 0.5,
    max_iter: int = 200,
    convergence_iter: int = 15,
    copy: bool = True,
    preference: ArrayLike | float | None = None,
    affinity: str = "euclidean",
    verbose: bool = False,
    random_state: int | RandomState | None = None
)
```
Perform Affinity Propagation Clustering of data.

#### SpectralClustering { .api }
```python
from sklearn.cluster import SpectralClustering

SpectralClustering(
    n_clusters: int = 8,
    eigen_solver: str | None = None,
    n_components: int | None = None,
    random_state: int | RandomState | None = None,
    n_init: int = 10,
    gamma: float = 1.0,
    affinity: str | Callable = "rbf",
    n_neighbors: int = 10,
    eigen_tol: float | str = "auto",
    assign_labels: str = "kmeans",
    degree: float = 3,
    coef0: float = 1,
    kernel_params: dict | None = None,
    n_jobs: int | None = None,
    verbose: bool = False
)
```
Apply clustering to a projection of the normalized Laplacian.

#### SpectralBiclustering { .api }
```python
from sklearn.cluster import SpectralBiclustering

SpectralBiclustering(
    n_clusters: int | tuple = 3,
    method: str = "bistochastic",
    n_components: int = 6,
    n_best: int = 3,
    svd_method: str = "randomized",
    n_svd_vecs: int | None = None,
    mini_batch: bool = False,
    init: str | ArrayLike = "k-means++",
    n_init: int = 10,
    random_state: int | RandomState | None = None
)
```
Spectral biclustering (Kluger, 2003).

#### SpectralCoclustering { .api }
```python
from sklearn.cluster import SpectralCoclustering

SpectralCoclustering(
    n_clusters: int = 3,
    svd_method: str = "randomized",
    n_svd_vecs: int | None = None,
    mini_batch: bool = False,
    init: str | ArrayLike = "k-means++",
    n_init: int = 10,
    random_state: int | RandomState | None = None
)
```
Spectral Co-Clustering algorithm (Dhillon, 2001).

### Clustering Functions

#### k_means { .api }
```python
from sklearn.cluster import k_means

k_means(
    X: ArrayLike,
    n_clusters: int,
    sample_weight: ArrayLike | None = None,
    init: str | ArrayLike | Callable = "k-means++",
    n_init: int | str = 10,
    max_iter: int = 300,
    verbose: bool = False,
    tol: float = 0.0001,
    random_state: int | RandomState | None = None,
    copy_x: bool = True,
    algorithm: str = "lloyd",
    return_n_iter: bool = False
) -> tuple[ArrayLike, ArrayLike, float, int] | tuple[ArrayLike, ArrayLike, float]
```
K-means clustering algorithm.

#### kmeans_plusplus { .api }
```python
from sklearn.cluster import kmeans_plusplus

kmeans_plusplus(
    X: ArrayLike,
    n_clusters: int,
    x_squared_norms: ArrayLike | None = None,
    random_state: int | RandomState | None = None,
    n_local_trials: int | None = None
) -> tuple[ArrayLike, ArrayLike]
```
Init n_clusters seeds according to k-means++.
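
As a brief illustration of the two return values, the sketch below seeds three centers on random synthetic data (illustrative only). The second return value holds the row indices of `X` that were chosen as seeds.

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

# Illustrative data: 200 random 2-D points.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

centers, indices = kmeans_plusplus(X, n_clusters=3, random_state=0)

# Each seed is an actual sample from X, located at the returned index.
seeds_from_indices = X[indices]
```

The returned `centers` can be passed as the `init` array of `KMeans` when custom seeding is wanted.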

#### dbscan { .api }
```python
from sklearn.cluster import dbscan

dbscan(
    X: ArrayLike,
    eps: float = 0.5,
    min_samples: int = 5,
    metric: str | Callable = "euclidean",
    metric_params: dict | None = None,
    algorithm: str = "auto",
    leaf_size: int = 30,
    p: float | None = None,
    sample_weight: ArrayLike | None = None,
    n_jobs: int | None = None
) -> tuple[ArrayLike, ArrayLike]
```
Perform DBSCAN clustering from vector array or distance matrix.

#### affinity_propagation { .api }
```python
from sklearn.cluster import affinity_propagation

affinity_propagation(
    S: ArrayLike,
    preference: ArrayLike | float | None = None,
    convergence_iter: int = 15,
    max_iter: int = 200,
    damping: float = 0.5,
    copy: bool = True,
    verbose: bool = False,
    return_n_iter: bool = False,
    random_state: int | RandomState | None = None
) -> tuple[ArrayLike, ArrayLike, int] | tuple[ArrayLike, ArrayLike]
```
Perform Affinity Propagation Clustering of data.

#### spectral_clustering { .api }
```python
from sklearn.cluster import spectral_clustering

spectral_clustering(
    affinity: ArrayLike,
    n_clusters: int = 8,
    n_components: int | None = None,
    eigen_solver: str | None = None,
    random_state: int | RandomState | None = None,
    n_init: int = 10,
    eigen_tol: float | str = "auto",
    assign_labels: str = "kmeans",
    verbose: bool = False
) -> ArrayLike
```
Apply clustering to a projection of the normalized Laplacian.

#### mean_shift { .api }
```python
from sklearn.cluster import mean_shift

mean_shift(
    X: ArrayLike,
    bandwidth: float | None = None,
    seeds: ArrayLike | None = None,
    bin_seeding: bool = False,
    min_bin_freq: int = 1,
    cluster_all: bool = True,
    max_iter: int = 300,
    n_jobs: int | None = None
) -> tuple[ArrayLike, ArrayLike]
```
Perform mean shift clustering of data using a flat kernel.

#### estimate_bandwidth { .api }
```python
from sklearn.cluster import estimate_bandwidth

estimate_bandwidth(
    X: ArrayLike,
    quantile: float = 0.3,
    n_samples: int | None = None,
    random_state: int | RandomState | None = None,
    n_jobs: int | None = None
) -> float
```
Estimate the bandwidth to use with the mean-shift algorithm.
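
One plausible way to combine `estimate_bandwidth` with `MeanShift` is sketched below, assuming two synthetic blobs (illustrative data). The number of clusters found depends on the estimated bandwidth, so the sketch makes no claim about the exact count.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Illustrative data: two blobs centered at (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(50, 2)),
])

# Derive a kernel bandwidth from pairwise neighbor distances,
# then use it for mean shift clustering.
bandwidth = estimate_bandwidth(X, quantile=0.3)
ms = MeanShift(bandwidth=bandwidth).fit(X)

labels = ms.labels_
centers = ms.cluster_centers_  # one row per detected mode
```

A larger `quantile` yields a larger bandwidth and therefore tends to produce fewer, broader clusters.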

#### ward_tree { .api }
```python
from sklearn.cluster import ward_tree

ward_tree(
    X: ArrayLike,
    connectivity: ArrayLike | None = None,
    n_clusters: int | None = None,
    return_distance: bool = False
) -> tuple[ArrayLike, int, int, ArrayLike, ArrayLike] | tuple[ArrayLike, int, int, ArrayLike]
```
Ward clustering based on a Feature matrix.

#### linkage_tree { .api }
```python
from sklearn.cluster import linkage_tree

linkage_tree(
    X: ArrayLike,
    connectivity: ArrayLike | None = None,
    n_clusters: int | None = None,
    linkage: str = "complete",
    affinity: str = "euclidean",
    return_distance: bool = False
) -> tuple[ArrayLike, int, int, ArrayLike, ArrayLike] | tuple[ArrayLike, int, int, ArrayLike]
```
Linkage agglomerative clustering based on a Feature matrix.

#### get_bin_seeds { .api }
```python
from sklearn.cluster import get_bin_seeds

get_bin_seeds(
    X: ArrayLike,
    bin_size: float,
    min_bin_freq: int = 1
) -> ArrayLike
```
Find seeds for mean_shift.

#### cluster_optics_dbscan { .api }
```python
from sklearn.cluster import cluster_optics_dbscan

cluster_optics_dbscan(
    reachability: ArrayLike,
    core_distances: ArrayLike,
    ordering: ArrayLike,
    eps: float
) -> ArrayLike
```
Performs DBSCAN extraction for an arbitrary epsilon.

#### cluster_optics_xi { .api }
```python
from sklearn.cluster import cluster_optics_xi

cluster_optics_xi(
    reachability: ArrayLike,
    predecessor: ArrayLike,
    ordering: ArrayLike,
    min_samples: int,
    min_cluster_size: int | float | None = None,
    xi: float = 0.05,
    predecessor_correction: bool = True
) -> tuple[ArrayLike, ArrayLike]
```
Automatically extract clusters according to the Xi-steep method.

#### compute_optics_graph { .api }
```python
from sklearn.cluster import compute_optics_graph

compute_optics_graph(
    X: ArrayLike,
    min_samples: int,
    max_eps: float,
    metric: str | Callable,
    p: int,
    metric_params: dict | None,
    algorithm: str,
    leaf_size: int,
    n_jobs: int | None
) -> tuple[ArrayLike, ArrayLike, ArrayLike, ArrayLike]
```
Compute the OPTICS reachability graph. Returns the cluster ordering, core distances, reachability distances, and predecessors.

## Dimensionality Reduction

### Principal Component Analysis

#### PCA { .api }
```python
from sklearn.decomposition import PCA

PCA(
    n_components: int | float | str | None = None,
    copy: bool = True,
    whiten: bool = False,
    svd_solver: str = "auto",
    tol: float = 0.0,
    iterated_power: int | str = "auto",
    n_oversamples: int = 10,
    power_iteration_normalizer: str = "auto",
    random_state: int | RandomState | None = None
)
```
Principal component analysis (PCA).
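
A minimal sketch of fitting `PCA` and inspecting how much variance each component captures, assuming synthetic 3-D data that varies almost entirely along one direction (illustrative only).

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: the second column is roughly twice the first,
# and the third column is small noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2.0 * t + 0.01 * rng.normal(size=200),
    0.01 * rng.normal(size=200),
])

pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)            # shape (200, 2)
ratio = pca.explained_variance_ratio_   # fraction of variance per component

# Map the reduced data back to the original feature space.
X_back = pca.inverse_transform(X_reduced)
```

Passing a float in (0, 1) as `n_components` instead selects however many components are needed to reach that fraction of explained variance.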

#### IncrementalPCA { .api }
```python
from sklearn.decomposition import IncrementalPCA

IncrementalPCA(
    n_components: int | None = None,
    whiten: bool = False,
    copy: bool = True,
    batch_size: int | None = None
)
```
Incremental principal components analysis (IPCA).

#### KernelPCA { .api }
```python
from sklearn.decomposition import KernelPCA

KernelPCA(
    n_components: int | None = None,
    kernel: str | Callable = "linear",
    gamma: float | None = None,
    degree: int = 3,
    coef0: float = 1,
    kernel_params: dict | None = None,
    alpha: float = 1.0,
    fit_inverse_transform: bool = False,
    eigen_solver: str = "auto",
    tol: float = 0,
    max_iter: int | None = None,
    iterated_power: int | str = "auto",
    remove_zero_eig: bool = False,
    random_state: int | RandomState | None = None,
    copy_X: bool = True,
    n_jobs: int | None = None
)
```
Kernel Principal component analysis (KPCA).

#### SparsePCA { .api }
```python
from sklearn.decomposition import SparsePCA

SparsePCA(
    n_components: int | None = None,
    alpha: float = 1,
    ridge_alpha: float = 0.01,
    max_iter: int = 1000,
    tol: float = 1e-08,
    method: str = "lars",
    n_jobs: int | None = None,
    U_init: ArrayLike | None = None,
    V_init: ArrayLike | None = None,
    verbose: bool | int = False,
    random_state: int | RandomState | None = None
)
```
Sparse Principal Components Analysis (SparsePCA).

#### MiniBatchSparsePCA { .api }
```python
from sklearn.decomposition import MiniBatchSparsePCA

MiniBatchSparsePCA(
    n_components: int | None = None,
    alpha: float = 1,
    ridge_alpha: float = 0.01,
    n_iter: int = 100,
    callback: Callable | None = None,
    batch_size: int = 3,
    verbose: bool | int = False,
    shuffle: bool = True,
    n_jobs: int | None = None,
    method: str = "lars",
    random_state: int | RandomState | None = None
)
```
Mini-batch Sparse Principal Components Analysis.

#### TruncatedSVD { .api }
```python
from sklearn.decomposition import TruncatedSVD

TruncatedSVD(
    n_components: int = 2,
    algorithm: str = "randomized",
    n_iter: int = 5,
    n_oversamples: int = 10,
    power_iteration_normalizer: str = "auto",
    random_state: int | RandomState | None = None,
    tol: float = 0.0
)
```
Dimensionality reduction using truncated SVD (aka LSA).
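
`TruncatedSVD` accepts sparse input directly, which is the usual reason to prefer it for term-document matrices. The sketch below assumes a small random sparse matrix built with SciPy (illustrative data, not a real corpus).

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Illustrative data: a 100 x 50 matrix with roughly 10% non-zero entries.
rng = np.random.default_rng(0)
dense = rng.random((100, 50))
dense[dense < 0.9] = 0.0
X = csr_matrix(dense)

svd = TruncatedSVD(n_components=5, random_state=0)
X_reduced = svd.fit_transform(X)   # dense array of shape (100, 5)
components = svd.components_       # shape (5, 50)
```

The input stays sparse during the fit; only the low-dimensional output is dense.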

### Independent Component Analysis

#### FastICA { .api }
```python
from sklearn.decomposition import FastICA

FastICA(
    n_components: int | None = None,
    algorithm: str = "parallel",
    whiten: str | bool = "unit-variance",
    fun: str | Callable = "logcosh",
    fun_args: dict | None = None,
    max_iter: int = 200,
    tol: float = 0.0001,
    w_init: ArrayLike | None = None,
    whiten_solver: str = "svd",
    random_state: int | RandomState | None = None
)
```
FastICA: a fast algorithm for Independent Component Analysis.

### Factor Analysis

#### FactorAnalysis { .api }
```python
from sklearn.decomposition import FactorAnalysis

FactorAnalysis(
    n_components: int | None = None,
    tol: float = 0.01,
    copy: bool = True,
    max_iter: int = 1000,
    noise_variance_init: ArrayLike | None = None,
    svd_method: str = "randomized",
    iterated_power: int = 3,
    rotation: str | None = None,
    random_state: int | RandomState | None = None
)
```
Factor Analysis (FA).

### Dictionary Learning

#### DictionaryLearning { .api }
```python
from sklearn.decomposition import DictionaryLearning

DictionaryLearning(
    n_components: int | None = None,
    alpha: float = 1,
    max_iter: int = 1000,
    tol: float = 1e-08,
    fit_algorithm: str = "lars",
    transform_algorithm: str = "omp",
    transform_n_nonzero_coefs: int | None = None,
    transform_alpha: float | None = None,
    n_jobs: int | None = None,
    code_init: ArrayLike | None = None,
    dict_init: ArrayLike | None = None,
    verbose: bool = False,
    split_sign: bool = False,
    random_state: int | RandomState | None = None,
    positive_code: bool = False,
    positive_dict: bool = False,
    transform_max_iter: int = 1000
)
```
Dictionary learning.

#### MiniBatchDictionaryLearning { .api }
```python
from sklearn.decomposition import MiniBatchDictionaryLearning

MiniBatchDictionaryLearning(
    n_components: int | None = None,
    alpha: float = 1,
    max_iter: int = 1000,
    fit_algorithm: str = "lars",
    n_jobs: int | None = None,
    batch_size: int = 256,
    shuffle: bool = True,
    dict_init: ArrayLike | None = None,
    transform_algorithm: str = "omp",
    transform_n_nonzero_coefs: int | None = None,
    transform_alpha: float | None = None,
    verbose: bool = False,
    split_sign: bool = False,
    random_state: int | RandomState | None = None,
    positive_code: bool = False,
    positive_dict: bool = False,
    transform_max_iter: int = 1000
)
```
Mini-batch dictionary learning.

#### SparseCoder { .api }
```python
from sklearn.decomposition import SparseCoder

SparseCoder(
    dictionary: ArrayLike,
    transform_algorithm: str = "omp",
    transform_n_nonzero_coefs: int | None = None,
    transform_alpha: float | None = None,
    split_sign: bool = False,
    n_jobs: int | None = None,
    positive_code: bool = False,
    transform_max_iter: int = 1000
)
```
Sparse coding.

### Non-negative Matrix Factorization

#### NMF { .api }
```python
from sklearn.decomposition import NMF

NMF(
    n_components: int | None = None,
    init: str | ArrayLike | None = None,
    solver: str = "cd",
    beta_loss: float | str = "frobenius",
    tol: float = 0.0001,
    max_iter: int = 200,
    random_state: int | RandomState | None = None,
    alpha_W: float = 0.0,
    alpha_H: float | str = "same",
    l1_ratio: float = 0.0,
    verbose: int = 0,
    shuffle: bool = False
)
```
Non-negative Matrix Factorization (NMF).
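
A short sketch of factorizing a non-negative matrix `X` into `W @ H`, assuming synthetic data built from a known rank-3 product (the data and the choice of `init="nndsvda"` are illustrative).

```python
import numpy as np
from sklearn.decomposition import NMF

# Illustrative data: a non-negative matrix of exact rank 3.
rng = np.random.default_rng(0)
X = rng.random((20, 3)) @ rng.random((3, 10))

nmf = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)   # sample loadings, shape (20, 3)
H = nmf.components_        # component matrix, shape (3, 10)

# Frobenius norm of the residual X - W @ H.
residual = np.linalg.norm(X - W @ H)
```

Both factors are constrained to be non-negative, which is what makes the components interpretable as additive parts.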

#### MiniBatchNMF { .api }
```python
from sklearn.decomposition import MiniBatchNMF

MiniBatchNMF(
    n_components: int | None = None,
    init: str | ArrayLike | None = None,
    batch_size: int = 1024,
    beta_loss: float | str = "frobenius",
    tol: float = 0.0001,
    max_no_improvement: int = 10,
    max_iter: int = 200,
    alpha_W: float = 0.0,
    alpha_H: float | str = "same",
    l1_ratio: float = 0.0,
    forget_factor: float = 0.7,
    fresh_restarts: bool = False,
    fresh_restarts_max_iter: int = 30,
    transform_max_iter: int | None = None,
    random_state: int | RandomState | None = None,
    verbose: int = 0
)
```
Mini-Batch Non-Negative Matrix Factorization (NMF).

### Latent Dirichlet Allocation

#### LatentDirichletAllocation { .api }
```python
from sklearn.decomposition import LatentDirichletAllocation

LatentDirichletAllocation(
    n_components: int = 10,
    doc_topic_prior: float | None = None,
    topic_word_prior: float | None = None,
    learning_method: str = "batch",
    learning_decay: float = 0.7,
    learning_offset: float = 10.0,
    max_iter: int = 10,
    batch_size: int = 128,
    evaluate_every: int = 0,
    total_samples: int = 1000000.0,
    perp_tol: float = 0.1,
    mean_change_tol: float = 0.001,
    max_doc_update_iter: int = 100,
    n_jobs: int | None = None,
    verbose: int = 0,
    random_state: int | RandomState | None = None
)
```
Latent Dirichlet Allocation with online variational Bayes algorithm.

### Decomposition Functions

#### randomized_svd { .api }
```python
from sklearn.decomposition import randomized_svd

randomized_svd(
    M: ArrayLike,
    n_components: int,
    n_oversamples: int = 10,
    n_iter: int | str = "auto",
    power_iteration_normalizer: str = "auto",
    transpose: bool | str = "auto",
    flip_sign: bool = True,
    random_state: int | RandomState | None = None,
    svd_lapack_driver: str = "gesdd"
) -> tuple[ArrayLike, ArrayLike, ArrayLike]
```
Compute a truncated randomized SVD.

#### fastica { .api }
```python
from sklearn.decomposition import fastica

fastica(
    X: ArrayLike,
    n_components: int | None = None,
    algorithm: str = "parallel",
    whiten: str | bool = "unit-variance",
    fun: str | Callable = "logcosh",
    fun_args: dict | None = None,
    max_iter: int = 200,
    tol: float = 0.0001,
    w_init: ArrayLike | None = None,
    whiten_solver: str = "svd",
    random_state: int | RandomState | None = None,
    return_X_mean: bool = False,
    compute_sources: bool = True,
    return_n_iter: bool = False
) -> tuple[ArrayLike, ArrayLike, ArrayLike] | tuple[ArrayLike, ArrayLike, ArrayLike, int] | tuple[ArrayLike, ArrayLike, ArrayLike, ArrayLike] | tuple[ArrayLike, ArrayLike, ArrayLike, ArrayLike, int]
```
Perform Fast Independent Component Analysis.

#### dict_learning { .api }
```python
from sklearn.decomposition import dict_learning

dict_learning(
    X: ArrayLike,
    n_components: int,
    alpha: float,
    max_iter: int = 100,
    tol: float = 1e-08,
    method: str = "lars",
    n_jobs: int | None = None,
    dict_init: ArrayLike | None = None,
    code_init: ArrayLike | None = None,
    callback: Callable | None = None,
    verbose: bool = False,
    random_state: int | RandomState | None = None,
    return_n_iter: bool = False,
    positive_dict: bool = False,
    positive_code: bool = False,
    method_max_iter: int = 1000
) -> tuple[ArrayLike, ArrayLike, ArrayLike] | tuple[ArrayLike, ArrayLike, ArrayLike, int]
```
Solve a dictionary learning matrix factorization problem.

#### dict_learning_online { .api }
```python
from sklearn.decomposition import dict_learning_online

dict_learning_online(
    X: ArrayLike,
    n_components: int = 2,
    alpha: float = 1,
    max_iter: int = 100,
    return_code: bool = True,
    dict_init: ArrayLike | None = None,
    callback: Callable | None = None,
    batch_size: int = 256,
    verbose: bool = False,
    shuffle: bool = True,
    n_jobs: int | None = None,
    method: str = "lars",
    iter_offset: int = 0,
    random_state: int | RandomState | None = None,
    return_inner_stats: bool = False,
    inner_stats: tuple | None = None,
    return_n_iter: bool = False,
    positive_dict: bool = False,
    positive_code: bool = False,
    method_max_iter: int = 1000
) -> ArrayLike | tuple[ArrayLike, ArrayLike] | tuple[ArrayLike, tuple] | tuple[ArrayLike, ArrayLike, tuple] | tuple[ArrayLike, int] | tuple[ArrayLike, ArrayLike, int] | tuple[ArrayLike, tuple, int] | tuple[ArrayLike, ArrayLike, tuple, int]
```
Solve a dictionary learning matrix factorization problem online.

#### sparse_encode { .api }
```python
from sklearn.decomposition import sparse_encode

sparse_encode(
    X: ArrayLike,
    dictionary: ArrayLike,
    gram: ArrayLike | None = None,
    cov: ArrayLike | None = None,
    algorithm: str = "lasso_lars",
    n_nonzero_coefs: int | None = None,
    alpha: float | None = None,
    copy_cov: bool = True,
    init: ArrayLike | None = None,
    max_iter: int = 1000,
    n_jobs: int | None = None,
    check_input: bool = True,
    verbose: int = 0,
    positive: bool = False
) -> ArrayLike
```
Sparse coding.

#### non_negative_factorization { .api }
```python
from sklearn.decomposition import non_negative_factorization

non_negative_factorization(
    X: ArrayLike,
    W: ArrayLike | None = None,
    H: ArrayLike | None = None,
    n_components: int | None = None,
    init: str | ArrayLike | None = None,
    update_H: bool = True,
    solver: str = "cd",
    beta_loss: float | str = "frobenius",
    tol: float = 0.0001,
    max_iter: int = 200,
    alpha_W: float = 0.0,
    alpha_H: float | str = "same",
    l1_ratio: float = 0.0,
    regularization: str | None = None,
    random_state: int | RandomState | None = None,
    verbose: int = 0,
    shuffle: bool = False
) -> tuple[ArrayLike, ArrayLike, int]
```
Compute Non-negative Matrix Factorization (NMF).

## Manifold Learning

#### Isomap { .api }
```python
from sklearn.manifold import Isomap

Isomap(
    n_neighbors: int = 5,
    radius: float | None = None,
    n_components: int = 2,
    eigen_solver: str = "auto",
    tol: float = 0,
    max_iter: int | None = None,
    path_method: str = "auto",
    neighbors_algorithm: str = "auto",
    n_jobs: int | None = None,
    metric: str | Callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None
)
```
Isomap Embedding.

#### LocallyLinearEmbedding { .api }
```python
from sklearn.manifold import LocallyLinearEmbedding

LocallyLinearEmbedding(
    n_neighbors: int = 5,
    n_components: int = 2,
    reg: float = 0.001,
    eigen_solver: str = "auto",
    tol: float = 1e-06,
    max_iter: int = 100,
    method: str = "standard",
    hessian_tol: float = 0.0001,
    modified_tol: float = 1e-12,
    neighbors_algorithm: str = "auto",
    random_state: int | RandomState | None = None,
    n_jobs: int | None = None
)
```
Locally Linear Embedding.

#### MDS { .api }
```python
from sklearn.manifold import MDS

MDS(
    n_components: int = 2,
    metric: bool = True,
    n_init: int = 4,
    max_iter: int = 300,
    verbose: int = 0,
    eps: float = 0.001,
    n_jobs: int | None = None,
    random_state: int | RandomState | None = None,
    dissimilarity: str = "euclidean",
    normalized_stress: str | bool = "auto"
)
```
Multidimensional scaling.

#### SpectralEmbedding { .api }
```python
from sklearn.manifold import SpectralEmbedding

SpectralEmbedding(
    n_components: int = 2,
    affinity: str | Callable = "nearest_neighbors",
    gamma: float | None = None,
    random_state: int | RandomState | None = None,
    eigen_solver: str | None = None,
    n_neighbors: int | None = None,
    n_jobs: int | None = None
)
```
Spectral embedding for non-linear dimensionality reduction.

#### TSNE { .api }
```python
from sklearn.manifold import TSNE

TSNE(
    n_components: int = 2,
    perplexity: float = 30.0,
    early_exaggeration: float = 12.0,
    learning_rate: float | str = "warn",
    n_iter: int = 1000,
    n_iter_without_progress: int = 300,
    min_grad_norm: float = 1e-07,
    metric: str | Callable = "euclidean",
    metric_params: dict | None = None,
    init: str | ArrayLike = "warn",
    verbose: int = 0,
    random_state: int | RandomState | None = None,
    method: str = "barnes_hut",
    angle: float = 0.5,
    n_jobs: int | None = None,
    square_distances: str | bool = "deprecated"
)
```
t-distributed Stochastic Neighbor Embedding.

### Manifold Learning Functions

#### locally_linear_embedding { .api }
```python
from sklearn.manifold import locally_linear_embedding

locally_linear_embedding(
    X: ArrayLike,
    n_neighbors: int,
    n_components: int,
    reg: float = 0.001,
    eigen_solver: str = "auto",
    tol: float = 1e-06,
    max_iter: int = 100,
    method: str = "standard",
    hessian_tol: float = 0.0001,
    modified_tol: float = 1e-12,
    random_state: int | RandomState | None = None,
    n_jobs: int | None = None
) -> tuple[ArrayLike, float]
```
Perform a Locally Linear Embedding analysis on the data.

#### spectral_embedding { .api }
```python
from sklearn.manifold import spectral_embedding

spectral_embedding(
    adjacency: ArrayLike,
    n_components: int = 8,
    eigen_solver: str | None = None,
    random_state: int | RandomState | None = None,
    eigen_tol: float | str = "auto",
    norm_laplacian: bool = True,
    drop_first: bool = True
) -> ArrayLike
```
Project the sample on the first eigenvectors of the graph Laplacian.

#### smacof { .api }
```python
from sklearn.manifold import smacof

smacof(
    dissimilarities: ArrayLike,
    metric: bool = True,
    n_components: int = 2,
    init: ArrayLike | None = None,
    n_init: int = 8,
    n_jobs: int | None = None,
    max_iter: int = 300,
    verbose: int = 0,
    eps: float = 0.001,
    random_state: int | RandomState | None = None,
    return_n_iter: bool = False,
    normalized_stress: str | bool = "auto"
) -> tuple[ArrayLike, float, int] | tuple[ArrayLike, float]
```
Compute multidimensional scaling using the SMACOF algorithm.

#### trustworthiness { .api }
```python
from sklearn.manifold import trustworthiness

trustworthiness(
    X: ArrayLike,
    X_embedded: ArrayLike,
    n_neighbors: int = 5,
    metric: str | Callable = "euclidean"
) -> float
```
Indicate to what extent the local structure is retained.
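
To make the score concrete, the sketch below evaluates a 2-D PCA projection of random 5-D data and, as a sanity check, an "embedding" that is just the original data (both inputs are illustrative). The score lies in [0, 1], and an embedding that preserves every neighborhood scores 1.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

# Illustrative data: 100 random points in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Score a lossy 2-D projection of the data.
X_embedded = PCA(n_components=2).fit_transform(X)
score = trustworthiness(X, X_embedded, n_neighbors=5)

# Sanity check: comparing the data with itself preserves all neighbors.
score_identity = trustworthiness(X, X, n_neighbors=5)
```

The same call works for any embedding array with one row per sample, for example the output of `TSNE` or `Isomap`.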

## Mixture Models

#### GaussianMixture { .api }
```python
from sklearn.mixture import GaussianMixture

GaussianMixture(
    n_components: int = 1,
    covariance_type: str = "full",
    tol: float = 0.001,
    reg_covar: float = 1e-06,
    max_iter: int = 100,
    n_init: int = 1,
    init_params: str = "kmeans",
    weights_init: ArrayLike | None = None,
    means_init: ArrayLike | None = None,
    precisions_init: ArrayLike | None = None,
    random_state: int | RandomState | None = None,
    warm_start: bool = False,
    verbose: int = 0,
    verbose_interval: int = 10
)
```
Gaussian Mixture Model.
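
A minimal sketch of fitting a two-component `GaussianMixture` and reading both hard and soft assignments, assuming two synthetic, well-separated blobs (illustrative data).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative data: two blobs centered at (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(50, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

labels = gmm.predict(X)        # most likely component per sample
proba = gmm.predict_proba(X)   # per-component membership probabilities
means = gmm.means_             # shape (n_components, n_features)
```

Unlike `KMeans`, the model yields a probability for every component, which is useful when clusters overlap.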

#### BayesianGaussianMixture { .api }
```python
from sklearn.mixture import BayesianGaussianMixture

BayesianGaussianMixture(
n_components: int = 1,
covariance_type: str = "full",
tol: float = 0.001,
reg_covar: float = 1e-06,
max_iter: int = 100,
n_init: int = 1,
init_params: str = "kmeans",
weight_concentration_prior_type: str = "dirichlet_process",
weight_concentration_prior: float | None = None,
mean_precision_prior: float | None = None,
mean_prior: ArrayLike | None = None,
degrees_of_freedom_prior: float | None = None,
covariance_prior: float | ArrayLike | None = None,
random_state: int | RandomState | None = None,
warm_start: bool = False,
verbose: int = 0,
verbose_interval: int = 10
)
```
Variational Bayesian estimation of a Gaussian mixture.
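
A hedged sketch of one common use: requesting more components than the data needs and inspecting the fitted mixture weights. How strongly surplus components are down-weighted depends on the priors and the data, so the printed weights are not guaranteed to follow any particular pattern. The data below is synthetic and illustrative.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(-5.0, 1.0, size=(100, 2)),
    rng.normal(5.0, 1.0, size=(100, 2)),
])

# Deliberately ask for more components than the two blobs require.
bgm = BayesianGaussianMixture(n_components=5, max_iter=500, random_state=0).fit(X)

weights = bgm.weights_   # one mixture weight per component
labels = bgm.predict(X)

print(np.round(weights, 3))
```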

## Covariance Estimation

#### EmpiricalCovariance { .api }
```python
from sklearn.covariance import EmpiricalCovariance

EmpiricalCovariance(
store_precision: bool = True,
assume_centered: bool = False
)
```
Maximum likelihood covariance estimator.

#### ShrunkCovariance { .api }
```python
from sklearn.covariance import ShrunkCovariance

ShrunkCovariance(
store_precision: bool = True,
assume_centered: bool = False,
shrinkage: float = 0.1
)
```
Covariance estimator with shrinkage.

#### LedoitWolf { .api }
```python
from sklearn.covariance import LedoitWolf

LedoitWolf(
store_precision: bool = True,
assume_centered: bool = False,
block_size: int = 1000
)
```
Ledoit-Wolf estimator.
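
A minimal sketch, assuming a setting with few samples relative to features, where shrinkage tends to matter most. It compares the conditioning of the empirical and Ledoit-Wolf estimates; the data is synthetic and the variable names are illustrative.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, LedoitWolf

rng = np.random.RandomState(0)
X = rng.normal(size=(30, 20))  # only 30 samples for 20 features

emp = EmpiricalCovariance().fit(X)
lw = LedoitWolf().fit(X)

# Shrinking toward a scaled identity pulls the eigenvalues together,
# which lowers the condition number of the estimate.
cond_emp = np.linalg.cond(emp.covariance_)
cond_lw = np.linalg.cond(lw.covariance_)

print(f"shrinkage={lw.shrinkage_:.3f}, cond: {cond_emp:.1f} -> {cond_lw:.1f}")
```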

#### OAS { .api }
```python
from sklearn.covariance import OAS

OAS(
store_precision: bool = True,
assume_centered: bool = False
)
```
Oracle Approximating Shrinkage Estimator.

#### MinCovDet { .api }
```python
from sklearn.covariance import MinCovDet

MinCovDet(
store_precision: bool = True,
assume_centered: bool = False,
support_fraction: float | None = None,
random_state: int | RandomState | None = None
)
```
Minimum Covariance Determinant (MCD): robust estimator of covariance.
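
To illustrate what "robust" means here, the sketch below contaminates 10% of a synthetic sample and compares the estimated location from `EmpiricalCovariance` with the one from `MinCovDet`. The contamination scheme is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))  # inliers centered at the origin
X[:20] += 10.0                 # shift 10% of the rows far away

emp = EmpiricalCovariance().fit(X)
mcd = MinCovDet(random_state=0).fit(X)

# The empirical mean is dragged toward the outliers; the MCD location is not.
print("empirical location:", emp.location_)
print("MCD location:      ", mcd.location_)
```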

#### GraphicalLasso { .api }
```python
from sklearn.covariance import GraphicalLasso

GraphicalLasso(
alpha: float = 0.01,
mode: str = "cd",
tol: float = 0.0001,
enet_tol: float = 0.0001,
max_iter: int = 100,
verbose: bool = False,
assume_centered: bool = False
)
```
Sparse inverse covariance estimation with an l1-penalized estimator.
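
A small sketch of the sparsity effect, assuming synthetic data in which only features 0 and 1 are dependent. With a sufficiently large `alpha`, the estimated precision matrix is expected to keep the (0, 1) entry and drive the other off-diagonal entries to (near) zero. The value `alpha=0.3` is an illustrative choice, not a recommendation.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 5))
X[:, 1] = X[:, 0] + 0.5 * rng.normal(size=500)  # couple features 0 and 1

model = GraphicalLasso(alpha=0.3).fit(X)
precision = model.precision_

# Off-diagonal entries other than the (0, 1) pair.
mask = ~np.eye(5, dtype=bool)
mask[0, 1] = mask[1, 0] = False

print(np.round(precision, 3))
```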

#### GraphicalLassoCV { .api }
```python
from sklearn.covariance import GraphicalLassoCV

GraphicalLassoCV(
alphas: int | ArrayLike = 4,
n_refinements: int = 4,
cv: int | BaseCrossValidator | Iterable | None = None,
tol: float = 0.0001,
enet_tol: float = 0.0001,
max_iter: int = 100,
mode: str = "cd",
n_jobs: int | None = None,
verbose: bool = False,
assume_centered: bool = False
)
```
Sparse inverse covariance with cross-validated choice of the l1 penalty.

#### EllipticEnvelope { .api }
```python
from sklearn.covariance import EllipticEnvelope

EllipticEnvelope(
store_precision: bool = True,
assume_centered: bool = False,
support_fraction: float | None = None,
contamination: float = 0.1,
random_state: int | RandomState | None = None
)
```
An object for detecting outliers in a Gaussian distributed dataset.
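
A minimal sketch of outlier flagging, assuming Gaussian inliers plus a handful of shifted points. `predict` returns `+1` for inliers and `-1` for outliers; the contamination level below matches the synthetic data by construction.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
X[:10] += 8.0  # 10 of 200 rows (5%) are shifted far from the bulk

detector = EllipticEnvelope(contamination=0.05, random_state=0).fit(X)
labels = detector.predict(X)  # +1 = inlier, -1 = outlier

print("flagged rows:", np.flatnonzero(labels == -1))
```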

### Covariance Functions

#### empirical_covariance { .api }
```python
from sklearn.covariance import empirical_covariance

empirical_covariance(
X: ArrayLike,
assume_centered: bool = False
) -> ArrayLike
```
Compute the Maximum likelihood covariance estimator.

#### shrunk_covariance { .api }
```python
from sklearn.covariance import shrunk_covariance

shrunk_covariance(
emp_cov: ArrayLike,
shrinkage: float = 0.1
) -> ArrayLike
```
Calculate a covariance matrix shrunk on the diagonal.
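
The shrinkage is a convex combination of the input matrix and a scaled identity: `(1 - shrinkage) * emp_cov + shrinkage * mu * I`, with `mu = trace(emp_cov) / n_features`. The sketch below checks this by hand on synthetic data; the variable names are illustrative.

```python
import numpy as np
from sklearn.covariance import empirical_covariance, shrunk_covariance

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 4))

emp_cov = empirical_covariance(X)
shrunk = shrunk_covariance(emp_cov, shrinkage=0.1)

# Manual version of the same convex combination.
n_features = emp_cov.shape[0]
mu = np.trace(emp_cov) / n_features
manual = (1 - 0.1) * emp_cov + 0.1 * mu * np.eye(n_features)

print(np.allclose(shrunk, manual))
```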

#### ledoit_wolf { .api }
```python
from sklearn.covariance import ledoit_wolf

ledoit_wolf(
X: ArrayLike,
assume_centered: bool = False,
block_size: int = 1000
) -> tuple[ArrayLike, float]
```
Estimate covariance with the Ledoit-Wolf estimator.

#### ledoit_wolf_shrinkage { .api }
```python
from sklearn.covariance import ledoit_wolf_shrinkage

ledoit_wolf_shrinkage(
X: ArrayLike,
assume_centered: bool = False,
block_size: int = 1000
) -> float
```
Calculate the Ledoit-Wolf shrinkage coefficient.

#### oas { .api }
```python
from sklearn.covariance import oas

oas(
X: ArrayLike,
assume_centered: bool = False
) -> tuple[ArrayLike, float]
```
Estimate covariance with the Oracle Approximating Shrinkage algorithm.

#### fast_mcd { .api }
```python
from sklearn.covariance import fast_mcd

fast_mcd(
X: ArrayLike,
support_fraction: float | None = None,
cov_computation_method: Callable = ...,
random_state: int | RandomState | None = None
) -> tuple[ArrayLike, ArrayLike, ArrayLike, ArrayLike]
```
Estimate the Minimum Covariance Determinant matrix.

#### graphical_lasso { .api }
```python
from sklearn.covariance import graphical_lasso

graphical_lasso(
emp_cov: ArrayLike,
alpha: float,
cov_init: ArrayLike | None = None,
mode: str = "cd",
tol: float = 0.0001,
enet_tol: float = 0.0001,
max_iter: int = 100,
verbose: bool = False,
return_costs: bool = False,
eps: float = ...,
return_n_iter: bool = False
) -> tuple[ArrayLike, ArrayLike] | tuple[ArrayLike, ArrayLike, list] | tuple[ArrayLike, ArrayLike, int] | tuple[ArrayLike, ArrayLike, list, int]
```
L1-penalized covariance estimator.

#### log_likelihood { .api }
```python
from sklearn.covariance import log_likelihood

log_likelihood(
emp_cov: ArrayLike,
precision: ArrayLike
) -> float
```
Compute the sample mean of the log-likelihood under a covariance model.
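
As a hedged sketch, the value can be reproduced by hand under the assumption that it is the per-sample Gaussian log-likelihood `0.5 * (-trace(S K) + log det K - p * log(2 * pi))`, where `S` is the empirical covariance, `K` the precision matrix, and `p` the number of features. The example checks that assumption numerically on synthetic data.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, log_likelihood

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))

model = EmpiricalCovariance().fit(X)
ll = log_likelihood(model.covariance_, model.precision_)

# Manual evaluation of the assumed closed form.
p = X.shape[1]
_, logdet = np.linalg.slogdet(model.precision_)
trace_term = np.trace(model.covariance_ @ model.precision_)
manual = 0.5 * (-trace_term + logdet - p * np.log(2 * np.pi))

print(ll, manual)
```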

## Cross Decomposition

#### CCA { .api }
```python
from sklearn.cross_decomposition import CCA

CCA(
n_components: int = 2,
scale: bool = True,
max_iter: int = 500,
tol: float = 1e-06,
copy: bool = True
)
```
Canonical Correlation Analysis.

#### PLSCanonical { .api }
```python
from sklearn.cross_decomposition import PLSCanonical

PLSCanonical(
n_components: int = 2,
scale: bool = True,
algorithm: str = "nipals",
max_iter: int = 500,
tol: float = 1e-06,
copy: bool = True
)
```
Partial Least Squares transformer and regressor.

#### PLSRegression { .api }
```python
from sklearn.cross_decomposition import PLSRegression

PLSRegression(
n_components: int = 2,
scale: bool = True,
max_iter: int = 500,
tol: float = 1e-06,
copy: bool = True
)
```
PLS regression.

#### PLSSVD { .api }
```python
from sklearn.cross_decomposition import PLSSVD

PLSSVD(
n_components: int = 2,
scale: bool = True,
copy: bool = True
)
```
Partial Least Squares SVD.

## Outlier Detection

Local Outlier Factor is provided by the `sklearn.neighbors` module; detectors from other modules are listed in the note at the end of this section.

#### LocalOutlierFactor { .api }
```python
from sklearn.neighbors import LocalOutlierFactor

LocalOutlierFactor(
n_neighbors: int = 20,
algorithm: str = "auto",
leaf_size: int = 30,
metric: str | Callable = "minkowski",
p: int = 2,
metric_params: dict | None = None,
contamination: float | str = "auto",
novelty: bool = False,
n_jobs: int | None = None
)
```
Unsupervised Outlier Detection using Local Outlier Factor (LOF).
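
A minimal sketch, assuming a Gaussian cloud with one isolated point appended. With the default `novelty=False`, labels come from `fit_predict` on the training data itself, and `negative_outlier_factor_` holds the per-sample scores (lower means more abnormal).

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(size=(100, 2)),  # dense cloud around the origin
    [[8.0, 8.0]],               # one isolated point
])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # +1 = inlier, -1 = outlier
scores = lof.negative_outlier_factor_   # lower = more abnormal

print("most abnormal row:", scores.argmin())
```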

Note: Additional outlier detection methods are available in:
- `sklearn.ensemble.IsolationForest` - Isolation Forest Algorithm
- `sklearn.svm.OneClassSVM` - One-Class Support Vector Machine
- `sklearn.covariance.EllipticEnvelope` - Outlier detection for Gaussian data