# Utilities and Core Functions

This document covers core utilities, configuration functions, pipelines, composition tools, and other utility functions in scikit-learn.

## Core Utilities

### Base Functions

#### clone { .api }

```python
from sklearn.base import clone

clone(
    estimator: BaseEstimator,
    safe: bool = True
) -> BaseEstimator
```

Construct a new unfitted estimator with the same parameters.
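
A minimal sketch of what `clone` keeps and drops, assuming a small toy dataset: the clone carries over the constructor parameters of a fitted estimator but none of its learned attributes.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

# Toy data: one feature, two classes
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

fitted = LogisticRegression(C=0.5).fit(X, y)
unfitted = clone(fitted)

# Constructor parameters are carried over...
print(unfitted.get_params()["C"])  # 0.5
# ...but learned attributes such as coef_ are not
print(hasattr(unfitted, "coef_"))  # False
```

This is why helpers that fit the same model several times, such as cross-validation utilities, work on clones rather than on the estimator passed in.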

### Configuration Functions

#### get_config { .api }

```python
from sklearn import get_config

get_config() -> dict
```

Retrieve current scikit-learn configuration.

#### set_config { .api }

```python
from sklearn import set_config

set_config(
    assume_finite: bool | None = None,
    working_memory: int | None = None,
    print_changed_only: bool | None = None,
    display: str | None = None,
    pairwise_dist_chunk_size: int | None = None,
    enable_cython_pairwise_dist: bool | None = None,
    array_api_dispatch: bool | None = None,
    transform_output: str | None = None,
    enable_metadata_routing: bool | None = None,
    skip_parameter_validation: bool | None = None
) -> None
```

Set global scikit-learn configuration. Parameters left as `None` keep their current value.
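
#### config_context { .api }

```python
from sklearn import config_context

config_context(**new_config) -> ContextManager
```

Temporarily change global configuration. The context manager accepts the same keyword arguments as `set_config` and restores the previous settings when the block exits.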
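
A short sketch showing that a `config_context` change is visible through `get_config` only inside the `with` block:

```python
from sklearn import config_context, get_config

before = get_config()["assume_finite"]
with config_context(assume_finite=True):
    inside = get_config()["assume_finite"]
after = get_config()["assume_finite"]

print(inside)           # True
print(after == before)  # True: the previous value is restored on exit
```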

### Version Information

#### show_versions { .api }

```python
from sklearn import show_versions

show_versions() -> None
```

Print system and dependency version information.

#### __version__ { .api }

```python
import sklearn

sklearn.__version__  # e.g. "1.7.1"
```

Current scikit-learn version string.

## Pipeline

### Pipeline Classes

#### Pipeline { .api }

```python
from sklearn.pipeline import Pipeline

Pipeline(
    steps: list[tuple[str, BaseEstimator]],
    memory: str | object | None = None,
    verbose: bool = False
)
```

Pipeline of transforms with a final estimator.
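
A minimal sketch of three everyday `Pipeline` operations, assuming the iris dataset as toy data: setting a nested parameter with the `<step>__<param>` syntax, reading a fitted step by name, and slicing off the final estimator.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])

# Nested parameters are addressed as <step name>__<parameter name>
pipe.set_params(clf__C=0.5)
pipe.fit(X, y)

# Access a fitted step by name
scaler = pipe.named_steps["scaler"]
print(scaler.mean_.shape)  # (4,)

# Slicing returns a sub-pipeline; here, every step except the classifier
X_scaled = pipe[:-1].transform(X)
print(X_scaled.shape)  # (150, 4)
```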

#### FeatureUnion { .api }

```python
from sklearn.pipeline import FeatureUnion

FeatureUnion(
    transformer_list: list[tuple[str, BaseTransformer]],
    n_jobs: int | None = None,
    transformer_weights: dict | None = None,
    verbose: bool = False,
    verbose_feature_names_out: bool = True
)
```

Concatenates results of multiple transformer objects.

### Pipeline Functions

#### make_pipeline { .api }

```python
from sklearn.pipeline import make_pipeline

make_pipeline(
    *steps: BaseEstimator,
    memory: str | object | None = None,
    verbose: bool = False
) -> Pipeline
```

Construct a Pipeline from the given estimators.

#### make_union { .api }

```python
from sklearn.pipeline import make_union

make_union(
    *transformers: BaseTransformer,
    n_jobs: int | None = None,
    verbose: bool = False
) -> FeatureUnion
```

Construct a FeatureUnion from the given transformers.

## Compose

### Column Transformer

#### ColumnTransformer { .api }

```python
from sklearn.compose import ColumnTransformer

ColumnTransformer(
    transformers: list[tuple[str, BaseTransformer, ArrayLike | str | Callable]],
    remainder: str | BaseTransformer = "drop",
    sparse_threshold: float = 0.3,
    n_jobs: int | None = None,
    transformer_weights: dict | None = None,
    verbose: bool = False,
    verbose_feature_names_out: bool = True,
    force_int_remainder_cols: bool = True
)
```

Applies transformers to columns of an array or pandas DataFrame.

#### TransformedTargetRegressor { .api }

```python
from sklearn.compose import TransformedTargetRegressor

TransformedTargetRegressor(
    regressor: BaseRegressor | None = None,
    transformer: BaseTransformer | None = None,
    func: Callable | None = None,
    inverse_func: Callable | None = None,
    check_inverse: bool = True
)
```

Meta-estimator to regress on a transformed target.
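
A minimal sketch, assuming synthetic data where the target grows exponentially with the feature: fitting a linear model on `log(y)` and mapping predictions back with `exp` happens automatically when `func` and `inverse_func` are given.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

X = np.arange(1, 6, dtype=float).reshape(-1, 1)
y = np.exp(2.0 * X.ravel())  # y = exp(2x), linear after a log transform

reg = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,         # applied to y before fitting
    inverse_func=np.exp, # applied to predictions
)
reg.fit(X, y)

# The inner regressor was fitted in log space, so its slope is about 2
print(reg.regressor_.coef_)
# Predictions come back on the original scale, about exp(12)
print(reg.predict([[6.0]]))
```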

### Compose Functions

#### make_column_transformer { .api }

```python
from sklearn.compose import make_column_transformer

make_column_transformer(
    *transformers: tuple[BaseTransformer, ArrayLike | str | Callable],
    remainder: str | BaseTransformer = "drop",
    sparse_threshold: float = 0.3,
    n_jobs: int | None = None,
    verbose: bool = False,
    verbose_feature_names_out: bool = True,
    force_int_remainder_cols: bool = True
) -> ColumnTransformer
```

Construct a ColumnTransformer from the given transformers.

#### make_column_selector { .api }

```python
from sklearn.compose import make_column_selector

make_column_selector(
    pattern: str | None = None,
    dtype_include: type | str | list | None = None,
    dtype_exclude: type | str | list | None = None
) -> Callable
```

Create a callable to select columns to be used with ColumnTransformer.
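
A small sketch, assuming pandas and a toy DataFrame: each selector is a callable that takes a DataFrame and returns the matching column names, selected by dtype or by a regex on the names.

```python
import pandas as pd
from sklearn.compose import make_column_selector

df = pd.DataFrame({
    "age": [25, 30],
    "income": [50000.0, 60000.0],
    "city": ["NYC", "LA"],
})

select_numeric = make_column_selector(dtype_include="number")
select_other = make_column_selector(dtype_exclude="number")
select_by_name = make_column_selector(pattern="^inc")  # regex on column names

print(select_numeric(df))  # ['age', 'income']
print(select_other(df))    # ['city']
print(select_by_name(df))  # ['income']
```

In a `ColumnTransformer`, such a selector can stand in for an explicit column list, so the column set is resolved from the data at fit time.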

## Inspection

### Partial Dependence

#### partial_dependence { .api }

```python
from sklearn.inspection import partial_dependence

partial_dependence(
    estimator: BaseEstimator,
    X: ArrayLike,
    features: int | str | ArrayLike | list,
    sample_weight: ArrayLike | None = None,
    categorical_features: ArrayLike | None = None,
    feature_names: ArrayLike | None = None,
    response_method: str = "auto",
    percentiles: tuple[float, float] = (0.05, 0.95),
    grid_resolution: int = 100,
    method: str = "auto",
    kind: str = "average"
) -> Bunch
```

Compute the partial dependence of an estimator's response on the given features. The returned `Bunch` holds `grid_values` together with `average`, `individual`, or both, depending on `kind`.
219
220
#### permutation_importance { .api }
221
```python
222
from sklearn.inspection import permutation_importance
223
224
permutation_importance(
225
estimator: BaseEstimator,
226
X: ArrayLike,
227
y: ArrayLike,
228
scoring: str | Callable | list | tuple | dict | None = None,
229
n_repeats: int = 5,
230
n_jobs: int | None = None,
231
random_state: int | RandomState | None = None,
232
sample_weight: ArrayLike | None = None,
233
max_samples: int | float = 1.0
234
) -> dict
235
```
236
Permutation importance for feature evaluation.
237
238
### Display Classes
239
240
#### PartialDependenceDisplay { .api }
241
```python
242
from sklearn.inspection import PartialDependenceDisplay
243
244
PartialDependenceDisplay(
245
pd_results: list[dict],
246
features: list,
247
feature_names: ArrayLike | None = None,
248
target_idx: int | None = None,
249
deciles: dict | None = None
250
)
251
```
252
Partial Dependence Plot (PDP).
253
254
#### DecisionBoundaryDisplay { .api }
255
```python
256
from sklearn.inspection import DecisionBoundaryDisplay
257
258
DecisionBoundaryDisplay(
259
xx0: ArrayLike,
260
xx1: ArrayLike,
261
response: ArrayLike
262
)
263
```
264
Visualization of decision boundaries of a classifier.
265
266
## Isotonic Regression Utilities
267
268
### Isotonic Functions
269
270
#### check_increasing { .api }
271
```python
272
from sklearn.isotonic import check_increasing
273
274
check_increasing(
275
x: ArrayLike,
276
y: ArrayLike
277
) -> bool
278
```
279
Determine whether y is monotonically correlated with x.
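
A short sketch on synthetic data: the check is based on the sign of the Spearman rank correlation between `x` and `y`, so a perfectly increasing and a perfectly decreasing relationship give opposite answers.

```python
import numpy as np
from sklearn.isotonic import check_increasing

x = np.arange(10)

print(check_increasing(x, 2 * x + 1))  # True: y rises with x
print(check_increasing(x, -x))         # False: y falls as x rises
```

The result can be used to pick the `increasing` direction before calling `isotonic_regression`.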
280
281
#### isotonic_regression { .api }
282
```python
283
from sklearn.isotonic import isotonic_regression
284
285
isotonic_regression(
286
y: ArrayLike,
287
sample_weight: ArrayLike | None = None,
288
y_min: float | None = None,
289
y_max: float | None = None,
290
increasing: bool = True
291
) -> ArrayLike
292
```
293
Solve the isotonic regression model.
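
A worked example on a four-point sequence: the pair `3, 2` violates monotonicity, so the solution pools it into its mean, `2.5`. Passing `y_max` clips the fitted values afterwards.

```python
import numpy as np
from sklearn.isotonic import isotonic_regression

y = np.array([1.0, 3.0, 2.0, 4.0])

fit = isotonic_regression(y)
print(fit)  # [1.  2.5 2.5 4. ]

# Bound the fitted values from above
clipped = isotonic_regression(y, y_max=3.0)
print(clipped)  # [1.  2.5 2.5 3. ]
```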
294
295
## Neighbors Utilities
296
297
### Neighbor Functions
298
299
#### kneighbors_graph { .api }
300
```python
301
from sklearn.neighbors import kneighbors_graph
302
303
kneighbors_graph(
304
X: ArrayLike,
305
n_neighbors: int,
306
mode: str = "connectivity",
307
metric: str | Callable = "minkowski",
308
p: int = 2,
309
metric_params: dict | None = None,
310
include_self: bool | str = "auto",
311
n_jobs: int | None = None
312
) -> ArrayLike
313
```
314
Compute the (weighted) graph of k-Neighbors for points in X.
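
A minimal sketch on three 1-D points at 0, 1, and 3: with `n_neighbors=1` each row marks its single nearest other point. `"connectivity"` mode stores ones, while `"distance"` mode stores the distances.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.array([[0.0], [1.0], [3.0]])

A = kneighbors_graph(X, n_neighbors=1, mode="connectivity")
print(A.toarray())
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]

D = kneighbors_graph(X, n_neighbors=1, mode="distance")
print(D.toarray())
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 2. 0.]]
```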
315
316
#### radius_neighbors_graph { .api }
317
```python
318
from sklearn.neighbors import radius_neighbors_graph
319
320
radius_neighbors_graph(
321
X: ArrayLike,
322
radius: float,
323
mode: str = "connectivity",
324
metric: str | Callable = "minkowski",
325
p: int = 2,
326
metric_params: dict | None = None,
327
include_self: bool | str = "auto",
328
n_jobs: int | None = None
329
) -> ArrayLike
330
```
331
Compute the (weighted) graph of Neighbors for points in X.
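
Using the same three points at 0, 1, and 3 with `radius=1.5`, only the pair (0, 1) is close enough to be linked. The point at 3 has no neighbor within the radius, so its row stays empty.

```python
import numpy as np
from sklearn.neighbors import radius_neighbors_graph

X = np.array([[0.0], [1.0], [3.0]])

G = radius_neighbors_graph(X, radius=1.5, mode="distance")
print(G.toarray())
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```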
332
333
#### sort_graph_by_row_values { .api }
334
```python
335
from sklearn.neighbors import sort_graph_by_row_values
336
337
sort_graph_by_row_values(
338
graph: ArrayLike,
339
copy: bool = True,
340
warn_when_not_sorted: bool = True
341
) -> ArrayLike
342
```
343
Sort a sparse graph such that each row has its data sorted by value.
344
345
### Neighbor Data Structures
346
347
#### BallTree { .api }
348
```python
349
from sklearn.neighbors import BallTree
350
351
BallTree(
352
X: ArrayLike,
353
leaf_size: int = 40,
354
metric: str | DistanceMetric = "minkowski",
355
**kwargs
356
)
357
```
358
BallTree for fast generalized N-point problems.
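
A short sketch of a radius query on a `BallTree`, assuming three 2-D toy points and the Manhattan metric: `query_radius` returns one index array per query point, and `count_only=True` returns just the counts.

```python
import numpy as np
from sklearn.neighbors import BallTree

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
tree = BallTree(X, leaf_size=2, metric="manhattan")

# Indices of points within distance 1.5 of the origin (order is not guaranteed)
ind = tree.query_radius([[0.0, 0.0]], r=1.5)
print(sorted(ind[0].tolist()))  # [0, 1]

counts = tree.query_radius([[0.0, 0.0]], r=1.5, count_only=True)
print(counts)  # [2]
```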
359
360
#### KDTree { .api }
361
```python
362
from sklearn.neighbors import KDTree
363
364
KDTree(
365
X: ArrayLike,
366
leaf_size: int = 40,
367
metric: str = "minkowski",
368
**kwargs
369
)
370
```
371
KDTree for fast generalized N-point problems.
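
A minimal k-nearest-neighbor query on a `KDTree`, assuming the same three toy points: `query` returns distances and indices sorted from nearest to farthest.

```python
import numpy as np
from sklearn.neighbors import KDTree

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
tree = KDTree(X, leaf_size=2)

dist, ind = tree.query([[0.9, 0.1]], k=2)
print(ind)   # [[1 0]]
print(dist)  # approximately [[0.1414 0.9055]]
```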
372
373
#### KernelDensity { .api }
374
```python
375
from sklearn.neighbors import KernelDensity
376
377
KernelDensity(
378
bandwidth: float | str = 1.0,
379
algorithm: str = "auto",
380
kernel: str = "gaussian",
381
metric: str = "euclidean",
382
atol: float = 0,
383
rtol: float = 0,
384
breadth_first: bool = True,
385
leaf_size: int = 40,
386
metric_params: dict | None = None
387
)
388
```
389
Kernel Density Estimation.
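
A minimal sketch, assuming 500 synthetic draws from a standard normal: `score_samples` returns log-densities, so `np.exp` recovers densities, and the Gaussian kernel also supports drawing new samples.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
X = rng.normal(loc=0.0, scale=1.0, size=(500, 1))

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X)

# Log-density at the mode and in the tail
log_dens = kde.score_samples(np.array([[0.0], [3.0]]))
dens = np.exp(log_dens)
print(dens[0] > dens[1])  # True

# Draw new points from the fitted density
samples = kde.sample(5, random_state=0)
print(samples.shape)  # (5, 1)
```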

#### NearestNeighbors { .api }

```python
from sklearn.neighbors import NearestNeighbors

NearestNeighbors(
    n_neighbors: int = 5,
    radius: float = 1.0,
    algorithm: str = "auto",
    leaf_size: int = 30,
    metric: str | Callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

Unsupervised learner for implementing neighbor searches.
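
A short sketch of both query styles on four 1-D toy points: `kneighbors` returns a fixed number of nearest points, while `radius_neighbors` returns every point within a distance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[0.0], [1.0], [3.0], [7.0]])
nn = NearestNeighbors(n_neighbors=2).fit(X)

# Two nearest training points to 2.9, nearest first
dist, ind = nn.kneighbors([[2.9]])
print(ind)  # [[2 1]]

# All training points within distance 2.0 of 2.9 (order is not guaranteed)
r_dist, r_ind = nn.radius_neighbors([[2.9]], radius=2.0)
print(sorted(r_ind[0].tolist()))  # [1, 2]
```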

#### KNeighborsTransformer { .api }

```python
from sklearn.neighbors import KNeighborsTransformer

KNeighborsTransformer(
    mode: str = "distance",
    n_neighbors: int = 5,
    algorithm: str = "auto",
    leaf_size: int = 30,
    metric: str | Callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

Transform X into a (weighted) graph of k nearest neighbors.
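
A minimal sketch on four 1-D toy points. In `"distance"` mode each sample counts as its own neighbor, so the sparse graph holds `n_neighbors + 1` entries per row, the self-entry being an explicitly stored zero distance.

```python
import numpy as np
from sklearn.neighbors import KNeighborsTransformer

X = np.array([[0.0], [1.0], [3.0], [7.0]])

graph = KNeighborsTransformer(n_neighbors=2, mode="distance").fit_transform(X)
print(graph.shape)           # (4, 4)
print(graph.getnnz(axis=1))  # [3 3 3 3]

# Stored values are distances, e.g. from 0 to 1 and from 7 to 3
print(graph[0, 1], graph[3, 2])  # 1.0 4.0
```

The resulting graph can feed estimators that accept `metric="precomputed"`, which lets the neighbor search be done once and reused.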

#### RadiusNeighborsTransformer { .api }

```python
from sklearn.neighbors import RadiusNeighborsTransformer

RadiusNeighborsTransformer(
    mode: str = "distance",
    radius: float = 1.0,
    algorithm: str = "auto",
    leaf_size: int = 30,
    metric: str | Callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

Transform X into a (weighted) graph of neighbors nearer than a radius.

#### NeighborhoodComponentsAnalysis { .api }

```python
from sklearn.neighbors import NeighborhoodComponentsAnalysis

NeighborhoodComponentsAnalysis(
    n_components: int | None = None,
    init: str | ArrayLike = "auto",
    warm_start: bool = False,
    max_iter: int = 50,
    tol: float = 1e-05,
    callback: Callable | None = None,
    verbose: int = 0,
    random_state: int | RandomState | None = None
)
```

Neighborhood Components Analysis.
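
A minimal sketch, assuming the iris dataset as toy data: NCA learns a supervised linear projection, here to two dimensions, and is commonly placed in front of a k-nearest-neighbors classifier.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Learn a 2-D linear projection using the class labels
nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=42)
X_embedded = nca.fit_transform(X, y)
print(X_embedded.shape)  # (150, 2)

# Typical use: NCA as a supervised transformer before a k-NN classifier
nca_knn = make_pipeline(
    NeighborhoodComponentsAnalysis(n_components=2, random_state=42),
    KNeighborsClassifier(n_neighbors=3),
)
nca_knn.fit(X, y)
print(nca_knn.score(X, y))  # training accuracy
```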

### Neighbor Constants

#### VALID_METRICS { .api }

```python
from sklearn.neighbors import VALID_METRICS

# Dictionary mapping algorithm names to valid metrics
VALID_METRICS: dict[str, list[str]]
```

Valid metrics for neighbor algorithms.

#### VALID_METRICS_SPARSE { .api }

```python
from sklearn.neighbors import VALID_METRICS_SPARSE

# Dictionary mapping algorithm names to valid metrics for sparse matrices
VALID_METRICS_SPARSE: dict[str, list[str]]
```

Valid metrics for neighbor algorithms with sparse matrices.
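
A short sketch of looking up supported metrics before choosing an algorithm. The keys are algorithm names (`"ball_tree"`, `"kd_tree"`, `"brute"`); for example, haversine distance is available to the ball tree but not to the KD tree, and neither tree accepts sparse input.

```python
from sklearn.neighbors import VALID_METRICS, VALID_METRICS_SPARSE

print(sorted(VALID_METRICS))  # algorithm names

# Haversine distance is supported by BallTree but not by KDTree
print("haversine" in VALID_METRICS["ball_tree"])  # True
print("haversine" in VALID_METRICS["kd_tree"])    # False

# Tree-based algorithms list no metrics for sparse input
print(len(VALID_METRICS_SPARSE["kd_tree"]))  # 0
```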

## Exception Classes

#### NotFittedError { .api }

```python
from sklearn.exceptions import NotFittedError

class NotFittedError(ValueError, AttributeError):
    """Exception class to raise if estimator is used before fitting."""
    pass
```

Exception class to raise if estimator is used before fitting.
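
A minimal sketch of catching the error when `predict` is called on an unfitted model. Because `NotFittedError` also subclasses `ValueError` and `AttributeError`, broader `except` clauses catch it as well.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression

model = LinearRegression()
try:
    model.predict([[1.0]])  # no fit() call yet
except NotFittedError as exc:
    error = exc

print(type(error).__name__)  # NotFittedError
print(isinstance(error, ValueError), isinstance(error, AttributeError))  # True True
```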

#### ConvergenceWarning { .api }

```python
from sklearn.exceptions import ConvergenceWarning

class ConvergenceWarning(UserWarning):
    """Custom warning to capture convergence problems."""
    pass
```

Custom warning to capture convergence problems.
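
A short sketch that triggers the warning on purpose, assuming the iris dataset and an `MLPClassifier` limited to a single training iteration, and records it with the standard `warnings` module.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# max_iter=1 stops the optimizer long before it converges
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    MLPClassifier(max_iter=1, random_state=0).fit(X, y)

print(any(issubclass(w.category, ConvergenceWarning) for w in caught))  # True
```

To silence it instead, `warnings.filterwarnings("ignore", category=ConvergenceWarning)` works like any other warning filter.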

#### DataConversionWarning { .api }

```python
from sklearn.exceptions import DataConversionWarning

class DataConversionWarning(UserWarning):
    """Warning used to notify implicit data conversions happening in the code."""
    pass
```

Warning used to notify implicit data conversions happening in the code.
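
A common trigger, sketched on toy data: passing the target as a column vector of shape `(n_samples, 1)` where a 1-D array is expected. The estimator ravels it and emits this warning.

```python
import warnings

import numpy as np
from sklearn.exceptions import DataConversionWarning
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y_column = np.array([[0], [0], [1], [1]])  # shape (4, 1) instead of (4,)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LogisticRegression().fit(X, y_column)

print(any(issubclass(w.category, DataConversionWarning) for w in caught))  # True
```

Passing `y_column.ravel()` avoids the conversion and the warning.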

#### DataDimensionalityWarning { .api }

```python
from sklearn.exceptions import DataDimensionalityWarning

class DataDimensionalityWarning(UserWarning):
    """Custom warning to capture data dimensionality problems."""
    pass
```

Custom warning to capture data dimensionality problems.

#### EfficiencyWarning { .api }

```python
from sklearn.exceptions import EfficiencyWarning

class EfficiencyWarning(UserWarning):
    """Warning used to notify the user of inefficient computation."""
    pass
```

Warning used to notify the user of inefficient computation.

#### EstimatorCheckFailedWarning { .api }

```python
from sklearn.exceptions import EstimatorCheckFailedWarning

class EstimatorCheckFailedWarning(UserWarning):
    """Warning used when an estimator check fails."""
    pass
```

Warning used when an estimator check fails.

#### FitFailedWarning { .api }

```python
from sklearn.exceptions import FitFailedWarning

class FitFailedWarning(RuntimeWarning):
    """Warning class used if there is an error while fitting the estimator."""
    pass
```

Warning class used if there is an error while fitting the estimator.

#### PositiveSpectrumWarning { .api }

```python
from sklearn.exceptions import PositiveSpectrumWarning

class PositiveSpectrumWarning(UserWarning):
    """Warning raised when the eigenvalues of a PSD matrix have issues."""
    pass
```

Warning raised when the eigenvalues of a PSD matrix have issues.

#### SkipTestWarning { .api }

```python
from sklearn.exceptions import SkipTestWarning

class SkipTestWarning(UserWarning):
    """Warning class used to notify the user of a test that was skipped."""
    pass
```

Warning class used to notify the user of a test that was skipped.

#### UndefinedMetricWarning { .api }

```python
from sklearn.exceptions import UndefinedMetricWarning

class UndefinedMetricWarning(UserWarning):
    """Warning used when the metric is invalid."""
    pass
```

Warning used when the metric is invalid.
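
A typical case, sketched on two labels: precision is undefined when no sample is predicted positive. By default `precision_score` warns and returns `0.0`; setting `zero_division` explicitly chooses the value and suppresses the warning.

```python
import warnings

from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import precision_score

y_true = [0, 1]
y_pred = [0, 0]  # no positive predictions, so precision is 0 / 0

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = precision_score(y_true, y_pred)

print(score)  # 0.0
print(any(issubclass(w.category, UndefinedMetricWarning) for w in caught))  # True

# Choosing the value explicitly avoids the warning
with warnings.catch_warnings(record=True) as quiet_caught:
    warnings.simplefilter("always")
    quiet = precision_score(y_true, y_pred, zero_division=0)
```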

#### UnsetMetadataPassedError { .api }

```python
from sklearn.exceptions import UnsetMetadataPassedError

class UnsetMetadataPassedError(ValueError):
    """Exception when metadata is passed which is not explicitly requested."""
    pass
```

Exception when metadata is passed which is not explicitly requested.
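
## Frozen Estimators

#### FrozenEstimator { .api }

```python
from sklearn.frozen import FrozenEstimator

FrozenEstimator(
    estimator: BaseEstimator
)
```

Wrapper that freezes an already-fitted estimator: calling `fit` on the wrapper does not refit it, so it can be passed to meta-estimators without its learned state being overwritten. Other methods such as `predict` are delegated to the wrapped estimator.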

## Examples

### Basic Pipeline Example

```python
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Method 1: Using Pipeline class with explicit step names
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])

# Method 2: Using make_pipeline, which names the steps automatically
# ('standardscaler', 'logisticregression')
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression()
)

# Fit and predict
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

### Column Transformer Example

```python
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import pandas as pd

# Example with mixed data types
data = pd.DataFrame({
    'age': [25, 30, 35],
    'income': [50000, 60000, 70000],
    'city': ['NYC', 'LA', 'Chicago'],
    'gender': ['M', 'F', 'M']
})

# Method 1: Using ColumnTransformer class
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'income']),
    ('cat', OneHotEncoder(), ['city', 'gender'])
])

# Method 2: Using make_column_transformer function
preprocessor = make_column_transformer(
    (StandardScaler(), ['age', 'income']),
    (OneHotEncoder(), ['city', 'gender'])
)

# Transform data
transformed = preprocessor.fit_transform(data)
```

### Feature Union Example

```python
from sklearn.pipeline import FeatureUnion, make_union
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Combine PCA and feature selection
feature_union = FeatureUnion([
    ('pca', PCA(n_components=2)),
    ('select_k_best', SelectKBest(k=2))
])

# Or using make_union
feature_union = make_union(
    PCA(n_components=2),
    SelectKBest(k=2)
)

# Transform features: 2 PCA components + 2 selected features = 4 columns
X_combined = feature_union.fit_transform(X, y)
```

### Configuration Example

```python
from sklearn import set_config, get_config, config_context
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Get current config
current_config = get_config()
print(current_config)

# Set global configuration (persists until changed again)
set_config(display='diagram', print_changed_only=True)

# Use configuration context
with config_context(assume_finite=True):
    # Operations within this block use assume_finite=True
    model = LinearRegression()
    model.fit(X, y)

# Configuration reverts to previous state outside the context
```

### Partial Dependence Example

```python
from sklearn.datasets import load_diabetes
from sklearn.inspection import partial_dependence, PartialDependenceDisplay
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt

# Load a regression dataset
X, y = load_diabetes(return_X_y=True)

# Train model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)

# Compute partial dependence (a Bunch with "grid_values" and "average")
pd_result = partial_dependence(
    model, X, features=[0, 1],
    grid_resolution=20
)

# Create the display; from_estimator already draws the plot
display = PartialDependenceDisplay.from_estimator(
    model, X, features=[0, 1]
)
plt.show()
```

### Permutation Importance Example

```python
from sklearn.inspection import permutation_importance

# Reuses `model`, `X`, and `y` from the partial dependence example above
result = permutation_importance(
    model, X, y, n_repeats=10, random_state=42
)

# Get importance scores
importance_scores = result.importances_mean
importance_std = result.importances_std

# Print results
for i, (score, std) in enumerate(zip(importance_scores, importance_std)):
    print(f"Feature {i}: {score:.3f} +/- {std:.3f}")
```