
# Utilities and Core Functions

This document covers scikit-learn's core utilities: base and configuration functions, pipelines, composition tools, inspection helpers, isotonic and neighbor utilities, exception classes, and frozen estimators.

## Core Utilities

### Base Functions

#### clone { .api }

```python
from sklearn.base import clone

clone(
    estimator: BaseEstimator,
    safe: bool = True
) -> BaseEstimator
```

Construct a new unfitted estimator with the same parameters.
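
As a minimal sketch of what "unfitted, same parameters" means in practice, the snippet below clones a fitted classifier and compares the two objects. The dataset and the `LogisticRegression` settings are illustrative assumptions, not part of the reference above.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Fit an estimator so that it carries learned state (coef_, intercept_, ...)
fitted = LogisticRegression(C=0.5, max_iter=1000).fit(X, y)

# clone copies the constructor parameters but none of the learned state
fresh = clone(fitted)

same_params = fresh.get_params() == fitted.get_params()
has_coef = hasattr(fresh, "coef_")
print(same_params, has_coef)
```

This is why `clone` is useful in cross-validation loops: each fold can start from an identical, untrained copy.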

### Configuration Functions

#### get_config { .api }

```python
from sklearn import get_config

get_config() -> dict
```

Retrieve current scikit-learn configuration.

#### set_config { .api }

```python
from sklearn import set_config

set_config(
    assume_finite: bool | None = None,
    working_memory: int | None = None,
    print_changed_only: bool | None = None,
    display: str | None = None,
    pairwise_dist_chunk_size: int | None = None,
    enable_cython_pairwise_dist: bool | None = None,
    array_api_dispatch: bool | None = None,
    transform_output: str | None = None,
    enable_metadata_routing: bool | None = None,
    skip_parameter_validation: bool | None = None
) -> None
```

Set global scikit-learn configuration. Options left as `None` keep their current value.

#### config_context { .api }

```python
from sklearn import config_context

config_context(**new_config) -> ContextManager
```

Temporarily change global configuration.

### Version Information

#### show_versions { .api }

```python
from sklearn import show_versions

show_versions() -> None
```

Print system and dependency version information.

#### __version__ { .api }

```python
import sklearn
sklearn.__version__  # "1.7.1"
```

Current scikit-learn version string.

## Pipeline

### Pipeline Classes

#### Pipeline { .api }

```python
from sklearn.pipeline import Pipeline

Pipeline(
    steps: list[tuple[str, BaseEstimator]],
    memory: str | object | None = None,
    verbose: bool = False
)
```

Pipeline of transforms with a final estimator.

#### FeatureUnion { .api }

```python
from sklearn.pipeline import FeatureUnion

FeatureUnion(
    transformer_list: list[tuple[str, BaseTransformer]],
    n_jobs: int | None = None,
    transformer_weights: dict | None = None,
    verbose: bool = False,
    verbose_feature_names_out: bool = True
)
```

Concatenates results of multiple transformer objects.

### Pipeline Functions

#### make_pipeline { .api }

```python
from sklearn.pipeline import make_pipeline

make_pipeline(
    *steps: BaseEstimator,
    memory: str | object | None = None,
    verbose: bool = False
) -> Pipeline
```

Construct a Pipeline from the given estimators.

#### make_union { .api }

```python
from sklearn.pipeline import make_union

make_union(
    *transformers: BaseTransformer,
    n_jobs: int | None = None,
    verbose: bool = False
) -> FeatureUnion
```

Construct a FeatureUnion from the given transformers.

## Compose

### Column Transformer

#### ColumnTransformer { .api }

```python
from sklearn.compose import ColumnTransformer

ColumnTransformer(
    transformers: list[tuple[str, BaseTransformer, ArrayLike | str | Callable]],
    remainder: str | BaseTransformer = "drop",
    sparse_threshold: float = 0.3,
    n_jobs: int | None = None,
    transformer_weights: dict | None = None,
    verbose: bool = False,
    verbose_feature_names_out: bool = True,
    force_int_remainder_cols: bool = True
)
```

Applies transformers to columns of an array or pandas DataFrame.

#### TransformedTargetRegressor { .api }

```python
from sklearn.compose import TransformedTargetRegressor

TransformedTargetRegressor(
    regressor: BaseRegressor | None = None,
    transformer: BaseTransformer | None = None,
    func: Callable | None = None,
    inverse_func: Callable | None = None,
    check_inverse: bool = True
)
```

Meta-estimator to regress on a transformed target.
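
To illustrate regressing on a transformed target, here is a small sketch that fits a linear model in log space and predicts on the original scale. The synthetic data and the choice of `np.log` / `np.exp` are assumptions made for the example.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(100, 1))
y = np.exp(1.0 + 2.0 * X[:, 0])  # target is exactly linear in log space

model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,          # applied to y before fitting
    inverse_func=np.exp,  # applied to predictions
)
model.fit(X, y)

# Predictions come back on the original (untransformed) scale
pred = model.predict(X)

# The fitted inner regressor is exposed as regressor_
slope = model.regressor_.coef_[0]
print(round(slope, 3))
```

Passing `func` and `inverse_func` is an alternative to passing a `transformer` object; only one of the two styles should be used at a time.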

### Compose Functions

#### make_column_transformer { .api }

```python
from sklearn.compose import make_column_transformer

make_column_transformer(
    *transformers: tuple[BaseTransformer, ArrayLike | str | Callable],
    remainder: str | BaseTransformer = "drop",
    sparse_threshold: float = 0.3,
    n_jobs: int | None = None,
    verbose: bool = False,
    verbose_feature_names_out: bool = True,
    force_int_remainder_cols: bool = True
) -> ColumnTransformer
```

Construct a ColumnTransformer from the given transformers.

#### make_column_selector { .api }

```python
from sklearn.compose import make_column_selector

make_column_selector(
    pattern: str | None = None,
    dtype_include: type | str | list | None = None,
    dtype_exclude: type | str | list | None = None
) -> Callable
```

Create a callable to select columns to be used with ColumnTransformer.
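
The following sketch shows how such a selector might be used: the callable receives a DataFrame and returns the matching column names, so it can stand in for an explicit column list. The toy DataFrame and the choice of transformers are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 30, 35],
    "income": [50000.0, 60000.0, 70000.0],
    "city": ["NYC", "LA", "Chicago"],
})

# Select numeric columns, and everything that is not numeric
numeric = make_column_selector(dtype_include=np.number)
categorical = make_column_selector(dtype_exclude=np.number)

numeric_cols = numeric(df)
categorical_cols = categorical(df)

# Selectors can be passed wherever a column specification is expected
preprocessor = make_column_transformer(
    (StandardScaler(), numeric),
    (OneHotEncoder(), categorical),
)
transformed = preprocessor.fit_transform(df)
print(numeric_cols, categorical_cols, transformed.shape)
```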

## Inspection

### Partial Dependence

#### partial_dependence { .api }

```python
from sklearn.inspection import partial_dependence

partial_dependence(
    estimator: BaseEstimator,
    X: ArrayLike,
    features: int | str | ArrayLike | list,
    sample_weight: ArrayLike | None = None,
    categorical_features: ArrayLike | None = None,
    feature_names: ArrayLike | None = None,
    response_method: str = "auto",
    percentiles: tuple[float, float] = (0.05, 0.95),
    grid_resolution: int = 100,
    method: str = "auto",
    kind: str = "average"
) -> Bunch
```

Compute the partial dependence of the given features. Returns a dictionary-like `Bunch` holding the grid values and the averaged and/or individual predictions.

#### permutation_importance { .api }

```python
from sklearn.inspection import permutation_importance

permutation_importance(
    estimator: BaseEstimator,
    X: ArrayLike,
    y: ArrayLike,
    scoring: str | Callable | list | tuple | dict | None = None,
    n_repeats: int = 5,
    n_jobs: int | None = None,
    random_state: int | RandomState | None = None,
    sample_weight: ArrayLike | None = None,
    max_samples: int | float = 1.0
) -> dict
```

Permutation importance for feature evaluation.

### Display Classes

#### PartialDependenceDisplay { .api }

```python
from sklearn.inspection import PartialDependenceDisplay

PartialDependenceDisplay(
    pd_results: list[dict],
    features: list,
    feature_names: ArrayLike | None = None,
    target_idx: int | None = None,
    deciles: dict | None = None
)
```

Partial Dependence Plot (PDP).

#### DecisionBoundaryDisplay { .api }

```python
from sklearn.inspection import DecisionBoundaryDisplay

DecisionBoundaryDisplay(
    xx0: ArrayLike,
    xx1: ArrayLike,
    response: ArrayLike
)
```

Visualization of decision boundaries of a classifier.

## Isotonic Regression Utilities

### Isotonic Functions

#### check_increasing { .api }

```python
from sklearn.isotonic import check_increasing

check_increasing(
    x: ArrayLike,
    y: ArrayLike
) -> bool
```

Determine whether y is monotonically correlated with x, based on a Spearman correlation test. Returns `True` if y increases with x and `False` if it decreases.

#### isotonic_regression { .api }

```python
from sklearn.isotonic import isotonic_regression

isotonic_regression(
    y: ArrayLike,
    sample_weight: ArrayLike | None = None,
    y_min: float | None = None,
    y_max: float | None = None,
    increasing: bool = True
) -> ArrayLike
```

Solve the isotonic regression model.
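
A short sketch of the two isotonic helpers working together: `check_increasing` picks the direction and `isotonic_regression` produces the closest monotone sequence. The input values are made up for the example.

```python
import numpy as np
from sklearn.isotonic import check_increasing, isotonic_regression

x = np.arange(6)
y = np.array([1.0, 3.0, 2.0, 4.0, 6.0, 5.0])

# Decide the direction from the data, then fit a monotone sequence to y
trend_is_increasing = bool(check_increasing(x, y))
y_fit = isotonic_regression(y, increasing=trend_is_increasing)

# Adjacent violations (3, 2) and (6, 5) are pooled into their means
print(trend_is_increasing, y_fit)
```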

## Neighbors Utilities

### Neighbor Functions

#### kneighbors_graph { .api }

```python
from sklearn.neighbors import kneighbors_graph

kneighbors_graph(
    X: ArrayLike,
    n_neighbors: int,
    mode: str = "connectivity",
    metric: str | Callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    include_self: bool | str = False,
    n_jobs: int | None = None
) -> ArrayLike
```

Compute the (weighted) graph of k-Neighbors for points in X.
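
The difference between the two `mode` values is easiest to see on a tiny input. This sketch assumes four one-dimensional points chosen so that the nearest neighbors are unambiguous; the result is a sparse matrix, converted to dense here only for display.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.array([[0.0], [1.0], [3.0], [7.0]])

# mode="connectivity" stores 1 for each neighbor edge,
# mode="distance" stores the distance to that neighbor instead
connectivity = kneighbors_graph(
    X, n_neighbors=1, mode="connectivity", include_self=False
)
distances = kneighbors_graph(
    X, n_neighbors=1, mode="distance", include_self=False
)

print(connectivity.toarray())
print(distances.toarray())
```

Row `i` describes the neighbors of sample `i`, so the graph is generally not symmetric.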

#### radius_neighbors_graph { .api }

```python
from sklearn.neighbors import radius_neighbors_graph

radius_neighbors_graph(
    X: ArrayLike,
    radius: float,
    mode: str = "connectivity",
    metric: str | Callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    include_self: bool | str = False,
    n_jobs: int | None = None
) -> ArrayLike
```

Compute the (weighted) graph of Neighbors for points in X.

#### sort_graph_by_row_values { .api }

```python
from sklearn.neighbors import sort_graph_by_row_values

sort_graph_by_row_values(
    graph: ArrayLike,
    copy: bool = False,
    warn_when_not_sorted: bool = True
) -> ArrayLike
```

Sort a sparse graph such that each row has its data sorted by value.

### Neighbor Data Structures

#### BallTree { .api }

```python
from sklearn.neighbors import BallTree

BallTree(
    X: ArrayLike,
    leaf_size: int = 40,
    metric: str | DistanceMetric = "minkowski",
    **kwargs
)
```

BallTree for fast generalized N-point problems.

#### KDTree { .api }

```python
from sklearn.neighbors import KDTree

KDTree(
    X: ArrayLike,
    leaf_size: int = 40,
    metric: str = "minkowski",
    **kwargs
)
```

KDTree for fast generalized N-point problems.
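
As a rough sketch of how the tree structures are queried, the example below builds a `KDTree` and runs a k-nearest query and a radius query. `BallTree` exposes the same `query` and `query_radius` methods. The points and query locations are arbitrary.

```python
import numpy as np
from sklearn.neighbors import KDTree

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
tree = KDTree(X, leaf_size=2)

# Two nearest stored points to (0.9, 0.1), closest first
dist, ind = tree.query(np.array([[0.9, 0.1]]), k=2)

# Indices of all stored points within distance 1.5 of the origin
within = tree.query_radius(np.array([[0.0, 0.0]]), r=1.5)

print(ind[0], sorted(within[0].tolist()))
```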

#### KernelDensity { .api }

```python
from sklearn.neighbors import KernelDensity

KernelDensity(
    bandwidth: float | str = 1.0,
    algorithm: str = "auto",
    kernel: str = "gaussian",
    metric: str = "euclidean",
    atol: float = 0,
    rtol: float = 0,
    breadth_first: bool = True,
    leaf_size: int = 40,
    metric_params: dict | None = None
)
```

Kernel Density Estimation.
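
A minimal sketch of fitting and evaluating a density estimate. Note that `score_samples` returns log-densities, so they are exponentiated here. The sample distribution and the bandwidth of 0.5 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=(500, 1))

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(samples)

# Evaluate near the mode and far out in the tail
grid = np.array([[0.0], [4.0]])
log_density = kde.score_samples(grid)  # log of the estimated density
density = np.exp(log_density)

print(density)
```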

#### NearestNeighbors { .api }

```python
from sklearn.neighbors import NearestNeighbors

NearestNeighbors(
    n_neighbors: int = 5,
    radius: float = 1.0,
    algorithm: str = "auto",
    leaf_size: int = 30,
    metric: str | Callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

Unsupervised learner for implementing neighbor searches.
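
The sketch below shows one common use: finding, for every training point, its nearest other training point. Calling `kneighbors()` without arguments queries the fitted data and excludes each point from its own neighbor list. The coordinates are arbitrary.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
nn = NearestNeighbors(n_neighbors=1).fit(X)

# No query array: neighbors of the training points, self excluded
distances, indices = nn.kneighbors()

print(indices[:, 0].tolist())
```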

#### KNeighborsTransformer { .api }

```python
from sklearn.neighbors import KNeighborsTransformer

KNeighborsTransformer(
    mode: str = "distance",
    n_neighbors: int = 5,
    algorithm: str = "auto",
    leaf_size: int = 30,
    metric: str | Callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

Transform X into a (weighted) graph of k nearest neighbors.

#### RadiusNeighborsTransformer { .api }

```python
from sklearn.neighbors import RadiusNeighborsTransformer

RadiusNeighborsTransformer(
    mode: str = "distance",
    radius: float = 1.0,
    algorithm: str = "auto",
    leaf_size: int = 30,
    metric: str | Callable = "minkowski",
    p: int = 2,
    metric_params: dict | None = None,
    n_jobs: int | None = None
)
```

Transform X into a (weighted) graph of neighbors nearer than a radius.

#### NeighborhoodComponentsAnalysis { .api }

```python
from sklearn.neighbors import NeighborhoodComponentsAnalysis

NeighborhoodComponentsAnalysis(
    n_components: int | None = None,
    init: str | ArrayLike = "auto",
    warm_start: bool = False,
    max_iter: int = 50,
    tol: float = 1e-05,
    callback: Callable | None = None,
    verbose: int = 0,
    random_state: int | RandomState | None = None
)
```

Neighborhood Components Analysis.

### Neighbor Constants

#### VALID_METRICS { .api }

```python
from sklearn.neighbors import VALID_METRICS

# Dictionary mapping algorithm names to valid metrics
VALID_METRICS: dict[str, list[str]]
```

Valid metrics for neighbor algorithms.

#### VALID_METRICS_SPARSE { .api }

```python
from sklearn.neighbors import VALID_METRICS_SPARSE

# Dictionary mapping algorithm names to valid metrics for sparse matrices
VALID_METRICS_SPARSE: dict[str, list[str]]
```

Valid metrics for neighbor algorithms with sparse matrices.
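
One way these dictionaries can be used is to check whether a metric is supported before choosing an algorithm. The sketch below looks up the `"cosine"` metric. The exact contents of each list may vary between scikit-learn versions, so the printed values are not shown.

```python
from sklearn.neighbors import VALID_METRICS, VALID_METRICS_SPARSE

# Keys are the algorithm names accepted by the neighbors estimators
algorithms = sorted(VALID_METRICS)

# Tree-based algorithms only support true distance metrics,
# while brute force accepts a wider set
kd_tree_supports_cosine = "cosine" in VALID_METRICS["kd_tree"]
brute_supports_cosine = "cosine" in VALID_METRICS["brute"]

print(algorithms, kd_tree_supports_cosine, brute_supports_cosine)
```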

## Exception Classes

#### NotFittedError { .api }

```python
from sklearn.exceptions import NotFittedError

class NotFittedError(ValueError, AttributeError):
    """Exception class to raise if estimator is used before fitting."""
    pass
```

Exception class to raise if estimator is used before fitting.
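
A small sketch of how this exception typically surfaces. It uses `check_is_fitted` from `sklearn.utils.validation`, which is not covered in this document, and an unfitted `LinearRegression` chosen purely for illustration.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression
from sklearn.utils.validation import check_is_fitted

model = LinearRegression()

# Explicit check before using an estimator
try:
    check_is_fitted(model)
    fitted = True
except NotFittedError:
    fitted = False

# Because NotFittedError subclasses ValueError (and AttributeError),
# older code that catches ValueError keeps working
caught_as_value_error = False
try:
    model.predict([[1.0]])
except ValueError:
    caught_as_value_error = True

print(fitted, caught_as_value_error)
```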

#### ConvergenceWarning { .api }

```python
from sklearn.exceptions import ConvergenceWarning

class ConvergenceWarning(UserWarning):
    """Custom warning to capture convergence problems."""
    pass
```

Custom warning to capture convergence problems.

#### DataConversionWarning { .api }

```python
from sklearn.exceptions import DataConversionWarning

class DataConversionWarning(UserWarning):
    """Warning used to notify implicit data conversions happening in the code."""
    pass
```

Warning used to notify implicit data conversions happening in the code.

#### DataDimensionalityWarning { .api }

```python
from sklearn.exceptions import DataDimensionalityWarning

class DataDimensionalityWarning(UserWarning):
    """Custom warning to capture data dimensionality problems."""
    pass
```

Custom warning to capture data dimensionality problems.

#### EfficiencyWarning { .api }

```python
from sklearn.exceptions import EfficiencyWarning

class EfficiencyWarning(UserWarning):
    """Warning used to notify the user of inefficient computation."""
    pass
```

Warning used to notify the user of inefficient computation.

#### EstimatorCheckFailedWarning { .api }

```python
from sklearn.exceptions import EstimatorCheckFailedWarning

class EstimatorCheckFailedWarning(UserWarning):
    """Warning used when an estimator check fails."""
    pass
```

Warning used when an estimator check fails.

#### FitFailedWarning { .api }

```python
from sklearn.exceptions import FitFailedWarning

class FitFailedWarning(RuntimeWarning):
    """Warning class used if there is an error while fitting the estimator."""
    pass
```

Warning class used if there is an error while fitting the estimator.

#### PositiveSpectrumWarning { .api }

```python
from sklearn.exceptions import PositiveSpectrumWarning

class PositiveSpectrumWarning(UserWarning):
    """Warning raised when the eigenvalues of a PSD matrix have issues."""
    pass
```

Warning raised when the eigenvalues of a PSD matrix have issues.

#### SkipTestWarning { .api }

```python
from sklearn.exceptions import SkipTestWarning

class SkipTestWarning(UserWarning):
    """Warning class used to notify the user of a test that was skipped."""
    pass
```

Warning class used to notify the user of a test that was skipped.

#### UndefinedMetricWarning { .api }

```python
from sklearn.exceptions import UndefinedMetricWarning

class UndefinedMetricWarning(UserWarning):
    """Warning used when the metric is invalid."""
    pass
```

Warning used when the metric is invalid.

#### UnsetMetadataPassedError { .api }

```python
from sklearn.exceptions import UnsetMetadataPassedError

class UnsetMetadataPassedError(ValueError):
    """Exception when metadata is passed which is not explicitly requested."""
    pass
```

Exception when metadata is passed which is not explicitly requested.

## Frozen Estimators

#### FrozenEstimator { .api }

```python
from sklearn.frozen import FrozenEstimator

FrozenEstimator(
    estimator: BaseEstimator
)
```

Wrapper around an already fitted estimator that prevents it from being refit: calling `fit` on the wrapper leaves the inner estimator unchanged, while other methods are delegated to it.

## Examples

### Basic Pipeline Example

```python
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Method 1: Using Pipeline class
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])

# Method 2: Using make_pipeline function
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression()
)

# Fit and predict
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

### Column Transformer Example

```python
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import pandas as pd

# Example with mixed data types
data = pd.DataFrame({
    'age': [25, 30, 35],
    'income': [50000, 60000, 70000],
    'city': ['NYC', 'LA', 'Chicago'],
    'gender': ['M', 'F', 'M']
})

# Method 1: Using ColumnTransformer class
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'income']),
    ('cat', OneHotEncoder(), ['city', 'gender'])
])

# Method 2: Using make_column_transformer function
preprocessor = make_column_transformer(
    (StandardScaler(), ['age', 'income']),
    (OneHotEncoder(), ['city', 'gender'])
)

# Transform data
transformed = preprocessor.fit_transform(data)
```

### Feature Union Example

```python
from sklearn.pipeline import FeatureUnion, make_union
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.datasets import load_iris

# Load data (SelectKBest needs the labels y to score features)
X, y = load_iris(return_X_y=True)

# Combine PCA and feature selection
feature_union = FeatureUnion([
    ('pca', PCA(n_components=2)),
    ('select_k_best', SelectKBest(k=2))
])

# Or using make_union
feature_union = make_union(
    PCA(n_components=2),
    SelectKBest(k=2)
)

# Transform features: 2 PCA components + 2 selected columns
X_combined = feature_union.fit_transform(X, y)
```

### Configuration Example

```python
from sklearn import set_config, get_config, config_context
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Load a regression dataset
X, y = load_diabetes(return_X_y=True)

# Get current config
current_config = get_config()
print(current_config)

# Set global configuration
set_config(display='diagram', print_changed_only=True)

# Use configuration context
with config_context(assume_finite=True):
    # Operations within this block use assume_finite=True
    model = LinearRegression()
    model.fit(X, y)

# Configuration reverts to previous state outside the context
```

### Partial Dependence Example

```python
from sklearn.inspection import partial_dependence, PartialDependenceDisplay
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import load_diabetes
import matplotlib.pyplot as plt

# Load a regression dataset
X, y = load_diabetes(return_X_y=True)

# Train model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)

# Compute the joint partial dependence of features 0 and 1
pd_result = partial_dependence(
    model, X, features=[0, 1],
    grid_resolution=20
)

# Create display: from_estimator computes and draws one plot per feature,
# so no separate display.plot() call is needed
display = PartialDependenceDisplay.from_estimator(
    model, X, features=[0, 1]
)
plt.show()
```

### Permutation Importance Example

```python
from sklearn.inspection import permutation_importance
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import load_diabetes

# Fit a model to inspect
X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# Calculate permutation importance
result = permutation_importance(
    model, X, y, n_repeats=10, random_state=42
)

# Get importance scores
importance_scores = result.importances_mean
importance_std = result.importances_std

# Print results
for i, (score, std) in enumerate(zip(importance_scores, importance_std)):
    print(f"Feature {i}: {score:.3f} +/- {std:.3f}")
```