
# Data Preprocessing and Feature Engineering

This document covers scikit-learn's data preprocessing, feature engineering, and feature selection APIs, along with feature extraction, imputation, kernel approximation, and random projection.

## Scaling and Normalization

#### StandardScaler { .api }

```python
from sklearn.preprocessing import StandardScaler

StandardScaler(
    copy: bool = True,
    with_mean: bool = True,
    with_std: bool = True
)
```

Standardize features by removing the mean and scaling to unit variance.
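A minimal sketch of the fit/transform workflow, assuming a small made-up array (the values are illustrative only). It shows that each column ends up with zero mean and unit standard deviation, and that `inverse_transform` recovers the input.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: two features on very different scales (illustrative values).
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Per-column statistics of the scaled data (population standard deviation).
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

# inverse_transform maps the scaled values back to the original units.
X_back = scaler.inverse_transform(X_scaled)
```

In practice the scaler is usually fitted on training data only and then applied to validation or test data with `transform`.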

#### MinMaxScaler { .api }

```python
from sklearn.preprocessing import MinMaxScaler

MinMaxScaler(
    feature_range: tuple[float, float] = (0, 1),
    copy: bool = True,
    clip: bool = False
)
```

Transform features by scaling each feature to a given range.

#### MaxAbsScaler { .api }

```python
from sklearn.preprocessing import MaxAbsScaler

MaxAbsScaler(
    copy: bool = True
)
```

Scale each feature by its maximum absolute value.

#### RobustScaler { .api }

```python
from sklearn.preprocessing import RobustScaler

RobustScaler(
    with_centering: bool = True,
    with_scaling: bool = True,
    quantile_range: tuple[float, float] = (25.0, 75.0),
    copy: bool = True,
    unit_variance: bool = False
)
```

Scale features using statistics that are robust to outliers.
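The sketch below, using a made-up column with one extreme outlier, illustrates why the median and interquartile range are preferred here: the outlier barely affects the learned statistics. The data values are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with a single extreme outlier (illustrative values).
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

scaler = RobustScaler()  # default quantile_range=(25.0, 75.0)
X_scaled = scaler.fit_transform(X)

# center_ holds the per-feature median, scale_ the interquartile range.
median = scaler.center_[0]
iqr = scaler.scale_[0]
```

The sample at the median maps to 0, while the outlier stays far from the bulk of the data instead of compressing it.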

#### Normalizer { .api }

```python
from sklearn.preprocessing import Normalizer

Normalizer(
    norm: str = "l2",
    copy: bool = True
)
```

Normalize samples individually to unit norm.
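Unlike the scalers above, `Normalizer` works per row (sample) rather than per column. A small sketch with made-up rows:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Two samples; the first has Euclidean length 5 (illustrative values).
X = np.array([[3.0, 4.0], [1.0, 0.0]])

X_l2 = Normalizer(norm="l2").fit_transform(X)

# Every row of the result has unit L2 norm.
row_norms = np.linalg.norm(X_l2, axis=1)
```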

#### QuantileTransformer { .api }

```python
from sklearn.preprocessing import QuantileTransformer

QuantileTransformer(
    n_quantiles: int = 1000,
    output_distribution: str = "uniform",
    ignore_implicit_zeros: bool = False,
    subsample: int = 100000,
    random_state: int | RandomState | None = None,
    copy: bool = True
)
```

Transform features to follow a uniform or a normal distribution.

#### PowerTransformer { .api }

```python
from sklearn.preprocessing import PowerTransformer

PowerTransformer(
    method: str = "yeo-johnson",
    standardize: bool = True,
    copy: bool = True
)
```

Apply a power transform featurewise to make data more Gaussian-like.
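As a rough sketch, the transformer can be applied to right-skewed data. The log-normal sample below is synthetic and chosen only for illustration; with `standardize=True` the output is also rescaled to zero mean and unit variance.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Synthetic right-skewed, strictly positive data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))

pt = PowerTransformer(method="yeo-johnson", standardize=True)
X_t = pt.fit_transform(X)

# lambdas_ holds the fitted power parameter, one per feature.
lam = pt.lambdas_
```

The `"box-cox"` method is an alternative, but it only accepts strictly positive inputs.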

## Encoding

#### LabelEncoder { .api }

```python
from sklearn.preprocessing import LabelEncoder

LabelEncoder()
```

Encode target labels with value between 0 and n_classes-1.

#### LabelBinarizer { .api }

```python
from sklearn.preprocessing import LabelBinarizer

LabelBinarizer(
    neg_label: int = 0,
    pos_label: int = 1,
    sparse_output: bool = False
)
```

Binarize labels in a one-vs-all fashion.

#### MultiLabelBinarizer { .api }

```python
from sklearn.preprocessing import MultiLabelBinarizer

MultiLabelBinarizer(
    classes: ArrayLike | None = None,
    sparse_output: bool = False
)
```

Transform between iterable of iterables and a multilabel format.

#### OneHotEncoder { .api }

```python
from sklearn.preprocessing import OneHotEncoder

OneHotEncoder(
    categories: str | list[ArrayLike] = "auto",
    drop: str | ArrayLike | None = None,
    sparse_output: bool = True,
    dtype: type = ...,
    handle_unknown: str = "error",
    min_frequency: int | float | None = None,
    max_categories: int | None = None,
    feature_name_combiner: str | Callable = "concat"
)
```

Encode categorical features as a one-hot numeric array.

#### OrdinalEncoder { .api }

```python
from sklearn.preprocessing import OrdinalEncoder

OrdinalEncoder(
    categories: str | list[ArrayLike] = "auto",
    dtype: type = ...,
    handle_unknown: str = "error",
    unknown_value: int | float | None = None,
    encoded_missing_value: int | float = ...,
    min_frequency: int | float | None = None,
    max_categories: int | None = None
)
```

Encode categorical features as an integer array.
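The sketch below assumes a made-up ordered category list and shows how `categories`, `handle_unknown`, and `unknown_value` interact: the explicit list fixes the integer order, and unseen values map to the chosen sentinel.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Illustrative ordered levels; "extreme" is not seen during fit.
X_train = np.array([["low"], ["medium"], ["high"]], dtype=object)
X_new = np.array([["medium"], ["extreme"]], dtype=object)

enc = OrdinalEncoder(
    categories=[["low", "medium", "high"]],  # explicit order: low=0, medium=1, high=2
    handle_unknown="use_encoded_value",
    unknown_value=-1,                        # sentinel for unseen categories
)
enc.fit(X_train)
codes = enc.transform(X_new)
```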

#### TargetEncoder { .api }

```python
from sklearn.preprocessing import TargetEncoder

TargetEncoder(
    categories: str | list[ArrayLike] = "auto",
    target_type: str = "auto",
    smooth: str | float = "auto",
    cv: int | BaseCrossValidator | Iterable = 5,
    shuffle: bool = True,
    random_state: int | RandomState | None = None
)
```

Target Encoder for regression and classification targets.

#### KBinsDiscretizer { .api }

```python
from sklearn.preprocessing import KBinsDiscretizer

KBinsDiscretizer(
    n_bins: int | ArrayLike = 5,
    encode: str = "onehot",
    strategy: str = "quantile",
    dtype: type | None = None,
    subsample: int | None = 200000,
    random_state: int | RandomState | None = None
)
```

Bin continuous data into intervals.
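A small sketch of equal-width binning on made-up values. `encode="ordinal"` and `strategy="uniform"` are chosen here (instead of the defaults) so the bin indices and edges are easy to read.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Six evenly spaced values in one feature (illustrative).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
X_binned = disc.fit_transform(X)

# bin_edges_ stores one array of edges per feature.
edges = disc.bin_edges_[0]
```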

#### Binarizer { .api }

```python
from sklearn.preprocessing import Binarizer

Binarizer(
    threshold: float = 0.0,
    copy: bool = True
)
```

Binarize data (set feature values to 0 or 1) according to a threshold.

## Feature Engineering

#### PolynomialFeatures { .api }

```python
from sklearn.preprocessing import PolynomialFeatures

PolynomialFeatures(
    degree: int = 2,
    interaction_only: bool = False,
    include_bias: bool = True,
    order: str = "C"
)
```

Generate polynomial and interaction features.

#### SplineTransformer { .api }

```python
from sklearn.preprocessing import SplineTransformer

SplineTransformer(
    n_knots: int = 5,
    degree: int = 3,
    knots: str | ArrayLike = "uniform",
    extrapolation: str = "constant",
    include_bias: bool = True,
    order: str = "C",
    sparse_output: bool = False
)
```

Generate univariate B-spline bases for features.

#### FunctionTransformer { .api }

```python
from sklearn.preprocessing import FunctionTransformer

FunctionTransformer(
    func: Callable | None = None,
    inverse_func: Callable | None = None,
    validate: bool = False,
    accept_sparse: bool = False,
    check_inverse: bool = True,
    feature_names_out: str | Callable | None = None,
    kw_args: dict | None = None,
    inv_kw_args: dict | None = None
)
```

Constructs a transformer from an arbitrary callable.
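One common use, sketched here with made-up data, is wrapping a NumPy function so it can sit inside a pipeline. The choice of `np.log1p` and `np.expm1` as the forward and inverse functions is an illustration, not something this document prescribes.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Wrap log(1 + x) and its exact inverse as a stateless transformer.
log_tf = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

X = np.array([[0.0, 1.0], [9.0, 99.0]])
X_log = log_tf.fit_transform(X)
X_back = log_tf.inverse_transform(X_log)
```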

#### KernelCenterer { .api }

```python
from sklearn.preprocessing import KernelCenterer

KernelCenterer()
```

Center a kernel matrix.

## Feature Selection

### Univariate Selection

#### SelectKBest { .api }

```python
from sklearn.feature_selection import SelectKBest

SelectKBest(
    score_func: Callable = ...,
    k: int | str = 10
)
```

Select features according to the k highest scores.
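A minimal sketch on the bundled iris dataset, assuming `f_classif` as the scoring function and `k=2` purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

# Boolean mask over the original columns; True marks a kept feature.
mask = selector.get_support()
```

After fitting, `selector.scores_` and `selector.pvalues_` expose the per-feature statistics that drove the selection.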

#### SelectPercentile { .api }

```python
from sklearn.feature_selection import SelectPercentile

SelectPercentile(
    score_func: Callable = ...,
    percentile: int = 10
)
```

Select features according to a percentile of the highest scores.

#### SelectFpr { .api }

```python
from sklearn.feature_selection import SelectFpr

SelectFpr(
    score_func: Callable = ...,
    alpha: float = 0.05
)
```

Filter: Select the p-values below alpha based on a FPR test.

#### SelectFdr { .api }

```python
from sklearn.feature_selection import SelectFdr

SelectFdr(
    score_func: Callable = ...,
    alpha: float = 0.05
)
```

Filter: Select the p-values for an estimated false discovery rate.

#### SelectFwe { .api }

```python
from sklearn.feature_selection import SelectFwe

SelectFwe(
    score_func: Callable = ...,
    alpha: float = 0.05
)
```

Filter: Select the p-values corresponding to Family-wise error rate.

#### GenericUnivariateSelect { .api }

```python
from sklearn.feature_selection import GenericUnivariateSelect

GenericUnivariateSelect(
    score_func: Callable = ...,
    mode: str = "percentile",
    param: int | float = 1e-05
)
```

Univariate feature selector with configurable strategy.

### Model-based Selection

#### SelectFromModel { .api }

```python
from sklearn.feature_selection import SelectFromModel

SelectFromModel(
    estimator: BaseEstimator,
    threshold: str | float | None = None,
    prefit: bool = False,
    norm_order: int = 1,
    max_features: int | Callable | None = None,
    importance_getter: str | Callable = "auto"
)
```

Meta-transformer for selecting features based on importance weights.

### Recursive Feature Elimination

#### RFE { .api }

```python
from sklearn.feature_selection import RFE

RFE(
    estimator: BaseEstimator,
    n_features_to_select: int | float | None = None,
    step: int | float = 1,
    verbose: int = 0,
    importance_getter: str | Callable = "auto"
)
```

Feature ranking with recursive feature elimination.

#### RFECV { .api }

```python
from sklearn.feature_selection import RFECV

RFECV(
    estimator: BaseEstimator,
    step: int | float = 1,
    min_features_to_select: int = 1,
    cv: int | BaseCrossValidator | Iterable | None = None,
    scoring: str | Callable | None = None,
    verbose: int = 0,
    n_jobs: int | None = None,
    importance_getter: str | Callable = "auto"
)
```

Recursive feature elimination with cross-validation.

### Sequential Feature Selection

#### SequentialFeatureSelector { .api }

```python
from sklearn.feature_selection import SequentialFeatureSelector

SequentialFeatureSelector(
    estimator: BaseEstimator,
    n_features_to_select: int | float | str = "auto",
    tol: float | None = None,
    direction: str = "forward",
    scoring: str | Callable | None = None,
    cv: int | BaseCrossValidator | Iterable = 5,
    n_jobs: int | None = None
)
```

Sequential Feature Selector.

### Variance-based Selection

#### VarianceThreshold { .api }

```python
from sklearn.feature_selection import VarianceThreshold

VarianceThreshold(
    threshold: float = 0.0
)
```

Feature selector that removes all low-variance features.
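With the default `threshold=0.0`, only constant features are removed. A sketch on a made-up matrix whose first column never changes:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Column 0 is constant; columns 1 and 2 vary (illustrative values).
X = np.array([[0, 2, 0], [0, 1, 4], [0, 3, 1]])

sel = VarianceThreshold()  # threshold=0.0 drops zero-variance features
X_reduced = sel.fit_transform(X)

kept = sel.get_support()
```

Because the selector looks only at `X` and never at a target, it can also be used in unsupervised settings.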

### Base Classes

#### SelectorMixin { .api }

```python
from sklearn.feature_selection import SelectorMixin

SelectorMixin()
```

Transformer mixin that performs feature selection given a support mask.

## Feature Selection Functions

### Statistical Tests

#### chi2 { .api }

```python
from sklearn.feature_selection import chi2

chi2(
    X: ArrayLike,
    y: ArrayLike
) -> tuple[ArrayLike, ArrayLike]
```

Compute chi-squared stats between each non-negative feature and class.

#### f_classif { .api }

```python
from sklearn.feature_selection import f_classif

f_classif(
    X: ArrayLike,
    y: ArrayLike
) -> tuple[ArrayLike, ArrayLike]
```

Compute the ANOVA F-value for the provided sample.

#### f_oneway { .api }

```python
from sklearn.feature_selection import f_oneway

f_oneway(
    *samples: ArrayLike
) -> tuple[ArrayLike, ArrayLike]
```

Test for equal means in two or more samples from the normal distribution.

#### f_regression { .api }

```python
from sklearn.feature_selection import f_regression

f_regression(
    X: ArrayLike,
    y: ArrayLike,
    center: bool = True,
    force_finite: bool = True
) -> tuple[ArrayLike, ArrayLike]
```

Univariate linear regression tests returning F-statistic and p-values.

#### r_regression { .api }

```python
from sklearn.feature_selection import r_regression

r_regression(
    X: ArrayLike,
    y: ArrayLike,
    center: bool = True,
    force_finite: bool = True
) -> ArrayLike
```

Compute Pearson's r for each feature with the target. Unlike `f_regression`, it returns only the correlation coefficients, with no p-values.

### Mutual Information

#### mutual_info_classif { .api }

```python
from sklearn.feature_selection import mutual_info_classif

mutual_info_classif(
    X: ArrayLike,
    y: ArrayLike,
    discrete_features: str | bool | ArrayLike = "auto",
    n_neighbors: int = 3,
    copy: bool = True,
    random_state: int | RandomState | None = None
) -> ArrayLike
```

Estimate mutual information for a discrete target variable.

#### mutual_info_regression { .api }

```python
from sklearn.feature_selection import mutual_info_regression

mutual_info_regression(
    X: ArrayLike,
    y: ArrayLike,
    discrete_features: str | bool | ArrayLike = "auto",
    n_neighbors: int = 3,
    copy: bool = True,
    random_state: int | RandomState | None = None
) -> ArrayLike
```

Estimate mutual information for a continuous target variable.

## Preprocessing Functions

### Scaling Functions

#### scale { .api }

```python
from sklearn.preprocessing import scale

scale(
    X: ArrayLike,
    axis: int = 0,
    with_mean: bool = True,
    with_std: bool = True,
    copy: bool = True
) -> ArrayLike
```

Standardize a dataset along any axis.

#### minmax_scale { .api }

```python
from sklearn.preprocessing import minmax_scale

minmax_scale(
    X: ArrayLike,
    feature_range: tuple[float, float] = (0, 1),
    axis: int = 0,
    copy: bool = True
) -> ArrayLike
```

Transform features by scaling each feature to a given range.

#### maxabs_scale { .api }

```python
from sklearn.preprocessing import maxabs_scale

maxabs_scale(
    X: ArrayLike,
    axis: int = 0,
    copy: bool = True
) -> ArrayLike
```

Scale each feature to the [-1, 1] range without breaking sparsity.

#### robust_scale { .api }

```python
from sklearn.preprocessing import robust_scale

robust_scale(
    X: ArrayLike,
    axis: int = 0,
    with_centering: bool = True,
    with_scaling: bool = True,
    quantile_range: tuple[float, float] = (25.0, 75.0),
    copy: bool = True,
    unit_variance: bool = False
) -> ArrayLike
```

Standardize a dataset along any axis, centering to the median and scaling by the interquartile range.

#### normalize { .api }

```python
from sklearn.preprocessing import normalize

normalize(
    X: ArrayLike,
    norm: str = "l2",
    axis: int = 1,
    copy: bool = True,
    return_norm: bool = False
) -> ArrayLike | tuple[ArrayLike, ArrayLike]
```

Scale input vectors individually to unit norm (vector length).

#### quantile_transform { .api }

```python
from sklearn.preprocessing import quantile_transform

quantile_transform(
    X: ArrayLike,
    axis: int = 0,
    n_quantiles: int = 1000,
    output_distribution: str = "uniform",
    ignore_implicit_zeros: bool = False,
    subsample: int = 100000,
    random_state: int | RandomState | None = None,
    copy: bool = True
) -> ArrayLike
```

Transform features to follow a uniform or a normal distribution.

#### power_transform { .api }

```python
from sklearn.preprocessing import power_transform

power_transform(
    X: ArrayLike,
    method: str = "yeo-johnson",
    standardize: bool = True,
    copy: bool = True
) -> ArrayLike
```

Apply a power transform featurewise to make data more Gaussian-like.

### Encoding Functions

#### label_binarize { .api }

```python
from sklearn.preprocessing import label_binarize

label_binarize(
    y: ArrayLike,
    classes: ArrayLike,
    neg_label: int = 0,
    pos_label: int = 1,
    sparse_output: bool = False
) -> ArrayLike
```

Binarize labels in a one-vs-all fashion.

#### binarize { .api }

```python
from sklearn.preprocessing import binarize

binarize(
    X: ArrayLike,
    threshold: float = 0.0,
    copy: bool = True
) -> ArrayLike
```

Boolean thresholding of array-like or scipy.sparse matrix.

#### add_dummy_feature { .api }

```python
from sklearn.preprocessing import add_dummy_feature

add_dummy_feature(
    X: ArrayLike,
    value: float = 1.0
) -> ArrayLike
```

Augment dataset with an additional dummy feature.

## Feature Extraction

### Text Feature Extraction

#### DictVectorizer { .api }

```python
from sklearn.feature_extraction import DictVectorizer

DictVectorizer(
    dtype: type = ...,
    separator: str = "=",
    sparse: bool = True,
    sort: bool = True
)
```

Transforms lists of feature-value mappings to vectors.

#### FeatureHasher { .api }

```python
from sklearn.feature_extraction import FeatureHasher

FeatureHasher(
    n_features: int = 1048576,
    input_type: str = "dict",
    dtype: type = ...,
    alternate_sign: bool = True
)
```

Implements feature hashing, aka the hashing trick.

### Image Feature Extraction

#### img_to_graph { .api }

```python
from sklearn.feature_extraction import img_to_graph

img_to_graph(
    img: ArrayLike,
    mask: ArrayLike | None = None,
    return_as: type = ...,
    dtype: type | None = None
) -> ArrayLike
```

Graph of the pixel-to-pixel gradient connections.

#### grid_to_graph { .api }

```python
from sklearn.feature_extraction import grid_to_graph

grid_to_graph(
    n_x: int,
    n_y: int,
    n_z: int = 1,
    mask: ArrayLike | None = None,
    return_as: type = ...,
    dtype: type = ...
) -> ArrayLike
```

Graph of the pixel-to-pixel connections. Edges exist if two voxels are connected.

## Imputation

### Simple Imputation

#### SimpleImputer { .api }

```python
from sklearn.impute import SimpleImputer

SimpleImputer(
    missing_values: int | float | str | None = ...,
    strategy: str = "mean",
    fill_value: str | int | float | None = None,
    copy: bool = True,
    add_indicator: bool = False,
    keep_empty_features: bool = False
)
```

Imputation transformer for completing missing values.
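The sketch below uses a made-up matrix with `NaN` entries to show mean imputation together with `add_indicator=True`, which appends one binary column per feature that had missing values during `fit`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Two features, each with one missing entry (illustrative values).
X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])

imp = SimpleImputer(strategy="mean", add_indicator=True)
X_imp = imp.fit_transform(X)

# statistics_ holds the per-feature fill values learned during fit.
fill_values = imp.statistics_
```

The first two output columns are the imputed features; the last two are the missing-value indicators.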

### Advanced Imputation

#### KNNImputer { .api }

```python
from sklearn.impute import KNNImputer

KNNImputer(
    missing_values: int | float | str | None = ...,
    n_neighbors: int = 5,
    weights: str | Callable = "uniform",
    metric: str | Callable = "nan_euclidean",
    copy: bool = True,
    add_indicator: bool = False,
    keep_empty_features: bool = False
)
```

Imputation for completing missing values using k-Nearest Neighbors.

### Missing Value Indicators

#### MissingIndicator { .api }

```python
from sklearn.impute import MissingIndicator

MissingIndicator(
    missing_values: int | float | str | None = ...,
    features: str = "missing-only",
    sparse: bool | str = "auto",
    error_on_new: bool = True
)
```

Binary indicators for missing values.

## Kernel Approximation

### RBF Kernel Approximation

#### RBFSampler { .api }

```python
from sklearn.kernel_approximation import RBFSampler

RBFSampler(
    gamma: float = 1.0,
    n_components: int = 100,
    random_state: int | RandomState | None = None
)
```

Approximate an RBF kernel feature map using random Fourier features.

#### Nystroem { .api }

```python
from sklearn.kernel_approximation import Nystroem

Nystroem(
    kernel: str | Callable = "rbf",
    gamma: float | None = None,
    coef0: float | None = None,
    degree: float | None = None,
    kernel_params: dict | None = None,
    n_components: int = 100,
    random_state: int | RandomState | None = None,
    n_jobs: int | None = None
)
```

Approximate a kernel map using a subset of the training data.

### Chi-squared Kernel Approximation

#### AdditiveChi2Sampler { .api }

```python
from sklearn.kernel_approximation import AdditiveChi2Sampler

AdditiveChi2Sampler(
    sample_steps: int = 2,
    sample_interval: float | None = None
)
```

Approximate feature map for additive chi2 kernel.

#### SkewedChi2Sampler { .api }

```python
from sklearn.kernel_approximation import SkewedChi2Sampler

SkewedChi2Sampler(
    skewedness: float = 1.0,
    n_components: int = 100,
    random_state: int | RandomState | None = None
)
```

Approximate feature map for "skewed chi-squared" kernel.

### Polynomial Kernel Approximation

#### PolynomialCountSketch { .api }

```python
from sklearn.kernel_approximation import PolynomialCountSketch

PolynomialCountSketch(
    gamma: float = 1.0,
    degree: int = 2,
    coef0: int = 0,
    n_components: int = 100,
    random_state: int | RandomState | None = None
)
```

Polynomial kernel approximation via Tensor Sketch.

## Random Projection

#### GaussianRandomProjection { .api }

```python
from sklearn.random_projection import GaussianRandomProjection

GaussianRandomProjection(
    n_components: int | str = "auto",
    eps: float = 0.1,
    random_state: int | RandomState | None = None,
    compute_inverse_components: bool = False
)
```

Reduce dimensionality through Gaussian random projection.

#### SparseRandomProjection { .api }

```python
from sklearn.random_projection import SparseRandomProjection

SparseRandomProjection(
    n_components: int | str = "auto",
    density: float | str = "auto",
    eps: float = 0.1,
    dense_output: bool = False,
    random_state: int | RandomState | None = None,
    compute_inverse_components: bool = False
)
```

Reduce dimensionality through sparse random projection.

### Random Projection Functions

#### johnson_lindenstrauss_min_dim { .api }

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

johnson_lindenstrauss_min_dim(
    n_samples: int,
    eps: float | ArrayLike = 0.1
) -> int | ArrayLike
```

Find a 'safe' number of components to randomly project to.
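A brief sketch of how the bound reacts to the distortion tolerance `eps`. The sample count of one million is an arbitrary illustrative choice; the point is that a tighter tolerance requires many more components.

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# Minimum dimensions for 1,000,000 samples at two distortion levels.
k_loose = johnson_lindenstrauss_min_dim(n_samples=1_000_000, eps=0.5)
k_tight = johnson_lindenstrauss_min_dim(n_samples=1_000_000, eps=0.1)
```

This is the same bound the projection classes above use when `n_components="auto"`, so it can help judge beforehand whether a random projection will actually reduce dimensionality for a given dataset.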

## Examples

### Basic Preprocessing Pipeline

```python
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# Create preprocessing pipeline
numeric_features = ['age', 'income', 'score']
categorical_features = ['city', 'gender']

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)
```

### Feature Selection Pipeline

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Univariate feature selection: keep the 10 highest-scoring features
selector = SelectKBest(score_func=f_classif, k=10)

# Recursive feature elimination driven by a random forest's feature importances
rfe = RFE(estimator=RandomForestClassifier(n_estimators=100), n_features_to_select=10)

# Complete pipeline: scale, select features, then classify
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('selector', selector),
    ('classifier', RandomForestClassifier())
])
```