# Data Preprocessing and Feature Engineering

This document covers the data preprocessing, feature engineering, feature selection, feature extraction, imputation, kernel approximation, and random projection APIs in scikit-learn.

## Scaling and Normalization

#### StandardScaler { .api }
```python
from sklearn.preprocessing import StandardScaler

StandardScaler(
    copy: bool = True,
    with_mean: bool = True,
    with_std: bool = True
)
```
Standardize features by removing the mean and scaling to unit variance.
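
Here is a minimal usage sketch on made-up toy data, assuming NumPy is installed. Each column is shifted by its mean and divided by its population standard deviation, which puts both features on a comparable scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: two features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.mean_)           # per-feature means learned during fit: [2. 200.]
print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```

The fitted `mean_` and `scale_` attributes are reused by `transform`, so new data is scaled with the training statistics.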

#### MinMaxScaler { .api }
```python
from sklearn.preprocessing import MinMaxScaler

MinMaxScaler(
    feature_range: tuple[float, float] = (0, 1),
    copy: bool = True,
    clip: bool = False
)
```
Transform features by scaling each feature to a given range.

#### MaxAbsScaler { .api }
```python
from sklearn.preprocessing import MaxAbsScaler

MaxAbsScaler(
    copy: bool = True
)
```
Scale each feature by its maximum absolute value.

#### RobustScaler { .api }
```python
from sklearn.preprocessing import RobustScaler

RobustScaler(
    with_centering: bool = True,
    with_scaling: bool = True,
    quantile_range: tuple[float, float] = (25.0, 75.0),
    copy: bool = True,
    unit_variance: bool = False
)
```
Scale features using statistics that are robust to outliers. The scaler centers each feature on its median and scales it by the interquartile range defined by `quantile_range`.
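
The sketch below illustrates the robustness claim on toy data. It fits `RobustScaler` on a column that contains one extreme outlier. The median and interquartile range barely move, so the inliers keep a sensible scale and the outlier stays clearly separated.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with an extreme outlier (100.0)
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()  # median centering, 25th-75th percentile scaling
X_scaled = scaler.fit_transform(X)

print(scaler.center_)    # median of the column: [3.]
print(scaler.scale_)     # IQR = 75th percentile (4) - 25th percentile (2): [2.]
print(X_scaled.ravel())  # inliers land in [-1, 0.5]; the outlier maps to 48.5
```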

#### Normalizer { .api }
```python
from sklearn.preprocessing import Normalizer

Normalizer(
    norm: str = "l2",
    copy: bool = True
)
```
Normalize samples (rows) individually to unit norm.
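
The scalers above work column by column, but `Normalizer` works row by row. Here is a small sketch on toy data:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 0.0]])

# Each row (sample) is divided by its own L2 norm; columns are never compared
X_unit = Normalizer(norm="l2").fit_transform(X)

print(X_unit)                          # rows [0.6, 0.8] and [1.0, 0.0]
print(np.linalg.norm(X_unit, axis=1))  # [1. 1.]
```

`Normalizer` is stateless: `fit` learns nothing, so you can apply it to new data without refitting.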

#### QuantileTransformer { .api }
```python
from sklearn.preprocessing import QuantileTransformer

QuantileTransformer(
    n_quantiles: int = 1000,
    output_distribution: str = "uniform",
    ignore_implicit_zeros: bool = False,
    subsample: int = 100000,
    random_state: int | RandomState | None = None,
    copy: bool = True
)
```
Transform features to follow a uniform or a normal distribution.

#### PowerTransformer { .api }
```python
from sklearn.preprocessing import PowerTransformer

PowerTransformer(
    method: str = "yeo-johnson",
    standardize: bool = True,
    copy: bool = True
)
```
Apply a power transform featurewise to make data more Gaussian-like.

## Encoding

#### LabelEncoder { .api }
```python
from sklearn.preprocessing import LabelEncoder

LabelEncoder()
```
Encode target labels with values between 0 and n_classes-1.
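
Here is a short sketch using hypothetical city labels. `LabelEncoder` is meant for target vectors `y`. For input features, `OrdinalEncoder` or `OneHotEncoder` are the usual choices.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# Classes are sorted, so "amsterdam" -> 0, "paris" -> 1, "tokyo" -> 2
y_encoded = le.fit_transform(["paris", "tokyo", "paris", "amsterdam"])

print(le.classes_.tolist())                  # ['amsterdam', 'paris', 'tokyo']
print(y_encoded)                             # [1 2 1 0]
print(le.inverse_transform([2, 0]).tolist())  # ['tokyo', 'amsterdam']
```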

#### LabelBinarizer { .api }
```python
from sklearn.preprocessing import LabelBinarizer

LabelBinarizer(
    neg_label: int = 0,
    pos_label: int = 1,
    sparse_output: bool = False
)
```
Binarize labels in a one-vs-all fashion.

#### MultiLabelBinarizer { .api }
```python
from sklearn.preprocessing import MultiLabelBinarizer

MultiLabelBinarizer(
    classes: ArrayLike | None = None,
    sparse_output: bool = False
)
```
Transform between an iterable of iterables and a multilabel format.

#### OneHotEncoder { .api }
```python
from sklearn.preprocessing import OneHotEncoder

OneHotEncoder(
    categories: str | list[ArrayLike] = "auto",
    drop: str | ArrayLike | None = None,
    sparse_output: bool = True,
    dtype: type = ...,
    handle_unknown: str = "error",
    min_frequency: int | float | None = None,
    max_categories: int | None = None,
    feature_name_combiner: str | Callable = "concat"
)
```
Encode categorical features as a one-hot numeric array.
139
140
#### OrdinalEncoder { .api }
141
```python
142
from sklearn.preprocessing import OrdinalEncoder
143
144
OrdinalEncoder(
145
categories: str | list[ArrayLike] = "auto",
146
dtype: type = ...,
147
handle_unknown: str = "error",
148
unknown_value: int | float | None = None,
149
encoded_missing_value: int | float = ...,
150
min_frequency: int | float | None = None,
151
max_categories: int | None = None
152
)
153
```
154
Encode categorical features as an integer array.
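
When categories have a natural order, you can pass `categories` explicitly to control the integer codes. The sketch below uses toy data and also maps an unseen category to a sentinel value instead of raising an error.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

X_train = np.array([["low"], ["high"], ["medium"]])

# Explicit categories fix the order (low < medium < high); by default the
# categories would be sorted alphabetically instead.
enc = OrdinalEncoder(
    categories=[["low", "medium", "high"]],
    handle_unknown="use_encoded_value",  # map unseen categories to unknown_value
    unknown_value=-1,
)
enc.fit(X_train)

print(enc.transform(np.array([["medium"], ["extreme"]])))
# [[ 1.]
#  [-1.]]
```

Note that the codes come back as floats, because the default `dtype` is a floating-point type.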
155
156
#### TargetEncoder { .api }
157
```python
158
from sklearn.preprocessing import TargetEncoder
159
160
TargetEncoder(
161
categories: str | list[ArrayLike] = "auto",
162
target_type: str = "auto",
163
smooth: str | float = "auto",
164
cv: int | BaseCrossValidator | Iterable = 5,
165
shuffle: bool = True,
166
random_state: int | RandomState | None = None
167
)
168
```
169
Target Encoder for regression and classification targets.
170
171
#### KBinsDiscretizer { .api }
172
```python
173
from sklearn.preprocessing import KBinsDiscretizer
174
175
KBinsDiscretizer(
176
n_bins: int | ArrayLike = 5,
177
encode: str = "onehot",
178
strategy: str = "quantile",
179
dtype: type | None = None,
180
subsample: int | None = 200000,
181
random_state: int | RandomState | None = None
182
)
183
```
184
Bin continuous data into intervals.
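
Here is a minimal sketch of equal-width binning on toy data. It returns ordinal bin indices rather than the default one-hot encoding.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# strategy="uniform": equal-width bins over [min, max]
# encode="ordinal": return the bin index instead of a one-hot matrix
kbd = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
X_binned = kbd.fit_transform(X)

print(kbd.bin_edges_[0])  # edges at 0, 5/3, 10/3 and 5
print(X_binned.ravel())   # [0. 0. 1. 1. 2. 2.]
```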

#### Binarizer { .api }
```python
from sklearn.preprocessing import Binarizer

Binarizer(
    threshold: float = 0.0,
    copy: bool = True
)
```
Binarize data (set feature values to 0 or 1) according to a threshold. Values greater than `threshold` map to 1; values less than or equal to it map to 0.

## Feature Engineering

#### PolynomialFeatures { .api }
```python
from sklearn.preprocessing import PolynomialFeatures

PolynomialFeatures(
    degree: int = 2,
    interaction_only: bool = False,
    include_bias: bool = True,
    order: str = "C"
)
```
Generate polynomial and interaction features.

#### SplineTransformer { .api }
```python
from sklearn.preprocessing import SplineTransformer

SplineTransformer(
    n_knots: int = 5,
    degree: int = 3,
    knots: str | ArrayLike = "uniform",
    extrapolation: str = "constant",
    include_bias: bool = True,
    order: str = "C",
    sparse_output: bool = False
)
```
Generate univariate B-spline bases for features.

#### FunctionTransformer { .api }
```python
from sklearn.preprocessing import FunctionTransformer

FunctionTransformer(
    func: Callable | None = None,
    inverse_func: Callable | None = None,
    validate: bool = False,
    accept_sparse: bool = False,
    check_inverse: bool = True,
    feature_names_out: str | Callable | None = None,
    kw_args: dict | None = None,
    inv_kw_args: dict | None = None
)
```
Construct a transformer from an arbitrary callable.
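
Here is a minimal sketch that wraps NumPy's `log1p` as a transformer, with `expm1` as its inverse. The data is a toy array; any element-wise NumPy function could be wrapped the same way.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

X = np.array([[0.0, 1.0],
              [9.0, 99.0]])

# Wrap log1p as a stateless transformer; expm1 is its exact inverse, so the
# check_inverse round-trip test performed during fit passes without warnings.
log_tf = FunctionTransformer(np.log1p, inverse_func=np.expm1)
X_log = log_tf.fit_transform(X)

print(X_log)                            # log(1 + x) element-wise
print(log_tf.inverse_transform(X_log))  # recovers the original X
```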

#### KernelCenterer { .api }
```python
from sklearn.preprocessing import KernelCenterer

KernelCenterer()
```
Center a kernel matrix.

## Feature Selection

### Univariate Selection

#### SelectKBest { .api }
```python
from sklearn.feature_selection import SelectKBest

SelectKBest(
    score_func: Callable = ...,
    k: int | str = 10
)
```
Select features according to the k highest scores.
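
Here is a short sketch on the bundled iris dataset. It keeps the two features with the highest ANOVA F-scores against the class label.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the largest ANOVA F-statistic against the class label
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape, "->", X_new.shape)          # (150, 4) -> (150, 2)
print(selector.get_support(indices=True))  # indices of the kept columns
print(selector.scores_.round(1))           # F-statistic per feature
```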

#### SelectPercentile { .api }
```python
from sklearn.feature_selection import SelectPercentile

SelectPercentile(
    score_func: Callable = ...,
    percentile: int = 10
)
```
Select features according to a percentile of the highest scores.

#### SelectFpr { .api }
```python
from sklearn.feature_selection import SelectFpr

SelectFpr(
    score_func: Callable = ...,
    alpha: float = 0.05
)
```
Filter: select the features whose p-values are below `alpha`, based on a false positive rate (FPR) test.

#### SelectFdr { .api }
```python
from sklearn.feature_selection import SelectFdr

SelectFdr(
    score_func: Callable = ...,
    alpha: float = 0.05
)
```
Filter: select the p-values for an estimated false discovery rate (FDR), using the Benjamini-Hochberg procedure.

#### SelectFwe { .api }
```python
from sklearn.feature_selection import SelectFwe

SelectFwe(
    score_func: Callable = ...,
    alpha: float = 0.05
)
```
Filter: select the p-values corresponding to the family-wise error rate (FWE).

#### GenericUnivariateSelect { .api }
```python
from sklearn.feature_selection import GenericUnivariateSelect

GenericUnivariateSelect(
    score_func: Callable = ...,
    mode: str = "percentile",
    param: int | float = 1e-05
)
```
Univariate feature selector with a configurable strategy. `mode` is one of `"percentile"`, `"k_best"`, `"fpr"`, `"fdr"` or `"fwe"`, and `param` is the parameter of the chosen mode.

### Model-based Selection

#### SelectFromModel { .api }
```python
from sklearn.feature_selection import SelectFromModel

SelectFromModel(
    estimator: BaseEstimator,
    threshold: str | float | None = None,
    prefit: bool = False,
    norm_order: int = 1,
    max_features: int | Callable | None = None,
    importance_getter: str | Callable = "auto"
)
```
Meta-transformer for selecting features based on importance weights.
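
The sketch below uses a synthetic regression problem with six features, where only the first two carry signal. A `Lasso` model provides the importance weights. Because `Lasso` is L1-penalized, the default threshold is a small constant (1e-5), so features with zero coefficients are dropped.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Synthetic target: only features 0 and 1 carry signal
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# For L1-penalized estimators the default threshold is 1e-5, so features whose
# Lasso coefficient is shrunk to exactly zero are removed.
selector = SelectFromModel(Lasso(alpha=0.1))
X_selected = selector.fit_transform(X, y)

print(selector.get_support())     # boolean mask over the 6 input features
print(selector.estimator_.coef_)  # fitted Lasso coefficients
print(X_selected.shape)
```

With tree ensembles or other non-L1 models, the default threshold is `"mean"` instead.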

### Recursive Feature Elimination

#### RFE { .api }
```python
from sklearn.feature_selection import RFE

RFE(
    estimator: BaseEstimator,
    n_features_to_select: int | float | None = None,
    step: int | float = 1,
    verbose: int = 0,
    importance_getter: str | Callable = "auto"
)
```
Feature ranking with recursive feature elimination.

#### RFECV { .api }
```python
from sklearn.feature_selection import RFECV

RFECV(
    estimator: BaseEstimator,
    step: int | float = 1,
    min_features_to_select: int = 1,
    cv: int | BaseCrossValidator | Iterable | None = None,
    scoring: str | Callable | None = None,
    verbose: int = 0,
    n_jobs: int | None = None,
    importance_getter: str | Callable = "auto"
)
```
Recursive feature elimination with cross-validation.

### Sequential Feature Selection

#### SequentialFeatureSelector { .api }
```python
from sklearn.feature_selection import SequentialFeatureSelector

SequentialFeatureSelector(
    estimator: BaseEstimator,
    n_features_to_select: int | float | str = "auto",
    tol: float | None = None,
    direction: str = "forward",
    scoring: str | Callable | None = None,
    cv: int | BaseCrossValidator | Iterable = 5,
    n_jobs: int | None = None
)
```
Transformer that performs sequential feature selection. It starts from no features (`direction="forward"`) or from all features (`direction="backward"`). At each step it greedily adds or removes the single feature that gives the best cross-validated score for `estimator`.

### Variance-based Selection

#### VarianceThreshold { .api }
```python
from sklearn.feature_selection import VarianceThreshold

VarianceThreshold(
    threshold: float = 0.0
)
```
Feature selector that removes all low-variance features. With the default `threshold=0.0`, only features that are constant across all samples are removed.
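
Here is a small sketch on toy data in which two of the four columns are constant:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Columns 0 and 3 are constant, so their variance is 0
X = np.array([[0, 2, 0, 3],
              [0, 1, 4, 3],
              [0, 1, 1, 3]])

selector = VarianceThreshold()  # threshold=0.0 removes only constant columns
X_reduced = selector.fit_transform(X)

print(selector.variances_)  # per-column variance computed during fit
print(X_reduced)
# [[2 0]
#  [1 4]
#  [1 1]]
```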

### Base Classes

#### SelectorMixin { .api }
```python
from sklearn.feature_selection import SelectorMixin

SelectorMixin()
```
Transformer mixin that performs feature selection given a support mask.

## Feature Selection Functions

### Statistical Tests

#### chi2 { .api }
```python
from sklearn.feature_selection import chi2

chi2(
    X: ArrayLike,
    y: ArrayLike
) -> tuple[ArrayLike, ArrayLike]
```
Compute chi-squared statistics between each non-negative feature and the class.

#### f_classif { .api }
```python
from sklearn.feature_selection import f_classif

f_classif(
    X: ArrayLike,
    y: ArrayLike
) -> tuple[ArrayLike, ArrayLike]
```
Compute the ANOVA F-value for the provided sample.

#### f_oneway { .api }
```python
from sklearn.feature_selection import f_oneway

f_oneway(
    *samples: ArrayLike
) -> tuple[ArrayLike, ArrayLike]
```
Test for equal means in two or more samples from the normal distribution.

#### f_regression { .api }
```python
from sklearn.feature_selection import f_regression

f_regression(
    X: ArrayLike,
    y: ArrayLike,
    center: bool = True,
    force_finite: bool = True
) -> tuple[ArrayLike, ArrayLike]
```
Univariate linear regression tests returning F-statistic and p-values.

#### r_regression { .api }
```python
from sklearn.feature_selection import r_regression

r_regression(
    X: ArrayLike,
    y: ArrayLike,
    center: bool = True,
    force_finite: bool = True
) -> ArrayLike
```
Compute Pearson's r for each feature with the target.

### Mutual Information

#### mutual_info_classif { .api }
```python
from sklearn.feature_selection import mutual_info_classif

mutual_info_classif(
    X: ArrayLike,
    y: ArrayLike,
    discrete_features: str | bool | ArrayLike = "auto",
    n_neighbors: int = 3,
    copy: bool = True,
    random_state: int | RandomState | None = None
) -> ArrayLike
```
Estimate mutual information for a discrete target variable.

#### mutual_info_regression { .api }
```python
from sklearn.feature_selection import mutual_info_regression

mutual_info_regression(
    X: ArrayLike,
    y: ArrayLike,
    discrete_features: str | bool | ArrayLike = "auto",
    n_neighbors: int = 3,
    copy: bool = True,
    random_state: int | RandomState | None = None
) -> ArrayLike
```
Estimate mutual information for a continuous target variable.

## Preprocessing Functions

### Scaling Functions

#### scale { .api }
```python
from sklearn.preprocessing import scale

scale(
    X: ArrayLike,
    axis: int = 0,
    with_mean: bool = True,
    with_std: bool = True,
    copy: bool = True
) -> ArrayLike
```
Standardize a dataset along any axis.

#### minmax_scale { .api }
```python
from sklearn.preprocessing import minmax_scale

minmax_scale(
    X: ArrayLike,
    feature_range: tuple[float, float] = (0, 1),
    axis: int = 0,
    copy: bool = True
) -> ArrayLike
```
Transform features by scaling each feature to a given range.

#### maxabs_scale { .api }
```python
from sklearn.preprocessing import maxabs_scale

maxabs_scale(
    X: ArrayLike,
    axis: int = 0,
    copy: bool = True
) -> ArrayLike
```
Scale each feature to the [-1, 1] range without breaking sparsity.

#### robust_scale { .api }
```python
from sklearn.preprocessing import robust_scale

robust_scale(
    X: ArrayLike,
    axis: int = 0,
    with_centering: bool = True,
    with_scaling: bool = True,
    quantile_range: tuple[float, float] = (25.0, 75.0),
    copy: bool = True,
    unit_variance: bool = False
) -> ArrayLike
```
Standardize a dataset along any axis by centering on the median and scaling by the interquartile range.

#### normalize { .api }
```python
from sklearn.preprocessing import normalize

normalize(
    X: ArrayLike,
    norm: str = "l2",
    axis: int = 1,
    copy: bool = True,
    return_norm: bool = False
) -> ArrayLike | tuple[ArrayLike, ArrayLike]
```
Scale input vectors individually to unit norm (vector length).

#### quantile_transform { .api }
```python
from sklearn.preprocessing import quantile_transform

quantile_transform(
    X: ArrayLike,
    axis: int = 0,
    n_quantiles: int = 1000,
    output_distribution: str = "uniform",
    ignore_implicit_zeros: bool = False,
    subsample: int = 100000,
    random_state: int | RandomState | None = None,
    copy: bool = True
) -> ArrayLike
```
Transform features to follow a uniform or a normal distribution.

#### power_transform { .api }
```python
from sklearn.preprocessing import power_transform

power_transform(
    X: ArrayLike,
    method: str = "yeo-johnson",
    standardize: bool = True,
    copy: bool = True
) -> ArrayLike
```
Apply a power transform featurewise to make data more Gaussian-like.

### Encoding Functions

#### label_binarize { .api }
```python
from sklearn.preprocessing import label_binarize

label_binarize(
    y: ArrayLike,
    classes: ArrayLike,
    neg_label: int = 0,
    pos_label: int = 1,
    sparse_output: bool = False
) -> ArrayLike
```
Binarize labels in a one-vs-all fashion.

#### binarize { .api }
```python
from sklearn.preprocessing import binarize

binarize(
    X: ArrayLike,
    threshold: float = 0.0,
    copy: bool = True
) -> ArrayLike
```
Boolean thresholding of an array-like or scipy.sparse matrix.

#### add_dummy_feature { .api }
```python
from sklearn.preprocessing import add_dummy_feature

add_dummy_feature(
    X: ArrayLike,
    value: float = 1.0
) -> ArrayLike
```
Augment the dataset with an additional dummy feature: a constant column equal to `value`, prepended as the first column.

## Feature Extraction

### Dictionary and Hashing Vectorizers

#### DictVectorizer { .api }
```python
from sklearn.feature_extraction import DictVectorizer

DictVectorizer(
    dtype: type = ...,
    separator: str = "=",
    sparse: bool = True,
    sort: bool = True
)
```
Transforms lists of feature-value mappings to vectors.

#### FeatureHasher { .api }
```python
from sklearn.feature_extraction import FeatureHasher

FeatureHasher(
    n_features: int = 1048576,
    input_type: str = "dict",
    dtype: type = ...,
    alternate_sign: bool = True
)
```
Implement feature hashing, also known as the hashing trick.

### Image Feature Extraction

#### img_to_graph { .api }
```python
from sklearn.feature_extraction import img_to_graph

img_to_graph(
    img: ArrayLike,
    mask: ArrayLike | None = None,
    return_as: type = ...,
    dtype: type | None = None
) -> ArrayLike
```
Graph of the pixel-to-pixel gradient connections.

#### grid_to_graph { .api }
```python
from sklearn.feature_extraction import grid_to_graph

grid_to_graph(
    n_x: int,
    n_y: int,
    n_z: int = 1,
    mask: ArrayLike | None = None,
    return_as: type = ...,
    dtype: type = ...
) -> ArrayLike
```
Graph of the pixel-to-pixel connections. An edge exists between two voxels if they are adjacent in the grid.

## Imputation

### Simple Imputation

#### SimpleImputer { .api }
```python
from sklearn.impute import SimpleImputer

SimpleImputer(
    missing_values: int | float | str | None = ...,
    strategy: str = "mean",
    fill_value: str | int | float | None = None,
    copy: bool = True,
    add_indicator: bool = False,
    keep_empty_features: bool = False
)
```
Imputation transformer for completing missing values.
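
Here is a minimal sketch of mean imputation on toy data. `transform` reuses the per-column statistics learned in `fit` when it processes new data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 6.0]])

# Replace each NaN with the mean of its column, computed on the fit data
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)

print(imputer.statistics_)  # column means: [2. 5.]
print(X_filled)
# [[1. 5.]
#  [3. 4.]
#  [2. 6.]]
```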
726
727
### Advanced Imputation
728
729
#### KNNImputer { .api }
730
```python
731
from sklearn.impute import KNNImputer
732
733
KNNImputer(
734
missing_values: int | float | str | None = ...,
735
n_neighbors: int = 5,
736
weights: str | Callable = "uniform",
737
metric: str | Callable = "nan_euclidean",
738
copy: bool = True,
739
add_indicator: bool = False,
740
keep_empty_features: bool = False
741
)
742
```
743
Imputation for completing missing values using k-Nearest Neighbors.
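
Here is a sketch on toy data. `KNNImputer` replaces each missing value with the mean of that feature over the `n_neighbors` closest rows. Distances use the `nan_euclidean` metric, computed only on the coordinates that both rows have observed.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

# Each missing entry becomes the mean of that feature over the 2 nearest rows
# that have the feature observed (uniform weights by default)
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

# Row 0 gets (3 + 5) / 2 = 4 in the last column;
# row 2 gets (3 + 8) / 2 = 5.5 in the first column
print(X_filled)
```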

### Missing Value Indicators

#### MissingIndicator { .api }
```python
from sklearn.impute import MissingIndicator

MissingIndicator(
    missing_values: int | float | str | None = ...,
    features: str = "missing-only",
    sparse: bool | str = "auto",
    error_on_new: bool = True
)
```
Binary indicators for missing values.

## Kernel Approximation

### RBF Kernel Approximation

#### RBFSampler { .api }
```python
from sklearn.kernel_approximation import RBFSampler

RBFSampler(
    gamma: float = 1.0,
    n_components: int = 100,
    random_state: int | RandomState | None = None
)
```
Approximate an RBF kernel feature map using random Fourier features.

#### Nystroem { .api }
```python
from sklearn.kernel_approximation import Nystroem

Nystroem(
    kernel: str | Callable = "rbf",
    gamma: float | None = None,
    coef0: float | None = None,
    degree: float | None = None,
    kernel_params: dict | None = None,
    n_components: int = 100,
    random_state: int | RandomState | None = None,
    n_jobs: int | None = None
)
```
Approximate a kernel map using a subset of the training data.
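
The sketch below uses synthetic data with a circular class boundary, chosen for illustration. `Nystroem` builds an explicit approximate RBF feature map, and a linear classifier is fitted on top of it. The `gamma` and `n_components` values are illustrative, not tuned.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
# Non-linear (circular) decision boundary: class 1 inside radius 1
y = (np.sum(X**2, axis=1) < 1.0).astype(int)

# Nystroem builds a 100-dimensional approximation of the RBF feature map
# from 100 sampled training points
feature_map = Nystroem(kernel="rbf", gamma=1.0, n_components=100, random_state=0)
X_features = feature_map.fit_transform(X)
print(X_features.shape)  # (300, 100)

# A linear model on the approximate feature map can fit the circular boundary
model = make_pipeline(
    Nystroem(kernel="rbf", gamma=1.0, n_components=100, random_state=0),
    RidgeClassifier(),
)
model.fit(X, y)
print(model.score(X, y))  # training accuracy of the kernelized linear model
```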
792
793
### Chi-squared Kernel Approximation
794
795
#### AdditiveChi2Sampler { .api }
796
```python
797
from sklearn.kernel_approximation import AdditiveChi2Sampler
798
799
AdditiveChi2Sampler(
800
sample_steps: int = 2,
801
sample_interval: float | None = None
802
)
803
```
804
Approximate feature map for additive chi2 kernel.
805
806
#### SkewedChi2Sampler { .api }
807
```python
808
from sklearn.kernel_approximation import SkewedChi2Sampler
809
810
SkewedChi2Sampler(
811
skewedness: float = 1.0,
812
n_components: int = 100,
813
random_state: int | RandomState | None = None
814
)
815
```
816
Approximate feature map for "skewed chi-squared" kernel.
817
818
### Polynomial Kernel Approximation
819
820
#### PolynomialCountSketch { .api }
821
```python
822
from sklearn.kernel_approximation import PolynomialCountSketch
823
824
PolynomialCountSketch(
825
gamma: float = 1.0,
826
degree: int = 2,
827
coef0: int = 0,
828
n_components: int = 100,
829
random_state: int | RandomState | None = None
830
)
831
```
832
Polynomial kernel approximation via Tensor Sketch.
833
834
## Random Projection
835
836
#### GaussianRandomProjection { .api }
837
```python
838
from sklearn.random_projection import GaussianRandomProjection
839
840
GaussianRandomProjection(
841
n_components: int | str = "auto",
842
eps: float = 0.1,
843
random_state: int | RandomState | None = None,
844
compute_inverse_components: bool = False
845
)
846
```
847
Reduce dimensionality through Gaussian random projection.
848
849
#### SparseRandomProjection { .api }
850
```python
851
from sklearn.random_projection import SparseRandomProjection
852
853
SparseRandomProjection(
854
n_components: int | str = "auto",
855
density: float | str = "auto",
856
eps: float = 0.1,
857
dense_output: bool = False,
858
random_state: int | RandomState | None = None,
859
compute_inverse_components: bool = False
860
)
861
```
862
Reduce dimensionality through sparse random projection.
863
864
### Random Projection Functions
865
866
#### johnson_lindenstrauss_min_dim { .api }
867
```python
868
from sklearn.random_projection import johnson_lindenstrauss_min_dim
869
870
johnson_lindenstrauss_min_dim(
871
n_samples: int,
872
eps: float | ArrayLike = 0.1
873
) -> int | ArrayLike
874
```
875
Find a 'safe' number of components to randomly project to.
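
The Johnson-Lindenstrauss bound evaluated here is `n_components >= 4 * ln(n_samples) / (eps**2 / 2 - eps**3 / 3)`, where `eps` is the tolerated relative distortion of pairwise distances. The sketch below calls the function and reproduces the same number by hand.

```python
import numpy as np
from sklearn.random_projection import johnson_lindenstrauss_min_dim

n_samples = 1_000_000

# A smaller distortion eps requires many more components
for eps in (0.5, 0.1):
    print(eps, johnson_lindenstrauss_min_dim(n_samples, eps=eps))  # 663, then 11841

# The same bound computed by hand: 4 * ln(n) / (eps^2 / 2 - eps^3 / 3)
eps = 0.5
manual = 4 * np.log(n_samples) / (eps**2 / 2 - eps**3 / 3)
print(int(manual))  # 663
```

The bound depends only on the number of samples and `eps`, not on the original dimensionality. This is also the value that `n_components="auto"` uses in the projection classes above.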

## Examples

### Basic Preprocessing Pipeline

```python
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# Create preprocessing pipeline
numeric_features = ['age', 'income', 'score']
categorical_features = ['city', 'gender']

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)
```

### Feature Selection Pipeline

```python
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.ensemble import RandomForestClassifier

# Univariate feature selection
selector = SelectKBest(score_func=f_classif, k=10)

# Recursive feature elimination: an alternative selector that could replace
# `selector` in the pipeline below
rfe = RFE(estimator=RandomForestClassifier(n_estimators=100), n_features_to_select=10)

# Complete pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('selector', selector),
    ('classifier', RandomForestClassifier())
])
```