# Scikit-learn Intel Extension

Intel's Extension for Scikit-learn provides hardware-accelerated implementations of scikit-learn algorithms optimized for Intel CPUs and GPUs. It works as a drop-in replacement in existing scikit-learn applications and can deliver speedups of up to 10-100x, depending on the algorithm, data size, and hardware, through vector instructions and AI-specific memory optimizations. Existing code needs little or no modification: optimizations are enabled either by calling `patch_sklearn()` or by importing estimators from `sklearnex`.

## Package Information

- **Package Name**: scikit-learn-intelex
- **Language**: Python
- **Installation**: `pip install scikit-learn-intelex`
- **License**: Apache 2.0

## Core Imports

```python
import sklearnex
```

To enable optimizations globally:

```python
from sklearnex import patch_sklearn
patch_sklearn()
```

Direct imports of optimized algorithms:

```python
from sklearnex.ensemble import RandomForestClassifier
from sklearnex.linear_model import LinearRegression
from sklearnex.cluster import KMeans
```

## Basic Usage

```python
from sklearnex import patch_sklearn
patch_sklearn()

# Import scikit-learn estimators after patching so that supported
# algorithms resolve to the Intel-optimized implementations
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Use accelerated Random Forest (same API as sklearn)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
predictions = rf.predict(X)

# Accuracy on the training data (not a held-out estimate)
print(f"Accuracy: {rf.score(X, y):.3f}")
```

Alternative approach using direct imports:

```python
from sklearnex.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Directly use Intel-optimized implementation (no patching required)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
predictions = rf.predict(X)
```

## Architecture

The package provides three main integration patterns:

- **Global Patching**: Replace supported scikit-learn estimators for the current Python process using `patch_sklearn()`
- **Direct Imports**: Import specific Intel-optimized algorithms directly from `sklearnex` modules
- **Distributed Computing**: Use SPMD (Single Program Multiple Data) variants for multi-node execution

All implementations keep the scikit-learn API, so existing code runs unchanged while supported algorithms gain Intel hardware acceleration.
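
The first two patterns can be combined with a fallback so that code still runs where the extension is not installed. The sketch below is illustrative only: `resolve_kmeans` is a hypothetical helper, not part of `sklearnex`, and it assumes that both packages expose `KMeans` from a `cluster` submodule, as the direct imports shown earlier suggest.

```python
import importlib
import importlib.util


def resolve_kmeans():
    """Return a KMeans class, preferring sklearnex over stock scikit-learn.

    Hypothetical helper for illustration; returns None if neither package
    is installed.
    """
    for module_name in ("sklearnex.cluster", "sklearn.cluster"):
        package = module_name.split(".")[0]
        # find_spec on a top-level name checks availability without importing
        if importlib.util.find_spec(package) is not None:
            return importlib.import_module(module_name).KMeans
    return None


KMeans = resolve_kmeans()
if KMeans is None:
    print("Neither sklearnex nor scikit-learn is installed")
else:
    print(f"Using KMeans from {KMeans.__module__}")
```

Because both classes share the same constructor and methods, the rest of the program does not need to know which one was selected.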

## Capabilities

### Patching and Configuration

Core functions for enabling Intel optimizations globally and managing configuration settings. These functions control how scikit-learn algorithms are accelerated.

```python { .api }
def patch_sklearn(): ...
def unpatch_sklearn(): ...
def sklearn_is_patched() -> bool: ...
def get_patch_map() -> dict: ...
def get_patch_names() -> list: ...
def is_patched_instance(estimator) -> bool: ...
def set_config(**params): ...
def get_config() -> dict: ...
def get_hyperparameters() -> dict: ...
```

[Patching and Configuration](./patching-config.md)

### Clustering Algorithms

High-performance implementations of clustering algorithms including K-means and DBSCAN with Intel hardware acceleration.

```python { .api }
class KMeans:
    def __init__(self, n_clusters=8, **kwargs): ...
    def fit(self, X, y=None): ...
    def predict(self, X): ...

class DBSCAN:
    def __init__(self, eps=0.5, min_samples=5, **kwargs): ...
    def fit(self, X, y=None): ...
    def fit_predict(self, X, y=None): ...
```

[Clustering](./clustering.md)

### Linear Models

Accelerated linear regression, logistic regression, and regularized models with Intel optimization for large datasets.

```python { .api }
class LinearRegression:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...

class LogisticRegression:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class Ridge:
    def __init__(self, alpha=1.0, **kwargs): ...

class Lasso:
    def __init__(self, alpha=1.0, **kwargs): ...

class ElasticNet:
    def __init__(self, alpha=1.0, l1_ratio=0.5, **kwargs): ...

class IncrementalLinearRegression:
    def __init__(self, **kwargs): ...
    def partial_fit(self, X, y): ...
```

[Linear Models](./linear-models.md)

### Ensemble Methods

Intel-accelerated ensemble algorithms including Random Forest and Extra Trees for both classification and regression.

```python { .api }
class RandomForestClassifier:
    def __init__(self, n_estimators=100, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class RandomForestRegressor:
    def __init__(self, n_estimators=100, **kwargs): ...

class ExtraTreesClassifier:
    def __init__(self, n_estimators=100, **kwargs): ...

class ExtraTreesRegressor:
    def __init__(self, n_estimators=100, **kwargs): ...
```

[Ensemble Methods](./ensemble.md)

### Dimensionality Reduction

Principal Component Analysis with Intel acceleration for efficient dimensionality reduction on large datasets.

```python { .api }
class PCA:
    def __init__(self, n_components=None, **kwargs): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def fit_transform(self, X, y=None): ...
```

[Decomposition](./decomposition.md)

### Nearest Neighbors

Accelerated k-nearest neighbors algorithms for classification, regression, and unsupervised learning with optimized distance computations.

```python { .api }
class KNeighborsClassifier:
    def __init__(self, n_neighbors=5, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class KNeighborsRegressor:
    def __init__(self, n_neighbors=5, **kwargs): ...

class NearestNeighbors:
    def __init__(self, n_neighbors=5, **kwargs): ...
    def fit(self, X, y=None): ...
    def kneighbors(self, X=None, n_neighbors=None, return_distance=True): ...

class LocalOutlierFactor:
    def __init__(self, n_neighbors=20, **kwargs): ...
    def fit_predict(self, X): ...
```

[Nearest Neighbors](./neighbors.md)

### Support Vector Machines

Intel-optimized Support Vector Machine implementations for classification and regression with accelerated kernel computations.

```python { .api }
class SVC:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...

class SVR:
    def __init__(self, **kwargs): ...

class NuSVC:
    def __init__(self, **kwargs): ...

class NuSVR:
    def __init__(self, **kwargs): ...
```

[Support Vector Machines](./svm.md)

### Metrics and Model Selection

Performance metrics and data splitting utilities with Intel acceleration for large-scale evaluation.

```python { .api }
def roc_auc_score(y_true, y_score, **kwargs): ...
def pairwise_distances(X, Y=None, metric='euclidean', **kwargs): ...
def train_test_split(*arrays, **options): ...
```

[Metrics and Model Selection](./metrics-model-selection.md)

### Basic Statistics and Manifold Learning

Statistical computations and manifold learning algorithms with Intel optimization.

```python { .api }
class BasicStatistics:
    def __init__(self, **kwargs): ...
    def fit(self, X, y=None): ...

class IncrementalBasicStatistics:
    def __init__(self, **kwargs): ...
    def partial_fit(self, X, y=None): ...

class IncrementalEmpiricalCovariance:
    def __init__(self, **kwargs): ...
    def fit(self, X, y=None): ...
    def partial_fit(self, X, y=None): ...

class TSNE:
    def __init__(self, n_components=2, **kwargs): ...
    def fit_transform(self, X, y=None): ...
```

[Statistics and Manifold Learning](./stats-manifold.md)

### Model Builder API

Convert external gradient boosting models (XGBoost, LightGBM, CatBoost) to Intel oneDAL format for accelerated inference.

```python { .api }
from daal4py.mb import GBTDAALBaseModel, convert_model

def convert_model(model): ...

class GBTDAALBaseModel:
    def __init__(self): ...
```

[Model Builder API](./daal4py-mb.md)

### Advanced Features

Preview and SPMD (distributed) capabilities for cutting-edge algorithms and multi-node execution.

```python { .api }
# Preview features (requires SKLEARNEX_PREVIEW environment variable)
from sklearnex.preview.covariance import EmpiricalCovariance
from sklearnex.preview.decomposition import IncrementalPCA

# SPMD distributed computing
from sklearnex.spmd.cluster import KMeans as SPMDKMeans
from sklearnex.spmd.linear_model import LinearRegression as SPMDLinearRegression

# Utility functions
from sklearnex.utils import get_namespace, _assert_all_finite
```

[Advanced Features](./advanced.md)

## Environment Variables

- **OFF_ONEDAL_IFACE**: Set to "1" to disable the oneDAL interface
- **SKLEARNEX_PREVIEW**: Set to enable preview features
- **DALROOT**: Path to the Intel oneDAL installation
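
Variables like these are usually exported in the shell, but they can also be set from Python before the extension is used. The sketch below assumes that `SKLEARNEX_PREVIEW` only needs to be present in the environment (the value "1" is an arbitrary choice) and that setting it before calling `patch_sklearn()` is early enough; neither detail is stated above.

```python
import os

# Assumption: any value enables preview features; "1" is an arbitrary choice.
# setdefault keeps a value that was already exported in the shell.
os.environ.setdefault("SKLEARNEX_PREVIEW", "1")

preview_requested = "SKLEARNEX_PREVIEW" in os.environ
print(f"Preview requested: {preview_requested}")
```

Setting the variable at the very top of the entry-point script avoids depending on import order.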

## Performance Notes

- Speedups of up to 10-100x are possible on Intel hardware; actual gains depend on the algorithm, the dataset, and the hardware
- Optimizations work best with larger datasets (>1000 samples)
- All optimized algorithms keep the same API as their scikit-learn counterparts
- Optimized estimators can be used as drop-in replacements in existing code
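
Because speedups vary, it is worth measuring them on your own workload. The helper below is a minimal, library-agnostic timing sketch. `best_time` and `speedup` are hypothetical names, and the two lambdas are placeholders for fitting the same estimator without and with the extension.

```python
import time


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time of fn() over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_fn, candidate_fn, repeats=3):
    """Return baseline time divided by candidate time (>1 means faster)."""
    baseline = best_time(baseline_fn, repeats)
    candidate = max(best_time(candidate_fn, repeats), 1e-12)  # avoid /0
    return baseline / candidate


# Placeholder workloads; replace with e.g. stock vs. patched estimator fits
ratio = speedup(lambda: sum(range(200_000)), lambda: sum(range(1_000)))
print(f"Measured speedup: {ratio:.1f}x")
```

Taking the best of several runs reduces noise from one-off costs such as first-call initialization.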