Tessl Tile for pypi/scikit-learn-intelex@2024.7.0

# Scikit-learn Intel Extension

Intel's Extension for Scikit-learn provides hardware-accelerated implementations of scikit-learn algorithms optimized for Intel CPUs and GPUs. It works as a drop-in replacement in existing scikit-learn applications: enabling it takes a patch call or a changed import, and the rest of the code stays as written. Reported speedups range from 10x to 100x depending on the algorithm, data size, and hardware, and come from Intel hardware optimization, vector instructions, and AI-specific memory optimizations.

## Package Information

- **Package Name**: scikit-learn-intelex
- **Language**: Python
- **Installation**: `pip install scikit-learn-intelex`
- **License**: Apache 2.0

## Core Imports

```python
import sklearnex
```

For enabling optimizations globally:

```python
from sklearnex import patch_sklearn
patch_sklearn()
```

Direct imports of optimized algorithms:

```python
from sklearnex.ensemble import RandomForestClassifier
from sklearnex.linear_model import LinearRegression
from sklearnex.cluster import KMeans
```

## Basic Usage

```python
from sklearnex import patch_sklearn
patch_sklearn()

# Import estimators after patching so that supported algorithms
# resolve to the Intel-optimized implementations
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Use accelerated Random Forest (same API as sklearn)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
predictions = rf.predict(X)

# Accuracy on the training data, shown only to confirm the model runs
print(f"Training accuracy: {rf.score(X, y):.3f}")
```

Alternative approach using direct imports:

```python
from sklearnex.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Directly use Intel-optimized implementation (no patching required)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
predictions = rf.predict(X)
```

## Architecture

The package provides three main integration patterns:

- **Global Patching**: Replace sklearn implementations for the running Python process using `patch_sklearn()`
- **Direct Imports**: Import specific Intel-optimized algorithms directly from sklearnex modules
- **Distributed Computing**: Use SPMD (Single Program Multiple Data) variants for multi-node execution

All implementations keep the scikit-learn API, so estimators are constructed, fitted, and queried exactly as in scikit-learn; the performance gain comes from Intel hardware acceleration.
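The direct-import pattern can be made optional with a fallback to stock scikit-learn. The sketch below is one way to do that, not part of the package: `load_estimator` is a hypothetical helper name, and it returns `None` when neither package is installed.

```python
import importlib


def load_estimator(submodule: str, class_name: str):
    """Return an estimator class from sklearnex if possible, else from sklearn.

    Hypothetical helper for illustration only. Returns None when neither
    package provides the requested class.
    """
    for package in ("sklearnex", "sklearn"):
        try:
            module = importlib.import_module(f"{package}.{submodule}")
        except ImportError:
            continue  # package or submodule missing, try the next candidate
        estimator = getattr(module, class_name, None)
        if estimator is not None:
            return estimator
    return None


KMeans = load_estimator("cluster", "KMeans")
if KMeans is None:
    print("Neither sklearnex nor scikit-learn is installed")
else:
    print(f"Using KMeans from {KMeans.__module__}")
```

Because both packages expose the same API, code written against the returned class does not need to know which implementation it received.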

## Capabilities

### Patching and Configuration

Core functions for enabling Intel optimizations globally and managing configuration settings. These functions control how scikit-learn algorithms are accelerated.

```python { .api }
def patch_sklearn(): ...
def unpatch_sklearn(): ...
def sklearn_is_patched() -> bool: ...
def get_patch_map() -> dict: ...
def get_patch_names() -> list: ...
def is_patched_instance(estimator) -> bool: ...
def set_config(**params): ...
def get_config() -> dict: ...
def get_hyperparameters() -> dict: ...
```

[Patching and Configuration](./patching-config.md)

### Clustering Algorithms

High-performance implementations of clustering algorithms including K-means and DBSCAN with Intel hardware acceleration.

```python { .api }
class KMeans:
    def __init__(self, n_clusters=8, **kwargs): ...
    def fit(self, X, y=None): ...
    def predict(self, X): ...

class DBSCAN:
    def __init__(self, eps=0.5, min_samples=5, **kwargs): ...
    def fit(self, X, y=None): ...
    def fit_predict(self, X, y=None): ...
```

[Clustering](./clustering.md)

### Linear Models

Accelerated linear regression, logistic regression, and regularized models with Intel optimization for large datasets.

```python { .api }
class LinearRegression:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...

class LogisticRegression:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class Ridge:
    def __init__(self, alpha=1.0, **kwargs): ...

class Lasso:
    def __init__(self, alpha=1.0, **kwargs): ...

class ElasticNet:
    def __init__(self, alpha=1.0, l1_ratio=0.5, **kwargs): ...

class IncrementalLinearRegression:
    def __init__(self, **kwargs): ...
    def partial_fit(self, X, y): ...
```

[Linear Models](./linear-models.md)
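`IncrementalLinearRegression` exposes `partial_fit(X, y)`, which implies feeding the data in batches. The batching loop can be sketched as follows; `iter_batches` and `fit_incrementally` are hypothetical helper names, and the batch size of 1000 is an arbitrary assumption.

```python
from typing import Iterator, Sequence, Tuple


def iter_batches(
    X: Sequence, y: Sequence, batch_size: int
) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield consecutive (X_batch, y_batch) slices of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


def fit_incrementally(estimator, X, y, batch_size: int = 1000):
    """Call estimator.partial_fit once per batch and return the estimator."""
    for X_batch, y_batch in iter_batches(X, y, batch_size):
        estimator.partial_fit(X_batch, y_batch)
    return estimator
```

With sklearnex installed, the helper would be called as `fit_incrementally(IncrementalLinearRegression(), X, y)`; slicing works the same way for lists and NumPy arrays.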

### Ensemble Methods

Intel-accelerated ensemble algorithms including Random Forest and Extra Trees for both classification and regression.

```python { .api }
class RandomForestClassifier:
    def __init__(self, n_estimators=100, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class RandomForestRegressor:
    def __init__(self, n_estimators=100, **kwargs): ...

class ExtraTreesClassifier:
    def __init__(self, n_estimators=100, **kwargs): ...

class ExtraTreesRegressor:
    def __init__(self, n_estimators=100, **kwargs): ...
```

[Ensemble Methods](./ensemble.md)

### Dimensionality Reduction

Principal Component Analysis with Intel acceleration for efficient dimensionality reduction on large datasets.

```python { .api }
class PCA:
    def __init__(self, n_components=None, **kwargs): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def fit_transform(self, X, y=None): ...
```

[Decomposition](./decomposition.md)

### Nearest Neighbors

Accelerated k-nearest neighbors algorithms for classification, regression, and unsupervised learning with optimized distance computations.

```python { .api }
class KNeighborsClassifier:
    def __init__(self, n_neighbors=5, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class KNeighborsRegressor:
    def __init__(self, n_neighbors=5, **kwargs): ...

class NearestNeighbors:
    def __init__(self, n_neighbors=5, **kwargs): ...
    def fit(self, X, y=None): ...
    def kneighbors(self, X=None, n_neighbors=None, return_distance=True): ...

class LocalOutlierFactor:
    def __init__(self, n_neighbors=20, **kwargs): ...
    def fit_predict(self, X): ...
```

[Nearest Neighbors](./neighbors.md)

### Support Vector Machines

Intel-optimized Support Vector Machine implementations for classification and regression with accelerated kernel computations.

```python { .api }
class SVC:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...

class SVR:
    def __init__(self, **kwargs): ...

class NuSVC:
    def __init__(self, **kwargs): ...

class NuSVR:
    def __init__(self, **kwargs): ...
```

[Support Vector Machines](./svm.md)

### Metrics and Model Selection

Performance metrics and data splitting utilities with Intel acceleration for large-scale evaluation.

```python { .api }
def roc_auc_score(y_true, y_score, **kwargs): ...
def pairwise_distances(X, Y=None, metric='euclidean', **kwargs): ...
def train_test_split(*arrays, **options): ...
```

[Metrics and Model Selection](./metrics-model-selection.md)

### Basic Statistics and Manifold Learning

Statistical computations and manifold learning algorithms with Intel optimization.

```python { .api }
class BasicStatistics:
    def __init__(self, **kwargs): ...
    def fit(self, X, y=None): ...

class IncrementalBasicStatistics:
    def __init__(self, **kwargs): ...
    def partial_fit(self, X, y=None): ...

class IncrementalEmpiricalCovariance:
    def __init__(self, **kwargs): ...
    def fit(self, X, y=None): ...
    def partial_fit(self, X, y=None): ...

class TSNE:
    def __init__(self, n_components=2, **kwargs): ...
    def fit_transform(self, X, y=None): ...
```

[Statistics and Manifold Learning](./stats-manifold.md)

### Model Builder API

Convert external gradient boosting models (XGBoost, LightGBM, CatBoost) to Intel oneDAL format for accelerated inference.

```python { .api }
from daal4py.mb import GBTDAALBaseModel, convert_model

def convert_model(model): ...

class GBTDAALBaseModel:
    def __init__(self): ...
```

[Model Builder API](./daal4py-mb.md)

### Advanced Features

Preview and SPMD (distributed) capabilities for cutting-edge algorithms and multi-node execution.

```python { .api }
# Preview features (requires SKLEARNEX_PREVIEW environment variable)
from sklearnex.preview.covariance import EmpiricalCovariance
from sklearnex.preview.decomposition import IncrementalPCA

# SPMD distributed computing
from sklearnex.spmd.cluster import KMeans as SPMDKMeans
from sklearnex.spmd.linear_model import LinearRegression as SPMDLinearRegression

# Utility functions
from sklearnex.utils import get_namespace, _assert_all_finite
```

[Advanced Features](./advanced.md)

## Environment Variables

- **OFF_ONEDAL_IFACE**: Set to "1" to disable the oneDAL interface
- **SKLEARNEX_PREVIEW**: Set to enable preview features
- **DALROOT**: Path to the Intel oneDAL installation
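Environment variables can also be set from Python, provided this happens before the package is imported. The sketch below enables preview features that way. It assumes that the value `"1"` is accepted for `SKLEARNEX_PREVIEW` (the list above does not specify a value), and it falls back to `None` when sklearnex is not installed.

```python
import os

# Assumption: "1" is used as a conventional truthy value; the variable must
# be set before sklearnex is imported for it to take effect.
os.environ.setdefault("SKLEARNEX_PREVIEW", "1")

try:
    from sklearnex.preview.covariance import EmpiricalCovariance
except ImportError:
    EmpiricalCovariance = None  # sklearnex is not available here

print("Preview estimator available:", EmpiricalCovariance is not None)
```

`setdefault` leaves any value already exported in the shell untouched, so the script does not override a deliberate user setting.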

## Performance Notes

- Speedups of 10-100x are possible on Intel hardware; the actual gain depends on the algorithm, the data size, and the CPU or GPU in use
- Optimizations work best with larger datasets (>1000 samples)
- All optimized algorithms maintain identical APIs to scikit-learn
- Can be used as drop-in replacements in existing code
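Since the gain varies by workload, it is worth measuring on your own data. Below is a small timing sketch using only the standard library; `median_runtime` and `speedup` are hypothetical helper names, and the two stand-in workloads exist only so the example runs without scikit-learn installed.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds of fn() over several runs."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline: Callable[[], object], candidate: Callable[[], object],
            repeats: int = 5) -> float:
    """Return how many times faster candidate is than baseline."""
    return median_runtime(baseline, repeats) / median_runtime(candidate, repeats)


# Stand-in workloads; replace them with stock and accelerated fit calls.
def slow_workload():
    return sum(i * i for i in range(200_000))


def fast_workload():
    return sum(i * i for i in range(20_000))


print(f"Measured speedup: {speedup(slow_workload, fast_workload):.1f}x")
```

In practice `baseline` would wrap a fit with the stock scikit-learn estimator and `candidate` the same fit with the sklearnex estimator, on identical data.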
