Scikit-learn Intel Extension

Intel Extension for Scikit-learn provides hardware-accelerated implementations of scikit-learn algorithms optimized for Intel CPUs and GPUs. It works as a drop-in replacement in existing scikit-learn applications: enabling it takes a single patch_sklearn() call or a changed import, and the estimator code itself stays the same. Speedups of 10-100x are possible, depending on the algorithm, dataset size, and hardware; they come from Intel-specific hardware optimization, vector instructions, and memory optimizations.

Package Information

  • Package Name: scikit-learn-intelex
  • Language: Python
  • Installation: pip install scikit-learn-intelex
  • License: Apache 2.0

Core Imports

import sklearnex

To enable optimizations globally:

from sklearnex import patch_sklearn
patch_sklearn()

Direct imports of optimized algorithms:

from sklearnex.ensemble import RandomForestClassifier
from sklearnex.linear_model import LinearRegression
from sklearnex.cluster import KMeans

Basic Usage

from sklearnex import patch_sklearn
patch_sklearn()  # must run before the sklearn imports below

# After patching, supported sklearn estimators resolve to the Intel-optimized versions
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Use accelerated Random Forest (same API as sklearn)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
predictions = rf.predict(X)

# Scored on the training data, so this is training accuracy, not a held-out estimate
print(f"Training accuracy: {rf.score(X, y):.3f}")

Alternative approach using direct imports:

from sklearnex.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Directly use Intel-optimized implementation
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
predictions = rf.predict(X)

Architecture

The package provides three main integration patterns:

  • Global Patching: Replace sklearn implementations system-wide using patch_sklearn()
  • Direct Imports: Import specific Intel-optimized algorithms directly from sklearnex modules
  • Distributed Computing: Use SPMD (Single Program Multiple Data) variants for multi-node execution

All implementations keep the scikit-learn API, so existing code runs unchanged; the performance gain comes from running supported operations on Intel-optimized kernels.
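
The first pattern can be wrapped in an optional-acceleration switch. The sketch below is an illustration, not part of the package: `enable_acceleration` is a hypothetical helper that calls `patch_sklearn()` only when `sklearnex` is importable, so the same script still runs on machines without the extension.

```python
import importlib.util


def enable_acceleration() -> bool:
    """Patch scikit-learn if sklearnex is installed.

    Hypothetical helper (not provided by sklearnex). Returns True when
    patch_sklearn() was called and False when the extension is missing.
    """
    if importlib.util.find_spec("sklearnex") is None:
        return False
    from sklearnex import patch_sklearn

    patch_sklearn()
    return True


# Call this before importing any estimators from sklearn.
accelerated = enable_acceleration()
print("Intel-optimized" if accelerated else "stock scikit-learn")
```

Keeping the check in one function means the rest of the code base can import from `sklearn` as usual and never needs to know which implementation is active.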

Capabilities

Patching and Configuration

Core functions for enabling Intel optimizations globally and managing configuration settings. These functions control how scikit-learn algorithms are accelerated.

def patch_sklearn(): ...
def unpatch_sklearn(): ...
def sklearn_is_patched() -> bool: ...
def get_patch_map() -> dict: ...
def get_patch_names() -> list: ...
def is_patched_instance(estimator) -> bool: ...
def set_config(**params): ...
def get_config() -> dict: ...
def get_hyperparameters() -> dict: ...

See patching-config.md.

Clustering Algorithms

High-performance implementations of clustering algorithms including K-means and DBSCAN with Intel hardware acceleration.

class KMeans:
    def __init__(self, n_clusters=8, **kwargs): ...
    def fit(self, X, y=None): ...
    def predict(self, X): ...

class DBSCAN:
    def __init__(self, eps=0.5, min_samples=5, **kwargs): ...
    def fit(self, X, y=None): ...
    def fit_predict(self, X, y=None): ...
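
Because the class mirrors the scikit-learn API, the same clustering code can run against either package. The sketch below assumes a hypothetical `pick_backend` helper that prefers `sklearnex` and falls back to `sklearn`; the synthetic data and parameter values are arbitrary choices for illustration.

```python
import importlib
import importlib.util
from typing import Optional


def pick_backend() -> Optional[str]:
    """Prefer sklearnex, fall back to sklearn, or None if neither is installed."""
    for name in ("sklearnex", "sklearn"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


backend = pick_backend()
labels = centers = None
if backend is not None:
    from sklearn.datasets import make_blobs

    # Both packages expose KMeans under <package>.cluster with the same API.
    KMeans = importlib.import_module(f"{backend}.cluster").KMeans

    X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)
    model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    labels, centers = model.labels_, model.cluster_centers_
    print(f"{backend}: {len(labels)} labels, centers shape {centers.shape}")
```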

See clustering.md.

Linear Models

Accelerated linear regression, logistic regression, and regularized models with Intel optimization for large datasets.

class LinearRegression:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...

class LogisticRegression:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class Ridge:
    def __init__(self, alpha=1.0, **kwargs): ...

class Lasso:
    def __init__(self, alpha=1.0, **kwargs): ...

class ElasticNet:
    def __init__(self, alpha=1.0, l1_ratio=0.5, **kwargs): ...

class IncrementalLinearRegression:
    def __init__(self, **kwargs): ...
    def partial_fit(self, X, y): ...
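
A quick sanity check for any accelerated linear model is to fit noise-free data with known weights and confirm they are recovered. The sketch below does this for `LinearRegression`; the backend selection logic and the synthetic data are assumptions made for the example, not something the package prescribes.

```python
import importlib
import importlib.util

# Prefer the accelerated package; fall back to stock scikit-learn if it is absent.
backend = next(
    (name for name in ("sklearnex", "sklearn")
     if importlib.util.find_spec(name) is not None),
    None,
)

coef = intercept = None
if backend is not None:
    import numpy as np

    LinearRegression = importlib.import_module(f"{backend}.linear_model").LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0   # noise-free target with known weights

    model = LinearRegression().fit(X, y)
    coef = np.ravel(model.coef_)
    intercept = float(np.ravel(model.intercept_)[0])
    print(f"{backend}: coef={np.round(coef, 3)}, intercept={intercept:.3f}")
```

The fitted coefficients should come out close to 3, -2, and 1 with either backend, which makes this a convenient regression test when switching implementations.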

See linear-models.md.

Ensemble Methods

Intel-accelerated ensemble algorithms including Random Forest and Extra Trees for both classification and regression.

class RandomForestClassifier:
    def __init__(self, n_estimators=100, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class RandomForestRegressor:
    def __init__(self, n_estimators=100, **kwargs): ...

class ExtraTreesClassifier:
    def __init__(self, n_estimators=100, **kwargs): ...

class ExtraTreesRegressor:
    def __init__(self, n_estimators=100, **kwargs): ...

See ensemble.md.

Dimensionality Reduction

Principal Component Analysis with Intel acceleration for efficient dimensionality reduction on large datasets.

class PCA:
    def __init__(self, n_components=None, **kwargs): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def fit_transform(self, X, y=None): ...

See decomposition.md.

Nearest Neighbors

Accelerated k-nearest neighbors algorithms for classification, regression, and unsupervised learning with optimized distance computations.

class KNeighborsClassifier:
    def __init__(self, n_neighbors=5, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class KNeighborsRegressor:
    def __init__(self, n_neighbors=5, **kwargs): ...

class NearestNeighbors:
    def __init__(self, n_neighbors=5, **kwargs): ...
    def fit(self, X, y=None): ...
    def kneighbors(self, X=None, n_neighbors=None, return_distance=True): ...

class LocalOutlierFactor:
    def __init__(self, n_neighbors=20, **kwargs): ...
    def fit_predict(self, X): ...

See neighbors.md.

Support Vector Machines

Intel-optimized Support Vector Machine implementations for classification and regression with accelerated kernel computations.

class SVC:
    def __init__(self, **kwargs): ...
    def fit(self, X, y): ...
    def predict(self, X): ...

class SVR:
    def __init__(self, **kwargs): ...

class NuSVC:
    def __init__(self, **kwargs): ...

class NuSVR:
    def __init__(self, **kwargs): ...

See svm.md.

Metrics and Model Selection

Performance metrics and data splitting utilities with Intel acceleration for large-scale evaluation.

def roc_auc_score(y_true, y_score, **kwargs): ...
def pairwise_distances(X, Y=None, metric='euclidean', **kwargs): ...
def train_test_split(*arrays, **options): ...
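
These utilities are called the same way as their scikit-learn counterparts. The sketch below splits a small array and scores a four-sample ranking; the module paths `<package>.model_selection` and `<package>.metrics` follow the scikit-learn layout, and the fallback to stock `sklearn` is an assumption added so the example runs without the extension.

```python
import importlib
import importlib.util

# Prefer the accelerated package; fall back to stock scikit-learn if it is absent.
backend = next(
    (name for name in ("sklearnex", "sklearn")
     if importlib.util.find_spec(name) is not None),
    None,
)

auc = sizes = None
if backend is not None:
    import numpy as np

    train_test_split = importlib.import_module(f"{backend}.model_selection").train_test_split
    roc_auc_score = importlib.import_module(f"{backend}.metrics").roc_auc_score

    X = np.arange(200, dtype=np.float64).reshape(100, 2)
    y = np.arange(100) % 2
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    sizes = (len(X_train), len(X_test))

    # Three of the four (positive, negative) score pairs are ordered correctly,
    # so the area under the ROC curve is 3/4.
    auc = roc_auc_score(np.array([0, 0, 1, 1]), np.array([0.1, 0.4, 0.35, 0.8]))
    print(f"{backend}: split sizes {sizes}, AUC {auc:.2f}")
```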

See metrics-model-selection.md.

Basic Statistics and Manifold Learning

Statistical computations and manifold learning algorithms with Intel optimization.

class BasicStatistics:
    def __init__(self, **kwargs): ...
    def fit(self, X, y=None): ...

class IncrementalBasicStatistics:
    def __init__(self, **kwargs): ...
    def partial_fit(self, X, y=None): ...

class IncrementalEmpiricalCovariance:
    def __init__(self, **kwargs): ...
    def fit(self, X, y=None): ...
    def partial_fit(self, X, y=None): ...

class TSNE:
    def __init__(self, n_components=2, **kwargs): ...
    def fit_transform(self, X, y=None): ...

See stats-manifold.md.

Model Builder API

Convert external gradient boosting models (XGBoost, LightGBM, CatBoost) to Intel oneDAL format for accelerated inference.

from daal4py.mb import GBTDAALBaseModel, convert_model

def convert_model(model): ...

class GBTDAALBaseModel:
    def __init__(self): ...

See daal4py-mb.md.

Advanced Features

Preview and SPMD (distributed) capabilities for cutting-edge algorithms and multi-node execution.

# Preview features (requires SKLEARNEX_PREVIEW environment variable)
from sklearnex.preview.covariance import EmpiricalCovariance
from sklearnex.preview.decomposition import IncrementalPCA

# SPMD distributed computing
from sklearnex.spmd.cluster import KMeans as SPMDKMeans
from sklearnex.spmd.linear_model import LinearRegression as SPMDLinearRegression

# Utility functions
from sklearnex.utils import get_namespace, _assert_all_finite

See advanced.md.

Environment Variables

  • OFF_ONEDAL_IFACE: Set to "1" to disable oneDAL interface
  • SKLEARNEX_PREVIEW: Enable preview features
  • DALROOT: Path to Intel oneDAL installation
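
Environment variables like these are typically set in the shell, but they can also be set from Python as long as that happens early. The sketch below sets `SKLEARNEX_PREVIEW` and then tries to import a preview estimator; the value `"1"` and the set-before-import ordering are assumptions, and the import is wrapped in a guard because preview modules may differ between releases.

```python
import importlib.util
import os

# Assumption: set the variable before sklearnex is imported or patched.
# setdefault() keeps any value already exported in the shell.
os.environ.setdefault("SKLEARNEX_PREVIEW", "1")

preview_cls = None
if importlib.util.find_spec("sklearnex") is not None:
    try:
        from sklearnex.preview.covariance import EmpiricalCovariance

        preview_cls = EmpiricalCovariance
    except ImportError:
        # The preview module is not present in this installed version.
        preview_cls = None

print("SKLEARNEX_PREVIEW =", os.environ["SKLEARNEX_PREVIEW"])
print("preview estimator:", preview_cls.__name__ if preview_cls else "unavailable")
```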

Performance Notes

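  • Speedups of 10-100x are possible on Intel hardware; the actual gain depends on the algorithm, dataset size, and hardware, so measure on your own workload
  • Optimizations work best with larger datasets (>1000 samples); on very small inputs the gain may be negligible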
  • All optimized algorithms maintain identical APIs to scikit-learn
  • Can be used as drop-in replacements in existing code
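
Since the speedup varies by workload, it is worth measuring directly. The sketch below is one simple way to do that: `time_fit` is a hypothetical helper that reports the best of several fit times, and the KMeans workload and dataset size are arbitrary choices. It times whichever of `sklearn` and `sklearnex` are installed and skips the rest.

```python
import importlib
import importlib.util
import time


def time_fit(make_estimator, X, y=None, repeats=3) -> float:
    """Return the best wall-clock time in seconds over `repeats` fresh fits."""
    best = float("inf")
    for _ in range(repeats):
        estimator = make_estimator()          # new instance so fits are independent
        start = time.perf_counter()
        estimator.fit(X, y)
        best = min(best, time.perf_counter() - start)
    return best


timings = {}
if importlib.util.find_spec("sklearn") is not None:
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=20_000, n_features=20, centers=5, random_state=0)
    for backend in ("sklearn", "sklearnex"):
        if importlib.util.find_spec(backend) is None:
            continue
        KMeans = importlib.import_module(f"{backend}.cluster").KMeans
        timings[backend] = time_fit(
            lambda: KMeans(n_clusters=5, n_init=1, random_state=0), X
        )

for backend, seconds in timings.items():
    print(f"{backend}: {seconds:.3f} s")
```

Taking the minimum over repeats reduces noise from warm-up and background load; for a fair comparison, run both backends on the same machine with the same data.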
