LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with faster training speed, lower memory usage, better accuracy, and support for parallel, distributed, and GPU learning.
```bash
npx @tessl/cli install tessl/pypi-lightgbm@4.6.0
```

It provides a comprehensive machine learning library for gradient boosting, capable of handling large-scale data, with a scikit-learn compatible API, support for data formats including pandas DataFrames and NumPy arrays, hyperparameter tuning integration, and cross-platform compatibility.
Install with pip, optionally with extras:

```bash
pip install lightgbm
pip install lightgbm[dask]
pip install lightgbm[pandas]
pip install lightgbm[scikit-learn]
pip install lightgbm[arrow]
```

Import the package:

```python
import lightgbm as lgb
```

Import specific components:
```python
from lightgbm import (
    LGBMRegressor, LGBMClassifier, LGBMRanker,  # Scikit-learn interface
    Booster, Dataset,                           # Core components
    train, cv,                                  # Training functions
    plot_importance, plot_tree,                 # Visualization
)
```

A quick-start example:

```python
import lightgbm as lgb
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
# Method 1: Using scikit-learn interface (recommended for most users)
model = lgb.LGBMClassifier(
    objective='binary',
    num_leaves=31,
    learning_rate=0.05,
    colsample_bytree=0.9  # sklearn-API name for feature_fraction
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)
# Method 2: Using native LightGBM interface (for advanced control)
train_data = lgb.Dataset(X_train, label=y_train)
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}
model = lgb.train(params, train_data, num_boost_round=100)
predictions = model.predict(X_test)
```
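LightGBM also consumes pandas DataFrames directly; columns with `category` dtype are picked up as categorical features without manual encoding. A minimal sketch (the synthetic columns `age`, `income`, and `city` are illustrative):

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(42)
df = pd.DataFrame({
    'age': rng.integers(18, 70, size=500),
    'income': rng.normal(50_000, 15_000, size=500),
    'city': pd.Categorical(rng.choice(['NY', 'SF', 'LA'], size=500)),
})
target = rng.integers(0, 2, size=500)

# 'category' dtype columns are treated as categorical features automatically
model = lgb.LGBMClassifier(num_leaves=31, learning_rate=0.05)
model.fit(df, target)
predictions = model.predict(df)
```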
LightGBM's architecture provides flexibility through multiple interfaces:

- `Booster` and `Dataset` provide low-level model control and efficient data handling
- `LGBMRegressor`, `LGBMClassifier`, and `LGBMRanker` offer familiar sklearn-compatible APIs
- `train()` and `cv()` enable direct model training and cross-validation (see the sketch after this list)

This design enables LightGBM to serve both as a high-performance gradient boosting engine and as a comprehensive machine learning framework suitable for production environments.
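As a sketch of the training-function layer, `lgb.cv` runs k-fold cross-validation over a `Dataset` and returns per-round metric histories (the parameter values here are illustrative):

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
full_data = lgb.Dataset(data.data, label=data.target)

params = {'objective': 'binary', 'metric': 'binary_logloss', 'verbosity': -1}

# 5-fold CV; each dict entry is a list with one value per boosting round
cv_results = lgb.cv(params, full_data, num_boost_round=100, nfold=5)

# In LightGBM 4.x the result keys look like 'valid binary_logloss-mean'
print(cv_results['valid binary_logloss-mean'][-1])
```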
High-level, sklearn-compatible interface for regression, classification, and ranking tasks. Provides familiar .fit(), .predict(), and .score() methods with automatic hyperparameter handling and feature processing.
```python
class LGBMRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class LGBMClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def predict_proba(self, X, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class LGBMRanker:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def score(self, X, y, **kwargs): ...
```
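A short sketch of the same workflow with `LGBMRegressor` on a synthetic regression task (dataset and hyperparameter values are illustrative):

```python
import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1_000, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05, num_leaves=31)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# .score() returns R^2 for regressors, matching the sklearn convention
print(model.score(X_test, y_test))
```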
Low-level LightGBM interface providing direct access to the gradient boosting engine. Enables advanced model control, custom objectives, evaluation functions, and fine-tuned training procedures.

```python
class Booster:
    def __init__(self, params, train_set, **kwargs): ...
    def predict(self, data, **kwargs): ...
    def update(self, train_set, fobj): ...
    def feature_importance(self, importance_type='split'): ...
    def save_model(self, filename): ...

class Dataset:
    def __init__(self, data, label=None, **kwargs): ...
    def construct(self): ...
    def create_valid(self, data, **kwargs): ...
    def set_field(self, field_name, data): ...

def train(params, train_set, **kwargs): ...
def cv(params, train_set, **kwargs): ...
```
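As one example of the advanced control mentioned above, in LightGBM 4.x a custom objective can be passed as a callable in `params`; it receives raw scores and the training `Dataset` and returns the gradient and hessian of the loss. A sketch reimplementing binary logloss (illustrative, equivalent to the built-in `'binary'` objective up to constants):

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
train_set = lgb.Dataset(data.data, label=data.target)

def logloss_objective(preds, train_data):
    """Gradient and hessian of binary logloss with respect to raw scores."""
    y = train_data.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))  # sigmoid turns raw scores into probabilities
    return p - y, p * (1.0 - p)       # (gradient, hessian)

booster = lgb.train(
    {'objective': logloss_objective, 'verbosity': -1},
    train_set,
    num_boost_round=100,
)

# With a custom objective, predict() returns raw scores, so apply the
# sigmoid yourself to recover probabilities
probabilities = 1.0 / (1.0 + np.exp(-booster.predict(data.data)))
```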
Distributed training and prediction using Dask for scalable machine learning across multiple machines. Provides the functionality of standard LightGBM models with automatic data distribution and parallel processing.

```python
class DaskLGBMRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...

class DaskLGBMClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def predict_proba(self, X, **kwargs): ...

class DaskLGBMRanker:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
```
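A sketch of distributed training, assuming a two-worker `LocalCluster` for demonstration (in production you would connect the `Client` to an existing cluster):

```python
import dask.array as da
from dask.distributed import Client, LocalCluster
import lightgbm as lgb

if __name__ == '__main__':
    cluster = LocalCluster(n_workers=2)
    client = Client(cluster)

    # Chunked synthetic data; each chunk can live on a different worker
    X = da.random.random((10_000, 20), chunks=(2_500, 20))
    y = (da.random.random(10_000, chunks=2_500) > 0.5).astype(int)

    model = lgb.DaskLGBMClassifier(n_estimators=50)
    model.fit(X, y)

    predictions = model.predict(X)   # lazy dask array
    print(predictions.compute()[:10])  # materialize the results

    client.close()
    cluster.close()
```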
Built-in plotting functions for model interpretation, feature importance analysis, training progress monitoring, and tree structure visualization. Supports both matplotlib and graphviz backends.

```python
def plot_importance(booster, **kwargs): ...
def plot_metric(eval_result, **kwargs): ...
def plot_tree(booster, **kwargs): ...
def plot_split_value_histogram(booster, **kwargs): ...
def create_tree_digraph(booster, **kwargs): ...
```
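A sketch that records the evaluation history during training and visualizes it (requires matplotlib; `plot_tree` and `create_tree_digraph` additionally need graphviz):

```python
import lightgbm as lgb
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_valid, y_train, y_valid = train_test_split(data.data, data.target, random_state=42)

train_set = lgb.Dataset(X_train, label=y_train)
valid_set = train_set.create_valid(X_valid, label=y_valid)

eval_result = {}  # filled in by the record_evaluation callback
booster = lgb.train(
    {'objective': 'binary', 'metric': 'binary_logloss', 'verbosity': -1},
    train_set,
    num_boost_round=100,
    valid_sets=[valid_set],
    callbacks=[lgb.record_evaluation(eval_result)],
)

lgb.plot_importance(booster, max_num_features=10)      # split-count importance by default
lgb.plot_metric(eval_result, metric='binary_logloss')  # validation curve over rounds
plt.show()
```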
Flexible training control through callback functions enabling early stopping, evaluation logging, parameter adjustment, and custom training behaviors. Supports both built-in and custom callback implementations.

```python
def early_stopping(stopping_rounds, **kwargs): ...
def log_evaluation(period=1, **kwargs): ...
def record_evaluation(eval_result): ...
def reset_parameter(**kwargs): ...

class EarlyStopException(Exception): ...
```
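Callbacks are passed to `train()` (or the sklearn `fit()`) as a list and run at each boosting round. A minimal sketch combining early stopping, periodic logging, and a decaying learning rate (parameter values are illustrative):

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_valid, y_train, y_valid = train_test_split(data.data, data.target, random_state=42)

train_set = lgb.Dataset(X_train, label=y_train)
valid_set = train_set.create_valid(X_valid, label=y_valid)

booster = lgb.train(
    {'objective': 'binary', 'metric': 'binary_logloss', 'verbosity': -1},
    train_set,
    num_boost_round=1_000,
    valid_sets=[valid_set],
    callbacks=[
        lgb.early_stopping(stopping_rounds=25),  # stop if no improvement for 25 rounds
        lgb.log_evaluation(period=50),           # print validation metrics every 50 rounds
        # reset_parameter accepts a callable taking the current iteration number
        lgb.reset_parameter(learning_rate=lambda round_num: 0.1 * (0.99 ** round_num)),
    ],
)
print(booster.best_iteration, booster.best_score)
```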