Files

docs/
  core-training.md
  distributed-computing.md
  index.md
  sklearn-interface.md
  training-callbacks.md
  visualization.md
tile.json

tessl/pypi-lightgbm


Workspace: tessl
Visibility: Public
Describes: pkg:pypi/lightgbm@4.6.x (PyPI)

To install, run

npx @tessl/cli install tessl/pypi-lightgbm@4.6.0


LightGBM

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, offering faster training, lower memory usage, and better accuracy, with support for parallel, distributed, and GPU learning. The package provides a comprehensive machine learning library for gradient boosting on large-scale data, featuring a scikit-learn compatible API, support for input formats including pandas DataFrames and NumPy arrays, integration with hyperparameter tuning tools, and cross-platform compatibility.

Package Information

  • Package Name: lightgbm
  • Language: Python
  • Installation: pip install lightgbm
  • Optional Dependencies:
    • Dask: pip install lightgbm[dask]
    • Pandas: pip install lightgbm[pandas]
    • Scikit-learn: pip install lightgbm[scikit-learn]
    • Arrow: pip install lightgbm[arrow]

Core Imports

import lightgbm as lgb

Import specific components:

from lightgbm import (
    LGBMRegressor, LGBMClassifier, LGBMRanker,  # Scikit-learn interface
    Booster, Dataset,  # Core components
    train, cv,  # Training functions
    plot_importance, plot_tree  # Visualization
)

Basic Usage

import lightgbm as lgb
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Method 1: Using scikit-learn interface (recommended for most users)
model = lgb.LGBMClassifier(
    objective='binary',
    num_leaves=31,
    learning_rate=0.05,
    colsample_bytree=0.9  # sklearn-style alias for feature_fraction
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)

# Method 2: Using native LightGBM interface (for advanced control)
train_data = lgb.Dataset(X_train, label=y_train)
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}
model = lgb.train(params, train_data, num_boost_round=100)
predictions = model.predict(X_test)

Architecture

LightGBM's architecture provides flexibility through multiple interfaces:

  • Core Components: Booster and Dataset provide low-level model control and efficient data handling
  • Scikit-learn Interface: LGBMRegressor, LGBMClassifier, LGBMRanker offer familiar sklearn-compatible APIs
  • Training Functions: train() and cv() enable direct model training and cross-validation
  • Distributed Computing: Dask integration enables scalable training across multiple machines
  • Visualization: Built-in plotting functions for model interpretation and analysis
  • Callbacks: Extensible training control with early stopping, logging, and custom callbacks

This design enables LightGBM to serve both as a high-performance gradient boosting engine and a comprehensive machine learning framework suitable for production environments.
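
For example, a model trained through the sklearn wrapper exposes its underlying Booster, which can be saved and reloaded independently of the wrapper. A minimal sketch (the file name model.txt and the synthetic data are illustrative only):

import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, random_state=42)

# Train through the high-level sklearn interface
model = lgb.LGBMRegressor(n_estimators=50).fit(X, y)

# Drop down to the underlying Booster for low-level control
booster = model.booster_
booster.save_model('model.txt')

# Reload as a standalone Booster, independent of the sklearn wrapper
reloaded = lgb.Booster(model_file='model.txt')
predictions = reloaded.predict(X)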

Capabilities

Scikit-learn Compatible Models

High-level, sklearn-compatible interface for regression, classification, and ranking tasks. Provides familiar .fit(), .predict(), and .score() methods with automatic hyperparameter handling and feature processing.

class LGBMRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class LGBMClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def predict_proba(self, X, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class LGBMRanker:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def score(self, X, y, **kwargs): ...
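
A short sketch of the sklearn interface with a validation set and early stopping; the dataset and hyperparameter values are placeholders, not recommendations:

import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    callbacks=[lgb.early_stopping(stopping_rounds=20)],
)

print(model.best_iteration_)      # iteration selected by early stopping
print(model.score(X_val, y_val))  # R^2 score, as in sklearn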

Scikit-learn Interface

Core Model Training

Low-level LightGBM interface providing direct access to the gradient boosting engine. Enables advanced model control, custom objectives, evaluation functions, and fine-tuned training procedures.

class Booster:
    def __init__(self, params, train_set, **kwargs): ...
    def predict(self, data, **kwargs): ...
    def update(self, train_set=None, fobj=None): ...
    def feature_importance(self, importance_type='split'): ...
    def save_model(self, filename): ...

class Dataset:
    def __init__(self, data, label=None, **kwargs): ...
    def construct(self): ...
    def create_valid(self, data, **kwargs): ...
    def set_field(self, field_name, data): ...

def train(params, train_set, **kwargs): ...
def cv(params, train_set, **kwargs): ...
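
The sketch below shows the native workflow end to end: building a Dataset, deriving a validation set with create_valid(), training with early stopping, and running cross-validation. The synthetic data and parameter values are placeholders, and cv() result key names vary slightly across versions (e.g. 'valid binary_logloss-mean' in LightGBM 4.x):

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(42)
X = rng.random((1000, 10))
y = (X[:, 0] > 0.5).astype(int)

train_data = lgb.Dataset(X[:800], label=y[:800])
valid_data = train_data.create_valid(X[800:], label=y[800:])

params = {'objective': 'binary', 'metric': 'binary_logloss'}
booster = lgb.train(
    params, train_data,
    num_boost_round=200,
    valid_sets=[valid_data],
    callbacks=[lgb.early_stopping(stopping_rounds=10)],
)

# 5-fold cross-validation on the same Dataset
cv_results = lgb.cv(params, train_data, num_boost_round=100, nfold=5, seed=42)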

Core Training

Distributed Computing

Distributed training and prediction using Dask for scalable machine learning across multiple machines. Provides all the functionality of standard LightGBM models with automatic data distribution and parallel processing.

class DaskLGBMRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...

class DaskLGBMClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def predict_proba(self, X, **kwargs): ...

class DaskLGBMRanker:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
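
A hedged sketch of Dask-based training, using a LocalCluster as a stand-in for a real multi-machine deployment (worker counts and chunk sizes are arbitrary; requires pip install lightgbm[dask]):

import dask.array as da
import lightgbm as lgb
from dask.distributed import Client, LocalCluster

# A local cluster stands in for a real multi-machine deployment
cluster = LocalCluster(n_workers=2)
client = Client(cluster)

# Chunked arrays are partitioned across workers automatically
X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = (da.random.random(100_000, chunks=10_000) > 0.5).astype(int)

model = lgb.DaskLGBMClassifier(n_estimators=50)
model.fit(X, y)

preds = model.predict(X)     # lazily evaluated Dask array
print(preds[:10].compute())  # materialize a slice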

Distributed Computing

Visualization and Model Interpretation

Built-in plotting functions for model interpretation, feature importance analysis, training progress monitoring, and tree structure visualization. Supports both matplotlib and graphviz backends.

def plot_importance(booster, **kwargs): ...
def plot_metric(eval_result, **kwargs): ...
def plot_tree(booster, **kwargs): ...
def plot_split_value_histogram(booster, **kwargs): ...
def create_tree_digraph(booster, **kwargs): ...
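
For instance, assuming a trained model from the earlier examples, feature importance and a single tree can be rendered as follows (matplotlib is needed for the plot, the graphviz package for the digraph; the output name tree0 is arbitrary):

import matplotlib.pyplot as plt
import lightgbm as lgb

# `model` is any trained Booster or sklearn-interface model
ax = lgb.plot_importance(model, max_num_features=10, importance_type='gain')
plt.tight_layout()
plt.show()

# Export one tree as a graphviz Digraph and render it to disk
graph = lgb.create_tree_digraph(model, tree_index=0)
graph.render('tree0', format='png')  # writes tree0.png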

Visualization

Training Control and Callbacks

Flexible training control through callback functions enabling early stopping, evaluation logging, parameter adjustment, and custom training behaviors. Supports both built-in and custom callback implementations.

def early_stopping(stopping_rounds, **kwargs): ...
def log_evaluation(period=1, **kwargs): ...
def record_evaluation(eval_result): ...
def reset_parameter(**kwargs): ...

class EarlyStopException(Exception): ...
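
A sketch combining several built-in callbacks in one training run, reusing params, train_data, and valid_data from the core-training example above (round counts are illustrative):

eval_history = {}

booster = lgb.train(
    params, train_data,
    num_boost_round=500,
    valid_sets=[valid_data],
    valid_names=['valid'],
    callbacks=[
        lgb.early_stopping(stopping_rounds=20),  # stop once 'valid' stops improving
        lgb.log_evaluation(period=50),           # print metrics every 50 rounds
        lgb.record_evaluation(eval_history),     # capture metrics into the dict
    ],
)

# eval_history now maps 'valid' -> {'binary_logloss': [per-iteration scores]}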

Training Callbacks