# LightGBM

LightGBM is a gradient boosting framework built on tree-based learning algorithms. It is designed for fast training, low memory usage, and good accuracy, and it supports parallel, distributed, and GPU learning. The Python package handles large-scale data, offers a scikit-learn compatible API, accepts common data formats such as pandas DataFrames and NumPy arrays, integrates with hyperparameter tuning tools, and runs across platforms.

## Package Information

- **Package Name**: lightgbm
- **Language**: Python
- **Installation**: `pip install lightgbm`
- **Optional Dependencies**:
  - Dask: `pip install lightgbm[dask]`
  - Pandas: `pip install lightgbm[pandas]`
  - Scikit-learn: `pip install lightgbm[scikit-learn]`
  - Arrow: `pip install lightgbm[arrow]`

## Core Imports

```python
import lightgbm as lgb
```

Import specific components:

```python
from lightgbm import (
    LGBMRegressor, LGBMClassifier, LGBMRanker,  # Scikit-learn interface
    Booster, Dataset,                           # Core components
    train, cv,                                  # Training functions
    plot_importance, plot_tree                  # Visualization
)
```

## Basic Usage

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Method 1: Using scikit-learn interface (recommended for most users)
model = lgb.LGBMClassifier(
    objective='binary',
    num_leaves=31,
    learning_rate=0.05,
    colsample_bytree=0.9  # scikit-learn style name for feature_fraction
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)          # predicted class labels
probabilities = model.predict_proba(X_test)  # per-class probabilities

# Method 2: Using native LightGBM interface (for advanced control)
train_data = lgb.Dataset(X_train, label=y_train)
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}
booster = lgb.train(params, train_data, num_boost_round=100)
# With the 'binary' objective, Booster.predict returns positive-class probabilities
booster_probabilities = booster.predict(X_test)
```

## Architecture

LightGBM's architecture provides flexibility through multiple interfaces:

- **Core Components**: `Booster` and `Dataset` provide low-level model control and efficient data handling
- **Scikit-learn Interface**: `LGBMRegressor`, `LGBMClassifier`, `LGBMRanker` offer familiar sklearn-compatible APIs
- **Training Functions**: `train()` and `cv()` enable direct model training and cross-validation
- **Distributed Computing**: Dask integration enables scalable training across multiple machines
- **Visualization**: Built-in plotting functions for model interpretation and analysis
- **Callbacks**: Extensible training control with early stopping, logging, and custom callbacks

This design lets LightGBM serve both as a high-performance gradient boosting engine and as a full-featured machine learning library suited to production use.

## Capabilities

### Scikit-learn Compatible Models

High-level, scikit-learn compatible interface for regression, classification, and ranking tasks. The estimators provide the familiar `.fit()` and `.predict()` methods, and the regressor and classifier also inherit `.score()` from scikit-learn's mixins. Hyperparameters are passed as constructor arguments, and pandas categorical columns are treated as categorical features by default.

```python { .api }
class LGBMRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class LGBMClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def predict_proba(self, X, **kwargs): ...
    def score(self, X, y, **kwargs): ...

class LGBMRanker:
    def fit(self, X, y, group=None, **kwargs): ...  # group sizes are required for ranking
    def predict(self, X, **kwargs): ...
```

[Scikit-learn Interface](./sklearn-interface.md)
109
110
### Core Model Training
111
112
Low-level LightGBM interface providing direct access to the gradient boosting engine. Enables advanced model control, custom objectives, evaluation functions, and fine-tuned training procedures.
113
114
```python { .api }
115
class Booster:
116
def __init__(self, params, train_set, **kwargs): ...
117
def predict(self, data, **kwargs): ...
118
def update(self, train_set, fobj): ...
119
def feature_importance(self, importance_type='split'): ...
120
def save_model(self, filename): ...
121
122
class Dataset:
123
def __init__(self, data, label=None, **kwargs): ...
124
def construct(): ...
125
def create_valid(data, **kwargs): ...
126
def set_field(field_name, data): ...
127
128
def train(params, train_set, **kwargs): ...
129
def cv(params, train_set, **kwargs): ...
130
```
131
132
[Core Training](./core-training.md)
133
134
### Distributed Computing
135
136
Distributed training and prediction using Dask for scalable machine learning across multiple machines. Provides all the functionality of standard LightGBM models with automatic data distribution and parallel processing.
137
138
```python { .api }
139
class DaskLGBMRegressor:
140
def fit(self, X, y, **kwargs): ...
141
def predict(self, X, **kwargs): ...
142
143
class DaskLGBMClassifier:
144
def fit(self, X, y, **kwargs): ...
145
def predict(self, X, **kwargs): ...
146
def predict_proba(self, X, **kwargs): ...
147
148
class DaskLGBMRanker:
149
def fit(self, X, y, **kwargs): ...
150
def predict(self, X, **kwargs): ...
151
```
152
153
[Distributed Computing](./distributed-computing.md)
154
155
### Visualization and Model Interpretation
156
157
Built-in plotting functions for model interpretation, feature importance analysis, training progress monitoring, and tree structure visualization. Supports both matplotlib and graphviz backends.
158
159
```python { .api }
160
def plot_importance(booster, **kwargs): ...
161
def plot_metric(eval_result, **kwargs): ...
162
def plot_tree(booster, **kwargs): ...
163
def plot_split_value_histogram(booster, **kwargs): ...
164
def create_tree_digraph(booster, **kwargs): ...
165
```
166
167
[Visualization](./visualization.md)
168
169
### Training Control and Callbacks
170
171
Flexible training control through callback functions enabling early stopping, evaluation logging, parameter adjustment, and custom training behaviors. Supports both built-in and custom callback implementations.
172
173
```python { .api }
174
def early_stopping(stopping_rounds, **kwargs): ...
175
def log_evaluation(period=1, **kwargs): ...
176
def record_evaluation(eval_result): ...
177
def reset_parameter(**kwargs): ...
178
179
class EarlyStopException(Exception): ...
180
```
181
182
[Training Callbacks](./training-callbacks.md)