# LightGBM

LightGBM is a gradient boosting framework built on tree-based learning algorithms. It is designed for fast training, low memory usage, and good accuracy, and it supports parallel, distributed, and GPU learning on large-scale data. The Python package offers a scikit-learn compatible API, accepts common data formats such as pandas DataFrames and NumPy arrays, works with hyperparameter tuning tools, and runs across platforms.

## Package Information

- **Package Name**: lightgbm
- **Language**: Python
- **Installation**: `pip install lightgbm`
- **Optional Dependencies**:
  - Dask: `pip install lightgbm[dask]`
  - Pandas: `pip install lightgbm[pandas]`
  - Scikit-learn: `pip install lightgbm[scikit-learn]`
  - Arrow: `pip install lightgbm[arrow]`

## Core Imports

```python
import lightgbm as lgb
```

Import specific components:

```python
from lightgbm import (
    LGBMRegressor, LGBMClassifier, LGBMRanker,  # Scikit-learn interface
    Booster, Dataset,                           # Core components
    train, cv,                                  # Training functions
    plot_importance, plot_tree                  # Visualization
)
```

## Basic Usage

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Method 1: Using scikit-learn interface (recommended for most users)
model = lgb.LGBMClassifier(
    objective='binary',
    num_leaves=31,
    learning_rate=0.05,
    colsample_bytree=0.9  # scikit-learn style name for feature_fraction
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)          # class labels
probabilities = model.predict_proba(X_test)  # per-class probabilities

# Method 2: Using native LightGBM interface (for advanced control)
train_data = lgb.Dataset(X_train, label=y_train)
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}
booster = lgb.train(params, train_data, num_boost_round=100)
# With a binary objective, Booster.predict returns P(class 1), not labels
positive_probabilities = booster.predict(X_test)
```

## Architecture

LightGBM's architecture provides flexibility through multiple interfaces:

- **Core Components**: `Booster` and `Dataset` provide low-level model control and efficient data handling
- **Scikit-learn Interface**: `LGBMRegressor`, `LGBMClassifier`, `LGBMRanker` offer familiar sklearn-compatible APIs
- **Training Functions**: `train()` and `cv()` enable direct model training and cross-validation
- **Distributed Computing**: Dask integration enables scalable training across multiple machines
- **Visualization**: Built-in plotting functions for model interpretation and analysis
- **Callbacks**: Extensible training control with early stopping, logging, and custom callbacks

This design enables LightGBM to serve both as a high-performance gradient boosting engine and a comprehensive machine learning framework suitable for production environments.

## Capabilities

### Scikit-learn Compatible Models

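High-level, scikit-learn compatible estimators for regression, classification, and ranking. They expose the familiar `.fit()` and `.predict()` methods, take their hyperparameters as constructor arguments, work with scikit-learn utilities such as pipelines and grid search, and handle pandas categorical columns automatically. The regressor and classifier also provide `.score()`.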

```python { .api }
class LGBMRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def score(self, X, y, sample_weight=None): ...

class LGBMClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def predict_proba(self, X, **kwargs): ...
    def score(self, X, y, sample_weight=None): ...

class LGBMRanker:
    # No score() here: it comes from scikit-learn's regressor/classifier mixins
    def fit(self, X, y, **kwargs): ...  # pass group=... (documents per query)
    def predict(self, X, **kwargs): ...
```

[Scikit-learn Interface](./sklearn-interface.md)

### Core Model Training

Low-level LightGBM interface providing direct access to the gradient boosting engine. Enables advanced model control, custom objectives, evaluation functions, and fine-tuned training procedures.

```python { .api }
class Booster:
    def __init__(self, params=None, train_set=None, **kwargs): ...
    def predict(self, data, **kwargs): ...
    def update(self, train_set=None, fobj=None): ...
    def feature_importance(self, importance_type='split'): ...
    def save_model(self, filename): ...

class Dataset:
    def __init__(self, data, label=None, **kwargs): ...
    def construct(self): ...
    def create_valid(self, data, **kwargs): ...
    def set_field(self, field_name, data): ...

def train(params, train_set, **kwargs): ...
def cv(params, train_set, **kwargs): ...
```

[Core Training](./core-training.md)

### Distributed Computing

Distributed training and prediction using Dask, for scaling across multiple machines. The Dask estimators mirror the scikit-learn interface, operate on Dask arrays and DataFrames, and train on the workers that hold the data partitions.

```python { .api }
class DaskLGBMRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...

class DaskLGBMClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
    def predict_proba(self, X, **kwargs): ...

class DaskLGBMRanker:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X, **kwargs): ...
```

[Distributed Computing](./distributed-computing.md)

### Visualization and Model Interpretation

Built-in plotting functions for model interpretation, feature importance analysis, training progress monitoring, and tree structure visualization. Supports both matplotlib and graphviz backends.

```python { .api }
def plot_importance(booster, **kwargs): ...
# booster here is a dict filled by record_evaluation() or a fitted LGBMModel
def plot_metric(booster, **kwargs): ...
def plot_tree(booster, **kwargs): ...
def plot_split_value_histogram(booster, **kwargs): ...
def create_tree_digraph(booster, **kwargs): ...
```

[Visualization](./visualization.md)

### Training Control and Callbacks

Flexible training control through callback functions enabling early stopping, evaluation logging, parameter adjustment, and custom training behaviors. Supports both built-in and custom callback implementations.

```python { .api }
def early_stopping(stopping_rounds, **kwargs): ...
def log_evaluation(period=1, **kwargs): ...
def record_evaluation(eval_result): ...
def reset_parameter(**kwargs): ...

class EarlyStopException(Exception): ...
```

[Training Callbacks](./training-callbacks.md)