XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable
npx @tessl/cli install tessl/pypi-xgboost@3.0.00
# XGBoost

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework, providing parallel tree boosting (GBDT, GBM) that solves data science problems in a fast and accurate way. The library runs on major distributed environments and can handle problems beyond billions of examples.

## Package Information

- **Package Name**: xgboost
- **Language**: Python
- **Installation**: `pip install xgboost`

## Core Imports

```python
import xgboost as xgb
```

For scikit-learn compatible estimators:

```python
from xgboost import XGBClassifier, XGBRegressor, XGBRanker
```

For core functionality:

```python
from xgboost import DMatrix, Booster, train, cv
```

## Basic Usage

```python
import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load sample data (load_boston was removed in scikit-learn 1.2,
# so the bundled diabetes regression dataset is used instead)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Method 1: Using XGBoost native API
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# The native API takes the number of trees from num_boost_round;
# 'n_estimators' is a scikit-learn API parameter and is not used here
params = {
    'objective': 'reg:squarederror',
    'max_depth': 3,
    'learning_rate': 0.1,
}

model = xgb.train(params, dtrain, num_boost_round=100)
predictions = model.predict(dtest)

# Method 2: Using scikit-learn API
model = xgb.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

## Architecture

XGBoost provides multiple interfaces for different use cases:

- **Core API**: Native XGBoost interface with DMatrix for data and Booster for models
- **Scikit-Learn API**: Drop-in replacement estimators compatible with sklearn pipelines
- **Distributed Computing**: Integration with Dask, Spark, and collective communication
- **Specialized Features**: Quantile regression, ranking, federated learning

The library is built around efficient gradient boosting with optimizations for speed, memory usage, and scalability across different computing environments.

## Capabilities

### Core Data Structures and Training

Fundamental XGBoost data structures and training functions that form the core of the library. Includes DMatrix for efficient data handling and training functions for model creation.

```python { .api }
class DMatrix:
    def __init__(self, data, label=None, **kwargs): ...

class Booster:
    def predict(self, data, **kwargs): ...
    def save_model(self, fname): ...

def train(params, dtrain, num_boost_round=10, **kwargs): ...
def cv(params, dtrain, num_boost_round=10, **kwargs): ...
```

[Core API](./core-api.md)

### Scikit-Learn Compatible Estimators

Drop-in replacement estimators that follow scikit-learn conventions for seamless integration with existing ML pipelines. Includes classifiers, regressors, and rankers.

```python { .api }
class XGBRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...

class XGBClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class XGBRanker:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...
```

[Scikit-Learn Interface](./sklearn-interface.md)

### Distributed Computing

Distributed training and prediction capabilities for large-scale machine learning across multiple workers and computing environments.

```python { .api }
# Dask integration
from xgboost.dask import DaskXGBRegressor, DaskXGBClassifier

# Spark integration
from xgboost.spark import SparkXGBRegressor, SparkXGBClassifier

# Collective communication
import xgboost.collective as collective
```

[Distributed Computing](./distributed-computing.md)

### Visualization and Model Interpretation

Tools for visualizing model structure, feature importance, and decision trees to understand and interpret XGBoost models.

```python { .api }
def plot_importance(booster, **kwargs): ...
def plot_tree(booster, **kwargs): ...
def to_graphviz(booster, **kwargs): ...
```

[Visualization](./visualization.md)

### Training Callbacks

Comprehensive callback system for monitoring and controlling the training process, including early stopping, learning rate scheduling, and model checkpointing.

```python { .api }
from xgboost.callback import (
    TrainingCallback,
    EarlyStopping,
    LearningRateScheduler,
    EvaluationMonitor,
    TrainingCheckPoint
)
```

[Callbacks](./callbacks.md)

### Configuration and Utilities

Global configuration management, build information, and utility functions for customizing XGBoost behavior and accessing system information.

```python { .api }
def set_config(**kwargs): ...
def get_config(): ...
def config_context(**kwargs): ...
def build_info(): ...
```

[Configuration](./configuration.md)

## Types

### Core Types

```python { .api }
from typing import Dict, List, Optional, Union, Any
import numpy as np

# Data types
ArrayLike = Union[np.ndarray, List, tuple, 'pd.DataFrame', 'scipy.sparse.spmatrix']
FeatureNames = Optional[Union[str, List[str]]]
FeatureTypes = Optional[List[str]]

# Parameter types
BoosterParam = Dict[str, Any]
```