# XGBoost-CPU

XGBoost Python Package (CPU only): a minimal installation of XGBoost with no support for GPU algorithms or federated learning. XGBoost is an optimized distributed gradient boosting library designed for efficiency, flexibility, and portability; it implements machine learning algorithms under the Gradient Boosting framework.

## Package Information

- **Package Name**: xgboost-cpu
- **Language**: Python
- **Installation**: `pip install xgboost-cpu`
- **Documentation**: https://xgboost.readthedocs.io/en/stable/

## Core Imports

```python
import xgboost as xgb
```

Common imports for different use cases:

```python
# Core functionality
from xgboost import DMatrix, Booster, train, cv

# Scikit-learn interface
from xgboost import XGBClassifier, XGBRegressor, XGBRanker

# Distributed computing (each integration needs its optional dependency)
from xgboost import dask as dxgb  # Dask integration, requires dask
from xgboost import spark as spark_xgb  # Spark integration, requires pyspark

# Utilities
from xgboost import plot_importance, plot_tree  # plotting needs matplotlib (plot_tree also needs graphviz)
from xgboost import get_config, set_config
```

## Basic Usage

```python
import xgboost as xgb
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create sample data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Method 1: Using XGBoost's native API
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'max_depth': 6,
    'learning_rate': 0.1
}

# The evals list is scored with eval_metric after every boosting round
model = xgb.train(params, dtrain, num_boost_round=100,
                  evals=[(dtrain, 'train'), (dtest, 'test')])

# Make predictions (binary:logistic returns probabilities, not class labels)
y_pred = model.predict(dtest)

# Method 2: Using scikit-learn interface
from xgboost import XGBClassifier

clf = XGBClassifier(objective='binary:logistic', max_depth=6,
                    learning_rate=0.1, n_estimators=100)
clf.fit(X_train, y_train)
y_pred_sklearn = clf.predict_proba(X_test)[:, 1]  # probability of class 1

# Visualize feature importance (requires matplotlib)
xgb.plot_importance(model, max_num_features=10)
```

## Architecture

XGBoost provides multiple interfaces and deployment options:

- **Core API**: Native XGBoost data structures (DMatrix, Booster) and training functions for maximum control and performance
- **Scikit-learn Interface**: Estimators that follow the scikit-learn API, with the familiar fit/predict methods
- **Distributed Computing**: Integrations with the Dask and Spark ecosystems for scalable training
- **Data Handling**: Optimized data structures with support for sparse matrices, missing values, and external memory
- **Model Interpretation**: Built-in visualization and feature importance tools

This design lets XGBoost serve both as a high-performance gradient boosting engine and as an accessible machine learning library that works with the rest of the Python data science ecosystem, such as NumPy, pandas, and scikit-learn.

## Capabilities

### Core Data Structures and Models

The fundamental data structures and model objects for training and prediction: `DMatrix` for efficient data handling, `Booster` for trained models, and memory-optimized variants such as `QuantileDMatrix`.

```python { .api }
class DMatrix:
    def __init__(self, data, label=None, *, weight=None, base_margin=None,
                 missing=None, silent=False, feature_names=None,
                 feature_types=None, nthread=None, group=None, qid=None,
                 label_lower_bound=None, label_upper_bound=None,
                 feature_weights=None, enable_categorical=False):
        """Optimized data matrix for XGBoost training and prediction."""

class Booster:
    def __init__(self, params=None, cache=(), model_file=None):
        """XGBoost model containing training, prediction, and evaluation routines."""

    def predict(self, data, *, output_margin=False, pred_leaf=False,
                pred_contribs=False, approx_contribs=False,
                pred_interactions=False, validate_features=True,
                training=False, iteration_range=(0, 0), strict_shape=False):
        """Make predictions using the trained model."""

class QuantileDMatrix:
    def __init__(self, data, label=None, *, ref=None, **kwargs):
        """Memory-efficient DMatrix variant using quantized data."""
```

[Core Data Structures and Models](./core-data-models.md)

### Training and Evaluation

The core training and cross-validation functions for model development. `train` fits a single booster and `cv` runs k-fold cross-validation; both accept the same parameter dictionary along with evaluation and early-stopping options.

```python { .api }
def train(params, dtrain, num_boost_round=10, evals=(), obj=None,
          maximize=None, early_stopping_rounds=None, evals_result=None,
          verbose_eval=True, xgb_model=None, callbacks=None, custom_metric=None):
    """Train a booster with given parameters."""

def cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False,
       folds=None, metrics=(), obj=None, maximize=None,
       early_stopping_rounds=None, fpreproc=None, as_pandas=True,
       verbose_eval=None, show_stdv=True, seed=0, callbacks=None,
       shuffle=True, custom_metric=None):
    """Cross-validation with given parameters."""
```

[Training and Evaluation](./training-evaluation.md)

### Scikit-learn Interface

Estimators that follow the scikit-learn API, combining the familiar fit/predict methods with XGBoost's performance. The interface includes classifiers, regressors, rankers, and random forest variants.

```python { .api }
class XGBClassifier:
    def __init__(self, *, max_depth=6, learning_rate=0.3, n_estimators=100,
                 objective=None, booster='gbtree', tree_method='auto',
                 n_jobs=None, gamma=0, min_child_weight=1, max_delta_step=0,
                 subsample=1, colsample_bytree=1, reg_alpha=0, reg_lambda=1,
                 scale_pos_weight=1, base_score=None, random_state=None,
                 missing=np.nan, **kwargs):
        """XGBoost classifier following scikit-learn API."""

    def fit(self, X, y, *, sample_weight=None, base_margin=None,
            eval_set=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, base_margin_eval_set=None,
            feature_weights=None):
        """Fit the model to training data."""

    def predict_proba(self, X, *, validate_features=True, base_margin=None,
                      iteration_range=None):
        """Predict class probabilities."""

class XGBRegressor:
    """XGBoost regressor following scikit-learn API."""

class XGBRanker:
    """XGBoost ranker for learning-to-rank tasks."""
```

[Scikit-learn Interface](./sklearn-interface.md)

### Distributed Computing

XGBoost integrates with Dask and Spark for distributed training on large datasets and compute clusters. Each integration requires its framework (`dask` or `pyspark`) to be installed.

```python { .api }
# Dask integration
from xgboost import dask as dxgb

# Available as dxgb.train
def train(client, params, dtrain, num_boost_round=10, evals=(),
          obj=None, maximize=None, early_stopping_rounds=None,
          evals_result=None, verbose_eval=True, xgb_model=None,
          callbacks=None):
    """Train XGBoost model using Dask."""

# Available as dxgb.DaskXGBClassifier
class DaskXGBClassifier:
    """Dask-distributed XGBoost classifier."""

# Spark integration
from xgboost import spark as spark_xgb

# Available as spark_xgb.SparkXGBClassifier
class SparkXGBClassifier:
    """PySpark XGBoost classifier."""
```

[Distributed Computing](./distributed-computing.md)

### Utilities and Visualization

Utility functions for model interpretation, visualization, and configuration management. The plotting functions help explain model behavior, and `set_config`/`get_config` manage XGBoost's global settings.

```python { .api }
def plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None,
                    title='Feature importance', xlabel='F score',
                    ylabel='Features', fmap='', importance_type='weight',
                    max_num_features=None, grid=True, show_values=True,
                    values_format='{v}'):
    """Plot feature importance based on fitted trees."""

def plot_tree(booster, fmap='', num_trees=0, rankdir=None, ax=None, **kwargs):
    """Plot specified tree using matplotlib."""

def set_config(**new_config):
    """Set global XGBoost configuration."""

def get_config():
    """Get current global configuration values."""
```

[Utilities and Visualization](./utilities.md)

## Types

```python { .api }
import os
from typing import Union, Optional, List, Dict, Any, Tuple, Callable
import numpy as np
import pandas as pd
from xgboost import DMatrix

# Common type aliases used throughout XGBoost
ArrayLike = Union[List, np.ndarray, pd.DataFrame, pd.Series]
PathLike = Union[str, os.PathLike]
Metric = Union[str, List[str], Callable]
Objective = Union[str, Callable]
EvalSet = List[Tuple[DMatrix, str]]
FeatureNames = List[str]
FeatureTypes = List[str]
FloatCompatible = Union[float, np.float32, np.float64]

# Callback types
from xgboost.callback import TrainingCallback
EvalsLog = Dict[str, Dict[str, List[float]]]
CallbackList = Optional[List[TrainingCallback]]

# Data splitting modes
from enum import IntEnum

class DataSplitMode(IntEnum):
    """Supported data split mode for DMatrix."""
    ROW = 0  # Split by rows (default)
    COL = 1  # Split by columns

# Collective communication operations
class Op(IntEnum):
    """Supported operations for allreduce."""
    MAX = 0
    MIN = 1
    SUM = 2
    BITWISE_AND = 3
    BITWISE_OR = 4
    BITWISE_XOR = 5
```