# Core API

The core XGBoost API provides the native interface for data handling, model training, and prediction. This includes the DMatrix data structure, Booster model class, and training functions that form the foundation of XGBoost.

## Capabilities

### DMatrix - Data Structure

The primary data structure for XGBoost that efficiently handles various data formats including NumPy arrays, pandas DataFrames, and sparse matrices.

```python { .api }
class DMatrix:
    def __init__(
        self,
        data,
        label=None,
        weight=None,
        base_margin=None,
        missing=None,
        silent=False,
        feature_names=None,
        feature_types=None,
        nthread=None,
        group=None,
        qid=None,
        label_lower_bound=None,
        label_upper_bound=None,
        feature_weights=None,
        enable_categorical=False
    ):
        """
        Data Matrix used in XGBoost.

        Parameters:
        - data: Input data (numpy array, pandas DataFrame, scipy sparse matrix, or file path)
        - label: Labels for training data
        - weight: Instance weights
        - base_margin: Base margin for prediction
        - missing: Value to treat as missing
        - silent: Whether to suppress warnings
        - feature_names: Names of features
        - feature_types: Types of features
        - nthread: Number of threads for loading data
        - group: Group sizes for ranking
        - qid: Query ID for ranking
        - label_lower_bound: Lower bound for labels
        - label_upper_bound: Upper bound for labels
        - feature_weights: Feature weights
        - enable_categorical: Enable categorical feature support
        """

    def get_label(self):
        """Get labels from DMatrix."""

    def set_label(self, label):
        """Set labels for DMatrix."""

    def get_weight(self):
        """Get instance weights from DMatrix."""

    def set_weight(self, weight):
        """Set instance weights for DMatrix."""

    def get_base_margin(self):
        """Get base margin from DMatrix."""

    def set_base_margin(self, margin):
        """Set base margin for DMatrix."""

    def save_binary(self, fname, silent=True):
        """Save DMatrix to XGBoost binary format."""

    def slice(self, rindex, allow_groups=False):
        """Slice DMatrix by row indices."""

    def get_float_info(self, field):
        """Get float information from DMatrix."""

    def get_uint_info(self, field):
        """Get unsigned integer information from DMatrix."""
```

### DataIter - Abstract Data Iterator

Abstract base class for creating custom data iterators for streaming data into XGBoost.

```python { .api }
class DataIter:
    def reset(self):
        """Reset iterator to beginning."""

    def next(self, input_data):
        """
        Get next batch of data.

        Parameters:
        - input_data: Callback that accepts the batch as keyword arguments
          (the same data fields as DMatrix, such as data and label)

        Returns:
        int: 1 if a batch was passed to input_data, 0 when there are no more batches
        """
```

### QuantileDMatrix - Memory Efficient Data Structure

Memory-efficient data structure for large datasets. Instead of keeping the raw feature values, it stores features pre-binned into quantiles for use with the `hist` tree method.

```python { .api }
class QuantileDMatrix:
    def __init__(
        self,
        data,
        label=None,
        weight=None,
        base_margin=None,
        missing=None,
        silent=False,
        feature_names=None,
        feature_types=None,
        nthread=None,
        group=None,
        qid=None,
        label_lower_bound=None,
        label_upper_bound=None,
        feature_weights=None,
        ref=None,
        enable_categorical=False,
        max_bin=256
    ):
        """
        Quantile DMatrix for memory efficient training.

        Parameters:
        - data: Input data
        - max_bin: Number of histogram bins per feature; should match the max_bin training parameter
        - ref: Reference DMatrix (typically the training data) whose quantile cuts are reused for consistent binning
        - (other parameters same as DMatrix)
        """
```

### ExtMemQuantileDMatrix - External Memory Data Structure

External memory version of QuantileDMatrix for training on datasets larger than available RAM.

```python { .api }
class ExtMemQuantileDMatrix:
    def __init__(
        self,
        it,
        ref=None,
        **kwargs
    ):
        """
        External memory quantile DMatrix.

        Parameters:
        - it: Data iterator (DataIter object)
        - ref: Reference QuantileDMatrix for consistent binning
        - **kwargs: Additional parameters same as QuantileDMatrix
        """
```

### Booster - Model Class

The core model class that handles training, prediction, and model persistence.

```python { .api }
class Booster:
    def __init__(self, params=None, cache=(), model_file=None):
        """
        Initialize Booster.

        Parameters:
        - params: Parameters dictionary
        - cache: List of DMatrix objects to cache
        - model_file: Path to model file to load
        """

    def update(self, dtrain, iteration, fobj=None):
        """Update booster for one iteration."""

    def predict(
        self,
        data,
        output_margin=False,
        pred_leaf=False,
        pred_contribs=False,
        approx_contribs=False,
        pred_interactions=False,
        validate_features=True,
        training=False,
        iteration_range=None,
        strict_shape=False
    ):
        """
        Predict using the booster.

        Parameters:
        - data: Input data (DMatrix)
        - output_margin: Output raw, untransformed margins instead of probabilities
        - pred_leaf: Predict the leaf index of each sample in each tree
        - pred_contribs: Predict feature contributions (SHAP values)
        - approx_contribs: Use approximate feature contributions
        - pred_interactions: Predict SHAP interaction values
        - validate_features: Validate feature names/types
        - training: Whether the prediction is being used during training
        - iteration_range: Range of boosting rounds to use
        - strict_shape: Keep the output shape independent of the objective
          (for example, (n_samples, n_groups) for value and margin prediction)

        Returns:
        Predictions as numpy array
        """

    def save_model(self, fname):
        """Save booster to file."""

    def load_model(self, fname):
        """Load booster from file."""

    def get_dump(self, fmap='', with_stats=False, dump_format='text'):
        """Get model dump as list of strings."""

    def get_fscore(self, fmap=''):
        """Get feature importance scores."""

    def get_score(self, fmap='', importance_type='weight'):
        """Get feature importance scores by type."""

    def set_param(self, params, value=None):
        """Set parameters for booster."""

    def get_params(self):
        """Get current booster parameters."""

    def copy(self):
        """Copy booster."""

    def eval(self, data, name='eval', iteration=0):
        """Evaluate on data."""

    def eval_set(self, evals, iteration=0, feval=None):
        """Evaluate on multiple datasets."""
```

### Training Functions

Core training functions for model creation and cross-validation.

```python { .api }
def train(
    params,
    dtrain,
    num_boost_round=10,
    evals=None,
    obj=None,
    maximize=None,
    early_stopping_rounds=None,
    evals_result=None,
    verbose_eval=True,
    xgb_model=None,
    callbacks=None,
    custom_metric=None
):
    """
    Train an XGBoost model.

    Parameters:
    - params: Training parameters dictionary
    - dtrain: Training DMatrix
    - num_boost_round: Number of boosting rounds
    - evals: List of (DMatrix, name) tuples for evaluation
    - obj: Custom objective function
    - maximize: Whether to maximize evaluation metric
    - early_stopping_rounds: Early stopping rounds
    - evals_result: Dictionary to store evaluation results
    - verbose_eval: Verbosity of evaluation
    - xgb_model: Path to existing model or Booster instance
    - callbacks: List of callback functions
    - custom_metric: Custom evaluation metric

    Returns:
    Trained Booster object
    """

def cv(
    params,
    dtrain,
    num_boost_round=10,
    nfold=3,
    stratified=False,
    folds=None,
    metrics=(),
    obj=None,
    maximize=None,
    early_stopping_rounds=None,
    fpreproc=None,
    as_pandas=True,
    verbose_eval=None,
    show_stdv=True,
    seed=0,
    callbacks=None,
    shuffle=True,
    custom_metric=None
):
    """
    Cross-validation for XGBoost.

    Parameters:
    - params: Training parameters
    - dtrain: Training DMatrix
    - num_boost_round: Number of boosting rounds
    - nfold: Number of CV folds
    - stratified: Stratified sampling for folds
    - folds: Custom CV folds
    - metrics: Evaluation metrics
    - obj: Custom objective function
    - maximize: Whether to maximize metric
    - early_stopping_rounds: Early stopping rounds
    - fpreproc: Preprocessing function
    - as_pandas: Return pandas DataFrame
    - verbose_eval: Verbosity
    - show_stdv: Show standard deviation
    - seed: Random seed
    - callbacks: Callback functions
    - shuffle: Shuffle data before folding
    - custom_metric: Custom evaluation metric

    Returns:
    CV results as DataFrame or dict
    """
```

### Exception Classes

```python { .api }
class XGBoostError(ValueError):
    """Exception raised by XGBoost operations."""
```

### Utility Functions

```python { .api }
def build_info():
    """
    Get build information about XGBoost.

    Returns:
    Dictionary containing build and system information
    """
```