# Core API

The core XGBoost API provides the native interface for data handling, model training, and prediction. It includes the `DMatrix` data structure, the `Booster` model class, and the training functions that form the foundation of XGBoost.

## Capabilities

### DMatrix - Data Structure

The primary data structure for XGBoost. It accepts various input formats, including NumPy arrays, pandas DataFrames, SciPy sparse matrices, and file paths.

```python { .api }
class DMatrix:
    def __init__(
        self,
        data,
        label=None,
        weight=None,
        base_margin=None,
        missing=None,
        silent=False,
        feature_names=None,
        feature_types=None,
        nthread=None,
        group=None,
        qid=None,
        label_lower_bound=None,
        label_upper_bound=None,
        feature_weights=None,
        enable_categorical=False
    ):
        """
        Data Matrix used in XGBoost.

        Parameters:
        - data: Input data (numpy array, pandas DataFrame, scipy sparse matrix, or file path)
        - label: Labels for training data
        - weight: Instance weights
        - base_margin: Base margin for prediction
        - missing: Value to treat as missing
        - silent: Whether to suppress warnings
        - feature_names: Names of features
        - feature_types: Types of features
        - nthread: Number of threads for loading data
        - group: Group sizes for ranking
        - qid: Query ID for ranking
        - label_lower_bound: Lower bound for labels
        - label_upper_bound: Upper bound for labels
        - feature_weights: Feature weights
        - enable_categorical: Enable categorical feature support
        """

    def get_label(self):
        """Get labels from DMatrix."""

    def set_label(self, label):
        """Set labels for DMatrix."""

    def get_weight(self):
        """Get instance weights from DMatrix."""

    def set_weight(self, weight):
        """Set instance weights for DMatrix."""

    def get_base_margin(self):
        """Get base margin from DMatrix."""

    def set_base_margin(self, margin):
        """Set base margin for DMatrix."""

    def save_binary(self, fname, silent=True):
        """Save DMatrix to XGBoost binary format."""

    def slice(self, rindex, allow_groups=False):
        """Slice DMatrix by row indices."""

    def get_float_info(self, field):
        """Get float information from DMatrix."""

    def get_uint_info(self, field):
        """Get unsigned integer information from DMatrix."""
```

### DataIter - Abstract Data Iterator

Abstract base class for creating custom data iterators for streaming data into XGBoost.

```python { .api }
class DataIter:
    def reset(self):
        """Reset iterator to beginning."""

    def next(self, input_data):
        """
        Get next batch of data.

        Parameters:
        - input_data: Callback that receives the batch as keyword arguments
          (same fields as DMatrix, e.g. data, label)

        Returns:
        int: 0 if there are no more batches, otherwise 1
        """
```

### QuantileDMatrix - Memory Efficient Data Structure

Memory-efficient variant of `DMatrix` for large datasets. It stores quantile-binned feature values instead of the raw input and is intended for the `hist` tree method.

```python { .api }
class QuantileDMatrix:
    def __init__(
        self,
        data,
        label=None,
        weight=None,
        base_margin=None,
        missing=None,
        silent=False,
        feature_names=None,
        feature_types=None,
        nthread=None,
        group=None,
        qid=None,
        label_lower_bound=None,
        label_upper_bound=None,
        feature_weights=None,
        ref=None,
        enable_categorical=False,
        max_bin=256
    ):
        """
        Quantile DMatrix for memory efficient training.

        Parameters:
        - data: Input data
        - max_bin: Maximum number of bins for quantile approximation
        - ref: Reference QuantileDMatrix for consistent binning
        - (other parameters same as DMatrix)
        """
```

### ExtMemQuantileDMatrix - External Memory Data Structure

External memory version of QuantileDMatrix for training on datasets larger than available RAM.

```python { .api }
class ExtMemQuantileDMatrix:
    def __init__(
        self,
        it,
        ref=None,
        **kwargs
    ):
        """
        External memory quantile DMatrix.

        Parameters:
        - it: Data iterator (DataIter object)
        - ref: Reference QuantileDMatrix for consistent binning
        - **kwargs: Additional parameters same as QuantileDMatrix
        """
```

### Booster - Model Class

The core model class that handles training, prediction, and model persistence.

```python { .api }
class Booster:
    def __init__(self, params=None, cache=(), model_file=None):
        """
        Initialize Booster.

        Parameters:
        - params: Parameters dictionary
        - cache: List of DMatrix objects to cache
        - model_file: Path to model file to load
        """

    def update(self, dtrain, iteration, fobj=None):
        """Update booster for one iteration."""

    def predict(
        self,
        data,
        output_margin=False,
        pred_leaf=False,
        pred_contribs=False,
        approx_contribs=False,
        pred_interactions=False,
        validate_features=True,
        training=False,
        iteration_range=None,
        strict_shape=False
    ):
        """
        Predict using the booster.

        Parameters:
        - data: Input data (DMatrix)
        - output_margin: Output raw margins instead of probabilities
        - pred_leaf: Predict leaf indices
        - pred_contribs: Predict feature contributions (SHAP values)
        - approx_contribs: Use approximate feature contributions
        - pred_interactions: Predict SHAP interaction values
        - validate_features: Validate feature names/types
        - training: Whether the prediction is used for training (affects
          boosters such as dart that apply dropout only during training)
        - iteration_range: Range of boosting rounds to use
        - strict_shape: If True, the output shape does not depend on whether
          classification is used

        Returns:
        Predictions as numpy array
        """

    def save_model(self, fname):
        """Save booster to file."""

    def load_model(self, fname):
        """Load booster from file."""

    def get_dump(self, fmap='', with_stats=False, dump_format='text'):
        """Get model dump as list of strings."""

    def get_fscore(self, fmap=''):
        """Get feature importance scores."""

    def get_score(self, importance_type='weight'):
        """Get feature importance scores by type."""

    def set_param(self, params, value=None):
        """Set parameters for booster."""

    def get_params(self):
        """Get current booster parameters."""

    def copy(self):
        """Copy booster."""

    def eval(self, data, name='eval', iteration=0):
        """Evaluate on data."""

    def eval_set(self, evals, iteration=0, feval=None):
        """Evaluate on multiple datasets."""
```

### Training Functions

Core training functions for model creation and cross-validation.

```python { .api }
def train(
    params,
    dtrain,
    num_boost_round=10,
    evals=None,
    obj=None,
    maximize=None,
    early_stopping_rounds=None,
    evals_result=None,
    verbose_eval=True,
    xgb_model=None,
    callbacks=None,
    custom_metric=None
):
    """
    Train an XGBoost model.

    Parameters:
    - params: Training parameters dictionary
    - dtrain: Training DMatrix
    - num_boost_round: Number of boosting rounds
    - evals: List of (DMatrix, name) tuples for evaluation
    - obj: Custom objective function
    - maximize: Whether to maximize evaluation metric
    - early_stopping_rounds: Stop training if the evaluation metric has not
      improved for this many rounds
    - evals_result: Dictionary to store evaluation results
    - verbose_eval: Verbosity of evaluation
    - xgb_model: Path to existing model or Booster instance
    - callbacks: List of callback objects
    - custom_metric: Custom evaluation metric

    Returns:
    Trained Booster object
    """

def cv(
    params,
    dtrain,
    num_boost_round=10,
    nfold=3,
    stratified=False,
    folds=None,
    metrics=(),
    obj=None,
    maximize=None,
    early_stopping_rounds=None,
    fpreproc=None,
    as_pandas=True,
    verbose_eval=None,
    show_stdv=True,
    seed=0,
    callbacks=None,
    shuffle=True,
    custom_metric=None
):
    """
    Cross-validation for XGBoost.

    Parameters:
    - params: Training parameters
    - dtrain: Training DMatrix
    - num_boost_round: Number of boosting rounds
    - nfold: Number of CV folds
    - stratified: Stratified sampling for folds
    - folds: Custom CV folds
    - metrics: Evaluation metrics
    - obj: Custom objective function
    - maximize: Whether to maximize metric
    - early_stopping_rounds: Stop if the evaluation metric has not improved
      for this many rounds
    - fpreproc: Preprocessing function
    - as_pandas: Return pandas DataFrame
    - verbose_eval: Verbosity
    - show_stdv: Show standard deviation
    - seed: Random seed
    - callbacks: Callback objects
    - shuffle: Shuffle data before folding
    - custom_metric: Custom evaluation metric

    Returns:
    CV results as DataFrame or dict
    """
```

### Exception Classes

```python { .api }
class XGBoostError(ValueError):
    """Exception raised by XGBoost operations."""
```

### Utility Functions

```python { .api }
def build_info():
    """
    Get build information about XGBoost.

    Returns:
    Dictionary containing build and system information
    """
```