tessl/pypi-xgboost-cpu

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pkg:pypi/xgboost-cpu@3.0.x

To install, run `npx @tessl/cli install tessl/pypi-xgboost-cpu@3.0.0`

# XGBoost-CPU

XGBoost Python package (CPU only): a minimal installation with no support for GPU algorithms or federated learning. XGBoost is an optimized distributed gradient boosting library designed for efficiency, flexibility, and portability; it implements machine learning algorithms under the gradient boosting framework.

## Package Information

- **Package Name**: xgboost-cpu
- **Language**: Python
- **Installation**: `pip install xgboost-cpu`
- **Documentation**: https://xgboost.readthedocs.io/en/stable/

## Core Imports

```python
import xgboost as xgb
```

Common imports for different use cases:

```python
# Core functionality
from xgboost import DMatrix, Booster, train, cv

# Scikit-learn interface
from xgboost import XGBClassifier, XGBRegressor, XGBRanker

# Distributed computing
from xgboost import dask as dxgb  # Dask integration
from xgboost import spark as spark_xgb  # Spark integration

# Utilities
from xgboost import plot_importance, plot_tree
from xgboost import get_config, set_config
```

## Basic Usage

```python
import xgboost as xgb
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create sample data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Method 1: Using XGBoost's native API
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'max_depth': 6,
    'learning_rate': 0.1
}

# Each (DMatrix, name) pair in evals is scored with eval_metric during training
model = xgb.train(params, dtrain, num_boost_round=100,
                  evals=[(dtrain, 'train'), (dtest, 'test')])

# Make predictions (probabilities of the positive class for binary:logistic)
y_pred = model.predict(dtest)

# Method 2: Using scikit-learn interface
from xgboost import XGBClassifier

clf = XGBClassifier(objective='binary:logistic', max_depth=6,
                    learning_rate=0.1, n_estimators=100)
clf.fit(X_train, y_train)
# Column 1 of predict_proba holds the positive-class probability
y_pred_sklearn = clf.predict_proba(X_test)[:, 1]

# Visualize feature importance (requires matplotlib)
xgb.plot_importance(model, max_num_features=10)
```

## Architecture

XGBoost provides multiple interfaces and deployment options:

- **Core API**: Native XGBoost data structures (DMatrix, Booster) and training functions for maximum control and performance
- **Scikit-learn Interface**: Drop-in replacements for sklearn estimators with familiar fit/predict API
- **Distributed Computing**: Native support for Dask and Spark ecosystems for scalable training
- **Data Handling**: Optimized data structures with support for sparse matrices, missing values, and external memory
- **Model Interpretation**: Built-in visualization and feature importance tools

This design enables XGBoost to serve as both a high-performance gradient boosting engine and an accessible machine learning library that integrates seamlessly with the Python data science ecosystem.

## Capabilities

### Core Data Structures and Models

Fundamental XGBoost data structures and model objects that provide the foundation for training and prediction. These include DMatrix for efficient data handling, Booster for trained models, and specialized variants for memory optimization.

```python { .api }
class DMatrix:
    def __init__(self, data, label=None, *, weight=None, base_margin=None,
                 missing=None, silent=False, feature_names=None,
                 feature_types=None, nthread=None, group=None, qid=None,
                 label_lower_bound=None, label_upper_bound=None,
                 feature_weights=None, enable_categorical=False):
        """Optimized data matrix for XGBoost training and prediction."""

class Booster:
    def __init__(self, params=None, cache=(), model_file=None):
        """XGBoost model containing training, prediction, and evaluation routines."""

    def predict(self, data, *, output_margin=False, pred_leaf=False,
                pred_contribs=False, approx_contribs=False,
                pred_interactions=False, validate_features=True,
                training=False, iteration_range=(0, 0), strict_shape=False):
        """Make predictions using the trained model."""

class QuantileDMatrix:
    def __init__(self, data, label=None, *, ref=None, **kwargs):
        """Memory-efficient DMatrix variant using quantized data."""
```

[Core Data Structures and Models](./core-data-models.md)

### Training and Evaluation

Core training functions and cross-validation for model development. These functions provide the primary interface for training XGBoost models with extensive configuration options and evaluation capabilities.

```python { .api }
def train(params, dtrain, num_boost_round=10, evals=(), obj=None,
          maximize=None, early_stopping_rounds=None, evals_result=None,
          verbose_eval=True, xgb_model=None, callbacks=None, custom_metric=None):
    """Train a booster with given parameters."""

def cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False,
       folds=None, metrics=(), obj=None, maximize=None,
       early_stopping_rounds=None, fpreproc=None, as_pandas=True,
       verbose_eval=None, show_stdv=True, seed=0, callbacks=None,
       shuffle=True, custom_metric=None):
    """Cross-validation with given parameters."""
```

[Training and Evaluation](./training-evaluation.md)

### Scikit-learn Interface

Drop-in replacements for scikit-learn estimators providing familiar fit/predict API with XGBoost's performance. Includes classifiers, regressors, rankers, and random forest variants.

```python { .api }
class XGBClassifier:
    def __init__(self, *, max_depth=6, learning_rate=0.3, n_estimators=100,
                 objective=None, booster='gbtree', tree_method='auto',
                 n_jobs=None, gamma=0, min_child_weight=1, max_delta_step=0,
                 subsample=1, colsample_bytree=1, reg_alpha=0, reg_lambda=1,
                 scale_pos_weight=1, base_score=None, random_state=None,
                 missing=np.nan, **kwargs):
        """XGBoost classifier following scikit-learn API."""

    def fit(self, X, y, *, sample_weight=None, base_margin=None,
            eval_set=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, base_margin_eval_set=None,
            feature_weights=None):
        """Fit the model to training data."""

    def predict_proba(self, X, *, validate_features=True, base_margin=None,
                      iteration_range=None):
        """Predict class probabilities."""

class XGBRegressor:
    """XGBoost regressor following scikit-learn API."""

class XGBRanker:
    """XGBoost ranker for learning-to-rank tasks."""
```

[Scikit-learn Interface](./sklearn-interface.md)

### Distributed Computing

Native support for distributed training across Dask and Spark ecosystems, enabling scalable machine learning on large datasets and compute clusters.

```python { .api }
# Dask integration (module xgboost.dask)
from xgboost import dask as dxgb

# Exposed as dxgb.train
def train(client, params, dtrain, num_boost_round=10, evals=(),
          obj=None, maximize=None, early_stopping_rounds=None,
          evals_result=None, verbose_eval=True, xgb_model=None,
          callbacks=None):
    """Train XGBoost model using Dask."""

# Exposed as dxgb.DaskXGBClassifier
class DaskXGBClassifier:
    """Dask-distributed XGBoost classifier."""

# Spark integration (module xgboost.spark)
from xgboost import spark as spark_xgb

# Exposed as spark_xgb.SparkXGBClassifier
class SparkXGBClassifier:
    """PySpark XGBoost classifier."""
```

[Distributed Computing](./distributed-computing.md)

### Utilities and Visualization

Utility functions for model interpretation, configuration management, and visualization. These tools help understand model behavior and manage XGBoost settings.

```python { .api }
def plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None,
                    title='Feature importance', xlabel='F score',
                    ylabel='Features', fmap='', importance_type='weight',
                    max_num_features=None, grid=True, show_values=True,
                    values_format='{v}'):
    """Plot feature importance based on fitted trees."""

def plot_tree(booster, fmap='', num_trees=0, rankdir=None, ax=None, **kwargs):
    """Plot specified tree using matplotlib."""

def set_config(**new_config):
    """Set global XGBoost configuration."""

def get_config():
    """Get current global configuration values."""
```

[Utilities and Visualization](./utilities.md)

## Types

```python { .api }
import os
from typing import Union, Optional, List, Dict, Any, Tuple, Callable
import numpy as np
import pandas as pd
from xgboost import DMatrix

# Common type aliases used throughout XGBoost
ArrayLike = Union[List, np.ndarray, pd.DataFrame, pd.Series]
PathLike = Union[str, os.PathLike]
Metric = Union[str, List[str], Callable]
Objective = Union[str, Callable]
EvalSet = List[Tuple[DMatrix, str]]
FeatureNames = List[str]
FeatureTypes = List[str]
FloatCompatible = Union[float, np.float32, np.float64]

# Callback types
from xgboost.callback import TrainingCallback
EvalsLog = Dict[str, Dict[str, List[float]]]
CallbackList = Optional[List[TrainingCallback]]

# Data splitting modes
from enum import IntEnum

class DataSplitMode(IntEnum):
    """Supported data split mode for DMatrix."""
    ROW = 0  # Split by rows (default)
    COL = 1  # Split by columns

# Collective communication operations
class Op(IntEnum):
    """Supported operations for allreduce."""
    MAX = 0
    MIN = 1
    SUM = 2
    BITWISE_AND = 3
    BITWISE_OR = 4
    BITWISE_XOR = 5
```