tessl/pypi-xgboost

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/xgboost@3.0.x

To install, run

npx @tessl/cli install tessl/pypi-xgboost@3.0.0

# XGBoost

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework, providing parallel tree boosting (GBDT, GBM) that solves data science problems in a fast and accurate way. The library runs on major distributed environments and can handle problems beyond billions of examples.

## Package Information

- **Package Name**: xgboost
- **Language**: Python
- **Installation**: `pip install xgboost`

## Core Imports

```python
import xgboost as xgb
```

For scikit-learn compatible estimators:

```python
from xgboost import XGBClassifier, XGBRegressor, XGBRanker
```

For core functionality:

```python
from xgboost import DMatrix, Booster, train, cv
```

## Basic Usage

```python
import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load sample regression data
# (load_boston was removed in scikit-learn 1.2; load_diabetes is a bundled alternative)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Method 1: Using XGBoost native API
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    'objective': 'reg:squarederror',
    'max_depth': 3,
    'learning_rate': 0.1,
}

# The native API sets the number of trees via num_boost_round, not n_estimators
model = xgb.train(params, dtrain, num_boost_round=100)
predictions = model.predict(dtest)

# Method 2: Using scikit-learn API
model = xgb.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

## Architecture

XGBoost provides multiple interfaces for different use cases:

- **Core API**: Native XGBoost interface with DMatrix for data and Booster for models
- **Scikit-Learn API**: Drop-in replacement estimators compatible with sklearn pipelines
- **Distributed Computing**: Integration with Dask, Spark, and collective communication
- **Specialized Features**: Quantile regression, ranking, federated learning

The library is built around efficient gradient boosting with optimizations for speed, memory usage, and scalability across different computing environments.

## Capabilities

### Core Data Structures and Training

Fundamental XGBoost data structures and training functions that form the core of the library. Includes DMatrix for efficient data handling and training functions for model creation.

```python { .api }
class DMatrix:
    def __init__(self, data, label=None, **kwargs): ...

class Booster:
    def predict(self, data, **kwargs): ...
    def save_model(self, fname): ...

def train(params, dtrain, num_boost_round=10, **kwargs): ...
def cv(params, dtrain, num_boost_round=10, **kwargs): ...
```

[Core API](./core-api.md)

### Scikit-Learn Compatible Estimators

Drop-in replacement estimators that follow scikit-learn conventions for seamless integration with existing ML pipelines. Includes classifiers, regressors, and rankers.

```python { .api }
class XGBRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...

class XGBClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class XGBRanker:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...
```

[Scikit-Learn Interface](./sklearn-interface.md)

### Distributed Computing

Distributed training and prediction capabilities for large-scale machine learning across multiple workers and computing environments.

```python { .api }
# Dask integration
from xgboost.dask import DaskXGBRegressor, DaskXGBClassifier

# Spark integration
from xgboost.spark import SparkXGBRegressor, SparkXGBClassifier

# Collective communication
import xgboost.collective as collective
```

[Distributed Computing](./distributed-computing.md)

### Visualization and Model Interpretation

Tools for visualizing model structure, feature importance, and decision trees to understand and interpret XGBoost models.

```python { .api }
def plot_importance(booster, **kwargs): ...
def plot_tree(booster, **kwargs): ...
def to_graphviz(booster, **kwargs): ...
```

[Visualization](./visualization.md)

### Training Callbacks

Comprehensive callback system for monitoring and controlling the training process, including early stopping, learning rate scheduling, and model checkpointing.

```python { .api }
from xgboost.callback import (
    TrainingCallback,
    EarlyStopping,
    LearningRateScheduler,
    EvaluationMonitor,
    TrainingCheckPoint
)
```

[Callbacks](./callbacks.md)

### Configuration and Utilities

Global configuration management, build information, and utility functions for customizing XGBoost behavior and accessing system information.

```python { .api }
def set_config(**kwargs): ...
def get_config(): ...
def config_context(**kwargs): ...
def build_info(): ...
```

[Configuration](./configuration.md)

## Types

### Core Types

```python { .api }
from typing import Dict, List, Optional, Union, Any
import numpy as np

# Data types (pandas and SciPy types are string forward references, so neither is a hard dependency)
ArrayLike = Union[np.ndarray, List, tuple, 'pd.DataFrame', 'scipy.sparse.spmatrix']
FeatureNames = Optional[Union[str, List[str]]]
FeatureTypes = Optional[List[str]]

# Parameter types
BoosterParam = Dict[str, Any]
```