Tessl Tile for pypi/xgboost-cpu@3.0.0

# XGBoost-CPU

XGBoost-CPU is the CPU-only build of the XGBoost Python package: a minimal installation with no support for GPU algorithms or federated learning. XGBoost itself is an optimized distributed gradient boosting library designed for efficiency, flexibility, and portability; it implements machine learning algorithms under the gradient boosting framework.

## Package Information

- **Package Name**: xgboost-cpu
- **Language**: Python
- **Installation**: `pip install xgboost-cpu`
- **Documentation**: https://xgboost.readthedocs.io/en/stable/

## Core Imports

```python
import xgboost as xgb
```

Common imports for different use cases:

```python
# Core functionality
from xgboost import DMatrix, Booster, train, cv

# Scikit-learn interface
from xgboost import XGBClassifier, XGBRegressor, XGBRanker

# Distributed computing
from xgboost import dask as dxgb  # Dask integration
from xgboost import spark as spark_xgb  # Spark integration

# Utilities
from xgboost import plot_importance, plot_tree
from xgboost import get_config, set_config
```

## Basic Usage

```python
import xgboost as xgb
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create sample data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Method 1: Using XGBoost's native API
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'max_depth': 6,
    'learning_rate': 0.1
}

model = xgb.train(params, dtrain, num_boost_round=100,
                  evals=[(dtrain, 'train'), (dtest, 'test')])

# Make predictions (probabilities of the positive class for binary:logistic)
y_pred = model.predict(dtest)

# Method 2: Using scikit-learn interface
from xgboost import XGBClassifier

clf = XGBClassifier(objective='binary:logistic', max_depth=6,
                    learning_rate=0.1, n_estimators=100)
clf.fit(X_train, y_train)
y_pred_sklearn = clf.predict_proba(X_test)[:, 1]

# Visualize feature importance (requires matplotlib)
xgb.plot_importance(model, max_num_features=10)
```

## Architecture

XGBoost provides multiple interfaces and deployment options:

- **Core API**: Native XGBoost data structures (`DMatrix`, `Booster`) and training functions for maximum control and performance
- **Scikit-learn Interface**: Drop-in replacements for scikit-learn estimators with the familiar fit/predict API
- **Distributed Computing**: Dask and Spark integrations for scalable training
- **Data Handling**: Optimized data structures with support for sparse matrices, missing values, and external memory
- **Model Interpretation**: Built-in visualization and feature importance tools

This design lets XGBoost serve both as a high-performance gradient boosting engine and as an accessible machine learning library that fits into the Python data science ecosystem.

## Capabilities

### Core Data Structures and Models

Fundamental XGBoost data structures and model objects that provide the foundation for training and prediction. These include DMatrix for efficient data handling, Booster for trained models, and specialized variants for memory optimization.

```python { .api }
class DMatrix:
    def __init__(self, data, label=None, *, weight=None, base_margin=None,
                 missing=None, silent=False, feature_names=None,
                 feature_types=None, nthread=None, group=None, qid=None,
                 label_lower_bound=None, label_upper_bound=None,
                 feature_weights=None, enable_categorical=False):
        """Optimized data matrix for XGBoost training and prediction."""

class Booster:
    def __init__(self, params=None, cache=(), model_file=None):
        """XGBoost model containing training, prediction, and evaluation routines."""

    def predict(self, data, *, output_margin=False, pred_leaf=False,
                pred_contribs=False, approx_contribs=False,
                pred_interactions=False, validate_features=True,
                training=False, iteration_range=(0, 0), strict_shape=False):
        """Make predictions using the trained model."""

class QuantileDMatrix:
    def __init__(self, data, label=None, *, ref=None, **kwargs):
        """Memory-efficient DMatrix variant using quantized data."""
```

[Core Data Structures and Models](./core-data-models.md)

### Training and Evaluation

Core training functions and cross-validation for model development. These functions provide the primary interface for training XGBoost models with extensive configuration options and evaluation capabilities.

```python { .api }
def train(params, dtrain, num_boost_round=10, evals=(), obj=None,
          maximize=None, early_stopping_rounds=None, evals_result=None,
          verbose_eval=True, xgb_model=None, callbacks=None, custom_metric=None):
    """Train a booster with given parameters."""

def cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False,
       folds=None, metrics=(), obj=None, maximize=None,
       early_stopping_rounds=None, fpreproc=None, as_pandas=True,
       verbose_eval=None, show_stdv=True, seed=0, callbacks=None,
       shuffle=True, custom_metric=None):
    """Cross-validation with given parameters."""
```

[Training and Evaluation](./training-evaluation.md)

### Scikit-learn Interface

Drop-in replacements for scikit-learn estimators that provide the familiar fit/predict API with XGBoost's performance. Includes classifiers, regressors, rankers, and random forest variants.

```python { .api }
class XGBClassifier:
    def __init__(self, *, max_depth=6, learning_rate=0.3, n_estimators=100,
                 objective=None, booster='gbtree', tree_method='auto',
                 n_jobs=None, gamma=0, min_child_weight=1, max_delta_step=0,
                 subsample=1, colsample_bytree=1, reg_alpha=0, reg_lambda=1,
                 scale_pos_weight=1, base_score=None, random_state=None,
                 missing=np.nan, **kwargs):
        """XGBoost classifier following scikit-learn API."""

    def fit(self, X, y, *, sample_weight=None, base_margin=None,
            eval_set=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, base_margin_eval_set=None,
            feature_weights=None):
        """Fit the model to training data."""

    def predict_proba(self, X, *, validate_features=True, base_margin=None,
                      iteration_range=None):
        """Predict class probabilities."""

class XGBRegressor:
    """XGBoost regressor following scikit-learn API."""

class XGBRanker:
    """XGBoost ranker for learning-to-rank tasks."""
```

[Scikit-learn Interface](./sklearn-interface.md)

### Distributed Computing

Support for distributed training through the Dask and Spark integrations, enabling scalable machine learning on large datasets and compute clusters.

```python { .api }
# Dask integration
from xgboost import dask as dxgb

def dxgb.train(client, params, dtrain, num_boost_round=10, evals=(),
               obj=None, maximize=None, early_stopping_rounds=None,
               evals_result=None, verbose_eval=True, xgb_model=None,
               callbacks=None):
    """Train XGBoost model using Dask."""

class dxgb.DaskXGBClassifier:
    """Dask-distributed XGBoost classifier."""

# Spark integration
from xgboost import spark as spark_xgb

class spark_xgb.SparkXGBClassifier:
    """PySpark XGBoost classifier."""
```

[Distributed Computing](./distributed-computing.md)

### Utilities and Visualization

Utility functions for model interpretation, configuration management, and visualization. These tools help understand model behavior and manage XGBoost settings.

```python { .api }
def plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None,
                    title='Feature importance', xlabel='F score',
                    ylabel='Features', fmap='', importance_type='weight',
                    max_num_features=None, grid=True, show_values=True,
                    values_format='{v}'):
    """Plot feature importance based on fitted trees."""

def plot_tree(booster, fmap='', num_trees=0, rankdir=None, ax=None, **kwargs):
    """Plot specified tree using matplotlib."""

def set_config(**new_config):
    """Set global XGBoost configuration."""

def get_config():
    """Get current global configuration values."""
```

[Utilities and Visualization](./utilities.md)

## Types

```python { .api }
import os
from typing import Union, Optional, List, Dict, Any, Tuple, Callable
import numpy as np
import pandas as pd
from xgboost import DMatrix

# Common type aliases used throughout XGBoost
ArrayLike = Union[List, np.ndarray, pd.DataFrame, pd.Series]
PathLike = Union[str, os.PathLike]
Metric = Union[str, List[str], Callable]
Objective = Union[str, Callable]
EvalSet = List[Tuple[DMatrix, str]]
FeatureNames = List[str]
FeatureTypes = List[str]
FloatCompatible = Union[float, np.float32, np.float64]

# Callback types
from xgboost.callback import TrainingCallback
EvalsLog = Dict[str, Dict[str, List[float]]]
CallbackList = Optional[List[TrainingCallback]]

# Data splitting modes
from enum import IntEnum

class DataSplitMode(IntEnum):
    """Supported data split mode for DMatrix."""
    ROW = 0  # Split by rows (default)
    COL = 1  # Split by columns

# Collective communication operations
class Op(IntEnum):
    """Supported operations for allreduce."""
    MAX = 0
    MIN = 1
    SUM = 2
    BITWISE_AND = 3
    BITWISE_OR = 4
    BITWISE_XOR = 5
```