XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable.

npx @tessl/cli install tessl/pypi-xgboost@3.0.0

XGBoost implements machine learning algorithms under the Gradient Boosting framework, providing parallel tree boosting (GBDT, GBM) that solves many data science problems quickly and accurately. The library runs on major distributed environments and can handle problems beyond billions of examples.
pip install xgboost

import xgboost as xgb

For scikit-learn compatible estimators:

from xgboost import XGBClassifier, XGBRegressor, XGBRanker

For core functionality:

from xgboost import DMatrix, Booster, train, cv

A quick-start regression example using both the native and scikit-learn APIs:

import xgboost as xgb
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
# Load sample data
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Method 1: Using XGBoost native API
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {
    'objective': 'reg:squarederror',
    'max_depth': 3,
    'learning_rate': 0.1
}
model = xgb.train(params, dtrain, num_boost_round=100)
predictions = model.predict(dtest)
# Method 2: Using scikit-learn API
model = xgb.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

XGBoost provides multiple interfaces for different use cases, described below.
The library is built around efficient gradient boosting with optimizations for speed, memory usage, and scalability across different computing environments.
Fundamental XGBoost data structures and training functions that form the core of the library. Includes DMatrix for efficient data handling and training functions for model creation.
class DMatrix:
    def __init__(self, data, label=None, **kwargs): ...

class Booster:
    def predict(self, data, **kwargs): ...
    def save_model(self, fname): ...

def train(params, dtrain, num_boost_round=10, **kwargs): ...
def cv(params, dtrain, num_boost_round=10, **kwargs): ...
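A minimal sketch of the core workflow: build a DMatrix, evaluate parameters with cv, then train and persist a Booster. The random data and file name are purely illustrative, and cv returns a pandas DataFrame only when pandas is installed.

import numpy as np
import xgboost as xgb

# Illustrative random data; in practice reuse the DMatrix from the quick-start example
rng = np.random.default_rng(0)
dtrain = xgb.DMatrix(rng.random((500, 10)), label=rng.random(500))

params = {'objective': 'reg:squarederror', 'max_depth': 3, 'learning_rate': 0.1}

# 5-fold cross-validation; reports per-round train/test metrics
cv_results = xgb.cv(params, dtrain, num_boost_round=50, nfold=5, seed=0)
print(cv_results)

# Train a final Booster and persist it to disk
booster = xgb.train(params, dtrain, num_boost_round=50)
booster.save_model('model.json')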
Drop-in replacement estimators that follow scikit-learn conventions for seamless integration with existing ML pipelines. Includes classifiers, regressors, and rankers.

class XGBRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...

class XGBClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class XGBRanker:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...
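A short sketch of the scikit-learn style interface for classification; the breast cancer dataset from scikit-learn is an assumed stand-in. XGBClassifier exposes both hard labels and class probabilities.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X_train, y_train)

labels = clf.predict(X_test)               # hard class labels
probabilities = clf.predict_proba(X_test)  # per-class probabilities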
Distributed training and prediction capabilities for large-scale machine learning across multiple workers and computing environments.

# Dask integration
from xgboost.dask import DaskXGBRegressor, DaskXGBClassifier
# Spark integration
from xgboost.spark import SparkXGBRegressor, SparkXGBClassifier
# Collective communication
import xgboost.collective as collective
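A hedged sketch of Dask-based distributed training on a local cluster; it assumes dask and distributed are installed and uses random Dask arrays purely for illustration.

from dask import array as da
from dask.distributed import Client, LocalCluster
from xgboost.dask import DaskXGBRegressor

# Spin up a small local cluster for illustration
cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)

# Chunked Dask arrays stand in for a real distributed dataset
X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = da.random.random(100_000, chunks=(10_000,))

model = DaskXGBRegressor(n_estimators=50, max_depth=4)
model.client = client  # attach the Dask client before fitting
model.fit(X, y)
predictions = model.predict(X)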
Tools for visualizing model structure, feature importance, and decision trees to understand and interpret XGBoost models.

def plot_importance(booster, **kwargs): ...
def plot_tree(booster, **kwargs): ...
def to_graphviz(booster, **kwargs): ...
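A brief sketch of the plotting helpers; it assumes matplotlib is installed (and graphviz for tree rendering), and trains a tiny model on random data purely for illustration.

import matplotlib.pyplot as plt
import numpy as np
import xgboost as xgb

# Tiny model on random data, just so there is something to plot
rng = np.random.default_rng(0)
dtrain = xgb.DMatrix(rng.random((200, 5)), label=rng.random(200))
model = xgb.train({'objective': 'reg:squarederror'}, dtrain, num_boost_round=10)

# Bar chart of feature importance scores
xgb.plot_importance(model, max_num_features=10)
plt.show()

# Render a single tree of the ensemble (first tree by default; requires graphviz)
xgb.plot_tree(model)
plt.show()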
Comprehensive callback system for monitoring and controlling the training process, including early stopping, learning rate scheduling, and model checkpointing.

from xgboost.callback import (
    TrainingCallback,
    EarlyStopping,
    LearningRateScheduler,
    EvaluationMonitor,
    TrainingCheckPoint,
)
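A minimal sketch of early stopping with the native training API, using random data for illustration; EarlyStopping(save_best=True) keeps the model from the best iteration.

import numpy as np
import xgboost as xgb
from xgboost.callback import EarlyStopping

# Random train/validation split purely for illustration
rng = np.random.default_rng(0)
dtrain = xgb.DMatrix(rng.random((800, 10)), label=rng.random(800))
dvalid = xgb.DMatrix(rng.random((200, 10)), label=rng.random(200))

booster = xgb.train(
    {'objective': 'reg:squarederror', 'learning_rate': 0.1},
    dtrain,
    num_boost_round=500,
    evals=[(dvalid, 'validation')],
    callbacks=[EarlyStopping(rounds=10, save_best=True)],
    verbose_eval=False,
)
print(booster.best_iteration)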
Global configuration management, build information, and utility functions for customizing XGBoost behavior and accessing system information.

def set_config(**kwargs): ...
def get_config(): ...
def config_context(**kwargs): ...
def build_info(): ...
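A brief sketch of the configuration helpers: reading and setting global options, scoping a change with a context manager, and inspecting build information.

import xgboost as xgb

# Inspect and update the global configuration
print(xgb.get_config())
xgb.set_config(verbosity=1)

# Temporarily silence XGBoost within a block
with xgb.config_context(verbosity=0):
    pass  # a training run here would emit no log output

# Details about how the installed binary was built (e.g. CUDA support)
print(xgb.build_info())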
Type definitions used throughout the library:

from typing import Dict, List, Optional, Union, Any
import numpy as np
# Data types
ArrayLike = Union[np.ndarray, List, tuple, 'pd.DataFrame', 'scipy.sparse.spmatrix']
FeatureNames = Optional[Union[str, List[str]]]
FeatureTypes = Optional[List[str]]
# Parameter types
BoosterParam = Dict[str, Any]
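A small sketch showing that DMatrix accepts the array-like inputs listed above (NumPy arrays, pandas DataFrames, SciPy sparse matrices); pandas and scipy are assumed to be installed.

import numpy as np
import pandas as pd
from scipy import sparse
import xgboost as xgb

X_np = np.random.rand(10, 3)
X_df = pd.DataFrame(X_np, columns=['f0', 'f1', 'f2'])
X_sp = sparse.csr_matrix(X_np)
y = np.random.rand(10)

# Each array-like input builds an equivalent DMatrix
for X in (X_np, X_df, X_sp):
    dm = xgb.DMatrix(X, label=y)
    print(dm.num_row(), dm.num_col())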