tessl/pypi-xgboost

Describes: pkg:pypi/xgboost@3.0.x

To install, run

npx @tessl/cli install tessl/pypi-xgboost@3.0.0


XGBoost

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework, providing parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately. The library runs on major distributed environments and scales to problems with billions of examples.

Package Information

  • Package Name: xgboost
  • Language: Python
  • Installation: pip install xgboost

Core Imports

import xgboost as xgb

For scikit-learn compatible estimators:

from xgboost import XGBClassifier, XGBRegressor, XGBRanker

For core functionality:

from xgboost import DMatrix, Booster, train, cv

Basic Usage

import xgboost as xgb
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Load sample data (load_boston was removed from scikit-learn,
# so the California housing data is used instead)
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Method 1: Using XGBoost native API
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Note: n_estimators is a scikit-learn-API parameter; the native API
# takes the number of rounds via num_boost_round in xgb.train instead
params = {
    'objective': 'reg:squarederror',
    'max_depth': 3,
    'learning_rate': 0.1
}

model = xgb.train(params, dtrain, num_boost_round=100)
predictions = model.predict(dtest)

# Method 2: Using scikit-learn API
model = xgb.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Architecture

XGBoost provides multiple interfaces for different use cases:

  • Core API: Native XGBoost interface with DMatrix for data and Booster for models
  • Scikit-Learn API: Drop-in replacement estimators compatible with sklearn pipelines
  • Distributed Computing: Integration with Dask, Spark, and collective communication
  • Specialized Features: Quantile regression, ranking, federated learning

The library is built around efficient gradient boosting with optimizations for speed, memory usage, and scalability across different computing environments.

Capabilities

Core Data Structures and Training

The fundamental data structures and training functions at the core of the library: DMatrix for efficient data handling, Booster for trained models, and train/cv for model fitting and cross-validation.

class DMatrix:
    def __init__(self, data, label=None, **kwargs): ...

class Booster:
    def predict(self, data, **kwargs): ...
    def save_model(self, fname): ...

def train(params, dtrain, num_boost_round=10, **kwargs): ...
def cv(params, dtrain, num_boost_round=10, **kwargs): ...

Core API
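
As a sketch of how these pieces fit together, the snippet below cross-validates the regression setup and round-trips a model through a file; it assumes the params and dtrain objects defined in the Basic Usage example are in scope.

import xgboost as xgb

# Reuses `params` and `dtrain` from the Basic Usage example above
results = xgb.cv(params, dtrain, num_boost_round=100, nfold=5, metrics='rmse', seed=42)
print(results['test-rmse-mean'].iloc[-1])  # mean validation RMSE after the last round

# Train on the full data, then persist and reload the Booster
booster = xgb.train(params, dtrain, num_boost_round=100)
booster.save_model('model.json')

loaded = xgb.Booster()
loaded.load_model('model.json')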

Scikit-Learn Compatible Estimators

Drop-in replacement estimators that follow scikit-learn conventions for seamless integration with existing ML pipelines. Includes classifiers, regressors, and rankers.

class XGBRegressor:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...

class XGBClassifier:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class XGBRanker:
    def fit(self, X, y, **kwargs): ...
    def predict(self, X): ...

Scikit-Learn Interface
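
A minimal sketch of XGBClassifier in an ordinary scikit-learn workflow; the breast cancer dataset here is just a convenient stand-in, not anything specific to this tile.

import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = xgb.XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)              # class probabilities, shape (n_samples, 2)
print(cross_val_score(clf, X, y, cv=5).mean()) # interoperates with sklearn model selection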

Distributed Computing

Distributed training and prediction capabilities for large-scale machine learning across multiple workers and computing environments.

# Dask integration
from xgboost.dask import DaskXGBRegressor, DaskXGBClassifier

# Spark integration  
from xgboost.spark import SparkXGBRegressor, SparkXGBClassifier

# Collective communication
import xgboost.collective as collective

Distributed Computing
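
A sketch of the Dask workflow, assuming a local cluster and the dask distributed extras are installed; the functional xgb.dask.train API is shown, though DaskXGBRegressor wraps the same machinery behind a scikit-learn-style interface.

import xgboost as xgb
from dask import array as da
from dask.distributed import Client, LocalCluster

with LocalCluster(n_workers=2) as cluster, Client(cluster) as client:
    # Partitioned random data stands in for a real distributed dataset
    X = da.random.random((100_000, 20), chunks=(10_000, 20))
    y = da.random.random(100_000, chunks=10_000)

    dtrain = xgb.dask.DaskDMatrix(client, X, y)
    output = xgb.dask.train(
        client,
        {'objective': 'reg:squarederror'},
        dtrain,
        num_boost_round=50,
    )
    booster = output['booster']                 # a regular xgb.Booster
    preds = xgb.dask.predict(client, booster, X)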

Visualization and Model Interpretation

Tools for visualizing model structure, feature importance, and decision trees to understand and interpret XGBoost models.

def plot_importance(booster, **kwargs): ...
def plot_tree(booster, **kwargs): ...
def to_graphviz(booster, **kwargs): ...

Visualization
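
For example, feature importance for a trained model can be plotted as below; matplotlib, and graphviz for tree rendering, are optional dependencies. The snippet assumes the booster trained in the earlier examples.

import matplotlib.pyplot as plt
import xgboost as xgb

# `booster` is a trained Booster or fitted sklearn estimator from the examples above
xgb.plot_importance(booster, max_num_features=10, importance_type='gain')
plt.tight_layout()
plt.show()

# Render the first tree; requires the graphviz package
graph = xgb.to_graphviz(booster, num_trees=0)
graph.render('tree0')  # writes tree0.pdf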

Training Callbacks

Comprehensive callback system for monitoring and controlling the training process, including early stopping, learning rate scheduling, and model checkpointing.

from xgboost.callback import (
    TrainingCallback,
    EarlyStopping,
    LearningRateScheduler,
    EvaluationMonitor,
    TrainingCheckPoint
)

Callbacks
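
A sketch of early stopping with the native API, reusing params, dtrain, and dtest from the Basic Usage example:

import xgboost as xgb
from xgboost.callback import EarlyStopping, EvaluationMonitor

# Stop if the validation metric has not improved for 10 rounds and
# keep the best iteration rather than the last one
early_stop = EarlyStopping(rounds=10, save_best=True)

booster = xgb.train(
    params,
    dtrain,
    num_boost_round=500,
    evals=[(dtest, 'validation')],
    callbacks=[early_stop, EvaluationMonitor(period=10)],
)
print(booster.best_iteration)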

Configuration and Utilities

Global configuration management, build information, and utility functions for customizing XGBoost behavior and accessing system information.

def set_config(**kwargs): ...
def get_config(): ...
def config_context(**kwargs): ...
def build_info(): ...

Configuration
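
For instance, verbosity can be raised temporarily with the context manager and is restored automatically on exit:

import xgboost as xgb

xgb.set_config(verbosity=0)              # silence logging globally

with xgb.config_context(verbosity=3):
    pass                                 # debug-level logging inside the block only

print(xgb.get_config()['verbosity'])     # 0 again: the context restored it
print(xgb.build_info())                  # version, compiler, and feature flags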

Types

Core Types

from typing import Any, Dict, List, Optional, Tuple, Union
import numpy as np

# Data types
ArrayLike = Union[np.ndarray, List, Tuple, 'pd.DataFrame', 'scipy.sparse.spmatrix']
FeatureNames = Optional[Union[str, List[str]]]
FeatureTypes = Optional[List[str]]

# Parameter types
BoosterParam = Dict[str, Any]
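
As a hypothetical illustration, these aliases can annotate a small training helper; train_model below is not part of the package, just a sketch of the intended usage.

import xgboost as xgb

def train_model(data: ArrayLike, labels: ArrayLike, params: BoosterParam) -> xgb.Booster:
    # Accepts any supported array-like input and returns a trained Booster
    dtrain = xgb.DMatrix(data, label=labels)
    return xgb.train(params, dtrain, num_boost_round=100)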