
tessl/pypi-xgboost

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable.



Core API

The core XGBoost API is the native interface for data handling, model training, and prediction. It consists of the DMatrix family of data structures, the Booster model class, and the train and cv functions. Higher-level interfaces, such as the scikit-learn wrapper, are built on top of it.

Capabilities

DMatrix - Data Structure

The primary data structure in XGBoost. It accepts NumPy arrays, pandas DataFrames, SciPy sparse matrices, and other formats, and converts them into the internal representation used for training and prediction.

class DMatrix:
    def __init__(
        self,
        data,
        label=None,
        weight=None,
        base_margin=None,
        missing=None,
        silent=False,
        feature_names=None,
        feature_types=None,
        nthread=None,
        group=None,
        qid=None,
        label_lower_bound=None,
        label_upper_bound=None,
        feature_weights=None,
        enable_categorical=False
    ):
        """
        Data Matrix used in XGBoost.

        Parameters:
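        - data: Input data (NumPy array, pandas DataFrame, SciPy sparse matrix, or file path)
        - label: Labels of the training data
        - weight: Weight of each instance
        - base_margin: Initial margin (raw prediction) that boosting starts from for each instance
        - missing: Value to treat as missing; None means NaN
        - silent: Whether to suppress messages printed during construction
        - feature_names: Names of the features
        - feature_types: Types of the features, e.g. "q" for numerical and "c" for categorical
        - nthread: Number of threads used to load the data
        - group: Group sizes for learning-to-rank (alternative to qid)
        - qid: Query ID of each instance, for learning-to-rank
        - label_lower_bound: Lower bound of the label, for survival analysis (AFT)
        - label_upper_bound: Upper bound of the label, for survival analysis (AFT)
        - feature_weights: Per-feature weights used when sampling columns (colsample_* parameters)
        - enable_categorical: Enable native support for categorical features, such as pandas category columns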
        """

    def get_label(self):
        """Get labels from DMatrix."""

    def set_label(self, label):
        """Set labels for DMatrix."""

    def get_weight(self):
        """Get instance weights from DMatrix."""

    def set_weight(self, weight):
        """Set instance weights for DMatrix."""

    def get_base_margin(self):
        """Get base margin from DMatrix."""

    def set_base_margin(self, margin):
        """Set base margin for DMatrix."""

    def save_binary(self, fname, silent=True):
        """Save DMatrix to XGBoost binary format."""

    def slice(self, rindex, allow_groups=False):
        """Return a new DMatrix containing only the rows listed in rindex."""

    def get_float_info(self, field):
        """Get a float field, such as 'label', 'weight', or 'base_margin'."""

    def get_uint_info(self, field):
        """Get an unsigned integer field, such as 'group_ptr'."""

DataIter - Abstract Data Iterator

Abstract base class for custom data iterators that stream data into XGBoost in batches. Subclasses implement reset and next.

class DataIter:
    def reset(self):
        """Reset iterator to beginning."""
        
    def next(self, input_data):
        """
        Supply the next batch of data.

        Parameters:
        - input_data: Callback that receives the batch as keyword arguments, such as data and label

        Returns:
        True (1) if a batch was supplied, False (0) when there are no more batches
        """

QuantileDMatrix - Memory Efficient Data Structure

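A DMatrix variant that bins the input data into quantiles while it is being constructed, which avoids an intermediate copy. It is designed for the hist tree method and reduces memory use on large datasets.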

class QuantileDMatrix:
    def __init__(
        self,
        data,
        label=None,
        weight=None,
        base_margin=None,
        missing=None,
        silent=False,
        feature_names=None,
        feature_types=None,
        nthread=None,
        group=None,
        qid=None,
        label_lower_bound=None,
        label_upper_bound=None,
        feature_weights=None,
        ref=None,
        enable_categorical=False,
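        max_bin=None
    ):
        """
        Quantile DMatrix for memory-efficient training with the hist tree method.

        Parameters:
        - data: Input data, or a DataIter that yields batches
        - max_bin: Maximum number of bins per feature; should match the max_bin training parameter
        - ref: Training QuantileDMatrix whose bin boundaries are reused; pass it when building validation or test data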
        - (other parameters same as DMatrix)
        """

ExtMemQuantileDMatrix - External Memory Data Structure

External memory version of QuantileDMatrix for training on datasets larger than available RAM.

class ExtMemQuantileDMatrix:
    def __init__(
        self,
        it,
        ref=None,
        **kwargs
    ):
        """
        External memory quantile DMatrix.

        Parameters:
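        - it: Data iterator (a DataIter subclass) that yields the batches, including labels and weights
        - ref: Reference QuantileDMatrix for consistent binning
        - **kwargs: Additional construction parameters such as missing, nthread, max_bin, and enable_categorical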
        """

Booster - Model Class

The core model class that handles training, prediction, and model persistence.

class Booster:
    def __init__(self, params=None, cache=(), model_file=None):
        """
        Initialize Booster.

        Parameters:
        - params: Parameters dictionary
        - cache: List of DMatrix objects to cache
        - model_file: Path to model file to load
        """

    def update(self, dtrain, iteration, fobj=None):
        """Update booster for one iteration."""

    def predict(
        self,
        data,
        output_margin=False,
        pred_leaf=False,
        pred_contribs=False,
        approx_contribs=False,
        pred_interactions=False,
        validate_features=True,
        training=False,
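        iteration_range=(0, 0),
        strict_shape=False
    ):
        """
        Predict using the booster.

        Parameters:
        - data: Input data (DMatrix)
        - output_margin: Return raw, untransformed margin values (e.g. log-odds for binary:logistic) instead of transformed predictions
        - pred_leaf: Return the index of the leaf each instance reaches in every tree
        - pred_contribs: Return per-feature contributions (SHAP values); the last column is the bias term
        - approx_contribs: Approximate the feature contributions instead of computing exact SHAP values
        - pred_interactions: Return SHAP interaction values
        - validate_features: Check that the feature names of data match those of the booster
        - training: Whether the predictions are used for training (affects, e.g., the dart booster)
        - iteration_range: (begin, end) range of boosting rounds to use; (0, 0) uses all rounds
        - strict_shape: Return an output shape that does not depend on the model type (e.g. binary vs. multi-class)

        Returns:
        Predictions as a NumPy array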
        """

    def save_model(self, fname):
        """Save booster to file; the .json and .ubj extensions select the JSON and UBJSON formats."""

    def load_model(self, fname):
        """Load booster from file."""

    def get_dump(self, fmap='', with_stats=False, dump_format='text'):
        """Get model dump as list of strings."""

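    def get_fscore(self, fmap=''):
        """Get feature importance as split counts (same as get_score with importance_type='weight')."""

    def get_score(self, fmap='', importance_type='weight'):
        """Get feature importance by type: 'weight', 'gain', 'cover', 'total_gain', or 'total_cover'."""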

    def set_param(self, params, value=None):
        """Set parameters for booster."""

    def save_config(self):
        """Return the booster's internal configuration, including all parameters, as a JSON string."""

    def copy(self):
        """Copy booster."""

    def eval(self, data, name='eval', iteration=0):
        """Evaluate the model on one DMatrix and return the formatted result string."""

    def eval_set(self, evals, iteration=0, feval=None):
        """Evaluate on multiple datasets."""

Training Functions

Core training functions for model creation and cross-validation.

def train(
    params,
    dtrain,
    num_boost_round=10,
    evals=None,
    obj=None,
    maximize=None,
    early_stopping_rounds=None,
    evals_result=None,
    verbose_eval=True,
    xgb_model=None,
    callbacks=None,
    custom_metric=None
):
    """
    Train an XGBoost model.

    Parameters:
    - params: Training parameters dictionary
    - dtrain: Training DMatrix
    - num_boost_round: Number of boosting rounds
    - evals: List of (DMatrix, name) tuples evaluated after every round
    - obj: Custom objective function that returns the gradient and hessian
    - maximize: Whether to maximize custom_metric
    - early_stopping_rounds: Stop if the metric on the last entry in evals has not improved for this many rounds
    - evals_result: Dictionary that is filled with the evaluation history
    - verbose_eval: Print evaluation results every round (True) or every n rounds (int)
    - xgb_model: Path to an existing model, or a Booster instance, to continue training from
    - callbacks: List of callback objects run during training
    - custom_metric: Custom evaluation metric function

    Returns:
    Trained Booster object
    """

def cv(
    params,
    dtrain,
    num_boost_round=10,
    nfold=3,
    stratified=False,
    folds=None,
    metrics=(),
    obj=None,
    maximize=None,
    early_stopping_rounds=None,
    fpreproc=None,
    as_pandas=True,
    verbose_eval=None,
    show_stdv=True,
    seed=0,
    callbacks=None,
    shuffle=True,
    custom_metric=None
):
    """
    Cross-validation for XGBoost.

    Parameters:
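    - params: Training parameters
    - dtrain: Training DMatrix
    - num_boost_round: Number of boosting rounds
    - nfold: Number of CV folds
    - stratified: Use stratified sampling when building the folds
    - folds: Custom folds, as a scikit-learn KFold/StratifiedKFold object or a list of (train_indices, test_indices) tuples
    - metrics: Evaluation metrics to monitor
    - obj: Custom objective function
    - maximize: Whether to maximize custom_metric
    - early_stopping_rounds: Stop if the mean test metric has not improved for this many rounds
    - fpreproc: Preprocessing function that takes (dtrain, dtest, params) and returns transformed versions
    - as_pandas: Return a pandas DataFrame when pandas is installed
    - verbose_eval: Print progress every round (True) or every n rounds (int)
    - show_stdv: Include the standard deviation in the printed progress
    - seed: Random seed used to generate the folds
    - callbacks: List of callback objects
    - shuffle: Shuffle data before building the folds
    - custom_metric: Custom evaluation metric

    Returns:
    CV results as a pandas DataFrame, or as a dict of lists when as_pandas=False or pandas is unavailable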
    """

Exception Classes

class XGBoostError(ValueError):
    """Exception raised by XGBoost operations."""

Utility Functions

def build_info():
    """
    Get build information about XGBoost.

    Returns:
    Dictionary containing build and system information
    """

Install with Tessl CLI

npx tessl i tessl/pypi-xgboost
