
tessl/pypi-xgboost-cpu

XGBoost Python Package (CPU only) - A minimal installation with no support for GPU algorithms or federated learning, providing optimized distributed gradient boosting for machine learning


docs/training-evaluation.md

Training and Evaluation

Core training functions and cross-validation capabilities for XGBoost model development. These functions provide the primary interface for training XGBoost models with extensive configuration options, evaluation metrics, and early stopping capabilities.

Capabilities

Model Training

Primary training function that builds XGBoost models using gradient boosting with support for custom objectives, evaluation metrics, and advanced training configurations.

def train(params, dtrain, num_boost_round=10, evals=(), obj=None, 
          maximize=None, early_stopping_rounds=None, evals_result=None, 
          verbose_eval=True, xgb_model=None, callbacks=None, custom_metric=None):
    """
    Train a booster with given parameters.
    
    Parameters:
    - params: Training parameters as dictionary (dict)
        Common parameters:
        - 'objective': Learning task ('reg:squarederror', 'binary:logistic', 'multi:softmax', etc.)
        - 'eval_metric': Evaluation metric ('rmse', 'logloss', 'mlogloss', 'auc', etc.) 
        - 'max_depth': Maximum tree depth (int, default=6)
        - 'learning_rate': Boosting learning rate (float, default=0.3)
        - 'subsample': Fraction of observations to subsample (float, default=1)
        - 'colsample_bytree': Fraction of features to subsample per tree (float, default=1)
        - 'gamma': Minimum loss reduction required for split (float, default=0)
        - 'min_child_weight': Minimum sum of instance weight in child (float, default=1)  
        - 'reg_alpha': L1 regularization term (float, default=0)
        - 'reg_lambda': L2 regularization term (float, default=1)
        - 'scale_pos_weight': Balancing weight for positive class (float, default=1)
        - 'tree_method': Tree construction algorithm ('auto', 'exact', 'approx', 'hist')
        - 'device': Device to use ('cpu', 'cuda', 'gpu')
        - 'random_state': Random seed (int)
    - dtrain: Training DMatrix (DMatrix)
    - num_boost_round: Number of boosting iterations (int)
    - evals: List of pairs (DMatrix, string) for evaluation during training (list)
- obj: Customized objective function (callable, optional)
    Signature: obj(y_pred, dtrain) -> (grad, hess), where dtrain is the
    training DMatrix (use dtrain.get_label() to access the labels)
    - maximize: Whether to maximize evaluation metric (bool, optional)
    - early_stopping_rounds: Stop training if evaluation doesn't improve (int, optional)
    - evals_result: Dictionary to store evaluation results (dict, optional)
    - verbose_eval: Whether to display evaluation results (bool or int)
    - xgb_model: Existing model to continue training (Booster, optional)
    - callbacks: List of callback functions (list, optional)
- custom_metric: Custom evaluation metric function (callable, optional)
    Signature: custom_metric(y_pred, dtrain) -> (eval_name, eval_result)
    The metric direction is controlled by the maximize argument, not by a
    third return value
    
    Returns: Booster - Trained XGBoost model
    """
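As a concrete example of setting one of these parameters, 'scale_pos_weight' for imbalanced binary classification is commonly derived from the class counts. A minimal sketch (the 0/1 label array here is illustrative, not from a real dataset):

```python
import numpy as np

# Hypothetical imbalanced binary labels: 900 negatives, 100 positives
y = np.array([0] * 900 + [1] * 100)

# Common heuristic: sum(negative instances) / sum(positive instances)
neg, pos = np.sum(y == 0), np.sum(y == 1)
scale_pos_weight = neg / pos

# Feed the ratio into the training parameters
params = {'objective': 'binary:logistic', 'scale_pos_weight': scale_pos_weight}
print(scale_pos_weight)  # 9.0
```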

Cross-Validation

Robust cross-validation functionality for model validation, hyperparameter tuning, and performance estimation with support for stratified sampling and custom evaluation metrics.

def cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False, 
       folds=None, metrics=(), obj=None, maximize=None, 
       early_stopping_rounds=None, fpreproc=None, as_pandas=True, 
       verbose_eval=None, show_stdv=True, seed=0, callbacks=None, 
       shuffle=True, custom_metric=None):
    """
    Cross-validation with given parameters.
    
    Parameters:
    - params: Training parameters (same as train() function) (dict)
    - dtrain: Training DMatrix (DMatrix)  
    - num_boost_round: Number of boosting iterations (int)
    - nfold: Number of CV folds (int)
    - stratified: Whether to use stratified sampling (bool)
    - folds: Custom CV folds as sklearn splits (iterator, optional)
    - metrics: Additional evaluation metrics (tuple of str)
    - obj: Custom objective function (callable, optional)
    - maximize: Whether to maximize evaluation metric (bool, optional)
    - early_stopping_rounds: Stop if no improvement for N rounds (int, optional)
    - fpreproc: Preprocessing function for each fold (callable, optional)
        Signature: fpreproc(dtrain, dtest, params) -> (dtrain, dtest, params)
    - as_pandas: Return results as pandas DataFrame (bool)
    - verbose_eval: Control evaluation result display (bool or int, optional)
    - show_stdv: Whether to display standard deviation (bool)
    - seed: Random seed for fold assignment (int)
    - callbacks: List of callback functions (list, optional)
    - shuffle: Whether to shuffle data before splitting (bool)
    - custom_metric: Custom evaluation metric (callable, optional)
    
    Returns: dict or pandas.DataFrame - Cross-validation results
        If as_pandas=True: DataFrame with metrics for each fold and iteration
        If as_pandas=False: Dict with metric names as keys and lists of scores as values
    """
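With as_pandas=True, the per-round means make it easy to pick the optimal number of rounds. A small sketch using a toy DataFrame as a stand-in for real cv() output (real column names follow the pattern '&lt;data&gt;-&lt;metric&gt;-mean' / '&lt;data&gt;-&lt;metric&gt;-std'):

```python
import pandas as pd

# Toy stand-in for xgb.cv(..., as_pandas=True) results; one row per round
cv_results = pd.DataFrame({
    'train-auc-mean': [0.80, 0.85, 0.88, 0.90],
    'test-auc-mean':  [0.78, 0.82, 0.84, 0.83],
})

# For a maximized metric, the best round is the row with the highest test mean
best_round = int(cv_results['test-auc-mean'].idxmax()) + 1  # rounds are 1-based
best_score = float(cv_results['test-auc-mean'].max())
print(best_round, best_score)  # 3 0.84
```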

Usage Examples

Basic Model Training

import xgboost as xgb
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create sample data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, 
                          n_informative=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, 
                                                   random_state=42)

# Create DMatrix objects
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Set training parameters
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'max_depth': 6,
    'learning_rate': 0.1,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'gamma': 0.1,
    'min_child_weight': 1,
    'reg_alpha': 0.1,
    'reg_lambda': 1.0,
    'random_state': 42
}

# Train model with evaluation
model = xgb.train(
    params=params,
    dtrain=dtrain,
    num_boost_round=100,
    evals=[(dtrain, 'train'), (dtest, 'test')],
    early_stopping_rounds=10,
    verbose_eval=10
)

print(f"Best iteration: {model.best_iteration}")
print(f"Best score: {model.best_score}")

# Make predictions (after early stopping, recent XGBoost versions predict
# with the best iteration by default; older releases may need iteration_range)
y_pred = model.predict(dtest)
y_pred_binary = (y_pred > 0.5).astype(int)

Advanced Training with Custom Objective

import numpy as np

def custom_objective(y_pred, dtrain):
    """Custom focal-loss-style objective for imbalanced classification.

    The gradient/hessian formulas below are an illustrative approximation of
    the focal-loss derivatives; validate any custom objective numerically
    before relying on it.
    """
    alpha = 0.25
    gamma = 2.0
    
    # The second argument is the training DMatrix; extract labels from it
    y_true = dtrain.get_label()
    
    # Sigmoid maps raw margins to probabilities
    sigmoid = 1 / (1 + np.exp(-y_pred))
    
    # Approximate focal loss gradients and hessians
    grad = alpha * (y_true - sigmoid) * ((1 - sigmoid) ** gamma) * gamma * np.log(sigmoid + 1e-8) + \
           alpha * (y_true - sigmoid) * ((1 - sigmoid) ** (gamma - 1))
    
    hess = alpha * gamma * ((1 - sigmoid) ** (gamma - 1)) * \
           (gamma * (y_true - sigmoid) * np.log(sigmoid + 1e-8) + (y_true - sigmoid) + \
            (1 - sigmoid))
    
    # XGBoost requires strictly positive hessians for stable splits
    hess = np.maximum(hess, 1e-6)
    
    return grad, hess
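Custom objectives are easy to get wrong, so it is worth sanity-checking the analytic gradient against a finite-difference estimate before training. A pure-NumPy sketch, using plain logistic loss (whose derivative is known exactly) as the reference:

```python
import numpy as np

def logloss(pred, label):
    """Binary logistic loss on a raw margin `pred`."""
    p = 1.0 / (1.0 + np.exp(-pred))
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

def logloss_grad(pred, label):
    """Analytic gradient of the logistic loss w.r.t. the raw margin."""
    return 1.0 / (1.0 + np.exp(-pred)) - label

preds = np.array([-1.5, 0.2, 2.0])
labels = np.array([0.0, 1.0, 1.0])

# Central finite differences approximate d loss / d pred
eps = 1e-6
numeric = (logloss(preds + eps, labels) - logloss(preds - eps, labels)) / (2 * eps)
analytic = logloss_grad(preds, labels)

assert np.allclose(numeric, analytic, atol=1e-5)
```

The same check applies to any obj function: perturb each prediction, difference the loss, and compare against the returned grad.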

def custom_eval_metric(y_pred, dtrain):
    """Custom F1 score evaluation metric."""
    from sklearn.metrics import f1_score
    
    y_true = dtrain.get_label()
    # Note: with a custom objective, predictions are raw margins, so a 0.5
    # probability threshold assumes sigmoid-transformed outputs
    y_pred_binary = (y_pred > 0.5).astype(int)
    f1 = f1_score(y_true, y_pred_binary)
    
    return 'f1', f1  # (name, value); direction is set via maximize= in train()

# Train with custom objective and metric
model_custom = xgb.train(
    params={'max_depth': 6, 'learning_rate': 0.1},
    dtrain=dtrain,
    num_boost_round=100,
    evals=[(dtrain, 'train'), (dtest, 'test')],
    obj=custom_objective,
    custom_metric=custom_eval_metric,
    maximize=True,  # F1 is a higher-is-better metric
    early_stopping_rounds=10,
    verbose_eval=10
)

Cross-Validation for Model Selection

# Basic cross-validation
cv_params = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'max_depth': 6,
    'learning_rate': 0.1,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'random_state': 42
}

cv_results = xgb.cv(
    params=cv_params,
    dtrain=dtrain,
    num_boost_round=200,
    nfold=5,
    stratified=True,
    early_stopping_rounds=10,
    seed=42,
    verbose_eval=10,
    show_stdv=True
)

print(f"Best CV score: {cv_results.iloc[-1]['test-auc-mean']:.4f} ± {cv_results.iloc[-1]['test-auc-std']:.4f}")
print(f"Optimal number of boosting rounds: {len(cv_results)}")

# Cross-validation with hyperparameter grid search
from sklearn.model_selection import ParameterGrid

param_grid = {
    'max_depth': [3, 6, 9],
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.8, 0.9, 1.0],
    'colsample_bytree': [0.8, 0.9, 1.0]
}

best_score = 0
best_params = None
best_num_rounds = 0

for params in ParameterGrid(param_grid):
    cv_params = {
        'objective': 'binary:logistic',
        'eval_metric': 'auc',
        'random_state': 42,
        **params
    }
    
    cv_results = xgb.cv(
        params=cv_params,
        dtrain=dtrain,
        num_boost_round=100,
        nfold=5,
        stratified=True,
        early_stopping_rounds=10,
        seed=42,
        verbose_eval=False,
        show_stdv=False
    )
    
    score = cv_results.iloc[-1]['test-auc-mean']
    if score > best_score:
        best_score = score
        best_params = params
        best_num_rounds = len(cv_results)

print(f"Best parameters: {best_params}")
print(f"Best CV score: {best_score:.4f}")
print(f"Best number of rounds: {best_num_rounds}")

# Train final model with best parameters
final_params = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'random_state': 42,
    **best_params
}

final_model = xgb.train(
    params=final_params,
    dtrain=dtrain,
    num_boost_round=best_num_rounds,
    evals=[(dtest, 'test')],
    verbose_eval=False
)

Multi-Class Classification Training

from sklearn.datasets import make_classification

# Create multi-class data
X_multi, y_multi = make_classification(n_samples=1000, n_features=20, 
                                      n_classes=5, n_informative=15, 
                                      random_state=42)
X_train_multi, X_test_multi, y_train_multi, y_test_multi = \
    train_test_split(X_multi, y_multi, test_size=0.2, random_state=42)

# Create DMatrix for multi-class
dtrain_multi = xgb.DMatrix(X_train_multi, label=y_train_multi)
dtest_multi = xgb.DMatrix(X_test_multi, label=y_test_multi)

# Multi-class parameters
multi_params = {
    'objective': 'multi:softmax',
    'eval_metric': 'mlogloss',
    'num_class': 5,  # Number of classes
    'max_depth': 6,
    'learning_rate': 0.1,
    'random_state': 42
}

# Train multi-class model
multi_model = xgb.train(
    params=multi_params,
    dtrain=dtrain_multi,
    num_boost_round=100,
    evals=[(dtrain_multi, 'train'), (dtest_multi, 'test')],
    early_stopping_rounds=10,
    verbose_eval=10
)

# Get class probabilities instead of class predictions
multi_params_prob = multi_params.copy()
multi_params_prob['objective'] = 'multi:softprob'

multi_model_prob = xgb.train(
    params=multi_params_prob,
    dtrain=dtrain_multi,
    num_boost_round=100,
    evals=[(dtest_multi, 'test')],  # early stopping requires at least one eval set
    early_stopping_rounds=10,
    verbose_eval=False
)

# Predictions
y_pred_classes = multi_model.predict(dtest_multi)
y_pred_probs = multi_model_prob.predict(dtest_multi).reshape(-1, 5)

print(f"Predicted classes shape: {y_pred_classes.shape}")
print(f"Predicted probabilities shape: {y_pred_probs.shape}")

Regression Training

from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score

# Create regression data
X_reg, y_reg = make_regression(n_samples=1000, n_features=20, noise=0.1, 
                              random_state=42)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = \
    train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

# Create DMatrix for regression
dtrain_reg = xgb.DMatrix(X_train_reg, label=y_train_reg)
dtest_reg = xgb.DMatrix(X_test_reg, label=y_test_reg)

# Regression parameters
reg_params = {
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
    'max_depth': 6,
    'learning_rate': 0.1,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'random_state': 42
}

# Cross-validation for regression
reg_cv_results = xgb.cv(
    params=reg_params,
    dtrain=dtrain_reg,
    num_boost_round=100,
    nfold=5,
    early_stopping_rounds=10,
    seed=42,
    verbose_eval=10
)

# Train final regression model
reg_model = xgb.train(
    params=reg_params,
    dtrain=dtrain_reg,
    num_boost_round=len(reg_cv_results),
    evals=[(dtrain_reg, 'train'), (dtest_reg, 'test')],
    verbose_eval=False
)

# Evaluate regression performance
y_pred_reg = reg_model.predict(dtest_reg)
# squared=False was removed from mean_squared_error in newer scikit-learn;
# take the square root explicitly for RMSE
rmse = np.sqrt(mean_squared_error(y_test_reg, y_pred_reg))
r2 = r2_score(y_test_reg, y_pred_reg)

print(f"RMSE: {rmse:.4f}")
print(f"R²: {r2:.4f}")

Training with Callbacks

import xgboost as xgb

# Define custom callback (must subclass TrainingCallback to be accepted by train)
class CustomCallback(xgb.callback.TrainingCallback):
    def __init__(self):
        super().__init__()
        self.history = {'train': [], 'test': []}
        # Booster exposes no learning-rate getter, so track the value ourselves
        self.learning_rate = 0.1
    
    def after_iteration(self, model, epoch, evals_log):
        """Called after each training iteration; return True to stop training."""
        if evals_log:
            for data_name, eval_results in evals_log.items():
                for metric_name, metric_values in eval_results.items():
                    self.history.setdefault(data_name, []).append(metric_values[-1])
        
        # Custom logic, e.g., adaptive learning rate decay
        if epoch > 50 and epoch % 10 == 0:
            self.learning_rate *= 0.95
            model.set_param('learning_rate', self.learning_rate)
            print(f"Epoch {epoch}: Reduced learning rate to {self.learning_rate:.4f}")
        
        return False  # Continue training

# Train with callbacks
callback = CustomCallback()
evals_result = {}

model_with_callback = xgb.train(
    params=params,
    dtrain=dtrain,
    num_boost_round=100,
    evals=[(dtrain, 'train'), (dtest, 'test')],
    evals_result=evals_result,
    callbacks=[callback],
    verbose_eval=10
)

print(f"Training history length: {len(callback.history['train'])}")
print(f"Final evaluation results keys: {list(evals_result.keys())}")

Early Stopping and Model Selection

# Training with detailed early stopping
evals_result_detailed = {}

model_early_stop = xgb.train(
    params={
        'objective': 'binary:logistic',
        'eval_metric': ['logloss', 'auc'],  # Multiple metrics
        'max_depth': 6,
        'learning_rate': 0.1,
        'random_state': 42
    },
    dtrain=dtrain,
    num_boost_round=500,  # Large upper bound; early stopping cuts it short
    evals=[(dtrain, 'train'), (dtest, 'validation')],
    evals_result=evals_result_detailed,
    # Early stopping monitors the last metric ('auc') on the last eval set
    # ('validation'); direction is auto-detected for built-in metrics
    early_stopping_rounds=20,
    verbose_eval=25
)

print(f"Training stopped at iteration: {model_early_stop.best_iteration + 1}")
print(f"Best score: {model_early_stop.best_score:.4f}")

# Access detailed evaluation history
train_logloss = evals_result_detailed['train']['logloss']
val_logloss = evals_result_detailed['validation']['logloss']
train_auc = evals_result_detailed['train']['auc']
val_auc = evals_result_detailed['validation']['auc']

print(f"Final train logloss: {train_logloss[-1]:.4f}")
print(f"Final validation logloss: {val_logloss[-1]:.4f}")
print(f"Final train AUC: {train_auc[-1]:.4f}")
print(f"Final validation AUC: {val_auc[-1]:.4f}")

# Plot training curves
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(train_logloss, label='Train')
plt.plot(val_logloss, label='Validation')
plt.xlabel('Iteration')
plt.ylabel('Log Loss')
plt.title('Training Curves - Log Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(train_auc, label='Train')
plt.plot(val_auc, label='Validation')
plt.xlabel('Iteration')
plt.ylabel('AUC')
plt.title('Training Curves - AUC')
plt.legend()

plt.tight_layout()
plt.show()

Training Callbacks

Comprehensive callback system for customizing training behavior with built-in callbacks for early stopping, learning rate scheduling, evaluation monitoring, and checkpointing. Callbacks enable advanced training control and monitoring.

import xgboost.callback as callback

class callback.TrainingCallback:
    """Abstract base class for training callbacks."""
    
    def before_training(self, model):
        """Called before training starts. Parameters: model (Booster)"""
    
    def after_training(self, model):
        """Called after training finishes. Parameters: model (Booster)"""
    
    def before_iteration(self, model, epoch, evals_log):
        """Called before each iteration. Parameters: model (Booster), epoch (int), evals_log (dict)"""
    
    def after_iteration(self, model, epoch, evals_log):
        """
        Called after each iteration.
        
        Parameters:
        - model: Current model state (Booster)
        - epoch: Current iteration number (int)
        - evals_log: Evaluation history (dict)
        
        Returns: bool - True to stop training, False to continue
        """

class callback.EarlyStopping:
    def __init__(self, rounds, metric_name=None, data_name=None, 
                 maximize=None, save_best=False, min_delta=0.0):
        """
        Early stopping with best model saving support.
        
        Parameters:
        - rounds: Number of rounds with no improvement to stop (int)
        - metric_name: Metric name for early stopping (str, optional)
        - data_name: Dataset name for early stopping (str, optional)  
        - maximize: Whether to maximize metric (bool, optional, auto-detected)
        - save_best: Save best model vs last model (bool, tree methods only)
        - min_delta: Minimum change to qualify as improvement (float)
        """

class callback.LearningRateScheduler:
    def __init__(self, learning_rates):
        """
        Dynamic learning rate adjustment during training.
        
        Parameters:
        - learning_rates: Callable accepting epoch and returning learning rate,
                         or sequence of learning rates (callable or sequence)
        """
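The sequence form can be built ahead of time. A minimal sketch that precomputes an exponentially decaying schedule (the initial rate and decay factor are illustrative); the resulting list would be passed to LearningRateScheduler when calling train:

```python
num_boost_round = 100
initial_lr, decay = 0.3, 0.95

# One learning rate per boosting round; round i uses lrs[i]
lrs = [initial_lr * (decay ** i) for i in range(num_boost_round)]

# e.g. lr_scheduler = xgboost.callback.LearningRateScheduler(lrs)
print(len(lrs), lrs[0])
```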

class callback.EvaluationMonitor:
    def __init__(self, rank=0, period=1, show_stdv=False, logger=None):
        """
        Print evaluation results at specified intervals.
        
        Parameters:
        - rank: Which worker prints results (int, default: 0)
        - period: Epochs between printing (int, default: 1)
        - show_stdv: Show standard deviation in CV (bool, default: False)
        - logger: Custom logging function (callable, optional)
        """

class callback.TrainingCheckPoint:
    def __init__(self, directory, name='model', as_pickle=False, interval=100):
        """
        Periodic model checkpointing during training.
        
        Parameters:
        - directory: Output directory for checkpoints (str)
        - name: Checkpoint file name pattern (str, default: 'model')
        - as_pickle: Save as pickle vs model format (bool, default: False)
        - interval: Checkpointing interval in rounds (int, default: 100)
        """

Callback Usage Examples

Custom Callback Development

import time

import xgboost as xgb
from xgboost.callback import TrainingCallback

class CustomCallback(TrainingCallback):
    """Custom callback for advanced training control."""
    
    def __init__(self, log_file='training.log'):
        super().__init__()
        self.log_file = log_file
        self.iteration_times = []
        self.best_score = float('inf')
        self.plateau_count = 0
        # Booster exposes no learning-rate getter, so track the value here
        self.learning_rate = 0.1
    
    def before_training(self, model):
        """Initialize callback state before training. Must return the model."""
        self.start_time = time.time()
        with open(self.log_file, 'w') as f:
            f.write("Training started\n")
        return model
    
    def after_iteration(self, model, epoch, evals_log):
        """Custom logic after each iteration. Return True to stop training."""
        current_time = time.time()
        
        # Track iteration timing
        if hasattr(self, 'iter_start'):
            self.iteration_times.append(current_time - self.iter_start)
        
        self.iter_start = current_time
        
        # Check for validation score improvement
        if evals_log:
            for data_name, metrics in evals_log.items():
                if 'logloss' in metrics:
                    current_score = metrics['logloss'][-1]
                    if current_score < self.best_score:
                        self.best_score = current_score
                        self.plateau_count = 0
                    else:
                        self.plateau_count += 1
        
        # Dynamic learning rate adjustment based on plateau
        if self.plateau_count > 5:
            self.learning_rate *= 0.9
            model.set_param('learning_rate', self.learning_rate)
            print(f"Epoch {epoch}: Reduced learning rate to {self.learning_rate:.6f}")
            self.plateau_count = 0
        
        # Log progress
        with open(self.log_file, 'a') as f:
            f.write(f"Epoch {epoch}: Best score = {self.best_score:.6f}\n")
        
        return False  # Continue training
    
    def after_training(self, model):
        """Cleanup and final logging. Must return the model."""
        total_time = time.time() - self.start_time
        if self.iteration_times:
            avg_iter_time = sum(self.iteration_times) / len(self.iteration_times)
        else:
            avg_iter_time = 0.0
        
        with open(self.log_file, 'a') as f:
            f.write(f"Training completed in {total_time:.2f}s\n")
            f.write(f"Average iteration time: {avg_iter_time:.4f}s\n")
            f.write(f"Final best score: {self.best_score:.6f}\n")
        
        return model

# Use custom callback
custom_cb = CustomCallback('xgb_training.log')
model = xgb.train(params, dtrain, num_boost_round=100, 
                  callbacks=[custom_cb])

Combining Multiple Callbacks

import xgboost as xgb
from xgboost import callback
import numpy as np

# Create multiple callbacks
early_stop = callback.EarlyStopping(rounds=10, save_best=True, min_delta=0.001)
checkpointer = callback.TrainingCheckPoint(directory='./checkpoints', 
                                          interval=25, as_pickle=False)
monitor = callback.EvaluationMonitor(period=5, show_stdv=True)

# Custom learning rate schedule
def cosine_schedule(epoch):
    """Cosine annealing learning rate schedule."""
    max_epochs = 200
    initial_lr = 0.1
    min_lr = 0.001
    
    if epoch >= max_epochs:
        return min_lr
    
    return min_lr + (initial_lr - min_lr) * \
           (1 + np.cos(np.pi * epoch / max_epochs)) / 2

lr_scheduler = callback.LearningRateScheduler(cosine_schedule)

# Combine all callbacks
callbacks = [early_stop, checkpointer, monitor, lr_scheduler]

# Train with all callbacks
model = xgb.train(
    params={'objective': 'binary:logistic', 'eval_metric': 'logloss'},
    dtrain=dtrain,
    num_boost_round=200,
    evals=[(dtrain, 'train'), (dtest, 'validation')],
    callbacks=callbacks,
    verbose_eval=False  # Monitor callback handles output
)

print(f"Training stopped at iteration: {model.best_iteration}")
print(f"Best validation score: {model.best_score}")

Callback Integration with Scikit-learn Interface

from xgboost import XGBClassifier, callback
from sklearn.model_selection import GridSearchCV

# Create callbacks for sklearn interface
early_stop_cb = callback.EarlyStopping(rounds=10, save_best=True)
checkpoint_cb = callback.TrainingCheckPoint('./sklearn_checkpoints', interval=50)

# Use with sklearn estimator
xgb_clf = XGBClassifier(
    n_estimators=500,
    max_depth=6,
    learning_rate=0.1,
    callbacks=[early_stop_cb, checkpoint_cb],
    eval_metric='logloss'
)

# Callbacks work with hyperparameter tuning
param_grid = {
    'max_depth': [3, 6, 9],
    'learning_rate': [0.01, 0.1, 0.2]
}

# Note: each grid-search fit re-initializes the callbacks
grid_search = GridSearchCV(
    xgb_clf, param_grid,
    cv=5, scoring='roc_auc'
)

# Fit parameters such as eval_set are passed to fit() directly; the
# fit_params constructor argument was removed from scikit-learn
grid_search.fit(X_train, y_train, eval_set=[(X_test, y_test)])

Advanced Callback Patterns

class AdaptiveRegularizationCallback(TrainingCallback):
    """Dynamically adjust regularization based on overfitting detection."""
    
    def __init__(self, patience=10, reg_step=0.1):
        super().__init__()
        self.patience = patience
        self.reg_step = reg_step
        self.best_diff = float('inf')
        self.wait_count = 0
        # Booster has no parameter getter, so track the current values here
        self.reg_alpha = 0.0   # XGBoost default for reg_alpha
        self.reg_lambda = 1.0  # XGBoost default for reg_lambda
    
    def after_iteration(self, model, epoch, evals_log):
        if evals_log and 'train' in evals_log and 'val' in evals_log:
            # Assume both eval sets report the same metric
            metric_name = list(evals_log['train'].keys())[0]
            train_score = evals_log['train'][metric_name][-1]
            val_score = evals_log['val'][metric_name][-1]
            
            # Overfitting gap between train and validation
            score_diff = abs(val_score - train_score)
            
            if score_diff < self.best_diff:
                self.best_diff = score_diff
                self.wait_count = 0
            else:
                self.wait_count += 1
            
            # Increase regularization if the gap keeps widening
            if self.wait_count >= self.patience:
                self.reg_alpha += self.reg_step
                self.reg_lambda += self.reg_step
                
                model.set_param('reg_alpha', self.reg_alpha)
                model.set_param('reg_lambda', self.reg_lambda)
                
                print(f"Epoch {epoch}: Increased regularization - "
                      f"alpha={self.reg_alpha:.3f}, lambda={self.reg_lambda:.3f}")
                
                self.wait_count = 0
        
        return False

# Multi-stage training with callback switching
class MultiStageCallback(TrainingCallback):
    """Switch between different training strategies at set epochs."""
    
    def __init__(self, stage_epochs=(50, 100)):
        super().__init__()
        self.stage_epochs = stage_epochs
        self.current_stage = 0
    
    def after_iteration(self, model, epoch, evals_log):
        # Switch training strategy at the specified epochs
        for i, stage_epoch in enumerate(self.stage_epochs):
            if epoch == stage_epoch and self.current_stage == i:
                self.current_stage += 1
                
                if i == 0:  # After the first boundary: fine-tune
                    model.set_param('learning_rate', 0.01)  # slower learning
                    model.set_param('max_depth', 8)  # deeper trees from here on
                    print(f"Stage {i + 2}: Switched to fine-tuning mode")
                
                elif i == 1:  # After the second boundary: regularize
                    model.set_param('reg_alpha', 0.1)
                    model.set_param('reg_lambda', 0.1)
                    print(f"Stage {i + 2}: Added regularization")
        
        return False

# Usage with multiple advanced callbacks
adaptive_reg_cb = AdaptiveRegularizationCallback(patience=15)
multi_stage_cb = MultiStageCallback(stage_epochs=[75, 150])
early_stop_cb = callback.EarlyStopping(rounds=25, save_best=True)

advanced_model = xgb.train(
    params={'objective': 'reg:squarederror', 'eval_metric': 'rmse'},
    dtrain=dtrain,
    num_boost_round=200,
    evals=[(dtrain, 'train'), (dtest, 'val')],
    callbacks=[multi_stage_cb, adaptive_reg_cb, early_stop_cb]
)

Install with Tessl CLI

npx tessl i tessl/pypi-xgboost-cpu
