tessl/pypi-autogluon--tabular

AutoGluon TabularPredictor for automated machine learning on tabular datasets

docs/configurations.md

Configuration and Presets

AutoGluon Tabular provides extensive configuration options through presets, hyperparameter configurations, and feature processing settings. These configurations enable users to optimize for different objectives like accuracy, speed, interpretability, or deployment constraints.

Capabilities

Preset Configurations

Pre-configured settings optimized for different use cases, balancing accuracy, training time, and computational resources.

# Available preset configurations
PRESET_CONFIGURATIONS = Literal[
    "best_quality",              # Maximum accuracy, longer training time
    "high_quality",              # High accuracy with fast inference
    "good_quality",              # Good accuracy with very fast inference
    "medium_quality",            # Medium accuracy, very fast training (default)
    "optimize_for_deployment",   # Optimizes for deployment by cleaning up models
    "interpretable"              # Interpretable models only
]

def get_preset_config(preset: str) -> dict:
    """
    Get configuration dictionary for a specific preset.
    
    Parameters:
    - preset: Name of the preset configuration
    
    Returns:
    Dictionary with preset configuration parameters
    """
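Since the preset names are declared as a `Literal`, they can also be checked at runtime via `typing.get_args`. A minimal sketch (the `validate_preset` helper is illustrative, not part of AutoGluon):

```python
from typing import Literal, get_args

PresetName = Literal[
    "best_quality", "high_quality", "good_quality",
    "medium_quality", "optimize_for_deployment", "interpretable",
]

def validate_preset(preset: str) -> str:
    """Raise ValueError for unknown preset names; return the name otherwise."""
    valid = get_args(PresetName)
    if preset not in valid:
        raise ValueError(f"Unknown preset {preset!r}; expected one of {valid}")
    return preset
```

This catches typos like `"best_qualty"` before a long training run starts.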

Hyperparameter Configurations

Systematic hyperparameter configuration system for customizing model training and optimization strategies.

# Hyperparameter configuration structure
HYPERPARAMETER_CONFIG = dict[str, dict[str, Any]]
# Example: {'GBM': {'num_leaves': [31, 127], 'learning_rate': [0.01, 0.1]}}

def get_hyperparameter_config_options() -> list[str]:
    """
    Get list of available hyperparameter configuration presets.
    
    Returns:
    List of available configuration names
    """

def get_hyperparameter_config(config_name: str) -> dict:
    """
    Get specific hyperparameter configuration by name.
    
    Parameters:
    - config_name: Name of the hyperparameter configuration preset
    
    Returns:
    Hyperparameter configuration dictionary
    """
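A common pattern is to start from a named base configuration and override a few per-model values. A minimal sketch of such a merge, assuming the `dict[str, dict]` structure shown above (`merge_hyperparameters` and `BASE_CONFIG` are illustrative, not AutoGluon APIs):

```python
from copy import deepcopy
from typing import Any

# Hypothetical base config in the HYPERPARAMETER_CONFIG shape shown above
BASE_CONFIG: dict[str, dict[str, Any]] = {
    "GBM": {"num_leaves": 31, "learning_rate": 0.1},
    "XGB": {"max_depth": 6},
}

def merge_hyperparameters(base: dict, overrides: dict) -> dict:
    """Return a new config with per-model overrides applied on top of `base`."""
    merged = deepcopy(base)  # leave the base configuration untouched
    for model, params in overrides.items():
        merged.setdefault(model, {}).update(params)
    return merged

cfg = merge_hyperparameters(BASE_CONFIG, {"GBM": {"num_leaves": 127}})
```

The deep copy keeps the base preset reusable across experiments.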

Feature Generation Configuration

Automated feature engineering and preprocessing configuration system for handling diverse data types and feature transformations.

def get_default_feature_generator(
    feature_generator: str = "auto",
    feature_metadata: 'FeatureMetadata' = None,
    init_kwargs: dict = None
) -> 'AutoMLPipelineFeatureGenerator':
    """
    Get default feature generator with specified configuration.
    
    Parameters:
    - feature_generator: Feature generation preset ('auto', 'interpretable')
    - feature_metadata: Metadata for feature processing
    - init_kwargs: Additional initialization arguments
    
    Returns:
    Configured feature generator instance
    """

class FeatureGenerator:
    """Base class for feature generation and preprocessing."""
    
    def fit_transform(
        self,
        X: pd.DataFrame,
        feature_metadata: 'FeatureMetadata' = None,
        **kwargs
    ) -> pd.DataFrame:
        """
        Fit feature generator and transform input data.
        
        Parameters:
        - X: Input dataframe
        - feature_metadata: Feature type metadata
        
        Returns:
        Transformed feature dataframe
        """
    
    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """Transform input data using fitted generator."""

Advanced Training Arguments

Configuration options for advanced training strategies including bagging, stacking, and resource management.

class AGArgsFit:
    """Arguments for controlling model fitting behavior."""
    
    num_cpus: int | str = "auto"              # CPU cores for training ("auto" = all available)
    num_gpus: int = 0                         # GPU devices to use
    memory_limit: int | None = None           # Memory limit in MB
    disk_limit: int | None = None             # Disk space limit in MB
    time_limit: float | None = None           # Time limit per model in seconds
    name_suffix: str = ""                     # Suffix for model names
    priority: int = 0                         # Training priority (higher trains first)
    
class AGArgsEnsemble:
    """Arguments for controlling ensemble behavior."""
    
    fold_fitting_strategy: str = "sequential_local"  # Fold fitting strategy
    auto_stack: bool = True                          # Enable automatic stacking
    bagging_mode: str = "oob"                        # Bagging validation mode
    stack_mode: str = "infer"                        # Stacking mode
    ensemble_size_max: int = 25                      # Maximum ensemble size

# Training configuration structure
TRAINING_CONFIG = {
    'num_bag_folds': int,      # Number of bagging folds (default: auto)
    'num_bag_sets': int,       # Number of bagging sets (default: auto)  
    'num_stack_levels': int,   # Number of stacking levels (default: auto)
    'ag_args_fit': dict,       # Advanced fitting arguments
    'ag_args_ensemble': dict,  # Advanced ensemble arguments
}
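A small validator over that structure can catch typos before launching a long training run. A sketch (illustrative, not an AutoGluon API):

```python
# Expected key -> type mapping, mirroring the TRAINING_CONFIG structure above
TRAINING_CONFIG_SCHEMA = {
    'num_bag_folds': int,
    'num_bag_sets': int,
    'num_stack_levels': int,
    'ag_args_fit': dict,
    'ag_args_ensemble': dict,
}

def validate_training_config(config: dict) -> dict:
    """Check keys and value types against the schema; return config unchanged."""
    for key, value in config.items():
        expected = TRAINING_CONFIG_SCHEMA.get(key)
        if expected is None:
            raise KeyError(f"Unknown training option: {key!r}")
        if not isinstance(value, expected):
            raise TypeError(
                f"{key} must be {expected.__name__}, got {type(value).__name__}"
            )
    return config
```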

Evaluation and Metric Configuration

Configuration for evaluation metrics, validation strategies, and performance measurement.

# Classification metrics
CLASSIFICATION_METRICS = [
    "accuracy", "balanced_accuracy", "log_loss", 
    "f1", "f1_macro", "f1_micro", "f1_weighted",
    "roc_auc", "roc_auc_ovo", "roc_auc_ovo_macro", "roc_auc_ovo_weighted",
    "roc_auc_ovr", "roc_auc_ovr_macro", "roc_auc_ovr_micro", "roc_auc_ovr_weighted",
    "average_precision", "precision", "precision_macro", "precision_micro", "precision_weighted",
    "recall", "recall_macro", "recall_micro", "recall_weighted", 
    "mcc", "pac_score"
]

# Regression metrics  
REGRESSION_METRICS = [
    "root_mean_squared_error", "mean_squared_error", "mean_absolute_error",
    "median_absolute_error", "mean_absolute_percentage_error", 
    "r2", "symmetric_mean_absolute_percentage_error"
]

# Quantile regression metrics
QUANTILE_METRICS = ["pinball_loss"]

def get_metric_config(
    problem_type: str,
    eval_metric: str = None,
    greater_is_better: bool = None
) -> dict:
    """
    Get metric configuration for evaluation.
    
    Parameters:
    - problem_type: Type of ML problem
    - eval_metric: Primary evaluation metric
    - greater_is_better: Whether higher metric values are better
    
    Returns:
    Metric configuration dictionary
    """

Resource and Performance Configuration

Settings for optimizing computational resource usage, memory management, and training performance.

class ResourceConfig:
    """Configuration for computational resources and performance optimization."""
    
    # CPU and Memory
    num_cpus: int | str = "auto"        # Number of CPU cores ("auto" = all available)
    memory_limit_mb: int | None = None  # Memory limit in megabytes
    
    # GPU Configuration  
    num_gpus: int = 0                   # Number of GPU devices
    gpu_memory_limit: int | None = None # GPU memory limit
    
    # Disk and Storage
    disk_limit_mb: int | None = None    # Disk space limit
    cache_data: bool = True             # Cache preprocessed data
    
    # Performance Optimization
    enable_multiprocessing: bool = True      # Enable multiprocessing
    max_concurrent_models: int = 1           # Maximum concurrent model training
    early_stopping_rounds: int | None = None # Early stopping configuration
    
    # Inference Optimization
    optimize_for_deployment: bool = False  # Optimize for deployment
    model_compression: bool = False        # Enable model compression

Usage Examples

Basic Preset Usage

from autogluon.tabular import TabularPredictor
import pandas as pd

# Load data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Different preset configurations
presets = ['good_quality', 'best_quality', 'optimize_for_deployment', 'interpretable']

results = {}
for preset in presets:
    print(f"\nTraining with preset: {preset}")
    
    predictor = TabularPredictor(
        label='target',
        path=f'./models_{preset}/'
    )
    
    predictor.fit(
        train_data,
        presets=preset,
        time_limit=600  # 10 minutes per preset
    )
    
    # Evaluate performance (evaluate() returns a dict of metric name -> score)
    performance = predictor.evaluate(test_data)
    score = performance[predictor.eval_metric.name]
    leaderboard = predictor.leaderboard(test_data)
    
    results[preset] = {
        'score': score,
        'best_model': leaderboard.iloc[0]['model'],
        'num_models': len(leaderboard)
    }
    
    print(f"Best score: {score:.4f}")
    print(f"Best model: {results[preset]['best_model']}")
    print(f"Total models trained: {results[preset]['num_models']}")

# Compare results
print("\nPreset Comparison:")
for preset, result in results.items():
    print(f"{preset}: {result['score']:.4f} ({result['num_models']} models)")

Custom Hyperparameter Configuration

from autogluon.tabular import TabularPredictor

# Advanced hyperparameter configuration.
# Note: AutoGluon's key for LightGBM is 'GBM'; a list of dicts under one
# model key trains one model per dict.
hyperparameters = {
    # Gradient Boosting Models
    'GBM': [
        # Fast configuration
        {
            'num_leaves': 31,
            'learning_rate': 0.1,
            'feature_fraction': 0.9,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'min_data_in_leaf': 20,
            'ag_args': {'name_suffix': '_Fast', 'priority': 1}
        },
        # Accurate configuration
        {
            'num_leaves': 127,
            'learning_rate': 0.05,
            'feature_fraction': 0.8,
            'bagging_fraction': 0.9,
            'bagging_freq': 5,
            'min_data_in_leaf': 10,
            'reg_alpha': 0.1,
            'reg_lambda': 0.1,
            'ag_args': {'name_suffix': '_Accurate', 'priority': 2}
        }
    ],
    
    'XGB': [
        # Shallow, strongly regularized configuration
        {
            'n_estimators': 300,
            'max_depth': 3,
            'learning_rate': 0.1,
            'subsample': 0.9,
            'colsample_bytree': 0.9,
            'reg_alpha': 0.1,
            'reg_lambda': 1.0,
            'ag_args': {'name_suffix': '_Shallow'}
        },
        # Deeper configuration
        {
            'n_estimators': 500,
            'max_depth': 10,
            'learning_rate': 0.05,
            'subsample': 0.8,
            'colsample_bytree': 0.8,
            'ag_args': {'name_suffix': '_Deep'}
        }
    ],
    
    # Neural Networks
    'NN_TORCH': [
        # Small network
        {
            'num_epochs': 50,
            'learning_rate': 0.001,
            'weight_decay': 1e-4,
            'dropout_prob': 0.1,
            'embedding_size_factor': 1.0,
            'ag_args': {'name_suffix': '_Small'}
        },
        # Large network
        {
            'num_epochs': 100,
            'learning_rate': 0.0005,
            'weight_decay': 1e-5,
            'dropout_prob': 0.2,
            'embedding_size_factor': 2.0,
            'ag_args': {'name_suffix': '_Large'}
        }
    ]
}

# Train with custom hyperparameters
predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    time_limit=1800,  # 30 minutes
    num_bag_folds=5,
    num_stack_levels=2
)

Advanced Training Configuration

from autogluon.tabular import TabularPredictor

# Advanced training arguments
ag_args_fit = {
    'num_cpus': 8,                    # Use 8 CPU cores
    'num_gpus': 1,                    # Use 1 GPU
    'memory_limit': 16000,            # 16GB memory limit
    'time_limit': 300,                # 5 minutes per model
}

ag_args_ensemble = {
    'fold_fitting_strategy': 'sequential_local',
    'auto_stack': True,
    'bagging_mode': 'oob',           # Out-of-bag validation
    'stack_mode': 'infer',
    'ensemble_size_max': 50          # Maximum ensemble size
}

# Feature generation configuration
from autogluon.features.generators import AutoMLPipelineFeatureGenerator

feature_generator = AutoMLPipelineFeatureGenerator(
    enable_raw_text_features=True,      # Keep raw text columns for text-capable models
    enable_text_special_features=True,  # Text statistics (word count, char count, ...)
    enable_text_ngram_features=True     # N-gram features from text columns
)

predictor = TabularPredictor(
    label='target',
    eval_metric='roc_auc',
    sample_weight='sample_weights'
)

predictor.fit(
    train_data,
    tuning_data=validation_data,
    time_limit=3600,                  # 1 hour total
    presets='best_quality',
    
    # Advanced configurations
    ag_args_fit=ag_args_fit,
    ag_args_ensemble=ag_args_ensemble,
    feature_generator=feature_generator,
    
    # Bagging and stacking
    num_bag_folds=10,
    num_bag_sets=3,
    num_stack_levels=3,
    
    # Model selection
    excluded_model_types=['KNN'],      # Exclude slow-inference models
    
    # Hyperparameter tuning
    hyperparameter_tune_kwargs={
        'scheduler': 'local',
        'searcher': 'auto',
        'num_trials': 100
    }
)

Deployment Optimization Configuration

from autogluon.tabular import TabularPredictor

# Configuration optimized for deployment
deployment_hyperparameters = {
    'GBM': {                           # LightGBM (AutoGluon model key 'GBM')
        'num_leaves': 31,              # Smaller trees
        'max_depth': 6,
        'min_data_in_leaf': 50,        # Regularization
        'bagging_freq': 0,             # Disable bagging for speed
        'feature_fraction': 1.0,       # Use all features
    },
    'CAT': {
        'iterations': 100,             # Fewer iterations
        'depth': 6,
        'l2_leaf_reg': 3,
        'bootstrap_type': 'No'         # Disable bootstrap
    }
}

predictor = TabularPredictor(
    label='target',
    path='./deployment_model/'
)

predictor.fit(
    train_data,
    presets='optimize_for_deployment',
    hyperparameters=deployment_hyperparameters,
    time_limit=300,                    # Fast training
    num_bag_folds=0,                   # Disable bagging
    num_stack_levels=0,                # Disable stacking
    
    # Only the model types listed in `hyperparameters` are trained
)

# Create a deployment-optimized clone (keeps only the artifacts needed for prediction)
deployment_predictor = predictor.clone_for_deployment(path='./deployment_ready/')

# Test inference speed
import time
start_time = time.time()
predictions = deployment_predictor.predict(test_data)
inference_time = time.time() - start_time

print(f"Inference time: {inference_time:.3f} seconds")
print(f"Predictions per second: {len(test_data) / inference_time:.0f}")

Interpretable Model Configuration

from autogluon.tabular import TabularPredictor

# Configuration for interpretable models.
# A list of parameter dicts under one model key trains one model per dict.
interpretable_hyperparameters = {
    'LR': [                            # Linear/Logistic Regression
        {'C': 0.1},                    # Stronger regularization
        {'C': 1.0},
    ],
    'RF': [                            # Random Forest
        {'n_estimators': 100, 'max_depth': 5,        # Shallow trees for interpretability
         'min_samples_split': 20, 'max_features': 'sqrt'},
        {'n_estimators': 200, 'max_depth': 10,
         'min_samples_split': 10, 'max_features': 'log2'},
    ],
    'XGB': [                           # XGBoost (shallow, regularized trees)
        {'n_estimators': 100, 'max_depth': 3, 'learning_rate': 0.1,
         'reg_alpha': 0.1, 'reg_lambda': 1.0},
        {'n_estimators': 50, 'max_depth': 4, 'learning_rate': 0.2,
         'reg_alpha': 1.0, 'reg_lambda': 0.1},
    ]
}

predictor = TabularPredictor(
    label='target',
    eval_metric='accuracy'
)

predictor.fit(
    train_data,
    presets='interpretable',
    hyperparameters=interpretable_hyperparameters,
    
    # Only the model types listed in `hyperparameters` are trained (LR, RF, XGB)
    
    # Simpler ensemble strategies
    num_bag_folds=3,
    num_stack_levels=1,
    
    # Feature processing
    feature_generator='auto'           # Default preprocessing pipeline
)

# Analyze model interpretability
leaderboard = predictor.leaderboard(extra_info=True)
print("Interpretable models ranking:")
print(leaderboard[['model', 'score_val', 'fit_time']].head())

Configuration Reference

Preset Details

| Preset | Training Time | Model Diversity | Ensembling | Best For |
| --- | --- | --- | --- | --- |
| medium_quality | Low | Medium | None | Quick prototyping (default preset) |
| good_quality | Medium | High | Moderate | General use, balanced performance |
| high_quality | High | High | Extensive | High accuracy with fast inference |
| best_quality | Very High | Very High | Extensive | Maximum accuracy, competitions |
| optimize_for_deployment | - | - | - | Post-training optimization |
| interpretable | Low | Limited | Simple | Regulated industries, explainability |
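As a rough illustration of how the trade-offs above might drive preset selection, here is a hypothetical heuristic (not part of AutoGluon):

```python
def suggest_preset(time_budget_s: float, need_interpretability: bool = False) -> str:
    """Map a time budget and interpretability requirement to a preset name."""
    if need_interpretability:
        return "interpretable"      # Explainability trumps raw accuracy
    if time_budget_s < 600:
        return "medium_quality"     # Fast training for quick prototyping
    if time_budget_s < 3600:
        return "good_quality"       # Balanced accuracy and training time
    return "best_quality"           # Maximum accuracy for long budgets
```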

Model Type Abbreviations

| Code | Full Name | Category |
| --- | --- | --- |
| GBM | LightGBM | Gradient Boosting |
| XGB | XGBoost | Gradient Boosting |
| CAT | CatBoost | Gradient Boosting |
| RF | Random Forest | Tree Ensemble |
| XT | Extra Trees | Tree Ensemble |
| LR | Linear/Logistic Regression | Linear |
| KNN | K-Nearest Neighbors | Instance-based |
| NN_TORCH | PyTorch Neural Network | Deep Learning |
| FASTAI | FastAI Neural Network | Deep Learning |
| TABPFN | TabPFN | Foundation Model |

Resource Configuration Guidelines

| Use Case | CPU Cores | Memory (GB) | Time Limit | Bag Folds |
| --- | --- | --- | --- | --- |
| Quick Prototype | 2-4 | 4-8 | 5-15 min | 2-3 |
| Production Model | 8-16 | 16-32 | 30-60 min | 5-10 |
| Competition | 16-32 | 32-64 | 2-8 hours | 10-20 |
| Large Dataset | 16+ | 64+ | 4+ hours | 5-10 |

Install with Tessl CLI

npx tessl i tessl/pypi-autogluon--tabular
