tessl/pypi-catboost

CatBoost is a fast, scalable, high-performance library for gradient boosting on decision trees, used for ranking, classification, regression, and other ML tasks.

CatBoost

CatBoost is a fast, scalable, high-performance library for gradient boosting on decision trees, used for ranking, classification, regression, and other ML tasks. CatBoost provides superior quality compared to other GBDT libraries, best-in-class prediction speed, native GPU and multi-GPU support, built-in visualization tools, and distributed training capabilities.

Package Information

  • Package Name: catboost
  • Package Type: pypi
  • Language: Python
  • Installation: pip install catboost

Core Imports

import catboost

Common imports for working with models:

from catboost import CatBoostClassifier, CatBoostRegressor, CatBoostRanker
from catboost import Pool, cv, train

Submodule imports:

# Dataset utilities
from catboost import datasets
# or specific functions
from catboost.datasets import titanic, adult, amazon

# Utility functions
from catboost import utils
# or specific functions  
from catboost.utils import eval_metric, get_roc_curve, create_cd

# Evaluation framework (note: this import shadows Python's builtin eval)
from catboost import eval
# or specific classes
from catboost.eval import CatboostEvaluation, EvaluationResults

# Metrics framework
from catboost import metrics
# or specific metrics
from catboost.metrics import Logloss, AUC, RMSE

# Text processing
from catboost.text_processing import Tokenizer, Dictionary

# Model interpretation
from catboost.monoforest import to_polynom, explain_features

Basic Usage

from catboost import CatBoostClassifier, Pool
import pandas as pd
import numpy as np

# Prepare data
train_data = pd.DataFrame({
    'feature1': np.random.randn(1000),
    'feature2': np.random.randn(1000),
    'category': np.random.choice(['A', 'B', 'C'], 1000)
})
train_labels = np.random.randint(0, 2, 1000)

# Create CatBoost pool with categorical features
train_pool = Pool(
    data=train_data,
    label=train_labels,
    cat_features=['category']
)

# Initialize and train classifier
model = CatBoostClassifier(
    iterations=100,
    learning_rate=0.1,
    depth=6,
    verbose=True
)

model.fit(train_pool)

# Make predictions
predictions = model.predict(train_data)
probabilities = model.predict_proba(train_data)

# Get feature importance
feature_importance = model.get_feature_importance()

Architecture

CatBoost is built around several key components:

  • Model Classes: CatBoost, CatBoostClassifier, CatBoostRegressor, and CatBoostRanker provide different interfaces for gradient boosting tasks
  • Data Handling: Pool class efficiently manages training data with categorical features, text features, and metadata
  • Training Pipeline: Support for cross-validation, hyperparameter tuning, and early stopping
  • Feature Analysis: Comprehensive feature importance, SHAP values, and automatic feature selection
  • GPU Acceleration: Native GPU support for training and prediction across multiple devices

Capabilities

Core Model Classes

Scikit-learn compatible classifier, regressor, and ranker implementations with the base CatBoost class providing the core gradient boosting functionality.

class CatBoostClassifier:
    def __init__(self, iterations=500, learning_rate=None, depth=6, l2_leaf_reg=3.0, 
                 loss_function='Logloss', **kwargs): ...
    def fit(self, X, y, cat_features=None, sample_weight=None, baseline=None, 
            use_best_model=None, eval_set=None, **kwargs): ...
    def predict(self, data, prediction_type='Class', **kwargs): ...
    def predict_proba(self, X, **kwargs): ...

class CatBoostRegressor:
    def __init__(self, iterations=500, learning_rate=None, depth=6, l2_leaf_reg=3.0,
                 loss_function='RMSE', **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def predict(self, data, **kwargs): ...

class CatBoostRanker:
    def __init__(self, iterations=500, learning_rate=None, depth=6, l2_leaf_reg=3.0,
                 loss_function='YetiRank', **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def predict(self, data, **kwargs): ...

Data Handling

Pool class and FeaturesData for efficient data management with categorical features, text features, embeddings, and metadata like groups and weights.

class Pool:
    def __init__(self, data, label=None, cat_features=None, text_features=None,
                 embedding_features=None, column_description=None, pairs=None, 
                 delimiter='\t', has_header=False, weight=None, group_id=None, 
                 **kwargs): ...
    def slice(self, rindex): ...
    def save(self, fname): ...
    def quantize(self, **kwargs): ...

class FeaturesData:
    # Container for feature data with metadata
    ...

Training and Evaluation

Cross-validation, training functions, and model evaluation utilities for comprehensive model development and assessment.

def train(pool, params=None, dtrain=None, logging_level=None, verbose=None, 
          iterations=None, **kwargs): ...

def cv(pool, params=None, dtrain=None, iterations=None, num_boost_round=None,
       fold_count=3, inverted=False, shuffle=True, partition_random_seed=0,
       stratified=None, **kwargs): ...

def sample_gaussian_process(X, y, **kwargs): ...

Feature Analysis

Feature importance calculation, SHAP values, feature selection algorithms, and interpretability tools for understanding model behavior.

# Enums for feature analysis
class EFstrType:
    PredictionValuesChange = 0
    LossFunctionChange = 1
    FeatureImportance = 2
    Interaction = 3
    ShapValues = 4
    PredictionDiff = 5
    ShapInteractionValues = 6
    SageValues = 7

class EShapCalcType:
    Regular = "Regular"
    Approximate = "Approximate"
    Exact = "Exact"

class EFeaturesSelectionAlgorithm:
    RecursiveByPredictionValuesChange = "RecursiveByPredictionValuesChange"
    RecursiveByLossFunctionChange = "RecursiveByLossFunctionChange"
    RecursiveByShapValues = "RecursiveByShapValues"

class EFeaturesSelectionGrouping:
    Individual = "Individual"
    ByTags = "ByTags"

Utility Functions

Model conversion, GPU utilities, metric evaluation, confusion matrices, ROC curves, and threshold selection tools.

def sum_models(models, weights=None, ctr_merge_policy='IntersectingCountersAverage'): ...
def to_regressor(model): ...
def to_classifier(model): ...
def to_ranker(model): ...

# From catboost.utils
def eval_metric(label, approx, metric, weight=None, group_id=None, **kwargs): ...
def get_gpu_device_count(): ...
def get_confusion_matrix(model, data, thread_count=-1): ...
def get_roc_curve(model, data, thread_count=-1, plot=False): ...
def select_threshold(model, data, curve=None, FPR=None, FNR=None, thread_count=-1): ...

Dataset Utilities

Built-in datasets for testing and learning, including Titanic, Amazon, IMDB, Adult, Higgs, and ranking datasets.

# From catboost.datasets
def titanic(): ...
def amazon(): ...
def adult(): ...
def imdb(): ...
def higgs(): ...
def msrank(): ...
def msrank_10k(): ...
def epsilon(): ...
def rotten_tomatoes(): ...
def monotonic1(): ...
def monotonic2(): ...
def set_cache_path(path): ...

Visualization

Interactive widgets for Jupyter notebooks, metrics plotting, and compatibility with XGBoost and LightGBM plotting callbacks.

# From catboost.widget (conditionally imported)
class MetricVisualizer:
    # Interactive metric visualization widget for Jupyter
    ...

class MetricsPlotter:
    # Plotting utility for training metrics
    ...

def XGBPlottingCallback(): ...
def lgbm_plotting_callback(): ...

Advanced Features

Text processing, monoforest model interpretation, custom metrics and objectives for specialized use cases.

# Custom metrics and objectives
class MultiRegressionCustomMetric: ...
class MultiRegressionCustomObjective: ...
class MultiTargetCustomMetric: ...  # Alias
class MultiTargetCustomObjective: ...  # Alias

# From catboost.text_processing
class Tokenizer: ...
class Dictionary: ...

# From catboost.monoforest
def to_polynom(model): ...
def to_polynom_string(model): ...
def explain_features(model): ...
class FeatureExplanation: ...

Model Evaluation Framework

Comprehensive evaluation framework for statistical testing, performance comparisons, and model validation with confidence intervals.

# From catboost.eval
class EvalType: ...
class CatboostEvaluation: ...
class ScoreType: ...
class ScoreConfig: ...
class CaseEvaluationResult: ...
class MetricEvaluationResult: ...
class EvaluationResults: ...
class ExecutionCase: ...

def calc_wilcoxon_test(): ...
def calc_bootstrap_ci_for_mean(): ...
def make_dirs_if_not_exists(): ...
def series_to_line(): ...
def save_plot(): ...

Metrics Framework

Dynamic metric classes for evaluating model performance across classification, regression, and ranking tasks.

# From catboost.metrics
class BuiltinMetric:
    def eval(self, label, approx, weight=None, group_id=None, **kwargs): ...
    def is_max_optimal(self): ...
    def is_min_optimal(self): ...
    def set_hints(self, **hints): ...
    @staticmethod
    def params_with_defaults(): ...

# Dynamically generated metric classes (examples)
class Logloss(BuiltinMetric): ...
class CrossEntropy(BuiltinMetric): ...
class Accuracy(BuiltinMetric): ...
class AUC(BuiltinMetric): ...
class RMSE(BuiltinMetric): ...
class MAE(BuiltinMetric): ...
class NDCG(BuiltinMetric): ...
class MAP(BuiltinMetric): ...

Constants and Exceptions

class CatBoostError(Exception):
    """Main exception class for CatBoost errors."""
    ...

# Compatibility alias
CatboostError = CatBoostError

__version__: str  # Currently '1.2.8'

Install with Tessl CLI

npx tessl i tessl/pypi-catboost

docs

  • advanced-features.md
  • core-models.md
  • data-handling.md
  • datasets.md
  • evaluation.md
  • feature-analysis.md
  • index.md
  • metrics.md
  • training-evaluation.md
  • utilities.md
  • visualization.md

tile.json