CatBoost is a fast, scalable, high-performance library for gradient boosting on decision trees, used for ranking, classification, regression, and other ML tasks.
CatBoost provides a comprehensive evaluation framework for conducting statistical tests, performance comparisons, and model validation. This framework enables rigorous analysis of model performance with statistical significance testing and confidence interval calculations.
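As background for the confidence-interval calculations mentioned above, the sketch below implements a percentile bootstrap CI for a sample mean in pure Python. It is a standalone illustration of the technique, not CatBoost's implementation; the function name, default parameters, and sample values are invented for this example.

```python
import random

def bootstrap_ci_for_mean(samples, level=0.95, iterations=10000, seed=42):
    """Percentile bootstrap confidence interval for the mean (illustrative)."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement many times and record each resample's mean
    means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(iterations)
    )
    # Take the central `level` fraction of the bootstrap distribution
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * iterations)]
    upper = means[int((1.0 - alpha) * iterations) - 1]
    return lower, upper

# Per-fold Logloss values for a model (made-up numbers)
fold_scores = [0.41, 0.39, 0.44, 0.40, 0.42, 0.38, 0.43, 0.40]
low, high = bootstrap_ci_for_mean(fold_scores)
print(f"95% CI for mean: [{low:.3f}, {high:.3f}]")
```

A percentile bootstrap makes no normality assumption about the per-fold scores, which is why this style of interval is a good fit for cross-validated metric values.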
Core classes for managing evaluation processes and organizing results.
class EvalType:
    """
    Enumeration of evaluation types for CatBoost models.
    Defines different modes of model evaluation and comparison.
    """

class CatboostEvaluation:
    """
    Main evaluation class for conducting comprehensive model assessments.
    Provides infrastructure for running evaluations, collecting metrics,
    and managing evaluation workflows across multiple models and datasets.
    """

class ExecutionCase:
    """
    Represents a single execution case in an evaluation workflow.
    Manages the configuration, execution, and results of individual
    evaluation runs within a larger evaluation framework.
    """

Configuration classes for defining evaluation metrics and scoring approaches.
class ScoreType:
    """
    Enumeration of score types for evaluation metrics.
    Defines different approaches to scoring and metric calculation
    during model evaluation processes.
    """

class ScoreConfig:
    """
    Configuration class for evaluation scoring parameters.
    Manages scoring configuration including metric types,
    calculation methods, and evaluation parameters.
    """

Result classes for organizing and accessing evaluation outcomes.
class CaseEvaluationResult:
    """
    Results from a single evaluation case.
    Contains performance metrics, statistical measures, and
    evaluation outcomes for individual test cases.
    """

class MetricEvaluationResult:
    """
    Results for specific metric evaluations.
    Stores detailed results for individual metrics including
    values, confidence intervals, and statistical significance.
    """

class EvaluationResults:
    """
    Container for comprehensive evaluation results.
    Aggregates results across multiple cases, metrics, and
    evaluation runs for comprehensive analysis.
    """

Statistical testing and confidence interval calculation functions.
def calc_wilcoxon_test():
    """
    Calculate Wilcoxon signed-rank test for paired samples.
    Performs non-parametric statistical test to compare
    paired samples and determine statistical significance.
    Returns:
        Statistical test results with p-values and significance indicators
    """
def calc_bootstrap_ci_for_mean():
    """
    Calculate bootstrap confidence intervals for mean values.
    Uses bootstrap resampling to estimate confidence intervals
    for sample means, providing robust statistical inference.
    Returns:
        Confidence interval bounds and bootstrap statistics
    """

Helper functions for evaluation workflow management and result processing.
def make_dirs_if_not_exists():
    """
    Create directories if they don't exist.
    Utility function for managing directory structure
    during evaluation workflows and result storage.
    """

def series_to_line():
    """
    Convert data series to line representation.
    Transforms evaluation data series into line format
    for visualization and analysis purposes.
    """

def save_plot():
    """
    Save evaluation plots to files.
    Handles saving of evaluation visualizations,
    charts, and plots generated during analysis.
    """

from catboost.eval import CatboostEvaluation, ExecutionCase, ScoreConfig
from catboost import CatBoostClassifier
# Set up the evaluation over a dataset file; CatboostEvaluation trains
# and scores models on cross-validation folds (the file names, fold
# sizes, and feature indices below are illustrative)
evaluation = CatboostEvaluation('train.tsv',
                                fold_size=1000,
                                fold_count=10,
                                column_description='train.cd')
# Configure scoring for comparing the evaluated cases
score_config = ScoreConfig()
# Run the evaluation: train models with and without the listed features
# and collect per-fold metric values
results = evaluation.eval_features(learn_config={'iterations': 100},
                                   eval_metrics=['Logloss'],
                                   features_to_eval=[0, 1])
# Compare the evaluated cases against the baseline using the score config
comparison = results.get_metric_results('Logloss').get_baseline_comparison(score_config)
print("Evaluation completed")
print(f"Results: {results}")from catboost.eval import calc_wilcoxon_test, calc_bootstrap_ci_for_mean
# Perform a Wilcoxon signed-rank test on paired per-fold metric values
# from two models (the sample values below are illustrative; both
# helpers take the metric samples to analyze)
baseline_scores = [0.412, 0.398, 0.405, 0.420, 0.391]
challenger_scores = [0.401, 0.388, 0.399, 0.410, 0.395]
wilcoxon_results = calc_wilcoxon_test(baseline_scores, challenger_scores)
print(f"Wilcoxon test results: {wilcoxon_results}")
# Calculate a bootstrap confidence interval for the sample mean
bootstrap_ci = calc_bootstrap_ci_for_mean(challenger_scores)
print(f"Bootstrap confidence interval: {bootstrap_ci}")from catboost.eval import make_dirs_if_not_exists, save_plot, series_to_line
# Set up the evaluation directory structure (the helper takes the path
# of the directory to create)
make_dirs_if_not_exists('eval_results')
# Convert an evaluation data series into a plottable line and save the
# resulting figure; the variable names here are placeholders for your
# own series and output location
line_data = series_to_line(metric_series)
save_plot(output_path)
print("Evaluation workflow completed")The evaluation framework integrates seamlessly with core CatBoost functionality:
from catboost import CatBoostClassifier, Pool
# Train two candidate models for comparison (X_train, y_train, X_test,
# y_test, and cat_features are assumed to be defined elsewhere)
model1 = CatBoostClassifier(iterations=100, depth=4, verbose=False)
model2 = CatBoostClassifier(iterations=200, depth=6, verbose=False)
# Prepare evaluation data
train_pool = Pool(X_train, y_train, cat_features=cat_features)
test_pool = Pool(X_test, y_test, cat_features=cat_features)
model1.fit(train_pool)
model2.fit(train_pool)
# Score both models on the held-out pool; the per-iteration metric
# values returned by eval_metrics can then be fed to the statistical
# helpers above (e.g. calc_wilcoxon_test)
scores1 = model1.eval_metrics(test_pool, ['Logloss'])['Logloss']
scores2 = model2.eval_metrics(test_pool, ['Logloss'])['Logloss']
print("Model comparison evaluation completed")

Install with Tessl CLI
npx tessl i tessl/pypi-catboost