# Model Evaluation Framework

CatBoost provides an evaluation framework (the `catboost.eval` package) for statistical tests, performance comparisons, and model validation. It supports statistical significance testing and confidence-interval estimation, so differences between models can be judged rigorously rather than from single point estimates of a metric.

## Capabilities

### Evaluation Infrastructure

Core classes for managing evaluation processes and organizing results.

```python { .api }
class EvalType:
    """
    Enumeration of evaluation types for CatBoost models.

    Defines different modes of model evaluation and comparison.
    """

class CatboostEvaluation:
    """
    Main evaluation class for conducting comprehensive model assessments.

    Provides infrastructure for running evaluations, collecting metrics,
    and managing evaluation workflows across multiple models and datasets.
    """

class ExecutionCase:
    """
    Represents a single execution case in an evaluation workflow.

    Manages the configuration, execution, and results of individual
    evaluation runs within a larger evaluation framework.
    """
```
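
In practice these classes are used together: a `CatboostEvaluation` is built from a dataset file and fold settings, and the individual evaluation runs inside it are described by `ExecutionCase` objects. A minimal construction sketch follows; the constructor arguments mirror the public CatBoost documentation and should be verified against the installed version.

```python
# Minimal construction sketch; argument names follow the public CatBoost
# documentation and may vary between versions.
from catboost.eval.catboost_evaluation import CatboostEvaluation

evaluation = CatboostEvaluation(
    'train.tsv',                    # dataset file in CatBoost TSV format
    fold_size=5000,                 # number of objects in each evaluation fold
    fold_count=20,                  # number of folds to train and score
    column_description='train.cd',  # column description file for the dataset
)
```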

### Scoring and Configuration

Configuration classes for defining evaluation metrics and scoring approaches.

```python { .api }
class ScoreType:
    """
    Enumeration of score types for evaluation metrics.

    Defines different approaches to scoring and metric calculation
    during model evaluation processes.
    """

class ScoreConfig:
    """
    Configuration class for evaluation scoring parameters.

    Manages scoring configuration including metric types,
    calculation methods, and evaluation parameters.
    """
```
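
A `ScoreConfig` pairs a `ScoreType` with reporting options and is later passed when reading evaluation results. The construction below mirrors the public CatBoost examples; the exact parameters may vary between versions.

```python
# Sketch of a scoring configuration; ScoreType.Rel (relative score changes) and
# overfit_iterations_info appear in public CatBoost examples, but check the
# parameters against your installed version.
from catboost.eval.evaluation_result import ScoreType, ScoreConfig

score_config = ScoreConfig(ScoreType.Rel, overfit_iterations_info=False)
```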

### Evaluation Results

Result classes for organizing and accessing evaluation outcomes.

```python { .api }
class CaseEvaluationResult:
    """
    Results from a single evaluation case.

    Contains performance metrics, statistical measures, and
    evaluation outcomes for individual test cases.
    """

class MetricEvaluationResult:
    """
    Results for specific metric evaluations.

    Stores detailed results for individual metrics including
    values, confidence intervals, and statistical significance.
    """

class EvaluationResults:
    """
    Container for comprehensive evaluation results.

    Aggregates results across multiple cases, metrics, and
    evaluation runs for comprehensive analysis.
    """
```
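
Result objects are usually read per metric: an `EvaluationResults` container yields a `MetricEvaluationResult` for each metric, which can then be compared against the baseline case. The sketch below follows the flow shown in the public CatBoost documentation; the method names should be checked against the installed version.

```python
# Assumes `results` is an EvaluationResults object returned by an evaluation run
# and `score_config` is a ScoreConfig (see above); the method names follow the
# public CatBoost examples and may differ between versions.
logloss_result = results.get_metric_results('Logloss')  # a MetricEvaluationResult
comparison_table = logloss_result.get_baseline_comparison(score_config)
print(comparison_table)
```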

### Statistical Analysis

Statistical testing and confidence interval calculation functions.

```python { .api }
def calc_wilcoxon_test():
    """
    Calculate the Wilcoxon signed-rank test for paired samples.

    Performs a non-parametric statistical test to compare
    paired samples and determine statistical significance.

    Returns:
        Statistical test results with p-values and significance indicators.
    """

def calc_bootstrap_ci_for_mean():
    """
    Calculate bootstrap confidence intervals for mean values.

    Uses bootstrap resampling to estimate confidence intervals
    for sample means, providing robust statistical inference.

    Returns:
        Confidence interval bounds and bootstrap statistics.
    """
```

### Utility Functions

Helper functions for evaluation workflow management and result processing.

```python { .api }
def make_dirs_if_not_exists():
    """
    Create directories if they don't exist.

    Utility function for managing directory structure
    during evaluation workflows and result storage.
    """

def series_to_line():
    """
    Convert data series to line representation.

    Transforms evaluation data series into line format
    for visualization and analysis purposes.
    """

def save_plot():
    """
    Save evaluation plots to files.

    Handles saving of evaluation visualizations,
    charts, and plots generated during analysis.
    """
```

## Evaluation Examples

### Basic Model Evaluation

The example below follows the feature-evaluation flow from the public CatBoost documentation; file names, fold sizes, and feature indices are placeholders to adapt to your data and CatBoost version.

```python
from catboost.eval.catboost_evaluation import CatboostEvaluation
from catboost.eval.evaluation_result import ScoreType, ScoreConfig

# Set up the evaluation on a dataset file that will be split into folds
evaluation = CatboostEvaluation(
    'train.tsv',
    fold_size=5000,
    fold_count=20,
    column_description='train.cd',
)

# Run the evaluation: each case is trained and scored on every fold
results = evaluation.eval_features(
    learn_config={'iterations': 100, 'learning_rate': 0.1, 'logging_level': 'Silent'},
    eval_metrics=['Logloss'],
    features_to_eval=[6, 7, 8],
)

# Read the results for the chosen metric
score_config = ScoreConfig(ScoreType.Rel, overfit_iterations_info=False)
logloss_result = results.get_metric_results('Logloss')

print("Evaluation completed")
print(logloss_result.get_baseline_comparison(score_config))
```

### Statistical Significance Testing

```python
# calc_wilcoxon_test and calc_bootstrap_ci_for_mean operate on per-fold metric
# values collected during an evaluation. The argument lists below are assumptions
# about the signatures; the import path and parameters may differ between versions.
from catboost.eval import calc_wilcoxon_test, calc_bootstrap_ci_for_mean

# Illustrative paired per-fold losses for a baseline model and a candidate model
baseline_losses = [0.412, 0.398, 0.407, 0.421, 0.403]
candidate_losses = [0.401, 0.390, 0.399, 0.415, 0.396]

# Wilcoxon signed-rank test on the paired samples (assumed signature)
wilcoxon_results = calc_wilcoxon_test(baseline_losses, candidate_losses)
print(f"Wilcoxon test results: {wilcoxon_results}")

# Bootstrap confidence interval for the mean candidate loss (assumed signature)
bootstrap_ci = calc_bootstrap_ci_for_mean(candidate_losses)
print(f"Bootstrap confidence interval: {bootstrap_ci}")
```
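
The Wilcoxon signed-rank test suits this setting because fold-level metrics from two configurations form paired samples and need not be normally distributed; the bootstrap interval complements it by quantifying the uncertainty of the mean metric value itself.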

### Evaluation Workflow Management

```python
# make_dirs_if_not_exists takes a directory path; the arguments shown for
# series_to_line and save_plot are assumptions about their signatures, and the
# functions may need to be imported from a submodule in some CatBoost versions.
from catboost.eval import make_dirs_if_not_exists, save_plot, series_to_line

# Set up the evaluation directory structure
make_dirs_if_not_exists('eval_results')

# Convert an evaluation data series into a plottable line (assumed input format)
metric_series = [0.412, 0.398, 0.407, 0.421, 0.403]
line_data = series_to_line(metric_series)

# Save the evaluation visualization to disk (assumed signature)
save_plot('eval_results/logloss_comparison.png')

print("Evaluation workflow completed")
```

## Integration with Core CatBoost

The evaluation framework builds directly on core CatBoost training: each evaluation run trains CatBoost models internally, driven by the same parameter dictionary accepted by the core estimators. The sketch below illustrates that connection; paths, parameters, and feature indices are placeholders.

```python
# The learn_config passed to the evaluation is the same parameter dictionary you
# would pass to a core CatBoost estimator. Note that the framework compares cases
# (for example, training with and without a set of features) across dataset folds
# internally, rather than comparing two already-fitted model objects.
from catboost import CatBoostClassifier
from catboost.eval.catboost_evaluation import CatboostEvaluation

# Parameters shared between a regular training run and the evaluation
learn_params = {'iterations': 200, 'depth': 6, 'learning_rate': 0.1, 'logging_level': 'Silent'}

# Regular training with the core API (fit on your own data as usual)
model = CatBoostClassifier(**learn_params)

# The same parameters drive the fold-based evaluation
evaluation = CatboostEvaluation(
    'train.tsv',
    fold_size=5000,
    fold_count=20,
    column_description='train.cd',
)
results = evaluation.eval_features(
    learn_config=learn_params,
    eval_metrics=['Logloss'],
    features_to_eval=[6, 7, 8],
)

print("Model comparison evaluation completed")
```
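
Because the compared cases are trained on the same folds, the per-fold metrics they produce are paired, which is what makes the Wilcoxon test and bootstrap intervals from the statistical analysis utilities applicable to the comparison.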