# Model Evaluation Framework

CatBoost provides a comprehensive evaluation framework (the `catboost.eval` package) for conducting statistical tests, performance comparisons, and model validation. It enables rigorous analysis of model performance through statistical significance testing and confidence-interval calculation.

## Capabilities

### Evaluation Infrastructure

Core classes for managing evaluation processes and organizing results.

```python { .api }
class EvalType:
    """
    Enumeration of evaluation types for CatBoost models.

    Defines different modes of model evaluation and comparison.
    """

class CatboostEvaluation:
    """
    Main evaluation class for conducting comprehensive model assessments.

    Provides infrastructure for running evaluations, collecting metrics,
    and managing evaluation workflows across multiple models and datasets.
    """

class ExecutionCase:
    """
    Represents a single execution case in an evaluation workflow.

    Manages the configuration, execution, and results of individual
    evaluation runs within a larger evaluation framework.
    """
```
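
For orientation, the following is a minimal sketch of how this infrastructure is typically driven, based on the upstream CatBoost feature-evaluation tutorial rather than this document: the dataset paths, fold settings, training parameters, and the `eval_features` call (including its argument names and the `EvalType` member used) are assumptions and may differ between CatBoost versions.

```python
from catboost.eval.catboost_evaluation import CatboostEvaluation, EvalType

# Assumed setup from the upstream tutorial: a TSV dataset on disk plus a
# column-description file mapping columns to label/features.
evaluation = CatboostEvaluation(
    'train.tsv',
    fold_size=5000,                  # objects per fold
    fold_count=20,                   # number of folds sampled from the dataset
    column_description='train.cd',
    partition_random_seed=0,
)

# Training parameters applied to every evaluated case.
learn_params = {
    'iterations': 200,
    'learning_rate': 0.1,
    'loss_function': 'Logloss',
    'verbose': False,
}

# Assumed entry point: evaluate the effect of features 6-8 against a baseline.
results = evaluation.eval_features(
    learn_config=learn_params,
    eval_metrics=['Logloss', 'Accuracy'],
    features_to_eval=[6, 7, 8],
    eval_type=EvalType.SeqAdd,
)
```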

### Scoring and Configuration

Configuration classes for defining evaluation metrics and scoring approaches.

```python { .api }
class ScoreType:
    """
    Enumeration of score types for evaluation metrics.

    Defines different approaches to scoring and metric calculation
    during model evaluation processes.
    """

class ScoreConfig:
    """
    Configuration class for evaluation scoring parameters.

    Manages scoring configuration including metric types,
    calculation methods, and evaluation parameters.
    """
```
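
As a hedged illustration of how these are typically combined: in the upstream CatBoost tutorial, a `ScoreConfig` built from a `ScoreType` is passed when comparing evaluated cases against a baseline. The specific arguments below (`ScoreType.Rel`, `overfit_iterations_info`) come from that tutorial and are assumptions relative to this spec.

```python
from catboost.eval.evaluation_result import ScoreConfig, ScoreType

# Assumed usage: report scores relative to the baseline case and skip
# per-iteration overfitting details.
score_config = ScoreConfig(ScoreType.Rel, overfit_iterations_info=False)

# A default configuration is also possible if the library defaults suffice.
default_config = ScoreConfig()
```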

### Evaluation Results

Result classes for organizing and accessing evaluation outcomes.

```python { .api }
class CaseEvaluationResult:
    """
    Results from a single evaluation case.

    Contains performance metrics, statistical measures, and
    evaluation outcomes for individual test cases.
    """

class MetricEvaluationResult:
    """
    Results for specific metric evaluations.

    Stores detailed results for individual metrics including
    values, confidence intervals, and statistical significance.
    """

class EvaluationResults:
    """
    Container for comprehensive evaluation results.

    Aggregates results across multiple cases, metrics, and
    evaluation runs for comprehensive analysis.
    """
```
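
To show how these containers are usually consumed, here is a sketch that continues the evaluation-infrastructure example above; the accessor names `get_metric_results` and `get_baseline_comparison` are taken from the upstream tutorial and should be treated as assumptions here.

```python
from catboost.eval.evaluation_result import ScoreConfig, ScoreType

# `results` is the EvaluationResults object returned by
# CatboostEvaluation.eval_features(...) in the earlier sketch.
logloss_result = results.get_metric_results('Logloss')  # MetricEvaluationResult

# Compare every evaluated case against the baseline using relative scores.
comparison_table = logloss_result.get_baseline_comparison(
    ScoreConfig(ScoreType.Rel, overfit_iterations_info=False)
)
print(comparison_table)
```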

### Statistical Analysis

Statistical testing and confidence interval calculation functions.

```python { .api }
def calc_wilcoxon_test():
    """
    Calculate the Wilcoxon signed-rank test for paired samples.

    Performs a non-parametric statistical test to compare
    paired samples and determine statistical significance.

    Returns:
        Statistical test results with p-values and significance indicators
    """

def calc_bootstrap_ci_for_mean():
    """
    Calculate bootstrap confidence intervals for mean values.

    Uses bootstrap resampling to estimate confidence intervals
    for sample means, providing robust statistical inference.

    Returns:
        Confidence interval bounds and bootstrap statistics
    """
```
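
The signatures of these helpers are not spelled out above, so rather than guess them, the sketch below illustrates the underlying statistics directly with SciPy and NumPy: a Wilcoxon signed-rank test on paired per-fold metric values and a percentile bootstrap confidence interval for the mean difference. The per-fold numbers are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical paired per-fold Logloss values for two models (lower is better).
model_a = np.array([0.412, 0.398, 0.405, 0.421, 0.389, 0.401, 0.395, 0.418])
model_b = np.array([0.405, 0.391, 0.399, 0.412, 0.384, 0.396, 0.389, 0.409])

# Wilcoxon signed-rank test on the paired differences.
statistic, p_value = stats.wilcoxon(model_a, model_b)
print(f"Wilcoxon p-value: {p_value:.4f}")

# Percentile bootstrap 95% confidence interval for the mean difference.
rng = np.random.default_rng(0)
diffs = model_a - model_b
boot_means = np.array([
    rng.choice(diffs, size=len(diffs), replace=True).mean()
    for _ in range(10_000)
])
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for mean difference: [{lower:.4f}, {upper:.4f}]")
```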

### Utility Functions

Helper functions for evaluation workflow management and result processing.

```python { .api }
def make_dirs_if_not_exists():
    """
    Create directories if they don't exist.

    Utility function for managing directory structure
    during evaluation workflows and result storage.
    """

def series_to_line():
    """
    Convert data series to line representation.

    Transforms evaluation data series into line format
    for visualization and analysis purposes.
    """

def save_plot():
    """
    Save evaluation plots to files.

    Handles saving of evaluation visualizations,
    charts, and plots generated during analysis.
    """
```
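
These helpers are thin conveniences around directory creation and plot output. If you only need their effect outside the evaluation framework, the standard library and Matplotlib cover the same ground; the sketch below is a plain-Python equivalent under those assumptions, not the implementation of the functions above.

```python
import os
import matplotlib.pyplot as plt

# Equivalent of "create directories if they don't exist".
os.makedirs('eval_results/plots', exist_ok=True)

# Turn a per-iteration metric series into a line plot and save it to a file.
iterations = list(range(1, 11))
logloss = [0.69, 0.61, 0.55, 0.51, 0.48, 0.46, 0.45, 0.44, 0.435, 0.43]

plt.plot(iterations, logloss, label='Logloss')
plt.xlabel('Iteration')
plt.ylabel('Logloss')
plt.legend()
plt.savefig('eval_results/plots/logloss.png')
plt.close()
```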

## Evaluation Examples

### Basic Model Evaluation

```python
from catboost.eval import CatboostEvaluation, ExecutionCase, ScoreConfig
from catboost import CatBoostClassifier

# Set up evaluation configuration
score_config = ScoreConfig()
evaluation = CatboostEvaluation()

# Create execution case
case = ExecutionCase()

# Configure evaluation parameters
# (Detailed configuration depends on specific evaluation needs)

# Run evaluation
results = evaluation.run_evaluation(case, score_config)

print("Evaluation completed")
print(f"Results: {results}")
```

### Statistical Significance Testing

```python
from catboost.eval import calc_wilcoxon_test, calc_bootstrap_ci_for_mean

# Perform Wilcoxon test on model comparison results
# (Assumes you have paired performance metrics from two models)
wilcoxon_results = calc_wilcoxon_test()
print(f"Wilcoxon test results: {wilcoxon_results}")

# Calculate bootstrap confidence intervals
bootstrap_ci = calc_bootstrap_ci_for_mean()
print(f"Bootstrap confidence interval: {bootstrap_ci}")
```

### Evaluation Workflow Management

```python
from catboost.eval import make_dirs_if_not_exists, save_plot, series_to_line

# Set up evaluation directory structure
make_dirs_if_not_exists()

# Process evaluation data
line_data = series_to_line()

# Save evaluation visualizations
save_plot()

print("Evaluation workflow completed")
```

## Integration with Core CatBoost

The evaluation framework integrates with core CatBoost functionality such as `CatBoostClassifier` models and `Pool` datasets:

```python
from catboost import CatBoostClassifier, Pool
from catboost.eval import CatboostEvaluation, EvaluationResults

# Define two models for comparison
model1 = CatBoostClassifier(iterations=100, depth=4)
model2 = CatBoostClassifier(iterations=200, depth=6)

# Prepare evaluation data
# (X_train, y_train, X_test, y_test and cat_features are assumed to be defined)
train_pool = Pool(X_train, y_train, cat_features=cat_features)
test_pool = Pool(X_test, y_test, cat_features=cat_features)

# Train both models on the same data
model1.fit(train_pool, verbose=False)
model2.fit(train_pool, verbose=False)

# Set up comprehensive evaluation
evaluation = CatboostEvaluation()

# Configure evaluation to compare both models
# (Specific configuration depends on evaluation requirements)

# Execute evaluation
results = evaluation.compare_models(model1, model2, test_pool)

print("Model comparison evaluation completed")
```