# Model Evaluation Framework

CatBoost provides a comprehensive evaluation framework (the `catboost.eval` package) for conducting statistical tests, performance comparisons, and model validation. It enables rigorous analysis of model performance through statistical significance testing and confidence-interval calculation.

## Capabilities

### Evaluation Infrastructure

Core classes for managing evaluation processes and organizing results.

```python { .api }
class EvalType:
    """
    Enumeration of evaluation types for CatBoost models.

    Defines different modes of model evaluation and comparison.
    """

class CatboostEvaluation:
    """
    Main evaluation class for conducting comprehensive model assessments.

    Provides infrastructure for running evaluations, collecting metrics,
    and managing evaluation workflows across multiple models and datasets.
    """

class ExecutionCase:
    """
    Represents a single execution case in an evaluation workflow.

    Manages the configuration, execution, and results of individual
    evaluation runs within a larger evaluation framework.
    """
```
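
For orientation, the following is a minimal sketch of how this infrastructure is typically driven, based on the upstream CatBoost feature-evaluation tutorial rather than this document: the dataset paths, fold settings, training parameters, and the `eval_features` call (including its argument names and the `EvalType` member used) are assumptions and may differ between CatBoost versions.

```python
from catboost.eval.catboost_evaluation import CatboostEvaluation, EvalType

# Assumed setup from the upstream tutorial: a TSV dataset on disk plus a
# column-description file mapping columns to label/features.
evaluation = CatboostEvaluation(
    'train.tsv',
    fold_size=5000,                  # objects per fold
    fold_count=20,                   # number of folds sampled from the dataset
    column_description='train.cd',
    partition_random_seed=0,
)

# Training parameters applied to every evaluated case.
learn_params = {
    'iterations': 200,
    'learning_rate': 0.1,
    'loss_function': 'Logloss',
    'verbose': False,
}

# Assumed entry point: evaluate the effect of features 6-8 against a baseline.
results = evaluation.eval_features(
    learn_config=learn_params,
    eval_metrics=['Logloss', 'Accuracy'],
    features_to_eval=[6, 7, 8],
    eval_type=EvalType.SeqAdd,
)
```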

### Scoring and Configuration

Configuration classes for defining evaluation metrics and scoring approaches.

```python { .api }
class ScoreType:
    """
    Enumeration of score types for evaluation metrics.

    Defines different approaches to scoring and metric calculation
    during model evaluation processes.
    """

class ScoreConfig:
    """
    Configuration class for evaluation scoring parameters.

    Manages scoring configuration including metric types,
    calculation methods, and evaluation parameters.
    """
```
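
As a hedged illustration of how these are typically combined: in the upstream CatBoost tutorial, a `ScoreConfig` built from a `ScoreType` is passed when comparing evaluated cases against a baseline. The specific arguments below (`ScoreType.Rel`, `overfit_iterations_info`) come from that tutorial and are assumptions relative to this spec.

```python
from catboost.eval.evaluation_result import ScoreConfig, ScoreType

# Assumed usage: report scores relative to the baseline case and skip
# per-iteration overfitting details.
score_config = ScoreConfig(ScoreType.Rel, overfit_iterations_info=False)

# A default configuration is also possible if the library defaults suffice.
default_config = ScoreConfig()
```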

### Evaluation Results

Result classes for organizing and accessing evaluation outcomes.

```python { .api }
class CaseEvaluationResult:
    """
    Results from a single evaluation case.

    Contains performance metrics, statistical measures, and
    evaluation outcomes for individual test cases.
    """

class MetricEvaluationResult:
    """
    Results for specific metric evaluations.

    Stores detailed results for individual metrics including
    values, confidence intervals, and statistical significance.
    """

class EvaluationResults:
    """
    Container for comprehensive evaluation results.

    Aggregates results across multiple cases, metrics, and
    evaluation runs for comprehensive analysis.
    """
```
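
To show how these containers are usually consumed, here is a sketch that continues the evaluation-infrastructure example above; the accessor names `get_metric_results` and `get_baseline_comparison` are taken from the upstream tutorial and should be treated as assumptions here.

```python
from catboost.eval.evaluation_result import ScoreConfig, ScoreType

# `results` is the EvaluationResults object returned by
# CatboostEvaluation.eval_features(...) in the earlier sketch.
logloss_result = results.get_metric_results('Logloss')  # MetricEvaluationResult

# Compare every evaluated case against the baseline using relative scores.
comparison_table = logloss_result.get_baseline_comparison(
    ScoreConfig(ScoreType.Rel, overfit_iterations_info=False)
)
print(comparison_table)
```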

### Statistical Analysis

Statistical testing and confidence interval calculation functions.

```python { .api }
def calc_wilcoxon_test():
    """
    Calculate the Wilcoxon signed-rank test for paired samples.

    Performs a non-parametric statistical test to compare
    paired samples and determine statistical significance.

    Returns:
        Statistical test results with p-values and significance indicators
    """

def calc_bootstrap_ci_for_mean():
    """
    Calculate bootstrap confidence intervals for mean values.

    Uses bootstrap resampling to estimate confidence intervals
    for sample means, providing robust statistical inference.

    Returns:
        Confidence interval bounds and bootstrap statistics
    """
```
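
The signatures of these helpers are not spelled out above, so rather than guess them, the sketch below illustrates the underlying statistics directly with SciPy and NumPy: a Wilcoxon signed-rank test on paired per-fold metric values and a percentile bootstrap confidence interval for the mean difference. The per-fold numbers are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical paired per-fold Logloss values for two models (lower is better).
model_a = np.array([0.412, 0.398, 0.405, 0.421, 0.389, 0.401, 0.395, 0.418])
model_b = np.array([0.405, 0.391, 0.399, 0.412, 0.384, 0.396, 0.389, 0.409])

# Wilcoxon signed-rank test on the paired differences.
statistic, p_value = stats.wilcoxon(model_a, model_b)
print(f"Wilcoxon p-value: {p_value:.4f}")

# Percentile bootstrap 95% confidence interval for the mean difference.
rng = np.random.default_rng(0)
diffs = model_a - model_b
boot_means = np.array([
    rng.choice(diffs, size=len(diffs), replace=True).mean()
    for _ in range(10_000)
])
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for mean difference: [{lower:.4f}, {upper:.4f}]")
```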

### Utility Functions

Helper functions for evaluation workflow management and result processing.

```python { .api }
def make_dirs_if_not_exists():
    """
    Create directories if they don't exist.

    Utility function for managing directory structure
    during evaluation workflows and result storage.
    """

def series_to_line():
    """
    Convert data series to line representation.

    Transforms evaluation data series into line format
    for visualization and analysis purposes.
    """

def save_plot():
    """
    Save evaluation plots to files.

    Handles saving of evaluation visualizations,
    charts, and plots generated during analysis.
    """
```
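
These helpers are thin conveniences around directory creation and plot output. If you only need their effect outside the evaluation framework, the standard library and Matplotlib cover the same ground; the sketch below is a plain-Python equivalent under those assumptions, not the implementation of the functions above.

```python
import os
import matplotlib.pyplot as plt

# Equivalent of "create directories if they don't exist".
os.makedirs('eval_results/plots', exist_ok=True)

# Turn a per-iteration metric series into a line plot and save it to a file.
iterations = list(range(1, 11))
logloss = [0.69, 0.61, 0.55, 0.51, 0.48, 0.46, 0.45, 0.44, 0.435, 0.43]

plt.plot(iterations, logloss, label='Logloss')
plt.xlabel('Iteration')
plt.ylabel('Logloss')
plt.legend()
plt.savefig('eval_results/plots/logloss.png')
plt.close()
```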

## Evaluation Examples

### Basic Model Evaluation

```python
from catboost.eval import CatboostEvaluation, ExecutionCase, ScoreConfig
from catboost import CatBoostClassifier

# Set up evaluation configuration
score_config = ScoreConfig()
evaluation = CatboostEvaluation()

# Create execution case
case = ExecutionCase()

# Configure evaluation parameters
# (Detailed configuration depends on specific evaluation needs)

# Run evaluation
results = evaluation.run_evaluation(case, score_config)

print("Evaluation completed")
print(f"Results: {results}")
```

### Statistical Significance Testing

```python
from catboost.eval import calc_wilcoxon_test, calc_bootstrap_ci_for_mean

# Perform Wilcoxon test on model comparison results
# (Assumes you have paired performance metrics from two models)
wilcoxon_results = calc_wilcoxon_test()
print(f"Wilcoxon test results: {wilcoxon_results}")

# Calculate bootstrap confidence intervals
bootstrap_ci = calc_bootstrap_ci_for_mean()
print(f"Bootstrap confidence interval: {bootstrap_ci}")
```

### Evaluation Workflow Management

```python
from catboost.eval import make_dirs_if_not_exists, save_plot, series_to_line

# Set up evaluation directory structure
make_dirs_if_not_exists()

# Process evaluation data
line_data = series_to_line()

# Save evaluation visualizations
save_plot()

print("Evaluation workflow completed")
```

## Integration with Core CatBoost

The evaluation framework integrates with core CatBoost functionality such as `CatBoostClassifier` models and `Pool` datasets:

```python
from catboost import CatBoostClassifier, Pool
from catboost.eval import CatboostEvaluation, EvaluationResults

# Define two models for comparison
model1 = CatBoostClassifier(iterations=100, depth=4)
model2 = CatBoostClassifier(iterations=200, depth=6)

# Prepare evaluation data
# (X_train, y_train, X_test, y_test and cat_features are assumed to be defined)
train_pool = Pool(X_train, y_train, cat_features=cat_features)
test_pool = Pool(X_test, y_test, cat_features=cat_features)

# Train both models on the same data
model1.fit(train_pool, verbose=False)
model2.fit(train_pool, verbose=False)

# Set up comprehensive evaluation
evaluation = CatboostEvaluation()

# Configure evaluation to compare both models
# (Specific configuration depends on evaluation requirements)

# Execute evaluation
results = evaluation.compare_models(model1, model2, test_pool)

print("Model comparison evaluation completed")
```