# Metrics Framework

CatBoost provides a metrics framework for evaluating model performance on classification, regression, and ranking tasks. The built-in metrics are exposed as Python classes that are generated dynamically, one per metric, each with its own set of parameters.

## Capabilities

### Base Metric Infrastructure

`BuiltinMetric` is the base class shared by all CatBoost built-in metrics.

```python { .api }
class BuiltinMetric:
    """
    Base class for all CatBoost built-in metrics.

    Provides common interface and functionality for metric evaluation,
    parameter validation, and configuration management across all
    metric types in CatBoost.
    """

    @staticmethod
    def params_with_defaults():
        """
        Get valid metric parameters with their default values.

        Returns:
            dict: Parameter names mapped to default values and mandatory flags
                - 'default_value': Default parameter value or None
                - 'is_mandatory': Whether parameter is required (bool)
        """

    def __str__(self):
        """
        Get string representation of the metric with parameters.

        Returns:
            str: Metric string representation
        """

    def set_hints(self, **hints):
        """
        Set hints for metric calculation (not validated).

        Parameters:
        - **hints: Arbitrary hint parameters for metric behavior

        Returns:
            self: For method chaining
        """

    def eval(self, label, approx, weight=None, group_id=None,
             group_weight=None, subgroup_id=None, pairs=None,
             thread_count=-1):
        """
        Evaluate metric with raw predictions and labels.

        Parameters:
        - label: True target values (array-like)
        - approx: Raw model predictions, e.g. log-odds for binary
          classification rather than probabilities (array-like)
        - weight: Sample weights (array-like, optional)
        - group_id: Group identifiers for ranking (array-like, optional)
        - group_weight: Group weights (array-like, optional)
        - subgroup_id: Subgroup identifiers (array-like, optional)
        - pairs: Pairwise constraints for ranking (array-like or path, optional)
        - thread_count: Number of threads for computation (int, -1 uses all cores)

        Returns:
            list of float: Metric values (a single element for most metrics)
        """

    def is_max_optimal(self):
        """
        Check if higher metric values indicate better performance.

        Returns:
            bool: True if metric should be maximized, False if minimized
        """

    def is_min_optimal(self):
        """
        Check if lower metric values indicate better performance.

        Returns:
            bool: True if metric should be minimized, False if maximized
        """
```
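
The optimization-direction methods matter when picking the best value from a sequence of evaluations. The sketch below is plain Python and does not call CatBoost: `best_score` is a hypothetical helper, and its `maximize` flag stands in for whatever `is_max_optimal()` would return for the metric in question.

```python
def best_score(scores, maximize):
    """Return (index, value) of the best score in a non-empty sequence.

    `maximize` plays the role of metric.is_max_optimal(): True for metrics
    such as AUC, False for losses such as RMSE. Hypothetical helper.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    pick = max if maximize else min
    index = pick(range(len(scores)), key=lambda i: scores[i])
    return index, scores[index]


auc_history = [0.71, 0.78, 0.83, 0.81]    # higher is better
rmse_history = [0.52, 0.41, 0.39, 0.44]   # lower is better

print(best_score(auc_history, maximize=True))    # (2, 0.83)
print(best_score(rmse_history, maximize=False))  # (2, 0.39)
```

Passing the direction in explicitly keeps the selection logic independent of any particular metric.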

### Dynamic Metric Classes

CatBoost generates its metric classes dynamically from the underlying C++ implementation. Each generated class derives from `BuiltinMetric` and accepts the parameters defined for that metric. The classes below are examples, not an exhaustive list.

```python { .api }
# Classification Metrics (examples of dynamically generated classes)
class Logloss(BuiltinMetric):
    """Logarithmic loss for binary classification."""

class CrossEntropy(BuiltinMetric):
    """Cross-entropy loss for binary classification with probabilistic targets."""

class MultiClass(BuiltinMetric):
    """Multi-class logarithmic loss (softmax cross-entropy)."""

class Accuracy(BuiltinMetric):
    """Classification accuracy metric."""

class Precision(BuiltinMetric):
    """Precision metric for classification."""

class Recall(BuiltinMetric):
    """Recall metric for classification."""

class F1(BuiltinMetric):
    """F1-score metric for classification."""

class AUC(BuiltinMetric):
    """Area Under the ROC Curve metric."""

# Regression Metrics
class RMSE(BuiltinMetric):
    """Root Mean Squared Error for regression."""

class MAE(BuiltinMetric):
    """Mean Absolute Error for regression."""

class MAPE(BuiltinMetric):
    """Mean Absolute Percentage Error for regression."""

class R2(BuiltinMetric):
    """R-squared coefficient of determination."""

class MSLE(BuiltinMetric):
    """Mean Squared Logarithmic Error for regression."""

# Ranking Metrics
class NDCG(BuiltinMetric):
    """Normalized Discounted Cumulative Gain for ranking."""

class DCG(BuiltinMetric):
    """Discounted Cumulative Gain for ranking."""

class MAP(BuiltinMetric):
    """Mean Average Precision for ranking."""

class MRR(BuiltinMetric):
    """Mean Reciprocal Rank for ranking."""

class ERR(BuiltinMetric):
    """Expected Reciprocal Rank for ranking."""
```
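
To make the idea of dynamic class generation concrete, here is a minimal sketch of the general technique using Python's built-in `type()`. It is not CatBoost's actual implementation: `BaseMetric`, `METRIC_SPECS`, `make_metric_class`, and the default values shown are all hypothetical and exist only for illustration.

```python
class BaseMetric:
    """Stand-in base class for this illustration (not CatBoost's BuiltinMetric)."""

    _name = "Base"
    _defaults = {}

    def __init__(self, **params):
        # Reject parameters the metric does not declare
        unknown = set(params) - set(self._defaults)
        if unknown:
            raise ValueError(f"Unknown parameters for {self._name}: {sorted(unknown)}")
        self.params = {**self._defaults, **params}

    @classmethod
    def params_with_defaults(cls):
        return dict(cls._defaults)


# Hypothetical registry; a real library would obtain this from its native core.
METRIC_SPECS = {
    "RMSE": {"use_weights": True},
    "NDCG": {"top": -1, "type": "Base"},
}


def make_metric_class(name, defaults):
    """Create a BaseMetric subclass called `name` with the given defaults."""
    return type(name, (BaseMetric,), {"_name": name, "_defaults": dict(defaults)})


generated = {name: make_metric_class(name, spec) for name, spec in METRIC_SPECS.items()}

NDCG = generated["NDCG"]
metric = NDCG(top=10)
print(NDCG.__name__, metric.params)  # NDCG {'top': 10, 'type': 'Base'}
```

Generating classes from a registry keeps the Python layer in sync with whatever the registry declares, at the cost of classes that static tools cannot see in the source.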

## Metric Usage Examples

### Basic Metric Evaluation

```python
from catboost import metrics
import numpy as np

# Create sample data
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.8, 0.7, 0.3, 0.9])

# eval() expects raw scores (log-odds for binary classification),
# so convert the probabilities before passing them in
y_raw = np.log(y_prob / (1 - y_prob))

# Initialize classification metrics
logloss = metrics.Logloss()
accuracy = metrics.Accuracy()
auc = metrics.AUC()

# eval() returns a list of values; these metrics produce one element
logloss_value = logloss.eval(y_true, y_raw)[0]
accuracy_value = accuracy.eval(y_true, y_raw)[0]
auc_value = auc.eval(y_true, y_raw)[0]

print(f"LogLoss: {logloss_value:.4f}")
print(f"Accuracy: {accuracy_value:.4f}")
print(f"AUC: {auc_value:.4f}")

# Check optimization direction
print(f"LogLoss should be minimized: {logloss.is_min_optimal()}")
print(f"AUC should be maximized: {auc.is_max_optimal()}")
```
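
Because `eval` works on raw predictions, it helps to see how log loss relates raw scores to probabilities. The following is a standard-library sketch of the textbook binary log loss, not CatBoost's implementation; `sigmoid` and `binary_logloss` are names chosen for this example.

```python
import math


def sigmoid(raw):
    """Map a raw score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-raw))


def binary_logloss(labels, raw_scores):
    """Mean negative log-likelihood of binary labels given raw scores."""
    total = 0.0
    for y, raw in zip(labels, raw_scores):
        p = sigmoid(raw)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


labels = [0, 1, 1, 0, 1]
probs = [0.1, 0.8, 0.7, 0.3, 0.9]
raw = [math.log(p / (1.0 - p)) for p in probs]  # logit: inverse of sigmoid

correct = binary_logloss(labels, raw)     # probabilities recovered via sigmoid
misread = binary_logloss(labels, probs)   # probabilities misread as raw scores

print(f"log loss from raw scores:           {correct:.4f}")
print(f"log loss with probabilities as raw: {misread:.4f}")
```

The second value is noticeably larger, which is why passing probabilities where raw scores are expected produces a misleading log loss.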

### Regression Metrics

```python
from catboost import metrics
import numpy as np

# Sample regression data
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 2.2, 2.8, 4.2, 4.8])

# Initialize regression metrics
rmse = metrics.RMSE()
mae = metrics.MAE()
r2 = metrics.R2()

# Evaluate metrics (eval() returns a list; take the single value)
rmse_value = rmse.eval(y_true, y_pred)[0]
mae_value = mae.eval(y_true, y_pred)[0]
r2_value = r2.eval(y_true, y_pred)[0]

print(f"RMSE: {rmse_value:.4f}")
print(f"MAE: {mae_value:.4f}")
print(f"R²: {r2_value:.4f}")

# Get metric parameters
print(f"RMSE parameters: {rmse.params_with_defaults()}")
```
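
For reference, the three regression metrics above can be worked out by hand from their textbook definitions. This standard-library sketch uses the same sample data; `regression_metrics` is a hypothetical helper and assumes unweighted samples and non-constant targets.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute unweighted RMSE, MAE, and R-squared from their definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)               # sum of squared errors
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - sse / sst,
    }


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 2.2, 2.8, 4.2, 4.8]

result = regression_metrics(y_true, y_pred)
print(f"RMSE: {result['RMSE']:.4f}")  # RMSE: 0.1844
print(f"MAE: {result['MAE']:.4f}")    # MAE: 0.1800
print(f"R2: {result['R2']:.4f}")      # R2: 0.9830
```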

### Ranking Metrics with Groups

```python
from catboost import metrics
import numpy as np

# Sample ranking data
y_true = np.array([2, 1, 0, 3, 1, 2])  # Relevance scores
y_pred = np.array([0.8, 0.6, 0.3, 0.9, 0.5, 0.7])  # Predictions
group_ids = np.array([0, 0, 0, 1, 1, 1])  # Query groups

# Initialize ranking metrics
ndcg = metrics.NDCG()
dcg = metrics.DCG()
map_metric = metrics.MAP()

# Evaluate with group information (eval() returns a list; take the single value)
ndcg_value = ndcg.eval(y_true, y_pred, group_id=group_ids)[0]
dcg_value = dcg.eval(y_true, y_pred, group_id=group_ids)[0]
map_value = map_metric.eval(y_true, y_pred, group_id=group_ids)[0]

print(f"NDCG: {ndcg_value:.4f}")
print(f"DCG: {dcg_value:.4f}")
print(f"MAP: {map_value:.4f}")
```
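
To show what grouping means for a ranking metric, the sketch below computes a per-group NDCG and averages it over groups. It uses one common textbook formulation (linear gain, log2 position discount) and is not guaranteed to match CatBoost's defaults; `dcg` and `mean_ndcg` are names chosen for this example.

```python
import math
from collections import defaultdict


def dcg(relevances):
    """DCG with linear gain and a log2 position discount."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def mean_ndcg(labels, scores, group_ids):
    """Average NDCG over groups; groups without relevant items are skipped."""
    groups = defaultdict(list)
    for label, score, gid in zip(labels, scores, group_ids):
        groups[gid].append((score, label))

    values = []
    for items in groups.values():
        # Order labels by descending predicted score
        ranked = [label for _, label in sorted(items, key=lambda x: -x[0])]
        ideal_dcg = dcg(sorted(ranked, reverse=True))
        if ideal_dcg > 0:
            values.append(dcg(ranked) / ideal_dcg)
    return sum(values) / len(values) if values else 0.0


labels = [2, 1, 0, 3, 1, 2]
scores = [0.8, 0.6, 0.3, 0.9, 0.5, 0.7]
group_ids = [0, 0, 0, 1, 1, 1]

perfect = mean_ndcg(labels, scores, group_ids)
print(perfect)  # 1.0, both groups are already ranked in ideal order

# Reversing the scores of the first group lowers its NDCG
reversed_first = mean_ndcg([2, 1, 0], [0.3, 0.6, 0.8], [0, 0, 0])
print(f"{reversed_first:.4f}")
```

Each group is ranked and normalized on its own, so a query with many documents does not dominate one with few.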

### Weighted Metric Evaluation

```python
from catboost import metrics
import numpy as np

# Data with sample weights
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.8, 0.7, 0.3, 0.9])
weights = np.array([1.0, 2.0, 1.5, 1.0, 2.5])  # Sample importance

# eval() expects raw scores, so convert probabilities to log-odds
y_raw = np.log(y_prob / (1 - y_prob))

# Initialize metrics
logloss = metrics.Logloss()
precision = metrics.Precision()

# Evaluate with weights (eval() returns a list; take the single value)
weighted_logloss = logloss.eval(y_true, y_raw, weight=weights)[0]
weighted_precision = precision.eval(y_true, y_raw, weight=weights)[0]

print(f"Weighted LogLoss: {weighted_logloss:.4f}")
print(f"Weighted Precision: {weighted_precision:.4f}")
```
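
The effect of sample weights is easiest to see on a count-based metric. This standard-library sketch computes precision with each sample contributing its weight instead of a count of one; `weighted_precision` is a hypothetical helper, and returning 0.0 when nothing is predicted positive is a convention chosen for this example.

```python
def weighted_precision(labels, predicted, weights):
    """Weight of true positives divided by weight of all predicted positives."""
    tp = sum(w for y, p, w in zip(labels, predicted, weights) if p == 1 and y == 1)
    fp = sum(w for y, p, w in zip(labels, predicted, weights) if p == 1 and y == 0)
    if tp + fp == 0:
        return 0.0  # convention for this sketch: no positive predictions
    return tp / (tp + fp)


labels = [0, 1, 1, 0, 1]
predicted = [0, 1, 1, 1, 1]            # one false positive at index 3
weights = [1.0, 2.0, 1.5, 1.0, 2.5]

unweighted = weighted_precision(labels, predicted, [1.0] * len(labels))
weighted = weighted_precision(labels, predicted, weights)

print(f"Unweighted precision: {unweighted:.4f}")  # 0.7500
print(f"Weighted precision:   {weighted:.4f}")    # 0.8571
```

The false positive carries a weight of 1.0 while the true positives carry 6.0 in total, so the weighted value is higher than the unweighted one.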

### Custom Metric Configuration

```python
from catboost import metrics

# Initialize metrics with their default parameters
# (Parameter availability depends on metric type)
auc_metric = metrics.AUC()
f1_metric = metrics.F1()

# Set hints for metric behavior (hints are not validated)
auc_metric.set_hints(skip_train=True)
f1_metric.set_hints(use_weights=True)

# Get string representation with parameters
print(f"AUC metric: {auc_metric}")
print(f"F1 metric: {f1_metric}")

# Check available parameters
print(f"AUC parameters: {auc_metric.params_with_defaults()}")
```
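
A metric's string representation combines its name with its parameters. As a rough illustration, the sketch below builds and parses descriptions of the form `Name:key=value;key=value`. That layout is an assumption made for this example and may not match what `str(metric)` prints exactly; `format_metric` and `parse_metric` are hypothetical helpers, not CatBoost functions.

```python
def format_metric(name, params=None):
    """Build 'Name:key=value;key=value' from a name and a parameter dict."""
    if not params:
        return name

    def render(value):
        # Booleans are lowercased; other values use their plain string form
        return str(value).lower() if isinstance(value, bool) else str(value)

    body = ";".join(f"{key}={render(value)}" for key, value in params.items())
    return f"{name}:{body}"


def parse_metric(description):
    """Split a description back into (name, params); values stay strings."""
    name, _, body = description.partition(":")
    params = dict(item.split("=", 1) for item in body.split(";")) if body else {}
    return name, params


description = format_metric("NDCG", {"top": 10, "type": "Exp"})
print(description)                # NDCG:top=10;type=Exp
print(parse_metric(description))  # ('NDCG', {'top': '10', 'type': 'Exp'})
print(format_metric("AUC"))       # AUC
```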

### Multi-threaded Evaluation

```python
from catboost import metrics
import numpy as np

# Large dataset simulation
np.random.seed(42)
n_samples = 100000
y_true = np.random.randint(0, 2, n_samples)
y_pred = np.random.random(n_samples)

# Initialize metric
auc = metrics.AUC()

# Evaluate with multiple threads for large datasets
auc_value = auc.eval(y_true, y_pred, thread_count=4)[0]
print(f"AUC (4 threads): {auc_value:.6f}")

# Compare with single-threaded evaluation
auc_single = auc.eval(y_true, y_pred, thread_count=1)[0]
print(f"AUC (1 thread): {auc_single:.6f}")
```

## Integration with CatBoost Models

Metric classes can be used alongside model training, for example to recompute on held-out data the same metric the model tracked during fitting:

```python
from catboost import CatBoostClassifier, metrics
import numpy as np

# Synthetic data so the example is self-contained
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Create model with custom metric
model = CatBoostClassifier(
    iterations=100,
    eval_metric='AUC',  # Use built-in metric name
    verbose=False
)

# Train model
model.fit(X_train, y_train, eval_set=(X_test, y_test))

# Manual metric evaluation (AUC depends only on the ordering of scores,
# so probabilities give the same result as raw values)
auc_metric = metrics.AUC()
predictions = model.predict_proba(X_test)[:, 1]
manual_auc = auc_metric.eval(y_test, predictions)[0]

print(f"Manual AUC calculation: {manual_auc:.6f}")

# Compare with model's built-in evaluation
model_metrics = model.get_evals_result()
print(f"Model's AUC: {model_metrics['validation']['AUC'][-1]:.6f}")
```

## Available Metric Types

The CatBoost metrics framework covers the following task types:

- **Classification**: Logloss, CrossEntropy, Accuracy, Precision, Recall, F1, AUC, MultiClass, and variants
- **Regression**: RMSE, MAE, MAPE, R2, MSLE, MedianAbsoluteError, SMAPE, and variants
- **Ranking**: NDCG, DCG, MAP, MRR, ERR, and variants with different parameters
- **Multi-target**: Specialized metrics for multi-output problems

Each metric type may have multiple variants with different default parameters, all dynamically generated from the underlying CatBoost implementation.