# Evaluation

Comprehensive evaluation framework for measuring hierarchical forecast accuracy across the different levels of the hierarchy. The evaluation system integrates with the metrics in `utilsforecast.losses` and provides specialized functionality for hierarchical forecasting assessment.

## Capabilities

### Main Evaluation Function

Primary function for evaluating hierarchical forecasts using standard accuracy metrics from the utilsforecast library.

```python { .api }
def evaluate(
    df: FrameT,
    metrics: list[Callable],
    tags: dict[str, np.ndarray],
    models: Optional[list[str]] = None,
    train_df: Optional[FrameT] = None,
    level: Optional[list[int]] = None,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    target_col: str = 'y',
    agg_fn: Optional[str] = 'mean',
    benchmark: Optional[str] = None
) -> FrameT:
    """
    Evaluate hierarchical forecasts using the specified metrics.

    Parameters:
    - df: DataFrame with actual values and forecasts.
      Must contain id_col, time_col, target_col, and one prediction column per model.
    - metrics: list of callable metric functions from utilsforecast.losses.
      Examples: [mse, mae, mape, smape, rmse]
    - tags: dict mapping each hierarchy level to the identifiers of its series.
      Format: {'level_name': array_of_series_ids}
    - models: list of model names to evaluate (if None, evaluates all model columns).
    - train_df: DataFrame with training data (required for scaled metrics such as msse).
    - level: list of confidence levels for probabilistic metrics (e.g., [80, 95]).
    - id_col: str, name of the unique identifier column.
    - time_col: str, name of the time column.
    - target_col: str, name of the target variable column.
    - agg_fn: str, aggregation function for combining scores ('mean', 'median', 'sum').
    - benchmark: str, name of the benchmark model used to scale the metrics.

    Returns:
    DataFrame with evaluation results by hierarchy level and model.
    Columns: ['unique_id', 'metric', <model_names>...]
    """
```

## Usage Examples

### Basic Evaluation

```python
import numpy as np
import pandas as pd
from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import mse, mae, mape

# Prepare evaluation data with actuals and forecasts
eval_df = pd.DataFrame({
    'unique_id': ['A', 'B', 'Total', 'A', 'B', 'Total'],
    'ds': pd.to_datetime(['2023-01-01', '2023-01-01', '2023-01-01',
                          '2023-01-02', '2023-01-02', '2023-01-02']),
    'y': [100, 200, 300, 110, 210, 320],  # actual values
    'BottomUp': [95, 205, 300, 105, 215, 320],
    'MinTrace': [98, 198, 296, 108, 212, 320]
})

# Define hierarchy tags: each level maps to the ids of the series it contains
tags = {
    'Bottom': np.array(['A', 'B']),  # bottom-level series
    'Total': np.array(['Total'])     # aggregated series
}

# Evaluate with multiple metrics
results = evaluate(
    df=eval_df,
    metrics=[mse, mae, mape],
    tags=tags,
    models=['BottomUp', 'MinTrace']
)

print(results)
```

### Evaluation by Hierarchy Level

```python
# Evaluate performance at different hierarchy levels
from utilsforecast.losses import rmse, smape

results = evaluate(
    df=forecast_results,
    metrics=[rmse, smape],
    tags=hierarchy_tags,
    models=['BottomUp', 'TopDown', 'MinTrace'],
    agg_fn='mean'
)

# Results show performance for each hierarchy level
# Example output:
# unique_id  metric  BottomUp  TopDown  MinTrace
# Bottom     rmse        12.5     15.2      11.8
# Middle     rmse         8.9      9.1       8.7
# Total      rmse         5.2      5.8       4.9
```

### Probabilistic Evaluation

```python
from utilsforecast.losses import quantile_loss

# Evaluate prediction intervals
prob_results = evaluate(
    df=forecasts_with_intervals,
    metrics=[quantile_loss],
    tags=hierarchy_tags,
    level=[80, 95],  # confidence levels to evaluate
    models=['BottomUp', 'MinTrace']
)
```
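
For this to work, `forecasts_with_intervals` must already carry the prediction-interval columns produced upstream. In the Nixtla ecosystem these are conventionally named `<model>-lo-<level>` and `<model>-hi-<level>` (e.g. `BottomUp-lo-80`, `BottomUp-hi-80`); treat the exact naming as an assumption to verify against your forecast output.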

### Scaled Metrics Evaluation

```python
from utilsforecast.losses import msse, mase

# Use scaled metrics with training data
scaled_results = evaluate(
    df=test_forecasts,
    metrics=[msse, mase],
    tags=hierarchy_tags,
    train_df=training_data,  # required for scaled metrics
    benchmark='Naive',       # benchmark model for scaling
    models=['BottomUp', 'TopDown', 'MinTrace']
)
```
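
When `benchmark` is set, each model's scores are reported relative to the named benchmark model's scores, so values below 1 indicate an improvement over the benchmark; the exact naming of the scaled metric rows may vary between library versions.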

### Custom Aggregation

```python
# Use different aggregation functions
results_median = evaluate(
    df=eval_df,
    metrics=[mse, mae],
    tags=tags,
    agg_fn='median'  # use the median instead of the mean
)

results_sum = evaluate(
    df=eval_df,
    metrics=[mse],
    tags=tags,
    agg_fn='sum'  # sum the per-series scores within each hierarchy level
)
```

## Supported Metrics

The evaluation function works with any metric that follows the `utilsforecast.losses` calling convention. Common metrics are listed below; a sketch of a custom metric follows the lists.

### Point Forecast Metrics

```python
from utilsforecast.losses import (
    mse,    # Mean Squared Error
    mae,    # Mean Absolute Error
    mape,   # Mean Absolute Percentage Error
    smape,  # Symmetric Mean Absolute Percentage Error
    rmse,   # Root Mean Squared Error
)
```

### Scaled Metrics

```python
from utilsforecast.losses import (
    msse,   # Mean Squared Scaled Error
    mase,   # Mean Absolute Scaled Error
    rmsse,  # Root Mean Squared Scaled Error
)
```

### Probabilistic Metrics

```python
from utilsforecast.losses import (
    quantile_loss,  # Quantile Loss
    coverage,       # Coverage probability
    mis,            # Mean Interval Score
)
```
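
As promised above, here is a minimal sketch of a custom metric. It assumes the convention used by the point metrics in `utilsforecast.losses`: a callable receiving `(df, models, id_col, target_col)` and returning one row per series with one score column per model (`mean_error` is a hypothetical name, not part of the library):

```python
import pandas as pd

def mean_error(df: pd.DataFrame, models: list[str],
               id_col: str = 'unique_id', target_col: str = 'y') -> pd.DataFrame:
    """Signed mean error (bias) per series."""
    res = df[[id_col]].copy()
    for model in models:
        # signed residual of each prediction
        res[model] = df[model] - df[target_col]
    # average the residuals within each series
    return res.groupby(id_col, observed=True).mean().reset_index()
```

Such a callable can then be passed alongside the built-in metrics, e.g. `metrics=[mse, mean_error]`.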

## Integration with HierarchicalReconciliation

```python
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import mse, mae

# Generate reconciled forecasts
reconcilers = [BottomUp(), MinTrace(method='ols')]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)

reconciled = hrec.reconcile(
    Y_hat_df=base_forecasts,
    S=summing_matrix,
    tags=hierarchy_tags,
    Y_df=historical_data
)

# Evaluate reconciled forecasts (join the actuals onto `reconciled`
# first so that the target column `y` is present)
evaluation_results = evaluate(
    df=reconciled,
    metrics=[mse, mae],
    tags=hierarchy_tags,
    models=['BottomUp', 'MinTrace']
)
```
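
Depending on the library version, the reconciled columns may be named `<base_model>/<reconciler>` (for example `AutoARIMA/BottomUp`) rather than the bare reconciler names used above; check the columns of `reconciled` and pass those names through `models`.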

## Deprecated Components

### Legacy Evaluation Class

```python { .api }
class HierarchicalEvaluation:
    """
    Deprecated: Use the evaluate() function instead.

    Legacy evaluation class that will be removed in future versions.
    All functionality has been migrated to the evaluate() function.
    """
    # This class is deprecated - use the evaluate() function
```

### Deprecated Loss Functions

The following functions are deprecated and will be removed. Use the equivalent functions from `utilsforecast.losses` instead; a migration sketch follows the list.

- `mse()` → use `utilsforecast.losses.mse`
- `mqloss()` → use `utilsforecast.losses.quantile_loss`
- `rel_mse()` → use a custom implementation built on utilsforecast metrics
- `msse()` → use `utilsforecast.losses.msse`
- `scaled_crps()` → use `utilsforecast.losses.scaled_crps`
- `energy_score()` → use a custom implementation
- `log_score()` → use a custom implementation
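
For example, `rel_mse()` can be replaced by a small wrapper around `utilsforecast.losses.mse` that divides each model's score by a baseline's. This is a sketch under the assumption that `rel_mse` computed MSE relative to a baseline model (`relative_mse` and `baseline` are illustrative names):

```python
import pandas as pd
from utilsforecast.losses import mse

def relative_mse(df: pd.DataFrame, models: list[str], baseline: str,
                 id_col: str = 'unique_id', target_col: str = 'y') -> pd.DataFrame:
    """MSE of each model divided by the MSE of a baseline model, per series."""
    scores = mse(df, models=models + [baseline], id_col=id_col, target_col=target_col)
    res = scores[[id_col]].copy()
    for model in models:
        # ratio < 1 means the model beats the baseline on that series
        res[model] = scores[model] / scores[baseline]
    return res
```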