# Evaluation

Comprehensive evaluation framework for measuring hierarchical forecast accuracy across the different levels of the hierarchy. The evaluation system integrates with the metrics in `utilsforecast.losses` and provides specialized functionality for hierarchical forecasting assessment.

## Capabilities

### Main Evaluation Function

Primary function for evaluating hierarchical forecasts using standard accuracy metrics from the utilsforecast library.

```python { .api }
def evaluate(
    df: FrameT,
    metrics: list[Callable],
    tags: dict[str, np.ndarray],
    models: Optional[list[str]] = None,
    train_df: Optional[FrameT] = None,
    level: Optional[list[int]] = None,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    target_col: str = 'y',
    agg_fn: Optional[str] = 'mean',
    benchmark: Optional[str] = None
) -> FrameT:
    """
    Evaluate hierarchical forecasts using the specified metrics.

    Parameters:
    - df: DataFrame with actual values and forecasts.
      Must contain id_col, time_col, target_col, and one prediction column per model.
    - metrics: list of callable metric functions from utilsforecast.losses.
      Examples: [mse, mae, mape, smape, rmse]
    - tags: dict mapping each hierarchy level to the identifiers of its series.
      Format: {'level_name': array_of_series_ids}
    - models: list of model names to evaluate (if None, evaluates all model columns).
    - train_df: DataFrame with training data (required for scaled metrics such as msse).
    - level: list of confidence levels for probabilistic metrics (e.g., [80, 95]).
    - id_col: str, name of the unique identifier column.
    - time_col: str, name of the time column.
    - target_col: str, name of the target variable column.
    - agg_fn: str, aggregation function for combining scores ('mean', 'median', 'sum').
    - benchmark: str, name of the benchmark model used to scale the metrics.

    Returns:
    DataFrame with evaluation results by hierarchy level and model.
    Columns: ['unique_id', 'metric', <model_names>...]
    """
```

## Usage Examples

### Basic Evaluation

```python
import numpy as np
import pandas as pd
from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import mse, mae, mape

# Prepare evaluation data with actuals and forecasts
eval_df = pd.DataFrame({
    'unique_id': ['A', 'B', 'Total', 'A', 'B', 'Total'],
    'ds': pd.to_datetime(['2023-01-01', '2023-01-01', '2023-01-01',
                          '2023-01-02', '2023-01-02', '2023-01-02']),
    'y': [100, 200, 300, 110, 210, 320],  # actual values
    'BottomUp': [95, 205, 300, 105, 215, 320],
    'MinTrace': [98, 198, 296, 108, 212, 320]
})

# Define hierarchy tags: each level maps to the ids of the series it contains
tags = {
    'Bottom': np.array(['A', 'B']),  # bottom-level series
    'Total': np.array(['Total'])     # aggregated series
}

# Evaluate with multiple metrics
results = evaluate(
    df=eval_df,
    metrics=[mse, mae, mape],
    tags=tags,
    models=['BottomUp', 'MinTrace']
)

print(results)
```

### Evaluation by Hierarchy Level

```python
# Evaluate performance at different hierarchy levels
from utilsforecast.losses import rmse, smape

results = evaluate(
    df=forecast_results,
    metrics=[rmse, smape],
    tags=hierarchy_tags,
    models=['BottomUp', 'TopDown', 'MinTrace'],
    agg_fn='mean'
)

# Results show performance for each hierarchy level
# Example output:
# unique_id  metric  BottomUp  TopDown  MinTrace
# Bottom     rmse        12.5     15.2      11.8
# Middle     rmse         8.9      9.1       8.7
# Total      rmse         5.2      5.8       4.9
```

### Probabilistic Evaluation

```python
from utilsforecast.losses import quantile_loss

# Evaluate prediction intervals
prob_results = evaluate(
    df=forecasts_with_intervals,
    metrics=[quantile_loss],
    tags=hierarchy_tags,
    level=[80, 95],  # confidence levels to evaluate
    models=['BottomUp', 'MinTrace']
)
```
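
For this to work, `forecasts_with_intervals` must already carry the prediction-interval columns produced upstream. In the Nixtla ecosystem these are conventionally named `<model>-lo-<level>` and `<model>-hi-<level>` (e.g. `BottomUp-lo-80`, `BottomUp-hi-80`); treat the exact naming as an assumption to verify against your forecast output.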

### Scaled Metrics Evaluation

```python
from utilsforecast.losses import msse, mase

# Use scaled metrics with training data
scaled_results = evaluate(
    df=test_forecasts,
    metrics=[msse, mase],
    tags=hierarchy_tags,
    train_df=training_data,  # required for scaled metrics
    benchmark='Naive',       # benchmark model for scaling
    models=['BottomUp', 'TopDown', 'MinTrace']
)
```
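
When `benchmark` is set, each model's scores are reported relative to the named benchmark model's scores, so values below 1 indicate an improvement over the benchmark; the exact naming of the scaled metric rows may vary between library versions.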

### Custom Aggregation

```python
# Use different aggregation functions
results_median = evaluate(
    df=eval_df,
    metrics=[mse, mae],
    tags=tags,
    agg_fn='median'  # use the median instead of the mean
)

results_sum = evaluate(
    df=eval_df,
    metrics=[mse],
    tags=tags,
    agg_fn='sum'  # sum the per-series scores within each hierarchy level
)
```

## Supported Metrics

The evaluation function works with any metric that follows the `utilsforecast.losses` calling convention. Common metrics are listed below; a sketch of a custom metric follows the lists.

### Point Forecast Metrics

```python
from utilsforecast.losses import (
    mse,    # Mean Squared Error
    mae,    # Mean Absolute Error
    mape,   # Mean Absolute Percentage Error
    smape,  # Symmetric Mean Absolute Percentage Error
    rmse,   # Root Mean Squared Error
)
```

### Scaled Metrics

```python
from utilsforecast.losses import (
    msse,   # Mean Squared Scaled Error
    mase,   # Mean Absolute Scaled Error
    rmsse,  # Root Mean Squared Scaled Error
)
```

### Probabilistic Metrics

```python
from utilsforecast.losses import (
    quantile_loss,  # Quantile Loss
    coverage,       # Coverage probability
    mis,            # Mean Interval Score
)
```
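
As promised above, here is a minimal sketch of a custom metric. It assumes the convention used by the point metrics in `utilsforecast.losses`: a callable receiving `(df, models, id_col, target_col)` and returning one row per series with one score column per model (`mean_error` is a hypothetical name, not part of the library):

```python
import pandas as pd

def mean_error(df: pd.DataFrame, models: list[str],
               id_col: str = 'unique_id', target_col: str = 'y') -> pd.DataFrame:
    """Signed mean error (bias) per series."""
    res = df[[id_col]].copy()
    for model in models:
        # signed residual of each prediction
        res[model] = df[model] - df[target_col]
    # average the residuals within each series
    return res.groupby(id_col, observed=True).mean().reset_index()
```

Such a callable can then be passed alongside the built-in metrics, e.g. `metrics=[mse, mean_error]`.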

## Integration with HierarchicalReconciliation

```python
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import mse, mae

# Generate reconciled forecasts
reconcilers = [BottomUp(), MinTrace(method='ols')]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)

reconciled = hrec.reconcile(
    Y_hat_df=base_forecasts,
    S=summing_matrix,
    tags=hierarchy_tags,
    Y_df=historical_data
)

# Evaluate reconciled forecasts (join the actuals onto `reconciled`
# first so that the target column `y` is present)
evaluation_results = evaluate(
    df=reconciled,
    metrics=[mse, mae],
    tags=hierarchy_tags,
    models=['BottomUp', 'MinTrace']
)
```
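
Depending on the library version, the reconciled columns may be named `<base_model>/<reconciler>` (for example `AutoARIMA/BottomUp`) rather than the bare reconciler names used above; check the columns of `reconciled` and pass those names through `models`.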

## Deprecated Components

### Legacy Evaluation Class

```python { .api }
class HierarchicalEvaluation:
    """
    Deprecated: Use the evaluate() function instead.

    Legacy evaluation class that will be removed in future versions.
    All functionality has been migrated to the evaluate() function.
    """
    # This class is deprecated - use the evaluate() function
```

### Deprecated Loss Functions

The following functions are deprecated and will be removed. Use the equivalent functions from `utilsforecast.losses` instead; a migration sketch follows the list.

- `mse()` → use `utilsforecast.losses.mse`
- `mqloss()` → use `utilsforecast.losses.quantile_loss`
- `rel_mse()` → use a custom implementation built on utilsforecast metrics
- `msse()` → use `utilsforecast.losses.msse`
- `scaled_crps()` → use `utilsforecast.losses.scaled_crps`
- `energy_score()` → use a custom implementation
- `log_score()` → use a custom implementation
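
For example, `rel_mse()` can be replaced by a small wrapper around `utilsforecast.losses.mse` that divides each model's score by a baseline's. This is a sketch under the assumption that `rel_mse` computed MSE relative to a baseline model (`relative_mse` and `baseline` are illustrative names):

```python
import pandas as pd
from utilsforecast.losses import mse

def relative_mse(df: pd.DataFrame, models: list[str], baseline: str,
                 id_col: str = 'unique_id', target_col: str = 'y') -> pd.DataFrame:
    """MSE of each model divided by the MSE of a baseline model, per series."""
    scores = mse(df, models=models + [baseline], id_col=id_col, target_col=target_col)
    res = scores[[id_col]].copy()
    for model in models:
        # ratio < 1 means the model beats the baseline on that series
        res[model] = scores[model] / scores[baseline]
    return res
```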