# Evaluation

Comprehensive evaluation framework for measuring hierarchical forecast accuracy across all levels of the hierarchy. The evaluation system integrates metrics from utilsforecast.losses and provides specialized functionality for assessing hierarchical forecasts.

## Capabilities

### Main Evaluation Function

Primary function for evaluating hierarchical forecasts using standard accuracy metrics from the utilsforecast library.

```python { .api }
def evaluate(
    df: FrameT,
    metrics: list[Callable],
    tags: dict[str, np.ndarray],
    models: Optional[list[str]] = None,
    train_df: Optional[FrameT] = None,
    level: Optional[list[int]] = None,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    target_col: str = 'y',
    agg_fn: Optional[str] = 'mean',
    benchmark: Optional[str] = None
) -> FrameT:
    """
    Evaluate hierarchical forecasts using specified metrics.

    Parameters:
    - df: DataFrame with actual values and forecasts.
      Must contain id_col, time_col, target_col, and one column per model's predictions.
    - metrics: list of callable metric functions from utilsforecast.losses.
      Examples: [mse, mae, mape, smape, rmse]
    - tags: dict mapping each hierarchy level to the series identifiers it contains.
      Format: {'level_name': array_of_series_ids}
    - models: list of model names to evaluate (if None, evaluates all model columns).
    - train_df: DataFrame with training data (required for scaled metrics such as msse).
    - level: list of confidence levels for probabilistic metrics (e.g., [80, 95]).
    - id_col: str, name of the unique identifier column.
    - time_col: str, name of the time column.
    - target_col: str, name of the target variable column.
    - agg_fn: str, aggregation function for combining scores ('mean', 'median', 'sum').
    - benchmark: str, name of the benchmark model for scaled metrics.

    Returns:
    DataFrame with evaluation results by hierarchy level and model.
    Columns: ['unique_id', 'metric', <model_names>...]
    """
```

## Usage Examples

### Basic Evaluation

```python
import numpy as np
import pandas as pd
from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import mse, mae, mape

# Prepare evaluation data with actuals and forecasts
eval_df = pd.DataFrame({
    'unique_id': ['A', 'B', 'Total', 'A', 'B', 'Total'],
    'ds': pd.to_datetime(['2023-01-01', '2023-01-01', '2023-01-01',
                          '2023-01-02', '2023-01-02', '2023-01-02']),
    'y': [100, 200, 300, 110, 210, 320],  # actual values
    'BottomUp': [95, 205, 300, 105, 215, 320],
    'MinTrace': [98, 198, 296, 108, 212, 320]
})

# Define hierarchy tags: each level maps to the unique_id values it contains
tags = {
    'Bottom': np.array(['A', 'B']),   # Series A and B
    'Total': np.array(['Total'])      # Aggregated series
}

# Evaluate with multiple metrics
results = evaluate(
    df=eval_df,
    metrics=[mse, mae, mape],
    tags=tags,
    models=['BottomUp', 'MinTrace']
)

print(results)
```

### Evaluation by Hierarchy Level

```python
# Evaluate performance at different hierarchy levels
from utilsforecast.losses import rmse, smape

results = evaluate(
    df=forecast_results,
    metrics=[rmse, smape],
    tags=hierarchy_tags,
    models=['BottomUp', 'TopDown', 'MinTrace'],
    agg_fn='mean'
)

# Results will show performance for each hierarchy level
# Example output:
# unique_id  metric  BottomUp  TopDown  MinTrace
# Bottom     rmse        12.5     15.2      11.8
# Middle     rmse         8.9      9.1       8.7
# Total      rmse         5.2      5.8       4.9
```

### Probabilistic Evaluation

```python
from utilsforecast.losses import quantile_loss

# Evaluate prediction intervals
prob_results = evaluate(
    df=forecasts_with_intervals,
    metrics=[quantile_loss],
    tags=hierarchy_tags,
    level=[80, 95],  # Confidence levels to evaluate
    models=['BottomUp', 'MinTrace']
)
```
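
Passing `level=[80, 95]` tells evaluate() to score the interval columns that accompany each model's point forecast. A minimal sketch of the frame layout this assumes, using the `<model>-lo-<level>` / `<model>-hi-<level>` naming convention common in the Nixtla ecosystem; `forecasts_with_intervals` and its values here are purely illustrative:

```python
import pandas as pd

# Illustrative layout only: one point column per model plus interval bounds
# named '<model>-lo-<level>' and '<model>-hi-<level>' for each requested level
# (repeat the bound columns for every evaluated model).
forecasts_with_intervals = pd.DataFrame({
    'unique_id': ['A', 'B', 'Total'],
    'ds': pd.to_datetime(['2023-01-01'] * 3),
    'y': [100, 200, 300],
    'BottomUp': [95, 205, 300],
    'BottomUp-lo-80': [88, 195, 285],
    'BottomUp-hi-80': [102, 215, 315],
    'BottomUp-lo-95': [83, 190, 276],
    'BottomUp-hi-95': [107, 220, 324],
})
```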

### Scaled Metrics Evaluation

```python
from utilsforecast.losses import msse, mase

# Use scaled metrics with training data
scaled_results = evaluate(
    df=test_forecasts,
    metrics=[msse, mase],
    tags=hierarchy_tags,
    train_df=training_data,  # Required for scaled metrics
    benchmark='Naive',       # Benchmark model for scaling
    models=['BottomUp', 'TopDown', 'MinTrace']
)
```

### Custom Aggregation

```python
# Use different aggregation functions
results_median = evaluate(
    df=eval_df,
    metrics=[mse, mae],
    tags=tags,
    agg_fn='median'  # Use median instead of mean
)

results_sum = evaluate(
    df=eval_df,
    metrics=[mse],
    tags=tags,
    agg_fn='sum'  # Sum across hierarchy levels
)
```

156

157

## Supported Metrics

158

159

The evaluation function works with any metric from utilsforecast.losses. Common metrics include:

160

161

### Point Forecast Metrics

162

163

```python
from utilsforecast.losses import (
    mse,    # Mean Squared Error
    mae,    # Mean Absolute Error
    mape,   # Mean Absolute Percentage Error
    smape,  # Symmetric Mean Absolute Percentage Error
    rmse,   # Root Mean Squared Error
)
```

172

173

### Scaled Metrics

174

175

```python
from utilsforecast.losses import (
    msse,   # Mean Squared Scaled Error
    mase,   # Mean Absolute Scaled Error
    rmsse,  # Root Mean Squared Scaled Error
)
```

182

183

### Probabilistic Metrics

184

185

```python
from utilsforecast.losses import (
    quantile_loss,  # Quantile Loss
    coverage,       # Coverage probability
    mis,            # Mean Interval Score
)
```

192

193

## Integration with HierarchicalReconciliation

194

195

```python
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import mse, mae

# Generate reconciled forecasts
reconcilers = [BottomUp(), MinTrace(method='ols')]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)

reconciled = hrec.reconcile(
    Y_hat_df=base_forecasts,
    S=summing_matrix,
    tags=hierarchy_tags,
    Y_df=historical_data
)

# Evaluate reconciled forecasts
evaluation_results = evaluate(
    df=reconciled,
    metrics=[mse, mae],
    tags=hierarchy_tags,
    models=['BottomUp', 'MinTrace']
)
```
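
The frame passed to evaluate() must also carry the observed values in target_col. If the reconciled output contains only forecast columns, join the held-out actuals first; a minimal sketch continuing the block above, where `test_df` is a hypothetical frame with unique_id, ds, and y for the forecast horizon:

```python
# Join held-out actuals onto the reconciled forecasts before computing metrics.
# test_df is assumed to contain the observed values for the forecast horizon.
reconciled_eval = reconciled.merge(
    test_df[['unique_id', 'ds', 'y']],
    on=['unique_id', 'ds'],
)

evaluation_results = evaluate(
    df=reconciled_eval,
    metrics=[mse, mae],
    tags=hierarchy_tags,
    models=['BottomUp', 'MinTrace'],
)
```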

## Deprecated Components

### Legacy Evaluation Class

```python { .api }
class HierarchicalEvaluation:
    """
    Deprecated: Use the evaluate() function instead.

    Legacy evaluation class that will be removed in future versions.
    All functionality has been migrated to the evaluate() function.
    """
    # This class is deprecated - use the evaluate() function
```
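
For orientation, a hedged sketch of the migration from the legacy class to evaluate(); the commented legacy call is approximate, and `Y_hat_df` / `Y_test_df` are placeholder names for reconciled forecasts and held-out actuals:

```python
from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import mse, mae

# Legacy pattern (approximate, shown for orientation only):
# heval = HierarchicalEvaluation(evaluators=[mse, mae])
# results = heval.evaluate(Y_hat_df=Y_hat_df, Y_test_df=Y_test_df, tags=tags)

# Current pattern: put actuals and forecasts in one frame and call evaluate()
eval_df = Y_hat_df.merge(Y_test_df[['unique_id', 'ds', 'y']], on=['unique_id', 'ds'])
results = evaluate(df=eval_df, metrics=[mse, mae], tags=tags)
```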

### Deprecated Loss Functions

The following functions are deprecated and will be removed. Use the equivalent functions from utilsforecast.losses instead, as shown in the migration sketch after this list:

- `mse()` → use `utilsforecast.losses.mse`
- `mqloss()` → use `utilsforecast.losses.quantile_loss`
- `rel_mse()` → use a custom implementation built on utilsforecast metrics
- `msse()` → use `utilsforecast.losses.msse`
- `scaled_crps()` → use `utilsforecast.losses.scaled_crps`
- `energy_score()` → use a custom implementation
- `log_score()` → use a custom implementation
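
A minimal import-level migration sketch following that guidance; the commented line shows the deprecated usage pattern and the final call is indicative only:

```python
# Deprecated: importing loss functions from hierarchicalforecast.evaluation
# from hierarchicalforecast.evaluation import mse, msse, scaled_crps

# Current: import the equivalent losses from utilsforecast and pass them to evaluate()
from utilsforecast.losses import mse, msse, scaled_crps
from hierarchicalforecast.evaluation import evaluate

# results = evaluate(df=eval_df, metrics=[mse, msse], tags=tags, train_df=train_df)
```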