
# Data Utilities


Utilities for creating hierarchical data structures from bottom-level time series data. These functions handle aggregation across multiple dimensions, create summing matrices, and prepare data in the format required by hierarchical reconciliation methods.


## Capabilities


### Cross-sectional Aggregation


Main function for creating hierarchical structures from bottom-level time series by aggregating across categorical dimensions.

```python { .api }
def aggregate(
    df: Frame,
    spec: list[list[str]],
    exog_vars: Optional[dict[str, Union[str, list[str]]]] = None,
    sparse_s: bool = False,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    id_time_col: Optional[str] = None,
    target_cols: Sequence[str] = ('y',)
) -> tuple[FrameT, FrameT, dict]:
    """
    Create hierarchical structure from bottom-level time series.

    Parameters:
    - df: DataFrame with bottom-level time series data
        Must contain id_col, time_col, and target_cols
    - spec: list of aggregation specifications
        Each inner list defines groupings for that level
        Example: [['A', 'B'], ['C', 'D']] creates two aggregation levels
    - exog_vars: dict mapping exogenous variable names to aggregation functions
        Example: {'price': 'mean', 'volume': 'sum'}
    - sparse_s: bool, whether to return sparse summing matrix for memory efficiency
    - id_col: str, name of unique identifier column
    - time_col: str, name of time column
    - id_time_col: str, temporal hierarchy identifier (for temporal aggregation)
    - target_cols: tuple of target variable column names

    Returns:
    - Y_df: DataFrame with hierarchically structured series
    - S_df: DataFrame representation of summing matrix (or sparse matrix if sparse_s=True)
    - tags: dict mapping hierarchy level names to series indices
    """
```
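
To make the idea of a summing matrix concrete, the sketch below builds one by hand with NumPy for a small two-level hierarchy and uses it to derive every aggregated series from the bottom-level values. It is a conceptual illustration only, not the library's implementation; the series names, the `groups` mapping, and the row order are assumptions made for this example.

```python
import numpy as np

# Hypothetical bottom-level series: two time steps for four series A, B, C, D.
bottom_ids = ["A", "B", "C", "D"]
y_bottom = np.array([
    [100.0, 110.0],  # A
    [200.0, 220.0],  # B
    [150.0, 160.0],  # C
    [180.0, 190.0],  # D
])

# Hypothetical grouping: A and B belong to group X, C and D to group Y.
groups = {"X": ["A", "B"], "Y": ["C", "D"]}

# One row per series in the hierarchy, one column per bottom-level series.
rows = [np.ones(len(bottom_ids))]  # the total sums every bottom series
for members in groups.values():
    rows.append(np.array([1.0 if b in members else 0.0 for b in bottom_ids]))
rows.extend(np.eye(len(bottom_ids)))  # each bottom series maps to itself
S = np.vstack(rows)
row_ids = ["Total", *groups, *bottom_ids]

# Every series in the hierarchy is a linear combination of the bottom level.
y_all = S @ y_bottom

for name, values in zip(row_ids, y_all):
    print(name, values)
```

Each row of `S` marks which bottom-level series contribute to that aggregate, which is why a sparse representation pays off once the hierarchy grows large.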


### Temporal Aggregation


Function for creating temporal hierarchies by aggregating time series at different frequencies.

```python { .api }
def aggregate_temporal(
    df: Frame,
    spec: dict[str, int],
    exog_vars: Optional[dict[str, Union[str, list[str]]]] = None,
    sparse_s: bool = False,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    id_time_col: str = 'temporal_id',
    target_cols: Sequence[str] = ('y',),
    aggregation_type: str = 'local'
) -> tuple[FrameT, FrameT, dict]:
    """
    Create temporal hierarchy from time series data.

    Parameters:
    - df: DataFrame with time series data at base frequency
    - spec: dict mapping temporal level names to aggregation frequencies
        Example: {'Monthly': 12, 'Quarterly': 4, 'Annual': 1}
    - exog_vars: dict of exogenous variables and their aggregation functions
    - sparse_s: bool, return sparse summing matrix
    - id_col: str, unique identifier column name
    - time_col: str, time column name
    - id_time_col: str, temporal hierarchy identifier column name
    - target_cols: tuple of target variable names
    - aggregation_type: str, type of temporal aggregation ('local' or 'global')

    Returns:
    - Y_df: DataFrame with temporal hierarchy
    - S_df: Temporal summing matrix
    - tags: dict mapping temporal levels to indices
    """
```
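
As a rough picture of what temporal aggregation computes, the following sketch sums a base-frequency series over consecutive, non-overlapping blocks with NumPy. It assumes that each value in the spec counts how many base-frequency periods make up one aggregate, in line with the `'Weekly': 7` usage example in this document; the `block_sums` helper and the sample spec are hypothetical and are not part of the library.

```python
import numpy as np

# Hypothetical base series: 12 monthly observations with values 1..12.
y = np.arange(1.0, 13.0)

# Hypothetical spec: level name -> number of base periods per aggregate.
spec = {"Annual": 12, "Quarterly": 3, "Monthly": 1}


def block_sums(values: np.ndarray, k: int) -> np.ndarray:
    """Sum consecutive, non-overlapping blocks of k observations."""
    if len(values) % k != 0:
        raise ValueError("series length must be a multiple of k")
    return values.reshape(-1, k).sum(axis=1)


levels = {name: block_sums(y, k) for name, k in spec.items()}

for name, values in levels.items():
    print(name, values)
```

The sketch rejects series whose length is not a multiple of the block size; how the library treats incomplete blocks is not covered here.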


### Future Dataframe Creation


Utility for creating future timestamp dataframes for forecasting.

```python { .api }
def make_future_dataframe(
    df: Frame,
    freq: Union[str, int],
    h: int,
    id_col: str = 'unique_id',
    time_col: str = 'ds'
) -> FrameT:
    """
    Create dataframe with future timestamps for forecasting.

    Parameters:
    - df: DataFrame with historical time series data
    - freq: str, frequency string (e.g., 'D', 'M', 'Q', 'Y')
    - h: int, forecast horizon (number of periods ahead)
    - id_col: str, unique identifier column name
    - time_col: str, time column name

    Returns:
    DataFrame with future timestamps for each series
    """
```
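
The behaviour described above can be approximated in plain pandas: take the last observed timestamp of each series and extend it by `h` periods. The helper below, `future_frame_sketch`, is a hypothetical stand-in written for illustration under that assumption; it is not the library function and only handles datetime timestamps.

```python
import pandas as pd


def future_frame_sketch(df, freq, h, id_col="unique_id", time_col="ds"):
    """Hypothetical helper: h future timestamps after each series' last one."""
    parts = []
    for uid, last in df.groupby(id_col)[time_col].max().items():
        # periods=h + 1 includes `last` itself, so drop the first element.
        future = pd.date_range(start=last, periods=h + 1, freq=freq)[1:]
        parts.append(pd.DataFrame({id_col: uid, time_col: future}))
    return pd.concat(parts, ignore_index=True)


history = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B"],
    "ds": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03"]),
    "y": [1.0, 2.0, 3.0, 4.0],
})

future = future_frame_sketch(history, freq="D", h=3)
print(future)
```

Because each series is extended from its own last timestamp, series that end on different dates get different future ranges.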


### Cross-Temporal Tags


Function for generating tags that combine cross-sectional and temporal hierarchies.

```python { .api }
def get_cross_temporal_tags(
    df: pd.DataFrame,
    tags_cs: dict,
    tags_te: dict,
    sep: str = '//',
    id_col: str = 'unique_id',
    id_time_col: str = 'temporal_id',
    cross_temporal_id_col: str = 'cross_temporal_id'
) -> tuple[pd.DataFrame, dict]:
    """
    Generate cross-temporal hierarchy tags.

    Parameters:
    - df: DataFrame with cross-temporal data
    - tags_cs: dict with cross-sectional hierarchy tags
    - tags_te: dict with temporal hierarchy tags
    - sep: str, separator for combining cross-sectional and temporal identifiers
    - id_col: str, cross-sectional identifier column
    - id_time_col: str, temporal identifier column
    - cross_temporal_id_col: str, combined identifier column name

    Returns:
    - Updated DataFrame with cross-temporal identifiers
    - Combined tags dictionary for cross-temporal hierarchy
    """
```
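
One plausible way to picture the combined tags is as the Cartesian product of cross-sectional and temporal levels, with identifiers joined by `sep`. The sketch below illustrates that idea in plain Python. The tag contents, the `combine_tags` helper, and the exact key and identifier formats are assumptions made for this example and may differ from what the library produces.

```python
from itertools import product

# Hypothetical tags: hierarchy level -> series identifiers at that level.
tags_cs = {"Total": ["Total"], "Region": ["North", "South"]}
tags_te = {"Annual": ["Annual-1"], "Quarterly": ["Quarterly-1", "Quarterly-2"]}


def combine_tags(tags_cs, tags_te, sep="//"):
    """Hypothetical helper: pair every cross-sectional level with every temporal level."""
    combined = {}
    for (cs_level, cs_ids), (te_level, te_ids) in product(
        tags_cs.items(), tags_te.items()
    ):
        key = f"{cs_level}{sep}{te_level}"
        combined[key] = [f"{c}{sep}{t}" for c, t in product(cs_ids, te_ids)]
    return combined


combined = combine_tags(tags_cs, tags_te)
for level, ids in combined.items():
    print(level, ids)
```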


### Hierarchy Structure Validation


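Utility function to check whether a hierarchy is strictly hierarchical, that is, whether it forms a tree in which every series below the top level has exactly one parent (as opposed to a grouped structure with overlapping aggregations).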

```python { .api }
def is_strictly_hierarchical(S: pd.DataFrame, tags: dict) -> bool:
    """
    Check if hierarchy structure is strictly hierarchical.

    Parameters:
    - S: summing matrix DataFrame
    - tags: hierarchy tags dictionary

    Returns:
    bool indicating whether structure is strictly hierarchical
    """
```
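
For intuition, a tree-shaped hierarchy can be recognised directly from a summing matrix: the sets of bottom-level series covered by any two rows must be either nested or disjoint. The sketch below applies that criterion with NumPy. The `is_tree_like` helper and both example matrices are hypothetical; this is one possible check, not the algorithm the library uses, and it ignores the `tags` argument.

```python
import numpy as np


def is_tree_like(S) -> bool:
    """Return True if the row supports of S are pairwise nested or disjoint."""
    supports = [frozenset(np.flatnonzero(row).tolist()) for row in np.asarray(S)]
    for i, a in enumerate(supports):
        for b in supports[i + 1:]:
            # Overlapping rows must have one support contained in the other.
            if a & b and not (a <= b or b <= a):
                return False
    return True


# Strict hierarchy: Total -> {A, B} and {C, D} -> bottom series.
S_strict = np.vstack([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    np.eye(4, dtype=int),
])

# Grouped structure: a second grouping {A, C}, {B, D} cuts across the first.
S_grouped = np.vstack([
    S_strict[:3],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    np.eye(4, dtype=int),
])

print(is_tree_like(S_strict))
print(is_tree_like(S_grouped))
```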


## Usage Examples


### Basic Cross-sectional Aggregation

```python
import pandas as pd
from hierarchicalforecast.utils import aggregate

# Bottom-level data: one row per item and date, plus the columns to aggregate by
df = pd.DataFrame({
    'item': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'ds': pd.date_range('2020-01-01', periods=2, freq='D').tolist() * 4,
    'y': [100, 110, 200, 220, 150, 160, 180, 190],
    'category': ['X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'],
    'region': ['North', 'North', 'North', 'North', 'South', 'South', 'South', 'South']
})

# Define hierarchy specification.
# Each inner list names columns of df (not values); the last entry is the
# bottom level and contains every column used by the levels above it.
spec = [
    ['category'],                    # Aggregate by category
    ['region'],                      # Aggregate by region
    ['category', 'region', 'item'],  # Bottom level (no aggregation)
]

# Create hierarchical structure; unique_id is built from the spec columns
Y_df, S_df, tags = aggregate(df, spec)

print("Hierarchical series:")
print(Y_df.head())
print("\nHierarchy tags:")
print(tags)
```

### Temporal Aggregation

```python
import numpy as np
import pandas as pd
from hierarchicalforecast.utils import aggregate_temporal

# Daily data to be aggregated temporally
daily_df = pd.DataFrame({
    'unique_id': ['series1'] * 365,
    'ds': pd.date_range('2020-01-01', periods=365, freq='D'),
    'y': np.random.randn(365).cumsum() + 100
})

# Define temporal aggregation specification
temporal_spec = {
    'Daily': 1,      # Base frequency
    'Weekly': 7,     # Aggregate every 7 days
    'Monthly': 30,   # Aggregate every 30 days
    'Quarterly': 90  # Aggregate every 90 days
}

# Create temporal hierarchy
Y_temporal, S_temporal, tags_temporal = aggregate_temporal(
    daily_df,
    temporal_spec
)
```

### Aggregation with Exogenous Variables

```python
import pandas as pd
from hierarchicalforecast.utils import aggregate

# Data with exogenous variables
df_with_exog = pd.DataFrame({
    'region': ['North', 'North', 'North', 'North'],
    'item': ['A', 'A', 'B', 'B'],
    'ds': pd.date_range('2020-01-01', periods=2, freq='D').tolist() * 2,
    'y': [100, 110, 200, 220],
    'price': [10.5, 10.8, 12.0, 12.3],
    'volume': [1000, 1100, 2000, 2200]
})

# Specify how to aggregate exogenous variables
exog_aggregation = {
    'price': 'mean',  # Average price across aggregated series
    'volume': 'sum'   # Sum volume across aggregated series
}

# Simple aggregation: region totals above the item level.
# Spec entries are column names of the input frame.
spec = [['region'], ['region', 'item']]

Y_df, S_df, tags = aggregate(
    df_with_exog,
    spec,
    exog_vars=exog_aggregation
)
```
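
The effect of an `exog_vars` mapping can be pictured as an ordinary pandas group-by in which the target is summed and each exogenous column gets its own aggregation function. The sketch below shows that idea for a single aggregation level; the column names `region` and `item` are assumptions for this example, and the output column naming here is plain pandas behaviour, not necessarily what `aggregate` returns.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "North", "North"],
    "item": ["A", "A", "B", "B"],
    "ds": pd.to_datetime(["2020-01-01", "2020-01-02"] * 2),
    "y": [100, 110, 200, 220],
    "price": [10.5, 10.8, 12.0, 12.3],
    "volume": [1000, 1100, 2000, 2200],
})

# Same shape as the exog_vars argument: column name -> aggregation function.
exog_vars = {"price": "mean", "volume": "sum"}

# The target is summed; each exogenous column uses its own function.
agg_spec = {"y": "sum", **exog_vars}
by_region = df.groupby(["region", "ds"], as_index=False).agg(agg_spec)

print(by_region)
```

Averaging suits per-unit quantities such as price, while summing suits additive quantities such as volume.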


### Large Hierarchy with Sparse Matrix

```python
from hierarchicalforecast.utils import aggregate

# For very large hierarchies, use sparse matrices.
# large_dataset and complex_spec are placeholders for your own data and spec.
Y_df_sparse, S_sparse, tags_sparse = aggregate(
    large_dataset,
    complex_spec,
    sparse_s=True  # Returns scipy.sparse matrix for S
)

# S_sparse will be a scipy sparse matrix instead of DataFrame
print(f"Sparse matrix shape: {S_sparse.shape}")
print(f"Non-zero elements: {S_sparse.nnz}")
```

### Creating Future Dataframes

```python
from hierarchicalforecast.utils import make_future_dataframe

# Create future timestamps for forecasting.
# historical_data is a placeholder for a frame with unique_id and ds columns.
future_df = make_future_dataframe(
    df=historical_data,
    freq='D',  # Daily frequency
    h=30,      # 30 days ahead
    id_col='unique_id',
    time_col='ds'
)

print("Future timestamps:")
print(future_df.head())
```

### Combined Cross-sectional and Temporal Hierarchies

```python
from hierarchicalforecast.utils import (
    aggregate,
    aggregate_temporal,
    get_cross_temporal_tags,
)

# df, cross_sectional_spec and temporal_spec are placeholders for your own inputs

# First create cross-sectional hierarchy
Y_cs, S_cs, tags_cs = aggregate(df, cross_sectional_spec)

# Then create temporal hierarchy
Y_te, S_te, tags_te = aggregate_temporal(Y_cs, temporal_spec)

# Combine them
Y_cross_temp, tags_cross_temp = get_cross_temporal_tags(
    df=Y_te,
    tags_cs=tags_cs,
    tags_te=tags_te,
    sep='//'
)
```

### Validation

```python
from hierarchicalforecast.utils import is_strictly_hierarchical

# Check if hierarchy is strictly hierarchical
is_strict = is_strictly_hierarchical(S_df, tags)
print(f"Strictly hierarchical: {is_strict}")
```

## Output Utility Functions


Utility functions for converting prediction intervals and samples to different output formats.

```python { .api }
def level_to_outputs(level: list[int]) -> list[str]:
    """
    Convert confidence levels to output column names.

    Parameters:
    - level: list of confidence levels (e.g., [80, 95])

    Returns:
    List of column name strings for low and high bounds
    """

def quantiles_to_outputs(quantiles: list[float]) -> list[str]:
    """
    Convert quantiles to output column names.

    Parameters:
    - quantiles: list of quantile levels (e.g., [0.1, 0.5, 0.9])

    Returns:
    List of quantile column name strings
    """

def samples_to_quantiles_df(
    samples: np.ndarray,
    unique_ids: list,
    dates: list,
    quantiles: list[float],
    id_col: str = 'unique_id',
    time_col: str = 'ds'
) -> pd.DataFrame:
    """
    Transform samples array to quantile DataFrame.

    Parameters:
    - samples: array of forecast samples
    - unique_ids: list of series identifiers
    - dates: list of forecast dates
    - quantiles: list of quantile levels to compute
    - id_col: identifier column name
    - time_col: time column name

    Returns:
    DataFrame with quantile columns
    """
```
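
The relationship between confidence levels, quantiles, and forecast samples can be sketched with NumPy: a central interval at level `L` spans the quantiles `(100 - L) / 200` and `(100 + L) / 200`, and those quantiles can be estimated from sample draws. The `level_to_quantiles` helper, the `lo-`/`hi-` column names, and the sample layout below are assumptions for illustration, not the library's naming or array shapes.

```python
import numpy as np


def level_to_quantiles(level):
    """Hypothetical helper: (level, lower quantile, upper quantile) per level."""
    return [(lv, (100 - lv) / 200, (100 + lv) / 200) for lv in sorted(level)]


# Hypothetical forecast samples: 3 series, 500 draws each.
rng = np.random.default_rng(0)
samples = rng.normal(loc=100.0, scale=10.0, size=(3, 500))

columns = {}
for lv, q_lo, q_hi in level_to_quantiles([95, 80]):
    # Reduce over the draws axis to get one bound per series.
    columns[f"lo-{lv}"] = np.quantile(samples, q_lo, axis=1)
    columns[f"hi-{lv}"] = np.quantile(samples, q_hi, axis=1)

for name, values in columns.items():
    print(name, values.round(2))
```

Wider levels always enclose narrower ones, so the 95% bounds sit outside the 80% bounds for every series.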