# Data Utilities

Utilities for creating hierarchical data structures from bottom-level time series data. These functions handle aggregation across multiple dimensions, create summing matrices, and prepare data in the format required by hierarchical reconciliation methods.

## Capabilities

### Cross-sectional Aggregation

Main function for creating hierarchical structures from bottom-level time series by aggregating across categorical dimensions.

```python { .api }
def aggregate(
    df: Frame,
    spec: list[list[str]],
    exog_vars: Optional[dict[str, Union[str, list[str]]]] = None,
    sparse_s: bool = False,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    id_time_col: Optional[str] = None,
    target_cols: Sequence[str] = ('y',)
) -> tuple[FrameT, FrameT, dict]:
    """
    Create hierarchical structure from bottom-level time series.

    Parameters:
    - df: DataFrame with bottom-level time series data
      Must contain time_col, target_cols, and the categorical columns referenced in spec
    - spec: list of aggregation specifications
      Each inner list names the columns that define the grouping for that level
      Example: [['region'], ['region', 'category']] creates two aggregation levels
    - exog_vars: dict mapping exogenous variable names to aggregation functions
      Example: {'price': 'mean', 'volume': 'sum'}
    - sparse_s: bool, whether to return a sparse summing matrix for memory efficiency
    - id_col: str, name of the unique identifier column
    - time_col: str, name of the time column
    - id_time_col: str, temporal hierarchy identifier (for temporal aggregation)
    - target_cols: tuple of target variable column names

    Returns:
    - Y_df: DataFrame with hierarchically structured series
    - S_df: DataFrame representation of the summing matrix (sparse if sparse_s=True)
    - tags: dict mapping hierarchy level names to the series identifiers in each level
    """
```

### Temporal Aggregation

Function for creating temporal hierarchies by aggregating time series at different frequencies.

```python { .api }
def aggregate_temporal(
    df: Frame,
    spec: dict[str, int],
    exog_vars: Optional[dict[str, Union[str, list[str]]]] = None,
    sparse_s: bool = False,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    id_time_col: str = 'temporal_id',
    target_cols: Sequence[str] = ('y',),
    aggregation_type: str = 'local'
) -> tuple[FrameT, FrameT, dict]:
    """
    Create temporal hierarchy from time series data.

    Parameters:
    - df: DataFrame with time series data at the base frequency
    - spec: dict mapping temporal level names to the number of base-frequency
      timesteps aggregated at that level
      Example (monthly base data): {'Annual': 12, 'Quarterly': 3, 'Monthly': 1}
    - exog_vars: dict of exogenous variables and their aggregation functions
    - sparse_s: bool, return sparse summing matrix
    - id_col: str, unique identifier column name
    - time_col: str, time column name
    - id_time_col: str, temporal hierarchy identifier column name
    - target_cols: tuple of target variable names
    - aggregation_type: str, type of temporal aggregation ('local' or 'global')

    Returns:
    - Y_df: DataFrame with temporal hierarchy
    - S_df: Temporal summing matrix
    - tags: dict mapping temporal levels to identifiers
    """
```

### Future Dataframe Creation

Utility for creating future timestamp dataframes for forecasting.

```python { .api }
def make_future_dataframe(
    df: Frame,
    freq: Union[str, int],
    h: int,
    id_col: str = 'unique_id',
    time_col: str = 'ds'
) -> FrameT:
    """
    Create dataframe with future timestamps for forecasting.

    Parameters:
    - df: DataFrame with historical time series data
    - freq: frequency string (e.g., 'D', 'W', 'M') or integer step for integer-indexed time
    - h: int, forecast horizon (number of periods ahead)
    - id_col: str, unique identifier column name
    - time_col: str, time column name

    Returns:
    DataFrame with future timestamps for each series
    """
```

### Cross-Temporal Tags

Function for generating tags that combine cross-sectional and temporal hierarchies.

```python { .api }
def get_cross_temporal_tags(
    df: pd.DataFrame,
    tags_cs: dict,
    tags_te: dict,
    sep: str = '//',
    id_col: str = 'unique_id',
    id_time_col: str = 'temporal_id',
    cross_temporal_id_col: str = 'cross_temporal_id'
) -> tuple[pd.DataFrame, dict]:
    """
    Generate cross-temporal hierarchy tags.

    Parameters:
    - df: DataFrame with cross-temporal data
    - tags_cs: dict with cross-sectional hierarchy tags
    - tags_te: dict with temporal hierarchy tags
    - sep: str, separator for combining cross-sectional and temporal identifiers
    - id_col: str, cross-sectional identifier column
    - id_time_col: str, temporal identifier column
    - cross_temporal_id_col: str, combined identifier column name

    Returns:
    - Updated DataFrame with cross-temporal identifiers
    - Combined tags dictionary for cross-temporal hierarchy
    """
```

### Hierarchy Structure Validation

Utility function to check if a hierarchy structure is strictly hierarchical.

```python { .api }
def is_strictly_hierarchical(S: pd.DataFrame, tags: dict) -> bool:
    """
    Check if hierarchy structure is strictly hierarchical.

    Parameters:
    - S: summing matrix DataFrame
    - tags: hierarchy tags dictionary

    Returns:
    bool indicating whether structure is strictly hierarchical
    """
```

## Usage Examples

### Basic Cross-sectional Aggregation

```python
import pandas as pd
from hierarchicalforecast.utils import aggregate

# Bottom-level data: the categorical columns place each series in the hierarchy
df = pd.DataFrame({
    'ds': pd.date_range('2020-01-01', periods=2, freq='D').tolist() * 4,
    'y': [100, 110, 200, 220, 150, 160, 180, 190],
    'region': ['North', 'North', 'North', 'North', 'South', 'South', 'South', 'South'],
    'category': ['X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'],
    'item': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
})

# Define hierarchy specification: each inner list names the columns for one level,
# from the most aggregated to the most disaggregated (bottom) level
spec = [
    ['region'],                      # Aggregate by region
    ['region', 'category'],          # Aggregate by region and category
    ['region', 'category', 'item'],  # Bottom level
]

# Create hierarchical structure
Y_df, S_df, tags = aggregate(df, spec)

print("Hierarchical series:")
print(Y_df.head())
print("\nHierarchy tags:")
print(tags)
```
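
To see how the three outputs fit together, the sketch below checks the aggregation constraint Y_agg = S @ y_bottom on the example above. It assumes the common layout in which `S_df` stores the series names in a `unique_id` column with one column per bottom-level series; if your version lays out `S_df` differently, adjust the indexing accordingly.

```python
import numpy as np

# Summing matrix: aggregate series as rows, bottom-level series as columns
# (assumes S_df keeps series names in a 'unique_id' column)
S = S_df.set_index('unique_id')

# Pivot the hierarchical series to a (series x time) matrix
Y_wide = Y_df.pivot(index='unique_id', columns='ds', values='y')

# Every series should equal the summing matrix applied to the bottom-level series
lhs = Y_wide.loc[S.index].to_numpy()
rhs = S.to_numpy() @ Y_wide.loc[S.columns].to_numpy()
print(np.allclose(lhs, rhs))  # expected: True
```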

### Temporal Aggregation

```python
import numpy as np
import pandas as pd

from hierarchicalforecast.utils import aggregate_temporal

# Daily data to be aggregated temporally
daily_df = pd.DataFrame({
    'unique_id': ['series1'] * 365,
    'ds': pd.date_range('2020-01-01', periods=365, freq='D'),
    'y': np.random.randn(365).cumsum() + 100
})

# Define temporal aggregation specification: each value is the number of
# base (daily) timesteps aggregated at that level
temporal_spec = {
    'Daily': 1,       # Base frequency
    'Weekly': 7,      # Aggregate every 7 days
    'Monthly': 30,    # Aggregate every 30 days
    'Quarterly': 90   # Aggregate every 90 days
}

# Create temporal hierarchy
Y_temporal, S_temporal, tags_temporal = aggregate_temporal(
    daily_df,
    temporal_spec
)
```

### Aggregation with Exogenous Variables

```python
# Data with exogenous variables
df_with_exog = pd.DataFrame({
    'ds': pd.date_range('2020-01-01', periods=2, freq='D').tolist() * 2,
    'y': [100, 110, 200, 220],
    'region': ['North', 'North', 'North', 'North'],
    'item': ['A', 'A', 'B', 'B'],
    'price': [10.5, 10.8, 12.0, 12.3],
    'volume': [1000, 1100, 2000, 2200]
})

# Specify how to aggregate exogenous variables
exog_aggregation = {
    'price': 'mean',  # Average price across aggregated series
    'volume': 'sum'   # Sum volume across aggregated series
}

spec = [['region'], ['region', 'item']]  # Aggregate items A and B up to their region

Y_df, S_df, tags = aggregate(
    df_with_exog,
    spec,
    exog_vars=exog_aggregation
)
```

### Large Hierarchy with Sparse Matrix

```python
# For very large hierarchies, use a sparse summing matrix
Y_df_sparse, S_sparse, tags_sparse = aggregate(
    large_dataset,   # your bottom-level DataFrame
    complex_spec,    # your aggregation specification
    sparse_s=True    # store the summing matrix in a sparse representation
)

# S_sparse holds the summing matrix in a memory-efficient sparse representation
print(f"Sparse summing matrix shape: {S_sparse.shape}")
```
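
Downstream code that expects a dense array can convert whichever sparse representation is returned. This is a minimal sketch, not part of the library: the helper name `summing_matrix_to_array` is hypothetical, and it assumes the summing structure is either a scipy sparse matrix or a (possibly sparse-backed) DataFrame with a `unique_id` column, as in the examples above.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Hypothetical helper: densify the summing structure regardless of representation.
def summing_matrix_to_array(S):
    if sp.issparse(S):
        # scipy sparse matrix
        return S.toarray()
    if isinstance(S, pd.DataFrame):
        # DataFrame, possibly with sparse-backed columns; drop the id column if present
        return S.drop(columns='unique_id', errors='ignore').to_numpy()
    return np.asarray(S)

S_dense = summing_matrix_to_array(S_sparse)
print(S_dense.shape)
```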

### Creating Future Dataframes

```python
from hierarchicalforecast.utils import make_future_dataframe

# Create future timestamps for forecasting
future_df = make_future_dataframe(
    df=historical_data,
    freq='D',            # Daily frequency
    h=30,                # 30 days ahead
    id_col='unique_id',
    time_col='ds'
)

print("Future timestamps:")
print(future_df.head())
```

### Combined Cross-sectional and Temporal Hierarchies

```python
from hierarchicalforecast.utils import get_cross_temporal_tags

# First create cross-sectional hierarchy
Y_cs, S_cs, tags_cs = aggregate(df, cross_sectional_spec)

# Then create temporal hierarchy
Y_te, S_te, tags_te = aggregate_temporal(Y_cs, temporal_spec)

# Combine them
Y_cross_temp, tags_cross_temp = get_cross_temporal_tags(
    df=Y_te,
    tags_cs=tags_cs,
    tags_te=tags_te,
    sep='//'
)
```
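
A quick way to inspect the combined structure; this only assumes that `tags_cross_temp` is a plain dict mapping level names to the identifiers in each level, as documented above.

```python
# One entry per combined (cross-sectional level, temporal level) pair
for level, ids in tags_cross_temp.items():
    print(f"{level}: {len(ids)} series")
```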

### Validation

```python
from hierarchicalforecast.utils import is_strictly_hierarchical

# Check if hierarchy is strictly hierarchical
is_strict = is_strictly_hierarchical(S_df, tags)
print(f"Strictly hierarchical: {is_strict}")
```

## Output Utility Functions

Utility functions for converting prediction intervals and samples to different output formats.

```python { .api }
def level_to_outputs(level: list[int]) -> list[str]:
    """
    Convert confidence levels to output column names.

    Parameters:
    - level: list of confidence levels (e.g., [80, 95])

    Returns:
    List of column name strings for low and high bounds
    """

def quantiles_to_outputs(quantiles: list[float]) -> list[str]:
    """
    Convert quantiles to output column names.

    Parameters:
    - quantiles: list of quantile levels (e.g., [0.1, 0.5, 0.9])

    Returns:
    List of quantile column name strings
    """

def samples_to_quantiles_df(
    samples: np.ndarray,
    unique_ids: list,
    dates: list,
    quantiles: list[float],
    id_col: str = 'unique_id',
    time_col: str = 'ds'
) -> pd.DataFrame:
    """
    Transform samples array to quantile DataFrame.

    Parameters:
    - samples: array of forecast samples
    - unique_ids: list of series identifiers
    - dates: list of forecast dates
    - quantiles: list of quantile levels to compute
    - id_col: identifier column name
    - time_col: time column name

    Returns:
    DataFrame with quantile columns
    """
```
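
A short usage sketch for these helpers, following the signatures documented above. It assumes the functions are importable from `hierarchicalforecast.utils` like the other utilities on this page, and that the sample array is shaped (n_samples, n_series, horizon); exact return types and column naming can differ between versions, so treat this as illustrative rather than authoritative.

```python
import numpy as np
import pandas as pd
from hierarchicalforecast.utils import (
    level_to_outputs,
    quantiles_to_outputs,
    samples_to_quantiles_df,
)

# Column names for 80% / 95% prediction intervals and for explicit quantiles
print(level_to_outputs([80, 95]))
print(quantiles_to_outputs([0.1, 0.5, 0.9]))

# Convert an array of forecast samples into a quantile DataFrame
series = ['North/X/A', 'North/X/B']   # illustrative series identifiers
horizon = 3
samples = np.random.randn(1000, len(series), horizon)  # assumed sample layout
dates = pd.date_range('2020-01-03', periods=horizon, freq='D')

quantiles_df = samples_to_quantiles_df(
    samples=samples,
    unique_ids=series,
    dates=dates,
    quantiles=[0.1, 0.5, 0.9],
)
```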