# Label Generation for Machine Learning

The labels module provides look-ahead tools that turn future price movements into labels, that is, target variables for supervised learning on financial time series. Because every label is computed from data that follows the bar it is attached to, labels are suitable as training targets only and must never be used as model features.

## Capabilities

### Future Statistical Measures

Generators for statistics computed over a future (forward-looking) window, commonly used as targets in regression and forecasting tasks.

```python { .api }
class FMEAN:
    """
    Future mean label generator.

    Calculates the mean of future values over a specified window,
    useful for predicting future average prices or returns.
    """

    @classmethod
    def run(cls, close, window, **kwargs):
        """
        Calculate future mean labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, forward-looking window size
        - pct_change: bool, use percentage change (default: False)

        Returns:
            FMEAN: Label generator with fmean attribute
        """

class FSTD:
    """
    Future standard deviation label generator.

    Calculates the standard deviation of future values over a window,
    useful for volatility prediction and risk modeling.
    """

    @classmethod
    def run(cls, close, window, **kwargs):
        """
        Calculate future standard deviation labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, forward-looking window size
        - pct_change: bool, use percentage change (default: False)
        - ddof: int, degrees of freedom (default: 1)

        Returns:
            FSTD: Label generator with fstd attribute
        """

class FMIN:
    """
    Future minimum label generator.

    Finds the minimum value over future time windows,
    useful for support level prediction and drawdown analysis.
    """

    @classmethod
    def run(cls, close, window, **kwargs):
        """
        Calculate future minimum labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, forward-looking window size
        - pct_change: bool, use percentage change from current (default: False)

        Returns:
            FMIN: Label generator with fmin attribute
        """

class FMAX:
    """
    Future maximum label generator.

    Finds the maximum value over future time windows,
    useful for resistance level prediction and profit target analysis.
    """

    @classmethod
    def run(cls, close, window, **kwargs):
        """
        Calculate future maximum labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, forward-looking window size
        - pct_change: bool, use percentage change from current (default: False)

        Returns:
            FMAX: Label generator with fmax attribute
        """
```

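To make the forward-looking window concrete, the sketch below reproduces the idea with plain pandas. It assumes the window covers the next `window` bars, excludes the current bar, and yields NaN where the window runs past the end of the data; the library's exact alignment and its `pct_change` handling may differ. `future_window_stat` is a hypothetical helper, not part of the module.

```python
import pandas as pd


def future_window_stat(close: pd.Series, window: int, stat: str = "mean") -> pd.Series:
    """Apply `stat` to the `window` bars that follow each bar (hypothetical helper).

    Assumption: the current bar is excluded and incomplete windows yield NaN.
    """
    # A trailing window that ends at bar t + window covers bars t+1 .. t+window,
    # so shifting the trailing result back by `window` aligns it with bar t.
    trailing = getattr(close.rolling(window), stat)()
    return trailing.shift(-window)


close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
fmean = future_window_stat(close, window=2, stat="mean")
fmax = future_window_stat(close, window=2, stat="max")
```

The last `window` rows are NaN because their future is unknown; drop them before training rather than filling them.
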
### Fixed and Mean-Based Labels

Simple labeling methods for basic classification and regression tasks.

```python { .api }
class FIXLB:
    """
    Fixed label generator.

    Generates constant labels across all time periods,
    useful for baseline models and control experiments.
    """

    @classmethod
    def run(cls, shape, value=1, **kwargs):
        """
        Generate fixed labels.

        Parameters:
        - shape: tuple, output shape (n_rows, n_cols)
        - value: scalar, fixed label value
        - dtype: data type for labels

        Returns:
            FIXLB: Label generator with fixed labels
        """

class MEANLB:
    """
    Mean-based label generator.

    Generates labels based on deviations from mean values,
    useful for mean reversion strategies and anomaly detection.
    """

    @classmethod
    def run(cls, close, window, threshold=0, **kwargs):
        """
        Generate mean-based labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, rolling window for mean calculation
        - threshold: float, threshold for label generation
        - above: bool, label when above mean (default: True)

        Returns:
            MEANLB: Label generator with mean-based labels
        """
```

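The parameter list for `MEANLB` does not say how the deviation is measured. As an illustration only, the sketch below assumes a relative deviation from a trailing rolling mean and flags bars that exceed `threshold` on the chosen side. `mean_deviation_labels` is a hypothetical helper written with plain pandas.

```python
import pandas as pd


def mean_deviation_labels(close: pd.Series, window: int,
                          threshold: float = 0.0, above: bool = True) -> pd.Series:
    """Flag bars whose relative deviation from the rolling mean exceeds `threshold`.

    Assumption: deviation = close / rolling_mean - 1 (hypothetical definition).
    """
    rolling_mean = close.rolling(window).mean()
    deviation = close / rolling_mean - 1.0  # NaN during the warm-up period
    if above:
        return deviation > threshold  # comparisons with NaN evaluate to False
    return deviation < -threshold


close = pd.Series([100.0, 100.0, 100.0, 110.0, 90.0])
labels_above = mean_deviation_labels(close, window=3, threshold=0.02)
labels_below = mean_deviation_labels(close, window=3, threshold=0.02, above=False)
```

Warm-up bars come out as `False` here because NaN comparisons are false; mask them explicitly if "unknown" should be kept distinct from "not flagged".
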
### Lexicographic and Ranking Labels

Advanced labeling methods for ranking and relative performance analysis.

```python { .api }
class LEXLB:
    """
    Lexicographic label generator.

    Generates labels based on lexicographic ordering of multiple criteria,
    useful for multi-objective optimization and ranking problems.
    """

    @classmethod
    def run(cls, *args, **kwargs):
        """
        Generate lexicographic labels.

        Parameters:
        - args: sequence of arrays for lexicographic comparison
        - descending: bool, use descending order (default: False)

        Returns:
            LEXLB: Label generator with lexicographic rankings
        """
```

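Lexicographic ordering compares items on the first criterion and consults later criteria only to break ties. The sketch below shows that ordering with `numpy.lexsort`; it illustrates the concept rather than the module's implementation, and `lexicographic_ranks` is a hypothetical helper.

```python
import numpy as np


def lexicographic_ranks(*criteria, descending: bool = False) -> np.ndarray:
    """Rank items by the first criterion, breaking ties with later ones.

    Hypothetical helper: rank 0 is the first item in the resulting order.
    """
    keys = [np.asarray(c, dtype=float) for c in criteria]
    if descending:
        keys = [-k for k in keys]
    # np.lexsort treats the LAST key as the primary one, so reverse the list.
    order = np.lexsort(keys[::-1])
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(len(order))
    return ranks


returns = [0.05, 0.02, 0.05, 0.01]
sharpe = [1.0, 2.0, 1.5, 0.5]
# Items 0 and 2 tie on returns; the higher Sharpe value (item 2) ranks first.
ranks = lexicographic_ranks(returns, sharpe, descending=True)
```
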
### Trend-Based Labels

Sophisticated trend analysis and classification for directional predictions.

```python { .api }
class TRENDLB:
    """
    Trend-based label generator.

    Analyzes price trends over various time horizons and generates
    labels for trend direction, strength, and continuation patterns.
    """

    @classmethod
    def run(cls, close, window=20, mode='binary', **kwargs):
        """
        Generate trend-based labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, trend analysis window
        - mode: str, trend mode (see TrendMode enum)
        - min_pct_change: float, minimum change for trend (default: 0.01)
        - smooth_window: int, smoothing window for trend (default: None)

        Returns:
            TRENDLB: Label generator with trend labels
        """

class TrendMode(IntEnum):
    """
    Trend calculation modes for TRENDLB.

    Defines different methods for calculating and categorizing trends
    in financial time series data.
    """
    Binary = 0          # Simple up/down binary classification
    BinaryCont = 1      # Binary with continuation signals
    BinaryContSat = 2   # Binary with continuation and saturation
    PctChange = 3       # Percentage change-based trends
    PctChangeNorm = 4   # Normalized percentage change trends
```

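The usage examples in this document pass `mode` as lowercase strings such as `'binary_cont'`, while the enum members are written in CamelCase. How the library maps one to the other is not specified here. The sketch below shows one plausible normalization; `resolve_trend_mode` is a hypothetical helper, and the enum is repeated only to keep the block self-contained.

```python
from enum import IntEnum


class TrendMode(IntEnum):
    # Copied from the enum listed above so the sketch runs on its own.
    Binary = 0
    BinaryCont = 1
    BinaryContSat = 2
    PctChange = 3
    PctChangeNorm = 4


def resolve_trend_mode(mode) -> TrendMode:
    """Map an int, enum member, or string like 'binary_cont' to a TrendMode.

    Hypothetical helper: strings are matched case-insensitively with
    underscores ignored.
    """
    if isinstance(mode, str):
        wanted = mode.replace("_", "").lower()
        for member in TrendMode:
            if member.name.lower() == wanted:
                return member
        raise ValueError(f"Unknown trend mode: {mode!r}")
    return TrendMode(mode)


resolved = [resolve_trend_mode(m) for m in ("binary", "pct_change", "BinaryCont", 2)]
```
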
### Binary Outcome Labels

Specialized generators for binary classification tasks in trading applications.

```python { .api }
class BOLB:
    """
    Binary outcome label generator.

    Generates binary labels for classification tasks such as
    profitable/unprofitable trades or directional movements.
    """

    @classmethod
    def run(cls, close, window, threshold=0, **kwargs):
        """
        Generate binary outcome labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, forward-looking window for outcome
        - threshold: float, threshold for binary classification
        - return_type: str, type of return calculation ('simple', 'log')
        - min_periods: int, minimum periods for valid calculation

        Returns:
            BOLB: Label generator with binary outcome labels
        """
```

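A binary outcome label can be pictured as a thresholded forward return. The sketch below assumes the outcome is judged on the return at the end of the window; a generator could instead test whether the threshold is touched at any point inside the window, and this document does not say which applies. `binary_outcome_labels` is a hypothetical helper built on pandas and NumPy.

```python
import numpy as np
import pandas as pd


def binary_outcome_labels(close: pd.Series, window: int, threshold: float = 0.0,
                          return_type: str = "simple") -> pd.Series:
    """Return 1.0 where the forward return over `window` bars exceeds `threshold`.

    Hypothetical helper: rows whose future price is unknown stay NaN.
    """
    future = close.shift(-window)  # price `window` bars ahead
    if return_type == "simple":
        fwd_return = future / close - 1.0
    elif return_type == "log":
        fwd_return = np.log(future / close)
    else:
        raise ValueError(f"Unsupported return_type: {return_type!r}")
    labels = (fwd_return > threshold).astype(float)
    # Keep "unknown" distinct from "negative outcome".
    return labels.where(fwd_return.notna())


close = pd.Series([100.0, 102.0, 101.0, 108.0, 110.0])
labels = binary_outcome_labels(close, window=2, threshold=0.05)
log_labels = binary_outcome_labels(close, window=2, threshold=0.05, return_type="log")
```
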
## Usage Examples

### Basic Future Labels

```python
import vectorbt as vbt
import pandas as pd

# Download data
data = vbt.YFData.download("AAPL", start="2020-01-01", end="2023-01-01")
close = data.get("Close")

# Generate future statistical labels
future_mean = vbt.FMEAN.run(close, window=5)
future_std = vbt.FSTD.run(close, window=10)
future_min = vbt.FMIN.run(close, window=20, pct_change=True)
future_max = vbt.FMAX.run(close, window=20, pct_change=True)

# Access label values
mean_labels = future_mean.fmean
std_labels = future_std.fstd
min_labels = future_min.fmin  # Future minimum % change
max_labels = future_max.fmax  # Future maximum % change
```

### Trend Analysis Labels

```python
# Generate trend-based labels with different modes
trend_binary = vbt.TRENDLB.run(
    close,
    window=20,
    mode='binary'
)

trend_pct = vbt.TRENDLB.run(
    close,
    window=20,
    mode='pct_change',
    min_pct_change=0.02  # 2% minimum change
)

trend_smooth = vbt.TRENDLB.run(
    close,
    window=20,
    mode='binary_cont',
    smooth_window=5
)

# Access trend labels
binary_trends = trend_binary.trend
pct_trends = trend_pct.trend
smooth_trends = trend_smooth.trend
```

### Classification Labels for ML

```python
# Binary outcome labels for profitable trades
profitable_trades = vbt.BOLB.run(
    close,
    window=10,       # 10-day forward window
    threshold=0.05,  # 5% profit threshold
    return_type='simple'
)

# Mean reversion labels
mean_reversion = vbt.MEANLB.run(
    close,
    window=20,       # 20-day rolling mean
    threshold=0.02,  # 2% deviation threshold
    above=True       # Label when above mean
)

# Access binary labels
profit_labels = profitable_trades.labels    # True for profitable periods
reversion_labels = mean_reversion.labels    # True when above mean
```

### Multi-Asset Label Generation

```python
# Download multiple assets
symbols = ["AAPL", "GOOGL", "MSFT", "TSLA"]
data = vbt.YFData.download(symbols, start="2020-01-01", end="2023-01-01")
close = data.get("Close")

# Generate labels for all assets
future_returns = {}
trend_labels = {}

for symbol in symbols:
    # Future return labels
    future_returns[symbol] = vbt.FMEAN.run(
        close[symbol],
        window=5,
        pct_change=True
    ).fmean

    # Trend labels
    trend_labels[symbol] = vbt.TRENDLB.run(
        close[symbol],
        window=20,
        mode='binary'
    ).trend

# Combine into DataFrames
future_returns_df = pd.DataFrame(future_returns)
trend_labels_df = pd.DataFrame(trend_labels)
```

### Labels for Strategy Development

```python
# Generate labels for different time horizons
short_term = vbt.FMAX.run(close, window=5, pct_change=True)    # 5-day max return
medium_term = vbt.FMAX.run(close, window=20, pct_change=True)  # 20-day max return
long_term = vbt.FMAX.run(close, window=60, pct_change=True)    # 60-day max return

# Create multi-horizon labels
horizon_labels = pd.DataFrame({
    'short_max': short_term.fmax,
    'medium_max': medium_term.fmax,
    'long_max': long_term.fmax
})

# Classification thresholds
horizon_labels['short_profitable'] = horizon_labels['short_max'] > 0.03
horizon_labels['medium_profitable'] = horizon_labels['medium_max'] > 0.10
horizon_labels['long_profitable'] = horizon_labels['long_max'] > 0.25
```

### Advanced ML Pipeline

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Generate features (indicators)
ma_20 = vbt.MA.run(close, 20).ma
ma_50 = vbt.MA.run(close, 50).ma
rsi = vbt.RSI.run(close, 14).rsi
macd = vbt.MACD.run(close)

# Create feature matrix
features = pd.DataFrame({
    'ma_ratio': ma_20 / ma_50,
    'rsi': rsi,
    'macd': macd.macd,
    'macd_signal': macd.signal,
    'returns_5d': close.pct_change(5),
    'volatility': close.rolling(20).std()
})

# Generate labels
target = vbt.BOLB.run(
    close,
    window=10,
    threshold=0.05,  # 5% profit in next 10 days
    return_type='simple'
).labels

# Prepare data for ML
X = features.dropna()
y = target.reindex(X.index).dropna()

# Align X and y
common_index = X.index.intersection(y.index)
X = X.loc[common_index]
y = y.loc[common_index]

# Train-test split
# shuffle=False keeps time order; a shuffled split would place bars from the
# test period's future in the training set and inflate the test score.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"Train Score: {train_score:.3f}")
print(f"Test Score: {test_score:.3f}")
```

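Forward-looking labels overlap in time: the label at bar t depends on prices up to t + `window`. A split that keeps time order and leaves a gap of at least `window` bars between the training and test sets avoids training on labels computed from test-period prices. The sketch below shows one way to do this with pandas alone; `chronological_split` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
import pandas as pd


def chronological_split(X: pd.DataFrame, y: pd.Series,
                        test_size: float = 0.2, gap: int = 0):
    """Split by position in time and drop `gap` rows before the test set.

    Hypothetical helper: set `gap` to the label's look-ahead window.
    """
    n_test = int(round(len(X) * test_size))
    split = len(X) - n_test            # first row of the test set
    train_end = max(split - gap, 0)    # rows in [train_end, split) are discarded
    return (X.iloc[:train_end], X.iloc[split:],
            y.iloc[:train_end], y.iloc[split:])


# Synthetic, time-indexed data for illustration only
index = pd.date_range("2022-01-03", periods=100, freq="B")
X = pd.DataFrame({"feature": np.arange(100.0)}, index=index)
y = pd.Series(np.arange(100) % 2, index=index)

X_train, X_test, y_train, y_test = chronological_split(X, y, test_size=0.2, gap=10)
```

With a 10-bar label window, the 10 rows just before the test set are discarded, so no training label is computed from a price inside the test period.
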
### Custom Label Generators

```python
class CustomVolatilityLabel:
    """Custom label for volatility regime classification."""

    @classmethod
    def run(cls, close, short_window=5, long_window=20, threshold=1.5):
        # Calculate short and long-term volatility
        short_vol = close.rolling(short_window).std()
        long_vol = close.rolling(long_window).std()

        # Volatility ratio
        vol_ratio = short_vol / long_vol

        # Classify regime
        labels = pd.Series(0, index=close.index)  # Low volatility
        labels[vol_ratio > threshold] = 1         # High volatility
        labels[vol_ratio > threshold * 1.5] = 2   # Very high volatility

        return labels

# Use custom label generator
vol_labels = CustomVolatilityLabel.run(close)
```