# Label Generation for Machine Learning

Look-ahead tools that derive labels from future price movements, turning financial time series into training targets for supervised machine learning. The labels module provides generators for future statistics, fixed and mean-based labels, ranking labels, trend labels, and binary outcome labels. Because each label is computed from data that comes after its timestamp, use labels only as prediction targets, never as model features.

## Capabilities

### Future Statistical Measures

Generators for statistical measures computed over future time windows, commonly used for regression and forecasting tasks.

```python { .api }
class FMEAN:
    """
    Future mean label generator.

    Calculates the mean of future values over a specified window,
    useful for predicting future average prices or returns.
    """

    @classmethod
    def run(cls, close, window, **kwargs):
        """
        Calculate future mean labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, forward-looking window size
        - pct_change: bool, use percentage change (default: False)

        Returns:
        FMEAN: Label generator with fmean attribute
        """

class FSTD:
    """
    Future standard deviation label generator.

    Calculates the standard deviation of future values over a window,
    useful for volatility prediction and risk modeling.
    """

    @classmethod
    def run(cls, close, window, **kwargs):
        """
        Calculate future standard deviation labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, forward-looking window size
        - pct_change: bool, use percentage change (default: False)
        - ddof: int, delta degrees of freedom (default: 1)

        Returns:
        FSTD: Label generator with fstd attribute
        """

class FMIN:
    """
    Future minimum label generator.

    Finds the minimum value over future time windows,
    useful for support level prediction and drawdown analysis.
    """

    @classmethod
    def run(cls, close, window, **kwargs):
        """
        Calculate future minimum labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, forward-looking window size
        - pct_change: bool, use percentage change from current (default: False)

        Returns:
        FMIN: Label generator with fmin attribute
        """

class FMAX:
    """
    Future maximum label generator.

    Finds the maximum value over future time windows,
    useful for resistance level prediction and profit target analysis.
    """

    @classmethod
    def run(cls, close, window, **kwargs):
        """
        Calculate future maximum labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, forward-looking window size
        - pct_change: bool, use percentage change from current (default: False)

        Returns:
        FMAX: Label generator with fmax attribute
        """
```
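To make the look-ahead semantics concrete, here is a plain-Python sketch of what these four generators compute. It is not vectorbt's implementation: the helper `future_window_stats` is hypothetical, and it assumes that the window starts at the bar after the current one, that `pct_change` expresses every future value relative to the current price, and that bars without a full future window are left empty (`None`).

```python
from statistics import mean


def future_window_stats(close, window, pct_change=False, ddof=1):
    """Hypothetical sketch: look-ahead mean, std, min and max over the next `window` bars.

    The window starts at the bar after the current one (the current bar is
    excluded); bars without a full future window get None.
    """
    stats = {"fmean": [], "fstd": [], "fmin": [], "fmax": []}
    for i, price in enumerate(close):
        future = close[i + 1:i + 1 + window]  # strictly after bar i
        if len(future) < window:
            for values in stats.values():
                values.append(None)
            continue
        if pct_change:
            # Express each future price as a change relative to the current price
            future = [v / price - 1 for v in future]
        m = mean(future)
        stats["fmean"].append(m)
        # Standard deviation with `ddof` delta degrees of freedom (undefined if too few points)
        if len(future) > ddof:
            var = sum((v - m) ** 2 for v in future) / (len(future) - ddof)
            stats["fstd"].append(var ** 0.5)
        else:
            stats["fstd"].append(None)
        stats["fmin"].append(min(future))
        stats["fmax"].append(max(future))
    return stats


labels = future_window_stats([10, 11, 12, 13, 14], window=2)
print(labels["fmean"])  # [11.5, 12.5, 13.5, None, None]
print(labels["fmax"])   # [12, 13, 14, None, None]
```

The `None` tail is the practical consequence to watch for: the last `window` rows have no label and must be dropped before training.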

### Fixed and Mean-Based Labels

Simple labeling methods for basic classification and regression tasks.

```python { .api }
class FIXLB:
    """
    Fixed label generator.

    Generates constant labels across all time periods,
    useful for baseline models and control experiments.
    """

    @classmethod
    def run(cls, shape, value=1, **kwargs):
        """
        Generate fixed labels.

        Parameters:
        - shape: tuple, output shape (n_rows, n_cols)
        - value: scalar, fixed label value
        - dtype: data type for labels

        Returns:
        FIXLB: Label generator with fixed labels
        """

class MEANLB:
    """
    Mean-based label generator.

    Generates labels based on deviations from mean values,
    useful for mean reversion strategies and anomaly detection.
    """

    @classmethod
    def run(cls, close, window, threshold=0, **kwargs):
        """
        Generate mean-based labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, rolling window for mean calculation
        - threshold: float, threshold for label generation
        - above: bool, label when above mean (default: True)

        Returns:
        MEANLB: Label generator with mean-based labels
        """
```
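The mean-deviation rule described for `MEANLB` can be sketched in plain Python as follows. The helper `mean_deviation_labels` is hypothetical and is not vectorbt's code; it assumes a trailing rolling mean that includes the current bar (the convention of pandas `rolling().mean()`) and treats `threshold` as a fractional deviation from that mean.

```python
def mean_deviation_labels(close, window, threshold=0.0, above=True):
    """Hypothetical sketch: flag bars that deviate from their rolling mean by more than `threshold`."""
    labels = []
    for i, price in enumerate(close):
        if i + 1 < window:
            labels.append(None)  # not enough history for a full rolling window
            continue
        rolling_mean = sum(close[i + 1 - window:i + 1]) / window  # includes bar i
        deviation = price / rolling_mean - 1
        if above:
            labels.append(deviation > threshold)
        else:
            labels.append(deviation < -threshold)
    return labels


print(mean_deviation_labels([10, 10, 10, 12, 9], window=3, threshold=0.02))
# [None, None, False, True, False]
```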

### Lexicographic and Ranking Labels

Labeling methods for ranking and relative performance analysis.

```python { .api }
class LEXLB:
    """
    Lexicographic label generator.

    Generates labels based on lexicographic ordering of multiple criteria,
    useful for multi-objective optimization and ranking problems.
    """

    @classmethod
    def run(cls, *args, **kwargs):
        """
        Generate lexicographic labels.

        Parameters:
        - args: sequence of arrays for lexicographic comparison
        - descending: bool, use descending order (default: False)

        Returns:
        LEXLB: Label generator with lexicographic rankings
        """
```

### Trend-Based Labels

Trend analysis and classification for directional prediction.

```python { .api }
class TRENDLB:
    """
    Trend-based label generator.

    Analyzes price trends over various time horizons and generates
    labels for trend direction, strength, and continuation patterns.
    """

    @classmethod
    def run(cls, close, window=20, mode='binary', **kwargs):
        """
        Generate trend-based labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, trend analysis window
        - mode: str, trend mode (see TrendMode enum)
        - min_pct_change: float, minimum change for trend (default: 0.01)
        - smooth_window: int, smoothing window for trend (default: None)

        Returns:
        TRENDLB: Label generator with trend labels
        """

class TrendMode(IntEnum):
    """
    Trend calculation modes for TRENDLB.

    Defines different methods for calculating and categorizing trends
    in financial time series data.
    """
    Binary = 0  # Simple up/down binary classification
    BinaryCont = 1  # Binary with continuation signals
    BinaryContSat = 2  # Binary with continuation and saturation
    PctChange = 3  # Percentage change-based trends
    PctChangeNorm = 4  # Normalized percentage change trends
```
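A minimal sketch of a binary up/down trend label, in the spirit of the `Binary` mode and the `min_pct_change` parameter above. The helper `binary_trend_labels` is hypothetical, not the `TRENDLB` algorithm: it compares the price `window` bars ahead with the current price, and it leaves moves smaller than `min_pct_change` in either direction unlabeled (`None`) rather than forcing them into a class, which is one design choice among several.

```python
def binary_trend_labels(close, window, min_pct_change=0.01):
    """Hypothetical sketch: 1 = up move, 0 = down move, None = too small or not yet known."""
    labels = []
    for i, price in enumerate(close):
        j = i + window
        if j >= len(close):
            labels.append(None)  # the future price is not available yet
            continue
        change = close[j] / price - 1
        if change >= min_pct_change:
            labels.append(1)
        elif change <= -min_pct_change:
            labels.append(0)
        else:
            labels.append(None)  # move within the noise band
    return labels


print(binary_trend_labels([100, 101, 103, 100, 98], window=2))
# [1, None, 0, None, None]
```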

### Binary Outcome Labels

Specialized generators for binary classification tasks in trading applications.

```python { .api }
class BOLB:
    """
    Binary outcome label generator.

    Generates binary labels for classification tasks such as
    profitable/unprofitable trades or directional movements.
    """

    @classmethod
    def run(cls, close, window, threshold=0, **kwargs):
        """
        Generate binary outcome labels.

        Parameters:
        - close: pd.Series or pd.DataFrame, price data
        - window: int, forward-looking window for outcome
        - threshold: float, threshold for binary classification
        - return_type: str, type of return calculation ('simple', 'log')
        - min_periods: int, minimum periods for valid calculation

        Returns:
        BOLB: Label generator with binary outcome labels
        """
```
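One way to read the binary outcome rule ("5% profit in the next 10 days") is: label a bar `True` if any of the next `window` prices gives a return above `threshold`. The sketch below implements that reading with a hypothetical `binary_outcome_labels` helper; whether `BOLB` uses the best price in the window or only the final one is an assumption here, not something this document specifies.

```python
import math


def binary_outcome_labels(close, window, threshold=0.0, return_type="simple"):
    """Hypothetical sketch: True if any of the next `window` bars beats `threshold`."""
    labels = []
    for i, price in enumerate(close):
        future = close[i + 1:i + 1 + window]
        if len(future) < window:
            labels.append(None)  # outcome not yet known at the end of the series
            continue
        best = max(future)  # best price reachable within the window
        if return_type == "simple":
            ret = best / price - 1
        elif return_type == "log":
            ret = math.log(best / price)
        else:
            raise ValueError(f"unknown return_type: {return_type!r}")
        labels.append(ret > threshold)
    return labels


print(binary_outcome_labels([100, 102, 106, 101, 103], window=2, threshold=0.05))
# [True, False, False, None, None]
```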

## Usage Examples

### Basic Future Labels

```python
import vectorbt as vbt
import pandas as pd

# Download data
data = vbt.YFData.download("AAPL", start="2020-01-01", end="2023-01-01")
close = data.get("Close")

# Generate future statistical labels
future_mean = vbt.FMEAN.run(close, window=5)
future_std = vbt.FSTD.run(close, window=10)
future_min = vbt.FMIN.run(close, window=20, pct_change=True)
future_max = vbt.FMAX.run(close, window=20, pct_change=True)

# Access label values
mean_labels = future_mean.fmean
std_labels = future_std.fstd
min_labels = future_min.fmin  # Future minimum % change
max_labels = future_max.fmax  # Future maximum % change
```

### Trend Analysis Labels

```python
# Generate trend-based labels with different modes
trend_binary = vbt.TRENDLB.run(
    close,
    window=20,
    mode='binary'
)

trend_pct = vbt.TRENDLB.run(
    close,
    window=20,
    mode='pct_change',
    min_pct_change=0.02  # 2% minimum change
)

trend_smooth = vbt.TRENDLB.run(
    close,
    window=20,
    mode='binary_cont',
    smooth_window=5
)

# Access trend labels
binary_trends = trend_binary.trend
pct_trends = trend_pct.trend
smooth_trends = trend_smooth.trend
```

### Classification Labels for ML

```python
# Binary outcome labels for profitable trades
profitable_trades = vbt.BOLB.run(
    close,
    window=10,  # 10-day forward window
    threshold=0.05,  # 5% profit threshold
    return_type='simple'
)

# Mean reversion labels
mean_reversion = vbt.MEANLB.run(
    close,
    window=20,  # 20-day rolling mean
    threshold=0.02,  # 2% deviation threshold
    above=True  # Label when above mean
)

# Access binary labels
profit_labels = profitable_trades.labels  # True for profitable periods
reversion_labels = mean_reversion.labels  # True when above mean
```

### Multi-Asset Label Generation

```python
# Download multiple assets
symbols = ["AAPL", "GOOGL", "MSFT", "TSLA"]
data = vbt.YFData.download(symbols, start="2020-01-01", end="2023-01-01")
closes = data.get("Close")  # DataFrame with one column per symbol; `close` stays single-asset

# Generate labels for all assets
future_returns = {}
trend_labels = {}

for symbol in symbols:
    # Future return labels
    future_returns[symbol] = vbt.FMEAN.run(
        closes[symbol],
        window=5,
        pct_change=True
    ).fmean

    # Trend labels
    trend_labels[symbol] = vbt.TRENDLB.run(
        closes[symbol],
        window=20,
        mode='binary'
    ).trend

# Combine into DataFrames
future_returns_df = pd.DataFrame(future_returns)
trend_labels_df = pd.DataFrame(trend_labels)
```

### Labels for Strategy Development

```python
# Generate labels for different time horizons (`close` is a single-asset Series)
short_term = vbt.FMAX.run(close, window=5, pct_change=True)  # 5-day max return
medium_term = vbt.FMAX.run(close, window=20, pct_change=True)  # 20-day max return
long_term = vbt.FMAX.run(close, window=60, pct_change=True)  # 60-day max return

# Create multi-horizon labels
horizon_labels = pd.DataFrame({
    'short_max': short_term.fmax,
    'medium_max': medium_term.fmax,
    'long_max': long_term.fmax
})

# Drop the tail rows whose future windows are incomplete (NaN would compare as False)
horizon_labels = horizon_labels.dropna()

# Classification thresholds
horizon_labels['short_profitable'] = horizon_labels['short_max'] > 0.03
horizon_labels['medium_profitable'] = horizon_labels['medium_max'] > 0.10
horizon_labels['long_profitable'] = horizon_labels['long_max'] > 0.25
```

### Advanced ML Pipeline

```python
import pandas as pd
import vectorbt as vbt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# `close` is a single-asset price Series, e.g. AAPL from the first example

# Generate features (indicators); each uses only past and current prices
ma_20 = vbt.MA.run(close, 20).ma
ma_50 = vbt.MA.run(close, 50).ma
rsi = vbt.RSI.run(close, 14).rsi
macd = vbt.MACD.run(close)

# Create feature matrix
features = pd.DataFrame({
    'ma_ratio': ma_20 / ma_50,
    'rsi': rsi,
    'macd': macd.macd,
    'macd_signal': macd.signal,
    'returns_5d': close.pct_change(5),
    'volatility': close.pct_change().rolling(20).std()  # 20-day std of daily returns
})

# Generate labels (look-ahead, so they serve only as the target)
target = vbt.BOLB.run(
    close,
    window=10,
    threshold=0.05,  # 5% profit in next 10 days
    return_type='simple'
).labels

# Prepare data for ML: drop indicator warm-up rows and rows without a label
X = features.dropna()
y = target.reindex(X.index).dropna()

# Align X and y
common_index = X.index.intersection(y.index)
X = X.loc[common_index]
y = y.loc[common_index]

# Chronological train-test split: shuffling would mix future rows into training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"Train Score: {train_score:.3f}")
print(f"Test Score: {test_score:.3f}")
```
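Because these labels look ahead, splitting a time series for training needs care: rows must stay in chronological order, and with a 10-bar label window the last training labels are computed from prices inside the test period. A common remedy is to purge that overlap by dropping the last `horizon` training rows. The helper `purged_split` below is a hypothetical sketch of this idea that returns integer row positions; it is not part of vectorbt or scikit-learn.

```python
def purged_split(n_samples, test_size, horizon):
    """Hypothetical helper: chronological train/test row positions with a purge gap.

    A label at row t depends on prices up to row t + horizon, so training rows
    whose label window reaches into the test period are dropped.
    """
    n_test = int(round(n_samples * test_size))
    test_start = n_samples - n_test
    train_end = max(0, test_start - horizon)  # last kept training row is train_end - 1
    return list(range(train_end)), list(range(test_start, n_samples))


train_idx, test_idx = purged_split(100, test_size=0.2, horizon=10)
print(len(train_idx), train_idx[-1], test_idx[0], len(test_idx))
# 70 69 80 20
```

With the pipeline above, the positions can be applied as `X.iloc[train_idx]` and `X.iloc[test_idx]` (and likewise for `y`), with `horizon` set to the label `window`.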

### Custom Label Generators

```python
import numpy as np
import pandas as pd

class CustomVolatilityLabel:
    """Custom label for volatility regime classification (0 = low, 1 = high, 2 = very high)."""

    @classmethod
    def run(cls, close, short_window=5, long_window=20, threshold=1.5):
        # Measure short- and long-term volatility on daily returns, not raw prices
        returns = close.pct_change()
        short_vol = returns.rolling(short_window).std()
        long_vol = returns.rolling(long_window).std()

        # Volatility ratio
        vol_ratio = short_vol / long_vol

        # Classify regime; warm-up rows with an undefined ratio stay NaN
        labels = pd.Series(np.nan, index=close.index)
        labels.loc[vol_ratio.notna()] = 0            # Low volatility
        labels.loc[vol_ratio > threshold] = 1        # High volatility
        labels.loc[vol_ratio > threshold * 1.5] = 2  # Very high volatility

        return labels

# Use custom label generator on a single-asset price Series.
# Trailing windows describe the current regime; shift(-long_window) would turn
# the result into a look-ahead target like the generators above.
vol_labels = CustomVolatilityLabel.run(close)
```