# Data Sources & Management

Unified data acquisition and management system supporting multiple financial data providers with automatic synchronization, caching, and preprocessing capabilities. The data module provides consistent interfaces for accessing market data from various sources.

## Capabilities

### Yahoo Finance Data

Access to Yahoo Finance historical and real-time market data with automatic caching and data validation.

```python { .api }
class YFData:
    """
    Yahoo Finance data provider with caching and update capabilities.

    Provides access to historical OHLCV data, dividends, stock splits,
    and basic fundamental data from Yahoo Finance.
    """

    @classmethod
    def download(cls, symbols, start=None, end=None, **kwargs):
        """
        Download historical data from Yahoo Finance.

        Parameters:
        - symbols: str or list, ticker symbols to download
        - start: str or datetime, start date (default: 1 year ago)
        - end: str or datetime, end date (default: today)
        - period: str, period instead of start/end ('1d', '5d', '1mo', etc.)
        - interval: str, data interval ('1d', '1h', '5m', etc.)
        - auto_adjust: bool, adjust OHLC for splits/dividends (default: True)
        - prepost: bool, include pre/post market data (default: False)
        - threads: bool, use threading for multiple symbols (default: True)

        Returns:
            YFData: Data instance with downloaded data
        """

    def get(self, column=None):
        """
        Get data columns.

        Parameters:
        - column: str, column name ('Open', 'High', 'Low', 'Close', 'Volume')

        Returns:
            pd.DataFrame or pd.Series: Requested data
        """

    def update(self, **kwargs):
        """
        Update data with latest available data.

        Returns:
            YFData: Updated data instance
        """

    def save(self, path):
        """Save data to file."""

    @classmethod
    def load(cls, path):
        """Load data from file."""
```

### Binance Data

Access to Binance cryptocurrency exchange data including spot and futures markets.

```python { .api }
class BinanceData:
    """
    Binance exchange data provider for cryptocurrency markets.

    Supports spot and futures data with various intervals and
    comprehensive symbol coverage.
    """

    @classmethod
    def download(cls, symbols, start=None, end=None, **kwargs):
        """
        Download data from Binance.

        Parameters:
        - symbols: str or list, trading pairs (e.g., 'BTCUSDT')
        - start: str or datetime, start date
        - end: str or datetime, end date
        - interval: str, kline interval ('1m', '5m', '1h', '1d', etc.)
        - market: str, market type ('spot', 'futures')

        Returns:
            BinanceData: Data instance with downloaded data
        """

    def get(self, column=None):
        """Get data columns."""

    def update(self, **kwargs):
        """Update with latest data."""
```

### CCXT Exchange Data

Universal cryptocurrency exchange data access through the CCXT library supporting 100+ exchanges.

```python { .api }
class CCXTData:
    """
    Universal cryptocurrency exchange data via CCXT library.

    Provides unified access to data from 100+ cryptocurrency exchanges
    with consistent interface and automatic rate limiting.
    """

    @classmethod
    def download(cls, symbols, start=None, end=None, exchange='binance', **kwargs):
        """
        Download data from CCXT-supported exchange.

        Parameters:
        - symbols: str or list, trading pairs
        - start: str or datetime, start date
        - end: str or datetime, end date
        - exchange: str, exchange name (e.g., 'binance', 'coinbase')
        - timeframe: str, timeframe ('1m', '5m', '1h', '1d', etc.)

        Returns:
            CCXTData: Data instance with exchange data
        """

    def get_exchanges(self):
        """Get list of supported exchanges."""

    def get_symbols(self, exchange):
        """Get available symbols for exchange."""
```

### Alpaca Data

Access to Alpaca trading API for US equities and ETFs with commission-free trading integration.

```python { .api }
class AlpacaData:
    """
    Alpaca trading API data provider.

    Provides access to US equity and ETF data with real-time and
    historical data capabilities.
    """

    @classmethod
    def download(cls, symbols, start=None, end=None, **kwargs):
        """
        Download data from Alpaca.

        Parameters:
        - symbols: str or list, US equity symbols
        - start: str or datetime, start date
        - end: str or datetime, end date
        - timeframe: str, bar timeframe ('1Min', '5Min', '1Hour', '1Day')
        - api_key: str, Alpaca API key
        - secret_key: str, Alpaca secret key
        - paper: bool, use paper trading endpoint (default: True)

        Returns:
            AlpacaData: Data instance with Alpaca data
        """
```

### Base Data Classes

Core data management functionality providing the foundation for all data sources.

```python { .api }
class Data:
    """
    Base data management class.

    Provides common functionality for data storage, manipulation,
    and preprocessing across all data sources.
    """

    def __init__(self, data, **kwargs):
        """
        Initialize data instance.

        Parameters:
        - data: pd.DataFrame, market data
        - symbols: list, symbol names
        - wrapper: ArrayWrapper, data wrapper configuration
        """

    def get(self, column=None, **kwargs):
        """
        Get data columns with optional preprocessing.

        Parameters:
        - column: str or list, column names to retrieve

        Returns:
            pd.DataFrame or pd.Series: Requested data
        """

    def resample(self, freq, **kwargs):
        """
        Resample data to different frequency.

        Parameters:
        - freq: str, target frequency ('1H', '1D', '1W', etc.)

        Returns:
            Data: Resampled data instance
        """

    def dropna(self, **kwargs):
        """Remove missing values."""

    def fillna(self, method='ffill', **kwargs):
        """Fill missing values."""

class DataUpdater:
    """
    Data updating and synchronization utilities.

    Handles incremental data updates, cache management,
    and data validation across multiple sources.
    """

    def __init__(self, data_cls, **kwargs):
        """Initialize updater for specific data class."""

    def update(self, **kwargs):
        """Update data with latest available."""

    def schedule_update(self, freq, **kwargs):
        """Schedule automatic data updates."""
```
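
The aggregation rules that `resample` applies are not spelled out above. As a rough illustration of the convention usually used when downsampling OHLCV bars (first open, highest high, lowest low, last close, summed volume), here is a standard-library sketch. The helper `resample_ohlcv_daily` and its tuple layout are hypothetical and are not part of the documented API.

```python
from datetime import datetime


def resample_ohlcv_daily(bars):
    """Aggregate time-sorted intraday bars into one bar per calendar day.

    bars: iterable of (timestamp, open, high, low, close, volume) tuples.
    Returns a dict mapping each date to an (open, high, low, close, volume) tuple.
    """
    daily = {}
    for ts, o, h, l, c, v in bars:
        day = ts.date()
        if day not in daily:
            # The first bar of the day fixes the open.
            daily[day] = [o, h, l, c, v]
        else:
            agg = daily[day]
            agg[1] = max(agg[1], h)  # highest high so far
            agg[2] = min(agg[2], l)  # lowest low so far
            agg[3] = c               # the latest close wins
            agg[4] += v              # volume accumulates
    return {day: tuple(vals) for day, vals in daily.items()}


# Hypothetical sample data: two hourly bars on one day, one on the next.
bars = [
    (datetime(2023, 1, 2, 10), 100.0, 102.0, 99.0, 101.0, 500),
    (datetime(2023, 1, 2, 11), 101.0, 105.0, 100.0, 104.0, 300),
    (datetime(2023, 1, 3, 10), 104.0, 106.0, 103.0, 105.0, 200),
]
daily_bars = resample_ohlcv_daily(bars)
```

The sketch assumes the input is already sorted by time; unsorted input would give wrong open and close values.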

### Synthetic Data Generation

Tools for generating synthetic market data for strategy testing and Monte Carlo simulations.

```python { .api }
class SyntheticData:
    """
    Base class for synthetic data generation.

    Provides framework for creating artificial market data
    with specified statistical properties.
    """

    def generate(self, n_samples, **kwargs):
        """
        Generate synthetic data.

        Parameters:
        - n_samples: int, number of samples to generate

        Returns:
            pd.DataFrame: Generated synthetic data
        """

class GBMData:
    """
    Geometric Brownian Motion data generator.

    Generates synthetic price data following GBM process,
    commonly used for option pricing and Monte Carlo simulations.
    """

    @classmethod
    def generate(cls, n_samples, start_price=100, mu=0.05, sigma=0.2, **kwargs):
        """
        Generate GBM price series.

        Parameters:
        - n_samples: int, number of time steps
        - start_price: float, initial price
        - mu: float, drift rate (annualized)
        - sigma: float, volatility (annualized)
        - dt: float, time step (default: 1/252 for daily)
        - seed: int, random seed for reproducibility

        Returns:
            pd.Series: Generated price series
        """
```
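
How `GBMData.generate` is implemented is not shown here. For orientation, the standard exact discretization of geometric Brownian motion is S(t+dt) = S(t) * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z), with Z drawn from a standard normal distribution. The sketch below applies that formula using only the standard library. The function `gbm_path` is hypothetical; it mirrors the documented parameter names but returns a plain list rather than a `pd.Series`.

```python
import math
import random


def gbm_path(n_samples, start_price=100.0, mu=0.05, sigma=0.2, dt=1 / 252, seed=None):
    """Simulate one GBM price path with n_samples points, starting at start_price."""
    if n_samples <= 0:
        return []
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    drift = (mu - 0.5 * sigma ** 2) * dt  # deterministic part of each log-return
    vol = sigma * math.sqrt(dt)           # scale of the random part of each log-return
    prices = [float(start_price)]
    for _ in range(n_samples - 1):
        z = rng.gauss(0.0, 1.0)
        prices.append(prices[-1] * math.exp(drift + vol * z))
    return prices


# Roughly one trading year of daily prices, reproducible through the seed.
path = gbm_path(252, start_price=100.0, mu=0.08, sigma=0.25, seed=42)
```

Because each step multiplies by an exponential, simulated prices stay strictly positive, which is one reason GBM is a common baseline for price simulation.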

### Utility Functions

Helper functions for data processing and symbol management.

```python { .api }
def symbol_dict(*args, **kwargs):
    """
    Create symbol dictionary for multi-symbol operations.

    Parameters:
    - args: symbol specifications
    - kwargs: symbol name mappings

    Returns:
        dict: Symbol mapping dictionary
    """
```
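
One plausible use of such a mapping is to give each symbol its own value for a download argument while other arguments are shared. The sketch below illustrates that idea with hypothetical stand-ins (`SymbolDict`, `resolve_symbol_kwargs`). It is not the library's implementation, and whether `symbol_dict` is consumed this way is an assumption.

```python
class SymbolDict(dict):
    """Hypothetical marker type: a dict whose keys are symbols."""


def resolve_symbol_kwargs(symbol, kwargs):
    """Pick the per-symbol value for SymbolDict arguments, pass others through."""
    resolved = {}
    for name, value in kwargs.items():
        if isinstance(value, SymbolDict):
            resolved[name] = value[symbol]  # per-symbol override
        else:
            resolved[name] = value          # shared across all symbols
    return resolved


# Each symbol gets its own start date, while the interval is shared.
kwargs = {
    "start": SymbolDict(AAPL="2020-01-01", MSFT="2021-01-01"),
    "interval": "1d",
}
aapl_kwargs = resolve_symbol_kwargs("AAPL", kwargs)
msft_kwargs = resolve_symbol_kwargs("MSFT", kwargs)
```

Using a dedicated subclass rather than a plain `dict` lets ordinary dictionary-valued arguments pass through unchanged.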

## Usage Examples

### Basic Data Download

```python
import vectorbt as vbt

# Download single symbol
data = vbt.YFData.download("AAPL", start="2020-01-01", end="2023-01-01")
close = data.get("Close")

# Download multiple symbols
symbols = ["AAPL", "GOOGL", "MSFT"]
data = vbt.YFData.download(symbols, period="2y")
close = data.get("Close")

# Access OHLCV data
ohlcv = data.get()  # All columns
volume = data.get("Volume")
```

### Cryptocurrency Data

```python
import vectorbt as vbt

# Binance spot data
btc_data = vbt.BinanceData.download(
    "BTCUSDT",
    start="2023-01-01",
    interval="1h"
)

# Multiple exchanges via CCXT
exchanges = ["binance", "coinbase", "kraken"]
btc_prices = {}

for exchange in exchanges:
    data = vbt.CCXTData.download(
        "BTC/USDT",
        start="2023-01-01",
        exchange=exchange,
        timeframe="1d"
    )
    # Keep each exchange's closing prices under its own key
    btc_prices[exchange] = data.get("Close")
```

### Data Updates and Caching

```python
import vectorbt as vbt

# Initial download with caching
data = vbt.YFData.download("AAPL", start="2020-01-01")

# Update with latest data
updated_data = data.update()

# Save and load data
data.save("aapl_data.pkl")
loaded_data = vbt.YFData.load("aapl_data.pkl")

# Automatic updates
updater = vbt.DataUpdater(vbt.YFData, symbols="AAPL")
updater.schedule_update(freq="1H")  # Update hourly
```
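
How `update()` merges new rows into existing data is not described here. A common approach to incremental updates is to re-request data starting from the last cached timestamp, then let fresh rows overwrite cached rows with the same timestamp, since the most recent cached bar may have been incomplete when it was stored. The sketch below shows that merge on plain dictionaries. The helpers `merge_update` and `next_request_start` are hypothetical and are not part of the documented API.

```python
def merge_update(cached, fresh):
    """Merge freshly downloaded rows into cached rows.

    Both arguments map a sortable timestamp to a row. Rows in `fresh`
    replace cached rows that share a timestamp. The inputs are not modified.
    """
    merged = dict(cached)
    merged.update(fresh)
    # Return rows in timestamp order.
    return dict(sorted(merged.items()))


def next_request_start(cached):
    """Start the next request at the last cached timestamp, or None if the cache is empty."""
    return max(cached) if cached else None


# ISO date strings sort chronologically, so they work as keys here.
cached = {"2023-01-02": 101.0, "2023-01-03": 104.5}
fresh = {"2023-01-03": 105.0, "2023-01-04": 106.0}
merged = merge_update(cached, fresh)
```

Here the cached value for 2023-01-03 is replaced by the fresh one and the new 2023-01-04 row is appended.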

### Synthetic Data Generation

```python
import vectorbt as vbt

# Generate GBM price series
synthetic_prices = vbt.GBMData.generate(
    n_samples=252*2,  # 2 years daily
    start_price=100,
    mu=0.08,  # 8% annual drift
    sigma=0.25,  # 25% annual volatility
    seed=42
)

# Monte Carlo simulation
n_simulations = 1000
simulations = []

for i in range(n_simulations):
    sim = vbt.GBMData.generate(
        n_samples=252,
        start_price=100,
        mu=0.05,
        sigma=0.2,
        seed=i  # A distinct seed per run keeps each path reproducible
    )
    simulations.append(sim)

# Analyze distribution of outcomes
final_prices = [sim.iloc[-1] for sim in simulations]
```
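
The example above stops once `final_prices` is collected. One way to summarize such a list, sketched with the standard library only, is to report the mean, the median, a 5% to 95% range, and the share of runs that ended below the starting price. The helper `summarize_outcomes` is hypothetical, and the sample values are made up for illustration.

```python
import statistics


def summarize_outcomes(final_prices, start_price):
    """Summarize simulated end-of-horizon prices (needs at least two values)."""
    ordered = sorted(float(p) for p in final_prices)
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(ordered, n=20, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p05": cuts[0],
        "p95": cuts[-1],
        # Fraction of runs that finished below the starting price.
        "prob_loss": sum(p < start_price for p in ordered) / len(ordered),
    }


# Made-up final prices standing in for real simulation output.
sample_final_prices = [90.0, 95.0, 100.0, 105.0, 120.0]
summary = summarize_outcomes(sample_final_prices, start_price=100.0)
```

With the `inclusive` method the percentile estimates stay within the observed minimum and maximum, which is easier to read for small samples.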

### Multi-Source Data Pipeline

```python
import vectorbt as vbt

# Create unified data pipeline
class MultiSourceData:
    def __init__(self):
        # Map a logical source name to the provider class that serves it
        self.sources = {
            'stocks': vbt.YFData,
            'crypto': vbt.BinanceData,
            'etfs': vbt.AlpacaData  # Alpaca covers US equities and ETFs
        }

    def download_all(self, symbols_dict, **kwargs):
        data = {}
        for source, symbols in symbols_dict.items():
            if source in self.sources:
                data[source] = self.sources[source].download(symbols, **kwargs)
        return data

# Usage
pipeline = MultiSourceData()
all_data = pipeline.download_all({
    'stocks': ['AAPL', 'GOOGL'],
    'crypto': ['BTCUSDT'],
    'etfs': ['SPY']
})
```
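
`download_all` forwards the same keyword arguments to every provider, although the providers documented on this page name some options differently (`interval` for Yahoo Finance and Binance, `timeframe` for CCXT and Alpaca). A possible variant, sketched below, keeps keyword arguments per source and records failures per source instead of aborting the whole run. `StubSource` is a hypothetical offline stand-in for a provider class, and `"fx"` is a deliberately unregistered source name, so the sketch runs without network access or vectorbt installed.

```python
class StubSource:
    """Hypothetical stand-in for a provider class exposing a download() classmethod."""

    @classmethod
    def download(cls, symbols, **kwargs):
        if not symbols:
            raise ValueError("no symbols requested")
        return {"symbols": list(symbols), "kwargs": kwargs}


def download_all(sources, symbols_by_source, kwargs_by_source=None):
    """Download from each source with its own kwargs; return (data, errors)."""
    kwargs_by_source = kwargs_by_source or {}
    data, errors = {}, {}
    for name, symbols in symbols_by_source.items():
        source = sources.get(name)
        if source is None:
            errors[name] = KeyError(f"unknown source: {name}")
            continue
        try:
            data[name] = source.download(symbols, **kwargs_by_source.get(name, {}))
        except Exception as exc:  # keep going so one failing source does not block the rest
            errors[name] = exc
    return data, errors


sources = {"stocks": StubSource, "crypto": StubSource}
data, errors = download_all(
    sources,
    {"stocks": ["AAPL"], "crypto": [], "fx": ["EURUSD"]},
    {"stocks": {"interval": "1d"}},
)
```

Returning the errors alongside the data leaves it to the caller to decide whether a partial result is acceptable.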