# Data Sources & Management

Unified data acquisition and management system supporting multiple financial data providers, with automatic synchronization, caching, and preprocessing. The data module provides a consistent interface for accessing market data from each source.

## Capabilities

### Yahoo Finance Data

Access to Yahoo Finance historical and real-time market data with automatic caching and data validation.

```python { .api }
class YFData:
    """
    Yahoo Finance data provider with caching and update capabilities.

    Provides access to historical OHLCV data, dividends, stock splits,
    and basic fundamental data from Yahoo Finance.
    """

    @classmethod
    def download(cls, symbols, start=None, end=None, **kwargs):
        """
        Download historical data from Yahoo Finance.

        Parameters:
        - symbols: str or list, ticker symbols to download
        - start: str or datetime, start date (default: 1 year ago)
        - end: str or datetime, end date (default: today)
        - period: str, period instead of start/end ('1d', '5d', '1mo', etc.)
        - interval: str, data interval ('1d', '1h', '5m', etc.)
        - auto_adjust: bool, adjust OHLC for splits/dividends (default: True)
        - prepost: bool, include pre/post market data (default: False)
        - threads: bool, use threading for multiple symbols (default: True)

        Returns:
        YFData: Data instance with downloaded data
        """

    def get(self, column=None):
        """
        Get data columns.

        Parameters:
        - column: str, column name ('Open', 'High', 'Low', 'Close', 'Volume')

        Returns:
        pd.DataFrame or pd.Series: Requested data
        """

    def update(self, **kwargs):
        """
        Update data with latest available data.

        Returns:
        YFData: Updated data instance
        """

    def save(self, path):
        """Save data to file."""

    @classmethod
    def load(cls, path):
        """Load data from file."""
```

### Binance Data

Access to Binance cryptocurrency exchange data including spot and futures markets.

```python { .api }
class BinanceData:
    """
    Binance exchange data provider for cryptocurrency markets.

    Supports spot and futures data with various intervals and
    comprehensive symbol coverage.
    """

    @classmethod
    def download(cls, symbols, start=None, end=None, **kwargs):
        """
        Download data from Binance.

        Parameters:
        - symbols: str or list, trading pairs (e.g., 'BTCUSDT')
        - start: str or datetime, start date
        - end: str or datetime, end date
        - interval: str, kline interval ('1m', '5m', '1h', '1d', etc.)
        - market: str, market type ('spot', 'futures')

        Returns:
        BinanceData: Data instance with downloaded data
        """

    def get(self, column=None):
        """Get data columns."""

    def update(self, **kwargs):
        """Update with latest data."""
```

### CCXT Exchange Data

Universal cryptocurrency exchange data access through the CCXT library supporting 100+ exchanges.

```python { .api }
class CCXTData:
    """
    Universal cryptocurrency exchange data via CCXT library.

    Provides unified access to data from 100+ cryptocurrency exchanges
    with consistent interface and automatic rate limiting.
    """

    @classmethod
    def download(cls, symbols, start=None, end=None, exchange='binance', **kwargs):
        """
        Download data from CCXT-supported exchange.

        Parameters:
        - symbols: str or list, trading pairs
        - start: str or datetime, start date
        - end: str or datetime, end date
        - exchange: str, exchange name (e.g., 'binance', 'coinbase')
        - timeframe: str, timeframe ('1m', '5m', '1h', '1d', etc.)

        Returns:
        CCXTData: Data instance with exchange data
        """

    def get_exchanges(self):
        """Get list of supported exchanges."""

    def get_symbols(self, exchange):
        """Get available symbols for exchange."""
```

### Alpaca Data

Access to Alpaca trading API for US equities and ETFs with commission-free trading integration.

```python { .api }
class AlpacaData:
    """
    Alpaca trading API data provider.

    Provides access to US equity and ETF data with real-time and
    historical data capabilities.
    """

    @classmethod
    def download(cls, symbols, start=None, end=None, **kwargs):
        """
        Download data from Alpaca.

        Parameters:
        - symbols: str or list, US equity symbols
        - start: str or datetime, start date
        - end: str or datetime, end date
        - timeframe: str, bar timeframe ('1Min', '5Min', '1Hour', '1Day')
        - api_key: str, Alpaca API key
        - secret_key: str, Alpaca secret key
        - paper: bool, use paper trading endpoint (default: True)

        Returns:
        AlpacaData: Data instance with Alpaca data
        """
```

### Base Data Classes

Core data management functionality providing the foundation for all data sources.

```python { .api }
class Data:
    """
    Base data management class.

    Provides common functionality for data storage, manipulation,
    and preprocessing across all data sources.
    """

    def __init__(self, data, **kwargs):
        """
        Initialize data instance.

        Parameters:
        - data: pd.DataFrame, market data
        - symbols: list, symbol names
        - wrapper: ArrayWrapper, data wrapper configuration
        """

    def get(self, column=None, **kwargs):
        """
        Get data columns with optional preprocessing.

        Parameters:
        - column: str or list, column names to retrieve

        Returns:
        pd.DataFrame or pd.Series: Requested data
        """

    def resample(self, freq, **kwargs):
        """
        Resample data to different frequency.

        Parameters:
        - freq: str, target frequency ('1H', '1D', '1W', etc.)

        Returns:
        Data: Resampled data instance
        """

    def dropna(self, **kwargs):
        """Remove missing values."""

    def fillna(self, method='ffill', **kwargs):
        """Fill missing values."""

class DataUpdater:
    """
    Data updating and synchronization utilities.

    Handles incremental data updates, cache management,
    and data validation across multiple sources.
    """

    def __init__(self, data_cls, **kwargs):
        """Initialize updater for specific data class."""

    def update(self, **kwargs):
        """Update data with latest available."""

    def schedule_update(self, freq, **kwargs):
        """Schedule automatic data updates."""
```
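
The reference above does not say how `resample` combines bars, so the following is only a sketch of the conventional OHLCV downsampling rule (first open, highest high, lowest low, last close, summed volume). The helper name `resample_ohlcv`, the tuple layout, and the `bucket_of` callback are hypothetical and not part of the library; the sketch uses only the standard library.

```python
from datetime import datetime


def resample_ohlcv(bars, bucket_of):
    """Aggregate fine-grained bars into coarser ones.

    `bars` is a time-sorted list of (timestamp, open, high, low, close, volume)
    tuples; `bucket_of` maps a timestamp to the key of its target bar.
    """
    buckets = {}  # dicts keep insertion order, so output stays time-sorted
    for ts, open_, high, low, close, volume in bars:
        key = bucket_of(ts)
        if key not in buckets:
            # First bar in the bucket fixes the open
            buckets[key] = [open_, high, low, close, volume]
        else:
            agg = buckets[key]
            agg[1] = max(agg[1], high)   # highest high
            agg[2] = min(agg[2], low)    # lowest low
            agg[3] = close               # last close wins
            agg[4] += volume             # volume is additive
    return [(key, *values) for key, values in buckets.items()]


# Illustrative hourly bars (made-up numbers) downsampled to daily bars
hourly = [
    (datetime(2023, 1, 2, 9), 100.0, 101.0, 99.5, 100.5, 10),
    (datetime(2023, 1, 2, 10), 100.5, 102.0, 100.0, 101.5, 20),
    (datetime(2023, 1, 3, 9), 101.5, 103.0, 101.0, 102.0, 5),
]
daily = resample_ohlcv(hourly, lambda ts: ts.date())
```

Passing the bucketing rule as a callback keeps the aggregation independent of any particular frequency string.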

### Synthetic Data Generation

Tools for generating synthetic market data for strategy testing and Monte Carlo simulations.

```python { .api }
class SyntheticData:
    """
    Base class for synthetic data generation.

    Provides framework for creating artificial market data
    with specified statistical properties.
    """

    def generate(self, n_samples, **kwargs):
        """
        Generate synthetic data.

        Parameters:
        - n_samples: int, number of samples to generate

        Returns:
        pd.DataFrame: Generated synthetic data
        """

class GBMData:
    """
    Geometric Brownian Motion data generator.

    Generates synthetic price data following GBM process,
    commonly used for option pricing and Monte Carlo simulations.
    """

    @classmethod
    def generate(cls, n_samples, start_price=100, mu=0.05, sigma=0.2, **kwargs):
        """
        Generate GBM price series.

        Parameters:
        - n_samples: int, number of time steps
        - start_price: float, initial price
        - mu: float, drift rate (annualized)
        - sigma: float, volatility (annualized)
        - dt: float, time step (default: 1/252 for daily)
        - seed: int, random seed for reproducibility

        Returns:
        pd.Series: Generated price series
        """
```
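
To make the GBM parameters concrete, here is a minimal standard-library sketch of the textbook log-normal update, S[t+1] = S[t] * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z) with Z drawn from a standard normal. It is not the library's implementation: the function name `gbm_path` is hypothetical, it returns a plain list rather than a `pd.Series`, and treating the first sample as `start_price` is an assumption.

```python
import math
import random


def gbm_path(n_samples, start_price=100.0, mu=0.05, sigma=0.2, dt=1 / 252, seed=None):
    """Simulate one geometric Brownian motion price path as a list of floats."""
    rng = random.Random(seed)                 # private RNG so the seed is reproducible
    drift = (mu - 0.5 * sigma ** 2) * dt      # deterministic part of the log-return
    vol = sigma * math.sqrt(dt)               # per-step standard deviation
    prices = [start_price]                    # assumption: sample 0 is the start price
    for _ in range(n_samples - 1):
        shock = rng.gauss(0.0, 1.0)           # standard normal draw
        prices.append(prices[-1] * math.exp(drift + vol * shock))
    return prices


# One year of daily steps with the default drift and volatility
path = gbm_path(252, seed=42)
```

Because each step multiplies by an exponential, simulated prices stay strictly positive, which is the main reason GBM is preferred over an additive random walk for price series.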

### Utility Functions

Helper functions for data processing and symbol management.

```python { .api }
def symbol_dict(*args, **kwargs):
    """
    Create symbol dictionary for multi-symbol operations.

    Parameters:
    - args: symbol specifications
    - kwargs: symbol name mappings

    Returns:
    dict: Symbol mapping dictionary
    """
```

## Usage Examples

### Basic Data Download

```python
import vectorbt as vbt

# Download single symbol
data = vbt.YFData.download("AAPL", start="2020-01-01", end="2023-01-01")
close = data.get("Close")

# Download multiple symbols
symbols = ["AAPL", "GOOGL", "MSFT"]
data = vbt.YFData.download(symbols, period="2y")
close = data.get("Close")

# Access OHLCV data
ohlcv = data.get()  # All columns
volume = data.get("Volume")
```

### Cryptocurrency Data

```python
import vectorbt as vbt

# Binance spot data
btc_data = vbt.BinanceData.download(
    "BTCUSDT",
    start="2023-01-01",
    interval="1h"
)

# Multiple exchanges via CCXT
exchanges = ["binance", "coinbase", "kraken"]
btc_prices = {}

for exchange in exchanges:
    data = vbt.CCXTData.download(
        "BTC/USDT",
        start="2023-01-01",
        exchange=exchange,
        timeframe="1d"
    )
    # Collect the daily close series per exchange
    btc_prices[exchange] = data.get("Close")
```

### Data Updates and Caching

```python
import vectorbt as vbt

# Initial download with caching
data = vbt.YFData.download("AAPL", start="2020-01-01")

# Update with latest data
updated_data = data.update()

# Save and load data
data.save("aapl_data.pkl")
loaded_data = vbt.YFData.load("aapl_data.pkl")

# Automatic updates
updater = vbt.DataUpdater(vbt.YFData, symbols="AAPL")
updater.schedule_update(freq="1H")  # Update hourly
```
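
How `update()` merges new rows into cached data is not specified here. One common approach, sketched below under that caveat, is to let freshly downloaded rows overwrite cached rows with the same timestamp (the last cached bar is often incomplete) and then re-sort by time. The function `merge_update` and the `{timestamp: close}` layout are hypothetical illustrations, not library API.

```python
def merge_update(existing, fresh):
    """Merge two {timestamp: value} mappings; `fresh` wins on overlapping keys."""
    merged = dict(existing)   # copy so the cached mapping is left untouched
    merged.update(fresh)      # overlapping timestamps take the newer value
    return dict(sorted(merged.items()))  # keep rows in chronological order


# ISO-8601 date strings sort chronologically, so plain string keys suffice here
cached = {"2023-01-02": 100.0, "2023-01-03": 101.0}
latest = {"2023-01-03": 101.5, "2023-01-04": 102.0}
combined = merge_update(cached, latest)
```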

### Synthetic Data Generation

```python
import vectorbt as vbt

# Generate GBM price series
synthetic_prices = vbt.GBMData.generate(
    n_samples=252*2,  # 2 years daily
    start_price=100,
    mu=0.08,  # 8% annual drift
    sigma=0.25,  # 25% annual volatility
    seed=42
)

# Monte Carlo simulation
n_simulations = 1000
simulations = []

for i in range(n_simulations):
    sim = vbt.GBMData.generate(
        n_samples=252,
        start_price=100,
        mu=0.05,
        sigma=0.2,
        seed=i  # distinct seed per path, so each run is reproducible
    )
    simulations.append(sim)

# Analyze distribution of outcomes
final_prices = [sim.iloc[-1] for sim in simulations]
```
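
The example above stops at collecting `final_prices`. As one possible next step, the sketch below summarizes such a list with the standard-library `statistics` module (Python 3.8+). The function `summarize_outcomes`, the chosen percentiles, and the sample numbers are illustrative assumptions, not part of the library.

```python
import statistics


def summarize_outcomes(final_prices, start_price=100.0):
    """Summarize the distribution of simulated end-of-horizon prices."""
    ordered = sorted(float(p) for p in final_prices)
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%; "inclusive" stays within the data range
    cuts = statistics.quantiles(ordered, n=20, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p05": cuts[0],    # 5th percentile, a rough downside estimate
        "p95": cuts[-1],   # 95th percentile
        "prob_loss": sum(p < start_price for p in ordered) / len(ordered),
    }


# Made-up final prices standing in for the simulation output
summary = summarize_outcomes([90.0, 95.0, 100.0, 105.0, 110.0, 120.0])
```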

### Multi-Source Data Pipeline

```python
import vectorbt as vbt

# Create unified data pipeline
class MultiSourceData:
    def __init__(self):
        # Map an asset-class label to the data class that serves it
        self.sources = {
            'stocks': vbt.YFData,
            'crypto': vbt.BinanceData,
            'etfs': vbt.AlpacaData  # Alpaca covers US equities and ETFs, not futures
        }

    def download_all(self, symbols_dict, **kwargs):
        data = {}
        for source, symbols in symbols_dict.items():
            # Labels without a registered source are skipped silently
            if source in self.sources:
                data[source] = self.sources[source].download(symbols, **kwargs)
        return data

# Usage
pipeline = MultiSourceData()
all_data = pipeline.download_all({
    'stocks': ['AAPL', 'GOOGL'],
    'crypto': ['BTCUSDT'],
    'etfs': ['SPY']
})
```