# Data Utilities

Data preparation and utility functions for quantitative analysis workflows: converting between prices and returns, validating and aggregating data, preparing benchmarks, and retrieving data from external sources.

## Capabilities

### Data Conversion Functions

Convert between different data formats commonly used in quantitative finance.

```python { .api }
def to_returns(prices, rf=0.0):
    """
    Convert price series to return series.

    Parameters:
    - prices: pandas Series of prices
    - rf: float, risk-free rate to subtract from returns

    Returns:
    pandas Series: Returns calculated as pct_change()
    """

def to_prices(returns, base=1e5):
    """
    Convert return series to price index.

    Parameters:
    - returns: pandas Series of returns
    - base: float, starting value for price index

    Returns:
    pandas Series: Cumulative price index
    """

def log_returns(returns, rf=0.0, nperiods=None):
    """
    Convert returns to log returns.

    Parameters:
    - returns: pandas Series of returns
    - rf: float, risk-free rate
    - nperiods: int, number of periods for annualization

    Returns:
    pandas Series: Log returns
    """

def to_log_returns(returns, rf=0.0, nperiods=None):
    """
    Alias for log_returns function.

    Parameters:
    - returns: pandas Series of returns
    - rf: float, risk-free rate
    - nperiods: int, number of periods

    Returns:
    pandas Series: Log returns
    """

def to_excess_returns(returns, rf, nperiods=None):
    """
    Calculate excess returns above risk-free rate.

    Parameters:
    - returns: pandas Series of returns
    - rf: float, risk-free rate
    - nperiods: int, number of periods for rate conversion

    Returns:
    pandas Series: Excess returns
    """

def rebase(prices, base=100.0):
    """
    Rebase price series to start at specified value.

    Parameters:
    - prices: pandas Series of prices
    - base: float, new base value

    Returns:
    pandas Series: Rebased price series
    """
```
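
The arithmetic behind these conversions can be sketched without any dependencies. The helper names below (`simple_returns`, `price_index`, `to_log`, `rebase_series`) are hypothetical stand-ins operating on plain lists, not the library's implementations, and they ignore details such as missing values and the risk-free adjustment.

```python
import math

def simple_returns(prices):
    """Period-over-period percentage change; the first price has no return."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]

def price_index(returns, base=100.0):
    """Compound returns into an index whose first value is `base`."""
    index = [base]
    for r in returns:
        index.append(index[-1] * (1.0 + r))
    return index

def to_log(returns):
    """Map each simple return r to the log return ln(1 + r)."""
    return [math.log1p(r) for r in returns]

def rebase_series(prices, base=100.0):
    """Scale a price series so that its first value equals `base`."""
    return [p / prices[0] * base for p in prices]

prices = [100.0, 102.0, 101.0, 105.0, 103.0]
rets = simple_returns(prices)
rebuilt = price_index(rets, base=100.0)   # round trip back to the prices
total_log = sum(to_log(rets))             # log returns add up across periods
```

Because log returns are additive, `total_log` equals the log of the overall growth factor (here `ln(103 / 100)`), which is the usual reason to prefer them when summing over time.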

### Data Validation and Preparation

Ensure data quality and prepare data for analysis.

```python { .api }
def validate_input(data, allow_empty=False):
    """
    Validate input data for QuantStats functions.

    Parameters:
    - data: pandas Series or DataFrame to validate
    - allow_empty: bool, whether to allow empty data

    Returns:
    pandas Series or DataFrame: Validated data

    Raises:
    DataValidationError: If data validation fails
    """

def _prepare_returns(data, rf=0.0, nperiods=None):
    """
    Internal function to prepare returns data for analysis.

    Parameters:
    - data: pandas Series of returns or prices
    - rf: float, risk-free rate
    - nperiods: int, number of periods

    Returns:
    pandas Series: Prepared returns data
    """

def _prepare_prices(data, base=1.0):
    """
    Internal function to prepare price data.

    Parameters:
    - data: pandas Series of prices
    - base: float, base value for rebasing

    Returns:
    pandas Series: Prepared price data
    """

def _prepare_benchmark(benchmark=None, period="max", rf=0.0, prepare_returns=True):
    """
    Prepare benchmark data for analysis.

    Parameters:
    - benchmark: str or pandas Series, benchmark identifier or data
    - period: str, time period for data retrieval
    - rf: float, risk-free rate
    - prepare_returns: bool, whether to prepare returns

    Returns:
    pandas Series: Prepared benchmark data
    """
```

### Data Aggregation and Resampling

Functions for aggregating returns across different time periods.

```python { .api }
def aggregate_returns(returns, period=None, compounded=True):
    """
    Aggregate returns to specified frequency.

    Parameters:
    - returns: pandas Series of returns
    - period: str, aggregation period ('M', 'Q', 'Y', etc.)
    - compounded: bool, whether to compound returns

    Returns:
    pandas Series: Aggregated returns
    """

def group_returns(returns, groupby, compounded=False):
    """
    Group returns by specified criteria.

    Parameters:
    - returns: pandas Series of returns
    - groupby: str or function, grouping criteria
    - compounded: bool, whether to compound grouped returns

    Returns:
    pandas Series: Grouped returns
    """

def multi_shift(df, shift=3):
    """
    Create DataFrame with multiple shifted versions.

    Parameters:
    - df: pandas DataFrame to shift
    - shift: int, number of periods to shift

    Returns:
    pandas DataFrame: DataFrame with original and shifted columns
    """
```
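
The `compounded` flag changes how returns inside a group are combined. The following is a minimal sketch of the two modes for monthly grouping, assuming plain `(date, return)` pairs rather than a pandas Series; `aggregate_by_month` is a hypothetical helper, not the library function.

```python
from collections import defaultdict
from datetime import date

def aggregate_by_month(dated_returns, compounded=True):
    """Group (date, return) pairs by calendar month and combine each group."""
    groups = defaultdict(list)
    for day, r in dated_returns:
        groups[(day.year, day.month)].append(r)

    result = {}
    for key, rs in sorted(groups.items()):
        if compounded:
            # Geometric linking: (1 + r1) * (1 + r2) * ... - 1
            growth = 1.0
            for r in rs:
                growth *= 1.0 + r
            result[key] = growth - 1.0
        else:
            # Simple sum of the period returns
            result[key] = sum(rs)
    return result

daily = [
    (date(2024, 1, 30), 0.10),
    (date(2024, 1, 31), 0.10),
    (date(2024, 2, 1), -0.05),
]
compounded = aggregate_by_month(daily, compounded=True)
summed = aggregate_by_month(daily, compounded=False)
```

For January the compounded result is about 21% while the simple sum is 20%; the gap grows with the size and number of the returns being combined.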

### Statistical Utilities

Helper functions for statistical calculations and data manipulation.

```python { .api }
def exponential_stdev(returns, window=30, is_halflife=False):
    """
    Calculate exponentially weighted standard deviation.

    Parameters:
    - returns: pandas Series of returns
    - window: int, window size or halflife
    - is_halflife: bool, whether window represents halflife

    Returns:
    pandas Series: Exponentially weighted standard deviation
    """

def _count_consecutive(data):
    """
    Count consecutive occurrences in data.

    Parameters:
    - data: pandas Series of boolean or numeric data

    Returns:
    int: Maximum consecutive count
    """

def _round_to_closest(val, res, decimals=None):
    """
    Round value to closest resolution.

    Parameters:
    - val: float, value to round
    - res: float, resolution to round to
    - decimals: int, number of decimal places

    Returns:
    float: Rounded value
    """
```
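
Two of these helpers reduce to short pieces of arithmetic. The sketch below shows one way to find the longest run of consecutive truthy values and to snap a value to the nearest multiple of a resolution. Both functions are illustrative stand-ins written for plain Python values; the library's internals may differ.

```python
def max_consecutive(flags):
    """Length of the longest unbroken run of truthy values."""
    longest = current = 0
    for flag in flags:
        current = current + 1 if flag else 0  # a falsy value resets the run
        longest = max(longest, current)
    return longest

def round_to_closest(val, res, decimals=None):
    """Snap `val` to the nearest multiple of `res`, optionally rounding the result."""
    snapped = round(val / res) * res
    return snapped if decimals is None else round(snapped, decimals)

returns = [0.01, 0.02, -0.01, 0.03, 0.01, 0.02, -0.02]
win_streak = max_consecutive(r > 0 for r in returns)   # longest winning streak
tick = round_to_closest(103.37, 0.25, decimals=2)      # nearest quarter point
```

Passing `decimals` is useful when `res` is not exactly representable in binary floating point (for example `0.1`), since the multiplication can otherwise leave a small trailing error.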

### Portfolio Construction

Functions for creating portfolios and indices from return data.

```python { .api }
def make_portfolio(returns, start_balance=1e5, mode="comp", round_to=None):
    """
    Create portfolio value series from returns.

    Parameters:
    - returns: pandas Series of returns
    - start_balance: float, starting portfolio value
    - mode: str, calculation mode ('comp' for compounded)
    - round_to: int, decimal places to round to

    Returns:
    pandas Series: Portfolio value over time
    """

def make_index(ticker, **kwargs):
    """
    Create market index from ticker symbol.

    Parameters:
    - ticker: str, ticker symbol
    - **kwargs: additional parameters for data retrieval

    Returns:
    pandas Series: Index price or return data
    """
```
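
As an illustration of the compounded mode, the sketch below grows a starting balance through a list of returns. `portfolio_values` is a hypothetical helper; whether the library includes the opening balance as the first value, and how it handles other modes, is not specified here and is an assumption of this example.

```python
def portfolio_values(returns, start_balance=1e5, round_to=None):
    """Compound `start_balance` through each return in order.

    The first element is the opening balance (an assumption of this sketch).
    """
    values = [start_balance]
    for r in returns:
        values.append(values[-1] * (1.0 + r))
    if round_to is not None:
        # Round only at the end so rounding errors do not compound
        values = [round(v, round_to) for v in values]
    return values

values = portfolio_values([0.10, -0.05, 0.02], start_balance=1000.0, round_to=2)
```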

### Data Download and External Sources

Retrieve financial data from external sources.

```python { .api }
def download_returns(ticker, period="max", proxy=None):
    """
    Download return data for specified ticker.

    Parameters:
    - ticker: str, ticker symbol (e.g., 'SPY', 'AAPL')
    - period: str, time period ('1d', '5d', '1mo', '3mo', '6mo', '1y', '2y', '5y', '10y', 'ytd', 'max')
    - proxy: str, proxy server URL (optional)

    Returns:
    pandas Series: Return series for the ticker
    """
```

### Date and Time Utilities

Functions for working with time-based data filtering and analysis.

```python { .api }
def _mtd(df):
    """
    Filter DataFrame to month-to-date data.

    Parameters:
    - df: pandas DataFrame or Series with datetime index

    Returns:
    pandas DataFrame or Series: Month-to-date filtered data
    """

def _qtd(df):
    """
    Filter DataFrame to quarter-to-date data.

    Parameters:
    - df: pandas DataFrame or Series with datetime index

    Returns:
    pandas DataFrame or Series: Quarter-to-date filtered data
    """

def _ytd(df):
    """
    Filter DataFrame to year-to-date data.

    Parameters:
    - df: pandas DataFrame or Series with datetime index

    Returns:
    pandas DataFrame or Series: Year-to-date filtered data
    """

def _pandas_date(df, dates):
    """
    Filter DataFrame by specific dates.

    Parameters:
    - df: pandas DataFrame or Series
    - dates: list or pandas DatetimeIndex of dates to filter

    Returns:
    pandas DataFrame or Series: Filtered data
    """

def _pandas_current_month(df):
    """
    Filter DataFrame to current month data.

    Parameters:
    - df: pandas DataFrame or Series with datetime index

    Returns:
    pandas DataFrame or Series: Current month data
    """
```
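
The "to-date" filters all follow the same pattern: find the first day of the current month, quarter, or year, then keep observations from that day onward. The sketch below shows that pattern with the standard library. `period_start` and `to_date` are hypothetical names, and the reference date is passed explicitly (an assumption made here so the example is reproducible) rather than read from the system clock.

```python
from datetime import date

def period_start(today, period):
    """First day of the month, quarter, or year that contains `today`."""
    if period == "mtd":
        return today.replace(day=1)
    if period == "qtd":
        first_month = 3 * ((today.month - 1) // 3) + 1  # 1, 4, 7, or 10
        return today.replace(month=first_month, day=1)
    if period == "ytd":
        return today.replace(month=1, day=1)
    raise ValueError(f"unknown period: {period!r}")

def to_date(observations, today, period):
    """Keep (date, value) pairs between the period start and `today`, inclusive."""
    start = period_start(today, period)
    return [(d, v) for d, v in observations if start <= d <= today]

obs = [
    (date(2024, 1, 15), 0.01),
    (date(2024, 4, 2), 0.02),
    (date(2024, 5, 20), -0.01),
]
today = date(2024, 5, 31)
mtd = to_date(obs, today, "mtd")
qtd = to_date(obs, today, "qtd")
ytd = to_date(obs, today, "ytd")
```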

### Environment and Context Detection

Utility functions for detecting execution environment and setting up context.

```python { .api }
def _in_notebook(matplotlib_inline=False):
    """
    Detect if running in Jupyter notebook environment.

    Parameters:
    - matplotlib_inline: bool, whether to enable matplotlib inline mode

    Returns:
    bool: True if running in notebook, False otherwise
    """

def _file_stream():
    """
    Create file stream context for data operations.

    Returns:
    file-like object: Stream for file operations
    """
```
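
A common idiom for notebook detection is to look for IPython's `get_ipython` global and inspect the shell class name. The sketch below shows that idiom; it is an illustration of the general technique, not necessarily how `_in_notebook` is implemented, and it leaves out the `matplotlib_inline` behavior.

```python
def in_notebook():
    """Return True when running under a Jupyter kernel, False otherwise."""
    try:
        # IPython exposes get_ipython() inside its sessions;
        # in a plain Python interpreter the name does not exist.
        shell = get_ipython().__class__.__name__
    except NameError:
        return False
    # Jupyter kernels use ZMQInteractiveShell; terminal IPython uses
    # TerminalInteractiveShell, which is not a notebook.
    return shell == "ZMQInteractiveShell"

running_in_notebook = in_notebook()
```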

### Cache Management

Functions for managing internal data caches to improve performance.

```python { .api }
def _generate_cache_key(data, rf, nperiods):
    """
    Generate cache key for prepared returns data.

    Parameters:
    - data: pandas Series, input data
    - rf: float, risk-free rate
    - nperiods: int, number of periods

    Returns:
    str: Cache key
    """

def _clear_cache_if_full():
    """
    Clear cache if it exceeds maximum size limit.

    Returns:
    None
    """
```

### Data Formatting and Display

Functions for formatting data for display and analysis.

```python { .api }
def _score_str(val):
    """
    Format score value as string with appropriate precision.

    Parameters:
    - val: float, score value to format

    Returns:
    str: Formatted score string
    """

def _flatten_dataframe(df, set_index=None):
    """
    Flatten hierarchical DataFrame structure.

    Parameters:
    - df: pandas DataFrame with hierarchical structure
    - set_index: str, column name to set as index

    Returns:
    pandas DataFrame: Flattened DataFrame
    """
```

## Exception Classes

```python { .api }
class QuantStatsError(Exception):
    """Base exception class for QuantStats."""

class DataValidationError(QuantStatsError):
    """Raised when input data validation fails."""

class CalculationError(QuantStatsError):
    """Raised when a calculation fails."""

class PlottingError(QuantStatsError):
    """Raised when plotting operations fail."""

class BenchmarkError(QuantStatsError):
    """Raised when benchmark-related operations fail."""
```
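
Because every exception derives from `QuantStatsError`, a caller can catch the base class to handle any library failure, or a subclass to handle one kind specifically. The sketch below demonstrates this with local stand-in classes that mirror the declarations above; `require_non_empty` and `describe_failure` are hypothetical helpers.

```python
class QuantStatsError(Exception):
    """Stand-in for the base exception declared above."""

class DataValidationError(QuantStatsError):
    """Stand-in for the validation exception declared above."""

def require_non_empty(values):
    """Hypothetical check that rejects empty input."""
    if len(values) == 0:
        raise DataValidationError("input is empty")
    return values

def describe_failure(values):
    """Report which error, if any, the check raised."""
    try:
        require_non_empty(values)
    except QuantStatsError as exc:  # the base class also catches every subclass
        return f"{type(exc).__name__}: {exc}"
    return "ok"

empty_result = describe_failure([])
valid_result = describe_failure([0.01, 0.02])
```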

## Usage Examples

### Basic Data Conversion

```python
import quantstats as qs
import pandas as pd

# Convert prices to returns (a datetime index keeps the date-based steps below working)
prices = pd.Series(
    [100, 102, 101, 105, 103],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)
returns = qs.utils.to_returns(prices)

# Convert returns back to prices
reconstructed_prices = qs.utils.to_prices(returns, base=100)

# Calculate log returns
log_rets = qs.utils.log_returns(returns)
```

### Data Validation and Preparation

```python
# Validate input data
try:
    validated_returns = qs.utils.validate_input(returns)
except qs.utils.DataValidationError as e:
    print(f"Data validation failed: {e}")

# Aggregate to monthly returns
monthly_returns = qs.utils.aggregate_returns(returns, period='M')
```

### External Data Integration

```python
# Download benchmark data
spy_returns = qs.utils.download_returns('SPY', period='5y')

# Create excess returns; nperiods=252 converts the 2% annual rate to a daily rate
excess_returns = qs.utils.to_excess_returns(returns, rf=0.02, nperiods=252)
```
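
When the risk-free rate is quoted annually but the returns are daily, the rate has to be converted to a per-period rate before it is subtracted. The sketch below shows the standard compounding conversion, assuming 252 trading periods per year; `periodic_rate` and `excess` are hypothetical helpers on plain lists, not the library's functions.

```python
def periodic_rate(annual_rate, nperiods):
    """Per-period rate that compounds to `annual_rate` over `nperiods` periods."""
    return (1.0 + annual_rate) ** (1.0 / nperiods) - 1.0

def excess(returns, annual_rf, nperiods):
    """Subtract the per-period risk-free rate from each return."""
    rf = periodic_rate(annual_rf, nperiods)
    return [r - rf for r in returns]

daily_rf = periodic_rate(0.02, 252)            # roughly 0.008% per day
daily_excess = excess([0.01, -0.005], 0.02, 252)
```

Subtracting the annual figure directly from daily returns would overstate the hurdle by more than two orders of magnitude, so it is worth checking which convention a given input follows.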

## Constants

```python { .api }
_PREPARE_RETURNS_CACHE: dict
"""Internal cache for prepared returns data"""

_CACHE_MAX_SIZE: int
"""Maximum size for internal caches (default: 100)"""
```