# Data Utilities

Utility functions for data preparation in quantitative analysis workflows: converting between prices and returns, validating and aggregating data, preparing benchmarks, and retrieving data from external sources.

## Capabilities

### Data Conversion Functions

Convert between different data formats commonly used in quantitative finance.

```python { .api }
def to_returns(prices, rf=0.0):
    """
    Convert price series to return series.

    Parameters:
    - prices: pandas Series of prices
    - rf: float, risk-free rate to subtract from returns

    Returns:
    pandas Series: Returns calculated as pct_change()
    """

def to_prices(returns, base=1e5):
    """
    Convert return series to price index.

    Parameters:
    - returns: pandas Series of returns
    - base: float, starting value for price index

    Returns:
    pandas Series: Cumulative price index
    """

def log_returns(returns, rf=0.0, nperiods=None):
    """
    Convert returns to log returns.

    Parameters:
    - returns: pandas Series of returns
    - rf: float, risk-free rate
    - nperiods: int, number of periods for annualization

    Returns:
    pandas Series: Log returns
    """

def to_log_returns(returns, rf=0.0, nperiods=None):
    """
    Alias for log_returns function.

    Parameters:
    - returns: pandas Series of returns
    - rf: float, risk-free rate
    - nperiods: int, number of periods

    Returns:
    pandas Series: Log returns
    """

def to_excess_returns(returns, rf, nperiods=None):
    """
    Calculate excess returns above risk-free rate.

    Parameters:
    - returns: pandas Series of returns
    - rf: float, risk-free rate
    - nperiods: int, number of periods for rate conversion

    Returns:
    pandas Series: Excess returns
    """

def rebase(prices, base=100.0):
    """
    Rebase price series to start at specified value.

    Parameters:
    - prices: pandas Series of prices
    - base: float, new base value

    Returns:
    pandas Series: Rebased price series
    """
```
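
As a rough illustration of the arithmetic behind these conversions, the sketch below uses plain pandas and NumPy rather than the QuantStats functions themselves. It assumes that simple returns are percentage changes with the first value filled with 0, and that a price index is the compounded product of returns scaled by a base value; the library may treat edge cases differently.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 105.0, 103.0])

# Simple returns: percentage change between consecutive prices.
# The first element has no predecessor, so it is filled with 0 here (an assumption).
simple_returns = prices.pct_change().fillna(0.0)

# Price index: compound the returns and scale by a starting value.
base = 100.0
price_index = base * (1.0 + simple_returns).cumprod()

# Log returns: log(1 + r) for each simple return r.
log_rets = np.log1p(simple_returns)

# Rebasing: rescale the series so that its first value equals `new_base`.
new_base = 1.0
rebased = prices / prices.iloc[0] * new_base
```

Because log returns are additive, their sum equals the log of the total price ratio, which is a convenient sanity check when converting between the two representations.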

### Data Validation and Preparation

Ensure data quality and prepare data for analysis.

```python { .api }
def validate_input(data, allow_empty=False):
    """
    Validate input data for QuantStats functions.

    Parameters:
    - data: pandas Series or DataFrame to validate
    - allow_empty: bool, whether to allow empty data

    Returns:
    pandas Series or DataFrame: Validated data

    Raises:
    DataValidationError: If data validation fails
    """

def _prepare_returns(data, rf=0.0, nperiods=None):
    """
    Internal function to prepare returns data for analysis.

    Parameters:
    - data: pandas Series of returns or prices
    - rf: float, risk-free rate
    - nperiods: int, number of periods

    Returns:
    pandas Series: Prepared returns data
    """

def _prepare_prices(data, base=1.0):
    """
    Internal function to prepare price data.

    Parameters:
    - data: pandas Series of prices
    - base: float, base value for rebasing

    Returns:
    pandas Series: Prepared price data
    """

def _prepare_benchmark(benchmark=None, period="max", rf=0.0, prepare_returns=True):
    """
    Prepare benchmark data for analysis.

    Parameters:
    - benchmark: str or pandas Series, benchmark identifier or data
    - period: str, time period for data retrieval
    - rf: float, risk-free rate
    - prepare_returns: bool, whether to prepare returns
 
    Returns:
    pandas Series: Prepared benchmark data
    """
```
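
Several of the signatures above take `rf` and `nperiods` together. One plausible reading, sketched here in plain pandas, is that an annual risk-free rate is converted to a per-period rate before it is subtracted from each return. The helper name `excess_returns_sketch` and the geometric conversion are assumptions made for illustration, not the library's code.

```python
import pandas as pd


def excess_returns_sketch(returns: pd.Series, rf: float, nperiods=None) -> pd.Series:
    """Subtract a risk-free rate from each return (hypothetical helper)."""
    if nperiods is not None:
        # Assumption: `rf` is an annual rate; convert it to a per-period rate
        # by geometric de-annualization.
        rf = (1.0 + rf) ** (1.0 / nperiods) - 1.0
    return returns - rf


daily = pd.Series([0.010, -0.005, 0.002])

# 2% annual risk-free rate spread over 252 trading days
excess = excess_returns_sketch(daily, rf=0.02, nperiods=252)
```

With `nperiods=None` the rate is subtracted unchanged, so the caller would need to pass a rate that already matches the frequency of the return series.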

### Data Aggregation and Resampling

Functions for aggregating returns across different time periods.

```python { .api }
def aggregate_returns(returns, period=None, compounded=True):
    """
    Aggregate returns to specified frequency.

    Parameters:
    - returns: pandas Series of returns
    - period: str, aggregation period ('M', 'Q', 'Y', etc.)
    - compounded: bool, whether to compound returns

    Returns:
    pandas Series: Aggregated returns
    """

def group_returns(returns, groupby, compounded=False):
    """
    Group returns by specified criteria.

    Parameters:
    - returns: pandas Series of returns
    - groupby: str or function, grouping criteria
    - compounded: bool, whether to compound grouped returns

    Returns:
    pandas Series: Grouped returns
    """

def multi_shift(df, shift=3):
    """
    Create DataFrame with multiple shifted versions.

    Parameters:
    - df: pandas DataFrame to shift
    - shift: int, number of periods to shift

    Returns:
    pandas DataFrame: DataFrame with original and shifted columns
    """
```
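
To make the `compounded` flag and the idea of shifted copies concrete, here is a small pandas-only sketch. The grouping keys, the `compound` helper, and the `_lag{k}` column suffix are illustrative assumptions and may not match what the library produces.

```python
import pandas as pd

idx = pd.to_datetime(["2024-01-30", "2024-01-31", "2024-02-01", "2024-02-02"])
returns = pd.Series([0.01, 0.02, -0.01, 0.03], index=idx)


def compound(r: pd.Series) -> float:
    """Total compounded return of a group: prod(1 + r) - 1."""
    return float((1.0 + r).prod() - 1.0)


# Group by (year, month) so that the same month in different years stays separate.
keys = [returns.index.year, returns.index.month]
monthly_compounded = returns.groupby(keys).apply(compound)
monthly_summed = returns.groupby(keys).sum()  # non-compounded alternative

# Original column plus copies lagged by 1 and 2 periods.
frame = pd.DataFrame({"r": returns})
lagged = [frame.shift(k).add_suffix(f"_lag{k}") for k in (1, 2)]
shifted = pd.concat([frame] + lagged, axis=1)
```

Compounding and summing give close but different results (about 3.02% versus 3.00% for January here), which is why the flag matters once returns grow larger or periods longer.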

### Statistical Utilities

Helper functions for statistical calculations and data manipulation.

```python { .api }
def exponential_stdev(returns, window=30, is_halflife=False):
    """
    Calculate exponentially weighted standard deviation.

    Parameters:
    - returns: pandas Series of returns
    - window: int, window size or halflife
    - is_halflife: bool, whether window represents halflife

    Returns:
    pandas Series: Exponentially weighted standard deviation
    """

def _count_consecutive(data):
    """
    Count consecutive occurrences in data.

    Parameters:
    - data: pandas Series of boolean or numeric data

    Returns:
    int: Maximum consecutive count
    """

def _round_to_closest(val, res, decimals=None):
    """
    Round value to closest resolution.

    Parameters:
    - val: float, value to round
    - res: float, resolution to round to
    - decimals: int, number of decimal places

    Returns:
    float: Rounded value
    """
```
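
The following pandas sketch shows one way each of these helpers could be computed. It is a sketch under stated assumptions: `window` is interpreted either as an `ewm` span or as a half-life, consecutive occurrences are tracked as a running streak whose maximum is the longest run, and `round_to_closest` is a hypothetical stand-in that ignores the `decimals` argument.

```python
import pandas as pd

returns = pd.Series([0.01, -0.02, 0.015, 0.03, -0.01, 0.005])

# Exponentially weighted standard deviation, with the window read as a span ...
ew_std_span = returns.ewm(span=3).std()
# ... or as a half-life (the number of periods for a weight to decay by half).
ew_std_halflife = returns.ewm(halflife=3).std()

# Running count of consecutive positive returns.
wins = (returns > 0).astype(int)
run_id = (wins != wins.shift()).cumsum()  # new id whenever the value changes
streak = wins * (wins.groupby(run_id).cumcount() + 1)
longest_streak = int(streak.max())


def round_to_closest(val: float, res: float) -> float:
    """Round `val` to the nearest multiple of `res` (hypothetical helper)."""
    return round(val / res) * res
```

The first value of each exponentially weighted series is `NaN`, since a standard deviation needs at least two observations.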

### Portfolio Construction

Functions for creating portfolios and indices from return data.

```python { .api }
def make_portfolio(returns, start_balance=1e5, mode="comp", round_to=None):
    """
    Create portfolio value series from returns.

    Parameters:
    - returns: pandas Series of returns
    - start_balance: float, starting portfolio value
    - mode: str, calculation mode ('comp' for compounded)
    - round_to: int, decimal places to round to

    Returns:
    pandas Series: Portfolio value over time
    """

def make_index(ticker, **kwargs):
    """
    Create market index from ticker symbol.

    Parameters:
    - ticker: str, ticker symbol
    - **kwargs: additional parameters for data retrieval

    Returns:
    pandas Series: Index price or return data
    """
```
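
A minimal sketch of what a compounded portfolio value series looks like, written in plain pandas. The non-compounded variant is shown only for contrast and is an assumption about what an alternative `mode` might mean; neither line is taken from the library's implementation.

```python
import pandas as pd

returns = pd.Series([0.01, -0.02, 0.03])
start_balance = 100_000.0

# Compounded: each return is applied to the balance reached so far.
compounded = start_balance * (1.0 + returns).cumprod()

# Non-compounded (assumed alternative): every return is applied to the
# starting balance, so gains and losses simply add up.
simple_sum = start_balance * (1.0 + returns.cumsum())

# Optional rounding of the resulting balances, e.g. to cents.
compounded_rounded = compounded.round(2)
```

The two series end at roughly 101,949.40 and 102,000.00 respectively, so the choice of mode changes reported balances even over three periods.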

### Data Download and External Sources

Retrieve financial data from external sources.

```python { .api }
def download_returns(ticker, period="max", proxy=None):
    """
    Download return data for specified ticker.

    Parameters:
    - ticker: str, ticker symbol (e.g., 'SPY', 'AAPL')
    - period: str, time period ('1d', '5d', '1mo', '3mo', '6mo', '1y', '2y', '5y', '10y', 'ytd', 'max')
    - proxy: str, proxy server URL (optional)

    Returns:
    pandas Series: Return series for the ticker
    """
```

### Date and Time Utilities

Functions for working with time-based data filtering and analysis.

```python { .api }
def _mtd(df):
    """
    Filter DataFrame to month-to-date data.

    Parameters:
    - df: pandas DataFrame or Series with datetime index

    Returns:
    pandas DataFrame or Series: Month-to-date filtered data
    """

def _qtd(df):
    """
    Filter DataFrame to quarter-to-date data.

    Parameters:
    - df: pandas DataFrame or Series with datetime index

    Returns:
    pandas DataFrame or Series: Quarter-to-date filtered data
    """

def _ytd(df):
    """
    Filter DataFrame to year-to-date data.

    Parameters:
    - df: pandas DataFrame or Series with datetime index

    Returns:
    pandas DataFrame or Series: Year-to-date filtered data
    """

def _pandas_date(df, dates):
    """
    Filter DataFrame by specific dates.

    Parameters:
    - df: pandas DataFrame or Series
    - dates: list or pandas DatetimeIndex of dates to filter

    Returns:
    pandas DataFrame or Series: Filtered data
    """

def _pandas_current_month(df):
    """
    Filter DataFrame to current month data.

    Parameters:
    - df: pandas DataFrame or Series with datetime index

    Returns:
    pandas DataFrame or Series: Current month data
    """
```
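
The period-to-date filters can be pictured as boolean masks on a datetime index. The sketch below uses an explicit `as_of` timestamp in place of the current date so that the result is reproducible; that variable and the quarter-start arithmetic are assumptions for illustration.

```python
import pandas as pd

idx = pd.date_range("2023-12-28", "2024-02-05", freq="D")
data = pd.Series(range(len(idx)), index=idx, dtype=float)

as_of = pd.Timestamp("2024-02-05")  # stands in for "today"

# Month-to-date: everything from the first day of the current month.
month_to_date = data[data.index >= as_of.replace(day=1)]

# Quarter-to-date: first month of the quarter is 1, 4, 7, or 10.
quarter_month = 3 * ((as_of.month - 1) // 3) + 1
quarter_start = pd.Timestamp(year=as_of.year, month=quarter_month, day=1)
quarter_to_date = data[data.index >= quarter_start]

# Year-to-date: everything from January 1 of the current year.
year_to_date = data[data.index >= as_of.replace(month=1, day=1)]
```

Each filter returns an object of the same type as its input, which matches the "DataFrame or Series" return types documented above.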

### Environment and Context Detection

Utility functions for detecting execution environment and setting up context.

```python { .api }
def _in_notebook(matplotlib_inline=False):
    """
    Detect if running in Jupyter notebook environment.

    Parameters:
    - matplotlib_inline: bool, whether to enable matplotlib inline mode

    Returns:
    bool: True if running in notebook, False otherwise
    """

def _file_stream():
    """
    Create file stream context for data operations.

    Returns:
    file-like object: Stream for file operations
    """
```
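
Notebook detection is commonly done by inspecting the active IPython shell. The sketch below shows that general technique; the function name `in_notebook_sketch` is hypothetical, and the check against the `ZMQInteractiveShell` class name is an assumption about how a Jupyter kernel identifies itself, not a description of the library's own logic.

```python
import sys


def in_notebook_sketch() -> bool:
    """Return True when the code appears to run inside a Jupyter kernel."""
    # If IPython was never imported, we cannot be inside a notebook.
    ipython = sys.modules.get("IPython")
    if ipython is None:
        return False

    # get_ipython() returns None outside an interactive IPython session.
    shell = ipython.get_ipython()
    if shell is None:
        return False

    # Jupyter kernels use a ZMQ-based shell; terminal IPython does not.
    return shell.__class__.__name__ == "ZMQInteractiveShell"


running_in_notebook = in_notebook_sketch()
```

Checking `sys.modules` instead of importing IPython keeps the helper usable in environments where IPython is not installed.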

### Cache Management

Functions for managing internal data caches to improve performance.

```python { .api }
def _generate_cache_key(data, rf, nperiods):
    """
    Generate cache key for prepared returns data.

    Parameters:
    - data: pandas Series, input data
    - rf: float, risk-free rate
    - nperiods: int, number of periods

    Returns:
    str: Cache key
    """

def _clear_cache_if_full():
    """
    Clear cache if it exceeds maximum size limit.

    Returns:
    None
    """
```

### Data Formatting and Display

Functions for formatting data for display and analysis.

```python { .api }
def _score_str(val):
    """
    Format score value as string with appropriate precision.

    Parameters:
    - val: float, score value to format

    Returns:
    str: Formatted score string
    """

def _flatten_dataframe(df, set_index=None):
    """
    Flatten hierarchical DataFrame structure.

    Parameters:
    - df: pandas DataFrame with hierarchical structure
    - set_index: str, column name to set as index

    Returns:
    pandas DataFrame: Flattened DataFrame
    """
```
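
As an illustration of what flattening a hierarchical DataFrame can mean, the sketch below turns a two-level row index into ordinary columns and then promotes one of them back to the index. The `score_str` helper with a sign prefix and two decimals is a guess at a reasonable display format; both pieces are assumptions rather than the library's behavior.

```python
import pandas as pd

# A DataFrame with a hierarchical (year, month) row index.
idx = pd.MultiIndex.from_tuples([(2024, 1), (2024, 2)], names=["year", "month"])
df = pd.DataFrame({"ret": [0.03, -0.01]}, index=idx)

# Flatten: index levels become regular columns.
flat = df.reset_index()

# Optionally choose one column as the new single-level index.
flat_indexed = flat.set_index("month")


def score_str(val: float) -> str:
    """Format a score with an explicit sign and two decimals (hypothetical)."""
    return f"{val:+.2f}"


labels = [score_str(v) for v in flat["ret"]]
```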

## Exception Classes

```python { .api }
class QuantStatsError(Exception):
    """Base exception class for QuantStats."""

class DataValidationError(QuantStatsError):
    """Raised when input data validation fails."""

class CalculationError(QuantStatsError):
    """Raised when a calculation fails."""

class PlottingError(QuantStatsError):
    """Raised when plotting operations fail."""

class BenchmarkError(QuantStatsError):
    """Raised when benchmark-related operations fail."""
```
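
Because every exception derives from `QuantStatsError`, callers can handle a specific failure first and fall back to the base class. The sketch below redefines part of the hierarchy locally so that it runs on its own; `require_non_empty` and `describe_failure` are hypothetical helpers, not library functions.

```python
class QuantStatsError(Exception):
    """Local stand-in for the documented base exception."""


class DataValidationError(QuantStatsError):
    """Local stand-in for the documented validation exception."""


def require_non_empty(values):
    """Raise DataValidationError for empty input (hypothetical check)."""
    if len(values) == 0:
        raise DataValidationError("input data is empty")
    return values


def describe_failure(values) -> str:
    try:
        require_non_empty(values)
    except DataValidationError as exc:
        # Most specific handler first.
        return f"validation: {exc}"
    except QuantStatsError as exc:
        # Catch-all for any other library error.
        return f"other: {exc}"
    return "ok"


result_empty = describe_failure([])
result_ok = describe_failure([0.01])
```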

## Usage Examples

### Basic Data Conversion

```python
import quantstats as qs
import pandas as pd

# Convert prices to returns.
# A DatetimeIndex is used so that date-based utilities
# (such as aggregate_returns) can operate on the result.
prices = pd.Series(
    [100, 102, 101, 105, 103],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)
returns = qs.utils.to_returns(prices)

# Convert returns back to prices
reconstructed_prices = qs.utils.to_prices(returns, base=100)

# Calculate log returns
log_rets = qs.utils.log_returns(returns)
```

### Data Validation and Preparation

```python
# `returns` is the Series created in the previous example

# Validate input data
try:
    validated_returns = qs.utils.validate_input(returns)
except qs.utils.DataValidationError as e:
    print(f"Data validation failed: {e}")

# Aggregate to monthly returns
monthly_returns = qs.utils.aggregate_returns(returns, period='M')
```

### External Data Integration

```python
# Download benchmark data
spy_returns = qs.utils.download_returns('SPY', period='5y')

# Create excess returns
excess_returns = qs.utils.to_excess_returns(returns, rf=0.02)
```

## Constants

```python { .api }
_PREPARE_RETURNS_CACHE: dict
"""Internal cache for prepared returns data"""

_CACHE_MAX_SIZE: int
"""Maximum size for internal caches (default: 100)"""
```