# Results Storage and Comparison

## Overview

pytest-benchmark provides storage backends for persisting benchmark results and comparison features for tracking performance over time. Results can be stored in files or Elasticsearch and compared across different runs, commits, or environments.

## Storage Backends

### FileStorage Class

```python { .api }
class FileStorage:
    """File-based storage backend for benchmark results."""

    def __init__(self, path: str, logger, default_machine_id: str = None):
        """
        Initialize file storage.

        Args:
            path: Directory path for storing benchmark files
            logger: Logger instance for output
            default_machine_id: Default machine identifier
        """

    def save(self, output_json: dict, save: str) -> str:
        """
        Save benchmark results to file.

        Args:
            output_json: Benchmark data in JSON format
            save: Save identifier/name

        Returns:
            str: Path to saved file
        """

    def load(self, name: str) -> dict:
        """
        Load benchmark results from file.

        Args:
            name: File identifier to load

        Returns:
            dict: Loaded benchmark data
        """
```
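Because saved runs are plain JSON files, they can also be read without going through the storage class. The sketch below is illustrative only: the helper name `load_latest_run` is ours, and it assumes the on-disk layout described later in this document (per-machine subdirectories containing `NNNN_<name>.json` files).

```python
import json
from pathlib import Path
from typing import Optional


def load_latest_run(storage_dir: str) -> Optional[dict]:
    # Saved runs are plain JSON files named like NNNN_<name>.json
    # inside per-machine subdirectories of the storage path.
    runs = sorted(Path(storage_dir).glob("*/[0-9][0-9][0-9][0-9]_*.json"))
    return json.loads(runs[-1].read_text()) if runs else None
```

Sorting the paths lexicographically works here because the four-digit counter prefix is zero-padded.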

### ElasticsearchStorage Class

```python { .api }
class ElasticsearchStorage:
    """Elasticsearch storage backend for benchmark results."""

    def __init__(self, hosts: list, index: str, doctype: str, project_name: str = None, logger=None, **kwargs):
        """
        Initialize Elasticsearch storage.

        Args:
            hosts: List of Elasticsearch host URLs
            index: Index name for storing benchmarks
            doctype: Document type (deprecated in ES 7+)
            project_name: Project identifier
            logger: Logger instance
            **kwargs: Additional Elasticsearch client options
        """

    def save(self, output_json: dict, save: str) -> str:
        """Save benchmark results to Elasticsearch."""

    def load(self, name: str) -> dict:
        """Load benchmark results from Elasticsearch."""
```

## Storage Configuration

### File Storage URIs

```bash { .api }
# Default file storage
--benchmark-storage=file://./.benchmarks

# Absolute path
--benchmark-storage=file:///home/user/benchmarks

# Relative path
--benchmark-storage=file://./results/benchmarks
```

### Elasticsearch URIs

```bash { .api }
# Basic Elasticsearch
--benchmark-storage=elasticsearch+http://localhost:9200/benchmarks/results

# With authentication
--benchmark-storage=elasticsearch+https://user:pass@host:9200/index/doctype

# Multiple hosts
--benchmark-storage=elasticsearch+http://host1:9200,host2:9200/index/doctype

# With project name
--benchmark-storage=elasticsearch+http://host:9200/index/doctype?project_name=myproject
```

## Saving Results

### Manual Saving

```bash
# Save with a custom name
pytest --benchmark-save=baseline

# Save with a descriptive name
pytest --benchmark-save=feature-x-implementation

# Auto-save with a timestamped name
pytest --benchmark-autosave
```

### Programmatic Saving

```python
def test_with_custom_save(benchmark):
    def my_function():
        return sum(range(1000))

    result = benchmark(my_function)

    # Results are automatically saved if --benchmark-save is used
    assert result == 499500
```

### Save Data Options

```bash
# Save only statistics (default)
pytest --benchmark-save=baseline

# Save complete timing data as well
pytest --benchmark-save=baseline --benchmark-save-data
```
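Either way, what lands on disk is JSON with a `benchmarks` list carrying per-test statistics, so it is easy to post-process. A small sketch, using a trimmed stand-in document with hypothetical values rather than a real saved file:

```python
import json

# A trimmed stand-in for a saved results file (values are hypothetical);
# real files carry the full structure documented later in this section.
saved = json.loads("""
{
  "benchmarks": [
    {"name": "test_sort", "stats": {"mean": 0.0021, "rounds": 120}},
    {"name": "test_parse", "stats": {"mean": 0.0007, "rounds": 450}}
  ]
}
""")

for bench in saved["benchmarks"]:
    stats = bench["stats"]
    print(f"{bench['name']}: mean={stats['mean']:.4f}s over {stats['rounds']} rounds")
```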

## Result Comparison

### Command-Line Comparison

```bash { .api }
# Compare against the most recent saved run
pytest --benchmark-compare

# Compare against a specific run
pytest --benchmark-compare=baseline
pytest --benchmark-compare=0001

# Compare with failure thresholds
pytest --benchmark-compare=baseline --benchmark-compare-fail=mean:10%
```

### CLI Tool Usage

```bash
# List available runs
pytest-benchmark list

# Compare specific runs
pytest-benchmark compare 0001 0002

# Compare runs matching a glob pattern
pytest-benchmark compare 'Linux-CPython-3.9-64bit/*'

# Show all comparison options
pytest-benchmark compare --help
```

## Comparison Examples

### Basic Comparison

```bash
# First, establish a baseline
pytest --benchmark-save=baseline tests/

# Later, compare the new implementation against it
pytest --benchmark-compare=baseline tests/
```

### Continuous Integration Workflow

```bash
# In a CI pipeline:
# 1. Run benchmarks and save
pytest --benchmark-only --benchmark-save=commit-${BUILD_ID}

# 2. Compare against the master baseline, failing on regressions
pytest --benchmark-only --benchmark-compare=master-baseline \
    --benchmark-compare-fail=mean:15%
```

### Multiple Environment Comparison

```bash
# Save results for different Python versions
pytest --benchmark-save=python38 tests/
pytest --benchmark-save=python39 tests/
pytest --benchmark-save=python310 tests/

# Compare across versions
pytest-benchmark compare python38 python39 python310
```

## Performance Regression Detection

### Failure Thresholds

```python { .api }
# Threshold expression formats for --benchmark-compare-fail:
"mean:5%"     # Fail if the mean increased by more than 5%
"min:0.001"   # Fail if the min increased by more than 0.001s (1ms)
"max:10%"     # Fail if the max increased by more than 10%
"stddev:25%"  # Fail if the standard deviation increased by more than 25%
```
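The threshold semantics can be illustrated in a few lines. This is a sketch of the comparison logic, not pytest-benchmark's actual implementation; the `regressed` helper and the sample numbers are ours:

```python
def regressed(baseline: dict, current: dict, spec: str) -> bool:
    # spec mirrors a --benchmark-compare-fail expression, e.g. "mean:10%" or "min:0.005".
    field, limit = spec.split(":")
    base, cur = baseline[field], current[field]
    # Percentage limits are relative to the baseline; bare numbers are absolute seconds.
    allowed = base * float(limit[:-1]) / 100.0 if limit.endswith("%") else float(limit)
    return (cur - base) > allowed


base = {"mean": 0.100, "min": 0.095}
cur = {"mean": 0.112, "min": 0.101}
print(regressed(base, cur, "mean:10%"))   # True: a 12% slowdown exceeds the 10% budget
print(regressed(base, cur, "min:0.010"))  # False: +6ms is within the 10ms budget
```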

### Multiple Thresholds

```bash
# Multiple failure conditions
pytest --benchmark-compare=baseline \
    --benchmark-compare-fail=mean:10% \
    --benchmark-compare-fail=max:20% \
    --benchmark-compare-fail=min:0.005
```

### Example Regression Detection

```python
def test_performance_sensitive_function(benchmark):
    def critical_function():
        # This function's performance is critical
        return sum(x**2 for x in range(10000))

    result = benchmark(critical_function)
    assert result == 333283335000

# Run with regression detection:
# pytest --benchmark-compare=baseline --benchmark-compare-fail=mean:5%
```

## Machine Information Tracking

### Automatic Machine Detection

```python { .api }
def pytest_benchmark_generate_machine_info() -> dict:
    """
    Generate machine information for the benchmark context.

    Returns:
        dict: Machine information including:
            - node: Machine hostname
            - processor: Processor name
            - machine: Machine architecture
            - python_implementation: CPython/PyPy/etc.
            - python_version: Python version
            - system: Operating system
            - cpu: CPU information from py-cpuinfo
    """
```
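Most of these fields come straight from the standard library's `platform` module; only the `cpu` entry needs the third-party py-cpuinfo package. A rough stdlib-only approximation (the helper name `rough_machine_info` is ours):

```python
import platform


def rough_machine_info() -> dict:
    # Approximates the fields pytest-benchmark records; the "cpu" entry
    # additionally requires py-cpuinfo, so it is omitted here.
    return {
        "node": platform.node(),
        "processor": platform.processor(),
        "machine": platform.machine(),
        "python_implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
        "system": platform.system(),
    }


info = rough_machine_info()
print(info["python_implementation"], info["system"])
```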

### Machine Info Comparison

```bash
# Benchmarks warn if machine info differs from the saved run
pytest --benchmark-compare=baseline
# Warning: Benchmark machine_info is different. Current: {...} VS saved: {...}
```

## Storage Management

### File Storage Structure

```
.benchmarks/
├── Linux-CPython-3.9-64bit/
│   ├── 0001_baseline.json
│   ├── 0002_feature_x.json
│   └── 0003_master.json
└── machine_info.json
```

### Elasticsearch Document Structure

```json
{
  "_index": "benchmarks",
  "_type": "results",
  "_id": "0001_baseline",
  "_source": {
    "machine_info": {...},
    "commit_info": {...},
    "benchmarks": [...],
    "datetime": "2023-01-01T12:00:00Z",
    "version": "5.1.0"
  }
}
```

## JSON Export Format

### Complete JSON Export

```bash
# Export with full timing data
pytest --benchmark-json=complete.json --benchmark-save-data
```

### JSON Structure

```python { .api }
# Complete benchmark JSON format:
{
    "machine_info": {
        "node": str,
        "processor": str,
        "machine": str,
        "python_implementation": str,
        "python_version": str,
        "system": str,
        "cpu": dict
    },
    "commit_info": {
        "id": str,
        "time": str,
        "author_time": str,
        "author_name": str,
        "author_email": str,
        "message": str,
        "branch": str
    },
    "benchmarks": [
        {
            "group": str,
            "name": str,
            "fullname": str,
            "params": dict,
            "param": str,
            "extra_info": dict,
            "stats": {
                "min": float,
                "max": float,
                "mean": float,
                "stddev": float,
                "rounds": int,
                "median": float,
                "iqr": float,
                "q1": float,
                "q3": float,
                "iqr_outliers": int,
                "stddev_outliers": int,
                "outliers": str,
                "ld15iqr": float,
                "hd15iqr": float,
                "ops": float,
                "total": float
            },
            "data": [float, ...]  # Present if --benchmark-save-data was used
        }
    ],
    "datetime": str,
    "version": str
}
```
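An export following this structure is straightforward to post-process. A minimal stand-in with hypothetical numbers, ranking benchmarks by mean time:

```python
# A minimal export stand-in (hypothetical numbers) following the structure above.
export = {
    "machine_info": {"system": "Linux", "python_version": "3.9.18"},
    "benchmarks": [
        {"name": "test_fast", "stats": {"mean": 0.001, "ops": 1000.0}},
        {"name": "test_slow", "stats": {"mean": 0.004, "ops": 250.0}},
    ],
    "datetime": "2023-01-01T12:00:00Z",
    "version": "5.1.0",
}

# Rank benchmarks by mean time, slowest first.
ranked = sorted(export["benchmarks"], key=lambda b: b["stats"]["mean"], reverse=True)
print([b["name"] for b in ranked])  # ['test_slow', 'test_fast']
```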

## Advanced Usage

### Custom Commit Information

```python
def pytest_benchmark_generate_commit_info(config):
    """Custom commit info generation."""
    return {
        "id": "custom-build-123",
        "branch": "feature/optimization",
        "message": "Performance improvements",
        "time": "2023-01-01T12:00:00Z"
    }
```

### Storage Authentication

```bash
# Using netrc for Elasticsearch auth
echo "machine elasticsearch.example.com login user password secret" >> ~/.netrc

pytest --benchmark-storage=elasticsearch+https://elasticsearch.example.com:9200/bench/result \
    --benchmark-netrc=~/.netrc
```

### Filtering Comparisons

```bash
# Compare only runs matching glob patterns
pytest-benchmark compare 'baseline*' 'current*'

# Group the comparison table by benchmark group
pytest-benchmark compare 0001 0002 --group-by=group
```

## Troubleshooting

### Storage Issues

```bash
# Check that file storage works end to end
pytest --benchmark-storage=file://./test-storage --benchmark-save=test

# Verify the Elasticsearch connection
pytest --benchmark-storage=elasticsearch+http://localhost:9200/test/bench \
    --benchmark-save=connectivity-test
```

### Comparison Failures

```bash
# Debug comparison issues with verbose output
pytest --benchmark-compare=baseline --benchmark-verbose

# List runs available for comparison
pytest-benchmark list --storage=file://.benchmarks
```