tessl/pypi-pytest-benchmark

A pytest fixture for benchmarking code that automatically calibrates test runs for accurate performance measurements.


Results Storage and Comparison

Overview

pytest-benchmark provides robust storage backends for persisting benchmark results and powerful comparison capabilities for tracking performance over time. Results can be stored in files or Elasticsearch and compared across different runs, commits, or environments.

Storage Backends

FileStorage Class

class FileStorage:
    """File-based storage backend for benchmark results."""
    
    def __init__(self, path: str, logger, default_machine_id: str = None):
        """
        Initialize file storage.
        
        Args:
            path: Directory path for storing benchmark files
            logger: Logger instance for output
            default_machine_id: Default machine identifier
        """
    
    def save(self, output_json: dict, save: str) -> str:
        """
        Save benchmark results to file.
        
        Args:
            output_json: Benchmark data in JSON format
            save: Save identifier/name
            
        Returns:
            str: Path to saved file
        """
    
    def load(self, name: str) -> dict:
        """
        Load benchmark results from file.
        
        Args:
            name: File identifier to load
            
        Returns:
            dict: Loaded benchmark data
        """

ElasticsearchStorage Class

class ElasticsearchStorage:
    """Elasticsearch storage backend for benchmark results."""
    
    def __init__(self, hosts: list, index: str, doctype: str, project_name: str = None, logger=None, **kwargs):
        """
        Initialize Elasticsearch storage.
        
        Args:
            hosts: List of Elasticsearch host URLs
            index: Index name for storing benchmarks
            doctype: Document type (deprecated in ES 7+)
            project_name: Project identifier
            logger: Logger instance
            **kwargs: Additional Elasticsearch client options
        """
    
    def save(self, output_json: dict, save: str) -> str:
        """Save benchmark results to Elasticsearch."""
    
    def load(self, name: str) -> dict:
        """Load benchmark results from Elasticsearch."""

Storage Configuration

File Storage URIs

# Default file storage
--benchmark-storage=file://./.benchmarks

# Absolute path
--benchmark-storage=file:///home/user/benchmarks

# Relative path
--benchmark-storage=file://./results/benchmarks

Elasticsearch URIs

# Basic Elasticsearch
--benchmark-storage=elasticsearch+http://localhost:9200/benchmarks/results

# With authentication
--benchmark-storage=elasticsearch+https://user:pass@host:9200/index/doctype

# Multiple hosts
--benchmark-storage=elasticsearch+http://host1:9200,host2:9200/index/doctype

# With project name
--benchmark-storage=elasticsearch+http://host:9200/index/doctype?project_name=myproject

Saving Results

Manual Saving

# Save with custom name
pytest --benchmark-save=baseline

# Save with descriptive name  
pytest --benchmark-save=feature-x-implementation

# Auto-save with timestamp
pytest --benchmark-autosave

Programmatic Saving

def test_with_custom_save(benchmark):
    def my_function():
        return sum(range(1000))
    
    result = benchmark(my_function)
    
    # Results automatically saved if --benchmark-save is used
    assert result == 499500

Save Data Options

# Save only statistics (default)
pytest --benchmark-save=baseline

# Save complete timing data
pytest --benchmark-save=baseline --benchmark-save-data

Result Comparison

Command-Line Comparison

# Compare against latest saved
pytest --benchmark-compare

# Compare against specific run
pytest --benchmark-compare=baseline
pytest --benchmark-compare=0001

# Compare with failure thresholds
pytest --benchmark-compare=baseline --benchmark-compare-fail=mean:10%

CLI Tool Usage

# List available runs
pytest-benchmark list

# Compare specific runs
pytest-benchmark compare 0001 0002

# Compare with filters
pytest-benchmark compare 'Linux-CPython-3.9-64bit/*'

# Show all comparison options
pytest-benchmark compare --help

Comparison Examples

Basic Comparison

# First, establish baseline
pytest --benchmark-save=baseline tests/

# Later, compare new implementation
pytest --benchmark-compare=baseline tests/

Continuous Integration Workflow

# In CI pipeline
# 1. Run benchmarks and save
pytest --benchmark-only --benchmark-save=commit-${BUILD_ID}

# 2. Compare against master baseline  
pytest --benchmark-only --benchmark-compare=master-baseline \
       --benchmark-compare-fail=mean:15%

Multiple Environment Comparison

# Save results for different Python versions
pytest --benchmark-save=python38 tests/
pytest --benchmark-save=python39 tests/  
pytest --benchmark-save=python310 tests/

# Compare across versions
pytest-benchmark compare python38 python39 python310

Performance Regression Detection

Failure Thresholds

# Threshold expression formats:
"mean:5%"           # Mean increased by more than 5%
"min:0.001"         # Min increased by more than 1ms
"max:10%"           # Max increased by more than 10%  
"stddev:25%"        # Standard deviation increased by 25%

Multiple Thresholds

# Multiple failure conditions
pytest --benchmark-compare=baseline \
       --benchmark-compare-fail=mean:10% \
       --benchmark-compare-fail=max:20% \
       --benchmark-compare-fail=min:0.005

Example Regression Detection

def test_performance_sensitive_function(benchmark):
    def critical_function():
        # This function's performance is critical
        return sum(x**2 for x in range(10000))
    
    result = benchmark(critical_function)
    assert result == 333283335000

# Run with regression detection
# pytest --benchmark-compare=baseline --benchmark-compare-fail=mean:5%

Machine Information Tracking

Automatic Machine Detection

def pytest_benchmark_generate_machine_info() -> dict:
    """
    Generate machine information for benchmark context.
    
    Returns:
        dict: Machine information including:
            - node: Machine hostname
            - processor: Processor name
            - machine: Machine architecture  
            - python_implementation: CPython/PyPy/etc
            - python_version: Python version
            - system: Operating system
            - cpu: CPU information from py-cpuinfo
    """

Machine Info Comparison

# Benchmarks warn if machine info differs
pytest --benchmark-compare=baseline
# Warning: Benchmark machine_info is different. Current: {...} VS saved: {...}
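To make runs from different environments distinguishable rather than merely warned about, extra keys can be attached via the documented `pytest_benchmark_update_machine_info` hook in `conftest.py`. The `ci` key below is an arbitrary example, not a pytest-benchmark field.

```python
# conftest.py
import os


def pytest_benchmark_update_machine_info(config, machine_info):
    """Tag each run's machine_info so saved results record their environment.

    The 'ci' key is an arbitrary example key, not part of the default schema.
    """
    machine_info["ci"] = os.environ.get("CI", "false")
```

The added keys are stored alongside the auto-detected fields, so later comparisons show exactly which environment produced each run.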

Storage Management

File Storage Structure

.benchmarks/
└── Linux-CPython-3.9-64bit/
    ├── 0001_baseline.json
    ├── 0002_feature_x.json
    └── 0003_master.json

Elasticsearch Document Structure

{
  "_index": "benchmarks",
  "_type": "results", 
  "_id": "0001_baseline",
  "_source": {
    "machine_info": {...},
    "commit_info": {...},
    "benchmarks": [...],
    "datetime": "2023-01-01T12:00:00Z",
    "version": "5.1.0"
  }
}

JSON Export Format

Complete JSON Export

# Export with full timing data
pytest --benchmark-json=complete.json --benchmark-save-data

JSON Structure

# Complete benchmark JSON format:
{
    "machine_info": {
        "node": str,
        "processor": str, 
        "machine": str,
        "python_implementation": str,
        "python_version": str,
        "system": str,
        "cpu": dict
    },
    "commit_info": {
        "id": str,
        "time": str,
        "author_time": str,
        "author_name": str,
        "author_email": str,
        "message": str,
        "branch": str
    },
    "benchmarks": [
        {
            "group": str,
            "name": str,
            "fullname": str,
            "params": dict,
            "param": str,
            "extra_info": dict,
            "stats": {
                "min": float,
                "max": float, 
                "mean": float,
                "stddev": float,
                "rounds": int,
                "median": float,
                "iqr": float,
                "q1": float,
                "q3": float,
                "iqr_outliers": int,
                "stddev_outliers": int,
                "outliers": str,
                "ld15iqr": float,
                "hd15iqr": float,
                "ops": float,
                "total": float
            },
            "data": [float, ...]  # If --benchmark-save-data used
        }
    ],
    "datetime": str,
    "version": str
}
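Because the export is plain JSON, it can be post-processed with the standard library alone. A short sketch ranking benchmarks by mean runtime; `slowest` is an illustrative helper, not part of pytest-benchmark.

```python
import json


def slowest(report: dict, n: int = 3) -> list[tuple[str, float]]:
    """Return the n benchmarks with the highest mean runtime from an exported report."""
    ranked = sorted(
        ((b["name"], b["stats"]["mean"]) for b in report["benchmarks"]),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:n]


# Typical usage against a file produced by --benchmark-json:
# with open("complete.json") as f:
#     print(slowest(json.load(f)))
```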

Advanced Usage

Custom Commit Information

def pytest_benchmark_generate_commit_info(config):
    """Custom commit info generation."""
    return {
        "id": "custom-build-123",
        "branch": "feature/optimization", 
        "message": "Performance improvements",
        "time": "2023-01-01T12:00:00Z"
    }

Storage Authentication

# Using netrc for Elasticsearch auth
echo "machine elasticsearch.example.com login user password secret" >> ~/.netrc

pytest --benchmark-storage=elasticsearch+https://elasticsearch.example.com:9200/bench/result \
       --benchmark-netrc=~/.netrc

Filtering Comparisons

# Compare only runs matching a glob pattern
pytest-benchmark compare '0001*' '0002*'

# Group the comparison table by benchmark group
pytest-benchmark compare 0001 0002 --group-by=group

Troubleshooting

Storage Issues

# Check storage connectivity
pytest --benchmark-storage=file://./test-storage --benchmark-save=test

# Verify Elasticsearch connection
pytest --benchmark-storage=elasticsearch+http://localhost:9200/test/bench \
       --benchmark-save=connectivity-test

Comparison Failures

# Debug comparison issues
pytest --benchmark-compare=baseline --benchmark-verbose

# List available runs for comparison (--storage precedes the subcommand)
pytest-benchmark --storage=file://./.benchmarks list
