tessl/pypi-bottleneck

Fast NumPy array functions written in C for high-performance numerical computing

Utility Functions

Name: tessl/pypi-bottleneck
Author: tessl

Testing, benchmarking, and introspection utilities for performance analysis and development support. These functions help developers evaluate Bottleneck's performance benefits and validate functionality.

Capabilities

Performance Benchmarking

Comprehensive benchmarking tools to compare Bottleneck performance against NumPy equivalents.
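The benchmarks in this section report speed ratios, defined as NumPy time divided by Bottleneck time. A minimal sketch of that calculation using only the standard library is shown here. `speed_ratio`, `loop_sum`, and the sample workload are hypothetical names chosen for illustration; they are not part of Bottleneck, which performs its own timing internally.

```python
import timeit


def speed_ratio(baseline, candidate, number=200, repeat=5):
    """Return baseline time / candidate time.

    Values above 1.0 mean the candidate is faster than the baseline.
    Hypothetical helper for illustration, not part of Bottleneck.
    """
    # Take the best of several repeats to reduce noise from other processes
    baseline_time = min(timeit.repeat(baseline, number=number, repeat=repeat))
    candidate_time = min(timeit.repeat(candidate, number=number, repeat=repeat))
    return baseline_time / candidate_time


def loop_sum(values):
    """Deliberately slow stand-in for the baseline implementation."""
    total = 0.0
    for value in values:
        total += value
    return total


data = [float(i) for i in range(1000)]
ratio = speed_ratio(lambda: loop_sum(data), lambda: sum(data))
print(f"speed ratio: {ratio:.1f}")
```

With NumPy and Bottleneck installed, the same helper could be called with `lambda: np.nansum(a)` as the baseline and `lambda: bn.nansum(a)` as the candidate.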

def bench():
    """
    Run comprehensive performance benchmark comparing Bottleneck vs NumPy.

    Executes a full benchmark suite testing all Bottleneck functions against
    their NumPy equivalents across different array sizes, shapes, and data types.
    Results show speed ratios (NumPy time / Bottleneck time) where higher
    values indicate better Bottleneck performance.

    Returns:
    bool, always returns True after printing benchmark results

    Performance Metrics Displayed:
    - Function names and speed ratios for different array configurations
    - Array shapes: (100,), (1000,1000) with various axes
    - Data types: float64 arrays with and without NaN values
    - Speed ratios > 1.0 indicate Bottleneck is faster than NumPy
    """

def bench_detailed(func_name, fraction_nan=0.3):
    """
    Run detailed benchmark for a specific function with customizable parameters.

    Prints the speed ratio (NumPy time / Bottleneck time) of a single
    function for a series of calls on arrays of different sizes and axes.

    Parameters:
    - func_name: str, name of Bottleneck function to benchmark
                 (e.g., 'nanmean', 'move_median', 'rankdata')
    - fraction_nan: float between 0 and 1, fraction of array elements to
                    set as NaN for testing NaN-handling performance
                    (default: 0.3)

    Returns:
    None (prints detailed benchmark results)

    Benchmark Details Include:
    - Speed ratio for each benchmarked call
    - Performance scaling with array size
    - Impact of NaN density on performance (via fraction_nan)
    - Comparison with NumPy equivalents
    """

Testing Framework

Built-in test suite execution for functionality validation.

def test():
    """
    Run the complete Bottleneck test suite.

    Executes all unit tests to verify correct functionality across different
    array configurations, data types, and edge cases. Uses the pytest
    framework, so pytest must be installed.

    Returns:
    bool, True if all tests pass, False if any test fails

    Test Coverage Includes:
    - Correctness verification against NumPy reference implementations
    - Edge case handling (empty arrays, all-NaN arrays, single elements)
    - Data type compatibility (int32, int64, float32, float64)
    - Multi-dimensional array operations
    - Axis parameter validation
    - Memory layout handling (C-contiguous, Fortran-contiguous, strided)
    - Input validation and error handling
    """

Function Introspection

Utility functions for exploring and categorizing Bottleneck's API.

def get_functions(module_name, as_string=False):
    """
    Get list of functions from specified Bottleneck module.

    Provides programmatic access to function lists for testing, documentation,
    or dynamic function discovery purposes.

    Parameters:
    - module_name: str, module name to query:
                   - 'reduce': statistical reduction functions
                   - 'move': moving window functions  
                   - 'nonreduce': array manipulation functions
                   - 'nonreduce_axis': axis-based manipulation functions
                   - 'all': all functions from all modules
    - as_string: bool, return function names as strings instead of
                 function objects (default: False)

    Returns:
    list, function objects or function name strings

    Available Modules:
    - 'reduce': [nansum, nanmean, nanstd, nanvar, nanmin, nanmax, ...]
    - 'move': [move_sum, move_mean, move_std, move_var, move_min, ...]
    - 'nonreduce': [replace]
    - 'nonreduce_axis': [partition, argpartition, rankdata, nanrankdata, push]
    """

Usage Examples

Performance Analysis

import bottleneck as bn

# Run comprehensive benchmark to see overall performance gains
print("Running comprehensive benchmark...")
bn.bench()

# Output shows speed ratios (NumPy time / Bottleneck time). The numbers
# below are illustrative and vary with hardware and library versions:
#               no NaN     no NaN      NaN       no NaN      NaN
#                (100,)  (1000,1000)(1000,1000)(1000,1000)(1000,1000)
#                axis=0     axis=0     axis=0     axis=1     axis=1
# nansum         29.7        1.4        1.6        2.0        2.1
# nanmean        99.0        2.0        1.8        3.2        2.5
# move_mean    6264.3       66.2      111.9      361.1      246.5
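Because bench() prints its table rather than returning the ratios, post-processing the numbers means capturing standard output. The sketch below shows one way to do that, assuming the table is written with ordinary print calls. `print_sample_table` is a hypothetical stand-in that prints the three sample rows shown in the example output, and `capture_ratios` is an illustrative helper, not a Bottleneck function.

```python
import contextlib
import io


def print_sample_table():
    """Stand-in for bn.bench(): prints rows copied from the sample output."""
    print("nansum         29.7        1.4        1.6        2.0        2.1")
    print("nanmean        99.0        2.0        1.8        3.2        2.5")
    print("move_mean    6264.3       66.2      111.9      361.1      246.5")


def capture_ratios(printer):
    """Run `printer`, capture what it prints, and parse `name ratio ratio ...` rows."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        printer()

    ratios = {}
    for line in buffer.getvalue().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            ratios[parts[0]] = [float(part) for part in parts[1:]]
        except ValueError:
            # Header or other non-numeric line: skip it
            continue
    return ratios


ratios = capture_ratios(print_sample_table)

# Find the function whose worst-case ratio is lowest
slowest = min(ratios, key=lambda name: min(ratios[name]))
print(slowest, min(ratios[slowest]))
```

Passing `bn.bench` instead of the stand-in should work the same way if the assumption about print-based output holds for your installed version.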

Detailed Function Benchmarking

import bottleneck as bn

# Analyze specific function performance with different NaN densities
print("Benchmarking nanmean with 10% NaN values:")
bn.bench_detailed('nanmean', fraction_nan=0.1)

print("\nBenchmarking nanmean with 50% NaN values:")  
bn.bench_detailed('nanmean', fraction_nan=0.5)

# Benchmark moving window functions
print("\nBenchmarking move_median performance:")
bn.bench_detailed('move_median', fraction_nan=0.2)

# Compare different functions
functions_to_test = ['nansum', 'nanmean', 'nanstd', 'nanmedian']
for func in functions_to_test:
    print(f"\n=== {func} ===")
    bn.bench_detailed(func, fraction_nan=0.3)
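To make the meaning of fraction_nan concrete, the sketch below builds test data in which a chosen fraction of the entries is NaN. How Bottleneck injects NaNs internally is not described here, so this is one plausible construction rather than the library's method. `with_nan_fraction` is a hypothetical helper, and it uses plain Python lists so that it runs without NumPy.

```python
import math
import random


def with_nan_fraction(values, fraction_nan, seed=0):
    """Return a copy of `values` with round(len * fraction_nan) entries set to NaN.

    Hypothetical helper for illustration, not part of Bottleneck.
    """
    if not 0.0 <= fraction_nan <= 1.0:
        raise ValueError("fraction_nan must be between 0 and 1")

    rng = random.Random(seed)  # Seeded so that runs are reproducible
    result = list(values)
    n_nan = round(len(result) * fraction_nan)

    # Pick distinct positions at random and overwrite them with NaN
    for index in rng.sample(range(len(result)), n_nan):
        result[index] = math.nan
    return result


data = with_nan_fraction([float(i) for i in range(1000)], fraction_nan=0.3)
nan_count = sum(math.isnan(value) for value in data)
print(f"{nan_count} of {len(data)} values are NaN")
```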

Function Discovery and Testing

import bottleneck as bn

# Discover available functions by category
reduce_funcs = bn.get_functions('reduce', as_string=True)
move_funcs = bn.get_functions('move', as_string=True)
all_funcs = bn.get_functions('all', as_string=True)

print("Reduction functions:", reduce_funcs)
print("Moving window functions:", move_funcs)
print("Total functions available:", len(all_funcs))

# Get function objects for dynamic usage
move_function_objects = bn.get_functions('move', as_string=False)
for func in move_function_objects:
    print(f"Function: {func.__name__}, Module: {func.__module__}")

# Test specific function categories
import numpy as np

print("\nTesting reduction functions...")
test_data = np.array([1, 2, np.nan, 4, 5])
for func in bn.get_functions('reduce')[:3]:  # Try the first 3 functions
    try:
        result = func(test_data)
        print(f"{func.__name__}([1, 2, nan, 4, 5]) = {result}")
    except Exception as e:
        print(f"Error testing {func.__name__}: {e}")
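get_functions maps a category name to a list of functions, with 'all' combining every category and as_string switching between function objects and names. A sketch of how such a lookup could be structured follows. The registry contents (`total`, `largest`, `clip_negative`) and the `list_functions` helper are hypothetical stand-ins, not Bottleneck's actual implementation.

```python
def total(values):
    """Stand-in for a reduction function."""
    return sum(values)


def largest(values):
    """Stand-in for a second reduction function."""
    return max(values)


def clip_negative(values):
    """Stand-in for a non-reducing function."""
    return [max(value, 0) for value in values]


# Hypothetical category table, keyed the same way as the categories above
REGISTRY = {
    "reduce": [total, largest],
    "nonreduce": [clip_negative],
}


def list_functions(category, as_string=False):
    """Return the functions in `category`, or in every category for 'all'."""
    if category == "all":
        funcs = [func for group in REGISTRY.values() for func in group]
    else:
        funcs = list(REGISTRY[category])  # Raises KeyError for unknown names
    if as_string:
        return [func.__name__ for func in funcs]
    return funcs


print(list_functions("all", as_string=True))
print([func([3, -1, 2]) for func in list_functions("reduce")])
```

Keeping the table as a plain dictionary makes it easy to loop over categories in tests, which is how the usage examples in this document consume get_functions.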

Development and Validation Workflow

import bottleneck as bn
import numpy as np

# Complete development workflow example
def validate_bottleneck_installation():
    """Comprehensive validation of Bottleneck installation and performance."""
    
    print("=== Bottleneck Installation Validation ===")
    
    # 1. Run test suite
    print("1. Running test suite...")
    test_result = bn.test()
    print(f"   Tests passed: {test_result}")
    
    # 2. Verify basic functionality
    print("\n2. Testing basic functionality...")
    test_data = np.array([1, 2, np.nan, 4, 5])
    
    # Test core functions
    results = {
        'nanmean': bn.nanmean(test_data),
        'nansum': bn.nansum(test_data), 
        'move_mean': bn.move_mean(test_data, window=3, min_count=1),
        'rankdata': bn.rankdata([3, 1, 4, 1, 5])
    }
    
    for func_name, result in results.items():
        print(f"   {func_name}: {result}")
    
    # 3. Check performance benefits
    print("\n3. Quick performance check...")
    large_data = np.random.randn(10000)
    large_data[::10] = np.nan  # Set every 10th value to NaN

    import timeit

    # Take the best of several repeats: a single time.time() measurement
    # is too noisy (and can even be zero) for calls this short
    numpy_time = min(timeit.repeat(lambda: np.nanmean(large_data), number=100, repeat=5))
    bn_time = min(timeit.repeat(lambda: bn.nanmean(large_data), number=100, repeat=5))

    speedup = numpy_time / bn_time
    print(f"   NumPy nanmean: {numpy_time:.6f}s per 100 calls")
    print(f"   Bottleneck nanmean: {bn_time:.6f}s per 100 calls")
    print(f"   Speedup: {speedup:.1f}x")
    
    # 4. Function coverage check
    print("\n4. Function coverage:")
    all_functions = bn.get_functions('all', as_string=True)
    by_category = {
        'reduce': bn.get_functions('reduce', as_string=True),
        'move': bn.get_functions('move', as_string=True),
        'nonreduce': bn.get_functions('nonreduce', as_string=True),
        'nonreduce_axis': bn.get_functions('nonreduce_axis', as_string=True)
    }
    
    for category, functions in by_category.items():
        print(f"   {category}: {len(functions)} functions")
    
    print(f"   Total: {len(all_functions)} functions")
    
    return test_result and speedup > 1.0

# Run validation
is_working = validate_bottleneck_installation()
print(f"\nBottleneck working properly: {is_working}")

Continuous Integration Testing

import bottleneck as bn
import sys

def ci_test_bottleneck():
    """Lightweight test for CI/CD pipelines."""
    
    # Essential functionality test
    import numpy as np
    
    try:
        # Test basic operations
        data = np.array([1, 2, np.nan, 4, 5])
        
        assert bn.nanmean(data) == 3.0
        assert bn.nansum(data) == 12.0
        assert not bn.anynan(np.array([1, 2, 3]))
        assert bn.allnan(np.array([np.nan, np.nan]))
        
        # Test moving window
        series = np.array([1, 2, 3, 4, 5])
        ma = bn.move_mean(series, window=3, min_count=1)
        assert len(ma) == len(series)
        
        # Test ranking
        ranks = bn.rankdata([1, 3, 2])
        expected = np.array([1.0, 3.0, 2.0])
        assert np.allclose(ranks, expected)
        
        print("✓ All essential functions working")
        return True
        
    except Exception as e:
        print(f"✗ Error in essential functionality: {e}")
        return False

# Use in CI pipeline
if __name__ == "__main__":
    success = ci_test_bottleneck()
    sys.exit(0 if success else 1)

Performance Optimization Tips

When using Bottleneck's utility functions for optimization:

- Use bench() periodically to verify performance benefits on your specific hardware and data patterns
- Profile with bench_detailed() when optimizing critical code paths to understand the impact of:
  - Array size and shape
  - NaN density in your data
  - Memory layout (C vs Fortran order)
- Validate with test() after any environment changes (Python version, NumPy version, compilation flags)
- List the available functions with get_functions() to check whether Bottleneck offers a replacement for a NumPy call you rely on
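Since benchmark and test results depend on the environment, it can help to record the interpreter and package versions next to them. A minimal sketch using only the standard library follows. `environment_summary` is a hypothetical helper, and the distribution names "numpy" and "bottleneck" are assumptions about how the packages are installed.

```python
import platform
from importlib import metadata


def environment_summary(packages=("numpy", "bottleneck")):
    """Return the Python version and each package's installed version (None if absent)."""
    summary = {"python": platform.python_version()}
    for name in packages:
        try:
            summary[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment
            summary[name] = None
    return summary


summary = environment_summary()
print(summary)
```

Storing this dictionary alongside saved benchmark output makes it easier to tell whether a change in speed ratios follows from an upgrade rather than from your own code.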