
tessl/pypi-numexpr

Fast numerical expression evaluator for NumPy that accelerates array operations through optimized implementations and multi-threading


Threading and Performance Control

Configuration of multi-threading behavior and performance optimization settings for CPU-intensive computations. NumExpr automatically parallelizes operations across available CPU cores and provides fine-grained control over threading behavior.

Capabilities

Thread Configuration

Control the number of threads used for NumExpr operations, balancing performance with system resource usage.

def set_num_threads(nthreads):
    """
    Set the number of threads to use for operations.
    
    Controls the parallelization level for NumExpr computations. The
    virtual machine distributes array chunks across the specified number
    of threads for parallel execution.
    
    Parameters:
    - nthreads (int): Number of threads to use (1 to MAX_THREADS)
    
    Returns:
    int: Previous thread count setting
    
    Raises:
    ValueError: If nthreads exceeds MAX_THREADS or is less than 1
    """

def get_num_threads():
    """
    Get the current number of threads in use for operations.
    
    Returns:
    int: Current thread count configuration
    """

Usage Examples:

import numexpr as ne
import numpy as np

# Check current thread configuration
print(f"Current threads: {ne.get_num_threads()}")
print(f"Max threads supported: {ne.MAX_THREADS}")

# Set specific thread count
old_threads = ne.set_num_threads(4)
print(f"Changed from {old_threads} to {ne.get_num_threads()} threads")

# Benchmark with different thread counts
data = np.random.random((1000000, 10))
expr = "sum(data**2 + sqrt(data), axis=1)"

for threads in [1, 2, 4, 8]:
    if threads > ne.MAX_THREADS:
        continue  # set_num_threads() raises ValueError above MAX_THREADS
    ne.set_num_threads(threads)
    # Time the operation...
    result = ne.evaluate(expr, local_dict={'data': data})
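
Because set_num_threads() returns the previous setting, a temporary change can be scoped and then undone. The following is a minimal sketch, not part of NumExpr: `temporary_threads` is a hypothetical helper, and it takes the setter as an argument so the sketch itself does not depend on NumExpr being installed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_threads(nthreads, set_threads):
    """Run a block with `nthreads` threads, then restore the prior count.

    `set_threads` is assumed to behave like set_num_threads() as documented
    above: it applies the new count and returns the previous one.
    """
    previous = set_threads(nthreads)
    try:
        yield previous
    finally:
        # Restore the earlier setting even if the body raised
        set_threads(previous)


# Intended use with NumExpr (shown as a comment, not executed here):
#   import numexpr as ne
#   with temporary_threads(2, ne.set_num_threads):
#       result = ne.evaluate("a * b + 1")
```

Restoring in a `finally` clause keeps the process-wide setting unchanged for later code even when an evaluation fails.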

System Detection

Automatically detect optimal threading configuration based on system capabilities and environment variables.

def detect_number_of_cores():
    """
    Detect the number of CPU cores available on the system.
    
    Uses platform-specific methods to determine the number of logical
    CPU cores, providing a basis for automatic thread configuration.
    
    Returns:
    int: Number of detected CPU cores
    """

def detect_number_of_threads():
    """
    DEPRECATED: Detect optimal number of threads.
    
    This function is deprecated. Use _init_num_threads() instead for
    environment-based thread initialization.
    
    Returns:
    int: Suggested thread count based on system and environment
    """

def _init_num_threads():
    """
    Initialize thread count based on environment variables.
    
    Reads the following environment variables:
    - NUMEXPR_MAX_THREADS: maximum thread pool size
    - NUMEXPR_NUM_THREADS: initial thread count
    - OMP_NUM_THREADS: initial thread count if NUMEXPR_NUM_THREADS is not set
    If neither thread-count variable is set, defaults to the detected
    core count (limited to a safe maximum).
    
    Returns:
    int: Initialized thread count
    """

Usage Examples:

import os

# NUMEXPR_MAX_THREADS should be set before numexpr is imported,
# so configure the environment first (in a fresh interpreter)
os.environ['NUMEXPR_MAX_THREADS'] = '8'
os.environ['NUMEXPR_NUM_THREADS'] = '4'

import numexpr as ne

# Detect system capabilities
cores = ne.detect_number_of_cores()
print(f"System has {cores} CPU cores")

# This happens automatically on import, but can be called manually
threads = ne._init_num_threads()
print(f"Initialized with {threads} threads")

Performance Constants

Access to system-level constants that control NumExpr's performance characteristics.

# Threading limits
MAX_THREADS: int  # Maximum number of threads supported by the C extension

# Virtual machine configuration  
__BLOCK_SIZE1__: int  # Block size used for chunking array operations

# Runtime state
ncores: int  # Number of detected CPU cores (set at import)
nthreads: int  # Current configured thread count (set at import)

Usage Examples:

print(f"Hardware threads: {ne.ncores}")
print(f"Configured threads: {ne.nthreads}")
print(f"Max supported: {ne.MAX_THREADS}")
print(f"Block size: {ne.__BLOCK_SIZE1__}")

# Ensure we don't exceed limits
desired_threads = min(16, ne.MAX_THREADS, ne.ncores)
ne.set_num_threads(desired_threads)
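
To make the role of a block size concrete, the sketch below splits an array of `n_items` elements into consecutive half-open index ranges of at most `block_size` elements. It is only an illustration of chunking: `block_ranges` is a hypothetical name, the block size is passed in rather than read from `__BLOCK_SIZE1__`, and how NumExpr's virtual machine actually hands chunks to threads is not shown.

```python
def block_ranges(n_items, block_size):
    """Yield (start, stop) index pairs covering range(n_items) in blocks.

    Illustrative only: this is not NumExpr's scheduling code.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    for start in range(0, n_items, block_size):
        # The last block may be shorter than block_size
        yield start, min(start + block_size, n_items)


# Count how many blocks an array of one million elements would need
# for an assumed block size of 1024 elements
n_blocks = sum(1 for _ in block_ranges(1_000_000, 1024))
```

With many more blocks than threads, each thread has several chunks to work through, which is one reason larger arrays tend to parallelize better than small ones.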

Environment Variable Configuration

Thread Pool Configuration

NUMEXPR_MAX_THREADS: Maximum size of the thread pool

  • Controls the upper limit for threading
  • Should be set before importing numexpr
  • Recommended: Set to number of physical cores or desired maximum

NUMEXPR_NUM_THREADS: Initial number of active threads

  • Sets the default thread count on initialization
  • Can be changed later with set_num_threads()
  • Falls back to OMP_NUM_THREADS if not set

OMP_NUM_THREADS: OpenMP-compatible thread setting

  • Used if NUMEXPR_NUM_THREADS is not set
  • Provides compatibility with other scientific libraries
  • Standard environment variable for parallel applications

# Example environment setup
export NUMEXPR_MAX_THREADS=8    # Allow up to 8 threads
export NUMEXPR_NUM_THREADS=4    # Start with 4 active threads

# Alternative using OMP standard
export OMP_NUM_THREADS=6        # Use 6 threads (if NUMEXPR_NUM_THREADS not set)
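
The fallback order described above can be sketched in pure Python. This is an approximation written for illustration, not NumExpr's own code: `resolve_initial_threads` is a hypothetical function, treating an empty string as "unset" is an assumption, and the default cap of 16 is taken from the platform notes in this document.

```python
def resolve_initial_threads(env, detected_cores, max_threads, default_cap=16):
    """Pick an initial thread count from an environment mapping.

    Order: NUMEXPR_NUM_THREADS, then OMP_NUM_THREADS, then the detected
    core count limited by `default_cap` and `max_threads`.
    """
    for name in ("NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS"):
        value = env.get(name, "")
        if value != "":  # assumption: an empty value counts as unset
            requested = int(value)
            break
    else:
        # Neither variable is set: fall back to the capped core count
        requested = min(detected_cores, default_cap, max_threads)

    # Mirror the documented set_num_threads() range check
    if not 1 <= requested <= max_threads:
        raise ValueError(
            f"requested {requested} threads, allowed range is 1..{max_threads}"
        )
    return requested


# Intended use (shown as a comment, not executed here):
#   import os
#   resolve_initial_threads(os.environ, os.cpu_count() or 1, max_threads=8)
```

Passing the environment as a mapping keeps the function easy to test with plain dictionaries.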

Performance Optimization Guidelines

Thread Count Selection

Optimal Thread Count:

  • Physical cores: Usually best for CPU-bound tasks
  • Leave 1-2 cores free: For system responsiveness
  • Consider hyperthreading: May or may not help depending on workload
  • Memory bandwidth: Can become limiting factor with too many threads

Array Size Considerations:

  • Small arrays (< 10KB): Use fewer threads (1-2) to avoid overhead
  • Medium arrays (10KB-1MB): Benefit from moderate threading (2-8 threads)
  • Large arrays (> 1MB): Can effectively use many threads
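
The guidelines above can be turned into a starting heuristic. The sketch below is one possible encoding, under stated assumptions: `suggest_thread_count` is a hypothetical helper, 1KB is taken as 1024 bytes, one core is reserved for the system by default, and the size thresholds are simply the ones listed above. Treat the result as a first guess to benchmark against, not a tuned value.

```python
def suggest_thread_count(nbytes, physical_cores, reserve=1):
    """Suggest a thread count from array size and core count.

    Encodes the rules of thumb above; it does not measure anything.
    """
    # Leave `reserve` cores free, but never go below one thread
    usable = max(1, physical_cores - reserve)

    if nbytes < 10 * 1024:          # small arrays: avoid threading overhead
        cap = 2
    elif nbytes <= 1024 * 1024:     # medium arrays: moderate threading
        cap = 8
    else:                           # large arrays: use what is available
        cap = usable

    return max(1, min(cap, usable))


# Example: one million float64 values occupy 8,000,000 bytes
threads_for_large = suggest_thread_count(8_000_000, physical_cores=16)
```

The returned value can be passed to set_num_threads() after checking it against MAX_THREADS.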

Platform-Specific Behavior

  • SPARC Systems: Automatically limited to 1 thread due to known threading issues
  • Memory-Constrained Systems: NumExpr enforces safe limits (max 16 threads by default)
  • NUMA Systems: Thread affinity may affect performance on multi-socket systems

Performance Monitoring

import time
import numpy as np
import numexpr as ne

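def benchmark_threads(expression, data_dict, thread_counts):
    """Benchmark expression with different thread configurations."""
    results = {}
    original_threads = ne.get_num_threads()

    for num_threads in thread_counts:
        if num_threads > ne.MAX_THREADS:
            continue  # set_num_threads() raises ValueError above MAX_THREADS
        ne.set_num_threads(num_threads)

        # Warm up
        ne.evaluate(expression, local_dict=data_dict)

        # Time multiple evaluations with a monotonic, high-resolution clock
        start = time.perf_counter()
        for _ in range(10):
            ne.evaluate(expression, local_dict=data_dict)
        elapsed = time.perf_counter() - start

        results[num_threads] = elapsed / 10
        print(f"{num_threads} threads: {elapsed/10:.4f}s per evaluation")

    # Restore the thread count that was active before benchmarking
    ne.set_num_threads(original_threads)
    return results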

# Example usage
large_arrays = {
    'a': np.random.random(1000000),
    'b': np.random.random(1000000),
    'c': np.random.random(1000000)
}

benchmark_threads("a * b + sin(c) * exp(-a/100)", 
                 large_arrays, 
                 [1, 2, 4, 8])
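
The dictionary returned by benchmark_threads() maps each thread count to its average time per evaluation. A small helper can reduce that to the fastest setting and the speedup of each setting over the smallest thread count measured. This is a sketch: `summarize_benchmark` is a hypothetical name and is not part of NumExpr.

```python
def summarize_benchmark(results):
    """Return (best_thread_count, speedups) for {thread_count: seconds}.

    Speedups are relative to the smallest thread count in `results`,
    which is usually the single-threaded run.
    """
    if not results:
        raise ValueError("results must not be empty")

    baseline_time = results[min(results)]
    # The best setting is the one with the lowest time per evaluation
    best = min(results, key=results.get)
    speedups = {n: baseline_time / t for n, t in sorted(results.items())}
    return best, speedups


# Intended use (shown as a comment, not executed here):
#   results = benchmark_threads("a * b + sin(c)", large_arrays, [1, 2, 4, 8])
#   best, speedups = summarize_benchmark(results)
```

A speedup that flattens or drops as threads are added is a hint that memory bandwidth or threading overhead has become the limit.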

Thread Safety

NumExpr operations are thread-safe in the following contexts:

  • Multiple expressions: Different threads can evaluate different expressions simultaneously
  • Shared read-only data: Multiple threads can safely read the same input arrays
  • Thread-local results: Each evaluation produces independent results

Not thread-safe:

  • Modifying global thread settings: Calls to set_num_threads() affect all threads
  • Shared output arrays: Multiple threads writing to the same output array
  • VML settings: VML configuration changes affect the entire process
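
Since changing the thread count affects the whole process, one way to keep concurrent callers from interfering is to serialize the change, the work, and the restore behind a single lock. The sketch below assumes every caller goes through the same helper. `run_with_thread_count` is a hypothetical name, and the setter is passed in so the sketch does not depend on NumExpr being installed.

```python
import threading

# One process-wide lock guarding the global thread setting
_settings_lock = threading.Lock()


def run_with_thread_count(nthreads, set_threads, work):
    """Call work() with `nthreads` active, holding the settings lock.

    `set_threads` is assumed to behave like set_num_threads(): it applies
    the new count and returns the previous one. The lock is not reentrant,
    so work() must not call this helper again.
    """
    with _settings_lock:
        previous = set_threads(nthreads)
        try:
            return work()
        finally:
            # Restore before releasing the lock so no caller sees our value
            set_threads(previous)


# Intended use with NumExpr (shown as a comment, not executed here):
#   import numexpr as ne
#   run_with_thread_count(2, ne.set_num_threads, lambda: ne.evaluate("a + b"))
```

The trade-off is that evaluations routed through the helper run one at a time. Code that calls set_num_threads() directly bypasses the lock.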
