# Threading and Performance Control

Configuration of multi-threading behavior and performance optimization settings for CPU-intensive computations. NumExpr automatically parallelizes operations across available CPU cores and provides fine-grained control over threading behavior.

## Capabilities

### Thread Configuration

Control the number of threads used for NumExpr operations, balancing performance with system resource usage.

```python { .api }
def set_num_threads(nthreads):
    """
    Set the number of threads to use for operations.

    Controls the parallelization level for NumExpr computations. The
    virtual machine distributes array chunks across the specified number
    of threads for parallel execution.

    Parameters:
    - nthreads (int): Number of threads to use (1 to MAX_THREADS)

    Returns:
    int: Previous thread count setting

    Raises:
    ValueError: If nthreads exceeds MAX_THREADS or is less than 1
    """

def get_num_threads():
    """
    Get the current number of threads in use for operations.

    Returns:
    int: Current thread count configuration
    """
```

**Usage Examples:**

```python
import numexpr as ne
import numpy as np

# Check current thread configuration
print(f"Current threads: {ne.get_num_threads()}")
print(f"Max threads supported: {ne.MAX_THREADS}")

# Set specific thread count
old_threads = ne.set_num_threads(4)
print(f"Changed from {old_threads} to {ne.get_num_threads()} threads")

# Benchmark with different thread counts
data = np.random.random((1000000, 10))
expr = "sum(data**2 + sqrt(data), axis=1)"

for threads in [1, 2, 4, 8]:
    ne.set_num_threads(threads)
    # Time the operation...
    result = ne.evaluate(expr, local_dict={'data': data})
```

### System Detection

Automatically detect optimal threading configuration based on system capabilities and environment variables.

```python { .api }
def detect_number_of_cores():
    """
    Detect the number of CPU cores available on the system.

    Uses platform-specific methods to determine the number of logical
    CPU cores, providing a basis for automatic thread configuration.

    Returns:
    int: Number of detected CPU cores
    """

def detect_number_of_threads():
    """
    DEPRECATED: Detect optimal number of threads.

    This function is deprecated. Use _init_num_threads() instead for
    environment-based thread initialization.

    Returns:
    int: Suggested thread count based on system and environment
    """

def _init_num_threads():
    """
    Initialize thread count based on environment variables.

    Checks environment variables in order of precedence:
    1. NUMEXPR_MAX_THREADS - maximum thread pool size
    2. NUMEXPR_NUM_THREADS - initial thread count
    3. OMP_NUM_THREADS - OpenMP thread count
    4. Defaults to detected core count (limited to safe maximum)

    Returns:
    int: Initialized thread count
    """
```

**Usage Examples:**

```python
import os

# NUMEXPR_MAX_THREADS should be set before importing numexpr
os.environ['NUMEXPR_MAX_THREADS'] = '8'
os.environ['NUMEXPR_NUM_THREADS'] = '4'

import numexpr as ne

# Detect system capabilities
cores = ne.detect_number_of_cores()
print(f"System has {cores} CPU cores")

# Initialization happens automatically on import, but can be re-run manually
threads = ne._init_num_threads()
print(f"Initialized with {threads} threads")
```

### Performance Constants

Access to system-level constants that control NumExpr's performance characteristics.

```python { .api }
# Threading limits
MAX_THREADS: int  # Maximum number of threads supported by the C extension

# Virtual machine configuration
__BLOCK_SIZE1__: int  # Block size used for chunking array operations

# Runtime state
ncores: int  # Number of detected CPU cores (set at import)
nthreads: int  # Current configured thread count (set at import)
```

**Usage Examples:**

```python
import numexpr as ne

print(f"Hardware threads: {ne.ncores}")
print(f"Configured threads: {ne.nthreads}")
print(f"Max supported: {ne.MAX_THREADS}")
print(f"Block size: {ne.__BLOCK_SIZE1__}")

# Ensure we don't exceed limits
desired_threads = min(16, ne.MAX_THREADS, ne.ncores)
ne.set_num_threads(desired_threads)
```

## Environment Variable Configuration

### Thread Pool Configuration

**NUMEXPR_MAX_THREADS**: Maximum size of the thread pool
- Controls the upper limit for threading
- Should be set before importing numexpr
- Recommended: Set to number of physical cores or desired maximum

**NUMEXPR_NUM_THREADS**: Initial number of active threads
- Sets the default thread count on initialization
- Can be changed later with `set_num_threads()`
- Falls back to OMP_NUM_THREADS if not set

**OMP_NUM_THREADS**: OpenMP-compatible thread setting
- Used if NUMEXPR_NUM_THREADS is not set
- Provides compatibility with other scientific libraries
- Standard environment variable for parallel applications

```bash
# Example environment setup
export NUMEXPR_MAX_THREADS=8    # Allow up to 8 threads
export NUMEXPR_NUM_THREADS=4    # Start with 4 active threads

# Alternative using OMP standard
export OMP_NUM_THREADS=6        # Use 6 threads (if NUMEXPR_NUM_THREADS not set)
```

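The fallback order described above can be sketched in plain Python. This is an illustrative model of the documented precedence only, not NumExpr's own initialization code: the function name `resolve_initial_threads`, its parameters, and the choice to clamp the request to `NUMEXPR_MAX_THREADS` are assumptions made for the example, and the `safe_limit=16` default mirrors the limit quoted elsewhere in this document.

```python
def resolve_initial_threads(env, detected_cores, safe_limit=16):
    """Model the documented precedence for the initial thread count.

    Hypothetical helper: `env` is any mapping of environment variables,
    `detected_cores` is the core count reported by the system.
    """
    # An explicit request wins: NUMEXPR_NUM_THREADS first, then OMP_NUM_THREADS.
    requested = None
    for name in ("NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS"):
        raw = env.get(name, "")
        if raw:
            requested = int(raw)
            break

    # Otherwise fall back to the detected core count, capped at the safe limit.
    if requested is None:
        requested = min(detected_cores, safe_limit)

    # Assumption for this sketch: never exceed the configured pool size.
    pool_raw = env.get("NUMEXPR_MAX_THREADS", "")
    if pool_raw:
        requested = min(requested, int(pool_raw))

    return max(1, requested)
```

Passing `os.environ` and `os.cpu_count()` lets you preview which value the documented rules would select before importing the library.
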
## Performance Optimization Guidelines

### Thread Count Selection

**Optimal Thread Count:**
- **Physical cores**: Usually best for CPU-bound tasks
- **Leave 1-2 cores free**: For system responsiveness
- **Consider hyperthreading**: May or may not help depending on workload
- **Memory bandwidth**: Can become limiting factor with too many threads

**Array Size Considerations:**
- **Small arrays (< 10 KB)**: Use fewer threads (1-2) to avoid overhead
- **Medium arrays (10 KB-1 MB)**: Benefit from moderate threading (2-8 threads)
- **Large arrays (> 1 MB)**: Can effectively use many threads

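The guidelines above can be turned into a small heuristic. The sketch below is one possible reading, not a NumExpr API: `suggest_num_threads` is a hypothetical helper, the thresholds assume 1 KB = 1024 bytes, and the specific picks within each range (1 thread for small arrays, at most 8 for medium ones, one core reserved for the system) are assumptions that should be validated by benchmarking.

```python
KB = 1024          # assumption: binary kilobytes
MB = 1024 * KB


def suggest_num_threads(nbytes, physical_cores, reserve=1):
    """Suggest a thread count from total input size in bytes.

    Hypothetical helper that encodes the size tiers listed above.
    """
    # Keep `reserve` cores free for system responsiveness, but use at least one.
    usable = max(1, physical_cores - reserve)

    if nbytes < 10 * KB:
        return 1                  # small: threading overhead dominates
    if nbytes <= 1 * MB:
        return min(usable, 8)     # medium: moderate threading
    return usable                 # large: use every core not held in reserve
```

The result can be passed to `set_num_threads()`, for example with `nbytes` taken from the `nbytes` attribute of the NumPy arrays involved.
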
### Platform-Specific Behavior

- **SPARC Systems**: Automatically limited to 1 thread due to known threading issues
- **Memory-Constrained Systems**: NumExpr enforces safe limits (max 16 threads by default)
- **NUMA Systems**: Thread affinity may affect performance on multi-socket systems

### Performance Monitoring

```python
import time
import numpy as np
import numexpr as ne

def benchmark_threads(expression, data_dict, thread_counts, repeats=10):
    """Benchmark expression with different thread configurations."""
    results = {}

    for num_threads in thread_counts:
        ne.set_num_threads(num_threads)

        # Warm up (compiles and caches the expression)
        ne.evaluate(expression, local_dict=data_dict)

        # Time multiple evaluations with a monotonic, high-resolution clock
        start = time.perf_counter()
        for _ in range(repeats):
            ne.evaluate(expression, local_dict=data_dict)
        elapsed = time.perf_counter() - start

        results[num_threads] = elapsed / repeats
        print(f"{num_threads} threads: {elapsed / repeats:.4f}s per evaluation")

    return results

# Example usage
large_arrays = {
    'a': np.random.random(1000000),
    'b': np.random.random(1000000),
    'c': np.random.random(1000000)
}

benchmark_threads("a * b + sin(c) * exp(-a/100)",
                  large_arrays,
                  [1, 2, 4, 8])
```

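Raw timings are easier to compare once converted to speedup and parallel efficiency. The helper below is a minimal sketch, not part of NumExpr: `scaling_report` is a hypothetical name, and it assumes its input is a dictionary mapping thread count to seconds per evaluation, which is the shape returned by `benchmark_threads`.

```python
def scaling_report(timings):
    """Return {threads: (speedup, efficiency)} relative to the smallest thread count.

    Hypothetical helper: `timings` maps thread count -> seconds per evaluation.
    """
    if not timings:
        raise ValueError("timings must not be empty")

    base_threads = min(timings)
    base_time = timings[base_threads]

    report = {}
    for n in sorted(timings):
        # Speedup: how many times faster than the baseline run.
        speedup = base_time / timings[n]
        # Efficiency: speedup divided by the increase in thread count.
        efficiency = speedup * base_threads / n
        report[n] = (speedup, efficiency)
    return report
```

An efficiency that drops well below 1.0 as threads are added suggests the expression is limited by memory bandwidth or threading overhead rather than by compute.
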
### Thread Safety

NumExpr operations are thread-safe in the following contexts:
- **Multiple expressions**: Different threads can evaluate different expressions simultaneously
- **Shared read-only data**: Multiple threads can safely read the same input arrays
- **Thread-local results**: Each evaluation produces independent results

**Not thread-safe:**
- **Modifying global thread settings**: Calls to `set_num_threads()` affect all threads
- **Shared output arrays**: Multiple threads writing to the same output array
- **VML settings**: VML configuration changes affect the entire process
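
Because `set_num_threads()` changes a process-wide setting, one way to limit interference is to funnel every temporary change through a single lock and restore the previous value afterwards. The sketch below is an assumption-laden example rather than a NumExpr feature: `exclusive_num_threads` and `_thread_setting_lock` are hypothetical names, and the lock only protects code that goes through this helper.

```python
import threading
from contextlib import contextmanager

import numexpr as ne

# Hypothetical module-level lock (not part of NumExpr). A re-entrant lock
# lets the same thread nest the context manager without deadlocking.
_thread_setting_lock = threading.RLock()


@contextmanager
def exclusive_num_threads(nthreads):
    """Temporarily set the global thread count, then restore the old value."""
    with _thread_setting_lock:
        # set_num_threads() returns the previous setting, as documented above.
        previous = ne.set_num_threads(nthreads)
        try:
            yield previous
        finally:
            ne.set_num_threads(previous)
```

Evaluations placed inside the `with exclusive_num_threads(n):` block run with `n` threads, and the earlier setting is restored on exit even if the evaluation raises.
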