# Threading and Performance Control

Configuration of multi-threading behavior and performance optimization settings for CPU-intensive computations. NumExpr automatically parallelizes operations across available CPU cores and provides fine-grained control over threading behavior.

## Capabilities

### Thread Configuration

Control the number of threads used for NumExpr operations, balancing performance with system resource usage.

```python { .api }
def set_num_threads(nthreads):
    """
    Set the number of threads to use for operations.

    Controls the parallelization level for NumExpr computations. The
    virtual machine distributes array chunks across the specified number
    of threads for parallel execution.

    Parameters:
    - nthreads (int): Number of threads to use (1 to MAX_THREADS)

    Returns:
        int: Previous thread count setting

    Raises:
        ValueError: If nthreads exceeds MAX_THREADS or is less than 1
    """

def get_num_threads():
    """
    Get the current number of threads in use for operations.

    Returns:
        int: Current thread count configuration
    """
```

**Usage Examples:**

```python
import numexpr as ne
import numpy as np

# Check current thread configuration
print(f"Current threads: {ne.get_num_threads()}")
print(f"Max threads supported: {ne.MAX_THREADS}")

# Set specific thread count
old_threads = ne.set_num_threads(4)
print(f"Changed from {old_threads} to {ne.get_num_threads()} threads")

# Benchmark with different thread counts
data = np.random.random((1000000, 10))
expr = "sum(data**2 + sqrt(data), axis=1)"

for threads in [1, 2, 4, 8]:
    ne.set_num_threads(threads)
    # Time the operation...
    result = ne.evaluate(expr, local_dict={'data': data})
```
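The chunk-based parallelism described above can be illustrated with a small, purely conceptual sketch. This is not NumExpr's actual virtual machine (which chunks at the C level and reuses a persistent thread pool); it only shows the general idea of splitting an input into blocks and mapping them across a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

def chunked_apply(func, data, nthreads, chunk_size=4):
    """Conceptual sketch of chunked parallel evaluation (hypothetical,
    not NumExpr's real implementation): split the input into fixed-size
    chunks and map them across a thread pool, then reassemble."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        parts = pool.map(func, chunks)  # preserves chunk order
    return [y for part in parts for y in part]

# Square each element using 2 worker threads
print(chunked_apply(lambda c: [x * x for x in c], list(range(10)), nthreads=2))
```

Because `ThreadPoolExecutor.map` preserves input order, the reassembled result matches a serial evaluation regardless of which thread finishes first.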

### System Detection

Automatically detect optimal threading configuration based on system capabilities and environment variables.

```python { .api }
def detect_number_of_cores():
    """
    Detect the number of CPU cores available on the system.

    Uses platform-specific methods to determine the number of logical
    CPU cores, providing a basis for automatic thread configuration.

    Returns:
        int: Number of detected CPU cores
    """

def detect_number_of_threads():
    """
    DEPRECATED: Detect optimal number of threads.

    This function is deprecated. Use _init_num_threads() instead for
    environment-based thread initialization.

    Returns:
        int: Suggested thread count based on system and environment
    """

def _init_num_threads():
    """
    Initialize thread count based on environment variables.

    Checks environment variables in order of precedence:
    1. NUMEXPR_MAX_THREADS - maximum thread pool size
    2. NUMEXPR_NUM_THREADS - initial thread count
    3. OMP_NUM_THREADS - OpenMP thread count
    4. Defaults to detected core count (limited to safe maximum)

    Returns:
        int: Initialized thread count
    """
```
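The precedence rules documented for `_init_num_threads()` can be sketched in pure Python. This is a hypothetical simplification for illustration only; the real function lives inside numexpr and differs in detail:

```python
import os

def resolve_thread_count(detected_cores, safe_max=16):
    """Hypothetical sketch of the documented precedence; not the
    actual numexpr._init_num_threads() implementation."""
    # 1. NUMEXPR_MAX_THREADS caps the thread pool size
    max_threads = int(os.environ.get('NUMEXPR_MAX_THREADS', safe_max))
    # 2./3. NUMEXPR_NUM_THREADS wins over OMP_NUM_THREADS for the request
    requested = (os.environ.get('NUMEXPR_NUM_THREADS')
                 or os.environ.get('OMP_NUM_THREADS'))
    # 4. Otherwise fall back to the detected core count, within safe_max
    n = int(requested) if requested else min(detected_cores, safe_max)
    return max(1, min(n, max_threads))
```

For example, with no environment variables set and 32 detected cores, this sketch yields 16 (the safe maximum); with `NUMEXPR_MAX_THREADS=2` it clamps any request down to 2.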

**Usage Examples:**

```python
import os
import numexpr as ne

# Detect system capabilities
cores = ne.detect_number_of_cores()
print(f"System has {cores} CPU cores")

# Initialize with environment-based settings
os.environ['NUMEXPR_MAX_THREADS'] = '8'
os.environ['NUMEXPR_NUM_THREADS'] = '4'

# This happens automatically on import, but can be called manually
threads = ne._init_num_threads()
print(f"Initialized with {threads} threads")
```

### Performance Constants

Access to system-level constants that control NumExpr's performance characteristics.

```python { .api }
# Threading limits
MAX_THREADS: int       # Maximum number of threads supported by the C extension

# Virtual machine configuration
__BLOCK_SIZE1__: int   # Block size used for chunking array operations

# Runtime state
ncores: int            # Number of detected CPU cores (set at import)
nthreads: int          # Current configured thread count (set at import)
```

**Usage Examples:**

```python
import numexpr as ne

print(f"Hardware threads: {ne.ncores}")
print(f"Configured threads: {ne.nthreads}")
print(f"Max supported: {ne.MAX_THREADS}")
print(f"Block size: {ne.__BLOCK_SIZE1__}")

# Ensure we don't exceed limits
desired_threads = min(16, ne.MAX_THREADS, ne.ncores)
ne.set_num_threads(desired_threads)
```

## Environment Variable Configuration

### Thread Pool Configuration

**NUMEXPR_MAX_THREADS**: Maximum size of the thread pool
- Controls the upper limit for threading
- Should be set before importing numexpr
- Recommended: set to the number of physical cores or the desired maximum

**NUMEXPR_NUM_THREADS**: Initial number of active threads
- Sets the default thread count on initialization
- Can be changed later with `set_num_threads()`
- Falls back to OMP_NUM_THREADS if not set

**OMP_NUM_THREADS**: OpenMP-compatible thread setting
- Used if NUMEXPR_NUM_THREADS is not set
- Provides compatibility with other scientific libraries
- Standard environment variable for parallel applications

```bash
# Example environment setup
export NUMEXPR_MAX_THREADS=8    # Allow up to 8 threads
export NUMEXPR_NUM_THREADS=4    # Start with 4 active threads

# Alternative using the OpenMP standard
export OMP_NUM_THREADS=6        # Use 6 threads (if NUMEXPR_NUM_THREADS not set)
```

## Performance Optimization Guidelines

### Thread Count Selection

**Optimal Thread Count:**
- **Physical cores**: Usually best for CPU-bound tasks
- **Leave 1-2 cores free**: For system responsiveness
- **Consider hyperthreading**: May or may not help depending on workload
- **Memory bandwidth**: Can become the limiting factor with too many threads

**Array Size Considerations:**
- **Small arrays (< 10KB)**: Use fewer threads (1-2) to avoid overhead
- **Medium arrays (10KB-1MB)**: Benefit from moderate threading (2-8 threads)
- **Large arrays (> 1MB)**: Can effectively use many threads
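As a rough illustration of these size guidelines, a small helper could map an array's byte size to a thread count. The function and its thresholds are hypothetical, not part of NumExpr:

```python
def suggest_threads(nbytes, max_threads=8):
    """Hypothetical helper (not part of NumExpr) mapping array size
    to a thread count, following the rough guidelines above."""
    if nbytes < 10 * 1024:        # small: threading overhead dominates
        return 1
    if nbytes < 1024 * 1024:      # medium: moderate threading pays off
        return min(4, max_threads)
    return max_threads            # large: use the full thread budget
```

A caller would then pass the result to `ne.set_num_threads(...)` before evaluating; in practice the thresholds should be tuned by benchmarking on the target machine.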

### Platform-Specific Behavior

- **SPARC systems**: Automatically limited to 1 thread due to known threading issues
- **Memory-constrained systems**: NumExpr enforces safe limits (max 16 threads by default)
- **NUMA systems**: Thread affinity may affect performance on multi-socket systems

### Performance Monitoring

```python
import time
import numpy as np
import numexpr as ne

def benchmark_threads(expression, data_dict, thread_counts):
    """Benchmark expression with different thread configurations."""
    results = {}

    for num_threads in thread_counts:
        ne.set_num_threads(num_threads)

        # Warm up
        ne.evaluate(expression, local_dict=data_dict)

        # Time multiple evaluations (perf_counter is monotonic and
        # better suited to benchmarking than time.time)
        start = time.perf_counter()
        for _ in range(10):
            ne.evaluate(expression, local_dict=data_dict)
        elapsed = time.perf_counter() - start

        results[num_threads] = elapsed / 10
        print(f"{num_threads} threads: {elapsed/10:.4f}s per evaluation")

    return results

# Example usage
large_arrays = {
    'a': np.random.random(1000000),
    'b': np.random.random(1000000),
    'c': np.random.random(1000000)
}

benchmark_threads("a * b + sin(c) * exp(-a/100)",
                  large_arrays,
                  [1, 2, 4, 8])
```

### Thread Safety

NumExpr operations are thread-safe in the following contexts:
- **Multiple expressions**: Different threads can evaluate different expressions simultaneously
- **Shared read-only data**: Multiple threads can safely read the same input arrays
- **Thread-local results**: Each evaluation produces independent results

**Not thread-safe:**
- **Modifying global thread settings**: Calls to `set_num_threads()` affect all threads
- **Shared output arrays**: Multiple threads writing to the same output array
- **VML settings**: VML configuration changes affect the entire process
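The safe pattern above (thread-local results, shared data read-only) can be sketched without numexpr itself. Here `fake_evaluate` is a hypothetical stand-in for `ne.evaluate`; the point is that each worker receives a fresh output object and only the shared results dict is guarded by a lock:

```python
import threading

def fake_evaluate(expression, data):
    # Hypothetical stand-in for ne.evaluate: returns a fresh,
    # thread-local result rather than writing into shared output
    return [x * 2 for x in data]

results = {}
lock = threading.Lock()

def worker(name, data):
    out = fake_evaluate("x * 2", data)  # independent output per thread
    with lock:                          # only the shared dict needs the lock
        results[name] = out

threads = [threading.Thread(target=worker, args=(f"t{i}", [i, i + 1]))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

With real numexpr, the same structure applies: pass each thread its own `out=` array (or let `evaluate` allocate one) and avoid calling `set_num_threads()` while other threads are evaluating.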