# VML Integration

Integration with Intel's Vector Math Library (VML) for hardware-accelerated transcendental functions when available. VML provides optimized implementations of mathematical functions that can significantly improve performance for expressions containing trigonometric, exponential, and logarithmic operations.

## Capabilities

### VML Configuration

Control VML library behavior, including accuracy modes and threading, to balance speed against precision for your application's requirements.

```python { .api }
def get_vml_version():
    """
    Get the VML/MKL library version information.

    Returns the version string of the Intel Vector Math Library or
    Math Kernel Library if available and linked with NumExpr.

    Returns:
        str or None: VML/MKL version string if available, None if VML is not available
    """

def set_vml_accuracy_mode(mode):
    """
    Set the accuracy mode for VML operations.

    Controls the trade-off between computational speed and numerical accuracy
    for VML-accelerated functions. Different modes provide different guarantees
    about precision and performance.

    Parameters:
    - mode (str or None): Accuracy mode setting
        - 'high': High accuracy mode (HA), <1 least significant bit (LSB) error
        - 'low': Low accuracy mode (LA), typically 1-2 LSB error
        - 'fast': Enhanced performance mode (EP), fastest with relaxed accuracy
        - None: Use VML default mode settings

    Returns:
        str or None: Previous accuracy mode setting

    Raises:
        ValueError: If mode is not one of the supported values
    """
```

**Usage Examples:**

```python
import numexpr as ne
import numpy as np

# Check VML availability and version
if ne.use_vml:
    print(f"VML Version: {ne.get_vml_version()}")

    # Set accuracy mode for performance-critical code
    old_mode = ne.set_vml_accuracy_mode('fast')

    # Perform VML-accelerated computations
    x = np.linspace(0, 10, 1000000)
    result = ne.evaluate("sin(x) * exp(-x/5) + log(x + 1)")

    # Restore previous accuracy mode
    ne.set_vml_accuracy_mode(old_mode)
else:
    print("VML not available - using standard implementations")
```

### VML Threading Control

Manage threading specifically for VML operations, which may have different optimal settings than general NumExpr threading.

```python { .api }
def set_vml_num_threads(nthreads):
    """
    Set the number of threads for VML operations.

    Suggests a maximum number of threads for VML library operations.
    This is independent of NumExpr's general threading and allows
    fine-tuning of VML performance characteristics.

    Parameters:
    - nthreads (int): Number of threads for VML operations

    Note:
        This function is equivalent to mkl_domain_set_num_threads(nthreads,
        MKL_DOMAIN_VML) in the Intel MKL library.
    """
```

**Usage Examples:**

```python
import numexpr as ne
import numpy as np

# Configure VML threading independently
if ne.use_vml:
    # Note: get_vml_num_threads() is not part of the public API
    print(f"Current NumExpr threads: {ne.get_num_threads()}")

    # Set VML to use fewer threads than NumExpr
    ne.set_num_threads(8)      # NumExpr uses 8 threads
    ne.set_vml_num_threads(4)  # VML uses 4 threads

    # Benchmark a VML-heavy expression
    data = np.random.random(1000000)
    result = ne.evaluate("sin(data) + cos(data) + exp(data) + log(data + 1)")
```

### VML Feature Detection

Runtime detection of VML availability and capabilities.

```python { .api }
# VML availability flag
use_vml: bool  # True if VML support is available and enabled
```

**Usage Examples:**

```python
import numexpr as ne

expression = "sin(a) * cos(b) + exp(c) * log(d + 1)"

# Conditional tuning based on VML availability
if ne.use_vml:
    ne.set_vml_accuracy_mode('fast')  # Prioritize speed for VML functions
else:
    # NumExpr still evaluates the expression correctly without VML,
    # just with its slower built-in implementations
    print("Warning: VML not available, transcendental functions will be slower")
```

## VML-Accelerated Functions

When VML is available, the following functions receive hardware acceleration:

### Mathematical Functions

**Trigonometric Functions:**
- `sin`, `cos`, `tan`
- `arcsin`, `arccos`, `arctan`, `arctan2`
- `sinh`, `cosh`, `tanh`
- `arcsinh`, `arccosh`, `arctanh`

**Exponential and Logarithmic:**
- `exp`, `expm1`
- `log`, `log1p`, `log10`

**Power Functions:**
- `sqrt`
- `pow` (power operations)

**Other Functions:**
- `absolute`/`abs`
- `conjugate`
- `ceil`, `floor`
- `fmod`
- `div`, `inv` (division and inverse)
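
As a quick sanity check, the listed functions should produce the same results through NumExpr as through plain NumPy, whether or not VML is linked in. A minimal sketch cross-checking a few of them (the expression is an arbitrary choice for illustration):

```python
import numpy as np
import numexpr as ne

a = np.linspace(0.1, 1.0, 50_000)

# Same expression via NumExpr (VML-accelerated when available) and plain
# NumPy; results should agree to floating-point tolerance in any mode.
expr = ne.evaluate("sqrt(a) + log1p(a) * tanh(a)")
ref = np.sqrt(a) + np.log1p(a) * np.tanh(a)

print(np.allclose(expr, ref))  # True
```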

### Performance Characteristics

**Speed Improvements:**
- Typically 2-10x faster for transcendental functions
- Greater improvements on larger arrays
- Best results on Intel processors, for which MKL is tuned (it also runs on other x86-64 CPUs such as AMD)

**Accuracy Modes:**
- **High ('high')**: Maximum precision, ~1 ULP (Unit in Last Place) error
- **Low ('low')**: Good precision, 1-2 ULP error, moderate speed improvement
- **Fast ('fast')**: Maximum speed, relaxed precision guarantees
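
The precision differences between modes can be measured directly. A minimal sketch, using NumPy's `sin` as the accuracy baseline (the exact deviations depend on your MKL version):

```python
import numpy as np
import numexpr as ne

x = np.linspace(0.1, 10, 100_000)
reference = np.sin(x)  # NumPy result as the accuracy baseline

if ne.use_vml:
    for mode in ('high', 'low', 'fast'):
        ne.set_vml_accuracy_mode(mode)
        result = ne.evaluate("sin(x)")
        max_err = np.max(np.abs(result - reference))
        print(f"{mode:>4}: max abs deviation from NumPy = {max_err:.2e}")
    ne.set_vml_accuracy_mode(None)  # restore the default mode
else:
    print("VML not available; accuracy modes have no effect")
```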

## Installation and Setup

### Enabling VML Support

VML support requires Intel MKL to be available when NumExpr is compiled:

```bash
# Install NumExpr with MKL support via conda (recommended)
conda install numexpr

# Or compile from source with MKL:
# 1. Install Intel MKL
# 2. Copy site.cfg.example to site.cfg
# 3. Edit site.cfg to point to the MKL libraries
# 4. Build: python setup.py build
```

### Verifying VML Installation

```python
import numexpr as ne

# Check if VML is available
print(f"VML available: {ne.use_vml}")

if ne.use_vml:
    print(f"VML version: {ne.get_vml_version()}")
    # VML threading information is not exposed via the public API

# Test VML acceleration
import numpy as np
import time

x = np.random.random(1000000)

# Warm up so compilation cost is not included in the timing
ne.evaluate("sin(x) + cos(x) + exp(x)")

# Time the NumExpr expression (VML-accelerated when available)
start = time.time()
result_vml = ne.evaluate("sin(x) + cos(x) + exp(x)")
vml_time = time.time() - start

# Time the equivalent NumPy expression
start = time.time()
result_numpy = np.sin(x) + np.cos(x) + np.exp(x)
numpy_time = time.time() - start

print(f"NumExpr time: {vml_time:.4f}s")
print(f"NumPy time: {numpy_time:.4f}s")
print(f"Speedup: {numpy_time/vml_time:.2f}x")
```

## Advanced VML Usage

### Accuracy vs Performance Tuning

```python
import time

import numpy as np
import numexpr as ne

def benchmark_vml_modes(expression, data_dict):
    """Benchmark VML accuracy modes for an expression."""
    if not ne.use_vml:
        print("VML not available")
        return None

    modes = ['high', 'low', 'fast']
    results = {}

    for mode in modes:
        ne.set_vml_accuracy_mode(mode)

        # Warm up
        ne.evaluate(expression, local_dict=data_dict)

        # Time multiple evaluations
        start = time.time()
        for _ in range(100):
            result = ne.evaluate(expression, local_dict=data_dict)
        elapsed = time.time() - start

        results[mode] = {
            'time': elapsed / 100,
            'sample_result': result[:5],  # First few values for comparison
        }

    ne.set_vml_accuracy_mode(None)  # Restore the default mode
    return results

# Example usage
data = {'x': np.linspace(0.1, 10, 100000)}
results = benchmark_vml_modes("sin(x) + cos(x) + log(x)", data)

if results is not None:
    for mode, info in results.items():
        print(f"{mode}: {info['time']:.6f}s, sample: {info['sample_result']}")
```

### Hybrid Threading Strategies

```python
import numexpr as ne

# Strategy 1: Match VML threads to NumExpr threads
ne.set_num_threads(4)
ne.set_vml_num_threads(4)

# Strategy 2: Use fewer VML threads for memory-bound operations
ne.set_num_threads(8)
ne.set_vml_num_threads(2)  # Reduce VML threading to avoid memory bandwidth limits

# Strategy 3: Limit VML threading for small arrays,
# where threading overhead outweighs the gains
array_size = 5000  # size of the array you are about to process
if array_size < 10000:
    ne.set_vml_num_threads(1)
else:
    ne.set_vml_num_threads(ne.get_num_threads())
```

### Expression Optimization for VML

```python
# VML-friendly expression patterns
vml_optimized = "sin(a) * cos(b) + exp(c/10) * sqrt(d)"  # Uses VML functions throughout

# Less VML-friendly (mixes in non-VML operations)
mixed_expression = "where(a > 0, sin(a), cos(a)) + b**3"  # where() is not VML-accelerated

# Consider rewriting for better VML utilization
# Instead of: where(x > 0, sin(x), 0)
# Use:        (x > 0) * sin(x)
```
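The `where` rewrite can be verified numerically. Because expression-level boolean arithmetic support can vary across NumExpr versions, this sketch precomputes the mask as a float array and passes it into the expression:

```python
import numpy as np
import numexpr as ne

x = np.linspace(-5, 5, 100_000)

# Branching form: where() picks sin(x) for positive x, 0 elsewhere
masked = ne.evaluate("where(x > 0, sin(x), 0)")

# Branch-free form: multiply by a precomputed 0/1 mask instead
mask = (x > 0).astype(x.dtype)
rewritten = ne.evaluate("mask * sin(x)")

print(np.allclose(masked, rewritten))  # True
```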
VML integration provides substantial performance improvements for mathematical expressions, particularly those involving transcendental functions. The key is balancing accuracy requirements with performance needs and properly configuring threading for your specific use case.