# VML Integration

Integration with Intel's Vector Math Library (VML) for hardware-accelerated transcendental functions when available. VML provides optimized implementations of mathematical functions that can significantly improve performance for expressions containing trigonometric, exponential, and logarithmic operations.

## Capabilities

### VML Configuration

Control VML library behavior, including accuracy modes and threading, to balance speed against precision for your application's requirements.

```python { .api }
def get_vml_version():
    """
    Get the VML/MKL library version information.

    Returns the version string of the Intel Vector Math Library or
    Math Kernel Library if available and linked with NumExpr.

    Returns:
        str or None: VML/MKL version string if available, None if VML is not available
    """

def set_vml_accuracy_mode(mode):
    """
    Set the accuracy mode for VML operations.

    Controls the trade-off between computational speed and numerical accuracy
    for VML-accelerated functions. Different modes provide different guarantees
    about precision and performance.

    Parameters:
    - mode (str or None): Accuracy mode setting
        - 'high': High accuracy mode (HA), <1 least significant bit (LSB) error
        - 'low': Low accuracy mode (LA), typically 1-2 LSB error
        - 'fast': Enhanced performance mode (EP), fastest with relaxed accuracy
        - None: Use VML default mode settings

    Returns:
        str or None: Previous accuracy mode setting

    Raises:
        ValueError: If mode is not one of the supported values
    """
```

**Usage Examples:**

```python
import numexpr as ne
import numpy as np

# Check VML availability and version
if ne.use_vml:
    print(f"VML Version: {ne.get_vml_version()}")

    # Set accuracy mode for performance-critical code
    old_mode = ne.set_vml_accuracy_mode('fast')

    # Perform VML-accelerated computations
    x = np.linspace(0, 10, 1000000)
    result = ne.evaluate("sin(x) * exp(-x/5) + log(x + 1)")

    # Restore previous accuracy mode
    ne.set_vml_accuracy_mode(old_mode)
else:
    print("VML not available - using standard implementations")
```

### VML Threading Control

Manage threading specifically for VML operations, which may have different optimal settings than general NumExpr threading.

```python { .api }
def set_vml_num_threads(nthreads):
    """
    Set the number of threads for VML operations.

    Suggests a maximum number of threads for VML library operations.
    This is independent of NumExpr's general threading and allows
    fine-tuning of VML performance characteristics.

    Parameters:
    - nthreads (int): Number of threads for VML operations

    Note:
        This function is equivalent to mkl_domain_set_num_threads(nthreads,
        MKL_DOMAIN_VML) in the Intel MKL library.
    """
```

**Usage Examples:**

```python
import numexpr as ne
import numpy as np

# Configure VML threading independently
if ne.use_vml:
    # Note: get_vml_num_threads() is not part of the public API
    print(f"Current NumExpr threads: {ne.get_num_threads()}")

    # Set VML to use fewer threads than NumExpr
    ne.set_num_threads(8)      # NumExpr uses 8 threads
    ne.set_vml_num_threads(4)  # VML uses 4 threads

    # Benchmark a VML-heavy expression
    data = np.random.random(1000000)
    result = ne.evaluate("sin(data) + cos(data) + exp(data) + log(data + 1)")
```

### VML Feature Detection

Runtime detection of VML availability and capabilities.

```python { .api }
# VML availability flag
use_vml: bool  # True if VML support is available and enabled
```

**Usage Examples:**

```python
import numexpr as ne

expression = "sin(a) * cos(b) + exp(c) * log(d + 1)"

# Conditional tuning based on VML availability
if ne.use_vml:
    ne.set_vml_accuracy_mode('fast')  # Prioritize speed for VML functions
else:
    # NumExpr still evaluates the expression correctly without VML,
    # just with its slower built-in implementations
    print("Warning: VML not available, transcendental functions will be slower")
```

## VML-Accelerated Functions

When VML is available, the following functions receive hardware acceleration:

### Mathematical Functions

**Trigonometric Functions:**
- `sin`, `cos`, `tan`
- `arcsin`, `arccos`, `arctan`, `arctan2`
- `sinh`, `cosh`, `tanh`
- `arcsinh`, `arccosh`, `arctanh`

**Exponential and Logarithmic:**
- `exp`, `expm1`
- `log`, `log1p`, `log10`

**Power Functions:**
- `sqrt`
- `pow` (power operations)

**Other Functions:**
- `absolute`/`abs`
- `conjugate`
- `ceil`, `floor`
- `fmod`
- `div`, `inv` (division and inverse)
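
As a quick sanity check, the listed functions should produce the same results through NumExpr as through plain NumPy, whether or not VML is linked in. A minimal sketch cross-checking a few of them (the expression is an arbitrary choice for illustration):

```python
import numpy as np
import numexpr as ne

a = np.linspace(0.1, 1.0, 50_000)

# Same expression via NumExpr (VML-accelerated when available) and plain
# NumPy; results should agree to floating-point tolerance in any mode.
expr = ne.evaluate("sqrt(a) + log1p(a) * tanh(a)")
ref = np.sqrt(a) + np.log1p(a) * np.tanh(a)

print(np.allclose(expr, ref))  # True
```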

### Performance Characteristics

**Speed Improvements:**
- Typically 2-10x faster for transcendental functions
- Greater improvements on larger arrays
- Best results on Intel processors, for which MKL is tuned (it also runs on other x86-64 CPUs such as AMD)

**Accuracy Modes:**
- **High ('high')**: Maximum precision, ~1 ULP (Unit in Last Place) error
- **Low ('low')**: Good precision, 1-2 ULP error, moderate speed improvement
- **Fast ('fast')**: Maximum speed, relaxed precision guarantees
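
The precision differences between modes can be measured directly. A minimal sketch, using NumPy's `sin` as the accuracy baseline (the exact deviations depend on your MKL version):

```python
import numpy as np
import numexpr as ne

x = np.linspace(0.1, 10, 100_000)
reference = np.sin(x)  # NumPy result as the accuracy baseline

if ne.use_vml:
    for mode in ('high', 'low', 'fast'):
        ne.set_vml_accuracy_mode(mode)
        result = ne.evaluate("sin(x)")
        max_err = np.max(np.abs(result - reference))
        print(f"{mode:>4}: max abs deviation from NumPy = {max_err:.2e}")
    ne.set_vml_accuracy_mode(None)  # restore the default mode
else:
    print("VML not available; accuracy modes have no effect")
```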

## Installation and Setup

### Enabling VML Support

VML support requires Intel MKL to be available when NumExpr is compiled:

```bash
# Install NumExpr with MKL support via conda (recommended)
conda install numexpr

# Or compile from source with MKL:
# 1. Install Intel MKL
# 2. Copy site.cfg.example to site.cfg
# 3. Edit site.cfg to point to the MKL libraries
# 4. Build: python setup.py build
```

### Verifying VML Installation

```python
import numexpr as ne

# Check if VML is available
print(f"VML available: {ne.use_vml}")

if ne.use_vml:
    print(f"VML version: {ne.get_vml_version()}")
    # VML threading information is not exposed via the public API

# Test VML acceleration
import numpy as np
import time

x = np.random.random(1000000)

# Warm up so compilation cost is not included in the timing
ne.evaluate("sin(x) + cos(x) + exp(x)")

# Time the NumExpr expression (VML-accelerated when available)
start = time.time()
result_vml = ne.evaluate("sin(x) + cos(x) + exp(x)")
vml_time = time.time() - start

# Time the equivalent NumPy expression
start = time.time()
result_numpy = np.sin(x) + np.cos(x) + np.exp(x)
numpy_time = time.time() - start

print(f"NumExpr time: {vml_time:.4f}s")
print(f"NumPy time: {numpy_time:.4f}s")
print(f"Speedup: {numpy_time/vml_time:.2f}x")
```

## Advanced VML Usage

### Accuracy vs Performance Tuning

```python
import time

import numpy as np
import numexpr as ne

def benchmark_vml_modes(expression, data_dict):
    """Benchmark VML accuracy modes for an expression."""
    if not ne.use_vml:
        print("VML not available")
        return None

    modes = ['high', 'low', 'fast']
    results = {}

    for mode in modes:
        ne.set_vml_accuracy_mode(mode)

        # Warm up
        ne.evaluate(expression, local_dict=data_dict)

        # Time multiple evaluations
        start = time.time()
        for _ in range(100):
            result = ne.evaluate(expression, local_dict=data_dict)
        elapsed = time.time() - start

        results[mode] = {
            'time': elapsed / 100,
            'sample_result': result[:5],  # First few values for comparison
        }

    ne.set_vml_accuracy_mode(None)  # Restore the default mode
    return results

# Example usage
data = {'x': np.linspace(0.1, 10, 100000)}
results = benchmark_vml_modes("sin(x) + cos(x) + log(x)", data)

if results is not None:
    for mode, info in results.items():
        print(f"{mode}: {info['time']:.6f}s, sample: {info['sample_result']}")
```

### Hybrid Threading Strategies

```python
import numexpr as ne

# Strategy 1: Match VML threads to NumExpr threads
ne.set_num_threads(4)
ne.set_vml_num_threads(4)

# Strategy 2: Use fewer VML threads for memory-bound operations
ne.set_num_threads(8)
ne.set_vml_num_threads(2)  # Reduce VML threading to avoid memory bandwidth limits

# Strategy 3: Limit VML threading for small arrays,
# where threading overhead outweighs the gains
array_size = 5000  # size of the array you are about to process
if array_size < 10000:
    ne.set_vml_num_threads(1)
else:
    ne.set_vml_num_threads(ne.get_num_threads())
```

### Expression Optimization for VML

```python
# VML-friendly expression patterns
vml_optimized = "sin(a) * cos(b) + exp(c/10) * sqrt(d)"  # Uses VML functions throughout

# Less VML-friendly (mixes in non-VML operations)
mixed_expression = "where(a > 0, sin(a), cos(a)) + b**3"  # where() is not VML-accelerated

# Consider rewriting for better VML utilization
# Instead of: where(x > 0, sin(x), 0)
# Use:        (x > 0) * sin(x)
```
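The `where` rewrite can be verified numerically. Because expression-level boolean arithmetic support can vary across NumExpr versions, this sketch precomputes the mask as a float array and passes it into the expression:

```python
import numpy as np
import numexpr as ne

x = np.linspace(-5, 5, 100_000)

# Branching form: where() picks sin(x) for positive x, 0 elsewhere
masked = ne.evaluate("where(x > 0, sin(x), 0)")

# Branch-free form: multiply by a precomputed 0/1 mask instead
mask = (x > 0).astype(x.dtype)
rewritten = ne.evaluate("mask * sin(x)")

print(np.allclose(masked, rewritten))  # True
```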
VML integration provides substantial performance improvements for mathematical expressions, particularly those involving transcendental functions. The key is balancing accuracy requirements with performance needs and properly configuring threading for your specific use case.