# VML Integration

Integration with Intel's Vector Math Library (VML) for hardware-accelerated transcendental functions when available. VML provides optimized implementations of mathematical functions that can significantly improve performance for expressions containing trigonometric, exponential, and logarithmic operations.

## Capabilities

### VML Configuration

Control VML library behavior including accuracy modes and threading for optimal performance based on application requirements.

```python { .api }
def get_vml_version():
    """
    Get the VML/MKL library version information.

    Returns the version string of the Intel Vector Math Library or
    Math Kernel Library if available and linked with NumExpr.

    Returns:
        str or None: VML/MKL version string if available, None if VML not available
    """

def set_vml_accuracy_mode(mode):
    """
    Set the accuracy mode for VML operations.

    Controls the trade-off between computational speed and numerical accuracy
    for VML-accelerated functions. Different modes provide different guarantees
    about precision and performance.

    Parameters:
    - mode (str or None): Accuracy mode setting
        - 'high': High accuracy mode (HA), <1 least significant bit error
        - 'low': Low accuracy mode (LA), typically 1-2 LSB error
        - 'fast': Enhanced performance mode (EP), fastest with relaxed accuracy
        - None: Use VML default mode settings

    Returns:
        str or None: Previous accuracy mode setting

    Raises:
        ValueError: If mode is not one of the supported values
    """
```

**Usage Examples:**

```python
import numexpr as ne
import numpy as np

# Check VML availability and version
if ne.use_vml:
    print(f"VML Version: {ne.get_vml_version()}")

    # Set accuracy mode for performance-critical code
    old_mode = ne.set_vml_accuracy_mode('fast')

    # Perform VML-accelerated computations
    x = np.linspace(0, 10, 1000000)
    result = ne.evaluate("sin(x) * exp(-x/5) + log(x + 1)")

    # Restore previous accuracy mode
    ne.set_vml_accuracy_mode(old_mode)
else:
    print("VML not available - using standard implementations")
```
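
The save-and-restore pattern can also be wrapped in a context manager so that the previous mode comes back even if evaluation raises. The following is a minimal sketch rather than part of NumExpr: `vml_accuracy` is a hypothetical helper name, and it assumes only that `set_vml_accuracy_mode` returns the previous setting, as documented above.

```python
from contextlib import contextmanager

import numexpr as ne
import numpy as np


@contextmanager
def vml_accuracy(mode):
    """Temporarily switch the VML accuracy mode (hypothetical helper, not a NumExpr API)."""
    # set_vml_accuracy_mode returns the previous setting
    previous = ne.set_vml_accuracy_mode(mode)
    try:
        yield previous
    finally:
        # Restore the earlier mode even if the body raised
        ne.set_vml_accuracy_mode(previous)


x = np.linspace(0, 10, 1_000_000)
with vml_accuracy('fast'):
    fast_result = ne.evaluate("sin(x) * exp(-x/5) + log(x + 1)")
```

Keeping the mode change scoped to a `with` block avoids leaking a relaxed accuracy setting into unrelated code that runs later in the same process.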

### VML Threading Control

Manage threading specifically for VML operations, which may have different optimal settings than general NumExpr threading.

```python { .api }
def set_vml_num_threads(nthreads):
    """
    Set the number of threads for VML operations.

    Suggests a maximum number of threads for VML library operations.
    This is independent of NumExpr's general threading and allows
    fine-tuning of VML performance characteristics.

    Parameters:
    - nthreads (int): Number of threads for VML operations

    Note:
        This function is equivalent to mkl_domain_set_num_threads(nthreads, MKL_DOMAIN_VML)
        in the Intel MKL library.
    """

```

**Usage Examples:**

```python
import numexpr as ne
import numpy as np

# Configure VML threading independently
if ne.use_vml:
    # Note: get_vml_num_threads() is not available in public API
    print(f"Current NumExpr threads: {ne.get_num_threads()}")

    # Set VML to use fewer threads than NumExpr
    ne.set_num_threads(8)      # NumExpr uses 8 threads
    ne.set_vml_num_threads(4)  # VML uses 4 threads

    # Benchmark VML-heavy expression
    data = np.random.random(1000000)
    result = ne.evaluate("sin(data) + cos(data) + exp(data) + log(data + 1)")
```
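
Because the best VML thread count depends on the machine and the expression, it can help to measure a few candidates. The sketch below is illustrative only: `time_vml_thread_counts` is a hypothetical helper, the candidate counts `[1, 2, 4]` are arbitrary, and the timings it prints will differ between systems.

```python
import timeit

import numexpr as ne
import numpy as np


def time_vml_thread_counts(expression, local_dict, thread_counts, repeats=5):
    """Return {nthreads: best seconds per evaluation} (hypothetical helper)."""
    timings = {}
    for nthreads in thread_counts:
        # Suggest a VML thread count; expected to have no effect without VML
        ne.set_vml_num_threads(nthreads)
        # Warm-up call so compilation is not included in the timing
        ne.evaluate(expression, local_dict=local_dict)
        runs = timeit.repeat(
            lambda: ne.evaluate(expression, local_dict=local_dict),
            number=1,
            repeat=repeats,
        )
        timings[nthreads] = min(runs)
    return timings


data = np.random.random(1_000_000)
timings = time_vml_thread_counts(
    "sin(data) + cos(data) + exp(data) + log(data + 1)",
    {"data": data},
    thread_counts=[1, 2, 4],
)
for nthreads, seconds in timings.items():
    print(f"VML threads={nthreads}: {seconds:.4f}s")
```

The VML thread setting stays at the last value tried, so set it explicitly to the chosen count after the sweep.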

### VML Feature Detection

Runtime detection of VML availability and capabilities.

```python { .api }
# VML availability flag
use_vml: bool  # True if VML support is available and enabled
```

**Usage Examples:**

```python
import numexpr as ne

# The same expression works with or without VML
expression = "sin(a) * cos(b) + exp(c) * log(d + 1)"

# Conditional logic based on VML availability
if ne.use_vml:
    # VML accelerates the functions in this expression
    ne.set_vml_accuracy_mode('fast')  # Prioritize speed
else:
    # NumExpr falls back to its standard implementations, so the
    # expression does not need to change; only warn the user
    print("Warning: VML not available, performance may be limited")
```
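
For logging or diagnostics, the flag and the version string can be gathered into a single report. This is a small sketch under the assumption that only the documented `use_vml` and `get_vml_version()` are used; `vml_status` is a hypothetical helper name, not a NumExpr function.

```python
import numexpr as ne


def vml_status():
    """Summarize VML availability (hypothetical helper, not a NumExpr API)."""
    available = bool(ne.use_vml)
    return {
        "available": available,
        # Only query the version when VML is reported as available
        "version": ne.get_vml_version() if available else None,
    }


status = vml_status()
print(status)
```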

## VML-Accelerated Functions

When VML is available, the following functions receive hardware acceleration:

### Mathematical Functions

**Trigonometric Functions:**
- `sin`, `cos`, `tan`
- `arcsin`, `arccos`, `arctan`, `arctan2`
- `sinh`, `cosh`, `tanh`
- `arcsinh`, `arccosh`, `arctanh`

**Exponential and Logarithmic:**
- `exp`, `expm1`
- `log`, `log1p`, `log10`

**Power Functions:**
- `sqrt`
- `pow` (power operations)

**Other Functions:**
- `absolute`/`abs`
- `conjugate`
- `ceil`, `floor`
- `fmod`
- `div`, `inv` (division and inverse)

### Performance Characteristics

**Speed Improvements:**
- 2-10x faster for transcendental functions
- Greater improvements on larger arrays
- Optimal for Intel/AMD processors with VML support

**Accuracy Modes:**
- **High ('high')**: Maximum precision, less than 1 ULP (Unit in the Last Place) error
- **Low ('low')**: Good precision, typically 1-2 ULP error, moderate speed improvement
- **Fast ('fast')**: Maximum speed, relaxed precision guarantees
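
To see what the modes mean in practice, the difference between a NumExpr result and a baseline can be expressed in ULPs. The sketch below is an assumption-laden illustration: `max_ulp_difference` is a hypothetical helper, and the NumPy result is used as the baseline even though it is not exact itself, so the numbers are a rough indication and not a measurement of true error.

```python
import numexpr as ne
import numpy as np


def max_ulp_difference(expression, reference, local_dict, mode):
    """Largest difference from `reference`, in float64 ULPs (hypothetical helper)."""
    previous = ne.set_vml_accuracy_mode(mode)
    try:
        result = ne.evaluate(expression, local_dict=local_dict)
    finally:
        # Put the earlier accuracy mode back
        ne.set_vml_accuracy_mode(previous)
    # np.spacing gives the gap to the next float, i.e. one ULP at each value
    ulps = np.abs(result - reference) / np.spacing(np.abs(reference))
    return float(ulps.max())


x = np.linspace(0.0, 1.0, 100_000)
reference = np.exp(x)  # NumPy baseline; assumed close to, not equal to, the exact value
for mode in ("high", "low", "fast"):
    print(mode, max_ulp_difference("exp(x)", reference, {"x": x}, mode))
```

Without VML the three modes are expected to report the same figure, since the accuracy setting only applies to VML-accelerated functions.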

## Installation and Setup

### Enabling VML Support

VML support requires Intel MKL to be available during NumExpr compilation:

```bash
# Install NumExpr with MKL support via conda (recommended)
conda install numexpr

# Or compile from source with MKL
# 1. Install Intel MKL
# 2. Copy site.cfg.example to site.cfg
# 3. Edit site.cfg to point to MKL libraries
# 4. Build: python setup.py build
```

### Verifying VML Installation

```python
import time

import numexpr as ne
import numpy as np

# Check if VML is available
print(f"VML available: {ne.use_vml}")

if ne.use_vml:
    print(f"VML version: {ne.get_vml_version()}")
    # VML threading information not available via public API

    # Test VML acceleration
    x = np.random.random(1000000)

    # Time VML-accelerated expression
    # (the first call also includes expression compilation)
    start = time.perf_counter()
    result_vml = ne.evaluate("sin(x) + cos(x) + exp(x)")
    vml_time = time.perf_counter() - start

    # Time equivalent NumPy expression
    start = time.perf_counter()
    result_numpy = np.sin(x) + np.cos(x) + np.exp(x)
    numpy_time = time.perf_counter() - start

    print(f"VML time: {vml_time:.4f}s")
    print(f"NumPy time: {numpy_time:.4f}s")
    print(f"Speedup: {numpy_time/vml_time:.2f}x")
```

## Advanced VML Usage

### Accuracy vs Performance Tuning

```python
import time

import numpy as np
import numexpr as ne

def benchmark_vml_modes(expression, data_dict):
    """Benchmark VML accuracy modes for an expression."""
    if not ne.use_vml:
        print("VML not available")
        return {}  # Empty result so callers can still iterate

    modes = ['high', 'low', 'fast']
    results = {}

    # Remember the mode in effect before benchmarking
    original_mode = ne.set_vml_accuracy_mode(modes[0])

    for mode in modes:
        ne.set_vml_accuracy_mode(mode)

        # Warm up
        ne.evaluate(expression, local_dict=data_dict)

        # Time multiple evaluations
        start = time.perf_counter()
        for _ in range(100):
            result = ne.evaluate(expression, local_dict=data_dict)
        elapsed = time.perf_counter() - start

        results[mode] = {
            'time': elapsed / 100,
            'sample_result': result[:5]  # First few values for comparison
        }

    # Restore the mode that was active before the benchmark
    ne.set_vml_accuracy_mode(original_mode)
    return results

# Example usage
data = {'x': np.linspace(0.1, 10, 100000)}
results = benchmark_vml_modes("sin(x) + cos(x) + log(x)", data)

for mode, info in results.items():
    print(f"{mode}: {info['time']:.6f}s, sample: {info['sample_result']}")
```

### Hybrid Threading Strategies

```python
import numexpr as ne

# Strategy 1: Match VML threads to NumExpr threads
ne.set_num_threads(4)
ne.set_vml_num_threads(4)

# Strategy 2: Use fewer VML threads for memory-bound operations
ne.set_num_threads(8)
ne.set_vml_num_threads(2)  # Reduce VML threading to avoid memory bandwidth limits

# Strategy 3: Disable VML threading for small arrays
array_size = 5000  # Example value; use the size of your input arrays
if array_size < 10000:
    ne.set_vml_num_threads(1)
else:
    ne.set_vml_num_threads(ne.get_num_threads())
```

### Expression Optimization for VML

```python
# VML-friendly expression patterns
vml_optimized = "sin(a) * cos(b) + exp(c/10) * sqrt(d)"  # Uses VML functions

# Less VML-friendly (uses non-VML operations)
mixed_expression = "where(a > 0, sin(a), cos(a)) + b**3"  # where() not VML-accelerated

# Consider rewriting for better VML utilization
# Instead of: where(x > 0, sin(x), 0)
# Possible rewrite: (x > 0) * sin(x)
# Check that the rewrite evaluates and benchmark it before adopting it
```

VML integration provides substantial performance improvements for mathematical expressions, particularly those involving transcendental functions. The key is balancing accuracy requirements with performance needs and properly configuring threading for your specific use case.