Python wrapper for OpenCL enabling GPU and parallel computing with comprehensive array operations and mathematical functions
npx @tessl/cli install tessl/pypi-pyopencl@2025.2.00
# PyOpenCL

PyOpenCL is a Python wrapper for OpenCL that gives Pythonic access to parallel computing on GPUs and other massively parallel devices. It combines low-level OpenCL API access with automatic error checking and high-level convenience functions for array operations, mathematical functions, and algorithm primitives, which makes GPU computing practical for scientific computing, machine learning, and other high-performance applications.

## Package Information

- **Package Name**: pyopencl
- **Language**: Python
- **Installation**: `pip install pyopencl`

## Core Imports

```python
import pyopencl as cl
```

Common patterns for array operations:

```python
import pyopencl.array as cl_array
import pyopencl.clmath as clmath
```

For algorithm primitives:

```python
from pyopencl.scan import InclusiveScanKernel
from pyopencl.reduction import ReductionKernel
from pyopencl.elementwise import ElementwiseKernel
```

## Basic Usage

```python
import pyopencl as cl
import pyopencl.array as cl_array
import numpy as np

# Create an OpenCL context and a command queue
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# Create input arrays on the host
a_host = np.random.randn(50000).astype(np.float32)
b_host = np.random.randn(50000).astype(np.float32)

# Copy the host arrays to the device
a_gpu = cl_array.to_device(queue, a_host)
b_gpu = cl_array.to_device(queue, b_host)

# Add on the device, then copy the result back to the host
result_gpu = a_gpu + b_gpu
result_host = result_gpu.get()

print(f"Result shape: {result_host.shape}")
print(f"First 5 elements: {result_host[:5]}")
```

## Architecture

PyOpenCL follows OpenCL's hierarchical structure while providing Pythonic interfaces:

- **Platform/Device Management**: Discover and select compute devices (GPUs, CPUs)
- **Context/CommandQueue**: Execution environment and command scheduling
- **Memory Objects**: Buffers, images, and shared virtual memory (SVM) for storing and transferring data
- **Program/Kernel**: Compile and execute OpenCL kernels on devices
- **Array Operations**: High-level NumPy-like interface for GPU arrays
- **Algorithm Primitives**: Pre-built parallel algorithms (scan, reduction, sorting)

This layered design supports everything from simple array operations to custom kernel development, so PyOpenCL can serve as a foundation for GPU computing in the Python scientific ecosystem.

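
The layers listed above can be sketched end to end with a hand-written kernel. This is a minimal illustration, not a canonical recipe: the kernel name `add`, the helper `run_vector_add`, and the array size are hypothetical, and `interactive=False` is passed so that context selection never prompts. The helper returns `None` when PyOpenCL or an OpenCL runtime is not available.

```python
KERNEL_SOURCE = """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out)
{
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
"""


def run_vector_add(n=1024):
    """Return the max abs error against NumPy, or None if OpenCL is unusable."""
    try:
        import numpy as np
        import pyopencl as cl
    except ImportError:
        return None
    try:
        # Platform/device selection and context creation in one call
        ctx = cl.create_some_context(interactive=False)
    except cl.Error:
        return None
    queue = cl.CommandQueue(ctx)

    a = np.arange(n, dtype=np.float32)
    b = np.ones(n, dtype=np.float32)

    # Memory objects: two inputs copied from the host, one output buffer
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # Program/kernel: compile the source and launch one work item per element
    program = cl.Program(ctx, KERNEL_SOURCE).build()
    program.add(queue, a.shape, None, a_buf, b_buf, out_buf)

    # Copy the result back to the host (blocking by default)
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)
    return float(np.abs(out - (a + b)).max())


if __name__ == "__main__":
    print("max abs error:", run_vector_add())
```

For simple element-wise work like this, the array interface shown in Basic Usage is usually shorter; a hand-written kernel is mainly worthwhile when the computation does not map onto array expressions.
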
## Capabilities

### Core OpenCL Objects and Management

Platform discovery, device selection, context creation, command queue management, program compilation, and kernel execution. These form the foundation of OpenCL computing and provide complete control over parallel execution.

```python { .api }
def get_platforms(): ...
def create_some_context(interactive=None, answers=None): ...
def choose_devices(interactive=None, answers=None): ...

class Platform: ...
class Device: ...
class Context: ...
class CommandQueue: ...
class Program: ...
class Kernel: ...
```

[Core OpenCL](./core-opencl.md)

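
As a rough sketch of platform discovery and device selection, the snippet below walks every platform and lists its devices. The helper name `describe_devices` is hypothetical, and the three reported attributes are just one reasonable choice; the function returns an empty list when PyOpenCL is not installed or no platform is found.

```python
def describe_devices():
    """Return (platform name, device name, compute units) for each device."""
    try:
        import pyopencl as cl
    except ImportError:
        return []
    try:
        platforms = cl.get_platforms()
    except cl.Error:
        # Raised, for example, when no OpenCL driver is registered
        return []

    rows = []
    for platform in platforms:
        for device in platform.get_devices():
            rows.append((platform.name, device.name, device.max_compute_units))
    return rows


if __name__ == "__main__":
    for platform_name, device_name, units in describe_devices():
        print(f"{platform_name}: {device_name} ({units} compute units)")
```

A context for one specific device can then be built with `cl.Context([device])` instead of relying on `create_some_context()`.
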
### Memory Management and Data Transfer

Buffer creation, image handling, memory mapping, and data transfer between host and device. Includes advanced shared virtual memory (SVM) support for zero-copy operations in OpenCL 2.0+.

```python { .api }
class Buffer: ...
class Image: ...
def create_image(context, flags, format, shape=None, pitches=None, hostbuf=None): ...
def enqueue_copy(queue, dest, src, **kwargs): ...
def enqueue_fill(queue, dest, pattern, size, *, offset=0, wait_for=None): ...

# SVM (OpenCL 2.0+)
class SVM: ...
class SVMAllocation: ...
def svm_empty(ctx, flags, shape, dtype, order="C", alignment=None): ...
def csvm_empty(ctx, shape, dtype, order="C", alignment=None): ...
```

[Memory Management](./memory-management.md)

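
The transfer paths described above can be illustrated with a small round trip: host to device at buffer creation, device to device, then device to host. This is a sketch under assumptions; the helper name `roundtrip` and the `float32` element type are hypothetical choices, and the function returns `None` when no OpenCL runtime is usable.

```python
def roundtrip(values):
    """Copy values host -> device -> device -> host and return them as a list."""
    try:
        import numpy as np
        import pyopencl as cl
    except ImportError:
        return None
    try:
        ctx = cl.create_some_context(interactive=False)
    except cl.Error:
        return None
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    host_in = np.asarray(values, dtype=np.float32)

    # COPY_HOST_PTR uploads the host data when the buffer is created
    src = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_in)
    dst = cl.Buffer(ctx, mf.READ_WRITE, size=host_in.nbytes)

    # Device-to-device copy between two buffers
    cl.enqueue_copy(queue, dst, src)

    # Device-to-host copy; this direction blocks by default
    host_out = np.empty_like(host_in)
    cl.enqueue_copy(queue, host_out, dst)
    return host_out.tolist()


if __name__ == "__main__":
    print(roundtrip([1.0, 2.5, -3.0]))
```
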
### Array Operations

High-level NumPy-like GPU array interface providing familiar array operations, mathematical functions, and data manipulation. Enables seamless transition from CPU to GPU computing.

```python { .api }
class Array: ...
def to_device(queue, ary, **kwargs): ...
def zeros(queue, shape, dtype=float, order="C", allocator=None): ...
def arange(queue, *args, **kwargs): ...

def sum(a, dtype=None, queue=None, slice=None): ...
def dot(a_gpu, b_gpu, dtype=None, queue=None): ...
def concatenate(arrays, axis=0, queue=None, allocator=None): ...
def transpose(a_gpu, axes=None): ...
```

[Array Operations](./array-operations.md)

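
A short sketch of the array interface, assuming a device is available: it builds two small `float32` arrays, reduces one with `sum`, and combines both with `dot`. The helper name `array_demo` and the input values are hypothetical; the function returns `None` when PyOpenCL cannot be used.

```python
def array_demo():
    """Return (sum of x, dot(x, y)) computed on the device."""
    try:
        import numpy as np
        import pyopencl as cl
        import pyopencl.array as cl_array
    except ImportError:
        return None
    try:
        ctx = cl.create_some_context(interactive=False)
    except cl.Error:
        return None
    queue = cl.CommandQueue(ctx)

    # x = [0, 1, ..., 7] created on the device; y = [2, 2, ..., 2] uploaded
    x = cl_array.arange(queue, 8, dtype=np.float32)
    y = cl_array.to_device(queue, np.full(8, 2, dtype=np.float32))

    # Both reductions return device arrays holding a single value
    total = cl_array.sum(x).get()
    inner = cl_array.dot(x, y).get()
    return float(total), float(inner)


if __name__ == "__main__":
    print(array_demo())
```

With these inputs the two values should be 28.0 and 56.0, matching `np.arange(8).sum()` and `2 * np.arange(8).sum()`.
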
### Mathematical Functions

Comprehensive set of mathematical functions optimized for GPU execution, including trigonometric, exponential, logarithmic, and special functions that operate element-wise on arrays.

```python { .api }
# Trigonometric functions
def sin(x, queue=None): ...
def cos(x, queue=None): ...
def tan(x, queue=None): ...
def asin(x, queue=None): ...

# Exponential/logarithmic functions
def exp(x, queue=None): ...
def log(x, queue=None): ...
def sqrt(x, queue=None): ...

# Special functions
def erf(x, queue=None): ...
def tgamma(x, queue=None): ...
```

[Mathematical Functions](./mathematical-functions.md)

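
To illustrate the element-wise behaviour, the sketch below evaluates `sin` and `cos` on a device array and checks the identity sin²(x) + cos²(x) = 1. The helper name `pythagorean_identity_error` and the sample count are hypothetical; the function returns `None` when no OpenCL runtime is usable.

```python
def pythagorean_identity_error(n=256):
    """Return max |sin(x)^2 + cos(x)^2 - 1| over n points in [0, 2*pi]."""
    try:
        import numpy as np
        import pyopencl as cl
        import pyopencl.array as cl_array
        import pyopencl.clmath as clmath
    except ImportError:
        return None
    try:
        ctx = cl.create_some_context(interactive=False)
    except cl.Error:
        return None
    queue = cl.CommandQueue(ctx)

    x_host = np.linspace(0.0, 2.0 * np.pi, n).astype(np.float32)
    x = cl_array.to_device(queue, x_host)

    # Each clmath call returns a new device array of the same shape
    s = clmath.sin(x)
    c = clmath.cos(x)
    identity = s * s + c * c

    return float(np.abs(identity.get() - 1.0).max())


if __name__ == "__main__":
    print("max deviation from 1:", pythagorean_identity_error())
```
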
### Algorithm Primitives

Pre-built parallel algorithms including scan (prefix sum), reduction, element-wise operations, and sorting. These provide building blocks for complex parallel computations.

```python { .api }
class ReductionKernel: ...
class InclusiveScanKernel: ...
class ExclusiveScanKernel: ...
class ElementwiseKernel: ...

class RadixSort: ...
class BitonicSort: ...
```

[Algorithm Primitives](./algorithm-primitives.md)

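
The three kernel builders can be sketched side by side: an element-wise `a * x + y`, a dot product expressed as a map plus a reduction, and an in-place prefix sum. The helper name `primitives_demo`, the kernel name `axpy`, and the inputs are hypothetical; the function returns `None` when PyOpenCL or a device is unavailable.

```python
def primitives_demo():
    """Return (axpy result, dot product, prefix sums) computed on the device."""
    try:
        import numpy as np
        import pyopencl as cl
        import pyopencl.array as cl_array
        from pyopencl.elementwise import ElementwiseKernel
        from pyopencl.reduction import ReductionKernel
        from pyopencl.scan import InclusiveScanKernel
    except ImportError:
        return None
    try:
        ctx = cl.create_some_context(interactive=False)
    except cl.Error:
        return None
    queue = cl.CommandQueue(ctx)

    x = cl_array.to_device(queue, np.arange(8, dtype=np.float32))
    y = cl_array.to_device(queue, np.ones(8, dtype=np.float32))

    # Element-wise: the operation is applied once per index i
    axpy = ElementwiseKernel(
        ctx,
        "float a, float *x, float *y, float *out",
        "out[i] = a * x[i] + y[i]",
        "axpy",
    )
    out = cl_array.empty_like(x)
    axpy(np.float32(2), x, y, out)

    # Reduction: map each index to x[i]*y[i], then combine with a+b
    dot = ReductionKernel(
        ctx,
        np.float32,
        neutral="0",
        reduce_expr="a+b",
        map_expr="x[i]*y[i]",
        arguments="__global const float *x, __global const float *y",
    )
    inner = dot(x, y).get()

    # Inclusive scan: overwrites the array with its running sum
    counts = cl_array.to_device(queue, np.arange(1, 9, dtype=np.int32))
    prefix_sum = InclusiveScanKernel(ctx, np.int32, "a+b")
    prefix_sum(counts)

    return out.get().tolist(), float(inner), counts.get().tolist()


if __name__ == "__main__":
    print(primitives_demo())
```

Each kernel object compiles its generated OpenCL code for the given context, so in longer programs it is usually worth creating the kernels once and calling them repeatedly.
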
### Random Number Generation

High-quality parallel random number generation using counter-based generators (Philox, Threefry), suitable for Monte Carlo simulations and stochastic computations. These generators are designed for statistical quality and parallel throughput, not for cryptographic use.

```python { .api }
class PhiloxGenerator: ...
class ThreefryGenerator: ...

def rand(queue, shape, dtype=float, luxury=None, generator=None): ...
def fill_rand(result, queue=None, luxury=None, generator=None): ...
```

[Random Number Generation](./random-number-generation.md)

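
As a Monte Carlo illustration, the sketch below draws uniform samples on the device with `rand` and estimates π from the fraction of points that fall inside the unit quarter circle. The helper name `estimate_pi` and the sample count are hypothetical, and the final counting step is done on the host with NumPy for simplicity; the function returns `None` when no OpenCL runtime is usable.

```python
def estimate_pi(n=100_000):
    """Estimate pi from n uniform points in the unit square."""
    try:
        import numpy as np
        import pyopencl as cl
        import pyopencl.clrandom as clrandom
    except ImportError:
        return None
    try:
        ctx = cl.create_some_context(interactive=False)
    except cl.Error:
        return None
    queue = cl.CommandQueue(ctx)

    # Two independent draws of n uniform float32 samples, generated on the
    # device and copied back to the host
    x = clrandom.rand(queue, n, np.float32).get()
    y = clrandom.rand(queue, n, np.float32).get()

    # Fraction of points inside the quarter circle approximates pi / 4
    inside = np.count_nonzero(x * x + y * y < 1.0)
    return 4.0 * inside / n


if __name__ == "__main__":
    print("pi is roughly", estimate_pi())
```
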
### Tools and Utilities

Memory allocators, kernel argument handling, type management, device characterization, and debugging utilities that support efficient GPU computing and development workflows.

```python { .api }
class MemoryPool: ...
class ImmediateAllocator: ...
class DeferredAllocator: ...

def dtype_to_ctype(dtype): ...
def get_or_register_dtype(name, dtype): ...

# Device characterization (pyopencl.characterize)
def has_double_support(device): ...
def get_simd_group_size(dev, type_size): ...
```

[Tools and Utilities](./tools-and-utilities.md)

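
One way the allocators and type helpers fit together is sketched below: a `MemoryPool` wrapping an `ImmediateAllocator` is passed as the `allocator` for device arrays, and `dtype_to_ctype` maps a NumPy dtype to its OpenCL C type name. The helper name `pooled_double` is hypothetical; the function returns `None` when PyOpenCL or a device is unavailable.

```python
def pooled_double(values):
    """Return (C type name, values doubled on the device) using a memory pool."""
    try:
        import numpy as np
        import pyopencl as cl
        import pyopencl.array as cl_array
        from pyopencl.tools import ImmediateAllocator, MemoryPool, dtype_to_ctype
    except ImportError:
        return None
    try:
        ctx = cl.create_some_context(interactive=False)
    except cl.Error:
        return None
    queue = cl.CommandQueue(ctx)

    # The pool reuses freed device buffers instead of asking the driver again
    pool = MemoryPool(ImmediateAllocator(queue))

    host = np.asarray(values, dtype=np.float32)
    arr = cl_array.to_device(queue, host, allocator=pool)

    # The result array inherits the pool allocator from its operand
    doubled = (arr + arr).get().tolist()

    ctype = dtype_to_ctype(host.dtype)

    # Release buffers the pool is holding but no array is using
    pool.free_held()
    return ctype, doubled


if __name__ == "__main__":
    print(pooled_double([1.0, 2.0, 3.0]))
```

A pool tends to pay off when many short-lived temporaries of similar size are created, as in chained array expressions.
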
### OpenGL Interoperability

Integration with OpenGL for graphics/compute workflows, allowing sharing of buffers, textures, and renderbuffers between OpenGL and OpenCL contexts.

```python { .api }
class GLBuffer: ...
class GLRenderBuffer: ...
class GLTexture: ...

def enqueue_acquire_gl_objects(queue, mem_objects, wait_for=None): ...
def enqueue_release_gl_objects(queue, mem_objects, wait_for=None): ...
def have_gl(): ...
```

[OpenGL Interoperability](./opengl-interop.md)

## Error Handling

```python { .api }
class Error(Exception): ...
class MemoryError(Error): ...
class LogicError(Error): ...
class RuntimeError(Error): ...
```

PyOpenCL translates OpenCL error codes into these Python exceptions automatically, so failures can be caught, inspected, and recovered from with ordinary `try`/`except` blocks.

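
A minimal sketch of catching these exceptions, using a deliberately broken kernel source: catching the base class `cl.Error` covers all of the subclasses above. The helper name `try_build` and the returned status strings are hypothetical; the function returns `None` when no OpenCL runtime is usable.

```python
def try_build(source):
    """Return a short status string describing whether the source compiled."""
    try:
        import pyopencl as cl
    except ImportError:
        return None
    try:
        ctx = cl.create_some_context(interactive=False)
    except cl.Error:
        return None

    try:
        cl.Program(ctx, source).build()
    except cl.Error as exc:
        # The exception message carries the compiler's build log
        return f"build failed ({type(exc).__name__})"
    return "build ok"


if __name__ == "__main__":
    print(try_build("__kernel void noop() {}"))
    print(try_build("__kernel void broken( {"))
```

Note that `pyopencl.MemoryError` and `pyopencl.RuntimeError` are distinct from the Python built-ins of the same names, so referring to them through the `cl.` prefix avoids ambiguity.
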
## Types

```python { .api }
# Type aliases for function signatures
WaitList = Sequence[Event] | None
KernelArg = Buffer | Array | LocalMemory | np.number | SVM
Allocator = Callable[[int], Buffer]

# OpenCL constants and enumerations
class mem_flags: ...
class device_type: ...
class command_queue_properties: ...
```