
# Device and Memory Management

Essential CUDA device enumeration, selection, and memory allocation operations, including unified memory, streams, and events, for efficient GPU resource management. This module provides the foundational operations for working with CUDA devices and managing memory across CPU and GPU address spaces.

## Capabilities

### Device Information and Selection

Query available CUDA devices, select the active device, and retrieve device properties for optimal resource allocation.

```python { .api }
def cudaGetDeviceCount() -> int:
    """
    Get the number of CUDA-capable devices.

    Returns:
        int: Number of available CUDA devices

    Raises:
        cudaError_t: If a CUDA driver/runtime error occurs
    """

def cudaSetDevice(device: int) -> None:
    """
    Set the current CUDA device for subsequent operations.

    Args:
        device (int): Device ID (0-based index)

    Raises:
        cudaError_t: If the device ID is invalid or the device is not available
    """

def cudaGetDevice() -> int:
    """
    Get the currently selected CUDA device.

    Returns:
        int: Currently active device ID
    """

def cudaDeviceReset() -> None:
    """
    Reset the current CUDA device and destroy all associated contexts.

    Note:
        Call this function to ensure a clean shutdown.
    """

def cudaDeviceSynchronize() -> None:
    """
    Wait for all operations on the current device to complete.

    Note:
        Blocks until all preceding operations complete.
    """

def cudaGetErrorString(error: cudaError_t) -> str:
    """
    Get a descriptive string for a CUDA error code.

    Args:
        error (cudaError_t): CUDA error code

    Returns:
        str: Human-readable error description
    """
```

### Device Properties and Attributes

Retrieve detailed device capabilities and specifications for performance optimization.

```python { .api }
def cudaGetDeviceProperties(device: int) -> cudaDeviceProp:
    """
    Get comprehensive properties of a CUDA device.

    Args:
        device (int): Device ID to query

    Returns:
        cudaDeviceProp: Device properties structure
    """

def cudaDeviceGetAttribute(attr: cudaDeviceAttr, device: int) -> int:
    """
    Get a specific attribute value for a CUDA device.

    Args:
        attr (cudaDeviceAttr): Attribute to query
        device (int): Device ID

    Returns:
        int: Attribute value
    """
```
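The `major`/`minor` pair returned by `cudaGetDeviceProperties` determines which CUDA features a device supports. As an illustrative sketch (the helper name is an assumption, not part of the bindings), a minimum-capability check can compare the pairs lexicographically:

```python
def meets_compute_capability(major: int, minor: int,
                             required: tuple[int, int] = (7, 0)) -> bool:
    """Check whether a device's compute capability meets a minimum.

    Tuples compare lexicographically, so (8, 0) >= (7, 5) but (7, 0) < (7, 5).
    """
    return (major, minor) >= required

# e.g. with props = cudaGetDeviceProperties(0):
#   meets_compute_capability(props.major, props.minor, required=(7, 0))
```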

### Memory Allocation

Allocate memory on both the device (GPU) and the host (CPU) with various allocation strategies.

```python { .api }
def cudaMalloc(size: int) -> int:
    """
    Allocate memory on the CUDA device.

    Args:
        size (int): Number of bytes to allocate

    Returns:
        int: Device memory pointer (as an integer address)

    Raises:
        cudaError_t: If allocation fails (e.g., out of memory)
    """

def cudaMallocHost(size: int) -> int:
    """
    Allocate page-locked (pinned) host memory.

    Args:
        size (int): Number of bytes to allocate

    Returns:
        int: Host memory pointer (as an integer address)

    Note:
        Pinned memory enables faster host-device transfers.
    """

def cudaMallocManaged(size: int, flags: int = 0) -> int:
    """
    Allocate unified memory accessible from both CPU and GPU.

    Args:
        size (int): Number of bytes to allocate
        flags (int): Allocation flags (optional)

    Returns:
        int: Unified memory pointer
    """

def cudaHostAlloc(size: int, flags: int) -> int:
    """
    Allocate host memory with specific allocation flags.

    Args:
        size (int): Number of bytes to allocate
        flags (int): Allocation flags (cudaHostAllocDefault, etc.)

    Returns:
        int: Host memory pointer
    """
```
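When carving one large `cudaMalloc` allocation into several sub-buffers, each region's starting offset should be padded up to an alignment boundary. A minimal sketch, assuming a 256-byte boundary (a common device allocation alignment; the helper itself is illustrative, not part of the API):

```python
def aligned_size(nbytes: int, alignment: int = 256) -> int:
    """Round nbytes up to the next multiple of alignment."""
    return (nbytes + alignment - 1) // alignment * alignment

# Padding each sub-buffer this way keeps every region's start aligned
# when packing several arrays into a single device allocation.
```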

### Memory Deallocation

Free allocated memory resources on both device and host.

```python { .api }
def cudaFree(devPtr: int) -> None:
    """
    Free device memory allocated with cudaMalloc.

    Args:
        devPtr (int): Device pointer to free
    """

def cudaFreeHost(ptr: int) -> None:
    """
    Free host memory allocated with cudaMallocHost or cudaHostAlloc.

    Args:
        ptr (int): Host pointer to free
    """

def cudaHostUnregister(ptr: int) -> None:
    """
    Unregister previously registered host memory.

    Args:
        ptr (int): Host pointer to unregister
    """
```

185

186

### Memory Transfer Operations

187

188

Copy data between host and device memory with various transfer directions and modes.

189

190

```python { .api }

191

def cudaMemcpy(dst, src, count: int, kind: cudaMemcpyKind) -> None:

192

"""

193

Copy memory between host and device synchronously.

194

195

Args:

196

dst: Destination pointer

197

src: Source pointer

198

count (int): Number of bytes to copy

199

kind (cudaMemcpyKind): Copy direction

200

201

Note:

202

Blocks until copy completes

203

"""

204

205

def cudaMemcpyAsync(dst, src, count: int, kind: cudaMemcpyKind, stream: int) -> None:

206

"""

207

Copy memory between host and device asynchronously.

208

209

Args:

210

dst: Destination pointer

211

src: Source pointer

212

count (int): Number of bytes to copy

213

kind (cudaMemcpyKind): Copy direction

214

stream (int): CUDA stream for asynchronous execution

215

"""

216

217

def cudaMemset(devPtr: int, value: int, count: int) -> None:

218

"""

219

Set device memory to a specific value.

220

221

Args:

222

devPtr (int): Device pointer

223

value (int): Value to set (0-255)

224

count (int): Number of bytes to set

225

"""

226

227

def cudaMemsetAsync(devPtr: int, value: int, count: int, stream: int) -> None:

228

"""

229

Set device memory to a specific value asynchronously.

230

231

Args:

232

devPtr (int): Device pointer

233

value (int): Value to set (0-255)

234

count (int): Number of bytes to set

235

stream (int): CUDA stream for asynchronous execution

236

"""

237

```
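`cudaMemcpyDefault` asks the runtime to infer the direction from the pointers themselves. The selection it performs can be sketched with a stand-in enumeration (not the real `cudaMemcpyKind`, though the four explicit directions mirror it):

```python
from enum import IntEnum

class MemcpyKind(IntEnum):
    """Stand-in for cudaMemcpyKind's four explicit directions."""
    HostToHost = 0
    HostToDevice = 1
    DeviceToHost = 2
    DeviceToDevice = 3

def infer_kind(src_on_device: bool, dst_on_device: bool) -> MemcpyKind:
    """Pick the explicit copy direction from where src and dst reside."""
    if src_on_device and dst_on_device:
        return MemcpyKind.DeviceToDevice
    if src_on_device:
        return MemcpyKind.DeviceToHost
    if dst_on_device:
        return MemcpyKind.HostToDevice
    return MemcpyKind.HostToHost
```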

### Memory Information

Query memory usage and availability on CUDA devices.

```python { .api }
def cudaMemGetInfo() -> tuple:
    """
    Get memory information for the current device.

    Returns:
        tuple[int, int]: (free_memory, total_memory) in bytes
    """

def cudaPointerGetAttributes(ptr: int) -> cudaPointerAttributes:
    """
    Get attributes of a memory pointer.

    Args:
        ptr (int): Memory pointer to query

    Returns:
        cudaPointerAttributes: Pointer attributes structure
    """
```
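`cudaMemGetInfo` returns raw byte counts; a small formatting helper (hypothetical, shown only to make the units concrete) turns the `(free, total)` pair into a readable summary:

```python
def format_mem_info(free: int, total: int) -> str:
    """Render a cudaMemGetInfo-style (free, total) byte pair in MB."""
    used = total - free
    mb = 1024 ** 2
    return f"{used // mb} MB used / {total // mb} MB total"

# e.g. free, total = cudaMemGetInfo(); print(format_mem_info(free, total))
```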

## Types

### Memory Copy Directions

```python { .api }
class cudaMemcpyKind:
    """Memory copy direction enumeration"""
    cudaMemcpyHostToHost: int
    cudaMemcpyHostToDevice: int
    cudaMemcpyDeviceToHost: int
    cudaMemcpyDeviceToDevice: int
    cudaMemcpyDefault: int  # Infer direction automatically
```

277

278

### Device Attributes

279

280

```python { .api }

281

class cudaDeviceAttr:

282

"""CUDA device attribute enumeration"""

283

cudaDevAttrMaxThreadsPerBlock: int

284

cudaDevAttrMaxBlockDimX: int

285

cudaDevAttrMaxBlockDimY: int

286

cudaDevAttrMaxBlockDimZ: int

287

cudaDevAttrMaxGridDimX: int

288

cudaDevAttrMaxGridDimY: int

289

cudaDevAttrMaxGridDimZ: int

290

cudaDevAttrMaxSharedMemoryPerBlock: int

291

cudaDevAttrTotalConstantMemory: int

292

cudaDevAttrWarpSize: int

293

cudaDevAttrMaxPitch: int

294

cudaDevAttrMultiProcessorCount: int

295

cudaDevAttrClockRate: int

296

cudaDevAttrMemoryClockRate: int

297

cudaDevAttrMemoryBusWidth: int

298

```
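`cudaDevAttrMemoryClockRate` (in kHz) and `cudaDevAttrMemoryBusWidth` (in bits) together bound memory throughput. A rough peak-bandwidth estimate, assuming double-data-rate memory (the factor of 2 below; memory technologies differ, so treat this as an approximation, and the helper itself is not part of the API):

```python
def theoretical_bandwidth_gb_s(mem_clock_khz: int, bus_width_bits: int) -> float:
    """Estimate peak memory bandwidth in GB/s for DDR-style memory."""
    # kHz -> Hz, x2 for double data rate, bits -> bytes, bytes/s -> GB/s
    return mem_clock_khz * 1e3 * 2 * (bus_width_bits / 8) / 1e9
```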

### Host Allocation Flags

```python { .api }
# Host allocation flag constants
cudaHostAllocDefault: int        # Default page-locked allocation
cudaHostAllocPortable: int       # Portable across CUDA contexts
cudaHostAllocMapped: int         # Map allocation into device address space
cudaHostAllocWriteCombined: int  # Write-combined memory
```

### Device Properties Structure

```python { .api }
class cudaDeviceProp:
    """CUDA device properties structure"""
    name: str                 # Device name
    totalGlobalMem: int       # Global memory size in bytes
    sharedMemPerBlock: int    # Shared memory per block
    regsPerBlock: int         # Registers per block
    warpSize: int             # Warp size
    memPitch: int             # Maximum pitch in bytes
    maxThreadsPerBlock: int   # Maximum threads per block
    maxThreadsDim: tuple      # Maximum block dimensions
    maxGridSize: tuple        # Maximum grid dimensions
    clockRate: int            # Clock frequency in kHz
    totalConstMem: int        # Constant memory size
    major: int                # Compute capability major version
    minor: int                # Compute capability minor version
    multiProcessorCount: int  # Number of SMs
```

### Pointer Attributes

```python { .api }
class cudaPointerAttributes:
    """Memory pointer attributes structure"""
    type: int           # Memory type (host, device, managed)
    device: int         # Device where the pointer resides
    devicePointer: int  # Device pointer value
    hostPointer: int    # Host pointer value
```

## Usage Examples

### Basic Device Management

```python
from cuda.bindings import runtime

# Check available devices
device_count = runtime.cudaGetDeviceCount()
print(f"Found {device_count} CUDA devices")

# Select and query device
runtime.cudaSetDevice(0)
current_device = runtime.cudaGetDevice()
print(f"Using device {current_device}")

# Get device properties
props = runtime.cudaGetDeviceProperties(0)
print(f"Device: {props.name}")
print(f"Compute Capability: {props.major}.{props.minor}")
print(f"Global Memory: {props.totalGlobalMem // (1024**3)} GB")
```

### Memory Operations

```python
from cuda.bindings import runtime

# Allocate memory
size = 1024 * 1024  # 1 MB
device_ptr = runtime.cudaMalloc(size)
host_ptr = runtime.cudaMallocHost(size)

# Transfer data
runtime.cudaMemcpy(
    device_ptr, host_ptr, size,
    runtime.cudaMemcpyKind.cudaMemcpyHostToDevice
)

# Check memory usage
free_mem, total_mem = runtime.cudaMemGetInfo()
print(f"Free: {free_mem // (1024**2)} MB")
print(f"Total: {total_mem // (1024**2)} MB")

# Cleanup
runtime.cudaFree(device_ptr)
runtime.cudaFreeHost(host_ptr)
```
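Overlapping transfers with `cudaMemcpyAsync` typically means splitting one large buffer across several streams. A planning helper (illustrative only; the name and even-split policy are assumptions) that yields contiguous `(offset, size)` chunks:

```python
def plan_chunks(total_bytes: int, n_streams: int) -> list[tuple[int, int]]:
    """Split total_bytes into n_streams contiguous (offset, size) chunks."""
    base, rem = divmod(total_bytes, n_streams)
    chunks = []
    offset = 0
    for i in range(n_streams):
        size = base + (1 if i < rem else 0)  # spread the remainder evenly
        chunks.append((offset, size))
        offset += size
    return chunks

# Each (offset, size) pair would feed one cudaMemcpyAsync call on its own stream.
```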