# Memory Management

Memory information including total memory, usage statistics, VRAM details, and memory error management for AMD GPU devices.

## Capabilities

### Memory Total and Usage

Get total memory and current usage for different memory types on the GPU.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_memory_total(amdsmi_processor_handle processor_handle, amdsmi_memory_type_t mem_type, uint64_t *total);
amdsmi_status_t amdsmi_get_gpu_memory_usage(amdsmi_processor_handle processor_handle, amdsmi_memory_type_t mem_type, uint64_t *used);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `mem_type`: Type of memory (VRAM, VIS_VRAM, GTT)
- `total`: Pointer to receive total memory amount in bytes
- `used`: Pointer to receive used memory amount in bytes

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure
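
**Usage Example:**

```c
uint64_t total_vram = 0, used_vram = 0;
amdsmi_status_t ret;

// Get VRAM information
ret = amdsmi_get_gpu_memory_total(processor, AMDSMI_MEM_TYPE_VRAM, &total_vram);
if (ret == AMDSMI_STATUS_SUCCESS) {
    ret = amdsmi_get_gpu_memory_usage(processor, AMDSMI_MEM_TYPE_VRAM, &used_vram);
    // Skip the percentage when total is zero to avoid dividing by zero
    if (ret == AMDSMI_STATUS_SUCCESS && total_vram > 0) {
        // Cast to unsigned long long so the arguments match %llu on every platform
        printf("VRAM: %llu MB used / %llu MB total (%.1f%% usage)\n",
               (unsigned long long)(used_vram / (1024 * 1024)),
               (unsigned long long)(total_vram / (1024 * 1024)),
               (double)used_vram / (double)total_vram * 100.0);
    }
}

// Get GTT memory information, checking each call before using its result
uint64_t total_gtt = 0, used_gtt = 0;
ret = amdsmi_get_gpu_memory_total(processor, AMDSMI_MEM_TYPE_GTT, &total_gtt);
if (ret == AMDSMI_STATUS_SUCCESS) {
    ret = amdsmi_get_gpu_memory_usage(processor, AMDSMI_MEM_TYPE_GTT, &used_gtt);
}
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("GTT: %llu MB used / %llu MB total\n",
           (unsigned long long)(used_gtt / (1024 * 1024)),
           (unsigned long long)(total_gtt / (1024 * 1024)));
}
```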

### VRAM Usage Information

Get comprehensive VRAM usage information in a structured format.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_vram_usage(amdsmi_processor_handle processor_handle, amdsmi_vram_info_t *info);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `info`: Pointer to receive VRAM usage information

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
amdsmi_vram_info_t vram_info;
amdsmi_status_t ret = amdsmi_get_gpu_vram_usage(processor, &vram_info);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("VRAM Total: %u MB\n", vram_info.vram_total);
    printf("VRAM Used: %u MB\n", vram_info.vram_used);
    printf("VRAM Free: %u MB\n", vram_info.vram_total - vram_info.vram_used);
}
```

### Bad/Retired Page Information

Get information about bad or retired memory pages that are no longer usable.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_bad_page_info(amdsmi_processor_handle processor_handle, uint32_t *num_pages, amdsmi_retired_page_record_t *info);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `num_pages`: As input, maximum number of page records. As output, actual number available or written.
- `info`: Pointer to array of retired page records, or NULL to query count only

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
// First get the count of bad pages
uint32_t num_bad_pages = 0;
amdsmi_status_t ret = amdsmi_get_gpu_bad_page_info(processor, &num_bad_pages, NULL);
if (ret == AMDSMI_STATUS_SUCCESS && num_bad_pages > 0) {
    // Allocate memory for the page records (malloc/free need <stdlib.h>)
    amdsmi_retired_page_record_t *bad_pages =
        malloc(num_bad_pages * sizeof(amdsmi_retired_page_record_t));

    // Only query the records if the allocation succeeded
    if (bad_pages != NULL) {
        ret = amdsmi_get_gpu_bad_page_info(processor, &num_bad_pages, bad_pages);
        if (ret == AMDSMI_STATUS_SUCCESS) {
            printf("Found %u bad memory pages:\n", num_bad_pages);
            for (uint32_t i = 0; i < num_bad_pages; i++) {
                // Casts keep the arguments consistent with %llx, %llu, and %d
                printf("  Page %u: Address 0x%llx, Size %llu bytes, Status: %d\n",
                       i, (unsigned long long)bad_pages[i].page_address,
                       (unsigned long long)bad_pages[i].page_size,
                       (int)bad_pages[i].status);
            }
        }
        free(bad_pages);
    }
}
```

### Reserved Memory Pages

Get information about reserved (retired) memory pages for the given GPU. The calling convention matches `amdsmi_get_gpu_bad_page_info()`: pass NULL for `records` to query the count, then call again with an array of that size.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_memory_reserved_pages(amdsmi_processor_handle processor_handle, uint32_t *num_pages, amdsmi_retired_page_record_t *records);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `num_pages`: As input, maximum number of page records. As output, actual number available or written.
- `records`: Pointer to array of retired page records, or NULL to query count only

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure
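
Once the records have been retrieved, a caller will usually want a summary rather than the raw list. The sketch below is one possible way to do that in Python; it works on plain dictionaries with the keys `page_address`, `page_size`, and `status`, mirroring the fields of `amdsmi_retired_page_record_t`. The helper name `summarize_retired_pages`, the sample records, and the use of strings for the status values are all assumptions made for illustration, not part of the AMD SMI API.

```python
from collections import Counter


def summarize_retired_pages(records):
    """Return (pages per status, total retired bytes) for a list of page records.

    Each record is assumed to be a dict with 'page_address', 'page_size',
    and 'status' keys (hypothetical shape, mirroring the C struct fields).
    """
    counts = Counter(rec["status"] for rec in records)
    total_bytes = sum(rec["page_size"] for rec in records)
    return dict(counts), total_bytes


# Hypothetical sample data; real records would come from the library.
sample_records = [
    {"page_address": 0x1000, "page_size": 4096, "status": "RESERVED"},
    {"page_address": 0x2000, "page_size": 4096, "status": "RESERVED"},
    {"page_address": 0x9000, "page_size": 4096, "status": "PENDING"},
]

counts, total_bytes = summarize_retired_pages(sample_records)
print(f"Pages by status: {counts}, retired bytes: {total_bytes}")
```

Grouping by status makes it easy to alert separately on pages that are already reserved and pages that are still pending.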

### RAS Block Features

Check if RAS (Reliability, Availability, Serviceability) features are enabled for specific GPU blocks.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_ras_block_features_enabled(amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block, amdsmi_ras_err_state_t *state);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `block`: GPU block to query (UMC, SDMA, GFX, etc.)
- `state`: Pointer to receive RAS error state

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure
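
The returned state covers both configuration (enabled or disabled) and error conditions, so callers typically need to sort it into a few buckets. The following sketch shows one way that might look. `RasErrState` is a hypothetical Python mirror of the `amdsmi_ras_err_state_t` enum listed under Types, numbered in declaration order, and `classify_ras_state` is an illustrative helper; neither is claimed to exist in the AMD SMI Python package.

```python
from enum import IntEnum


class RasErrState(IntEnum):
    """Hypothetical mirror of amdsmi_ras_err_state_t, in declaration order."""
    NONE = 0
    DISABLED = 1
    PARITY = 2
    SING_C = 3
    MULT_UC = 4
    POISON = 5
    ENABLED = 6


# States that the enum comments describe as an error being present.
ERROR_STATES = {
    RasErrState.PARITY,
    RasErrState.SING_C,
    RasErrState.MULT_UC,
    RasErrState.POISON,
}


def classify_ras_state(raw_value):
    """Map a raw state value to 'disabled', 'error', or 'ok'."""
    state = RasErrState(raw_value)  # raises ValueError for unknown values
    if state is RasErrState.DISABLED:
        return "disabled"
    if state in ERROR_STATES:
        return "error"
    return "ok"


for value in (0, 1, 4, 6):
    print(RasErrState(value).name, "->", classify_ras_state(value))
```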

## Python API

### Memory Information

```python { .api }
def amdsmi_get_gpu_memory_total(processor_handle, mem_type):
    """
    Get total memory for a specific memory type.

    Args:
        processor_handle: GPU processor handle
        mem_type (AmdSmiMemoryType): Memory type to query

    Returns:
        int: Total memory in bytes

    Raises:
        AmdSmiException: If memory query fails
    """

def amdsmi_get_gpu_memory_usage(processor_handle, mem_type):
    """
    Get used memory for a specific memory type.

    Args:
        processor_handle: GPU processor handle
        mem_type (AmdSmiMemoryType): Memory type to query

    Returns:
        int: Used memory in bytes

    Raises:
        AmdSmiException: If memory query fails
    """
```

### VRAM Usage

```python { .api }
def amdsmi_get_gpu_vram_usage(processor_handle):
    """
    Get VRAM usage information.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: VRAM info with keys 'vram_total', 'vram_used' (in MB)

    Raises:
        AmdSmiException: If VRAM query fails
    """
```

### Bad Page Information

```python { .api }
def amdsmi_get_gpu_bad_page_info(processor_handle):
    """
    Get information about bad/retired memory pages.

    Args:
        processor_handle: GPU processor handle

    Returns:
        list: List of bad page records, each with keys 'page_address',
              'page_size', 'status'

    Raises:
        AmdSmiException: If bad page query fails
    """
```

**Python Usage Example:**

```python
import amdsmi
from amdsmi import AmdSmiMemoryType

# Initialize and get GPU handle
amdsmi.amdsmi_init()

try:
    # The Python binding takes no arguments here and returns all processor handles
    processors = amdsmi.amdsmi_get_processor_handles()
    gpu = processors[0]

    # Get VRAM information using structured interface
    vram_info = amdsmi.amdsmi_get_gpu_vram_usage(gpu)
    print(f"VRAM: {vram_info['vram_used']} MB / {vram_info['vram_total']} MB")
    # Guard against a zero total before computing the percentage
    if vram_info['vram_total'] > 0:
        usage_percent = (vram_info['vram_used'] / vram_info['vram_total']) * 100
        print(f"VRAM Usage: {usage_percent:.1f}%")

    # Get memory information by type
    vram_total = amdsmi.amdsmi_get_gpu_memory_total(gpu, AmdSmiMemoryType.AMDSMI_MEM_TYPE_VRAM)
    vram_used = amdsmi.amdsmi_get_gpu_memory_usage(gpu, AmdSmiMemoryType.AMDSMI_MEM_TYPE_VRAM)
    print(f"VRAM (detailed): {vram_used // (1024*1024)} MB / {vram_total // (1024*1024)} MB")

    # Get GTT memory information
    gtt_total = amdsmi.amdsmi_get_gpu_memory_total(gpu, AmdSmiMemoryType.AMDSMI_MEM_TYPE_GTT)
    gtt_used = amdsmi.amdsmi_get_gpu_memory_usage(gpu, AmdSmiMemoryType.AMDSMI_MEM_TYPE_GTT)
    print(f"GTT Memory: {gtt_used // (1024*1024)} MB / {gtt_total // (1024*1024)} MB")

    # Check for bad pages
    bad_pages = amdsmi.amdsmi_get_gpu_bad_page_info(gpu)
    if bad_pages:
        print(f"Found {len(bad_pages)} bad memory pages:")
        for i, page in enumerate(bad_pages):
            print(f"  Page {i}: Address 0x{page['page_address']:x}, "
                  f"Size {page['page_size']} bytes")
    else:
        print("No bad memory pages found")

finally:
    # Always release the library, even if a query raised AmdSmiException
    amdsmi.amdsmi_shut_down()
```

## Types

### Memory Types

```c { .api }
typedef enum {
    AMDSMI_MEM_TYPE_VRAM,      // VRAM memory (device local)
    AMDSMI_MEM_TYPE_VIS_VRAM,  // Visible VRAM memory (CPU accessible)
    AMDSMI_MEM_TYPE_GTT        // GTT (Graphics Translation Table) memory
} amdsmi_memory_type_t;
```

### VRAM Information Structure

```c { .api }
typedef struct {
    uint32_t vram_total;   // Total VRAM in MB
    uint32_t vram_used;    // Used VRAM in MB
    uint32_t reserved[2];  // Reserved for future use
} amdsmi_vram_info_t;
```

### Retired Page Record

```c { .api }
typedef struct {
    uint64_t page_address;               // Start address of the page
    uint64_t page_size;                  // Size of the page in bytes
    amdsmi_memory_page_status_t status;  // Page status (reserved, pending, etc.)
} amdsmi_retired_page_record_t;
```

### Memory Page Status

```c { .api }
typedef enum {
    AMDSMI_MEM_PAGE_STATUS_RESERVED,     // Page is reserved and not available
    AMDSMI_MEM_PAGE_STATUS_PENDING,      // Page is marked bad, will be reserved
    AMDSMI_MEM_PAGE_STATUS_UNRESERVABLE  // Unable to reserve this page
} amdsmi_memory_page_status_t;
```

### GPU Blocks (for RAS)

```c { .api }
typedef enum {
    AMDSMI_GPU_BLOCK_UMC       = 0x0000000000000001,  // UMC (Unified Memory Controller)
    AMDSMI_GPU_BLOCK_SDMA      = 0x0000000000000002,  // SDMA (System DMA)
    AMDSMI_GPU_BLOCK_GFX       = 0x0000000000000004,  // GFX (Graphics)
    AMDSMI_GPU_BLOCK_MMHUB     = 0x0000000000000008,  // MMHUB (Multimedia Hub)
    AMDSMI_GPU_BLOCK_ATHUB     = 0x0000000000000010,  // ATHUB (Address Translation Hub)
    AMDSMI_GPU_BLOCK_PCIE_BIF  = 0x0000000000000020,  // PCIe BIF
    AMDSMI_GPU_BLOCK_HDP       = 0x0000000000000040,  // HDP (Host Data Path)
    AMDSMI_GPU_BLOCK_XGMI_WAFL = 0x0000000000000080,  // XGMI
    AMDSMI_GPU_BLOCK_DF        = 0x0000000000000100,  // Data Fabric
    AMDSMI_GPU_BLOCK_SMN       = 0x0000000000000200,  // System Management Network
    AMDSMI_GPU_BLOCK_SEM       = 0x0000000000000400,  // SEM
    AMDSMI_GPU_BLOCK_MP0       = 0x0000000000000800,  // MP0 (Microprocessor 0)
    AMDSMI_GPU_BLOCK_MP1       = 0x0000000000001000,  // MP1 (Microprocessor 1)
    AMDSMI_GPU_BLOCK_FUSE      = 0x0000000000002000   // Fuse
} amdsmi_gpu_block_t;
```
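
Each enumerator above occupies a distinct bit, so a set of blocks can be represented as a single integer mask. The sketch below decodes such a mask back into block names. The `GPU_BLOCKS` table copies the values listed above; `decode_block_mask` is a hypothetical helper, and whether any AMD SMI call accepts or returns a combined mask is not stated in this document, so treat this purely as an illustration of the bit layout.

```python
# Bit values copied from the amdsmi_gpu_block_t listing above.
GPU_BLOCKS = {
    "UMC": 0x0001,
    "SDMA": 0x0002,
    "GFX": 0x0004,
    "MMHUB": 0x0008,
    "ATHUB": 0x0010,
    "PCIE_BIF": 0x0020,
    "HDP": 0x0040,
    "XGMI_WAFL": 0x0080,
    "DF": 0x0100,
    "SMN": 0x0200,
    "SEM": 0x0400,
    "MP0": 0x0800,
    "MP1": 0x1000,
    "FUSE": 0x2000,
}


def decode_block_mask(mask):
    """Return the names of all blocks whose bit is set in mask (hypothetical helper)."""
    return [name for name, bit in GPU_BLOCKS.items() if mask & bit]


# UMC (0x1) combined with GFX (0x4)
print(decode_block_mask(0x0001 | 0x0004))
```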

### RAS Error States

```c { .api }
typedef enum {
    AMDSMI_RAS_ERR_STATE_NONE = 0,  // No current errors
    AMDSMI_RAS_ERR_STATE_DISABLED,  // ECC/RAS is disabled
    AMDSMI_RAS_ERR_STATE_PARITY,    // ECC errors present, type unknown
    AMDSMI_RAS_ERR_STATE_SING_C,    // Single correctable error
    AMDSMI_RAS_ERR_STATE_MULT_UC,   // Multiple uncorrectable errors
    AMDSMI_RAS_ERR_STATE_POISON,    // Firmware detected error, page isolated
    AMDSMI_RAS_ERR_STATE_ENABLED    // ECC/RAS is enabled
} amdsmi_ras_err_state_t;
```

## Memory Management Workflow

A typical memory monitoring workflow includes:

1. **Query Total Memory**: Use `amdsmi_get_gpu_memory_total()` to get total memory for each type
2. **Monitor Usage**: Use `amdsmi_get_gpu_memory_usage()` to track current memory consumption
3. **Check VRAM Status**: Use `amdsmi_get_gpu_vram_usage()` for structured VRAM information
4. **Monitor Health**: Check for bad pages with `amdsmi_get_gpu_bad_page_info()`
5. **Verify RAS**: Check RAS feature status for critical blocks with `amdsmi_get_gpu_ras_block_features_enabled()`
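
The first four steps can be sketched as a single report-building function. To keep the example runnable without a GPU, the query functions are passed in as arguments; in real use they would be `amdsmi.amdsmi_get_gpu_memory_total`, `amdsmi.amdsmi_get_gpu_memory_usage`, and `amdsmi.amdsmi_get_gpu_bad_page_info`. The name `collect_memory_report`, the report layout, and the fake data below are assumptions for illustration only.

```python
def collect_memory_report(gpu, get_total, get_usage, get_bad_pages, mem_types):
    """Build a per-type usage report plus a bad-page count (hypothetical helper)."""
    report = {"memory": {}, "bad_page_count": 0}
    for mem_type in mem_types:
        total = get_total(gpu, mem_type)   # step 1: total in bytes
        used = get_usage(gpu, mem_type)    # step 2: usage in bytes
        percent = (used / total * 100.0) if total else 0.0
        report["memory"][mem_type] = {"total": total, "used": used, "percent": percent}
    report["bad_page_count"] = len(get_bad_pages(gpu))  # step 4: health check
    return report


# Fake backends standing in for the library calls (illustrative values only).
fake_totals = {"VRAM": 8 * 1024**3, "GTT": 4 * 1024**3}
fake_used = {"VRAM": 2 * 1024**3, "GTT": 1 * 1024**3}

report = collect_memory_report(
    gpu=None,
    get_total=lambda gpu, mem_type: fake_totals[mem_type],
    get_usage=lambda gpu, mem_type: fake_used[mem_type],
    get_bad_pages=lambda gpu: [],
    mem_types=["VRAM", "GTT"],
)
for mem_type, stats in report["memory"].items():
    print(f"{mem_type}: {stats['percent']:.1f}% used")
print(f"Bad pages: {report['bad_page_count']}")
```

Passing the query functions as parameters also makes the report logic easy to unit test on machines without AMD hardware.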

## Important Notes

1. **Memory Units**:
   - `amdsmi_get_gpu_memory_*()` functions return values in bytes
   - `amdsmi_get_gpu_vram_usage()` returns values in megabytes

2. **Memory Types**:
   - **VRAM**: GPU's local high-speed memory
   - **VIS_VRAM**: CPU-accessible portion of VRAM (typically smaller)
   - **GTT**: System memory mapped for GPU access

3. **Bad Pages**: Bad pages indicate hardware problems and should be monitored in production systems

4. **RAS Features**: RAS reliability features may not be available on all GPU models

5. **Virtual Machine Limitations**: Some memory management functions may have limited functionality in virtualized environments

6. **Memory Accounting**: Track each memory type separately rather than summing them. VIS_VRAM is a portion of VRAM, and GTT is system memory, so the types serve different purposes and have different performance characteristics
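
Because the two families of functions report in different units, it helps to normalize values before comparing them. The sketch below shows one way to do that. `normalize_to_mb` is a hypothetical helper, and it assumes 1 MB means 1024 * 1024 bytes, which matches the division used in the usage examples in this document.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: MB is treated as 2**20 bytes


def normalize_to_mb(value, unit):
    """Convert a value reported in 'bytes' or 'mb' to whole megabytes."""
    if unit == "bytes":
        return value // BYTES_PER_MB
    if unit == "mb":
        return value
    raise ValueError(f"unknown unit: {unit!r}")


# A byte count as returned by amdsmi_get_gpu_memory_total() (illustrative value)
total_from_bytes = normalize_to_mb(8 * 1024**3, "bytes")
# A megabyte count as returned by amdsmi_get_gpu_vram_usage() (illustrative value)
total_from_mb = normalize_to_mb(8192, "mb")
print(total_from_bytes, total_from_mb)
```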