# AMD SMI Library

The AMD System Management Interface (AMD SMI) Library is a C/C++ library with Python bindings. It lets user-space applications monitor and control AMD GPUs on Linux. The library represents hardware as sockets and processors, accessed through opaque handles. It reports device information such as temperature, power consumption, and performance metrics, and it provides device-management controls.

## Package Information

- **Package Name**: amdsmi
- **Language**: C/C++ (primary), Python (bindings)
- **Installation**: Available as part of a ROCm installation or built from source
- **Supported Platforms**: Linux bare metal and Linux virtual machine guests, for AMD GPUs

## Core Imports

### C/C++

```c
#include "amd_smi/amdsmi.h"
```

### Python

```python
import amdsmi
```

To import individual functions:

```python
from amdsmi import (
    amdsmi_init, amdsmi_shut_down,
    amdsmi_get_socket_handles, amdsmi_get_processor_handles,
    amdsmi_get_gpu_activity, amdsmi_get_power_info,
)
```

## Basic Usage

### C/C++ Example

```cpp
#include <iostream>
#include <vector>
#include "amd_smi/amdsmi.h"

int main() {
    // Initialize AMD SMI for AMD GPUs only
    amdsmi_status_t ret = amdsmi_init(AMDSMI_INIT_AMD_GPUS);
    if (ret != AMDSMI_STATUS_SUCCESS) {
        std::cerr << "amdsmi_init failed with status " << ret << std::endl;
        return 1;
    }

    // Query the socket count with a null buffer, then fetch the handles
    uint32_t socket_count = 0;
    ret = amdsmi_get_socket_handles(&socket_count, nullptr);
    if (ret != AMDSMI_STATUS_SUCCESS || socket_count == 0) {
        amdsmi_shut_down();
        return 1;
    }
    std::vector<amdsmi_socket_handle> sockets(socket_count);
    ret = amdsmi_get_socket_handles(&socket_count, sockets.data());
    if (ret != AMDSMI_STATUS_SUCCESS) {
        amdsmi_shut_down();
        return 1;
    }

    // Same two-call pattern for the processors in the first socket
    uint32_t device_count = 0;
    ret = amdsmi_get_processor_handles(sockets[0], &device_count, nullptr);
    if (ret != AMDSMI_STATUS_SUCCESS || device_count == 0) {
        amdsmi_shut_down();
        return 1;
    }
    std::vector<amdsmi_processor_handle> devices(device_count);
    ret = amdsmi_get_processor_handles(sockets[0], &device_count, devices.data());
    if (ret != AMDSMI_STATUS_SUCCESS) {
        amdsmi_shut_down();
        return 1;
    }

    // Read the current edge temperature of the first device
    int64_t temperature = 0;
    ret = amdsmi_get_temp_metric(devices[0], TEMPERATURE_TYPE_EDGE,
                                 AMDSMI_TEMP_CURRENT, &temperature);
    if (ret == AMDSMI_STATUS_SUCCESS) {
        std::cout << "GPU Temperature: " << temperature << "C" << std::endl;
    }

    // Release library resources
    amdsmi_shut_down();
    return 0;
}
```

### Python Example

```python
import amdsmi

# Initialize the library (AMD GPUs are discovered by default)
amdsmi.amdsmi_init()

try:
    # The Python binding walks every socket itself and returns all GPU handles
    processors = amdsmi.amdsmi_get_processor_handles()

    if processors:
        # Engine activity is returned as a dict of utilization percentages
        activity = amdsmi.amdsmi_get_gpu_activity(processors[0])
        print(f"GFX Activity: {activity['gfx_activity']}%")

        # Power information is also a dict; average_socket_power is in watts
        power_info = amdsmi.amdsmi_get_power_info(processors[0])
        print(f"Socket Power: {power_info['average_socket_power']}W")
except amdsmi.AmdSmiException as e:
    print(f"AMD SMI error: {e}")
finally:
    # Always shut down the library
    amdsmi.amdsmi_shut_down()
```

## Architecture

The AMD SMI Library uses a hierarchical device representation:

- **Sockets**: Physical hardware sockets, each of which can contain multiple processors
- **Processors**: Individual processing units (GPUs, CPUs) within a socket
- **Handles**: Opaque references to sockets and processors, used throughout the API
- **Initialization Flags**: Select which processor types are discovered and monitored

This design gives mixed-processor systems a single interface. The same handle-based calls apply whether a socket holds GPUs, CPUs, or both, and the socket-to-processor nesting mirrors the physical topology.
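
The socket-then-processor walk that this hierarchy implies can be sketched in Python without any hardware. In this minimal sketch, `build_topology` is a hypothetical helper. Its two callables stand in for the socket query and the per-socket processor query, which are `amdsmi_get_socket_handles` and `amdsmi_get_processor_handles` in the C API. The fake handles in the example are plain strings.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def build_topology(
    list_sockets: Callable[[], Sequence[Hashable]],
    list_processors: Callable[[Hashable], Sequence[object]],
) -> Dict[Hashable, List[object]]:
    """Map each socket handle to the processor handles it contains.

    build_topology is a hypothetical helper; list_sockets and list_processors
    are stand-ins for the library's socket and per-socket processor queries.
    """
    topology: Dict[Hashable, List[object]] = {}
    for sock in list_sockets():
        # Copy into a list so callers get a stable snapshot per socket
        topology[sock] = list(list_processors(sock))
    return topology


# Example with fake handles: two sockets, one GPU each
fake = build_topology(lambda: ["socket0", "socket1"], lambda s: [f"{s}/gpu0"])
print(fake)  # {'socket0': ['socket0/gpu0'], 'socket1': ['socket1/gpu0']}
```

In C, each level of this walk uses the two-call pattern from the basic example. The first call passes a null buffer to obtain the count, and the second call fills the handle array.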

## Capabilities

### Library Management

These functions cover library initialization, shutdown, and version queries. `amdsmi_init()` must be called, and must succeed, before other AMD SMI functions are used. When the application is finished, `amdsmi_shut_down()` releases the library's internal resources.

```c { .api }
amdsmi_status_t amdsmi_init(uint64_t init_flags);
amdsmi_status_t amdsmi_shut_down(void);
amdsmi_status_t amdsmi_get_lib_version(amdsmi_version_t *version);
```

[Library Management](./library-management.md)
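
Every successful `amdsmi_init()` should be paired with `amdsmi_shut_down()`, so Python code can wrap the pair in a context manager. This is a minimal sketch, not part of the bindings. `smi_session` is a hypothetical helper that takes the init and shutdown callables as parameters, which lets it run without hardware. With the real bindings you would pass `amdsmi.amdsmi_init` and `amdsmi.amdsmi_shut_down`.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def smi_session(
    init: Callable[[], object], shut_down: Callable[[], object]
) -> Iterator[None]:
    """Initialize the library on entry and always shut it down on exit.

    smi_session is a hypothetical helper; init and shut_down are expected to
    be amdsmi.amdsmi_init and amdsmi.amdsmi_shut_down in real use.
    """
    init()  # if initialization raises, there is nothing to shut down
    try:
        yield
    finally:
        # Runs both on normal exit and when the body raises
        shut_down()
```

For example, you can open a `with smi_session(amdsmi.amdsmi_init, amdsmi.amdsmi_shut_down):` block and place the queries inside it. Shutdown then happens in one place, even when a query raises `amdsmi.AmdSmiException`.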

### Device Discovery

Functions for discovering and identifying AMD processors, sockets, and their properties in the system.

```c { .api }
amdsmi_status_t amdsmi_get_socket_handles(uint32_t *socket_count, amdsmi_socket_handle *socket_handles);
amdsmi_status_t amdsmi_get_processor_handles(amdsmi_socket_handle socket_handle, uint32_t *processor_count, amdsmi_processor_handle *processor_handles);
amdsmi_status_t amdsmi_get_processor_type(amdsmi_processor_handle processor_handle, processor_type_t *processor_type);
amdsmi_status_t amdsmi_get_gpu_device_uuid(amdsmi_processor_handle processor_handle, unsigned int *uuid_length, char *uuid);
```

[Device Discovery](./device-discovery.md)
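
Processor handles are opaque, so they should not be treated as persistent identifiers. Tools that need a stable key usually index devices by UUID instead. This is a minimal sketch that assumes a `get_uuid` callable returning a UUID string for a handle. In the C API, the UUID comes from `amdsmi_get_gpu_device_uuid`. `index_by_uuid` is a hypothetical helper.

```python
from typing import Callable, Dict, Iterable, TypeVar

H = TypeVar("H")


def index_by_uuid(handles: Iterable[H], get_uuid: Callable[[H], str]) -> Dict[str, H]:
    """Return {uuid: handle}, refusing to silently merge duplicate UUIDs.

    index_by_uuid is a hypothetical helper; get_uuid is assumed to wrap the
    library's UUID query and return a string.
    """
    index: Dict[str, H] = {}
    for handle in handles:
        uuid = get_uuid(handle).strip().lower()  # normalize for comparisons
        if uuid in index:
            raise ValueError(f"duplicate UUID {uuid!r}")
        index[uuid] = handle
    return index
```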

### Hardware Information

Static hardware information including ASIC details, board information, firmware versions, and driver information.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_asic_info(amdsmi_processor_handle processor_handle, amdsmi_asic_info_t *info);
amdsmi_status_t amdsmi_get_gpu_board_info(amdsmi_processor_handle processor_handle, amdsmi_board_info_t *info);
amdsmi_status_t amdsmi_get_fw_info(amdsmi_processor_handle processor_handle, amdsmi_fw_info_t *info);
amdsmi_status_t amdsmi_get_gpu_driver_version(amdsmi_processor_handle processor_handle, int *length, char *version);
```

[Hardware Information](./hardware-information.md)
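
Not every query is supported on every device or platform; the status codes include `AMDSMI_STATUS_NOT_SUPPORTED`. A hardware report is therefore easier to build if each field is fetched independently. In this minimal sketch, `try_query` is a hypothetical helper. With the Python bindings you would pass the binding's exception type, for example `amdsmi.AmdSmiException`, as `errors`.

```python
from typing import Any, Callable, Tuple, Type


def try_query(
    query: Callable[..., Any],
    *args: Any,
    errors: Tuple[Type[BaseException], ...] = (Exception,),
    default: Any = "N/A",
) -> Any:
    """Call query(*args); return default instead of raising on listed errors.

    try_query is a hypothetical helper; in real use, errors would be the
    binding's exception type so that unrelated bugs still surface.
    """
    try:
        return query(*args)
    except errors:
        return default
```

A report can then combine fields such as `try_query(amdsmi.amdsmi_get_gpu_asic_info, handle, errors=(amdsmi.AmdSmiException,))` and `try_query(amdsmi.amdsmi_get_gpu_board_info, handle, errors=(amdsmi.AmdSmiException,))`. An unsupported field then shows as `"N/A"` instead of aborting the whole report.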

### Performance Monitoring

Real-time monitoring of GPU performance metrics including activity levels, clock frequencies, power consumption, and temperature measurements.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_activity(amdsmi_processor_handle processor_handle, amdsmi_engine_usage_t *info);
amdsmi_status_t amdsmi_get_power_info(amdsmi_processor_handle processor_handle, amdsmi_power_info_t *info);
amdsmi_status_t amdsmi_get_temp_metric(amdsmi_processor_handle processor_handle, amdsmi_temperature_type_t sensor_type, amdsmi_temperature_metric_t metric, int64_t *temperature);
amdsmi_status_t amdsmi_get_clk_freq(amdsmi_processor_handle processor_handle, amdsmi_clk_type_t clk_type, amdsmi_frequencies_t *f);
```

[Performance Monitoring](./performance-monitoring.md)
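
`amdsmi_get_clk_freq` fills an `amdsmi_frequencies_t` that lists the supported clock levels along with the index of the active level. This minimal sketch turns that structure into a single reading. It assumes the structure has been converted to a dict with `num_supported`, `current`, and `frequency` (in Hz) entries. The names follow the C structure's fields, but treat the exact shape as an assumption. `current_clock_mhz` is a hypothetical helper.

```python
from typing import Any, Mapping


def current_clock_mhz(freqs: Mapping[str, Any]) -> float:
    """Return the active clock level in MHz.

    Hypothetical helper. Assumes a dict mirroring amdsmi_frequencies_t:
    'num_supported' levels, the 'current' level index, and a 'frequency'
    list in Hz.
    """
    # Only the first num_supported entries of the fixed-size array are valid
    levels = list(freqs["frequency"])[: freqs["num_supported"]]
    index = freqs["current"]
    if not 0 <= index < len(levels):
        raise ValueError(f"current index {index} is outside {len(levels)} levels")
    return levels[index] / 1_000_000
```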
170
171
### Memory Management
172
173
Memory information including total memory, usage statistics, VRAM details, and memory error management.
174
175
```c { .api }
176
amdsmi_status_t amdsmi_get_gpu_memory_total(amdsmi_processor_handle processor_handle, amdsmi_memory_type_t mem_type, uint64_t *total);
177
amdsmi_status_t amdsmi_get_gpu_memory_usage(amdsmi_processor_handle processor_handle, amdsmi_memory_type_t mem_type, uint64_t *used);
178
amdsmi_status_t amdsmi_get_gpu_vram_usage(amdsmi_processor_handle processor_handle, amdsmi_vram_info_t *info);
179
amdsmi_status_t amdsmi_get_gpu_bad_page_info(amdsmi_processor_handle processor_handle, uint32_t *num_pages, amdsmi_retired_page_record_t *info);
180
```
181
182
[Memory Management](./memory-management.md)
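
`amdsmi_get_gpu_memory_total` and `amdsmi_get_gpu_memory_usage` report byte counts for a given memory type, so utilization is a simple ratio. In this minimal sketch, `memory_utilization` is a hypothetical helper. In practice, its arguments would be the used and total values for the same memory type, for example VRAM.

```python
def memory_utilization(used_bytes: int, total_bytes: int) -> float:
    """Return used memory as a percentage of total memory.

    Hypothetical helper; both arguments are assumed to be byte counts for the
    same memory type.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if used_bytes < 0:
        raise ValueError("used_bytes must not be negative")
    return 100.0 * used_bytes / total_bytes


GIB = 1024 ** 3
print(f"{memory_utilization(4 * GIB, 16 * GIB):.1f}% of VRAM in use")  # 25.0% of VRAM in use
```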

### PCIe and Connectivity

PCIe interface monitoring, bandwidth management, topology discovery, and multi-GPU connectivity features.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_pci_bandwidth(amdsmi_processor_handle processor_handle, amdsmi_pcie_bandwidth_t *bandwidth);
amdsmi_status_t amdsmi_get_gpu_pci_throughput(amdsmi_processor_handle processor_handle, uint64_t *sent, uint64_t *received, uint64_t *max_pkt_sz);
amdsmi_status_t amdsmi_get_gpu_topo_numa_affinity(amdsmi_processor_handle processor_handle, uint32_t *numa_node);
amdsmi_status_t amdsmi_topo_get_link_type(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *hops, AMDSMI_IO_LINK_TYPE *type);
```

[PCIe and Connectivity](./pcie-connectivity.md)
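
`amdsmi_get_gpu_pci_throughput` returns sent and received counters plus a maximum packet size. This minimal sketch assumes those values follow the amdgpu driver's `pcie_bw` interface: packet counts over roughly the last second, with the maximum payload size in bytes. Under that assumption, multiplying the packet total by the maximum size gives an upper-bound estimate of PCIe traffic. `estimate_pcie_mib_per_s` is a hypothetical helper.

```python
def estimate_pcie_mib_per_s(sent: int, received: int, max_pkt_sz: int) -> float:
    """Upper-bound PCIe traffic estimate in MiB/s.

    Hypothetical helper. Assumes sent/received are packet counts over about
    one second and max_pkt_sz is the maximum payload size in bytes, so every
    packet is counted as if it were full-sized.
    """
    if min(sent, received, max_pkt_sz) < 0:
        raise ValueError("counters must be non-negative")
    return (sent + received) * max_pkt_sz / (1024 * 1024)


print(estimate_pcie_mib_per_s(1024, 1024, 256))  # 0.5
```

Real packets are often smaller than the maximum payload size, so treat the result as a ceiling rather than a measurement.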
196
197
### Performance Control
198
199
Advanced performance tuning including clock control, power management, fan control, and overclocking capabilities. Note: Many control functions require root privileges and are not supported in virtual environments.
200
201
```c { .api }
202
amdsmi_status_t amdsmi_set_gpu_perf_level(amdsmi_processor_handle processor_handle, amdsmi_dev_perf_level_t perf_lvl);
203
amdsmi_status_t amdsmi_set_power_cap(amdsmi_processor_handle processor_handle, uint32_t sensor_ind, uint64_t cap);
204
amdsmi_status_t amdsmi_set_gpu_fan_speed(amdsmi_processor_handle processor_handle, uint32_t sensor_ind, uint64_t speed);
205
amdsmi_status_t amdsmi_set_clk_freq(amdsmi_processor_handle processor_handle, amdsmi_clk_type_t clk_type, uint64_t freq_bitmask);
206
```
207
208
[Performance Control](./performance-control.md)
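
The `freq_bitmask` argument of `amdsmi_set_clk_freq` selects clock levels by index. This minimal sketch assumes that bit *i* enables level *i* of the list that `amdsmi_get_clk_freq` reports, and that only the lowest `num_supported` bits are meaningful. `levels_to_bitmask` is a hypothetical helper.

```python
from typing import Iterable


def levels_to_bitmask(levels: Iterable[int], num_supported: int) -> int:
    """Build a clock-level bitmask in which bit i enables level i.

    Hypothetical helper; the bit-per-level layout is an assumption to check
    against the amdsmi.h documentation for your version.
    """
    mask = 0
    for level in levels:
        if not 0 <= level < num_supported:
            raise ValueError(f"level {level} outside 0..{num_supported - 1}")
        mask |= 1 << level
    return mask


print(bin(levels_to_bitmask([0, 2], num_supported=3)))  # 0b101
```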

### Error Handling and RAS

Error detection, RAS (Reliability, Availability, Serviceability) features, ECC error monitoring, and conversion of status codes into readable strings.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_ecc_count(amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block, amdsmi_error_count_t *ec);
amdsmi_status_t amdsmi_get_gpu_ecc_enabled(amdsmi_processor_handle processor_handle, uint64_t *enabled_blocks);
amdsmi_status_t amdsmi_status_code_to_string(amdsmi_status_t status, const char **status_string);
amdsmi_status_t amdsmi_get_gpu_ras_block_features_enabled(amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block, amdsmi_ras_err_state_t *state);
```

[Error Handling and RAS](./error-handling-ras.md)
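
`amdsmi_get_gpu_ecc_count` reports an `amdsmi_error_count_t` for each GPU block. This minimal sketch totals those counts across blocks and flags any block with uncorrectable errors. It assumes each block's result has been converted to a dict with `correctable_count` and `uncorrectable_count` entries. Those are the C structure's field names, but treat the exact shape as an assumption. `summarize_ecc` and the block names in the example are hypothetical.

```python
from typing import Dict, Mapping


def summarize_ecc(per_block: Mapping[str, Mapping[str, int]]) -> Dict[str, object]:
    """Total ECC counts over all blocks and list blocks with uncorrectable errors.

    Hypothetical helper; each value is assumed to be a dict with
    'correctable_count' and 'uncorrectable_count' keys.
    """
    correctable = sum(c["correctable_count"] for c in per_block.values())
    uncorrectable = sum(c["uncorrectable_count"] for c in per_block.values())
    failing = sorted(b for b, c in per_block.items() if c["uncorrectable_count"] > 0)
    return {
        "correctable": correctable,
        "uncorrectable": uncorrectable,
        "blocks_with_uncorrectable": failing,
    }


counts = {
    "UMC": {"correctable_count": 3, "uncorrectable_count": 0},
    "GFX": {"correctable_count": 1, "uncorrectable_count": 2},
}
print(summarize_ecc(counts))
```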

### Process and System Information

Process monitoring, system-level GPU usage information, and multi-process GPU utilization tracking.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_process_list(amdsmi_processor_handle processor_handle, uint32_t *max_processes, amdsmi_process_handle_t *list);
amdsmi_status_t amdsmi_get_gpu_process_info(amdsmi_processor_handle processor_handle, amdsmi_process_handle_t process, amdsmi_proc_info_t *info);
amdsmi_status_t amdsmi_get_gpu_compute_process_info(amdsmi_process_info_t *procs, uint32_t *num_items);
```

[Process and System Information](./process-system-info.md)
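
Per-process results are often consumed as a ranking, for example to find which processes hold the most GPU memory. This minimal sketch assumes each process record has been converted to a dict with hypothetical `pid` and `mem` (bytes) keys. The real record layout depends on the query and the library version, so adapt the key names. `top_memory_users` is a hypothetical helper.

```python
from typing import Iterable, List, Mapping


def top_memory_users(procs: Iterable[Mapping[str, int]], n: int = 5) -> List[int]:
    """Return the PIDs of the n processes using the most GPU memory.

    Hypothetical helper; 'pid' and 'mem' are assumed key names.
    """
    ranked = sorted(procs, key=lambda p: p["mem"], reverse=True)
    return [p["pid"] for p in ranked[:n]]
```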

### Event Monitoring

Asynchronous event notification system for GPU state changes, thermal events, and error conditions.

```c { .api }
amdsmi_status_t amdsmi_init_gpu_event_notification(amdsmi_processor_handle processor_handle);
amdsmi_status_t amdsmi_set_gpu_event_notification_mask(amdsmi_processor_handle processor_handle, uint64_t mask);
amdsmi_status_t amdsmi_get_gpu_event_notification(int timeout_ms, uint32_t *num_elem, amdsmi_evt_notification_data_t *data);
amdsmi_status_t amdsmi_stop_gpu_event_notification(amdsmi_processor_handle processor_handle);
```

[Event Monitoring](./event-monitoring.md)
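
The functions above suggest a lifecycle. First, initialize notification for a processor. Next, set a 64-bit event mask with `amdsmi_set_gpu_event_notification_mask`, then poll with `amdsmi_get_gpu_event_notification`. Finally, stop notification. This minimal sketch shows how to build the mask. It assumes that event index *i* maps to bit *i - 1*, and it assumes the event numbering shown in the comments. Check both assumptions against your `amdsmi.h`. `event_mask` and the constants are hypothetical mirrors, not part of the Python bindings.

```python
from typing import Iterable

# Hypothetical mirrors of amdsmi_evt_notification_type_t values (assumed;
# verify against the amdsmi.h shipped with your ROCm version)
EVT_VMFAULT = 1
EVT_THERMAL_THROTTLE = 2
EVT_GPU_PRE_RESET = 3
EVT_GPU_POST_RESET = 4


def event_mask(event_types: Iterable[int]) -> int:
    """Combine event type indices into a notification mask (index i -> bit i-1)."""
    mask = 0
    for evt in event_types:
        if not 1 <= evt <= 64:
            raise ValueError(f"event index {evt} does not fit a 64-bit mask")
        mask |= 1 << (evt - 1)
    return mask


print(hex(event_mask([EVT_THERMAL_THROTTLE, EVT_GPU_PRE_RESET])))  # 0x6
```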
247
248
### Performance Counters
249
250
Low-level performance counter management for detailed GPU profiling and performance analysis.
251
252
```c { .api }
253
amdsmi_status_t amdsmi_gpu_counter_group_supported(amdsmi_processor_handle processor_handle, amdsmi_event_group_t group);
254
amdsmi_status_t amdsmi_gpu_create_counter(amdsmi_processor_handle processor_handle, amdsmi_event_type_t type, amdsmi_event_handle_t *evnt_handle);
255
amdsmi_status_t amdsmi_gpu_destroy_counter(amdsmi_event_handle_t evnt_handle);
256
amdsmi_status_t amdsmi_gpu_control_counter(amdsmi_event_handle_t evt_handle, amdsmi_counter_command_t cmd, void *cmd_args);
257
amdsmi_status_t amdsmi_gpu_read_counter(amdsmi_event_handle_t evt_handle, amdsmi_counter_value_t *value);
258
```
259
260
[Performance Counters](./performance-counters.md)
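
`amdsmi_gpu_read_counter` returns an `amdsmi_counter_value_t` that holds the raw count. It also holds the time the counter was enabled and the time it was running, both in nanoseconds. This minimal sketch converts one reading into an event rate. `counter_rate_per_s` is a hypothetical helper, and it assumes the reading has been unpacked into plain integers.

```python
def counter_rate_per_s(value: int, time_running_ns: int) -> float:
    """Events per second from a counter value and its running time in ns.

    Hypothetical helper; inputs are assumed to come from the value and
    time_running fields of one counter reading.
    """
    if time_running_ns <= 0:
        raise ValueError("counter has not been running")
    return value * 1e9 / time_running_ns


# 2,000 events counted over 1 ms of running time
print(counter_rate_per_s(2_000, 1_000_000))  # 2000000.0
```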
261
262
## Core Types
263
264
### Handle Types
265
266
```c { .api }
267
typedef void *amdsmi_socket_handle;
268
typedef void *amdsmi_processor_handle;
269
typedef uint32_t amdsmi_process_handle_t;
270
typedef uintptr_t amdsmi_event_handle_t;
271
```

### Status and Initialization

```c { .api }
typedef enum {
    AMDSMI_STATUS_SUCCESS = 0,
    AMDSMI_STATUS_INVAL = 1,
    AMDSMI_STATUS_NOT_SUPPORTED = 2,
    AMDSMI_STATUS_NOT_YET_IMPLEMENTED = 3,
    AMDSMI_STATUS_FAIL_LOAD_MODULE = 4,
    AMDSMI_STATUS_FAIL_LOAD_SYMBOL = 5,
    // ... additional error codes
    AMDSMI_STATUS_UNKNOWN_ERROR = 0xFFFFFFFF
} amdsmi_status_t;

typedef enum {
    AMDSMI_INIT_ALL_PROCESSORS = 0x0,
    AMDSMI_INIT_AMD_CPUS = (1 << 0),
    AMDSMI_INIT_AMD_GPUS = (1 << 1),
    AMDSMI_INIT_NON_AMD_CPUS = (1 << 2),
    AMDSMI_INIT_NON_AMD_GPUS = (1 << 3)
} amdsmi_init_flags_t;

typedef enum {
    UNKNOWN = 0,
    AMD_GPU,
    AMD_CPU,
    NON_AMD_GPU,
    NON_AMD_CPU
} processor_type_t;
```
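
Apart from `AMDSMI_INIT_ALL_PROCESSORS`, the initialization flags are single-bit values. They can therefore be OR-ed together, for example to discover both AMD CPUs and AMD GPUs on one system. This minimal Python sketch mirrors the values listed above and decodes a combined mask into names. It follows this section's listing, so it treats 0 as "all processors". `INIT_FLAGS` and `describe_init_flags` are hypothetical helpers, not part of the bindings.

```python
from typing import Dict, List

# Hypothetical mirror of amdsmi_init_flags_t, using the values listed above
INIT_FLAGS: Dict[str, int] = {
    "AMDSMI_INIT_AMD_CPUS": 1 << 0,
    "AMDSMI_INIT_AMD_GPUS": 1 << 1,
    "AMDSMI_INIT_NON_AMD_CPUS": 1 << 2,
    "AMDSMI_INIT_NON_AMD_GPUS": 1 << 3,
}


def describe_init_flags(flags: int) -> List[str]:
    """Name each single-bit init flag set in flags; 0 means all processors."""
    if flags == 0:
        return ["AMDSMI_INIT_ALL_PROCESSORS"]
    unknown = flags & ~sum(INIT_FLAGS.values())
    if unknown:
        raise ValueError(f"unknown init flag bits: {unknown:#x}")
    return [name for name, bit in INIT_FLAGS.items() if flags & bit]


combined = INIT_FLAGS["AMDSMI_INIT_AMD_CPUS"] | INIT_FLAGS["AMDSMI_INIT_AMD_GPUS"]
print(combined, describe_init_flags(combined))  # 3 ['AMDSMI_INIT_AMD_CPUS', 'AMDSMI_INIT_AMD_GPUS']
```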

### Version Information

```c { .api }
typedef struct {
    uint32_t year;
    uint32_t major;
    uint32_t minor;
    uint32_t release;
    const char *build;
} amdsmi_version_t;
```
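
An application that depends on newer API entries can check the library version at startup. This minimal sketch compares the numeric fields of `amdsmi_version_t`. It assumes those fields have been copied into a dict keyed by the field names above. `version_tuple` and `at_least` are hypothetical helpers, and the version numbers in the example are illustrative only.

```python
from typing import Any, Mapping, Tuple

# Numeric fields of amdsmi_version_t, most significant first
FIELDS = ("year", "major", "minor", "release")


def version_tuple(version: Mapping[str, Any]) -> Tuple[int, ...]:
    """Order versions by year, major, minor, release; 'build' is ignored."""
    return tuple(int(version[f]) for f in FIELDS)


def at_least(version: Mapping[str, Any], minimum: Tuple[int, ...]) -> bool:
    """True if version >= minimum (tuples compare element by element)."""
    return version_tuple(version) >= minimum


# Illustrative values only, not a real release
lib = {"year": 24, "major": 6, "minor": 1, "release": 0, "build": "example"}
print(at_least(lib, (24, 6, 0, 0)))  # True
```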