# Performance Monitoring

Real-time monitoring of GPU performance metrics including activity levels, clock frequencies, power consumption, temperature measurements, and comprehensive system metrics.

## Capabilities

### GPU Activity Monitoring

Get current GPU engine utilization percentages across different processing units.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_activity(amdsmi_processor_handle processor_handle, amdsmi_engine_usage_t *info);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `info`: Pointer to receive engine usage information

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
amdsmi_engine_usage_t usage;
amdsmi_status_t ret = amdsmi_get_gpu_activity(processor, &usage);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("GFX Activity: %u%%\n", usage.gfx_activity);
    printf("UMC Activity: %u%%\n", usage.umc_activity);
    printf("MM Activity: %u%%\n", usage.mm_activity);
}
```

### Utilization Counters

Get coarse-grained utilization counters, which provide GPU usage information with minimal overhead.

```c { .api }
amdsmi_status_t amdsmi_get_utilization_count(amdsmi_processor_handle processor_handle, amdsmi_utilization_counter_t utilization_counters[], uint32_t count, uint64_t *timestamp);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `utilization_counters`: Array of utilization counter structures (the caller must set the `type` field of each entry before the call)
- `count`: Number of counters in the array
- `timestamp`: Pointer to receive the timestamp of the measurements

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### Power Information

Get current power consumption and voltage measurements from the GPU.

```c { .api }
amdsmi_status_t amdsmi_get_power_info(amdsmi_processor_handle processor_handle, amdsmi_power_info_t *info);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `info`: Pointer to receive power information

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
amdsmi_power_info_t power_info;
amdsmi_status_t ret = amdsmi_get_power_info(processor, &power_info);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Socket Power: %u W\n", power_info.average_socket_power);
    printf("GFX Voltage: %u mV\n", power_info.gfx_voltage);
    printf("SOC Voltage: %u mV\n", power_info.soc_voltage);
    printf("Memory Voltage: %u mV\n", power_info.mem_voltage);
    printf("Power Limit: %u W\n", power_info.power_limit);
}
```
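
A common follow-up is to compare the reported draw against the power limit. The sketch below is illustrative only: `power_headroom` is a hypothetical helper (not part of the library), and it operates on a plain dict that uses the `average_socket_power` and `power_limit` keys documented for the Python `amdsmi_get_power_info` return value.

```python
def power_headroom(power_info):
    """Return (headroom_watts, percent_of_limit) for a power-info dict.

    Hypothetical helper: expects the 'average_socket_power' and
    'power_limit' keys, both in watts.
    """
    used = power_info["average_socket_power"]
    limit = power_info["power_limit"]
    if limit <= 0:
        # No usable limit reported; avoid dividing by zero.
        return None, None
    return limit - used, 100.0 * used / limit


# Sample values chosen for illustration, not measured data.
sample = {"average_socket_power": 180, "power_limit": 300}
headroom, percent = power_headroom(sample)
print(f"{headroom} W headroom, {percent:.1f}% of limit")
```

Returning `None` for a zero limit keeps the caller from treating a missing limit as "no headroom".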

### Energy Accumulator

Get the energy consumption accumulator, which provides high-precision measurements.

```c { .api }
amdsmi_status_t amdsmi_get_energy_count(amdsmi_processor_handle processor_handle, uint64_t *power, float *counter_resolution, uint64_t *timestamp);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `power`: Pointer to receive the energy counter value (an accumulated count, despite the parameter name)
- `counter_resolution`: Pointer to receive the counter resolution in microjoules
- `timestamp`: Pointer to receive the timestamp, with 1 ns resolution

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### Temperature Monitoring

Get temperature measurements from various sensors on the GPU.

```c { .api }
amdsmi_status_t amdsmi_get_temp_metric(amdsmi_processor_handle processor_handle, amdsmi_temperature_type_t sensor_type, amdsmi_temperature_metric_t metric, int64_t *temperature);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `sensor_type`: Type of temperature sensor (EDGE, JUNCTION, VRAM, HBM, etc.)
- `metric`: Temperature metric to retrieve (CURRENT, MAX, MIN, CRITICAL, etc.)
- `temperature`: Pointer to receive temperature in millidegrees Celsius

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
int64_t temperature;
amdsmi_status_t ret = amdsmi_get_temp_metric(processor,
                                             TEMPERATURE_TYPE_EDGE,
                                             AMDSMI_TEMP_CURRENT,
                                             &temperature);
if (ret == AMDSMI_STATUS_SUCCESS) {
    // Cast to long long so the format specifier is portable for int64_t
    printf("Edge Temperature: %lld mC (%.1f C)\n",
           (long long)temperature, temperature / 1000.0);
}

// Get junction temperature maximum
ret = amdsmi_get_temp_metric(processor,
                             TEMPERATURE_TYPE_JUNCTION,
                             AMDSMI_TEMP_MAX,
                             &temperature);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Junction Max: %.1f C\n", temperature / 1000.0);
}
```
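
Because readings arrive in millidegrees Celsius, a small conversion layer helps avoid unit mistakes. The following sketch uses two hypothetical helpers (not library functions) to convert a reading and to compute the remaining margin to a threshold such as the value returned for the CRITICAL metric.

```python
def millidegrees_to_celsius(value_mc):
    # 1000 millidegrees Celsius = 1 degree Celsius.
    return value_mc / 1000.0


def margin_to_threshold(current_mc, threshold_mc):
    """Degrees Celsius left before the threshold; negative once exceeded."""
    return millidegrees_to_celsius(threshold_mc - current_mc)


# Illustrative readings, both in millidegrees Celsius.
current_mc = 65500
critical_mc = 100000
print(f"Current: {millidegrees_to_celsius(current_mc):.1f} C")
print(f"Margin to critical: {margin_to_threshold(current_mc, critical_mc):.1f} C")
```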

### Voltage Monitoring

Get voltage measurements from GPU voltage rails.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_volt_metric(amdsmi_processor_handle processor_handle, amdsmi_voltage_type_t sensor_type, amdsmi_voltage_metric_t metric, int64_t *voltage);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `sensor_type`: Type of voltage sensor
- `metric`: Voltage metric to retrieve (CURRENT, MAX, MIN, etc.)
- `voltage`: Pointer to receive voltage in millivolts

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### Clock Frequency Information

Get available clock frequencies and current frequency selection for different clock domains.

```c { .api }
amdsmi_status_t amdsmi_get_clk_freq(amdsmi_processor_handle processor_handle, amdsmi_clk_type_t clk_type, amdsmi_frequencies_t *f);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `clk_type`: Type of clock (SYS/GFX, MEM, SOC, etc.)
- `f`: Pointer to receive frequency information

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure
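
The frequency structure pairs a list of supported frequencies (in Hz) with the index of the active one. As a rough sketch of how that might be read, the hypothetical helper below takes a plain dict mirroring the `amdsmi_frequencies_t` fields (`num_supported`, `current`, `frequency`); the dict shape is an assumption for illustration, not a documented return type.

```python
def describe_frequencies(freqs):
    """Format supported frequencies, marking the current index with '*'."""
    # Only the first num_supported entries of the fixed-size array are valid.
    supported = freqs["frequency"][: freqs["num_supported"]]
    lines = []
    for index, hz in enumerate(supported):
        marker = "*" if index == freqs["current"] else " "
        lines.append(f"{marker} {index}: {hz / 1e6:.0f} MHz")
    return lines


# Illustrative data: three valid entries, the rest of the array unused.
example = {
    "num_supported": 3,
    "current": 1,
    "frequency": [500_000_000, 1_200_000_000, 2_100_000_000] + [0] * 29,
}
for line in describe_frequencies(example):
    print(line)
```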

### Clock Measurements

Get real-time clock measurements averaged over 1 second.

```c { .api }
amdsmi_status_t amdsmi_get_clock_info(amdsmi_processor_handle processor_handle, amdsmi_clk_type_t clk_type, amdsmi_clk_info_t *info);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `clk_type`: Type of clock to measure
- `info`: Pointer to receive clock measurement information

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
amdsmi_clk_info_t clk_info;
amdsmi_status_t ret = amdsmi_get_clock_info(processor, CLK_TYPE_GFX, &clk_info);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("GFX Clock - Current: %u MHz, Min: %u MHz, Max: %u MHz\n",
           clk_info.cur_clk, clk_info.min_clk, clk_info.max_clk);
}
```
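
One way to interpret these three values together is to express the current clock as a position within the reported range. The sketch below assumes a dict with the `cur_clk`, `min_clk`, and `max_clk` keys (all in MHz); `clock_range_fraction` is a hypothetical helper.

```python
def clock_range_fraction(clk_info):
    """Return where cur_clk sits between min_clk and max_clk, from 0.0 to 1.0."""
    low = clk_info["min_clk"]
    high = clk_info["max_clk"]
    current = clk_info["cur_clk"]
    if high <= low:
        # Degenerate or unreported range; no meaningful fraction.
        return None
    fraction = (current - low) / (high - low)
    # Clamp in case the averaged reading falls slightly outside the range.
    return min(1.0, max(0.0, fraction))


# Illustrative values in MHz.
print(clock_range_fraction({"cur_clk": 1400, "min_clk": 500, "max_clk": 2300}))
```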

### Comprehensive GPU Metrics

Get comprehensive GPU metrics structure with extensive telemetry data.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_metrics_info(amdsmi_processor_handle processor_handle, amdsmi_gpu_metrics_t *pgpu_metrics);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `pgpu_metrics`: Pointer to receive comprehensive GPU metrics

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### Fan Speed Monitoring

Monitor fan speeds in RPMs or as a value relative to the maximum speed (0-255 scale).

```c { .api }
amdsmi_status_t amdsmi_get_gpu_fan_rpms(amdsmi_processor_handle processor_handle, uint32_t sensor_ind, int64_t *speed);
amdsmi_status_t amdsmi_get_gpu_fan_speed(amdsmi_processor_handle processor_handle, uint32_t sensor_ind, int64_t *speed);
amdsmi_status_t amdsmi_get_gpu_fan_speed_max(amdsmi_processor_handle processor_handle, uint32_t sensor_ind, uint64_t *max_speed);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `sensor_ind`: 0-based sensor index (usually 0)
- `speed`: Pointer to receive fan speed (RPMs or 0-255 scale)
- `max_speed`: Pointer to receive maximum fan speed

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure
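
Since the relative fan speed is reported on a 0-255 scale rather than as a percentage, a conversion step is usually needed before display. A minimal sketch, assuming the scale tops out at `AMDSMI_MAX_FAN_SPEED` (255); `fan_percent` is a hypothetical helper, and the constant is redefined locally so the example is self-contained.

```python
AMDSMI_MAX_FAN_SPEED = 255  # local copy of the constant listed under Constants


def fan_percent(speed, max_speed=AMDSMI_MAX_FAN_SPEED):
    """Convert a relative fan speed reading to a percentage of max_speed."""
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")
    return 100.0 * speed / max_speed


# Illustrative reading on the 0-255 scale.
print(f"Fan: {fan_percent(128):.1f}%")
```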

## Python API

### GPU Activity

```python { .api }
def amdsmi_get_gpu_activity(processor_handle):
    """
    Get GPU engine activity percentages.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: Engine usage with keys 'gfx_activity', 'umc_activity', 'mm_activity'

    Raises:
        AmdSmiException: If activity query fails
    """
```

### Power Information

```python { .api }
def amdsmi_get_power_info(processor_handle):
    """
    Get GPU power and voltage information.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: Power info with keys 'average_socket_power', 'gfx_voltage',
              'soc_voltage', 'mem_voltage', 'power_limit'

    Raises:
        AmdSmiException: If power query fails
    """
```

### Temperature Monitoring

```python { .api }
def amdsmi_get_temp_metric(processor_handle, sensor_type, metric):
    """
    Get temperature measurement from GPU sensor.

    Args:
        processor_handle: GPU processor handle
        sensor_type (AmdSmiTemperatureType): Temperature sensor type
        metric (AmdSmiTemperatureMetric): Temperature metric to retrieve

    Returns:
        int: Temperature in millidegrees Celsius

    Raises:
        AmdSmiException: If temperature query fails
    """
```

### Clock Information

```python { .api }
def amdsmi_get_clock_info(processor_handle, clk_type):
    """
    Get real-time clock measurements.

    Args:
        processor_handle: GPU processor handle
        clk_type (AmdSmiClkType): Clock type to measure

    Returns:
        dict: Clock info with keys 'cur_clk', 'min_clk', 'max_clk'

    Raises:
        AmdSmiException: If clock query fails
    """
```

**Python Usage Example:**

```python
import amdsmi
from amdsmi import AmdSmiTemperatureType, AmdSmiTemperatureMetric, AmdSmiClkType

# Initialize and get GPU handle
amdsmi.amdsmi_init()

try:
    sockets = amdsmi.amdsmi_get_socket_handles()
    processors = amdsmi.amdsmi_get_processor_handles(sockets[0])
    gpu = processors[0]  # First GPU

    # Get GPU activity
    activity = amdsmi.amdsmi_get_gpu_activity(gpu)
    print(f"GPU Usage: GFX={activity['gfx_activity']}%, "
          f"Memory={activity['umc_activity']}%, MM={activity['mm_activity']}%")

    # Get power information
    power = amdsmi.amdsmi_get_power_info(gpu)
    print(f"Power: {power['average_socket_power']}W, "
          f"GFX Voltage: {power['gfx_voltage']}mV")

    # Get temperature (returned in millidegrees Celsius)
    temp = amdsmi.amdsmi_get_temp_metric(gpu,
                                         AmdSmiTemperatureType.TEMPERATURE_TYPE_EDGE,
                                         AmdSmiTemperatureMetric.AMDSMI_TEMP_CURRENT)
    print(f"GPU Temperature: {temp/1000:.1f}°C")

    # Get clock information
    gfx_clk = amdsmi.amdsmi_get_clock_info(gpu, AmdSmiClkType.CLK_TYPE_GFX)
    print(f"GFX Clock: {gfx_clk['cur_clk']} MHz "
          f"(Range: {gfx_clk['min_clk']}-{gfx_clk['max_clk']} MHz)")

finally:
    # Always release the library, even if a query raised
    amdsmi.amdsmi_shut_down()
```

## Types

### Engine Usage Structure

```c { .api }
typedef struct {
    uint32_t gfx_activity;   // GFX engine activity percentage (0-100)
    uint32_t umc_activity;   // Memory controller activity percentage (0-100)
    uint32_t mm_activity;    // Multimedia engine activity percentage (0-100)
    uint32_t reserved[13];   // Reserved for future use
} amdsmi_engine_usage_t;
```

### Power Information Structure

```c { .api }
typedef struct {
    uint32_t average_socket_power;  // Average socket power in Watts
    uint32_t gfx_voltage;           // GFX voltage in millivolts
    uint32_t soc_voltage;           // SOC voltage in millivolts
    uint32_t mem_voltage;           // Memory voltage in millivolts
    uint32_t power_limit;           // Power limit in Watts
    uint32_t reserved[11];          // Reserved for future use
} amdsmi_power_info_t;
```

### Clock Information Structure

```c { .api }
typedef struct {
    uint32_t cur_clk;      // Current clock frequency in MHz
    uint32_t min_clk;      // Minimum clock frequency in MHz
    uint32_t max_clk;      // Maximum clock frequency in MHz
    uint32_t reserved[5];  // Reserved for future use
} amdsmi_clk_info_t;
```

### Frequency Information Structure

```c { .api }
typedef struct {
    uint32_t num_supported;                          // Number of supported frequencies
    uint32_t current;                                // Current frequency index
    uint64_t frequency[AMDSMI_MAX_NUM_FREQUENCIES];  // Array of available frequencies in Hz
} amdsmi_frequencies_t;
```

### Utilization Counter Structure

```c { .api }
typedef struct {
    AMDSMI_UTILIZATION_COUNTER_TYPE type;  // Counter type
    uint64_t value;                        // Counter value
} amdsmi_utilization_counter_t;
```

### Temperature Types

```c { .api }
typedef enum {
    TEMPERATURE_TYPE_EDGE,      // Edge temperature sensor
    TEMPERATURE_TYPE_JUNCTION,  // Junction temperature sensor
    TEMPERATURE_TYPE_VRAM,      // VRAM temperature sensor
    TEMPERATURE_TYPE_HBM_0,     // HBM stack 0 temperature
    TEMPERATURE_TYPE_HBM_1,     // HBM stack 1 temperature
    TEMPERATURE_TYPE_HBM_2,     // HBM stack 2 temperature
    TEMPERATURE_TYPE_HBM_3,     // HBM stack 3 temperature
    TEMPERATURE_TYPE_PLX        // PLX temperature sensor
} amdsmi_temperature_type_t;
```

### Temperature Metrics

```c { .api }
typedef enum {
    AMDSMI_TEMP_CURRENT,         // Current temperature
    AMDSMI_TEMP_MAX,             // Maximum temperature
    AMDSMI_TEMP_MIN,             // Minimum temperature
    AMDSMI_TEMP_MAX_HYST,        // Maximum temperature hysteresis
    AMDSMI_TEMP_MIN_HYST,        // Minimum temperature hysteresis
    AMDSMI_TEMP_CRITICAL,        // Critical temperature threshold
    AMDSMI_TEMP_CRITICAL_HYST,   // Critical temperature hysteresis
    AMDSMI_TEMP_EMERGENCY,       // Emergency temperature threshold
    AMDSMI_TEMP_EMERGENCY_HYST,  // Emergency temperature hysteresis
    AMDSMI_TEMP_CRIT_MIN,        // Critical minimum temperature
    AMDSMI_TEMP_CRIT_MIN_HYST,   // Critical minimum hysteresis
    AMDSMI_TEMP_OFFSET,          // Temperature offset
    AMDSMI_TEMP_LOWEST,          // Historical minimum temperature
    AMDSMI_TEMP_HIGHEST          // Historical maximum temperature
} amdsmi_temperature_metric_t;
```

### Clock Types

```c { .api }
typedef enum {
    CLK_TYPE_SYS = 0x0,           // System/GFX clock
    CLK_TYPE_GFX = CLK_TYPE_SYS,  // Alias for the system clock
    CLK_TYPE_DF,                  // Data Fabric clock
    CLK_TYPE_DCEF,                // Display Controller Engine clock
    CLK_TYPE_SOC,                 // SOC clock
    CLK_TYPE_MEM,                 // Memory clock
    CLK_TYPE_PCIE,                // PCIe clock
    CLK_TYPE_VCLK0,               // Video clock 0
    CLK_TYPE_VCLK1,               // Video clock 1
    CLK_TYPE_DCLK0,               // Video decode clock 0
    CLK_TYPE_DCLK1                // Video decode clock 1
} amdsmi_clk_type_t;
```

## Constants

```c { .api }
#define AMDSMI_MAX_NUM_FREQUENCIES 32  // Maximum supported frequencies
#define AMDSMI_MAX_FAN_SPEED 255       // Maximum fan speed value
#define AMDSMI_NUM_HBM_INSTANCES 4     // Number of HBM instances
```

## Important Notes

1. **Virtual Machine Limitations**: Many monitoring functions are not supported on virtual machine guests.

2. **Sensor Availability**: Not all sensors may be available on all GPU models. Functions will return appropriate error codes for unavailable sensors.

3. **Units**: Temperature values are in millidegrees Celsius, voltages in millivolts, power in watts, and frequencies in MHz or Hz as specified.

4. **Sampling Rates**: Some metrics are averaged over time periods (e.g., 1 second for clock measurements).

5. **Precision**: Energy counters provide high-precision accumulation while power measurements provide current instantaneous values.

6. **Multi-Sensor Support**: Some GPUs have multiple sensors of the same type, accessed via the `sensor_ind` parameter.
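
Because sensor availability varies by GPU model and environment, monitoring code generally benefits from probing each sensor and recording failures instead of aborting on the first error. The sketch below shows one possible pattern; `probe_sensors` and `fake_reader` are hypothetical names, and the reader is a stand-in so the example runs without the library. With the real bindings, the `except` clause would more likely catch `AmdSmiException`.

```python
def probe_sensors(read_sensor, sensor_types):
    """Call read_sensor for each type; return (readings, errors) dicts."""
    readings, errors = {}, {}
    for sensor in sensor_types:
        try:
            readings[sensor] = read_sensor(sensor)
        except Exception as exc:  # assumption: narrow to AmdSmiException in real use
            errors[sensor] = str(exc)
    return readings, errors


def fake_reader(sensor):
    # Hypothetical stand-in for a query such as amdsmi_get_temp_metric.
    if sensor == "HBM_0":
        raise RuntimeError("sensor not supported")
    return 45000  # millidegrees Celsius


readings, errors = probe_sensors(fake_reader, ["EDGE", "HBM_0"])
print("available:", readings)
print("unavailable:", errors)
```

Keeping the errors alongside the readings makes it easy to report which sensors were skipped on a given device.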