# Performance Counters

Low-level performance counter management for detailed GPU profiling and performance analysis. Performance counters provide access to hardware-level metrics for advanced performance tuning and debugging.

## Capabilities

### Counter Group Support

Check if a specific performance counter group is supported by the GPU.

```c { .api }
amdsmi_status_t amdsmi_gpu_counter_group_supported(amdsmi_processor_handle processor_handle, amdsmi_event_group_t group);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `group`: Performance counter group to check

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS if supported, error code otherwise

**Usage Example:**

```c
amdsmi_status_t ret = amdsmi_gpu_counter_group_supported(processor, AMDSMI_EVNT_GRP_XGMI);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("XGMI counter group is supported\n");
} else {
    printf("XGMI counter group not available\n");
}
```

### Counter Creation

Create a performance counter for monitoring specific hardware events.

```c { .api }
amdsmi_status_t amdsmi_gpu_create_counter(amdsmi_processor_handle processor_handle, amdsmi_event_type_t type, amdsmi_event_handle_t *evnt_handle);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `type`: Type of performance counter event to monitor
- `evnt_handle`: Pointer to receive the created counter handle

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
amdsmi_event_handle_t counter_handle;
amdsmi_status_t ret = amdsmi_gpu_create_counter(processor,
                                                AMDSMI_EVNT_XGMI_0_NOP_TX,
                                                &counter_handle);
if (ret == AMDSMI_STATUS_SUCCESS) {
    // Cast so the argument matches the %lx format specifier
    printf("Counter created successfully, handle: 0x%lx\n", (unsigned long)counter_handle);
}
```

### Counter Control

Control counter operations such as starting, stopping, and resetting measurements.

```c { .api }
amdsmi_status_t amdsmi_gpu_control_counter(amdsmi_event_handle_t evt_handle, amdsmi_counter_command_t cmd, void *cmd_args);
```

**Parameters:**
- `evt_handle`: Handle to the performance counter
- `cmd`: Counter command (START, STOP, RESET)
- `cmd_args`: Command-specific arguments (may be NULL)

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
// Start the counter
amdsmi_status_t ret = amdsmi_gpu_control_counter(counter_handle,
                                                 AMDSMI_CNTR_CMD_START,
                                                 NULL);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Counter started\n");

    // ... run workload ...

    // Stop the counter
    ret = amdsmi_gpu_control_counter(counter_handle,
                                     AMDSMI_CNTR_CMD_STOP,
                                     NULL);
}
```

### Counter Reading

Read the current value from a performance counter.

```c { .api }
amdsmi_status_t amdsmi_gpu_read_counter(amdsmi_event_handle_t evt_handle, amdsmi_counter_value_t *value);
```

**Parameters:**
- `evt_handle`: Handle to the performance counter
- `value`: Pointer to receive counter value and metadata

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
amdsmi_counter_value_t counter_value;
amdsmi_status_t ret = amdsmi_gpu_read_counter(counter_handle, &counter_value);
if (ret == AMDSMI_STATUS_SUCCESS) {
    // uint64_t fields are cast so they match the %llu format specifier
    printf("Counter Value: %llu\n", (unsigned long long)counter_value.value);
    printf("Time Enabled: %llu ns\n", (unsigned long long)counter_value.time_enabled);
    printf("Time Running: %llu ns\n", (unsigned long long)counter_value.time_running);
}
```

### Counter Destruction

Destroy a performance counter and free associated resources.

```c { .api }
amdsmi_status_t amdsmi_gpu_destroy_counter(amdsmi_event_handle_t evnt_handle);
```

**Parameters:**
- `evnt_handle`: Handle to the performance counter to destroy

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
amdsmi_status_t ret = amdsmi_gpu_destroy_counter(counter_handle);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Counter destroyed successfully\n");
}
```

### Available Counters Query

Get the number of available counters for a specific event group.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_available_counters(amdsmi_processor_handle processor_handle, amdsmi_event_group_t grp, uint32_t *available);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `grp`: Performance counter group to query
- `available`: Pointer to receive number of available counters

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
uint32_t available_counters;
amdsmi_status_t ret = amdsmi_get_gpu_available_counters(processor,
                                                        AMDSMI_EVNT_GRP_XGMI,
                                                        &available_counters);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Available XGMI counters: %u\n", available_counters);
}
```

## Python API

### Counter Group Support

```python { .api }
def amdsmi_gpu_counter_group_supported(processor_handle, group):
    """
    Check if a counter group is supported.

    Args:
        processor_handle: GPU processor handle
        group: Counter group type from AmdSmiEventGroup

    Returns:
        bool: True if supported, False otherwise

    Raises:
        AmdSmiException: If query fails
    """
```

### Counter Management

```python { .api }
def amdsmi_gpu_create_counter(processor_handle, event_type):
    """
    Create a performance counter.

    Args:
        processor_handle: GPU processor handle
        event_type: Event type from AmdSmiEventType

    Returns:
        int: Counter handle

    Raises:
        AmdSmiException: If counter creation fails
    """

def amdsmi_gpu_control_counter(counter_handle, command, args=None):
    """
    Control counter operations.

    Args:
        counter_handle (int): Counter handle
        command: Command from AmdSmiCounterCommand
        args: Optional command arguments

    Raises:
        AmdSmiException: If control operation fails
    """

def amdsmi_gpu_read_counter(counter_handle):
    """
    Read counter value.

    Args:
        counter_handle (int): Counter handle

    Returns:
        dict: Counter value with keys 'value', 'time_enabled', 'time_running'

    Raises:
        AmdSmiException: If counter read fails
    """

def amdsmi_gpu_destroy_counter(counter_handle):
    """
    Destroy a performance counter.

    Args:
        counter_handle (int): Counter handle

    Raises:
        AmdSmiException: If counter destruction fails
    """
```

**Python Usage Example:**

```python
import amdsmi
from amdsmi import AmdSmiEventGroup, AmdSmiEventType, AmdSmiCounterCommand

# Initialize and get GPU
amdsmi.amdsmi_init()
try:
    sockets = amdsmi.amdsmi_get_socket_handles()
    processors = amdsmi.amdsmi_get_processor_handles(sockets[0])
    gpu = processors[0]

    # Check if XGMI counters are supported
    if amdsmi.amdsmi_gpu_counter_group_supported(gpu, AmdSmiEventGroup.XGMI):
        print("XGMI counters supported")

        # Get available counter count
        available = amdsmi.amdsmi_get_gpu_available_counters(gpu, AmdSmiEventGroup.XGMI)
        print(f"Available XGMI counters: {available}")

        # Create a counter
        counter = amdsmi.amdsmi_gpu_create_counter(gpu, AmdSmiEventType.XGMI_0_NOP_TX)

        try:
            # Start the counter
            amdsmi.amdsmi_gpu_control_counter(counter, AmdSmiCounterCommand.START)

            # ... run workload here ...

            # Stop and read the counter
            amdsmi.amdsmi_gpu_control_counter(counter, AmdSmiCounterCommand.STOP)
            value = amdsmi.amdsmi_gpu_read_counter(counter)

            print(f"Counter value: {value['value']}")
            print(f"Time enabled: {value['time_enabled']} ns")
            print(f"Time running: {value['time_running']} ns")

        finally:
            # Clean up the counter
            amdsmi.amdsmi_gpu_destroy_counter(counter)

finally:
    amdsmi.amdsmi_shut_down()
```
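
The create, start, stop, read, destroy sequence in the example above can be wrapped so that cleanup is not forgotten. The following is a minimal sketch, not part of the library: `measured_counter` is a hypothetical helper, `smi` stands for any object exposing the four `amdsmi_gpu_*` counter functions documented on this page (normally the imported `amdsmi` module), and the start and stop commands are passed in by the caller rather than assumed.

```python
from contextlib import contextmanager


@contextmanager
def measured_counter(smi, gpu, event_type, start_cmd, stop_cmd):
    """Hypothetical helper: run a workload with one counter active.

    Yields a dict that is filled with the counter reading once the
    `with` body finishes normally. The counter is stopped and destroyed
    even if the body raises.
    """
    result = {}
    counter = smi.amdsmi_gpu_create_counter(gpu, event_type)
    try:
        smi.amdsmi_gpu_control_counter(counter, start_cmd)
        try:
            yield result
        finally:
            # Stop counting whether or not the workload succeeded.
            smi.amdsmi_gpu_control_counter(counter, stop_cmd)
        # Reached only when the workload did not raise.
        result.update(smi.amdsmi_gpu_read_counter(counter))
    finally:
        # Always release the counter.
        smi.amdsmi_gpu_destroy_counter(counter)
```

With the names used on this page, a call might look like `with measured_counter(amdsmi, gpu, AmdSmiEventType.XGMI_0_NOP_TX, AmdSmiCounterCommand.START, AmdSmiCounterCommand.STOP) as result:`, after which `result['value']` holds the reading. Passing the module in as `smi` also makes the helper easy to exercise with a stub when no GPU is present.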

## Types

### Counter Value Structure

```c { .api }
typedef struct {
    uint64_t value;         // Counter value
    uint64_t time_enabled;  // Time counter was enabled (nanoseconds)
    uint64_t time_running;  // Time counter was actively running (nanoseconds)
    uint64_t reserved[4];   // Reserved for future use
} amdsmi_counter_value_t;
```
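
When `time_running` is smaller than `time_enabled`, the counter was active for only part of the interval, so `value` undercounts. The sketch below extrapolates to the full interval the way Linux perf tools commonly do for multiplexed events. That this convention applies to these fields is an assumption, not something this page states, and `scale_for_multiplexing` is a hypothetical helper name. It takes the dictionary shape returned by the Python `amdsmi_gpu_read_counter`.

```python
def scale_for_multiplexing(reading):
    """Estimate the full-interval count from one counter reading.

    `reading` needs the keys 'value', 'time_enabled' and 'time_running'.
    Returns a float, or None when the counter never ran.
    Assumption: linear extrapolation is a fair estimate for the workload.
    """
    enabled = reading["time_enabled"]
    running = reading["time_running"]
    if running == 0:
        # The counter was never active, so there is nothing to scale.
        return None
    if running >= enabled:
        # Counter ran for the whole interval: the raw value is exact.
        return float(reading["value"])
    # Scale the raw value up by the fraction of time that was missed.
    return reading["value"] * enabled / running
```

For example, a reading of 100 events with `time_enabled` of 2000 ns and `time_running` of 1000 ns yields an estimate of 200 events. The estimate is only as good as the assumption that activity was uniform over the interval.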

### Event Groups

```c { .api }
typedef enum {
    AMDSMI_EVNT_GRP_XGMI = 0,             // XGMI event group
    AMDSMI_EVNT_GRP_XGMI_DATA_OUT,        // XGMI data out events
    AMDSMI_EVNT_GRP_GMI,                  // GMI event group
    AMDSMI_EVNT_GRP_INVALID = 0xFFFFFFFF  // Invalid event group
} amdsmi_event_group_t;
```

### Event Types

```c { .api }
typedef enum {
    AMDSMI_EVNT_XGMI_0_NOP_TX = 0,   // XGMI link 0 NOP transmit
    AMDSMI_EVNT_XGMI_0_REQ_TX,       // XGMI link 0 request transmit
    AMDSMI_EVNT_XGMI_0_RESP_TX,      // XGMI link 0 response transmit
    AMDSMI_EVNT_XGMI_0_BEATS_TX,     // XGMI link 0 data beats transmit
    AMDSMI_EVNT_XGMI_1_NOP_TX,       // XGMI link 1 NOP transmit
    AMDSMI_EVNT_XGMI_1_REQ_TX,       // XGMI link 1 request transmit
    // ... additional event types
    AMDSMI_EVNT_LAST = 0xFFFFFFFF    // Last event type marker
} amdsmi_event_type_t;
```

### Counter Commands

```c { .api }
typedef enum {
    AMDSMI_CNTR_CMD_START = 0,            // Start counter
    AMDSMI_CNTR_CMD_STOP,                 // Stop counter
    AMDSMI_CNTR_CMD_RESET,                // Reset counter value
    AMDSMI_CNTR_CMD_INVALID = 0xFFFFFFFF  // Invalid command
} amdsmi_counter_command_t;
```

## Counter Categories

### XGMI Counters
Monitor traffic and performance on Infinity Fabric links between GPUs:
- **NOP_TX/RX**: No-operation packet transmission/reception
- **REQ_TX/RX**: Request packet transmission/reception
- **RESP_TX/RX**: Response packet transmission/reception
- **BEATS_TX/RX**: Data beat transmission/reception
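
A beat count becomes more useful once converted to a data rate. The sketch below shows the arithmetic only. `xgmi_bytes_per_second` is a hypothetical helper, and the default of 32 bytes per beat is an assumption that this page does not confirm, so check it against the hardware documentation for your GPU before relying on the result.

```python
def xgmi_bytes_per_second(beats, interval_ns, bytes_per_beat=32):
    """Convert a BEATS counter value into bytes per second.

    beats:          counter value collected over the interval
    interval_ns:    length of the measurement interval in nanoseconds
    bytes_per_beat: payload per beat (32 is an unverified assumption)
    """
    if interval_ns <= 0:
        raise ValueError("interval_ns must be positive")
    # Total bytes moved, rescaled from nanoseconds to seconds.
    return beats * bytes_per_beat * 1_000_000_000 / interval_ns
```

`interval_ns` could be taken from `time_running` in the counter reading or from a wall-clock measurement around the workload; the two differ if the counter was not active for the whole run.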

### GMI Counters
Monitor Global Memory Interface activity for memory subsystem analysis.

## Important Notes

1. **Hardware Dependent**: Available counters depend on GPU generation and architecture.

2. **Resource Limits**: Only a limited number of counters can be active simultaneously; query the count per group with `amdsmi_get_gpu_available_counters`.

3. **Precision**: The `time_enabled` and `time_running` fields are reported in nanoseconds.

4. **Overhead**: Performance counters introduce minimal overhead when properly managed.

5. **Group Support**: Always check group support before creating counters.

6. **Resource Cleanup**: Always destroy counters to free hardware resources.

7. **Multiplexing**: Hardware may multiplex counters when resource limits are exceeded.

8. **Root Privileges**: Some counters may require elevated privileges to access.

9. **Thread Safety**: Counter operations are thread-safe, but threads that share a counter handle should coordinate when it is started, stopped, and destroyed.

10. **Sampling**: For continuous monitoring, sample at intervals short enough that a counter cannot overflow between reads.
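
The sampling note can be made concrete with a small delta calculation. This sketch assumes readings are cumulative raw values that wrap at most once between samples, and it defaults to a 64-bit width because `value` is a `uint64_t`; the real hardware counter may be narrower, which is why the width is a parameter. Both function names are hypothetical.

```python
def counter_delta(previous, current, width_bits=64):
    """Increase between two raw readings, tolerating a single wraparound."""
    modulus = 1 << width_bits
    return (current - previous) % modulus


def rates_per_second(samples):
    """Turn [(timestamp_ns, raw_value), ...] into per-second event rates.

    Pairs with a non-positive time step are skipped.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed_ns = t1 - t0
        if elapsed_ns <= 0:
            continue  # out-of-order or duplicate timestamp
        rates.append(counter_delta(v0, v1) * 1_000_000_000 / elapsed_ns)
    return rates
```

If the sampling interval is so long that the counter could wrap more than once, the delta is ambiguous and no arithmetic can recover it, which is the practical reason to keep intervals short.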