The hardware performance counter system monitors low-level GPU metrics, including traffic statistics for XGMI (AMD's high-speed GPU-to-GPU interconnect). Performance counters provide fine-grained visibility into GPU subsystem activity for profiling and optimization.
Check if a GPU supports a specific performance counter group before attempting to use it.
def amdsmi_gpu_counter_group_supported(
    processor_handle: processor_handle,
    event_group: AmdSmiEventGroup
) -> None:
"""
Check if a performance counter group is supported by the GPU.
Verifies that the specified event group is available on the target GPU hardware.
This should be called before attempting to create counters from a specific group.
Parameters:
- processor_handle: Handle for the target GPU device
- event_group (AmdSmiEventGroup): The counter group to check. Valid values:
  - AmdSmiEventGroup.XGMI: XGMI link activity counters
  - AmdSmiEventGroup.XGMI_DATA_OUT: XGMI data transmission counters
Returns:
- None: Function returns successfully if the group is supported
Raises:
- AmdSmiParameterException: If processor_handle or event_group is invalid
- AmdSmiLibraryException: If the counter group is not supported or on query failure
Example:
```python
import amdsmi
from amdsmi import AmdSmiEventGroup
amdsmi.amdsmi_init()
device = amdsmi.amdsmi_get_processor_handles()[0]
# Check if XGMI counters are supported
try:
    amdsmi.amdsmi_gpu_counter_group_supported(device, AmdSmiEventGroup.XGMI)
    print("XGMI counters are supported")
except Exception as e:
    print(f"XGMI counters not supported: {e}")
amdsmi.amdsmi_shut_down()
```
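Because this call signals lack of support by raising rather than by returning a value, a small wrapper can turn it into a boolean. The following is a minimal sketch, not part of amdsmi: `group_supported` and its `check` parameter are hypothetical names, and the check function is passed in so the helper can be exercised without GPU hardware.

```python
from typing import Any, Callable


def group_supported(check: Callable[[Any, Any], None], device: Any, group: Any) -> bool:
    """Return True if check(device, group) completes, False if it raises.

    `check` is expected to be amdsmi.amdsmi_gpu_counter_group_supported.
    Assumption made for brevity: any exception is treated as "not supported",
    so invalid parameters are not distinguished from missing hardware support.
    """
    try:
        check(device, group)
    except Exception:
        return False
    return True
```

With the library initialized, this could be called as group_supported(amdsmi.amdsmi_gpu_counter_group_supported, device, AmdSmiEventGroup.XGMI).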
"""Query the number of available counters in a specific group.
def amdsmi_get_gpu_available_counters(
    processor_handle: processor_handle,
    event_group: AmdSmiEventGroup
) -> int:
"""
Get the number of available performance counters in a group.
Returns the count of hardware counters that can be simultaneously created
from the specified event group. This helps determine resource limits before
creating multiple counters.
Parameters:
- processor_handle: Handle for the target GPU device
- event_group (AmdSmiEventGroup): The counter group to query. Valid values:
  - AmdSmiEventGroup.XGMI: XGMI link activity counters
  - AmdSmiEventGroup.XGMI_DATA_OUT: XGMI data transmission counters
Returns:
- int: Number of available counters in the specified group
Raises:
- AmdSmiParameterException: If processor_handle or event_group is invalid
- AmdSmiLibraryException: On query failure
Example:
```python
import amdsmi
from amdsmi import AmdSmiEventGroup
amdsmi.amdsmi_init()
device = amdsmi.amdsmi_get_processor_handles()[0]
# Check how many XGMI counters are available
xgmi_count = amdsmi.amdsmi_get_gpu_available_counters(
    device,
    AmdSmiEventGroup.XGMI
)
print(f"Available XGMI counters: {xgmi_count}")
xgmi_data_count = amdsmi.amdsmi_get_gpu_available_counters(
    device,
    AmdSmiEventGroup.XGMI_DATA_OUT
)
print(f"Available XGMI data out counters: {xgmi_data_count}")
amdsmi.amdsmi_shut_down()
```
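One way to use this count is to trim a list of desired events before creating counters, so that creation does not fail partway through. This is a minimal sketch: `select_events` is a hypothetical helper, not an amdsmi function, and it assumes every requested event belongs to the group that was queried.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_events(requested: Sequence[T], available: int) -> List[T]:
    """Keep at most `available` events, preserving the requested order."""
    if available <= 0:
        return []
    return list(requested[:available])
```

The value returned by amdsmi_get_gpu_available_counters would be passed as `available`, and the resulting list iterated to create one counter per event.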
"""Create a performance counter for a specific event type.
def amdsmi_gpu_create_counter(
    processor_handle: processor_handle,
    event_type: AmdSmiEventType
) -> amdsmi_event_handle_t:
"""
Create a performance counter for monitoring a specific event.
Allocates a hardware counter resource to track the specified event type.
The returned event handle is used for all subsequent counter operations.
Counters must be destroyed with amdsmi_gpu_destroy_counter when no longer needed.
Parameters:
- processor_handle: Handle for the target GPU device
- event_type (AmdSmiEventType): The specific event to monitor. Valid values include:
  - XGMI link 0 events:
    - AmdSmiEventType.XGMI_0_NOP_TX: NOP transactions transmitted
    - AmdSmiEventType.XGMI_0_REQUEST_TX: Request packets transmitted
    - AmdSmiEventType.XGMI_0_RESPONSE_TX: Response packets transmitted
    - AmdSmiEventType.XGMI_0_BEATS_TX: Data beats transmitted
  - XGMI link 1 events:
    - AmdSmiEventType.XGMI_1_NOP_TX: NOP transactions transmitted
    - AmdSmiEventType.XGMI_1_REQUEST_TX: Request packets transmitted
    - AmdSmiEventType.XGMI_1_RESPONSE_TX: Response packets transmitted
    - AmdSmiEventType.XGMI_1_BEATS_TX: Data beats transmitted
  - XGMI data output events (links 0-5):
    - AmdSmiEventType.XGMI_DATA_OUT_0 through XGMI_DATA_OUT_5
Returns:
- amdsmi_event_handle_t: Handle to the created counter, used for control and read operations
Raises:
- AmdSmiParameterException: If processor_handle or event_type is invalid
- AmdSmiLibraryException: If counter creation fails (e.g., no available counters)
Example:
```python
import amdsmi
from amdsmi import AmdSmiEventType
amdsmi.amdsmi_init()
device = amdsmi.amdsmi_get_processor_handles()[0]
# Create a counter for XGMI link 0 request transmissions
counter = amdsmi.amdsmi_gpu_create_counter(
    device,
    AmdSmiEventType.XGMI_0_REQUEST_TX
)
print(f"Created counter: {counter}")
# Remember to destroy the counter when done
amdsmi.amdsmi_gpu_destroy_counter(counter)
amdsmi.amdsmi_shut_down()
```
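Since every created counter must eventually be destroyed, the create/destroy pair can be wrapped in a context manager so that cleanup also happens when an exception interrupts the measurement. This is a sketch under stated assumptions: `managed_counter` is a hypothetical helper, and `smi` stands for the imported amdsmi module (or any stand-in exposing the same two functions).

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def managed_counter(smi: Any, device: Any, event_type: Any) -> Iterator[Any]:
    """Create a counter, yield its handle, and always destroy it afterwards.

    `smi` is expected to expose amdsmi_gpu_create_counter and
    amdsmi_gpu_destroy_counter, as the amdsmi module does.
    """
    handle = smi.amdsmi_gpu_create_counter(device, event_type)
    try:
        yield handle
    finally:
        # Runs even if the body raises, so the hardware counter is released.
        smi.amdsmi_gpu_destroy_counter(handle)
```

A typical use would be `with managed_counter(amdsmi, device, AmdSmiEventType.XGMI_0_REQUEST_TX) as counter:` followed by the start, read, and stop calls inside the block.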
"""Start and stop performance counter data collection.
def amdsmi_gpu_control_counter(
    event_handle: amdsmi_event_handle_t,
    counter_command: AmdSmiCounterCommand
) -> None:
"""
Control performance counter operation (start/stop).
Sends a control command to a performance counter to begin or end data collection.
Counters must be started before reading values and should be stopped when
collection is complete.
Parameters:
- event_handle (amdsmi_event_handle_t): Handle returned by amdsmi_gpu_create_counter
- counter_command (AmdSmiCounterCommand): Command to execute. Valid values:
  - AmdSmiCounterCommand.CMD_START: Begin counter data collection
  - AmdSmiCounterCommand.CMD_STOP: Stop counter data collection
Returns:
- None: Function completes successfully
Raises:
- AmdSmiParameterException: If event_handle or counter_command is invalid
- AmdSmiLibraryException: If the control operation fails
Example:
```python
import amdsmi
import time
from amdsmi import AmdSmiEventType, AmdSmiCounterCommand
amdsmi.amdsmi_init()
device = amdsmi.amdsmi_get_processor_handles()[0]
# Create and start a counter
counter = amdsmi.amdsmi_gpu_create_counter(
    device,
    AmdSmiEventType.XGMI_0_BEATS_TX
)
# Start collecting data
amdsmi.amdsmi_gpu_control_counter(counter, AmdSmiCounterCommand.CMD_START)
print("Counter started")
# Collect data for 5 seconds
time.sleep(5)
# Stop the counter
amdsmi.amdsmi_gpu_control_counter(counter, AmdSmiCounterCommand.CMD_STOP)
print("Counter stopped")
# Clean up
amdsmi.amdsmi_gpu_destroy_counter(counter)
amdsmi.amdsmi_shut_down()
```
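Instead of sleeping for a fixed time, the start and stop commands can bracket a specific piece of work. The sketch below assumes a hypothetical helper `measure`; `smi` stands for the imported amdsmi module (or a stand-in with the same names), and `workload` is any zero-argument callable supplied by the caller.

```python
from typing import Any, Callable, Dict


def measure(smi: Any, handle: Any, workload: Callable[[], None]) -> Dict[str, Any]:
    """Start the counter, run workload(), read the counter, then stop it.

    `handle` is expected to come from amdsmi_gpu_create_counter.
    """
    smi.amdsmi_gpu_control_counter(handle, smi.AmdSmiCounterCommand.CMD_START)
    try:
        workload()
        # Read before stopping so the sample covers the whole workload.
        return smi.amdsmi_gpu_read_counter(handle)
    finally:
        # Stop even if the workload or the read raised.
        smi.amdsmi_gpu_control_counter(handle, smi.AmdSmiCounterCommand.CMD_STOP)
```

The helper leaves destruction of the counter to the caller, so the same handle can be reused for several measurements.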
"""Read the current value and timing information from a performance counter.
def amdsmi_gpu_read_counter(
    event_handle: amdsmi_event_handle_t
) -> Dict[str, Any]:
"""
Read the current value from a performance counter.
Retrieves the accumulated counter value along with timing information about
how long the counter has been enabled and actively counting. This allows
calculation of average rates over time.
Parameters:
- event_handle (amdsmi_event_handle_t): Handle returned by amdsmi_gpu_create_counter
Returns:
- dict: Dictionary containing counter information:
  - value (int): Accumulated counter value (event count)
  - time_enabled (int): Total time counter has been enabled (nanoseconds)
  - time_running (int): Total time counter has been actively counting (nanoseconds)
Raises:
- AmdSmiParameterException: If event_handle is invalid
- AmdSmiLibraryException: On read failure
Example:
```python
import amdsmi
import time
from amdsmi import AmdSmiEventType, AmdSmiCounterCommand
amdsmi.amdsmi_init()
device = amdsmi.amdsmi_get_processor_handles()[0]
counter = amdsmi.amdsmi_gpu_create_counter(
    device,
    AmdSmiEventType.XGMI_0_BEATS_TX
)
amdsmi.amdsmi_gpu_control_counter(counter, AmdSmiCounterCommand.CMD_START)
time.sleep(2)
# Read the counter
data = amdsmi.amdsmi_gpu_read_counter(counter)
print(f"Counter value: {data['value']}")
print(f"Time enabled: {data['time_enabled'] / 1e9:.3f} seconds")
print(f"Time running: {data['time_running'] / 1e9:.3f} seconds")
# Calculate average rate
if data['time_running'] > 0:
    rate = data['value'] / (data['time_running'] / 1e9)
    print(f"Average rate: {rate:.2f} events/second")
amdsmi.amdsmi_gpu_control_counter(counter, AmdSmiCounterCommand.CMD_STOP)
amdsmi.amdsmi_gpu_destroy_counter(counter)
amdsmi.amdsmi_shut_down()
```
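The rate arithmetic can be factored into small pure functions that operate on the returned dictionary. This is a minimal sketch with hypothetical helper names; `interval_rate` additionally assumes that value and time_running keep accumulating between successive reads of a running counter, which the text above suggests but which should be verified on the target hardware.

```python
from typing import Mapping


def events_per_second(sample: Mapping[str, int]) -> float:
    """Average rate over the time the counter was actively counting."""
    running_ns = sample["time_running"]
    if running_ns <= 0:
        return 0.0
    return sample["value"] / (running_ns / 1e9)


def interval_rate(previous: Mapping[str, int], current: Mapping[str, int]) -> float:
    """Rate between two reads, assuming both fields accumulate over time."""
    delta_ns = current["time_running"] - previous["time_running"]
    if delta_ns <= 0:
        return 0.0
    return (current["value"] - previous["value"]) / (delta_ns / 1e9)
```

Both functions return 0.0 when no counting time has elapsed, which avoids a division by zero on a counter that was never started.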
"""Release performance counter resources.
def amdsmi_gpu_destroy_counter(event_handle: amdsmi_event_handle_t) -> None:
"""
Destroy a performance counter and release its resources.
Frees the hardware counter resource allocated by amdsmi_gpu_create_counter.
The counter should be stopped before destruction. After calling this function,
the event_handle becomes invalid and should not be used.
Parameters:
- event_handle (amdsmi_event_handle_t): Handle returned by amdsmi_gpu_create_counter
Returns:
- None: Function completes successfully
Raises:
- AmdSmiParameterException: If event_handle is invalid
- AmdSmiLibraryException: If destruction fails
Example:
```python
import amdsmi
from amdsmi import AmdSmiEventType, AmdSmiCounterCommand
amdsmi.amdsmi_init()
device = amdsmi.amdsmi_get_processor_handles()[0]
# Create and use a counter
counter = amdsmi.amdsmi_gpu_create_counter(
    device,
    AmdSmiEventType.XGMI_0_REQUEST_TX
)
amdsmi.amdsmi_gpu_control_counter(counter, AmdSmiCounterCommand.CMD_START)
# ... perform operations ...
amdsmi.amdsmi_gpu_control_counter(counter, AmdSmiCounterCommand.CMD_STOP)
# Clean up the counter
amdsmi.amdsmi_gpu_destroy_counter(counter)
print("Counter destroyed and resources released")
amdsmi.amdsmi_shut_down()
```
"""

Event group classifications for performance counters.
class AmdSmiEventGroup(IntEnum):
    """
    Performance counter event group types.
    Event groups categorize related performance counters and determine
    the pool of available counter resources.
    Values:
    - XGMI: XGMI link activity counters (transactions, requests, responses)
    - XGMI_DATA_OUT: XGMI data transmission counters (outbound data volume)
    - GRP_INVALID: Invalid/unsupported group
    Example:
    ```python
    from amdsmi import AmdSmiEventGroup
    # Use XGMI group for link protocol counters
    group = AmdSmiEventGroup.XGMI
    # Use XGMI_DATA_OUT for data volume counters
    data_group = AmdSmiEventGroup.XGMI_DATA_OUT
    ```
    """
    XGMI = amdsmi_wrapper.AMDSMI_EVNT_GRP_XGMI
    XGMI_DATA_OUT = amdsmi_wrapper.AMDSMI_EVNT_GRP_XGMI_DATA_OUT
    GRP_INVALID = amdsmi_wrapper.AMDSMI_EVNT_GRP_INVALID

Specific event types that can be monitored.
class AmdSmiEventType(IntEnum):
    """
    Specific performance counter event types.
    Defines individual hardware events that can be monitored via performance
    counters. Events are organized by XGMI link and transaction type.
    XGMI Link 0 Events:
    - XGMI_0_NOP_TX: NOP (No Operation) transactions transmitted on link 0
    - XGMI_0_REQUEST_TX: Request packets transmitted on link 0
    - XGMI_0_RESPONSE_TX: Response packets transmitted on link 0
    - XGMI_0_BEATS_TX: Data beats (transfer units) transmitted on link 0
    XGMI Link 1 Events:
    - XGMI_1_NOP_TX: NOP transactions transmitted on link 1
    - XGMI_1_REQUEST_TX: Request packets transmitted on link 1
    - XGMI_1_RESPONSE_TX: Response packets transmitted on link 1
    - XGMI_1_BEATS_TX: Data beats transmitted on link 1
    XGMI Data Output Events:
    - XGMI_DATA_OUT_0 through XGMI_DATA_OUT_5: Data transmitted on links 0-5
    Example:
    ```python
    from amdsmi import AmdSmiEventType
    # Monitor requests on link 0
    event = AmdSmiEventType.XGMI_0_REQUEST_TX
    # Monitor data volume on link 1
    data_event = AmdSmiEventType.XGMI_DATA_OUT_1
    ```
    """
    XGMI_0_NOP_TX = amdsmi_wrapper.AMDSMI_EVNT_XGMI_0_NOP_TX
    XGMI_0_REQUEST_TX = amdsmi_wrapper.AMDSMI_EVNT_XGMI_0_REQUEST_TX
    XGMI_0_RESPONSE_TX = amdsmi_wrapper.AMDSMI_EVNT_XGMI_0_RESPONSE_TX
    XGMI_0_BEATS_TX = amdsmi_wrapper.AMDSMI_EVNT_XGMI_0_BEATS_TX
    XGMI_1_NOP_TX = amdsmi_wrapper.AMDSMI_EVNT_XGMI_1_NOP_TX
    XGMI_1_REQUEST_TX = amdsmi_wrapper.AMDSMI_EVNT_XGMI_1_REQUEST_TX
    XGMI_1_RESPONSE_TX = amdsmi_wrapper.AMDSMI_EVNT_XGMI_1_RESPONSE_TX
    XGMI_1_BEATS_TX = amdsmi_wrapper.AMDSMI_EVNT_XGMI_1_BEATS_TX
    XGMI_DATA_OUT_0 = amdsmi_wrapper.AMDSMI_EVNT_XGMI_DATA_OUT_0
    XGMI_DATA_OUT_1 = amdsmi_wrapper.AMDSMI_EVNT_XGMI_DATA_OUT_1
    XGMI_DATA_OUT_2 = amdsmi_wrapper.AMDSMI_EVNT_XGMI_DATA_OUT_2
    XGMI_DATA_OUT_3 = amdsmi_wrapper.AMDSMI_EVNT_XGMI_DATA_OUT_3
    XGMI_DATA_OUT_4 = amdsmi_wrapper.AMDSMI_EVNT_XGMI_DATA_OUT_4
    XGMI_DATA_OUT_5 = amdsmi_wrapper.AMDSMI_EVNT_XGMI_DATA_OUT_5

Commands for controlling counter operation.
class AmdSmiCounterCommand(IntEnum):
    """
    Performance counter control commands.
    Commands used to start and stop performance counter data collection.
    Values:
    - CMD_START: Begin counter data collection
    - CMD_STOP: Stop counter data collection
    Example:
    ```python
    from amdsmi import AmdSmiCounterCommand
    # Start a counter
    start_cmd = AmdSmiCounterCommand.CMD_START
    # Stop a counter
    stop_cmd = AmdSmiCounterCommand.CMD_STOP
    ```
    """
    CMD_START = amdsmi_wrapper.AMDSMI_CNTR_CMD_START
    CMD_STOP = amdsmi_wrapper.AMDSMI_CNTR_CMD_STOP

This example demonstrates the full lifecycle of performance counters, from checking support through creation, usage, and cleanup.
```python
import amdsmi
import time
from amdsmi import (
    AmdSmiEventGroup,
    AmdSmiEventType,
    AmdSmiCounterCommand
)


def monitor_xgmi_traffic(device, duration=5):
    """Monitor XGMI traffic on link 0 for specified duration."""
    # Check if XGMI counters are supported
    try:
        amdsmi.amdsmi_gpu_counter_group_supported(
            device,
            AmdSmiEventGroup.XGMI
        )
        print("XGMI performance counters are supported")
    except Exception as e:
        print(f"XGMI counters not supported: {e}")
        return
    # Check available counters
    available = amdsmi.amdsmi_get_gpu_available_counters(
        device,
        AmdSmiEventGroup.XGMI
    )
    print(f"Available XGMI counters: {available}")
    # Create counters for different event types
    counters = {}
    event_types = [
        ("NOP_TX", AmdSmiEventType.XGMI_0_NOP_TX),
        ("REQUEST_TX", AmdSmiEventType.XGMI_0_REQUEST_TX),
        ("RESPONSE_TX", AmdSmiEventType.XGMI_0_RESPONSE_TX),
        ("BEATS_TX", AmdSmiEventType.XGMI_0_BEATS_TX),
    ]
    print("\nCreating counters...")
    for name, event_type in event_types:
        try:
            handle = amdsmi.amdsmi_gpu_create_counter(device, event_type)
            counters[name] = handle
            print(f" Created {name} counter")
        except Exception as e:
            print(f" Failed to create {name} counter: {e}")
    if not counters:
        print("No counters created, exiting")
        return
    # Start all counters
    print("\nStarting counters...")
    for name, handle in counters.items():
        try:
            amdsmi.amdsmi_gpu_control_counter(
                handle,
                AmdSmiCounterCommand.CMD_START
            )
            print(f" Started {name} counter")
        except Exception as e:
            print(f" Failed to start {name} counter: {e}")
    # Monitor for specified duration
    print(f"\nCollecting data for {duration} seconds...")
    time.sleep(duration)
    # Stop all counters
    print("\nStopping counters...")
    for name, handle in counters.items():
        try:
            amdsmi.amdsmi_gpu_control_counter(
                handle,
                AmdSmiCounterCommand.CMD_STOP
            )
        except Exception as e:
            print(f" Failed to stop {name} counter: {e}")
    # Read and display results
    print("\nCounter Results:")
    print("-" * 70)
    for name, handle in counters.items():
        try:
            data = amdsmi.amdsmi_gpu_read_counter(handle)
            time_sec = data['time_running'] / 1e9
            rate = data['value'] / time_sec if time_sec > 0 else 0
            print(f"{name:15} Value: {data['value']:15,d}")
            print(f" Time enabled: {data['time_enabled'] / 1e9:10.3f} sec")
            print(f" Time running: {time_sec:10.3f} sec")
            print(f" Average rate: {rate:15,.2f} events/sec")
            print()
        except Exception as e:
            print(f" Failed to read {name} counter: {e}")
    # Clean up all counters
    print("Cleaning up counters...")
    for name, handle in counters.items():
        try:
            amdsmi.amdsmi_gpu_destroy_counter(handle)
            print(f" Destroyed {name} counter")
        except Exception as e:
            print(f" Failed to destroy {name} counter: {e}")


def main():
    """Main function to demonstrate performance counter usage."""
    try:
        # Initialize library
        amdsmi.amdsmi_init()
        print("AMD SMI library initialized\n")
        # Get GPU devices
        devices = amdsmi.amdsmi_get_processor_handles()
        if not devices:
            print("No AMD GPU devices found")
            return
        print(f"Found {len(devices)} GPU device(s)\n")
        # Monitor the first device
        device = devices[0]
        bdf = amdsmi.amdsmi_get_gpu_device_bdf(device)
        print(f"Monitoring device: {bdf}")
        print("=" * 70)
        monitor_xgmi_traffic(device, duration=5)
    except Exception as e:
        print(f"Error: {e}")
    finally:
        # Cleanup; ignore shutdown errors but do not swallow KeyboardInterrupt
        try:
            amdsmi.amdsmi_shut_down()
            print("\nAMD SMI library shut down")
        except Exception:
            pass


if __name__ == "__main__":
    main()
```

- Call amdsmi_get_gpu_available_counters() to determine limits
- Call amdsmi_gpu_destroy_counter() when done
- Use time_running for accurate rate calculations

Performance counters primarily monitor XGMI (AMD's high-speed GPU-to-GPU interconnect):
- amdsmi_get_xgmi_info(): Get XGMI configuration details
- amdsmi_get_link_metrics(): Get bandwidth and other link metrics
- amdsmi_topo_get_link_type(): Identify link types between processors
- amdsmi_get_utilization_count(): Higher-level utilization metrics