tessl/pypi-amdsmi

Python library for monitoring and managing AMD GPUs and CPUs with programmatic hardware metrics access

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/amdsmi@7.0.x

To install, run

npx @tessl/cli install tessl/pypi-amdsmi@7.0.0


AMDSMI - AMD System Management Interface

The AMD System Management Interface (AMDSMI) Python library monitors and manages AMD GPUs and CPUs. It wraps the native AMD SMI C library in Python bindings and gives programmatic access to hardware metrics such as GPU utilization, temperature, power consumption, memory usage, and clock frequencies, along with CPU metrics.

Package Information

  • Package Name: amdsmi
  • Language: Python
  • Installation: pip install amdsmi
  • Requirements: Python >= 3.6, AMD hardware with ROCm support
  • Version: 7.0.2

Quick Start

import amdsmi

# Initialize library for GPU monitoring
amdsmi.amdsmi_init(amdsmi.AmdSmiInitFlags.INIT_AMD_GPUS)

try:
    # Get list of GPU devices
    devices = amdsmi.amdsmi_get_processor_handles()

    # Query GPU information
    for device in devices:
        # Get ASIC information
        asic_info = amdsmi.amdsmi_get_gpu_asic_info(device)
        print(f"GPU: {asic_info['market_name']}")

        # Get temperature
        temp = amdsmi.amdsmi_get_temp_metric(
            device,
            amdsmi.AmdSmiTemperatureType.EDGE,
            amdsmi.AmdSmiTemperatureMetric.CURRENT
        )
        print(f"Temperature: {temp} C")

        # Get GPU activity
        activity = amdsmi.amdsmi_get_gpu_activity(device)
        print(f"GFX Activity: {activity['gfx_activity']}%")

        # Get power consumption
        power = amdsmi.amdsmi_get_power_info(device)
        print(f"Power: {power['current_socket_power']} W")

        # Get VRAM usage
        vram = amdsmi.amdsmi_get_gpu_vram_usage(device)
        print(f"VRAM Used: {vram['vram_used']} MB")

finally:
    # Always clean up
    amdsmi.amdsmi_shut_down()

For detailed getting started instructions, see the Quick Start Guide.

Core Concepts

Architecture

The AMDSMI library is organized around these key components:

  • Processor Handles: Opaque references to hardware devices (GPUs, CPUs) used for all device operations
  • Initialization Flags: Control which device types to initialize (GPUs, CPUs, APUs)
  • Exception System: Comprehensive error handling with specific exception types for different error conditions
  • Enum Types: Type-safe constants for device types, clock types, temperature sensors, and other hardware parameters
  • C Library Wrapper: Low-level ctypes bindings to the native AMD SMI C library

Supported Hardware

  • AMD GPUs: Comprehensive monitoring and control for all ROCm-compatible GPUs
  • AMD CPUs: ESMI-based CPU monitoring (requires ESMI library support)
  • Multiple Devices: Simultaneous monitoring of multiple GPUs/CPUs
  • Hot-plugging: Dynamic device discovery

Capabilities Overview

Library Initialization and Lifecycle

Initialize and shut down the AMDSMI library. amdsmi_init() must be called before any other function, and amdsmi_shut_down() should be called once the library is no longer needed.

Key Functions:

  • amdsmi_init(flag) - Initialize the library
  • amdsmi_shut_down() - Shut down and release resources
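
The init/shut-down pairing described above can be wrapped in a context manager so that cleanup is never skipped. This is a minimal sketch, not part of the library: amdsmi_session is a hypothetical helper name, and the amdsmi module is passed in as the smi argument (an assumption of this sketch) so the helper can be exercised against a stub on machines without AMD hardware.

```python
from contextlib import contextmanager


@contextmanager
def amdsmi_session(smi, flag=None):
    """Initialize AMDSMI on entry and always shut it down on exit.

    `smi` is the imported amdsmi module (hypothetical parameter of this
    sketch). When `flag` is None the library's default init flag is used.
    """
    if flag is None:
        smi.amdsmi_init()
    else:
        smi.amdsmi_init(flag)
    try:
        yield smi
    finally:
        # Runs even when the body of the `with` block raises.
        smi.amdsmi_shut_down()


# Intended usage on a machine with the library installed:
#
#   import amdsmi
#   with amdsmi_session(amdsmi, amdsmi.AmdSmiInitFlags.INIT_AMD_GPUS) as smi:
#       devices = smi.amdsmi_get_processor_handles()
```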

Detailed Reference

Device Discovery

Discover and identify available processors (GPUs and CPUs), retrieve device handles, and query device enumeration information.

Key Functions:

  • amdsmi_get_processor_handles() - Get all processor handles
  • amdsmi_get_processor_type(handle) - Get processor type
  • amdsmi_get_gpu_device_bdf(handle) - Get BDF identifier
  • amdsmi_get_gpu_device_uuid(handle) - Get UUID

Detailed Reference

GPU Monitoring

Real-time monitoring of GPU activity, utilization, VRAM usage, power consumption, clock frequencies, and PCIe information.

Key Functions:

  • amdsmi_get_gpu_activity(handle) - Get utilization metrics
  • amdsmi_get_gpu_vram_usage(handle) - Get VRAM statistics
  • amdsmi_get_power_info(handle) - Get power consumption
  • amdsmi_get_clock_info(handle, clock_type) - Get clock frequencies
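
A monitoring loop usually wants one flat record per device. The sketch below combines three of the calls listed above into such a record, using the dictionary keys shown in the Quick Start. It assumes that a field the hardware does not report may come back as a non-numeric placeholder (for example the string "N/A") and maps any such value to None; gpu_snapshot, to_number, and the smi parameter are hypothetical names.

```python
def to_number(value):
    """Return value unchanged if it is an int or float, else None."""
    if isinstance(value, bool):
        return None  # bool is a subclass of int; treat it as non-numeric
    if isinstance(value, (int, float)):
        return value
    return None


def gpu_snapshot(smi, handle):
    """Collect activity, power, and VRAM usage for one GPU handle."""
    activity = smi.amdsmi_get_gpu_activity(handle)
    power = smi.amdsmi_get_power_info(handle)
    vram = smi.amdsmi_get_gpu_vram_usage(handle)
    return {
        "gfx_activity_pct": to_number(activity.get("gfx_activity")),
        "socket_power": to_number(power.get("current_socket_power")),
        "vram_used": to_number(vram.get("vram_used")),
    }
```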

Detailed Reference

GPU Temperature and Thermal

Monitor GPU temperatures across the available sensors, including edge, hotspot (junction), VRAM, and HBM.

Key Functions:

  • amdsmi_get_temp_metric(handle, sensor_type, metric) - Get temperature metric
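
Not every device exposes every sensor, so a reader that tolerates missing sensors can be useful. The sketch below reads the current temperature from several sensors and records None where a read fails. AmdSmiTemperatureType.EDGE and AmdSmiTemperatureMetric.CURRENT appear in the Quick Start; the member names HOTSPOT and VRAM are assumptions based on the sensor list above, and read_temperatures and smi are hypothetical names.

```python
def read_temperatures(smi, handle, sensor_names=("EDGE", "HOTSPOT", "VRAM")):
    """Return {sensor_name: current temperature or None} for one GPU.

    `sensor_names` are looked up as members of smi.AmdSmiTemperatureType;
    names other than EDGE are assumptions of this sketch.
    """
    readings = {}
    for name in sensor_names:
        sensor = getattr(smi.AmdSmiTemperatureType, name)
        try:
            readings[name] = smi.amdsmi_get_temp_metric(
                handle, sensor, smi.AmdSmiTemperatureMetric.CURRENT
            )
        except smi.AmdSmiException:
            # The read failed, e.g. the sensor is absent on this device.
            readings[name] = None
    return readings
```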

Detailed Reference

GPU Device Information

Query static GPU device information including ASIC details, VBIOS, firmware versions, driver information, and board information.

Key Functions:

  • amdsmi_get_gpu_asic_info(handle) - Get ASIC information
  • amdsmi_get_gpu_driver_info(handle) - Get driver information
  • amdsmi_get_gpu_vbios_info(handle) - Get VBIOS information

Detailed Reference

GPU Memory

Query GPU memory information including total memory, usage, memory types, and reserved pages.

Key Functions:

  • amdsmi_get_gpu_memory_total(handle, mem_type) - Get total memory
  • amdsmi_get_gpu_memory_usage(handle, mem_type) - Get memory usage
  • amdsmi_get_gpu_vram_info(handle) - Get detailed VRAM information
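
The total and usage queries above can be combined into a utilization ratio. This is a minimal sketch under the assumption that both calls return plain integers in the same unit; memory_utilization and smi are hypothetical names, and the memory type is supplied by the caller (for example a member of the library's memory-type enum).

```python
def memory_utilization(smi, handle, mem_type):
    """Return used/total for the given memory type, or None if total is 0."""
    total = smi.amdsmi_get_gpu_memory_total(handle, mem_type)
    used = smi.amdsmi_get_gpu_memory_usage(handle, mem_type)
    if not total:
        return None  # avoid dividing by zero when no memory is reported
    return used / total
```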

Detailed Reference

GPU Configuration and Control

Configure and control GPU settings including power caps, clock frequencies, fan speeds, performance levels, and overdrive settings.

Key Functions:

  • amdsmi_set_power_cap(handle, sensor_ind, cap) - Set power cap
  • amdsmi_set_gpu_perf_level(handle, perf_level) - Set performance level
  • amdsmi_set_gpu_fan_speed(handle, sensor_ind, speed) - Set fan speed
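
Because the setters change hardware state, it can be worth validating a request before sending it. The sketch below clamps a requested power cap into a caller-supplied range before calling amdsmi_set_power_cap. The limits, the helper names, and the smi parameter are hypothetical; all values are assumed to be in whatever unit the library expects, and the call itself will typically need elevated privileges.

```python
def clamp_power_cap(requested, minimum, maximum):
    """Clamp `requested` into the inclusive range [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(requested, maximum))


def set_power_cap_checked(smi, handle, requested, minimum, maximum, sensor_ind=0):
    """Apply a power cap after clamping it; return the value actually sent."""
    cap = clamp_power_cap(requested, minimum, maximum)
    smi.amdsmi_set_power_cap(handle, sensor_ind, cap)
    return cap
```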

Detailed Reference

GPU Performance Counters

Create, control, and read GPU performance counters for detailed performance analysis.

Key Functions:

  • amdsmi_gpu_create_counter(handle, event_type) - Create counter
  • amdsmi_gpu_read_counter(counter_handle) - Read counter value

Detailed Reference

GPU Error and RAS

Query GPU error information including ECC errors, RAS features, and error counts for various GPU blocks.

Key Functions:

  • amdsmi_get_gpu_ecc_count(handle, block) - Get ECC error count
  • amdsmi_get_gpu_ecc_enabled(handle) - Check if ECC is enabled
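
Per-block ECC counts are often aggregated into a single health figure. The sketch below sums counts over a caller-supplied list of blocks and skips blocks whose query fails. It assumes each result is a dictionary with the keys correctable_count and uncorrectable_count; those key names, total_ecc_errors, and smi are assumptions of this sketch.

```python
def total_ecc_errors(smi, handle, blocks):
    """Sum correctable and uncorrectable ECC counts across `blocks`."""
    totals = {"correctable_count": 0, "uncorrectable_count": 0}
    for block in blocks:
        try:
            counts = smi.amdsmi_get_gpu_ecc_count(handle, block)
        except smi.AmdSmiLibraryException:
            continue  # the query failed for this block; skip it
        for key in totals:
            value = counts.get(key, 0)
            if isinstance(value, int):
                totals[key] += value
    return totals
```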

Detailed Reference

GPU Process Information

Monitor GPU processes, including process list and per-process resource usage.

Key Functions:

  • amdsmi_get_gpu_process_list(handle) - Get process list
  • amdsmi_get_gpu_compute_process_info() - Get compute process info

Detailed Reference

Hardware Topology

Query hardware topology including NUMA affinity, P2P connectivity, link metrics, and XGMI information.

Key Functions:

  • amdsmi_topo_get_numa_node_number(handle) - Get NUMA node
  • amdsmi_is_P2P_accessible(handle_src, handle_dst) - Check P2P accessibility
  • amdsmi_get_link_metrics(handle) - Get link metrics
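
To illustrate the P2P check above, the following sketch builds a pairwise accessibility matrix for a list of GPU handles. It assumes amdsmi_is_P2P_accessible returns a truthy value when the source can reach the destination; p2p_matrix and smi are hypothetical names, and the diagonal is filled in without querying the library.

```python
def p2p_matrix(smi, handles):
    """Return an n x n list of booleans: matrix[i][j] is True when
    handles[i] can access handles[j] peer-to-peer."""
    n = len(handles)
    matrix = [[False] * n for _ in range(n)]
    for i, src in enumerate(handles):
        for j, dst in enumerate(handles):
            if i == j:
                matrix[i][j] = True  # convention of this sketch: self is reachable
            else:
                matrix[i][j] = bool(smi.amdsmi_is_P2P_accessible(src, dst))
    return matrix
```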

Detailed Reference

GPU Partitioning

Configure and query GPU compute and memory partition modes on hardware that supports partitioning.

Key Functions:

  • amdsmi_get_gpu_compute_partition(handle) - Get compute partition
  • amdsmi_set_gpu_compute_partition(handle, partition_type) - Set compute partition
  • amdsmi_get_gpu_memory_partition(handle) - Get memory partition
  • amdsmi_set_gpu_memory_partition(handle, memory_partition) - Set memory partition

Detailed Reference

CPU Monitoring

Monitor AMD CPUs including energy consumption, power, temperature, frequencies, and HSMP metrics (requires ESMI library support).

Key Functions:

  • amdsmi_get_cpusocket_handles() - Get CPU socket handles
  • amdsmi_get_cpu_socket_power(socket_handle) - Get CPU power
  • amdsmi_get_cpu_socket_temperature(socket_handle) - Get CPU temperature
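
CPU queries follow the same handle-based pattern as the GPU ones. The sketch below reads power for every CPU socket and records None where a read fails, for instance when ESMI support is missing. It assumes the library was initialized with a flag that includes CPUs; cpu_socket_power and smi are hypothetical names.

```python
def cpu_socket_power(smi):
    """Return a list with one power reading (or None) per CPU socket."""
    readings = []
    for socket in smi.amdsmi_get_cpusocket_handles():
        try:
            readings.append(smi.amdsmi_get_cpu_socket_power(socket))
        except smi.AmdSmiLibraryException:
            readings.append(None)  # read failed for this socket
    return readings
```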

Detailed Reference

Event Handling

Read and monitor asynchronous GPU events including thermal throttling, VM faults, and GPU resets.

Key Classes:

  • AmdSmiEventReader - Event reader class

Detailed Reference

Version and Library Information

Query library version, ROCm version, and convert status codes to strings.

Key Functions:

  • amdsmi_get_lib_version() - Get library version
  • amdsmi_get_rocm_version() - Get ROCm version
  • amdsmi_status_code_to_string(status) - Convert status code to string

Detailed Reference

Quick Reference

Core Imports

import amdsmi
from amdsmi import (
    amdsmi_init,
    amdsmi_shut_down,
    amdsmi_get_processor_handles,
    amdsmi_get_gpu_activity,
    AmdSmiInitFlags,
    AmdSmiTemperatureType,
    AmdSmiTemperatureMetric
)

Common Initialization Flags

  • AmdSmiInitFlags.INIT_ALL_PROCESSORS - Initialize all processor types
  • AmdSmiInitFlags.INIT_AMD_CPUS - Initialize AMD CPUs only
  • AmdSmiInitFlags.INIT_AMD_GPUS - Initialize AMD GPUs only (default)
  • AmdSmiInitFlags.INIT_AMD_APUS - Initialize AMD APUs only

Exception Hierarchy

  • AmdSmiException (base)
    • AmdSmiLibraryException (with error code mapping)
      • AmdSmiRetryException
      • AmdSmiTimeoutException
    • AmdSmiParameterException
    • AmdSmiKeyException
    • AmdSmiBdfFormatException
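
The hierarchy above lets callers treat transient failures differently from hard errors. As a sketch of one possible policy (not something the library prescribes), the helper below retries a call when AmdSmiRetryException or AmdSmiTimeoutException is raised and lets every other exception propagate. call_with_retry, its parameters, and smi are hypothetical names.

```python
import time


def call_with_retry(smi, func, *args, attempts=3, delay_s=0.1):
    """Call func(*args), retrying on retry/timeout exceptions.

    Sleeps delay_s * attempt_number between tries and re-raises the last
    transient exception once `attempts` is exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except (smi.AmdSmiRetryException, smi.AmdSmiTimeoutException):
            if attempt == attempts:
                raise
            time.sleep(delay_s * attempt)
```

Because both transient types derive from AmdSmiLibraryException, a surrounding except AmdSmiLibraryException clause would still catch them after the final attempt.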

For complete type definitions and constants, see the Type Reference.

Documentation Structure

Guides

Step-by-step instructions for common workflows:

Examples

Real-world usage scenarios:

Reference

Complete API documentation: