Functions for monitoring GPU temperature and voltage metrics. These functions provide access to various thermal sensors and voltage rails on AMD GPUs, including edge temperatures, hotspot temperatures, HBM memory temperatures, and voltage readings for different power domains.
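Both query functions documented below, amdsmi_get_temp_metric() and amdsmi_get_gpu_volt_metric(), return plain integers in milli-units: millidegrees Celsius for temperature and millivolts for voltage, according to their Returns sections. The sketch below collects those conversions into two small helpers so callers do not scatter `/ 1000.0` through their code. The helper names `mdeg_to_celsius`, `mv_to_volts`, and `describe` are hypothetical and are not part of amdsmi.

```python
# Hypothetical helpers (not part of amdsmi) for the milli-unit integers
# returned by amdsmi_get_temp_metric() and amdsmi_get_gpu_volt_metric().


def mdeg_to_celsius(millidegrees: int) -> float:
    """Convert a temperature in millidegrees Celsius to degrees Celsius."""
    return millidegrees / 1000.0


def mv_to_volts(millivolts: int) -> float:
    """Convert a voltage in millivolts to volts."""
    return millivolts / 1000.0


def describe(temp_mdeg: int, volt_mv: int) -> str:
    """Format one temperature reading and one voltage reading for display."""
    return f"{mdeg_to_celsius(temp_mdeg):.1f}°C, {mv_to_volts(volt_mv):.3f}V"


# Made-up readings, so no GPU is needed to run this:
print(describe(65000, 850))  # prints "65.0°C, 0.850V"
```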
Query temperature readings from various GPU thermal sensors.
````python
def amdsmi_get_temp_metric(
    processor_handle: processor_handle,
    sensor_type: AmdSmiTemperatureType,
    metric: AmdSmiTemperatureMetric
) -> int:
    """
    Get a temperature metric for a specific sensor on the GPU.

    Retrieves temperature readings from the GPU's thermal sensors, including
    the GPU edge, hotspot (junction), VRAM, HBM memory, PLX switch, and board
    components. Temperature values are returned in millidegrees Celsius (m°C).

    Parameters:
    - processor_handle (processor_handle): Handle for the GPU device to query
    - sensor_type (AmdSmiTemperatureType): The temperature sensor to read from.
      Common sensor types:
      - EDGE: GPU edge temperature (die edge)
      - HOTSPOT: GPU hotspot/junction temperature (highest die temperature)
      - JUNCTION: GPU junction temperature (similar to hotspot)
      - VRAM: VRAM temperature
      - HBM_0, HBM_1, HBM_2, HBM_3: Individual HBM memory stack temperatures
      - PLX: PLX switch temperature
      - GPUBOARD_*: Various board component temperatures
      - BASEBOARD_*: Baseboard system temperatures
    - metric (AmdSmiTemperatureMetric): The temperature metric to retrieve.
      Available metrics:
      - CURRENT: Current temperature reading
      - MAX: Maximum temperature threshold
      - MIN: Minimum temperature threshold
      - MAX_HYST: Maximum threshold hysteresis
      - MIN_HYST: Minimum threshold hysteresis
      - CRITICAL: Critical temperature threshold
      - CRITICAL_HYST: Critical threshold hysteresis
      - EMERGENCY: Emergency temperature threshold
      - EMERGENCY_HYST: Emergency threshold hysteresis
      - CRIT_MIN: Critical minimum temperature threshold
      - CRIT_MIN_HYST: Critical minimum threshold hysteresis
      - OFFSET: Temperature offset value
      - LOWEST: Lowest recorded temperature
      - HIGHEST: Highest recorded temperature

    Returns:
    - int: Temperature value in millidegrees Celsius (m°C).
      Divide by 1000 to get degrees Celsius.

    Raises:
    - AmdSmiParameterException: If any parameter is invalid
    - AmdSmiLibraryException: If unable to retrieve temperature or sensor not supported

    Example:
    ```python
    import amdsmi

    amdsmi.amdsmi_init()
    try:
        devices = amdsmi.amdsmi_get_processor_handles()
        gpu = devices[0]
        # Get current edge temperature
        edge_temp = amdsmi.amdsmi_get_temp_metric(
            gpu,
            amdsmi.AmdSmiTemperatureType.EDGE,
            amdsmi.AmdSmiTemperatureMetric.CURRENT
        )
        print(f"GPU Edge Temperature: {edge_temp / 1000.0}°C")
        # Get hotspot temperature
        hotspot_temp = amdsmi.amdsmi_get_temp_metric(
            gpu,
            amdsmi.AmdSmiTemperatureType.HOTSPOT,
            amdsmi.AmdSmiTemperatureMetric.CURRENT
        )
        print(f"GPU Hotspot Temperature: {hotspot_temp / 1000.0}°C")
        # Get critical temperature threshold
        crit_temp = amdsmi.amdsmi_get_temp_metric(
            gpu,
            amdsmi.AmdSmiTemperatureType.EDGE,
            amdsmi.AmdSmiTemperatureMetric.CRITICAL
        )
        print(f"Critical Temperature: {crit_temp / 1000.0}°C")
        # Get VRAM temperature
        vram_temp = amdsmi.amdsmi_get_temp_metric(
            gpu,
            amdsmi.AmdSmiTemperatureType.VRAM,
            amdsmi.AmdSmiTemperatureMetric.CURRENT
        )
        print(f"VRAM Temperature: {vram_temp / 1000.0}°C")
    finally:
        amdsmi.amdsmi_shut_down()
    ```
    """
````

Query voltage readings from GPU power rails.
````python
def amdsmi_get_gpu_volt_metric(
    processor_handle: processor_handle,
    sensor_type: AmdSmiVoltageType,
    metric: AmdSmiVoltageMetric
) -> int:
    """
    Get voltage metric for a specific voltage rail on the GPU.

    Retrieves voltage readings from GPU power rails. Voltage values are returned
    in millivolts (mV).

    Parameters:
    - processor_handle (processor_handle): Handle for the GPU device to query
    - sensor_type (AmdSmiVoltageType): The voltage rail to read from.
      Available voltage types:
      - VDDGFX: GPU graphics voltage (core voltage)
      - VDDBOARD: Board voltage
      - INVALID: Invalid voltage type (used for error handling)
    - metric (AmdSmiVoltageMetric): The voltage metric to retrieve.
      Available metrics:
      - CURRENT: Current voltage reading
      - MAX: Maximum voltage threshold
      - MIN: Minimum voltage threshold
      - MIN_CRIT: Critical minimum voltage threshold
      - MAX_CRIT: Critical maximum voltage threshold
      - AVERAGE: Average voltage
      - LOWEST: Lowest recorded voltage
      - HIGHEST: Highest recorded voltage

    Returns:
    - int: Voltage value in millivolts (mV).
      Divide by 1000 to get volts (V).

    Raises:
    - AmdSmiParameterException: If any parameter is invalid
    - AmdSmiLibraryException: If unable to retrieve voltage or sensor not supported

    Example:
    ```python
    import amdsmi

    amdsmi.amdsmi_init()
    try:
        devices = amdsmi.amdsmi_get_processor_handles()
        gpu = devices[0]
        # Get current GPU core voltage
        vddgfx = amdsmi.amdsmi_get_gpu_volt_metric(
            gpu,
            amdsmi.AmdSmiVoltageType.VDDGFX,
            amdsmi.AmdSmiVoltageMetric.CURRENT
        )
        print(f"GPU Core Voltage (VDDGFX): {vddgfx / 1000.0}V ({vddgfx}mV)")
        # Get average voltage
        avg_voltage = amdsmi.amdsmi_get_gpu_volt_metric(
            gpu,
            amdsmi.AmdSmiVoltageType.VDDGFX,
            amdsmi.AmdSmiVoltageMetric.AVERAGE
        )
        print(f"Average Voltage: {avg_voltage / 1000.0}V")
        # Get board voltage
        board_voltage = amdsmi.amdsmi_get_gpu_volt_metric(
            gpu,
            amdsmi.AmdSmiVoltageType.VDDBOARD,
            amdsmi.AmdSmiVoltageMetric.CURRENT
        )
        print(f"Board Voltage: {board_voltage / 1000.0}V")
    finally:
        amdsmi.amdsmi_shut_down()
    ```
    """
````

```python
class AmdSmiTemperatureType(IntEnum):
    """
    Temperature sensor types available on AMD GPUs.
    These enums identify different thermal sensors on the GPU and board components.
    Not all sensors are available on all GPU models.
    """
    # Primary GPU sensors
    EDGE = ...  # GPU edge temperature (die edge)
    HOTSPOT = ...  # GPU hotspot/junction temperature
    JUNCTION = ...  # GPU junction temperature
    VRAM = ...  # VRAM temperature
    # HBM memory stack sensors (for GPUs with HBM memory)
    HBM_0 = ...  # HBM stack 0 temperature
    HBM_1 = ...  # HBM stack 1 temperature
    HBM_2 = ...  # HBM stack 2 temperature
    HBM_3 = ...  # HBM stack 3 temperature
    # PCIe and interconnect
    PLX = ...  # PLX switch temperature
    # GPU board node sensors
    GPUBOARD_NODE_RETIMER_X = ...  # Retimer X temperature
    GPUBOARD_NODE_OAM_X_IBC = ...  # OAM X IBC temperature
    GPUBOARD_NODE_OAM_X_IBC_2 = ...  # OAM X IBC 2 temperature
    GPUBOARD_NODE_OAM_X_VDD18_VR = ...  # OAM X 1.8V VR temperature
    GPUBOARD_NODE_OAM_X_04_HBM_B_VR = ...  # OAM X 0.4V HBM B VR temperature
    GPUBOARD_NODE_OAM_X_04_HBM_D_VR = ...  # OAM X 0.4V HBM D VR temperature
    GPUBOARD_NODE_LAST = ...  # Last GPU board node sensor
    # GPU board voltage regulator sensors
    GPUBOARD_VDDCR_VDD0 = ...  # VDDCR VDD0 VR temperature
    GPUBOARD_VDDCR_VDD1 = ...  # VDDCR VDD1 VR temperature
    GPUBOARD_VDDCR_VDD2 = ...  # VDDCR VDD2 VR temperature
    GPUBOARD_VDDCR_VDD3 = ...  # VDDCR VDD3 VR temperature
    GPUBOARD_VDDCR_SOC_A = ...  # VDDCR SOC A VR temperature
    GPUBOARD_VDDCR_SOC_C = ...  # VDDCR SOC C VR temperature
    GPUBOARD_VDDCR_SOCIO_A = ...  # VDDCR SOCIO A VR temperature
    GPUBOARD_VDDCR_SOCIO_C = ...  # VDDCR SOCIO C VR temperature
    GPUBOARD_VDD_085_HBM = ...  # VDD 0.85V HBM VR temperature
    GPUBOARD_VDDCR_11_HBM_B = ...  # VDDCR 1.1V HBM B VR temperature
    GPUBOARD_VDDCR_11_HBM_D = ...  # VDDCR 1.1V HBM D VR temperature
    GPUBOARD_VDD_USR = ...  # VDD USR VR temperature
    GPUBOARD_VDDIO_11_E32 = ...  # VDDIO 1.1V E32 VR temperature
    GPUBOARD_VR_LAST = ...  # Last GPU board VR sensor
    # Baseboard system sensors
    BASEBOARD_UBB_FPGA = ...  # UBB FPGA temperature
    BASEBOARD_UBB_FRONT = ...  # UBB front temperature
    BASEBOARD_UBB_BACK = ...  # UBB back temperature
    BASEBOARD_UBB_OAM7 = ...  # UBB OAM7 temperature
    BASEBOARD_UBB_IBC = ...  # UBB IBC temperature
    BASEBOARD_UBB_UFPGA = ...  # UBB UFPGA temperature
    BASEBOARD_UBB_OAM1 = ...  # UBB OAM1 temperature
    BASEBOARD_OAM_0_1_HSC = ...  # OAM 0-1 HSC temperature
    BASEBOARD_OAM_2_3_HSC = ...  # OAM 2-3 HSC temperature
    BASEBOARD_OAM_4_5_HSC = ...  # OAM 4-5 HSC temperature
    BASEBOARD_OAM_6_7_HSC = ...  # OAM 6-7 HSC temperature
    BASEBOARD_UBB_FPGA_0V72_VR = ...  # UBB FPGA 0.72V VR temperature
    BASEBOARD_UBB_FPGA_3V3_VR = ...  # UBB FPGA 3.3V VR temperature
    BASEBOARD_RETIMER_0_1_2_3_1V2_VR = ...  # Retimer 0-1-2-3 1.2V VR temperature
    BASEBOARD_RETIMER_4_5_6_7_1V2_VR = ...  # Retimer 4-5-6-7 1.2V VR temperature
    BASEBOARD_RETIMER_0_1_0V9_VR = ...  # Retimer 0-1 0.9V VR temperature
    BASEBOARD_RETIMER_4_5_0V9_VR = ...  # Retimer 4-5 0.9V VR temperature
    BASEBOARD_RETIMER_2_3_0V9_VR = ...  # Retimer 2-3 0.9V VR temperature
    BASEBOARD_RETIMER_6_7_0V9_VR = ...  # Retimer 6-7 0.9V VR temperature
    BASEBOARD_OAM_0_1_2_3_3V3_VR = ...  # OAM 0-1-2-3 3.3V VR temperature
    BASEBOARD_OAM_4_5_6_7_3V3_VR = ...  # OAM 4-5-6-7 3.3V VR temperature
    BASEBOARD_IBC_HSC = ...  # IBC HSC temperature
    BASEBOARD_IBC = ...  # IBC temperature
    BASEBOARD_LAST = ...  # Last baseboard sensor
    BASEBOARD__MAX = ...  # Maximum temperature type value


class AmdSmiTemperatureMetric(IntEnum):
    """
    Temperature metric types for querying different temperature values.
    These metrics specify what type of temperature value to retrieve for a
    given sensor.
    """
    CURRENT = ...  # Current temperature reading
    MAX = ...  # Maximum temperature threshold
    MIN = ...  # Minimum temperature threshold
    MAX_HYST = ...  # Maximum threshold hysteresis
    MIN_HYST = ...  # Minimum threshold hysteresis
    CRITICAL = ...  # Critical temperature threshold
    CRITICAL_HYST = ...  # Critical threshold hysteresis
    EMERGENCY = ...  # Emergency temperature threshold
    EMERGENCY_HYST = ...  # Emergency threshold hysteresis
    CRIT_MIN = ...  # Critical minimum temperature threshold
    CRIT_MIN_HYST = ...  # Critical minimum threshold hysteresis
    OFFSET = ...  # Temperature offset value
    LOWEST = ...  # Lowest recorded temperature
    HIGHEST = ...  # Highest recorded temperature


class AmdSmiVoltageType(IntEnum):
    """
    Voltage rail types available on AMD GPUs.
    These enums identify different voltage rails that can be monitored.
    """
    VDDGFX = ...  # GPU graphics core voltage
    VDDBOARD = ...  # Board voltage
    INVALID = ...  # Invalid voltage type


class AmdSmiVoltageMetric(IntEnum):
    """
    Voltage metric types for querying different voltage values.
    These metrics specify what type of voltage value to retrieve for a
    given voltage rail.
    """
    CURRENT = ...  # Current voltage reading
    MAX = ...  # Maximum voltage threshold
    MIN = ...  # Minimum voltage threshold
    MIN_CRIT = ...  # Critical minimum voltage threshold
    MAX_CRIT = ...  # Critical maximum voltage threshold
    AVERAGE = ...  # Average voltage
    LOWEST = ...  # Lowest recorded voltage
    HIGHEST = ...  # Highest recorded voltage
```

Monitor essential GPU temperatures:
```python
import amdsmi

amdsmi.amdsmi_init()
try:
    devices = amdsmi.amdsmi_get_processor_handles()
    for i, gpu in enumerate(devices):
        print(f"\nGPU {i} Temperatures:")
        # Edge temperature (most common)
        edge_temp = amdsmi.amdsmi_get_temp_metric(
            gpu,
            amdsmi.AmdSmiTemperatureType.EDGE,
            amdsmi.AmdSmiTemperatureMetric.CURRENT
        )
        print(f" Edge: {edge_temp / 1000.0:.1f}°C")
        # Hotspot temperature (highest die temperature)
        try:
            hotspot_temp = amdsmi.amdsmi_get_temp_metric(
                gpu,
                amdsmi.AmdSmiTemperatureType.HOTSPOT,
                amdsmi.AmdSmiTemperatureMetric.CURRENT
            )
            print(f" Hotspot: {hotspot_temp / 1000.0:.1f}°C")
        except amdsmi.AmdSmiLibraryException:
            print(" Hotspot: Not available")
        # Memory temperature
        try:
            mem_temp = amdsmi.amdsmi_get_temp_metric(
                gpu,
                amdsmi.AmdSmiTemperatureType.VRAM,
                amdsmi.AmdSmiTemperatureMetric.CURRENT
            )
            print(f" VRAM: {mem_temp / 1000.0:.1f}°C")
        except amdsmi.AmdSmiLibraryException:
            print(" VRAM: Not available")
finally:
    amdsmi.amdsmi_shut_down()
```

Check temperatures against critical thresholds:
```python
import amdsmi


def check_thermal_status(gpu_handle):
    """Check if GPU is approaching thermal limits."""
    # Get current and critical temperatures
    current = amdsmi.amdsmi_get_temp_metric(
        gpu_handle,
        amdsmi.AmdSmiTemperatureType.EDGE,
        amdsmi.AmdSmiTemperatureMetric.CURRENT
    )
    try:
        critical = amdsmi.amdsmi_get_temp_metric(
            gpu_handle,
            amdsmi.AmdSmiTemperatureType.EDGE,
            amdsmi.AmdSmiTemperatureMetric.CRITICAL
        )
        temp_c = current / 1000.0
        crit_c = critical / 1000.0
        margin = crit_c - temp_c
        print(f"Temperature: {temp_c:.1f}°C")
        print(f"Critical: {crit_c:.1f}°C")
        print(f"Margin: {margin:.1f}°C")
        if margin < 10:
            print("WARNING: Approaching critical temperature!")
            return False
        return True
    except amdsmi.AmdSmiLibraryException:
        print("Critical threshold not available")
        return True


amdsmi.amdsmi_init()
try:
    devices = amdsmi.amdsmi_get_processor_handles()
    for i, gpu in enumerate(devices):
        print(f"\nGPU {i} Thermal Status:")
        check_thermal_status(gpu)
finally:
    amdsmi.amdsmi_shut_down()
```

Monitor individual HBM memory stack temperatures:
```python
import amdsmi

amdsmi.amdsmi_init()
try:
    devices = amdsmi.amdsmi_get_processor_handles()
    gpu = devices[0]
    print("HBM Memory Temperatures:")
    # Query each HBM stack
    hbm_sensors = [
        amdsmi.AmdSmiTemperatureType.HBM_0,
        amdsmi.AmdSmiTemperatureType.HBM_1,
        amdsmi.AmdSmiTemperatureType.HBM_2,
        amdsmi.AmdSmiTemperatureType.HBM_3,
    ]
    for i, sensor in enumerate(hbm_sensors):
        try:
            temp = amdsmi.amdsmi_get_temp_metric(
                gpu,
                sensor,
                amdsmi.AmdSmiTemperatureMetric.CURRENT
            )
            print(f" HBM Stack {i}: {temp / 1000.0:.1f}°C")
        except amdsmi.AmdSmiLibraryException:
            # HBM stack not present or not monitored
            pass
finally:
    amdsmi.amdsmi_shut_down()
```

Monitor GPU voltage rails:
```python
import amdsmi

amdsmi.amdsmi_init()
try:
    devices = amdsmi.amdsmi_get_processor_handles()
    for i, gpu in enumerate(devices):
        print(f"\nGPU {i} Voltages:")
        # GPU core voltage
        try:
            vddgfx = amdsmi.amdsmi_get_gpu_volt_metric(
                gpu,
                amdsmi.AmdSmiVoltageType.VDDGFX,
                amdsmi.AmdSmiVoltageMetric.CURRENT
            )
            print(f" Core (VDDGFX): {vddgfx / 1000.0:.3f}V ({vddgfx}mV)")
        except amdsmi.AmdSmiLibraryException:
            print(" Core voltage: Not available")
        # Average core voltage, queried separately so that a missing AVERAGE
        # metric is not reported as a missing core voltage
        try:
            avg_volt = amdsmi.amdsmi_get_gpu_volt_metric(
                gpu,
                amdsmi.AmdSmiVoltageType.VDDGFX,
                amdsmi.AmdSmiVoltageMetric.AVERAGE
            )
            print(f" Average: {avg_volt / 1000.0:.3f}V")
        except amdsmi.AmdSmiLibraryException:
            print(" Average: Not available")
        # Board voltage
        try:
            vddboard = amdsmi.amdsmi_get_gpu_volt_metric(
                gpu,
                amdsmi.AmdSmiVoltageType.VDDBOARD,
                amdsmi.AmdSmiVoltageMetric.CURRENT
            )
            print(f" Board: {vddboard / 1000.0:.3f}V ({vddboard}mV)")
        except amdsmi.AmdSmiLibraryException:
            print(" Board voltage: Not available")
finally:
    amdsmi.amdsmi_shut_down()
```

Create a comprehensive thermal and voltage monitoring dashboard:
```python
import amdsmi
import time


def print_thermal_status(gpu_handle, gpu_id):
    """Print comprehensive thermal and voltage status."""
    print(f"\n{'='*60}")
    print(f"GPU {gpu_id} - Thermal and Voltage Status")
    print(f"{'='*60}")
    # Temperature readings
    print("\nTemperatures:")
    temp_sensors = {
        "Edge": amdsmi.AmdSmiTemperatureType.EDGE,
        "Hotspot": amdsmi.AmdSmiTemperatureType.HOTSPOT,
        "VRAM": amdsmi.AmdSmiTemperatureType.VRAM,
    }
    for name, sensor in temp_sensors.items():
        try:
            temp = amdsmi.amdsmi_get_temp_metric(
                gpu_handle,
                sensor,
                amdsmi.AmdSmiTemperatureMetric.CURRENT
            )
            print(f" {name:12s}: {temp / 1000.0:6.1f}°C")
        except amdsmi.AmdSmiLibraryException:
            print(f" {name:12s}: N/A")
    # Voltage readings
    print("\nVoltages:")
    volt_sensors = {
        "Core (VDDGFX)": amdsmi.AmdSmiVoltageType.VDDGFX,
        "Board": amdsmi.AmdSmiVoltageType.VDDBOARD,
    }
    for name, sensor in volt_sensors.items():
        try:
            volt = amdsmi.amdsmi_get_gpu_volt_metric(
                gpu_handle,
                sensor,
                amdsmi.AmdSmiVoltageMetric.CURRENT
            )
            print(f" {name:15s}: {volt / 1000.0:6.3f}V ({volt:5d}mV)")
        except amdsmi.AmdSmiLibraryException:
            print(f" {name:15s}: N/A")


amdsmi.amdsmi_init()
try:
    devices = amdsmi.amdsmi_get_processor_handles()
    # Monitor continuously
    while True:
        for i, gpu in enumerate(devices):
            print_thermal_status(gpu, i)
        print(f"\n{'='*60}")
        print("Press Ctrl+C to exit...")
        time.sleep(5)
except KeyboardInterrupt:
    print("\nMonitoring stopped.")
finally:
    amdsmi.amdsmi_shut_down()
```

Track temperature statistics over time:
```python
import amdsmi
import time
from collections import defaultdict


class TemperatureTracker:
    """Track GPU temperature statistics."""

    def __init__(self, gpu_handle):
        self.gpu_handle = gpu_handle
        self.readings = defaultdict(list)

    def record(self):
        """Record current temperature readings."""
        sensors = {
            "edge": amdsmi.AmdSmiTemperatureType.EDGE,
            "hotspot": amdsmi.AmdSmiTemperatureType.HOTSPOT,
            "vram": amdsmi.AmdSmiTemperatureType.VRAM,
        }
        for name, sensor in sensors.items():
            try:
                temp = amdsmi.amdsmi_get_temp_metric(
                    self.gpu_handle,
                    sensor,
                    amdsmi.AmdSmiTemperatureMetric.CURRENT
                )
                self.readings[name].append(temp / 1000.0)
            except amdsmi.AmdSmiLibraryException:
                pass

    def get_statistics(self):
        """Get temperature statistics."""
        stats = {}
        for name, temps in self.readings.items():
            if temps:
                stats[name] = {
                    "current": temps[-1],
                    "min": min(temps),
                    "max": max(temps),
                    "avg": sum(temps) / len(temps),
                    "samples": len(temps),
                }
        return stats


# Example usage
amdsmi.amdsmi_init()
try:
    devices = amdsmi.amdsmi_get_processor_handles()
    gpu = devices[0]
    tracker = TemperatureTracker(gpu)
    # Collect one sample per second for about 60 seconds
    print("Collecting temperature data for 60 seconds...")
    for _ in range(60):
        tracker.record()
        time.sleep(1)
    # Print statistics; each temperature column is 9 characters wide
    # (7 for the number plus "°C"), so the headers use the same width
    print("\nTemperature Statistics:")
    print(f"{'Sensor':<12} {'Current':>9} {'Min':>9} {'Max':>9} {'Avg':>9} {'Samples':>8}")
    print("-" * 61)
    stats = tracker.get_statistics()
    for sensor, data in stats.items():
        print(f"{sensor:<12} "
              f"{data['current']:>7.1f}°C "
              f"{data['min']:>7.1f}°C "
              f"{data['max']:>7.1f}°C "
              f"{data['avg']:>7.1f}°C "
              f"{data['samples']:>8d}")
finally:
    amdsmi.amdsmi_shut_down()
```

Unsupported sensors and metrics raise AmdSmiLibraryException, so wrap optional readings in try/except as the examples above do.

Related functions:
- amdsmi_get_gpu_metrics_info(): Provides comprehensive metrics, including temperatures, in a single call
- amdsmi_get_power_info(): Power information complements thermal monitoring
- amdsmi_get_gpu_activity(): GPU utilization affects thermal output
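Most examples above wrap each optional query in its own try/except AmdSmiLibraryException block so that one unsupported sensor does not abort the loop. One way to factor that pattern out is a wrapper that returns None instead of raising, which also keeps "not supported" distinct from a genuine reading. The sketch below is illustrative only: `read_or_none`, `thermal_margin_c`, `FakeLibraryError`, and `fake_temp_metric` are hypothetical names, and the exception type is passed in as a parameter so the logic can run without a GPU or the amdsmi package.

```python
from typing import Callable, Optional, Type


def read_or_none(
    query: Callable[..., int],
    error_type: Type[BaseException],
    *args,
) -> Optional[int]:
    """Call query(*args); return None instead of raising error_type."""
    try:
        return query(*args)
    except error_type:
        return None


def thermal_margin_c(
    current_mdeg: Optional[int], critical_mdeg: Optional[int]
) -> Optional[float]:
    """Degrees Celsius left before the critical threshold, or None if a reading is missing."""
    if current_mdeg is None or critical_mdeg is None:
        return None
    return (critical_mdeg - current_mdeg) / 1000.0


# Hypothetical stand-ins for the library so the sketch runs without hardware.
class FakeLibraryError(Exception):
    pass


def fake_temp_metric(sensor: str, metric: str) -> int:
    """Return made-up millidegree readings; raise for unsupported sensors."""
    readings = {("EDGE", "CURRENT"): 72000, ("EDGE", "CRITICAL"): 100000}
    if (sensor, metric) not in readings:
        raise FakeLibraryError(f"{sensor}/{metric} not supported")
    return readings[(sensor, metric)]


current = read_or_none(fake_temp_metric, FakeLibraryError, "EDGE", "CURRENT")
critical = read_or_none(fake_temp_metric, FakeLibraryError, "EDGE", "CRITICAL")
vram = read_or_none(fake_temp_metric, FakeLibraryError, "VRAM", "CURRENT")
print(current, critical, vram)              # prints "72000 100000 None"
print(thermal_margin_c(current, critical))  # prints "28.0"
```

With the real library, the equivalent call would pass amdsmi.amdsmi_get_temp_metric and amdsmi.AmdSmiLibraryException, followed by the processor handle, sensor type, and metric as positional arguments.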