Functions for querying GPU memory information including total capacity, current usage, and reserved/retired memory pages. These functions provide visibility into different memory types (VRAM, visible VRAM, GTT) and memory health status.
Get the total memory capacity for a specific memory type.
def amdsmi_get_gpu_memory_total(
processor_handle: processor_handle,
mem_type: AmdSmiMemoryType
) -> int:
"""
Get total GPU memory capacity for a specific memory type.
Returns the total amount of memory available in the specified memory pool.
This represents the hardware capacity, not the amount currently available.
Parameters:
- processor_handle: Handle for the target GPU device
- mem_type (AmdSmiMemoryType): Type of memory to query:
- VRAM: Video RAM (total GPU memory)
- VIS_VRAM: CPU-visible VRAM (BAR memory that CPU can directly access)
- GTT: Graphics Translation Table (system memory accessible by GPU)
Returns:
- int: Total memory capacity in bytes
Raises:
- AmdSmiParameterException: If processor_handle or mem_type is invalid
- AmdSmiLibraryException: On query failure
Example:
```python
import amdsmi
from amdsmi import AmdSmiMemoryType
amdsmi.amdsmi_init()
device = amdsmi.amdsmi_get_processor_handles()[0]
# Query total VRAM
vram_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.VRAM
)
print(f"Total VRAM: {vram_total / (1024**3):.2f} GB")
# Query visible VRAM (CPU-accessible)
vis_vram_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.VIS_VRAM
)
print(f"Visible VRAM: {vis_vram_total / (1024**3):.2f} GB")
# Query GTT memory
gtt_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.GTT
)
print(f"GTT Memory: {gtt_total / (1024**3):.2f} GB")
amdsmi.amdsmi_shut_down()
```
"""Get the current memory usage for a specific memory type.
def amdsmi_get_gpu_memory_usage(
processor_handle: processor_handle,
mem_type: AmdSmiMemoryType
) -> int:
"""
Get current GPU memory usage for a specific memory type.
Returns the amount of memory currently in use from the specified memory pool.
This can be used with amdsmi_get_gpu_memory_total() to calculate utilization.
Parameters:
- processor_handle: Handle for the target GPU device
- mem_type (AmdSmiMemoryType): Type of memory to query:
- VRAM: Video RAM usage
- VIS_VRAM: CPU-visible VRAM usage
- GTT: Graphics Translation Table usage
Returns:
- int: Used memory in bytes
Raises:
- AmdSmiParameterException: If processor_handle or mem_type is invalid
- AmdSmiLibraryException: On query failure
Example:
```python
import amdsmi
from amdsmi import AmdSmiMemoryType
amdsmi.amdsmi_init()
device = amdsmi.amdsmi_get_processor_handles()[0]
# Query VRAM usage
vram_used = amdsmi.amdsmi_get_gpu_memory_usage(
device, AmdSmiMemoryType.VRAM
)
vram_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.VRAM
)
# Calculate utilization
usage_gb = vram_used / (1024**3)
total_gb = vram_total / (1024**3)
usage_percent = (vram_used / vram_total) * 100
print(f"VRAM Usage: {usage_gb:.2f} / {total_gb:.2f} GB ({usage_percent:.1f}%)")
amdsmi.amdsmi_shut_down()
```
"""Get information about reserved (retired) memory pages due to errors.
def amdsmi_get_gpu_memory_reserved_pages(
processor_handle: processor_handle
) -> List[Dict[str, Any]]:
"""
Get list of reserved (retired) memory pages.
Returns information about memory pages that have been reserved due to uncorrectable
errors. These pages are removed from the available memory pool to prevent data
corruption. This is an important health metric for GPU memory.
Parameters:
- processor_handle: Handle for the target GPU device
Returns:
- List[dict]: List of reserved page records, each containing:
- value (int): Record index
- page_address (int): Physical address of the reserved page
- page_size (int): Size of the page in bytes
- status (AmdSmiMemoryPageStatus): Page status:
- RESERVED: Page has been reserved and removed from use
- PENDING: Page is pending retirement
- UNRESERVABLE: Page cannot be reserved
An empty list indicates no reserved pages (healthy memory).
Raises:
- AmdSmiParameterException: If processor_handle is invalid
- AmdSmiLibraryException: On query failure
Example:
```python
import amdsmi
from amdsmi import AmdSmiMemoryPageStatus
amdsmi.amdsmi_init()
device = amdsmi.amdsmi_get_processor_handles()[0]
reserved_pages = amdsmi.amdsmi_get_gpu_memory_reserved_pages(device)
if not reserved_pages:
print("No reserved pages - memory is healthy")
else:
print(f"Found {len(reserved_pages)} reserved pages:")
for page in reserved_pages:
addr = page['page_address']
size = page['page_size']
status = page['status']
status_str = {
AmdSmiMemoryPageStatus.RESERVED: "RESERVED",
AmdSmiMemoryPageStatus.PENDING: "PENDING",
AmdSmiMemoryPageStatus.UNRESERVABLE: "UNRESERVABLE"
}.get(status, f"UNKNOWN({status})")
print(f" [{page['value']}] Address: 0x{addr:016x}, "
f"Size: {size} bytes, Status: {status_str}")
amdsmi.amdsmi_shut_down()
```
"""Monitor memory usage across different memory types:
import amdsmi
from amdsmi import AmdSmiMemoryType
amdsmi.amdsmi_init()
try:
devices = amdsmi.amdsmi_get_processor_handles()
for i, device in enumerate(devices):
print(f"\n=== GPU {i} Memory Status ===")
# VRAM (main GPU memory)
vram_used = amdsmi.amdsmi_get_gpu_memory_usage(
device, AmdSmiMemoryType.VRAM
)
vram_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.VRAM
)
vram_used_gb = vram_used / (1024**3)
vram_total_gb = vram_total / (1024**3)
vram_percent = (vram_used / vram_total) * 100
print(f"VRAM: {vram_used_gb:.2f} / {vram_total_gb:.2f} GB "
f"({vram_percent:.1f}%)")
# Visible VRAM (CPU-accessible)
try:
vis_vram_used = amdsmi.amdsmi_get_gpu_memory_usage(
device, AmdSmiMemoryType.VIS_VRAM
)
vis_vram_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.VIS_VRAM
)
vis_vram_used_gb = vis_vram_used / (1024**3)
vis_vram_total_gb = vis_vram_total / (1024**3)
vis_vram_percent = (vis_vram_used / vis_vram_total) * 100
print(f"Visible VRAM: {vis_vram_used_gb:.2f} / {vis_vram_total_gb:.2f} GB "
f"({vis_vram_percent:.1f}%)")
except amdsmi.AmdSmiLibraryException:
print("Visible VRAM: Not available")
# GTT (system memory)
try:
gtt_used = amdsmi.amdsmi_get_gpu_memory_usage(
device, AmdSmiMemoryType.GTT
)
gtt_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.GTT
)
gtt_used_gb = gtt_used / (1024**3)
gtt_total_gb = gtt_total / (1024**3)
gtt_percent = (gtt_used / gtt_total) * 100
print(f"GTT: {gtt_used_gb:.2f} / {gtt_total_gb:.2f} GB "
f"({gtt_percent:.1f}%)")
except amdsmi.AmdSmiLibraryException:
print("GTT: Not available")
finally:
amdsmi.amdsmi_shut_down()

Check for reserved pages indicating memory errors:
import amdsmi
from amdsmi import AmdSmiMemoryPageStatus
def check_memory_health(device):
"""Check GPU memory health status."""
reserved_pages = amdsmi.amdsmi_get_gpu_memory_reserved_pages(device)
if not reserved_pages:
return {
'status': 'HEALTHY',
'reserved_count': 0,
'pending_count': 0,
'total_size': 0
}
# Count by status
status_counts = {
AmdSmiMemoryPageStatus.RESERVED: 0,
AmdSmiMemoryPageStatus.PENDING: 0,
AmdSmiMemoryPageStatus.UNRESERVABLE: 0
}
total_size = 0
for page in reserved_pages:
status = page['status']
status_counts[status] = status_counts.get(status, 0) + 1
total_size += page['page_size']
# Determine overall health
reserved_count = status_counts[AmdSmiMemoryPageStatus.RESERVED]
pending_count = status_counts[AmdSmiMemoryPageStatus.PENDING]
if reserved_count + pending_count > 10:
health_status = 'DEGRADED'
elif reserved_count + pending_count > 0:
health_status = 'WARNING'
else:
health_status = 'HEALTHY'
return {
'status': health_status,
'reserved_count': reserved_count,
'pending_count': pending_count,
'unreservable_count': status_counts[AmdSmiMemoryPageStatus.UNRESERVABLE],
'total_size': total_size,
'details': reserved_pages
}
amdsmi.amdsmi_init()
try:
devices = amdsmi.amdsmi_get_processor_handles()
for i, device in enumerate(devices):
print(f"\n=== GPU {i} Memory Health ===")
health = check_memory_health(device)
print(f"Status: {health['status']}")
print(f"Reserved pages: {health['reserved_count']}")
print(f"Pending pages: {health['pending_count']}")
if health['reserved_count'] > 0:
size_mb = health['total_size'] / (1024**2)
print(f"Total reserved: {size_mb:.2f} MB")
print("\nDetailed page list:")
for page in health['details']:
print(f" 0x{page['page_address']:016x} - "
f"{page['page_size']} bytes - "
f"Status: {page['status']}")
finally:
amdsmi.amdsmi_shut_down()

Monitor memory pressure and alert on high usage:
import amdsmi
from amdsmi import AmdSmiMemoryType
import time
def get_memory_pressure(device):
"""Calculate memory pressure level."""
vram_used = amdsmi.amdsmi_get_gpu_memory_usage(
device, AmdSmiMemoryType.VRAM
)
vram_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.VRAM
)
usage_percent = (vram_used / vram_total) * 100
if usage_percent > 95:
return 'CRITICAL', usage_percent
elif usage_percent > 85:
return 'HIGH', usage_percent
elif usage_percent > 70:
return 'MODERATE', usage_percent
else:
return 'LOW', usage_percent
amdsmi.amdsmi_init()
try:
device = amdsmi.amdsmi_get_processor_handles()[0]
print("Monitoring memory pressure (Ctrl+C to stop)...")
while True:
pressure, percent = get_memory_pressure(device)
# Get actual values
vram_used = amdsmi.amdsmi_get_gpu_memory_usage(
device, AmdSmiMemoryType.VRAM
)
vram_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.VRAM
)
used_gb = vram_used / (1024**3)
total_gb = vram_total / (1024**3)
available_gb = (vram_total - vram_used) / (1024**3)
timestamp = time.strftime("%H:%M:%S")
# Display with color coding
if pressure == 'CRITICAL':
print(f"[{timestamp}] !!! CRITICAL !!! "
f"{used_gb:.2f} / {total_gb:.2f} GB ({percent:.1f}%) "
f"- Only {available_gb:.2f} GB available")
elif pressure == 'HIGH':
print(f"[{timestamp}] ** HIGH ** "
f"{used_gb:.2f} / {total_gb:.2f} GB ({percent:.1f}%)")
elif pressure == 'MODERATE':
print(f"[{timestamp}] * MODERATE * "
f"{used_gb:.2f} / {total_gb:.2f} GB ({percent:.1f}%)")
else:
print(f"[{timestamp}] OK - "
f"{used_gb:.2f} / {total_gb:.2f} GB ({percent:.1f}%)")
time.sleep(2)
except KeyboardInterrupt:
print("\nMonitoring stopped")
finally:
amdsmi.amdsmi_shut_down()

Generate a detailed memory report:
import amdsmi
from amdsmi import AmdSmiMemoryType, AmdSmiMemoryPageStatus
def generate_memory_report(device):
"""Generate comprehensive memory report for a GPU."""
report = {
'vram': {},
'vis_vram': {},
'gtt': {},
'health': {}
}
# VRAM
try:
vram_used = amdsmi.amdsmi_get_gpu_memory_usage(
device, AmdSmiMemoryType.VRAM
)
vram_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.VRAM
)
report['vram'] = {
'used_bytes': vram_used,
'total_bytes': vram_total,
'available_bytes': vram_total - vram_used,
'used_gb': vram_used / (1024**3),
'total_gb': vram_total / (1024**3),
'available_gb': (vram_total - vram_used) / (1024**3),
'usage_percent': (vram_used / vram_total) * 100
}
except Exception as e:
report['vram']['error'] = str(e)
# Visible VRAM
try:
vis_vram_used = amdsmi.amdsmi_get_gpu_memory_usage(
device, AmdSmiMemoryType.VIS_VRAM
)
vis_vram_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.VIS_VRAM
)
report['vis_vram'] = {
'used_bytes': vis_vram_used,
'total_bytes': vis_vram_total,
'available_bytes': vis_vram_total - vis_vram_used,
'used_gb': vis_vram_used / (1024**3),
'total_gb': vis_vram_total / (1024**3),
'available_gb': (vis_vram_total - vis_vram_used) / (1024**3),
'usage_percent': (vis_vram_used / vis_vram_total) * 100
}
except Exception as e:
report['vis_vram']['error'] = str(e)
# GTT
try:
gtt_used = amdsmi.amdsmi_get_gpu_memory_usage(
device, AmdSmiMemoryType.GTT
)
gtt_total = amdsmi.amdsmi_get_gpu_memory_total(
device, AmdSmiMemoryType.GTT
)
report['gtt'] = {
'used_bytes': gtt_used,
'total_bytes': gtt_total,
'available_bytes': gtt_total - gtt_used,
'used_gb': gtt_used / (1024**3),
'total_gb': gtt_total / (1024**3),
'available_gb': (gtt_total - gtt_used) / (1024**3),
'usage_percent': (gtt_used / gtt_total) * 100
}
except Exception as e:
report['gtt']['error'] = str(e)
# Health check
try:
reserved_pages = amdsmi.amdsmi_get_gpu_memory_reserved_pages(device)
report['health']['reserved_pages'] = len(reserved_pages)
report['health']['has_errors'] = len(reserved_pages) > 0
report['health']['pages'] = reserved_pages
except Exception as e:
report['health']['error'] = str(e)
return report
def print_memory_report(device_num, report):
"""Pretty print memory report."""
print(f"\n{'='*70}")
print(f"GPU {device_num} - Memory Report")
print(f"{'='*70}")
# VRAM
print("\n[VRAM - Video RAM]")
if 'error' in report['vram']:
print(f" Error: {report['vram']['error']}")
else:
vram = report['vram']
print(f" Total: {vram['total_gb']:8.2f} GB")
print(f" Used: {vram['used_gb']:8.2f} GB ({vram['usage_percent']:.1f}%)")
print(f" Available: {vram['available_gb']:8.2f} GB")
# Visible VRAM
print("\n[Visible VRAM - CPU-accessible]")
if 'error' in report['vis_vram']:
print(f" Not available")
else:
vis = report['vis_vram']
print(f" Total: {vis['total_gb']:8.2f} GB")
print(f" Used: {vis['used_gb']:8.2f} GB ({vis['usage_percent']:.1f}%)")
print(f" Available: {vis['available_gb']:8.2f} GB")
# GTT
print("\n[GTT - Graphics Translation Table]")
if 'error' in report['gtt']:
print(f" Not available")
else:
gtt = report['gtt']
print(f" Total: {gtt['total_gb']:8.2f} GB")
print(f" Used: {gtt['used_gb']:8.2f} GB ({gtt['usage_percent']:.1f}%)")
print(f" Available: {gtt['available_gb']:8.2f} GB")
# Health
print("\n[Memory Health]")
if 'error' in report['health']:
print(f" Error: {report['health']['error']}")
else:
health = report['health']
if health['has_errors']:
print(f" Status: WARNING - {health['reserved_pages']} reserved pages found")
print(f" Reserved pages indicate memory errors")
else:
print(f" Status: HEALTHY - No reserved pages")
print(f"\n{'='*70}")
amdsmi.amdsmi_init()
try:
devices = amdsmi.amdsmi_get_processor_handles()
for i, device in enumerate(devices):
report = generate_memory_report(device)
print_memory_report(i, report)
finally:
amdsmi.amdsmi_shut_down()

Memory type enumeration for querying different memory pools:
class AmdSmiMemoryType(IntEnum):
"""
GPU memory type identifiers.
Different memory pools accessible by the GPU and CPU.
"""
VRAM = ... # Video RAM - Main GPU memory (GDDR/HBM)
VIS_VRAM = ... # CPU-visible VRAM - Portion of VRAM accessible by CPU (BAR)
GTT = ... # Graphics Translation Table - System memory usable by GPU

Memory page status enumeration for reserved pages:
class AmdSmiMemoryPageStatus(IntEnum):
"""
Status of reserved/retired memory pages.
Indicates the state of pages that have experienced errors.
"""
RESERVED = ... # Page has been reserved (retired) due to errors
PENDING = ... # Page retirement is pending
UNRESERVABLE = ... # Page cannot be reserved

VRAM technology type enumeration:
class AmdSmiVramType(IntEnum):
"""
VRAM technology types.
Identifies the type of memory technology used for GPU VRAM.
"""
UNKNOWN = ... # Unknown VRAM type
HBM = ... # High Bandwidth Memory (1st gen)
HBM2 = ... # High Bandwidth Memory 2
HBM2E = ... # High Bandwidth Memory 2E (Enhanced)
HBM3 = ... # High Bandwidth Memory 3
DDR2 = ... # DDR2 SDRAM
DDR3 = ... # DDR3 SDRAM
DDR4 = ... # DDR4 SDRAM
GDDR1 = ... # GDDR SDRAM
GDDR2 = ... # GDDR2 SDRAM
GDDR3 = ... # GDDR3 SDRAM
GDDR4 = ... # GDDR4 SDRAM
GDDR5 = ... # GDDR5 SDRAM
GDDR6 = ... # GDDR6 SDRAM
GDDR7 = ... # GDDR7 SDRAM
MAX = ... # Maximum value marker
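Because the members are plain IntEnum values, an integer reported by a query can be mapped back to a readable name with .name. The sketch below assumes an amdsmi_get_gpu_vram_info() call that returns a dict containing a vram_type field; that function and its field names are assumptions here and vary between amdsmi releases, so treat this as illustrative rather than definitive:

```python
import amdsmi
from amdsmi import AmdSmiVramType

amdsmi.amdsmi_init()
try:
    device = amdsmi.amdsmi_get_processor_handles()[0]
    # Assumption: amdsmi_get_gpu_vram_info() is available in this amdsmi build
    # and returns a dict with a 'vram_type' value; verify against the
    # documentation for your release.
    vram_info = amdsmi.amdsmi_get_gpu_vram_info(device)
    vram_type = AmdSmiVramType(vram_info['vram_type'])
    print(f"VRAM technology: {vram_type.name}")  # e.g. HBM3, GDDR6
except (AttributeError, KeyError, ValueError, amdsmi.AmdSmiLibraryException):
    print("VRAM type query not supported by this amdsmi version/device")
finally:
    amdsmi.amdsmi_shut_down()
```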
VRAM is the main GPU memory, typically implemented with GDDR (Graphics DDR) or HBM (High Bandwidth Memory) technology, and is the primary memory pool for GPU workloads. It provides the highest bandwidth for GPU operations but is not directly accessible by the CPU.
Visible VRAM is the portion of VRAM that is CPU-visible through the PCIe Base Address Register (BAR); the CPU can read and write this memory directly.
The size of visible VRAM depends on the PCIe BAR size, which can be configured in BIOS settings. Larger BAR sizes (Resizable BAR or Smart Access Memory) improve CPU-GPU transfer performance.
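Comparing visible VRAM to total VRAM gives a rough indication of whether the full BAR is exposed. The sketch below uses only the query functions documented above; the 90% cutoff is an arbitrary illustrative threshold, not a definitive Resizable BAR check:

```python
import amdsmi
from amdsmi import AmdSmiMemoryType

amdsmi.amdsmi_init()
try:
    device = amdsmi.amdsmi_get_processor_handles()[0]
    vram_total = amdsmi.amdsmi_get_gpu_memory_total(device, AmdSmiMemoryType.VRAM)
    vis_vram_total = amdsmi.amdsmi_get_gpu_memory_total(device, AmdSmiMemoryType.VIS_VRAM)
    ratio = vis_vram_total / vram_total if vram_total else 0.0
    print(f"Visible VRAM: {vis_vram_total / (1024**3):.2f} GB "
          f"of {vram_total / (1024**3):.2f} GB ({ratio * 100:.0f}%)")
    # If (nearly) all VRAM is CPU-visible, a large BAR (Resizable BAR /
    # Smart Access Memory) is in effect; a small visible window suggests
    # the default BAR size.
    if ratio >= 0.9:
        print("Most VRAM is CPU-visible - Resizable BAR appears to be enabled")
    else:
        print("Only a small BAR window is CPU-visible")
finally:
    amdsmi.amdsmi_shut_down()
```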
GTT (Graphics Translation Table) memory is system RAM that the GPU can access through the GTT aperture, providing additional capacity beyond dedicated VRAM.
The GPU uses the GTT to map system memory pages, allowing access to data stored in system RAM. This is useful for large datasets that don't fit in VRAM.
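To illustrate that spill-over role, the following sketch (built only on the query functions documented above) reports the remaining VRAM and GTT headroom and their combined total; treat the sum as a rough upper bound, since GTT bandwidth is far lower than VRAM:

```python
import amdsmi
from amdsmi import AmdSmiMemoryType

amdsmi.amdsmi_init()
try:
    device = amdsmi.amdsmi_get_processor_handles()[0]
    free = {}
    for name, mem_type in (("VRAM", AmdSmiMemoryType.VRAM),
                           ("GTT", AmdSmiMemoryType.GTT)):
        total = amdsmi.amdsmi_get_gpu_memory_total(device, mem_type)
        used = amdsmi.amdsmi_get_gpu_memory_usage(device, mem_type)
        free[name] = total - used
        print(f"{name}: {free[name] / (1024**3):.2f} GB free "
              f"of {total / (1024**3):.2f} GB")
    # Free VRAM plus free GTT approximates how much more data the GPU can
    # address before allocations fail outright, at reduced GTT bandwidth.
    print(f"Combined headroom: {sum(free.values()) / (1024**3):.2f} GB")
finally:
    amdsmi.amdsmi_shut_down()
```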
See also: amdsmi_get_gpu_vram_usage() from GPU Monitoring for simpler VRAM queries.
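A minimal sketch of that simpler call, assuming it returns a dict with vram_total and vram_used fields (commonly reported in megabytes); the field names and units are assumptions here and may differ between amdsmi releases:

```python
import amdsmi

amdsmi.amdsmi_init()
try:
    device = amdsmi.amdsmi_get_processor_handles()[0]
    # Assumption: amdsmi_get_gpu_vram_usage() returns a dict with 'vram_total'
    # and 'vram_used' fields in megabytes; check your amdsmi version's docs.
    vram = amdsmi.amdsmi_get_gpu_vram_usage(device)
    print(f"VRAM: {vram['vram_used']} / {vram['vram_total']} MB")
finally:
    amdsmi.amdsmi_shut_down()
```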