tessl/github-amd-smi

AMD System Management Interface Library for monitoring and controlling AMD GPU devices on Linux systems

Workspace: tessl
Visibility: Public
Describes: pkg:github/radeonopencompute/amdsmi@5.7.x

To install, run

npx @tessl/cli install tessl/github-amd-smi@5.7.0

# AMD SMI Library

The AMD System Management Interface (AMD SMI) Library is a C/C++ library with Python bindings that lets user-space applications monitor and control AMD GPU devices on Linux systems. It represents hardware through socket and device handle abstractions, supports querying device information such as temperature, power consumption, and performance metrics, and includes device management capabilities.

## Package Information

- **Package Name**: amdsmi
- **Language**: C/C++ (primary), Python (bindings)
- **Installation**: Available as part of ROCm installation or built from source
- **Supported Platforms**: Linux bare metal and Linux virtual machine guest for AMD GPUs

## Core Imports

### C/C++

```c
#include "amd_smi/amdsmi.h"
```

### Python

```python
import amdsmi
```

To import specific functions:

```python
from amdsmi import (
    amdsmi_init, amdsmi_shut_down,
    amdsmi_get_socket_handles, amdsmi_get_processor_handles,
    amdsmi_get_gpu_activity, amdsmi_get_power_info
)
```

## Basic Usage

### C/C++ Example

```cpp
#include <cstdint>
#include <iostream>
#include <vector>
#include "amd_smi/amdsmi.h"

int main() {
    amdsmi_status_t ret;
    uint32_t socket_count = 0;

    // Initialize AMD SMI for GPUs only
    ret = amdsmi_init(AMDSMI_INIT_AMD_GPUS);
    if (ret != AMDSMI_STATUS_SUCCESS) {
        return 1;
    }

    // Query the socket count first (null buffer), then fetch the handles
    ret = amdsmi_get_socket_handles(&socket_count, nullptr);
    if (ret != AMDSMI_STATUS_SUCCESS || socket_count == 0) {
        amdsmi_shut_down();
        return 1;
    }
    std::vector<amdsmi_socket_handle> sockets(socket_count);
    ret = amdsmi_get_socket_handles(&socket_count, sockets.data());

    // Get devices for the first socket using the same two-call pattern
    uint32_t device_count = 0;
    if (ret == AMDSMI_STATUS_SUCCESS) {
        ret = amdsmi_get_processor_handles(sockets[0], &device_count, nullptr);
    }
    if (ret != AMDSMI_STATUS_SUCCESS || device_count == 0) {
        amdsmi_shut_down();
        return 1;
    }
    std::vector<amdsmi_processor_handle> devices(device_count);
    ret = amdsmi_get_processor_handles(sockets[0], &device_count, devices.data());

    // Get the edge temperature of the first device
    int64_t temperature = 0;
    if (ret == AMDSMI_STATUS_SUCCESS) {
        ret = amdsmi_get_temp_metric(devices[0], TEMPERATURE_TYPE_EDGE,
                                     AMDSMI_TEMP_CURRENT, &temperature);
    }
    if (ret == AMDSMI_STATUS_SUCCESS) {
        std::cout << "GPU Temperature: " << temperature << "C" << std::endl;
    }

    // Cleanup
    amdsmi_shut_down();
    return ret == AMDSMI_STATUS_SUCCESS ? 0 : 1;
}
```

### Python Example

```python
import amdsmi

# Initialize the library
amdsmi.amdsmi_init()

try:
    # Get socket handles
    sockets = amdsmi.amdsmi_get_socket_handles()

    if sockets:
        # Get processor handles for first socket
        processors = amdsmi.amdsmi_get_processor_handles(sockets[0])

        if processors:
            # Get GPU activity information (the bindings return a dictionary)
            activity = amdsmi.amdsmi_get_gpu_activity(processors[0])
            print(f"GFX Activity: {activity['gfx_activity']}%")

            # Get power information (also returned as a dictionary)
            power_info = amdsmi.amdsmi_get_power_info(processors[0])
            print(f"Socket Power: {power_info['average_socket_power']}W")

finally:
    # Always shut down the library
    amdsmi.amdsmi_shut_down()
```

## Architecture

The AMD SMI Library uses a hierarchical device representation:

- **Sockets**: Physical hardware sockets that can contain multiple processors
- **Processors**: Individual processing units (GPUs, CPUs) within a socket
- **Handles**: Opaque references to sockets and processors used throughout the API
- **Initialization Flags**: Control which processor types are discovered and monitored

This design enables the library to provide a unified interface for mixed-processor systems while maintaining efficient resource management and clear hardware topology representation.
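
The socket-to-processor hierarchy can be pictured with a small stand-in model. The sketch below does not call the AMD SMI library at all: `Socket`, `Processor`, and `processors_of_kind` are hypothetical names, the integer handles are made-up placeholders for the opaque handle types, and the `kind` strings simply reuse the `processor_type_t` names from this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Processor:
    handle: int  # placeholder for an opaque amdsmi_processor_handle
    kind: str    # a processor_type_t name such as "AMD_GPU"


@dataclass
class Socket:
    handle: int  # placeholder for an opaque amdsmi_socket_handle
    processors: List[Processor] = field(default_factory=list)


def processors_of_kind(sockets: List[Socket], kind: str) -> List[Processor]:
    """Walk sockets first, then their processors, keeping one processor type."""
    return [p for s in sockets for p in s.processors if p.kind == kind]


# Made-up topology: two sockets, the first of which also hosts a CPU.
system = [
    Socket(0x10, [Processor(0x100, "AMD_GPU"), Processor(0x101, "AMD_CPU")]),
    Socket(0x20, [Processor(0x200, "AMD_GPU")]),
]

gpu_handles = [p.handle for p in processors_of_kind(system, "AMD_GPU")]
print([hex(h) for h in gpu_handles])
```

In real code the same two-level walk would be done with `amdsmi_get_socket_handles` followed by `amdsmi_get_processor_handles` for each socket.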

## Capabilities

### Library Management

Core library initialization, shutdown, and version query functions. `amdsmi_init` must be called before using other AMD SMI functionality, and `amdsmi_shut_down` should be called when the library is no longer needed.

```c { .api }
amdsmi_status_t amdsmi_init(uint64_t init_flags);
amdsmi_status_t amdsmi_shut_down(void);
amdsmi_status_t amdsmi_get_lib_version(amdsmi_version_t *version);
```

[Library Management](./library-management.md)

### Device Discovery

Functions for discovering and identifying AMD processors, sockets, and their properties in the system.

```c { .api }
amdsmi_status_t amdsmi_get_socket_handles(uint32_t *socket_count, amdsmi_socket_handle *socket_handles);
amdsmi_status_t amdsmi_get_processor_handles(amdsmi_socket_handle socket_handle, uint32_t *processor_count, amdsmi_processor_handle *processor_handles);
amdsmi_status_t amdsmi_get_processor_type(amdsmi_processor_handle processor_handle, processor_type_t *processor_type);
amdsmi_status_t amdsmi_get_gpu_device_uuid(amdsmi_processor_handle processor_handle, unsigned int *uuid_length, char *uuid);
```

[Device Discovery](./device-discovery.md)

144

145

### Hardware Information

146

147

Static hardware information including ASIC details, board information, firmware versions, and driver information.

148

149

```c { .api }

150

amdsmi_status_t amdsmi_get_gpu_asic_info(amdsmi_processor_handle processor_handle, amdsmi_asic_info_t *info);

151

amdsmi_status_t amdsmi_get_gpu_board_info(amdsmi_processor_handle processor_handle, amdsmi_board_info_t *info);

152

amdsmi_status_t amdsmi_get_fw_info(amdsmi_processor_handle processor_handle, amdsmi_fw_info_t *info);

153

amdsmi_status_t amdsmi_get_gpu_driver_version(amdsmi_processor_handle processor_handle, int *length, char *version);

154

```

155

156

[Hardware Information](./hardware-information.md)

157

158

### Performance Monitoring

159

160

Real-time monitoring of GPU performance metrics including activity levels, clock frequencies, power consumption, and temperature measurements.

161

162

```c { .api }

163

amdsmi_status_t amdsmi_get_gpu_activity(amdsmi_processor_handle processor_handle, amdsmi_engine_usage_t *info);

164

amdsmi_status_t amdsmi_get_power_info(amdsmi_processor_handle processor_handle, amdsmi_power_info_t *info);

165

amdsmi_status_t amdsmi_get_temp_metric(amdsmi_processor_handle processor_handle, amdsmi_temperature_type_t sensor_type, amdsmi_temperature_metric_t metric, int64_t *temperature);

166

amdsmi_status_t amdsmi_get_clk_freq(amdsmi_processor_handle processor_handle, amdsmi_clk_type_t clk_type, amdsmi_frequencies_t *f);

167

```

168

169

[Performance Monitoring](./performance-monitoring.md)

170

171

### Memory Management

172

173

Memory information including total memory, usage statistics, VRAM details, and memory error management.

174

175

```c { .api }

176

amdsmi_status_t amdsmi_get_gpu_memory_total(amdsmi_processor_handle processor_handle, amdsmi_memory_type_t mem_type, uint64_t *total);

177

amdsmi_status_t amdsmi_get_gpu_memory_usage(amdsmi_processor_handle processor_handle, amdsmi_memory_type_t mem_type, uint64_t *used);

178

amdsmi_status_t amdsmi_get_gpu_vram_usage(amdsmi_processor_handle processor_handle, amdsmi_vram_info_t *info);

179

amdsmi_status_t amdsmi_get_gpu_bad_page_info(amdsmi_processor_handle processor_handle, uint32_t *num_pages, amdsmi_retired_page_record_t *info);

180

```

181

182

[Memory Management](./memory-management.md)
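
The `total` and `used` values from the memory functions are typically combined into a utilization figure. The helper below is a minimal sketch of that arithmetic only. `memory_usage_percent` is a hypothetical name, the inputs are assumed to be byte counts for the same memory type, and the sample numbers are made up rather than read from a device.

```python
def memory_usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Return used memory as a percentage of total memory."""
    if total_bytes <= 0:
        # A zero total would mean the query failed or the memory type is absent.
        raise ValueError("total_bytes must be positive")
    if not 0 <= used_bytes <= total_bytes:
        raise ValueError("used_bytes must lie between 0 and total_bytes")
    return 100.0 * used_bytes / total_bytes


GIB = 2**30
# Made-up sample: 4 GiB in use on a 16 GiB device.
sample_percent = memory_usage_percent(4 * GIB, 16 * GIB)
print(f"VRAM in use: {sample_percent:.1f}%")
```

Validating both values before dividing keeps a failed or partial query from turning into a misleading percentage.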

183

184

### PCIe and Connectivity

185

186

PCIe interface monitoring, bandwidth management, topology discovery, and multi-GPU connectivity features.

187

188

```c { .api }

189

amdsmi_status_t amdsmi_get_gpu_pci_bandwidth(amdsmi_processor_handle processor_handle, amdsmi_pcie_bandwidth_t *bandwidth);

190

amdsmi_status_t amdsmi_get_gpu_pci_throughput(amdsmi_processor_handle processor_handle, uint64_t *sent, uint64_t *received, uint64_t *max_pkt_sz);

191

amdsmi_status_t amdsmi_get_gpu_topo_numa_affinity(amdsmi_processor_handle processor_handle, uint32_t *numa_node);

192

amdsmi_status_t amdsmi_topo_get_link_type(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *hops, AMDSMI_IO_LINK_TYPE *type);

193

```

194

195

[PCIe and Connectivity](./pcie-connectivity.md)

196

197

### Performance Control

198

199

Advanced performance tuning including clock control, power management, fan control, and overclocking capabilities. Note: Many control functions require root privileges and are not supported in virtual environments.

200

201

```c { .api }

202

amdsmi_status_t amdsmi_set_gpu_perf_level(amdsmi_processor_handle processor_handle, amdsmi_dev_perf_level_t perf_lvl);

203

amdsmi_status_t amdsmi_set_power_cap(amdsmi_processor_handle processor_handle, uint32_t sensor_ind, uint64_t cap);

204

amdsmi_status_t amdsmi_set_gpu_fan_speed(amdsmi_processor_handle processor_handle, uint32_t sensor_ind, uint64_t speed);

205

amdsmi_status_t amdsmi_set_clk_freq(amdsmi_processor_handle processor_handle, amdsmi_clk_type_t clk_type, uint64_t freq_bitmask);

206

```

207

208

[Performance Control](./performance-control.md)
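
The `freq_bitmask` argument of `amdsmi_set_clk_freq` is easiest to understand with a worked example. The sketch below assumes that bit *i* of the mask enables the *i*-th entry of the supported-frequency list reported by `amdsmi_get_clk_freq`, which is how the equivalent ROCm SMI call behaves. `build_freq_bitmask` is a hypothetical helper and does not touch any device.

```python
from typing import Iterable


def build_freq_bitmask(enabled_indices: Iterable[int], num_supported: int) -> int:
    """Set one bit per enabled frequency index (assumed mask convention)."""
    mask = 0
    for index in enabled_indices:
        if not 0 <= index < num_supported:
            raise ValueError(f"frequency index {index} is out of range")
        mask |= 1 << index
    return mask


# Made-up case: four supported frequencies, allow only the lowest and third.
mask = build_freq_bitmask([0, 2], num_supported=4)
print(f"freq_bitmask = {mask:#06b}")
```

Rejecting out-of-range indices up front avoids passing a mask that refers to frequencies the device never reported.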

209

210

### Error Handling and RAS

211

212

Error detection, RAS (Reliability, Availability, Serviceability) features, ECC error monitoring, and comprehensive error reporting.

213

214

```c { .api }

215

amdsmi_status_t amdsmi_get_gpu_ecc_count(amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block, amdsmi_error_count_t *ec);

216

amdsmi_status_t amdsmi_get_gpu_ecc_enabled(amdsmi_processor_handle processor_handle, uint64_t *enabled_blocks);

217

amdsmi_status_t amdsmi_status_code_to_string(amdsmi_status_t status, const char **status_string);

218

amdsmi_status_t amdsmi_get_gpu_ras_block_features_enabled(amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block, amdsmi_ras_err_state_t *state);

219

```

220

221

[Error Handling and RAS](./error-handling-ras.md)
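
Every C function in this API returns an `amdsmi_status_t`, so callers usually funnel return values through one checking routine. The sketch below illustrates that pattern with a stand-in enum. `Status`, `describe_status`, `SmiStatusError`, and `check` are hypothetical names, and the enum lists only the status values shown in this document, so any other code is reported as unlisted instead of being guessed.

```python
from enum import IntEnum


class Status(IntEnum):
    """Subset of amdsmi_status_t values listed in this document."""
    SUCCESS = 0
    INVAL = 1
    NOT_SUPPORTED = 2
    NOT_YET_IMPLEMENTED = 3
    FAIL_LOAD_MODULE = 4
    FAIL_LOAD_SYMBOL = 5
    UNKNOWN_ERROR = 0xFFFFFFFF


def describe_status(code: int) -> str:
    """Map a raw status code to its name, without guessing unlisted codes."""
    try:
        return Status(code).name
    except ValueError:
        return f"UNLISTED({code})"


class SmiStatusError(RuntimeError):
    """Raised by check() for any status other than SUCCESS."""


def check(code: int) -> None:
    """Raise if a call reported anything other than success."""
    if code != Status.SUCCESS:
        raise SmiStatusError(describe_status(code))


check(0)  # success passes silently
print(describe_status(2))
```

In C, `amdsmi_status_code_to_string` plays the role of `describe_status` here.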

222

223

### Process and System Information

224

225

Process monitoring, system-level GPU usage information, and multi-process GPU utilization tracking.

226

227

```c { .api }

228

amdsmi_status_t amdsmi_get_gpu_process_list(amdsmi_processor_handle processor_handle, uint32_t *max_processes, amdsmi_process_handle_t *list);

229

amdsmi_status_t amdsmi_get_gpu_process_info(amdsmi_processor_handle processor_handle, amdsmi_process_handle_t process, amdsmi_proc_info_t *info);

230

amdsmi_status_t amdsmi_get_gpu_compute_process_info(amdsmi_process_info_t *procs, uint32_t *num_items);

231

```

232

233

[Process and System Information](./process-system-info.md)

234

235

### Event Monitoring

236

237

Asynchronous event notification system for GPU state changes, thermal events, and error conditions.

238

239

```c { .api }

240

amdsmi_status_t amdsmi_init_gpu_event_notification(amdsmi_processor_handle processor_handle);

241

amdsmi_status_t amdsmi_set_gpu_event_notification_mask(amdsmi_processor_handle processor_handle, uint64_t mask);

242

amdsmi_status_t amdsmi_get_gpu_event_notification(int timeout_ms, uint32_t *num_elem, amdsmi_evt_notification_data_t *data);

243

amdsmi_status_t amdsmi_stop_gpu_event_notification(amdsmi_processor_handle processor_handle);

244

```

245

246

[Event Monitoring](./event-monitoring.md)

247

248

### Performance Counters

249

250

Low-level performance counter management for detailed GPU profiling and performance analysis.

251

252

```c { .api }

253

amdsmi_status_t amdsmi_gpu_counter_group_supported(amdsmi_processor_handle processor_handle, amdsmi_event_group_t group);

254

amdsmi_status_t amdsmi_gpu_create_counter(amdsmi_processor_handle processor_handle, amdsmi_event_type_t type, amdsmi_event_handle_t *evnt_handle);

255

amdsmi_status_t amdsmi_gpu_destroy_counter(amdsmi_event_handle_t evnt_handle);

256

amdsmi_status_t amdsmi_gpu_control_counter(amdsmi_event_handle_t evt_handle, amdsmi_counter_command_t cmd, void *cmd_args);

257

amdsmi_status_t amdsmi_gpu_read_counter(amdsmi_event_handle_t evt_handle, amdsmi_counter_value_t *value);

258

```

259

260

[Performance Counters](./performance-counters.md)

261

262

## Core Types

263

264

### Handle Types

265

266

```c { .api }

267

typedef void *amdsmi_socket_handle;

268

typedef void *amdsmi_processor_handle;

269

typedef uint32_t amdsmi_process_handle_t;

270

typedef uintptr_t amdsmi_event_handle_t;

271

```

272

273

### Status and Initialization

274

275

```c { .api }

276

typedef enum {

277

AMDSMI_STATUS_SUCCESS = 0,

278

AMDSMI_STATUS_INVAL = 1,

279

AMDSMI_STATUS_NOT_SUPPORTED = 2,

280

AMDSMI_STATUS_NOT_YET_IMPLEMENTED = 3,

281

AMDSMI_STATUS_FAIL_LOAD_MODULE = 4,

282

AMDSMI_STATUS_FAIL_LOAD_SYMBOL = 5,

283

// ... additional error codes

284

AMDSMI_STATUS_UNKNOWN_ERROR = 0xFFFFFFFF

285

} amdsmi_status_t;

286

287

typedef enum {

288

AMDSMI_INIT_ALL_PROCESSORS = 0x0,

289

AMDSMI_INIT_AMD_CPUS = (1 << 0),

290

AMDSMI_INIT_AMD_GPUS = (1 << 1),

291

AMDSMI_INIT_NON_AMD_CPUS = (1 << 2),

292

AMDSMI_INIT_NON_AMD_GPUS = (1 << 3)

293

} amdsmi_init_flags_t;

294

295

typedef enum {

296

UNKNOWN = 0,

297

AMD_GPU,

298

AMD_CPU,

299

NON_AMD_GPU,

300

NON_AMD_CPU

301

} processor_type_t;

302

```
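
Because the initialization flags are single bits, they can be combined with bitwise OR and decoded again by testing each bit. The following sketch mirrors the `amdsmi_init_flags_t` values in a Python `IntFlag`. `InitFlags` and `selected_types` are hypothetical names, and treating the value 0 as "all processors" is an assumption based on the enum member's name.

```python
from enum import IntFlag
from typing import List


class InitFlags(IntFlag):
    """Single-bit members mirroring amdsmi_init_flags_t."""
    AMD_CPUS = 1 << 0
    AMD_GPUS = 1 << 1
    NON_AMD_CPUS = 1 << 2
    NON_AMD_GPUS = 1 << 3


def selected_types(flags: int) -> List[str]:
    """List the processor types selected by a combined flag value."""
    value = InitFlags(flags)
    if value == 0:
        # Assumed meaning of AMDSMI_INIT_ALL_PROCESSORS (0x0).
        return ["ALL_PROCESSORS"]
    return [member.name for member in InitFlags if member in value]


combined = InitFlags.AMD_CPUS | InitFlags.AMD_GPUS
print(int(combined), selected_types(combined))
```

The integer form of `combined` is what would be passed as `init_flags` to `amdsmi_init`.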

303

304

### Version Information

305

306

```c { .api }

307

typedef struct {

308

uint32_t year;

309

uint32_t major;

310

uint32_t minor;

311

uint32_t release;

312

const char *build;

313

} amdsmi_version_t;

314

```
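
For readers working from Python through `ctypes`, the struct above could be mirrored as shown in this sketch. `AmdsmiVersion` and `format_version` are hypothetical names, the dotted output format is an illustrative choice and not something the library defines, and the sample field values are made up.

```python
import ctypes


class AmdsmiVersion(ctypes.Structure):
    """Field-for-field mirror of amdsmi_version_t as listed above."""
    _fields_ = [
        ("year", ctypes.c_uint32),
        ("major", ctypes.c_uint32),
        ("minor", ctypes.c_uint32),
        ("release", ctypes.c_uint32),
        ("build", ctypes.c_char_p),
    ]


def format_version(version: AmdsmiVersion) -> str:
    """Render the numeric fields as a dotted string, appending build if set."""
    base = f"{version.year}.{version.major}.{version.minor}.{version.release}"
    build = version.build.decode() if version.build else ""
    return f"{base}+{build}" if build else base


# Made-up values for illustration only.
sample = AmdsmiVersion(23, 3, 0, 1, b"example")
print(format_version(sample))
```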