
# Performance Monitoring

Real-time monitoring of GPU performance metrics, including engine activity, clock frequencies, power consumption, temperatures, and comprehensive system metrics.

## Capabilities

### GPU Activity Monitoring

Get the current utilization percentage of each GPU engine (graphics, memory controller, multimedia).

```c { .api }
amdsmi_status_t amdsmi_get_gpu_activity(amdsmi_processor_handle processor_handle, amdsmi_engine_usage_t *info);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `info`: Pointer to receive engine usage information

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure

**Usage Example:**

```c
amdsmi_engine_usage_t usage;
amdsmi_status_t ret = amdsmi_get_gpu_activity(processor, &usage);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("GFX Activity: %u%%\n", usage.gfx_activity);
    printf("UMC Activity: %u%%\n", usage.umc_activity);
    printf("MM Activity: %u%%\n", usage.mm_activity);
}
```

### Utilization Counters

Get coarse-grain utilization counters, which provide GPU usage information in a minimally invasive way.

```c { .api }
amdsmi_status_t amdsmi_get_utilization_count(amdsmi_processor_handle processor_handle, amdsmi_utilization_counter_t utilization_counters[], uint32_t count, uint64_t *timestamp);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `utilization_counters`: Array of utilization counter structures; the caller sets each element's `type` before the call, and the library fills in its `value`
- `count`: Number of counters in the array
- `timestamp`: Pointer to receive the timestamp of the measurements

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure

### Power Information

Get the average socket power, the GFX, SOC, and memory voltages, and the power limit of the GPU.

```c { .api }
amdsmi_status_t amdsmi_get_power_info(amdsmi_processor_handle processor_handle, amdsmi_power_info_t *info);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `info`: Pointer to receive power information

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure

**Usage Example:**

```c
amdsmi_power_info_t power_info;
amdsmi_status_t ret = amdsmi_get_power_info(processor, &power_info);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Socket Power: %u W\n", power_info.average_socket_power);
    printf("GFX Voltage: %u mV\n", power_info.gfx_voltage);
    printf("SOC Voltage: %u mV\n", power_info.soc_voltage);
    printf("Memory Voltage: %u mV\n", power_info.mem_voltage);
    printf("Power Limit: %u W\n", power_info.power_limit);
}
```
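
Since `average_socket_power` and `power_limit` are both reported in watts, the remaining power headroom follows directly from the two fields. The helper below is a hypothetical sketch (`power_headroom` is not an amdsmi function); it takes the two values as plain integers, so they can come from the C structure or from the dictionary returned by the Python `amdsmi_get_power_info()`.

```python
from typing import Tuple


def power_headroom(average_socket_power_w: int, power_limit_w: int) -> Tuple[int, float]:
    """Return (watts left before the limit, percent of the limit in use).

    Hypothetical helper: both arguments are in watts, matching the
    average_socket_power and power_limit fields.
    """
    if power_limit_w <= 0:
        raise ValueError("power_limit_w must be positive")
    headroom_w = power_limit_w - average_socket_power_w  # negative when over the limit
    used_pct = 100.0 * average_socket_power_w / power_limit_w
    return headroom_w, used_pct


print(power_headroom(150, 300))  # (150, 50.0)
```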

### Energy Accumulator

Read the GPU's accumulated energy counter together with its resolution and a timestamp, so that energy use can be measured over an interval with high precision.

```c { .api }
amdsmi_status_t amdsmi_get_energy_count(amdsmi_processor_handle processor_handle, uint64_t *power, float *counter_resolution, uint64_t *timestamp);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `power`: Pointer to receive the accumulated energy counter value. Despite its name, this is a raw count, not a power reading; multiply it by `counter_resolution` to obtain energy in microjoules.
- `counter_resolution`: Pointer to receive the counter resolution in microjoules per count
- `timestamp`: Pointer to receive the timestamp of the reading, with 1 ns resolution

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
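
Because the value is an accumulator, average power over an interval follows from two readings: multiply the count difference by `counter_resolution` (microjoules per count) and divide by the elapsed time between the two timestamps (nanoseconds). The sketch below shows that unit handling with a hypothetical helper (`average_power_w` is not part of amdsmi); it takes plain numbers, and the input values in the example are illustrative only.

```python
def average_power_w(counter_start: int, timestamp_start_ns: int,
                    counter_end: int, timestamp_end_ns: int,
                    resolution_uj: float) -> float:
    """Average power in watts between two energy-counter readings.

    Hypothetical helper. counter_*: raw counter values; timestamp_*_ns:
    matching timestamps in nanoseconds; resolution_uj: microjoules per count.
    """
    elapsed_ns = timestamp_end_ns - timestamp_start_ns
    if elapsed_ns <= 0:
        raise ValueError("the end timestamp must be later than the start timestamp")
    delta_counts = counter_end - counter_start
    if delta_counts < 0:
        raise ValueError("the counter went backwards (for example, after a reset)")
    energy_j = delta_counts * resolution_uj * 1e-6  # microjoules -> joules
    return energy_j / (elapsed_ns * 1e-9)           # joules per second -> watts


# Illustrative values: 1,000,000 counts at 15.3 uJ/count over one second
print(round(average_power_w(0, 0, 1_000_000, 1_000_000_000, 15.3), 3))  # 15.3
```

Rejecting a negative delta, rather than guessing at wraparound behavior, keeps the helper from silently reporting a bogus value if the counter is reset between samples.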

### Temperature Monitoring

Get temperature measurements from the GPU's sensors.

```c { .api }
amdsmi_status_t amdsmi_get_temp_metric(amdsmi_processor_handle processor_handle, amdsmi_temperature_type_t sensor_type, amdsmi_temperature_metric_t metric, int64_t *temperature);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `sensor_type`: Temperature sensor to read (EDGE, JUNCTION, VRAM, HBM_0 to HBM_3, etc.)
- `metric`: Temperature metric to retrieve (CURRENT, MAX, MIN, CRITICAL, etc.)
- `temperature`: Pointer to receive the temperature in millidegrees Celsius

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure

**Usage Example:**

```c
int64_t temperature;
amdsmi_status_t ret = amdsmi_get_temp_metric(processor,
                                             TEMPERATURE_TYPE_EDGE,
                                             AMDSMI_TEMP_CURRENT,
                                             &temperature);
if (ret == AMDSMI_STATUS_SUCCESS) {
    // Cast for a portable format specifier: int64_t is not always long
    printf("Edge Temperature: %lld millidegrees C (%.1f C)\n",
           (long long)temperature, temperature / 1000.0);
}

// Read the AMDSMI_TEMP_MAX metric of the junction sensor
ret = amdsmi_get_temp_metric(processor,
                             TEMPERATURE_TYPE_JUNCTION,
                             AMDSMI_TEMP_MAX,
                             &temperature);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Junction Max Temperature: %.1f C\n", temperature / 1000.0);
}
```
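
A common use of these metrics is to compare the `AMDSMI_TEMP_CURRENT` reading of a sensor against its `AMDSMI_TEMP_CRITICAL` threshold. The sketch below does that arithmetic with hypothetical helpers (`millidegrees_to_celsius` and `thermal_margin_c` are not amdsmi functions), assuming both inputs are in millidegrees Celsius as documented above.

```python
def millidegrees_to_celsius(value_mc: int) -> float:
    """Convert a millidegree Celsius reading to degrees Celsius."""
    return value_mc / 1000.0


def thermal_margin_c(current_mc: int, critical_mc: int) -> float:
    """Degrees Celsius left before the critical threshold (negative once exceeded)."""
    return millidegrees_to_celsius(critical_mc - current_mc)


# Illustrative readings: 65.5 C now, critical threshold at 100 C
print(thermal_margin_c(65_500, 100_000))  # 34.5
```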

### Voltage Monitoring

Get voltage measurements from GPU voltage rails.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_volt_metric(amdsmi_processor_handle processor_handle, amdsmi_voltage_type_t sensor_type, amdsmi_voltage_metric_t metric, int64_t *voltage);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `sensor_type`: Type of voltage sensor
- `metric`: Voltage metric to retrieve (CURRENT, MAX, MIN, etc.)
- `voltage`: Pointer to receive the voltage in millivolts

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure

### Clock Frequency Information

Get the supported frequency levels of a clock domain and the index of the currently selected level.

```c { .api }
amdsmi_status_t amdsmi_get_clk_freq(amdsmi_processor_handle processor_handle, amdsmi_clk_type_t clk_type, amdsmi_frequencies_t *f);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `clk_type`: Type of clock (SYS/GFX, MEM, SOC, etc.)
- `f`: Pointer to receive the frequency information (`amdsmi_frequencies_t`)

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
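
In `amdsmi_frequencies_t`, `current` is an index into the first `num_supported` entries of `frequency`, and those entries are in Hz. The hypothetical helpers below (not amdsmi functions) turn the three fields into MHz values and bounds-check the index; the frequency list in the example is illustrative.

```python
from typing import List, Sequence


def supported_frequencies_mhz(num_supported: int, frequency_hz: Sequence[int]) -> List[float]:
    """Convert the first num_supported frequency entries from Hz to MHz."""
    return [f / 1e6 for f in frequency_hz[:num_supported]]


def current_frequency_mhz(num_supported: int, current: int, frequency_hz: Sequence[int]) -> float:
    """Return the frequency selected by the `current` index, in MHz."""
    if not 0 <= current < num_supported:
        raise IndexError(f"current index {current} is outside 0..{num_supported - 1}")
    return frequency_hz[current] / 1e6


levels = [500_000_000, 1_200_000_000, 2_100_000_000]  # illustrative values in Hz
print(supported_frequencies_mhz(3, levels))  # [500.0, 1200.0, 2100.0]
print(current_frequency_mhz(3, 1, levels))   # 1200.0
```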

### Clock Measurements

Get real-time clock measurements, averaged over 1 second.

```c { .api }
amdsmi_status_t amdsmi_get_clock_info(amdsmi_processor_handle processor_handle, amdsmi_clk_type_t clk_type, amdsmi_clk_info_t *info);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `clk_type`: Type of clock to measure
- `info`: Pointer to receive clock measurement information

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure

**Usage Example:**

```c
amdsmi_clk_info_t clk_info;
amdsmi_status_t ret = amdsmi_get_clock_info(processor, CLK_TYPE_GFX, &clk_info);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("GFX Clock - Current: %u MHz, Min: %u MHz, Max: %u MHz\n",
           clk_info.cur_clk, clk_info.min_clk, clk_info.max_clk);
}
```
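
Because `cur_clk`, `min_clk`, and `max_clk` share the same unit (MHz), it can be handy to express the current clock as a position within its range. The helper below is a hypothetical sketch, not an amdsmi function; it rejects an empty range instead of guessing what a single-value range should mean.

```python
def clock_range_position_pct(cur_clk: int, min_clk: int, max_clk: int) -> float:
    """Position of cur_clk within [min_clk, max_clk] as a percentage (all in MHz)."""
    if max_clk <= min_clk:
        raise ValueError("max_clk must be greater than min_clk")
    return 100.0 * (cur_clk - min_clk) / (max_clk - min_clk)


print(clock_range_position_pct(1500, 500, 2500))  # 50.0
```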

### Comprehensive GPU Metrics

Get the comprehensive GPU metrics structure, which bundles extensive telemetry data into a single call.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_metrics_info(amdsmi_processor_handle processor_handle, amdsmi_gpu_metrics_t *pgpu_metrics);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `pgpu_metrics`: Pointer to receive comprehensive GPU metrics

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure

### Fan Speed Monitoring

Monitor fan speed in RPM, or as a relative value on a 0-255 scale that can be converted to a percentage of the maximum speed.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_fan_rpms(amdsmi_processor_handle processor_handle, uint32_t sensor_ind, int64_t *speed);
amdsmi_status_t amdsmi_get_gpu_fan_speed(amdsmi_processor_handle processor_handle, uint32_t sensor_ind, int64_t *speed);
amdsmi_status_t amdsmi_get_gpu_fan_speed_max(amdsmi_processor_handle processor_handle, uint32_t sensor_ind, uint64_t *max_speed);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `sensor_ind`: 0-based fan sensor index (usually 0)
- `speed`: Pointer to receive the fan speed: RPM for `amdsmi_get_gpu_fan_rpms`, or a relative value on the 0-255 scale for `amdsmi_get_gpu_fan_speed`
- `max_speed`: Pointer to receive the maximum fan speed

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
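
To report the relative fan speed as a percentage, divide it by the maximum. The hypothetical helper below (`fan_speed_percent` is not an amdsmi function) assumes that the value from `amdsmi_get_gpu_fan_speed_max` is on the same scale as `amdsmi_get_gpu_fan_speed`, and falls back to the `AMDSMI_MAX_FAN_SPEED` constant (255) listed under Constants when no maximum is supplied.

```python
AMDSMI_MAX_FAN_SPEED = 255  # mirrors the constant listed in this document


def fan_speed_percent(speed: int, max_speed: int = AMDSMI_MAX_FAN_SPEED) -> float:
    """Convert a relative fan speed to a percentage of max_speed."""
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")
    return 100.0 * speed / max_speed


print(round(fan_speed_percent(128), 1))  # 50.2
```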

## Python API

### GPU Activity

```python { .api }
def amdsmi_get_gpu_activity(processor_handle):
    """
    Get GPU engine activity percentages.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: Engine usage with keys 'gfx_activity', 'umc_activity', 'mm_activity'

    Raises:
        AmdSmiException: If the activity query fails
    """
```

### Power Information

```python { .api }
def amdsmi_get_power_info(processor_handle):
    """
    Get GPU power and voltage information.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: Power info with keys 'average_socket_power', 'gfx_voltage',
              'soc_voltage', 'mem_voltage', 'power_limit'

    Raises:
        AmdSmiException: If the power query fails
    """
```

### Temperature Monitoring

```python { .api }
def amdsmi_get_temp_metric(processor_handle, sensor_type, metric):
    """
    Get a temperature measurement from a GPU sensor.

    Args:
        processor_handle: GPU processor handle
        sensor_type (AmdSmiTemperatureType): Temperature sensor type
        metric (AmdSmiTemperatureMetric): Temperature metric to retrieve

    Returns:
        int: Temperature in millidegrees Celsius

    Raises:
        AmdSmiException: If the temperature query fails
    """
```

### Clock Information

```python { .api }
def amdsmi_get_clock_info(processor_handle, clk_type):
    """
    Get real-time clock measurements.

    Args:
        processor_handle: GPU processor handle
        clk_type (AmdSmiClkType): Clock type to measure

    Returns:
        dict: Clock info with keys 'cur_clk', 'min_clk', 'max_clk'

    Raises:
        AmdSmiException: If the clock query fails
    """
```

**Python Usage Example:**

```python
import amdsmi
from amdsmi import AmdSmiTemperatureType, AmdSmiTemperatureMetric, AmdSmiClkType

# Initialize the library before any other call
amdsmi.amdsmi_init()

try:
    # The Python binding returns all GPU processor handles directly
    processors = amdsmi.amdsmi_get_processor_handles()
    if not processors:
        raise RuntimeError("No AMD GPUs found")
    gpu = processors[0]  # First GPU

    # Get GPU activity
    activity = amdsmi.amdsmi_get_gpu_activity(gpu)
    print(f"GPU Usage: GFX={activity['gfx_activity']}%, "
          f"Memory={activity['umc_activity']}%, MM={activity['mm_activity']}%")

    # Get power information
    power = amdsmi.amdsmi_get_power_info(gpu)
    print(f"Power: {power['average_socket_power']} W, "
          f"GFX Voltage: {power['gfx_voltage']} mV")

    # Get the current edge temperature (millidegrees Celsius)
    temp = amdsmi.amdsmi_get_temp_metric(gpu,
                                         AmdSmiTemperatureType.EDGE,
                                         AmdSmiTemperatureMetric.CURRENT)
    print(f"GPU Temperature: {temp / 1000:.1f}°C")

    # Get clock information
    gfx_clk = amdsmi.amdsmi_get_clock_info(gpu, AmdSmiClkType.GFX)
    print(f"GFX Clock: {gfx_clk['cur_clk']} MHz "
          f"(Range: {gfx_clk['min_clk']}-{gfx_clk['max_clk']} MHz)")

finally:
    # Always release library resources, even if a query raised
    amdsmi.amdsmi_shut_down()
```

## Types

### Engine Usage Structure

```c { .api }
typedef struct {
    uint32_t gfx_activity;   // GFX engine activity percentage (0-100)
    uint32_t umc_activity;   // Memory controller activity percentage (0-100)
    uint32_t mm_activity;    // Multimedia engine activity percentage (0-100)
    uint32_t reserved[13];   // Reserved for future use
} amdsmi_engine_usage_t;
```

### Power Information Structure

```c { .api }
typedef struct {
    uint32_t average_socket_power;  // Average socket power in watts
    uint32_t gfx_voltage;           // GFX voltage in millivolts
    uint32_t soc_voltage;           // SOC voltage in millivolts
    uint32_t mem_voltage;           // Memory voltage in millivolts
    uint32_t power_limit;           // Power limit in watts
    uint32_t reserved[11];          // Reserved for future use
} amdsmi_power_info_t;
```

### Clock Information Structure

```c { .api }
typedef struct {
    uint32_t cur_clk;       // Current clock frequency in MHz
    uint32_t min_clk;       // Minimum clock frequency in MHz
    uint32_t max_clk;       // Maximum clock frequency in MHz
    uint32_t reserved[5];   // Reserved for future use
} amdsmi_clk_info_t;
```

### Frequency Information Structure

```c { .api }
typedef struct {
    uint32_t num_supported;                           // Number of supported frequencies
    uint32_t current;                                 // Index of the current frequency
    uint64_t frequency[AMDSMI_MAX_NUM_FREQUENCIES];   // Available frequencies in Hz
} amdsmi_frequencies_t;
```

### Utilization Counter Structure

```c { .api }
typedef struct {
    AMDSMI_UTILIZATION_COUNTER_TYPE type;  // Counter type (set by the caller)
    uint64_t value;                        // Counter value
} amdsmi_utilization_counter_t;
```

### Temperature Types

```c { .api }
typedef enum {
    TEMPERATURE_TYPE_EDGE,       // Edge temperature sensor
    TEMPERATURE_TYPE_JUNCTION,   // Junction temperature sensor
    TEMPERATURE_TYPE_VRAM,       // VRAM temperature sensor
    TEMPERATURE_TYPE_HBM_0,      // HBM stack 0 temperature
    TEMPERATURE_TYPE_HBM_1,      // HBM stack 1 temperature
    TEMPERATURE_TYPE_HBM_2,      // HBM stack 2 temperature
    TEMPERATURE_TYPE_HBM_3,      // HBM stack 3 temperature
    TEMPERATURE_TYPE_PLX         // PLX temperature sensor
} amdsmi_temperature_type_t;
```

### Temperature Metrics

```c { .api }
typedef enum {
    AMDSMI_TEMP_CURRENT,          // Current temperature
    AMDSMI_TEMP_MAX,              // Maximum temperature
    AMDSMI_TEMP_MIN,              // Minimum temperature
    AMDSMI_TEMP_MAX_HYST,         // Maximum temperature hysteresis
    AMDSMI_TEMP_MIN_HYST,         // Minimum temperature hysteresis
    AMDSMI_TEMP_CRITICAL,         // Critical temperature threshold
    AMDSMI_TEMP_CRITICAL_HYST,    // Critical temperature hysteresis
    AMDSMI_TEMP_EMERGENCY,        // Emergency temperature threshold
    AMDSMI_TEMP_EMERGENCY_HYST,   // Emergency temperature hysteresis
    AMDSMI_TEMP_CRIT_MIN,         // Critical minimum temperature
    AMDSMI_TEMP_CRIT_MIN_HYST,    // Critical minimum hysteresis
    AMDSMI_TEMP_OFFSET,           // Temperature offset
    AMDSMI_TEMP_LOWEST,           // Historical minimum temperature
    AMDSMI_TEMP_HIGHEST           // Historical maximum temperature
} amdsmi_temperature_metric_t;
```

### Clock Types

```c { .api }
typedef enum {
    CLK_TYPE_SYS = 0x0,            // System/GFX clock
    CLK_TYPE_GFX = CLK_TYPE_SYS,   // Alias for CLK_TYPE_SYS
    CLK_TYPE_DF,                   // Data Fabric clock
    CLK_TYPE_DCEF,                 // Display Controller Engine clock
    CLK_TYPE_SOC,                  // SOC clock
    CLK_TYPE_MEM,                  // Memory clock
    CLK_TYPE_PCIE,                 // PCIe clock
    CLK_TYPE_VCLK0,                // Video clock 0
    CLK_TYPE_VCLK1,                // Video clock 1
    CLK_TYPE_DCLK0,                // Display clock 0
    CLK_TYPE_DCLK1                 // Display clock 1
} amdsmi_clk_type_t;
```

## Constants

```c { .api }
#define AMDSMI_MAX_NUM_FREQUENCIES 32  // Maximum number of supported frequencies
#define AMDSMI_MAX_FAN_SPEED 255       // Maximum fan speed value
#define AMDSMI_NUM_HBM_INSTANCES 4     // Number of HBM instances
```

## Important Notes

1. **Virtual Machine Limitations**: Many monitoring functions are not supported on virtual machine guests.

2. **Sensor Availability**: Not every sensor is available on every GPU model. The C functions return an error status for an unavailable sensor, and the Python functions raise `AmdSmiException`.

3. **Units**: Temperatures are in millidegrees Celsius, voltages in millivolts, power in watts, and frequencies in MHz or Hz as noted for each field.

4. **Sampling Rates**: Some metrics are averaged over a time period (for example, 1 second for clock measurements).

5. **Precision**: The energy counter is a high-precision accumulator, so average power over any interval can be derived from two readings, whereas `amdsmi_get_power_info()` reports socket power as an average (`average_socket_power`).

6. **Multi-Sensor Support**: Some GPUs have more than one sensor of the same kind; the fan functions select one with the 0-based `sensor_ind` parameter.
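
Because unsupported sensors surface as exceptions in Python, a monitoring loop often wants to treat "not available" as a missing value rather than a fatal error. The sketch below uses a hypothetical helper (`query_or_none` is not part of amdsmi) that calls any query function and returns `None` when one of the caller-specified exception types is raised; the exception classes are required so that unrelated errors are never swallowed by accident.

```python
from typing import Any, Callable, Optional, Tuple, Type


def query_or_none(query: Callable[..., Any], *args: Any,
                  errors: Tuple[Type[BaseException], ...]) -> Optional[Any]:
    """Call query(*args); return None instead of raising one of `errors`.

    Hypothetical helper for optional sensors. Only the exception types
    listed in `errors` are suppressed; anything else propagates.
    """
    try:
        return query(*args)
    except errors:
        return None
```

With the Python binding, and assuming its exception class is exposed as `amdsmi.AmdSmiException`, a call might look like `query_or_none(amdsmi.amdsmi_get_power_info, gpu, errors=(amdsmi.AmdSmiException,))`. This treats every library error as "not available"; if you need to tell unsupported sensors apart from genuine failures, inspect the exception instead of discarding it.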