# Performance Counters

Low-level management of GPU hardware performance counters for detailed profiling and performance analysis. The counters expose hardware-level event counts, such as XGMI link traffic, for advanced performance tuning and debugging.

## Capabilities

### Counter Group Support

Check whether a specific performance counter group is supported by the GPU.

```c { .api }
amdsmi_status_t amdsmi_gpu_counter_group_supported(amdsmi_processor_handle processor_handle, amdsmi_event_group_t group);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `group`: Performance counter group to check

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS if supported, error code otherwise

**Usage Example:**

```c
amdsmi_status_t ret = amdsmi_gpu_counter_group_supported(processor, AMDSMI_EVNT_GRP_XGMI);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("XGMI counter group is supported\n");
} else {
    printf("XGMI counter group not available\n");
}
```

### Counter Creation

Create a performance counter for monitoring a specific hardware event.

```c { .api }
amdsmi_status_t amdsmi_gpu_create_counter(amdsmi_processor_handle processor_handle, amdsmi_event_type_t type, amdsmi_event_handle_t *evnt_handle);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `type`: Type of performance counter event to monitor
- `evnt_handle`: Pointer to receive the created counter handle

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
amdsmi_event_handle_t counter_handle;
amdsmi_status_t ret = amdsmi_gpu_create_counter(processor,
                                                AMDSMI_EVNT_XGMI_0_NOP_TX,
                                                &counter_handle);
if (ret == AMDSMI_STATUS_SUCCESS) {
    // Cast so the format specifier matches whatever integer width the handle has
    printf("Counter created successfully, handle: 0x%llx\n",
           (unsigned long long)counter_handle);
}
```

### Counter Control

Control counter operations such as starting, stopping, and resetting measurements.

```c { .api }
amdsmi_status_t amdsmi_gpu_control_counter(amdsmi_event_handle_t evt_handle, amdsmi_counter_command_t cmd, void *cmd_args);
```

**Parameters:**
- `evt_handle`: Handle to the performance counter
- `cmd`: Counter command (START, STOP, RESET)
- `cmd_args`: Command-specific arguments (may be NULL)

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
// Start the counter
amdsmi_status_t ret = amdsmi_gpu_control_counter(counter_handle,
                                                 AMDSMI_CNTR_CMD_START,
                                                 NULL);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Counter started\n");

    // ... run workload ...

    // Stop the counter
    ret = amdsmi_gpu_control_counter(counter_handle,
                                     AMDSMI_CNTR_CMD_STOP,
                                     NULL);
}
```

### Counter Reading

Read the current value of a performance counter.

```c { .api }
amdsmi_status_t amdsmi_gpu_read_counter(amdsmi_event_handle_t evt_handle, amdsmi_counter_value_t *value);
```

**Parameters:**
- `evt_handle`: Handle to the performance counter
- `value`: Pointer to receive the counter value and timing metadata

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
#include <inttypes.h>  // PRIu64: portable format for uint64_t fields

amdsmi_counter_value_t counter_value;
amdsmi_status_t ret = amdsmi_gpu_read_counter(counter_handle, &counter_value);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Counter Value: %" PRIu64 "\n", counter_value.value);
    printf("Time Enabled: %" PRIu64 " ns\n", counter_value.time_enabled);
    printf("Time Running: %" PRIu64 " ns\n", counter_value.time_running);
}
```
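
Because `time_enabled` is reported in nanoseconds, a single reading can be turned into an average event rate over the enabled window. The sketch below is illustrative only: `counter_reading` is a hypothetical local mirror of the documented `amdsmi_counter_value_t` fields (so it builds without the AMD SMI headers), and `counter_rate_per_sec` is not part of the library.

```c
#include <stdint.h>

/* Hypothetical local mirror of the documented amdsmi_counter_value_t fields
 * (reserved[] omitted); substitute the real type when using AMD SMI. */
typedef struct {
    uint64_t value;        /* raw event count */
    uint64_t time_enabled; /* nanoseconds the counter was enabled */
    uint64_t time_running; /* nanoseconds the counter was actually counting */
} counter_reading;

/* Average events per second over the enabled window.
 * Returns 0.0 when no enabled time has elapsed, avoiding a division by zero. */
double counter_rate_per_sec(const counter_reading *r)
{
    if (r->time_enabled == 0) {
        return 0.0;
    }
    return (double)r->value * 1e9 / (double)r->time_enabled;
}
```

For example, 500 events over 2 s of enabled time works out to 250 events per second.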

### Counter Destruction

Destroy a performance counter and free associated resources.

```c { .api }
amdsmi_status_t amdsmi_gpu_destroy_counter(amdsmi_event_handle_t evnt_handle);
```

**Parameters:**
- `evnt_handle`: Handle to the performance counter to destroy

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
amdsmi_status_t ret = amdsmi_gpu_destroy_counter(counter_handle);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Counter destroyed successfully\n");
}
```

### Available Counters Query

Get the number of available counters for a specific event group.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_available_counters(amdsmi_processor_handle processor_handle, amdsmi_event_group_t grp, uint32_t *available);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `grp`: Performance counter group to query
- `available`: Pointer to receive number of available counters

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
uint32_t available_counters;
amdsmi_status_t ret = amdsmi_get_gpu_available_counters(processor,
                                                        AMDSMI_EVNT_GRP_XGMI,
                                                        &available_counters);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("Available XGMI counters: %u\n", available_counters);
}
```
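
When more events are needed from a group than `amdsmi_get_gpu_available_counters` reports, one workable approach is to measure them over several passes of a repeatable workload. The helpers below only do the bookkeeping arithmetic for that idea: `passes_needed` and `pass_slice` are hypothetical names, they make no AMD SMI calls, and multi-pass measurement is a suggested strategy rather than something the library does for you.

```c
#include <stdint.h>

/* Number of measurement passes needed to cover `requested` events when at
 * most `available` counters can be active at once (ceiling division).
 * Returns 0 if no counters are available, since no pass can make progress. */
uint32_t passes_needed(uint32_t requested, uint32_t available)
{
    if (available == 0) {
        return 0;
    }
    return requested / available + (requested % available != 0);
}

/* For 0-based pass number `pass`, report the index of the first event to
 * create and how many events to create in that pass. A pass beyond the last
 * one reports count = 0. */
void pass_slice(uint32_t requested, uint32_t available, uint32_t pass,
                uint32_t *first, uint32_t *count)
{
    uint64_t start = (uint64_t)pass * available;
    if (available == 0 || start >= requested) {
        *first = requested;
        *count = 0;
        return;
    }
    uint32_t remaining = requested - (uint32_t)start;
    *first = (uint32_t)start;
    *count = remaining < available ? remaining : available;
}
```

For example, 10 requested events with 4 available counters need 3 passes, and the last pass creates only events 8 and 9.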

## Python API

### Counter Group Support

```python { .api }
def amdsmi_gpu_counter_group_supported(processor_handle, group):
    """
    Check if a counter group is supported.

    Args:
        processor_handle: GPU processor handle
        group: Counter group type from AmdSmiEventGroup

    Returns:
        bool: True if supported, False otherwise

    Raises:
        AmdSmiException: If query fails
    """
```

### Counter Management

```python { .api }
def amdsmi_gpu_create_counter(processor_handle, event_type):
    """
    Create a performance counter.

    Args:
        processor_handle: GPU processor handle
        event_type: Event type from AmdSmiEventType

    Returns:
        int: Counter handle

    Raises:
        AmdSmiException: If counter creation fails
    """

def amdsmi_gpu_control_counter(counter_handle, command, args=None):
    """
    Control counter operations.

    Args:
        counter_handle (int): Counter handle
        command: Command from AmdSmiCounterCommand
        args: Optional command arguments

    Raises:
        AmdSmiException: If control operation fails
    """

def amdsmi_gpu_read_counter(counter_handle):
    """
    Read counter value.

    Args:
        counter_handle (int): Counter handle

    Returns:
        dict: Counter value with keys 'value', 'time_enabled', 'time_running'

    Raises:
        AmdSmiException: If counter read fails
    """

def amdsmi_gpu_destroy_counter(counter_handle):
    """
    Destroy a performance counter.

    Args:
        counter_handle (int): Counter handle

    Raises:
        AmdSmiException: If counter destruction fails
    """
```

**Python Usage Example:**

```python
import amdsmi
from amdsmi import AmdSmiEventGroup, AmdSmiEventType, AmdSmiCounterCommand

# Initialize and get GPU
amdsmi.amdsmi_init()
try:
    sockets = amdsmi.amdsmi_get_socket_handles()
    processors = amdsmi.amdsmi_get_processor_handles(sockets[0])
    gpu = processors[0]

    # Check if XGMI counters are supported before creating any
    if amdsmi.amdsmi_gpu_counter_group_supported(gpu, AmdSmiEventGroup.XGMI):
        print("XGMI counters supported")

        # Get available counter count
        available = amdsmi.amdsmi_get_gpu_available_counters(gpu, AmdSmiEventGroup.XGMI)
        print(f"Available XGMI counters: {available}")

        # Create a counter
        counter = amdsmi.amdsmi_gpu_create_counter(gpu, AmdSmiEventType.XGMI_0_NOP_TX)

        try:
            # Start the counter
            amdsmi.amdsmi_gpu_control_counter(counter, AmdSmiCounterCommand.START)

            # ... run workload here ...

            # Read the counter while it is still running, then stop it
            value = amdsmi.amdsmi_gpu_read_counter(counter)
            amdsmi.amdsmi_gpu_control_counter(counter, AmdSmiCounterCommand.STOP)

            print(f"Counter value: {value['value']}")
            print(f"Time enabled: {value['time_enabled']} ns")
            print(f"Time running: {value['time_running']} ns")

        finally:
            # Clean up the counter
            amdsmi.amdsmi_gpu_destroy_counter(counter)

finally:
    amdsmi.amdsmi_shut_down()
```

## Types

### Counter Value Structure

```c { .api }
typedef struct {
    uint64_t value;         // Counter value
    uint64_t time_enabled;  // Time counter was enabled (nanoseconds)
    uint64_t time_running;  // Time counter was actively running (nanoseconds)
    uint64_t reserved[4];   // Reserved for future use
} amdsmi_counter_value_t;
```
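
The two timing fields show whether a counter was counting for the whole time it was enabled. When `time_running` is smaller than `time_enabled` (for example, because counters were multiplexed), one common convention, taken from Linux perf, is to extrapolate the raw count by `time_enabled / time_running`. The sketch below applies that convention to a hypothetical local mirror of the structure; whether the extrapolation is appropriate for a given AMD SMI event is an assumption to verify, and the result is an estimate, not an exact count.

```c
#include <stdint.h>

/* Hypothetical mirror of the documented amdsmi_counter_value_t fields
 * (reserved[] omitted), so the sketch builds without AMD SMI headers. */
typedef struct {
    uint64_t value;
    uint64_t time_enabled; /* ns */
    uint64_t time_running; /* ns */
} mux_reading;

/* Fraction of the enabled window during which the counter was counting.
 * Returns 0.0 if the counter was never enabled. */
double running_fraction(const mux_reading *r)
{
    if (r->time_enabled == 0) {
        return 0.0;
    }
    return (double)r->time_running / (double)r->time_enabled;
}

/* Perf-style estimate of the full-window count:
 *   value * time_enabled / time_running
 * Returns 0.0 if the counter never ran, and the raw value when it ran for
 * the whole enabled window (no extrapolation needed). */
double scaled_count(const mux_reading *r)
{
    if (r->time_running == 0) {
        return 0.0;
    }
    if (r->time_running >= r->time_enabled) {
        return (double)r->value;
    }
    return (double)r->value * ((double)r->time_enabled / (double)r->time_running);
}
```

A counter that ran for a quarter of its enabled window and recorded 100 events would be estimated at 400 events for the full window.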

303

304

### Event Groups

305

306

```c { .api }

307

typedef enum {

308

AMDSMI_EVNT_GRP_XGMI = 0, // XGMI event group

309

AMDSMI_EVNT_GRP_XGMI_DATA_OUT, // XGMI data out events

310

AMDSMI_EVNT_GRP_GMI, // GMI event group

311

AMDSMI_EVNT_GRP_INVALID = 0xFFFFFFFF // Invalid event group

312

} amdsmi_event_group_t;

313

```

### Event Types

```c { .api }
typedef enum {
    AMDSMI_EVNT_XGMI_0_NOP_TX = 0,   // XGMI link 0 NOP transmit
    AMDSMI_EVNT_XGMI_0_REQ_TX,       // XGMI link 0 request transmit
    AMDSMI_EVNT_XGMI_0_RESP_TX,      // XGMI link 0 response transmit
    AMDSMI_EVNT_XGMI_0_BEATS_TX,     // XGMI link 0 data beats transmit
    AMDSMI_EVNT_XGMI_1_NOP_TX,       // XGMI link 1 NOP transmit
    AMDSMI_EVNT_XGMI_1_REQ_TX,       // XGMI link 1 request transmit
    // ... additional event types
    AMDSMI_EVNT_LAST = 0xFFFFFFFF    // Last event type marker
} amdsmi_event_type_t;
```

### Counter Commands

```c { .api }
typedef enum {
    AMDSMI_CNTR_CMD_START = 0,            // Start counter
    AMDSMI_CNTR_CMD_STOP,                 // Stop counter
    AMDSMI_CNTR_CMD_RESET,                // Reset counter value
    AMDSMI_CNTR_CMD_INVALID = 0xFFFFFFFF  // Invalid command
} amdsmi_counter_command_t;
```

## Counter Categories

### XGMI Counters

Monitor traffic and performance on Infinity Fabric links between GPUs:
- **NOP_TX/RX**: No-operation packet transmission/reception
- **REQ_TX/RX**: Request packet transmission/reception
- **RESP_TX/RX**: Response packet transmission/reception
- **BEATS_TX/RX**: Data beat transmission/reception
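
The BEATS counters count data beats, so an approximate transmit bandwidth can be derived by multiplying the beat count by the bytes carried per beat and dividing by the elapsed time. The beat width is hardware-specific and is not given in this document, so the sketch below takes it as a parameter (`bytes_per_beat`); confirm the value for your GPU before relying on the numbers. The function name is hypothetical and the code makes no AMD SMI calls.

```c
#include <stdint.h>

/* Convert a BEATS_TX count measured over elapsed_ns nanoseconds into bytes
 * per second. bytes_per_beat is an input assumption: check the value for
 * your XGMI link before relying on the result.
 * Returns 0.0 for an empty measurement window. */
double xgmi_tx_bytes_per_sec(uint64_t beats, uint64_t bytes_per_beat,
                             uint64_t elapsed_ns)
{
    if (elapsed_ns == 0) {
        return 0.0;
    }
    double bytes = (double)beats * (double)bytes_per_beat;
    return bytes * 1e9 / (double)elapsed_ns;
}
```

The elapsed time would typically come from the `time_enabled` field of the same reading, or from the difference between two readings when sampling periodically.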

### GMI Counters

Monitor activity on GMI (Global Memory Interconnect) links for memory subsystem analysis.

## Important Notes

1. **Hardware Dependent**: Available counters depend on GPU generation and architecture.

2. **Resource Limits**: Only a limited number of counters can be active at the same time; `amdsmi_get_gpu_available_counters` reports how many are available for a group.

3. **Timing Units**: `time_enabled` and `time_running` are reported in nanoseconds.

4. **Overhead**: Performance counters introduce minimal overhead when properly managed.

5. **Group Support**: Always check group support before creating counters.

6. **Resource Cleanup**: Always destroy counters to free hardware resources.

7. **Multiplexing**: Hardware may multiplex counters when resource limits are exceeded; compare `time_running` with `time_enabled` to see how much of the enabled window a counter actually spent counting.

8. **Root Privileges**: Some counters may require elevated privileges to access.

9. **Thread Safety**: Individual counter operations are thread-safe, but calls that act on the same counter handle (start, stop, read, destroy) should still be coordinated across threads.

10. **Sampling**: For continuous monitoring, read counters at regular intervals short enough to avoid overflow between reads.
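
One way to follow the sampling advice above is to read a running counter at a fixed interval and work with the difference between consecutive readings. The sketch below assumes readings shaped like the documented `value` and `time_enabled` fields (mirrored in the hypothetical `periodic_reading` type so it builds without AMD SMI) and uses unsigned 64-bit subtraction, which still gives the correct delta if `value` wraps around once between two samples.

```c
#include <stdint.h>

/* Hypothetical local mirror of the documented counter fields. */
typedef struct {
    uint64_t value;
    uint64_t time_enabled; /* ns */
} periodic_reading;

typedef struct {
    uint64_t events;   /* events counted during the interval */
    uint64_t interval; /* enabled time during the interval, ns */
    double rate;       /* events per second, 0.0 if interval is 0 */
} interval_stats;

/* Difference between two consecutive readings. Unsigned subtraction is
 * modulo 2^64, so a single wrap of `value` between samples is handled;
 * more than one wrap per interval cannot be detected. */
interval_stats interval_delta(const periodic_reading *prev,
                              const periodic_reading *curr)
{
    interval_stats s;
    s.events = curr->value - prev->value;
    s.interval = curr->time_enabled - prev->time_enabled;
    s.rate = s.interval ? (double)s.events * 1e9 / (double)s.interval : 0.0;
    return s;
}
```

Keeping the previous reading and replacing it after each call gives a per-interval rate stream instead of a single cumulative average.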