# PCIe and Connectivity

PCIe interface monitoring, bandwidth management, topology discovery, and multi-GPU connectivity queries for understanding how GPUs are connected to the host and to one another.

## Capabilities

### PCIe Bandwidth Information

Get PCIe bandwidth capabilities and limitations for a GPU device.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_pci_bandwidth(amdsmi_processor_handle processor_handle, amdsmi_pcie_bandwidth_t *bandwidth);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `bandwidth`: Pointer to receive PCIe bandwidth information

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure

**Usage Example:**

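```c
#include <inttypes.h>
#include <stdio.h>
#include <amd_smi/amdsmi.h>

// Inside a function: `processor` is a GPU handle obtained after amdsmi_init().
amdsmi_pcie_bandwidth_t pcie_bw;
amdsmi_status_t ret = amdsmi_get_gpu_pci_bandwidth(processor, &pcie_bw);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("PCIe Bandwidth:\n");
    printf("  Transfer Rate: %u\n", pcie_bw.transfer_rate);
    printf("  Lanes: %u\n", pcie_bw.lanes);
    // max_pkt_sz is a uint64_t, so it needs PRIu64 rather than %u
    printf("  Max Payload Size: %" PRIu64 " bytes\n", pcie_bw.max_pkt_sz);
} else {
    fprintf(stderr, "amdsmi_get_gpu_pci_bandwidth failed: %d\n", (int)ret);
}
```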
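
To put a link generation and lane count in perspective, the theoretical per-direction peak can be computed from the per-lane rates in the PCIe specification. The sketch below is not part of AMD SMI: `PCIE_GENERATIONS` and `pcie_peak_gbytes_per_s` are hypothetical helpers, and the sketch assumes you already know the link generation, since how `transfer_rate` encodes it is not specified here.

```python
# Per-lane signalling rate (GT/s) and line-code efficiency for each PCIe
# generation. These are PCIe specification values, not AMD SMI output.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_peak_gbytes_per_s(generation: int, lanes: int) -> float:
    """Theoretical peak one-direction bandwidth of a PCIe link in GB/s.

    Protocol overhead (packet headers, flow control) is ignored, so
    achievable throughput is lower than this figure.
    """
    if generation not in PCIE_GENERATIONS:
        raise ValueError(f"unknown PCIe generation: {generation}")
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    # GT/s times encoding efficiency gives payload Gbit/s per lane; /8 gives GB/s.
    return gt_per_s * efficiency / 8 * lanes


print(f"Gen3 x16: {pcie_peak_gbytes_per_s(3, 16):.2f} GB/s")
```

Keeping the table separate from the formula makes it easy to extend when a new generation is needed.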

### PCIe Link Status

Get the current PCIe link state (`amdsmi_get_pcie_link_status`) or the maximum capabilities of the link (`amdsmi_get_pcie_link_caps`).

```c { .api }
amdsmi_status_t amdsmi_get_pcie_link_status(amdsmi_processor_handle processor_handle, amdsmi_pcie_info_t *info);
amdsmi_status_t amdsmi_get_pcie_link_caps(amdsmi_processor_handle processor_handle, amdsmi_pcie_info_t *info);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `info`: Pointer to receive PCIe link information

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
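
Comparing the two calls is a quick way to spot a link that trained below its capability, for example a x16 card running at x8. A minimal sketch, assuming both results are available as dicts carrying the `width` and `speed` fields of `amdsmi_pcie_info_t`; `link_degradation` is a hypothetical helper. A lower current speed is not always a fault, because PCIe power management can lower the link speed while the GPU is idle.

```python
def link_degradation(status: dict, caps: dict) -> list:
    """Compare the current PCIe link state against its maximum capability.

    `status` and `caps` are dicts with 'width' (lanes) and 'speed' keys,
    mirroring the amdsmi_pcie_info_t fields documented on this page.
    Returns readable findings; an empty list means the link is at full capability.
    """
    findings = []
    if status["width"] < caps["width"]:
        findings.append(f"width x{status['width']} below max x{caps['width']}")
    if status["speed"] < caps["speed"]:
        findings.append(f"speed {status['speed']} below max {caps['speed']}")
    return findings


print(link_degradation({"width": 8, "speed": 16}, {"width": 16, "speed": 16}))
```

A reduced width is usually the more telling finding, since link width does not normally change with load the way speed can.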

### PCIe Traffic Monitoring

Monitor PCIe traffic throughput and packet statistics.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_pci_throughput(amdsmi_processor_handle processor_handle, uint64_t *sent, uint64_t *received, uint64_t *max_pkt_sz);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `sent`: Pointer to receive bytes sent through the PCIe interface
- `received`: Pointer to receive bytes received through the PCIe interface
- `max_pkt_sz`: Pointer to receive the maximum packet size

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
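
The three outputs are most useful combined into a bandwidth estimate. The sketch below assumes `sent` and `received` behave like packet counts sampled over roughly one second and that `max_pkt_sz` is in bytes, which is how the `rocm-smi` tool treats the equivalent counters when it reports estimated PCIe bandwidth; verify the counter semantics on your AMD SMI version before relying on it. `estimate_pcie_mib_per_s` is a hypothetical helper.

```python
def estimate_pcie_mib_per_s(sent: int, received: int, max_pkt_sz: int) -> dict:
    """Estimate PCIe bandwidth (MiB/s) from one throughput sample.

    Assumes `sent`/`received` are packet counts over about one second and
    `max_pkt_sz` is the maximum payload in bytes. Every packet is treated
    as full-size, so the result is an upper-bound estimate.
    """
    mib = 1024 * 1024
    tx = sent * max_pkt_sz / mib
    rx = received * max_pkt_sz / mib
    return {"tx": tx, "rx": rx, "total": tx + rx}


print(estimate_pcie_mib_per_s(sent=1024, received=2048, max_pkt_sz=256))
```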

### PCIe Replay Counter

Get the PCIe replay counter, an indicator of link quality. A replay is a retransmission of a packet at the PCIe data link layer after a transmission error, so a count that keeps growing points to an unreliable link.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_pci_replay_counter(amdsmi_processor_handle processor_handle, uint64_t *counter);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `counter`: Pointer to receive the replay counter value

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
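
A single replay count is hard to interpret on its own; what usually matters is whether it is growing. A minimal sketch, assuming the counter is cumulative: `replay_rate` is a hypothetical helper that takes any zero-argument callable, for example `lambda: amdsmi.amdsmi_get_gpu_pci_replay_counter(handle)`, samples it twice and returns replays per second.

```python
import time
from typing import Callable


def replay_rate(read_counter: Callable[[], int], interval_s: float = 1.0) -> float:
    """Return PCIe replays per second measured over one interval.

    `read_counter` returns the cumulative replay count (assumed to only
    increase while the driver is loaded).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    start = read_counter()
    time.sleep(interval_s)
    end = read_counter()
    return (end - start) / interval_s
```

Passing a callable keeps the helper testable without a GPU and lets the caller decide how to handle query errors.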

### BDF Information

Get the Bus/Device/Function (BDF) identifier of a GPU, packed into a single integer.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_bdf_id(amdsmi_processor_handle processor_handle, uint64_t *bdfid);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `bdfid`: Pointer to receive the BDF identifier packed as an integer

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
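
To match a GPU against `lspci` output, the packed integer can be unpacked into the usual `DDDD:BB:DD.F` string. This sketch assumes the bit layout that ROCm SMI documents for its equivalent `rsmi_dev_pci_id_get` call (domain in bits 63..32, bus in 15..8, device in 7..3, function in 2..0); `format_bdf` is a hypothetical helper and ignores any other bits.

```python
def format_bdf(bdfid: int) -> str:
    """Format a packed BDF id as 'DDDD:BB:DD.F' (assumed bit layout, see above)."""
    domain = (bdfid >> 32) & 0xFFFFFFFF  # bits 63..32
    bus = (bdfid >> 8) & 0xFF            # bits 15..8
    device = (bdfid >> 3) & 0x1F         # bits 7..3
    function = bdfid & 0x7               # bits 2..0
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


print(format_bdf(0xC100))
```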

### NUMA Topology

Get NUMA affinity and topology information for GPU devices.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_topo_numa_affinity(amdsmi_processor_handle processor_handle, uint32_t *numa_node);
amdsmi_status_t amdsmi_topo_get_numa_node_number(amdsmi_processor_handle processor_handle, uint32_t *numa_node);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `numa_node`: Pointer to receive the NUMA node number

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
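
A common use of the NUMA node is to pin the host process that feeds a GPU to the CPUs on the same node. A sketch for Linux, assuming the node number obtained above and the standard `/sys/devices/system/node/node<N>/cpulist` file; `parse_cpulist` and `pin_to_numa_node` are hypothetical helpers, and the guard for a negative node is a defensive assumption for GPUs that report no affinity.

```python
import os


def parse_cpulist(text: str) -> set:
    """Parse a Linux cpulist string such as '0-3,8,10-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list, e.g. a memory-only node
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_numa_node(numa_node: int) -> bool:
    """Restrict the calling process to the CPUs of `numa_node` (Linux only).

    Returns False and leaves affinity unchanged if the node is negative
    or lists no CPUs.
    """
    if numa_node < 0:
        return False
    with open(f"/sys/devices/system/node/node{numa_node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    if not cpus:
        return False
    os.sched_setaffinity(0, cpus)  # 0 means the calling process
    return True
```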

### Topology Link Information

Get detailed topology information between processors, including link types and weights.

```c { .api }
amdsmi_status_t amdsmi_topo_get_link_weight(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *weight);
amdsmi_status_t amdsmi_topo_get_link_type(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *hops, AMDSMI_IO_LINK_TYPE *type);
```

**Parameters:**

- `processor_handle_src`: Source processor handle
- `processor_handle_dst`: Destination processor handle
- `weight`: Pointer to receive the link weight/distance
- `hops`: Pointer to receive the number of hops between processors
- `type`: Pointer to receive the link type (PCIe, XGMI, etc.)

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
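
Once link types have been collected for every GPU pair, grouping GPUs by XGMI connectivity helps place communicating work on directly linked devices. A minimal sketch, assuming the results are stored in a dict keyed by `(src, dst)` index pairs whose values have the `hops` and `type` keys that the Python `amdsmi_topo_get_link_type` is documented to return; `xgmi_peers` is a hypothetical helper, and the value `2` is `AMDSMI_IOLINK_TYPE_XGMI` from the Types section.

```python
XGMI = 2  # AMDSMI_IOLINK_TYPE_XGMI, per the Types section


def xgmi_peers(links: dict) -> dict:
    """Map each source GPU index to the sorted GPU indices it reaches over XGMI.

    `links` maps (src, dst) pairs to {'hops': int, 'type': int}.
    GPUs with no XGMI peers map to an empty list.
    """
    peers = {}
    for (src, dst), info in links.items():
        peers.setdefault(src, [])
        if info["type"] == XGMI:
            peers[src].append(dst)
    return {gpu: sorted(dsts) for gpu, dsts in peers.items()}
```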

### Bandwidth Between Processors

Get the minimum and maximum bandwidth between two processors.

```c { .api }
amdsmi_status_t amdsmi_get_minmax_bandwith_between_processors(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *min_bandwidth, uint64_t *max_bandwidth);
```

**Parameters:**

- `processor_handle_src`: Source processor handle
- `processor_handle_dst`: Destination processor handle
- `min_bandwidth`: Pointer to receive the minimum bandwidth
- `max_bandwidth`: Pointer to receive the maximum bandwidth

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
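
The min/max pair translates directly into a best-case and worst-case transfer time for a buffer. The sketch assumes the bandwidth values are in MB/s (10^6 bytes per second); the unit is not stated on this page, so check it against your AMD SMI version. `transfer_time_range_s` is a hypothetical helper.

```python
def transfer_time_range_s(nbytes: int, min_bw_mbps: int, max_bw_mbps: int) -> tuple:
    """Return (best, worst) seconds to move `nbytes` between two processors.

    Bandwidths are assumed to be in MB/s (10**6 bytes per second).
    """
    if min_bw_mbps <= 0 or max_bw_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    best = nbytes / (max_bw_mbps * 1_000_000)   # at maximum bandwidth
    worst = nbytes / (min_bw_mbps * 1_000_000)  # at minimum bandwidth
    return best, worst


print(transfer_time_range_s(10**9, 25_000, 50_000))
```

The positivity check matters in practice: if a query yields zero for a pair, dividing by it would raise an unhelpful `ZeroDivisionError`.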

### P2P Accessibility

Check whether peer-to-peer access is available between two processors.

```c { .api }
amdsmi_status_t amdsmi_is_P2P_accessible(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, bool *accessible);
```

**Parameters:**

- `processor_handle_src`: Source processor handle
- `processor_handle_dst`: Destination processor handle
- `accessible`: Pointer to receive the P2P accessibility status

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure
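
Multi-GPU code that relies on direct memory access usually needs P2P between every pair in a group, in both directions, because the query is made per `(src, dst)` pair. A sketch: `missing_p2p_pairs` is a hypothetical helper that takes the GPU list and any callable answering the query, for example `lambda a, b: amdsmi.amdsmi_is_P2P_accessible(a, b)`.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple


def missing_p2p_pairs(gpus: Sequence, is_accessible: Callable) -> List[Tuple]:
    """Return every ordered (src, dst) pair of distinct GPUs without P2P access.

    An empty list means all GPUs in the group can reach each other directly.
    """
    return [(src, dst) for src, dst in permutations(gpus, 2)
            if not is_accessible(src, dst)]
```

Returning the failing pairs rather than a single boolean makes it easier to report which links block a P2P-based configuration.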

### PCIe Bandwidth Control

Restrict which PCIe bandwidth levels the GPU may use.

```c { .api }
amdsmi_status_t amdsmi_set_gpu_pci_bandwidth(amdsmi_processor_handle processor_handle, uint64_t bw_bitmask);
```

**Parameters:**

- `processor_handle`: Handle to the GPU processor
- `bw_bitmask`: Bitmask of the bandwidth levels to allow; each set bit enables the level with that index

**Returns:** `amdsmi_status_t` - `AMDSMI_STATUS_SUCCESS` on success, an error code on failure

**Note:** This function requires root privileges and is not supported in virtual environments.
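
Building the mask by hand is error-prone, so a small helper can construct it from level indices. A sketch, assuming bit *i* corresponds to the *i*-th supported bandwidth level counted from 0 and that the mask is 64 bits wide (matching the `uint64_t` parameter); `bandwidth_bitmask` is a hypothetical helper.

```python
def bandwidth_bitmask(levels) -> int:
    """Build a bw_bitmask that enables the given bandwidth level indices."""
    mask = 0
    for level in levels:
        if not 0 <= level < 64:
            raise ValueError(f"level {level} out of range for a 64-bit mask")
        mask |= 1 << level  # set the bit for this level
    return mask


print(bin(bandwidth_bitmask([0, 2])))
```

Only levels the device actually supports are meaningful, so the indices should come from the levels reported by `amdsmi_get_gpu_pci_bandwidth`.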

## Python API

### PCIe Information

```python { .api }
def amdsmi_get_gpu_pci_bandwidth(processor_handle):
    """
    Get PCIe bandwidth information for a GPU.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: PCIe bandwidth info with keys 'transfer_rate', 'lanes', 'max_pkt_sz'

    Raises:
        AmdSmiException: If the PCIe bandwidth query fails
    """

def amdsmi_get_pcie_link_status(processor_handle):
    """
    Get the current PCIe link status.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: PCIe status info

    Raises:
        AmdSmiException: If the PCIe status query fails
    """
```

### PCIe Traffic

```python { .api }
def amdsmi_get_gpu_pci_throughput(processor_handle):
    """
    Get PCIe traffic throughput statistics.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: Traffic info with keys 'sent', 'received', 'max_pkt_sz'

    Raises:
        AmdSmiException: If the PCIe throughput query fails
    """

def amdsmi_get_gpu_pci_replay_counter(processor_handle):
    """
    Get the PCIe replay counter.

    Args:
        processor_handle: GPU processor handle

    Returns:
        int: Replay counter value

    Raises:
        AmdSmiException: If the replay counter query fails
    """
```

### Topology Information

```python { .api }
def amdsmi_get_gpu_topo_numa_affinity(processor_handle):
    """
    Get the NUMA node affinity of a GPU.

    Args:
        processor_handle: GPU processor handle

    Returns:
        int: NUMA node number

    Raises:
        AmdSmiException: If the NUMA query fails
    """

def amdsmi_topo_get_link_type(processor_handle_src, processor_handle_dst):
    """
    Get the link type and hop count between two processors.

    Args:
        processor_handle_src: Source processor handle
        processor_handle_dst: Destination processor handle

    Returns:
        dict: Link info with keys 'hops', 'type'

    Raises:
        AmdSmiException: If the link type query fails
    """
```

**Python Usage Example:**

```python
import amdsmi

# Initialize the library before any other call
amdsmi.amdsmi_init()

try:
    # The Python binding returns the handles of all GPUs directly
    processors = amdsmi.amdsmi_get_processor_handles()

    for i, gpu in enumerate(processors):
        print(f"GPU {i} PCIe Information:")

        # Get PCIe bandwidth
        pcie_bw = amdsmi.amdsmi_get_gpu_pci_bandwidth(gpu)
        print(f"  Bandwidth: {pcie_bw['lanes']} lanes, transfer rate {pcie_bw['transfer_rate']}")

        # Get PCIe traffic
        traffic = amdsmi.amdsmi_get_gpu_pci_throughput(gpu)
        print(f"  Traffic: {traffic['sent']} sent, {traffic['received']} received")

        # Get NUMA affinity
        numa_node = amdsmi.amdsmi_get_gpu_topo_numa_affinity(gpu)
        print(f"  NUMA Node: {numa_node}")

        # Check topology to the other GPUs
        for j, other_gpu in enumerate(processors):
            if i != j:
                link_info = amdsmi.amdsmi_topo_get_link_type(gpu, other_gpu)
                p2p_access = amdsmi.amdsmi_is_P2P_accessible(gpu, other_gpu)
                print(f"  -> GPU {j}: {link_info['hops']} hops, "
                      f"Type: {link_info['type']}, P2P: {p2p_access}")

finally:
    # Always release library resources, even if a query raised
    amdsmi.amdsmi_shut_down()
```

## Types

### PCIe Bandwidth Structure

```c { .api }
typedef struct {
    uint32_t transfer_rate;  // PCIe generation/transfer rate
    uint32_t lanes;          // Number of PCIe lanes
    uint64_t max_pkt_sz;     // Maximum packet size
    uint32_t reserved[3];    // Reserved for future use
} amdsmi_pcie_bandwidth_t;
```

### PCIe Information Structure

```c { .api }
typedef struct {
    uint32_t width;        // Link width in lanes
    uint32_t speed;        // Link speed
    uint32_t reserved[6];  // Reserved for future use
} amdsmi_pcie_info_t;
```

### IO Link Types

```c { .api }
typedef enum {
    AMDSMI_IOLINK_TYPE_UNDEFINED = 0,     // Undefined link type
    AMDSMI_IOLINK_TYPE_PCIEXPRESS,        // PCIe link
    AMDSMI_IOLINK_TYPE_XGMI,              // XGMI/Infinity Fabric link
    AMDSMI_IOLINK_TYPE_NUMIOLINKTYPES,    // Number of link types
    AMDSMI_IOLINK_TYPE_SIZE = 0xFFFFFFFF  // Force enum size
} AMDSMI_IO_LINK_TYPE;
```
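
In Python code, raw link-type integers are easier to read when mapped back to names. A sketch that mirrors the enum values listed above; `IoLinkType` and `describe_link` are hypothetical, and the amdsmi Python package may expose its own constants under different names. The `NUMIOLINKTYPES` and `SIZE` entries are deliberately left out because they are sentinels, not link types.

```python
from enum import IntEnum


class IoLinkType(IntEnum):
    """Python mirror of the AMDSMI_IO_LINK_TYPE values listed above."""
    UNDEFINED = 0
    PCIEXPRESS = 1
    XGMI = 2


def describe_link(raw_type: int) -> str:
    """Return a readable name for a raw link-type value, tolerating unknown values."""
    try:
        return IoLinkType(raw_type).name
    except ValueError:
        return f"UNKNOWN({raw_type})"


print(describe_link(2))
```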

## Important Notes

1. **Virtual Machine Limitations**: Many connectivity control functions are not supported in virtual environments.

2. **Root Privileges**: Control functions such as `amdsmi_set_gpu_pci_bandwidth()` require root privileges.

3. **Multi-GPU Systems**: Topology functions are most useful on multi-GPU systems that combine different interconnect types.

4. **Link Types**:
   - **PCIe**: Standard PCIe connections with varying lane counts and generations
   - **XGMI**: High-speed Infinity Fabric links for direct GPU-to-GPU communication

5. **NUMA Awareness**: Running host threads and allocating host memory on the GPU's NUMA node avoids cross-socket traffic on host-device transfers.

6. **P2P Access**: Peer-to-peer accessibility determines whether GPUs can directly access each other's memory.

7. **Traffic Monitoring**: PCIe traffic counters help identify bandwidth bottlenecks and utilization patterns.

8. **Replay Counters**: A high or steadily rising replay count may indicate link quality or signal-integrity problems.