# PCIe and Connectivity

PCIe interface monitoring, bandwidth management, topology discovery, and multi-GPU connectivity features for comprehensive system topology understanding.

## Capabilities

### PCIe Bandwidth Information

Get PCIe bandwidth capabilities and limitations for a GPU device.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_pci_bandwidth(amdsmi_processor_handle processor_handle, amdsmi_pcie_bandwidth_t *bandwidth);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `bandwidth`: Pointer to receive PCIe bandwidth information

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
// Assumes <stdio.h> is included and `processor` is a valid processor handle.
amdsmi_pcie_bandwidth_t pcie_bw;
amdsmi_status_t ret = amdsmi_get_gpu_pci_bandwidth(processor, &pcie_bw);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("PCIe Bandwidth:\n");
    printf(" Transfer Rate: %u\n", pcie_bw.transfer_rate);
    printf(" Lanes: %u\n", pcie_bw.lanes);
    // max_pkt_sz is a uint64_t, so cast it to match the format specifier.
    printf(" Max Payload Size: %llu bytes\n", (unsigned long long)pcie_bw.max_pkt_sz);
}
```
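
The reported generation and lane count can be turned into an approximate link ceiling. The sketch below is illustrative only: `theoretical_bandwidth_gbytes` and `PCIE_GENERATIONS` are hypothetical names, and it assumes `transfer_rate` can be read as a PCIe generation number (1 to 5), which may not hold for every library version. It accounts for line encoding but ignores packet and protocol overhead.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def theoretical_bandwidth_gbytes(generation, lanes):
    """Return the approximate one-directional link bandwidth in GB/s.

    Assumption: `generation` is a plain PCIe generation number (1-5).
    """
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 for bytes.
    return rate_gt * efficiency * lanes / 8
```

Under these assumptions a Gen3 x16 link comes to roughly 15.75 GB/s in each direction, which is a useful upper bound when judging measured traffic.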

### PCIe Link Status

Get current PCIe link status and capabilities.

```c { .api }
amdsmi_status_t amdsmi_get_pcie_link_status(amdsmi_processor_handle processor_handle, amdsmi_pcie_info_t *info);
amdsmi_status_t amdsmi_get_pcie_link_caps(amdsmi_processor_handle processor_handle, amdsmi_pcie_info_t *info);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `info`: Pointer to receive PCIe information

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### PCIe Traffic Monitoring

Monitor PCIe traffic throughput and packet statistics.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_pci_throughput(amdsmi_processor_handle processor_handle, uint64_t *sent, uint64_t *received, uint64_t *max_pkt_sz);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `sent`: Pointer to receive bytes sent through PCIe interface
- `received`: Pointer to receive bytes received through PCIe interface
- `max_pkt_sz`: Pointer to receive maximum packet size

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### PCIe Replay Counter

Get PCIe replay counter information for link quality assessment.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_pci_replay_counter(amdsmi_processor_handle processor_handle, uint64_t *counter);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `counter`: Pointer to receive replay counter value

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure
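
A single reading is hard to interpret on its own; the growth between two readings is usually more telling. The following is a minimal sketch, assuming the counter is cumulative and 64 bits wide. The helper names and the default threshold are hypothetical placeholders, not values from the library.

```python
def replay_delta(previous, current, counter_bits=64):
    """Return how many replays occurred between two cumulative readings.

    Tolerates a single wrap-around of a counter `counter_bits` wide.
    """
    if current >= previous:
        return current - previous
    return current + (1 << counter_bits) - previous


def link_looks_degraded(previous, current, threshold=100):
    """Flag the link when replays since the last sample exceed `threshold`.

    The default threshold is an arbitrary placeholder, not a vendor value.
    """
    return replay_delta(previous, current) > threshold
```

In practice the two readings would come from consecutive calls to the replay counter query, taken a fixed interval apart.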

### BDF Information

Get Bus/Device/Function identification for a GPU.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_bdf_id(amdsmi_processor_handle processor_handle, uint64_t *bdfid);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `bdfid`: Pointer to receive BDF identifier as integer
 
**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure
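
The integer identifier is easier to read once unpacked into the usual PCI address fields. The sketch below assumes a common packing: domain in the upper 32 bits, bus in bits 8 to 15, device in bits 3 to 7, and function in bits 0 to 2. That layout is an assumption to verify against the header of the installed library version, and `decode_bdf` and `format_bdf` are hypothetical helper names.

```python
def decode_bdf(bdfid):
    """Split a packed BDF integer into (domain, bus, device, function).

    Assumed layout: domain << 32 | bus << 8 | device << 3 | function.
    """
    domain = (bdfid >> 32) & 0xFFFFFFFF
    bus = (bdfid >> 8) & 0xFF
    device = (bdfid >> 3) & 0x1F
    function = bdfid & 0x7
    return domain, bus, device, function


def format_bdf(bdfid):
    """Render the identifier in the familiar dddd:bb:dd.f notation."""
    domain, bus, device, function = decode_bdf(bdfid)
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"
```

The formatted string matches the notation used by tools such as `lspci`, which makes it straightforward to cross-reference a handle with other system output.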

### NUMA Topology

Get NUMA affinity and topology information for GPU devices.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_topo_numa_affinity(amdsmi_processor_handle processor_handle, uint32_t *numa_node);
amdsmi_status_t amdsmi_topo_get_numa_node_number(amdsmi_processor_handle processor_handle, uint32_t *numa_node);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `numa_node`: Pointer to receive NUMA node number

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure
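
One common use of these values is grouping GPUs by NUMA node so that worker processes can be placed near the devices they drive. A minimal sketch follows, assuming the node numbers have already been collected into a dict; `group_by_numa_node` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict


def group_by_numa_node(numa_by_gpu):
    """Map each NUMA node to the sorted list of GPU indices attached to it.

    `numa_by_gpu` is a dict of GPU index -> NUMA node number.
    """
    groups = defaultdict(list)
    for gpu_index, node in numa_by_gpu.items():
        groups[node].append(gpu_index)
    return {node: sorted(gpus) for node, gpus in sorted(groups.items())}
```

The input dict would typically be built by calling the NUMA affinity query once per processor handle.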

### Topology Link Information

Get detailed topology information between processors, including link types and weights.

```c { .api }
amdsmi_status_t amdsmi_topo_get_link_weight(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *weight);
amdsmi_status_t amdsmi_topo_get_link_type(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *hops, AMDSMI_IO_LINK_TYPE *type);
```

**Parameters:**
- `processor_handle_src`: Source processor handle
- `processor_handle_dst`: Destination processor handle
- `weight`: Pointer to receive link weight/distance
- `hops`: Pointer to receive number of hops between processors
- `type`: Pointer to receive link type (PCIe, XGMI, etc.)

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### Bandwidth Between Processors

Get minimum and maximum bandwidth capabilities between two processors.

```c { .api }
amdsmi_status_t amdsmi_get_minmax_bandwidth_between_processors(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *min_bandwidth, uint64_t *max_bandwidth);
```

**Parameters:**
- `processor_handle_src`: Source processor handle
- `processor_handle_dst`: Destination processor handle
- `min_bandwidth`: Pointer to receive minimum bandwidth
- `max_bandwidth`: Pointer to receive maximum bandwidth

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### P2P Accessibility

Check if peer-to-peer access is available between two processors.

```c { .api }
amdsmi_status_t amdsmi_is_P2P_accessible(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, bool *accessible);
```

**Parameters:**
- `processor_handle_src`: Source processor handle
- `processor_handle_dst`: Destination processor handle
- `accessible`: Pointer to receive P2P accessibility status

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure
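
Because the check is pairwise, a full picture of a multi-GPU system comes from querying every ordered pair. The sketch below builds such a matrix from a caller-supplied query function so that it runs without hardware. `build_p2p_matrix` and `fake_query` are hypothetical names; on a real system the query would wrap the P2P accessibility call.

```python
def build_p2p_matrix(handles, is_accessible):
    """Return matrix[i][j] = True when processor i can reach processor j.

    `is_accessible(src, dst)` is a caller-supplied query. The diagonal is
    left as None because a processor is not its own peer.
    """
    matrix = []
    for i, src in enumerate(handles):
        row = []
        for j, dst in enumerate(handles):
            row.append(None if i == j else bool(is_accessible(src, dst)))
        matrix.append(row)
    return matrix


# Stand-in query for demonstration: only "gpu0" and "gpu1" are peers.
def fake_query(src, dst):
    return {src, dst} == {"gpu0", "gpu1"}


matrix = build_p2p_matrix(["gpu0", "gpu1", "gpu2"], fake_query)
```

Querying both directions separately is deliberate, since the sketch does not assume that accessibility is symmetric.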

### PCIe Bandwidth Control

Control PCIe bandwidth allocation (requires root privileges).

```c { .api }
amdsmi_status_t amdsmi_set_gpu_pci_bandwidth(amdsmi_processor_handle processor_handle, uint64_t bw_bitmask);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `bw_bitmask`: Bandwidth bitmask for allowed bandwidth levels

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Note:** This function requires root privileges and is not supported in virtual environments.
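
A bitmask of allowed levels can be assembled from a list of level indices. This is a minimal sketch, assuming bit `i` of the mask corresponds to the `i`-th supported bandwidth level. `build_bandwidth_bitmask` and its `num_supported` parameter are hypothetical and stand in for whatever level count the bandwidth query reports.

```python
def build_bandwidth_bitmask(enabled_levels, num_supported):
    """Return a mask with bit i set for every enabled bandwidth level i.

    Assumption: bit i maps to the i-th supported level. Raises ValueError
    for an index outside the supported range.
    """
    mask = 0
    for level in enabled_levels:
        if not 0 <= level < num_supported:
            raise ValueError(f"level {level} is outside 0..{num_supported - 1}")
        mask |= 1 << level
    return mask
```

For example, enabling levels 0 and 2 out of three supported levels yields the mask `0b101`. Validating the indices up front avoids passing a mask with bits the device does not recognize.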

## Python API

### PCIe Information

```python { .api }
def amdsmi_get_gpu_pci_bandwidth(processor_handle):
    """
    Get PCIe bandwidth information for a GPU.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: PCIe bandwidth info with keys 'transfer_rate', 'lanes', 'max_pkt_sz'

    Raises:
        AmdSmiException: If PCIe bandwidth query fails
    """

def amdsmi_get_pcie_link_status(processor_handle):
    """
    Get current PCIe link status.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: PCIe status info

    Raises:
        AmdSmiException: If PCIe status query fails
    """
```

### PCIe Traffic

```python { .api }
def amdsmi_get_gpu_pci_throughput(processor_handle):
    """
    Get PCIe traffic throughput statistics.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: Traffic info with keys 'sent', 'received', 'max_pkt_sz'

    Raises:
        AmdSmiException: If PCIe throughput query fails
    """

def amdsmi_get_gpu_pci_replay_counter(processor_handle):
    """
    Get PCIe replay counter.

    Args:
        processor_handle: GPU processor handle

    Returns:
        int: Replay counter value

    Raises:
        AmdSmiException: If replay counter query fails
    """
```

### Topology Information

```python { .api }
def amdsmi_get_gpu_topo_numa_affinity(processor_handle):
    """
    Get NUMA node affinity for a GPU.

    Args:
        processor_handle: GPU processor handle

    Returns:
        int: NUMA node number

    Raises:
        AmdSmiException: If NUMA query fails
    """

def amdsmi_topo_get_link_type(processor_handle_src, processor_handle_dst):
    """
    Get link type and hop count between processors.

    Args:
        processor_handle_src: Source processor handle
        processor_handle_dst: Destination processor handle

    Returns:
        dict: Link info with keys 'hops', 'type'

    Raises:
        AmdSmiException: If link type query fails
    """
```

**Python Usage Example:**

```python
import amdsmi

# Initialize and get GPU handles
amdsmi.amdsmi_init()

try:
    sockets = amdsmi.amdsmi_get_socket_handles()
    processors = amdsmi.amdsmi_get_processor_handles(sockets[0])

    for i, gpu in enumerate(processors):
        print(f"GPU {i} PCIe Information:")

        # Get PCIe bandwidth
        pcie_bw = amdsmi.amdsmi_get_gpu_pci_bandwidth(gpu)
        print(f" Bandwidth: {pcie_bw['lanes']} lanes @ Gen{pcie_bw['transfer_rate']}")

        # Get PCIe traffic
        traffic = amdsmi.amdsmi_get_gpu_pci_throughput(gpu)
        print(f" Traffic: {traffic['sent']} sent, {traffic['received']} received")

        # Get NUMA affinity
        numa_node = amdsmi.amdsmi_get_gpu_topo_numa_affinity(gpu)
        print(f" NUMA Node: {numa_node}")

        # Check topology to other GPUs
        for j, other_gpu in enumerate(processors):
            if i != j:
                # Use the full function name, as declared in the API above
                link_info = amdsmi.amdsmi_topo_get_link_type(gpu, other_gpu)
                p2p_access = amdsmi.amdsmi_is_P2P_accessible(gpu, other_gpu)
                print(f" -> GPU {j}: {link_info['hops']} hops, "
                      f"Type: {link_info['type']}, P2P: {p2p_access}")

finally:
    # Always release library resources, even if a query raised
    amdsmi.amdsmi_shut_down()
```

## Types

### PCIe Bandwidth Structure

```c { .api }
typedef struct {
    uint32_t transfer_rate;   // PCIe generation/transfer rate
    uint32_t lanes;           // Number of PCIe lanes
    uint64_t max_pkt_sz;      // Maximum packet size
    uint32_t reserved[3];     // Reserved for future use
} amdsmi_pcie_bandwidth_t;
```

### PCIe Information Structure

```c { .api }
typedef struct {
    uint32_t width;           // Link width in lanes
    uint32_t speed;           // Link speed
    uint32_t reserved[6];     // Reserved for future use
} amdsmi_pcie_info_t;
```

### IO Link Types

```c { .api }
typedef enum {
    AMDSMI_IOLINK_TYPE_UNDEFINED = 0,      // Undefined link type
    AMDSMI_IOLINK_TYPE_PCIEXPRESS,         // PCIe link
    AMDSMI_IOLINK_TYPE_XGMI,               // XGMI/Infinity Fabric link
    AMDSMI_IOLINK_TYPE_NUMIOLINKTYPES,     // Number of link types
    AMDSMI_IOLINK_TYPE_SIZE = 0xFFFFFFFF   // Force enum size
} AMDSMI_IO_LINK_TYPE;
```

## Important Notes

1. **Virtual Machine Limitations**: Many connectivity control functions are not supported in virtual environments.

2. **Root Privileges**: Control functions like `amdsmi_set_gpu_pci_bandwidth()` require root privileges.

3. **Multi-GPU Systems**: Topology functions are most useful in multi-GPU systems with various interconnect types.

4. **Link Types**:
   - **PCIe**: Standard PCIe connections with variable lanes and generations
   - **XGMI**: High-speed Infinity Fabric connections for GPU-to-GPU communication

5. **NUMA Awareness**: NUMA affinity information is crucial for optimal memory allocation and performance.

6. **P2P Access**: Peer-to-peer accessibility determines if GPUs can directly access each other's memory.

7. **Traffic Monitoring**: PCIe traffic counters help identify bandwidth bottlenecks and utilization patterns.

8. **Replay Counters**: High replay counter values may indicate link quality issues or signal integrity problems.