Tessl Tile for github/radeonopencompute/amdsmi@5.7.0

# PCIe and Connectivity

This page covers PCIe link monitoring, PCIe bandwidth control, topology discovery, and multi-GPU connectivity queries for mapping how GPUs are attached to the system and to each other.

## Capabilities

### PCIe Bandwidth Information

Get PCIe bandwidth capabilities and limitations for a GPU device.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_pci_bandwidth(amdsmi_processor_handle processor_handle, amdsmi_pcie_bandwidth_t *bandwidth);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `bandwidth`: Pointer to receive PCIe bandwidth information

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Usage Example:**

```c
// `processor` is an amdsmi_processor_handle obtained during device discovery
amdsmi_pcie_bandwidth_t pcie_bw;
amdsmi_status_t ret = amdsmi_get_gpu_pci_bandwidth(processor, &pcie_bw);
if (ret == AMDSMI_STATUS_SUCCESS) {
    printf("PCIe Bandwidth:\n");
    printf("  Transfer Rate: %u\n", pcie_bw.transfer_rate);
    printf("  Lanes: %u\n", pcie_bw.lanes);
    // max_pkt_sz is a uint64_t, so cast it to match the %llu specifier
    printf("  Max Payload Size: %llu bytes\n", (unsigned long long)pcie_bw.max_pkt_sz);
}
```

### PCIe Link Status

Get current PCIe link status and capabilities.

```c { .api }
amdsmi_status_t amdsmi_get_pcie_link_status(amdsmi_processor_handle processor_handle, amdsmi_pcie_info_t *info);
amdsmi_status_t amdsmi_get_pcie_link_caps(amdsmi_processor_handle processor_handle, amdsmi_pcie_info_t *info);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `info`: Pointer to receive PCIe information

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

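To put a reported link width and generation in context, it can help to compare against the theoretical ceiling of the link. The following is a minimal sketch, not part of the AMD SMI API: `pcie_theoretical_gbps` is a hypothetical helper that takes a PCIe generation number and a lane count. How the `speed` field maps to a generation number is not specified here, so that conversion is left to the caller.

```python
# Per-lane raw signalling rate in GT/s for each PCIe generation.
PCIE_GT_PER_LANE = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0}


def pcie_theoretical_gbps(generation: int, lanes: int) -> float:
    """Return the theoretical one-direction link bandwidth in GB/s.

    Gen1 and Gen2 use 8b/10b encoding; Gen3 through Gen5 use 128b/130b.
    Packet-level protocol overhead is ignored, so real throughput is lower.
    """
    if generation not in PCIE_GT_PER_LANE:
        raise ValueError(f"unsupported PCIe generation: {generation}")
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    efficiency = 8 / 10 if generation <= 2 else 128 / 130
    # GT/s times encoding efficiency gives usable Gbit/s per lane;
    # divide by 8 to convert bits to bytes, then scale by lane count.
    return PCIE_GT_PER_LANE[generation] * efficiency / 8 * lanes
```

For example, `pcie_theoretical_gbps(4, 16)` gives roughly 31.5 GB/s per direction, a useful upper bound when judging measured throughput.
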
### PCIe Traffic Monitoring

Monitor PCIe traffic throughput and packet statistics.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_pci_throughput(amdsmi_processor_handle processor_handle, uint64_t *sent, uint64_t *received, uint64_t *max_pkt_sz);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `sent`: Pointer to receive bytes sent through PCIe interface
- `received`: Pointer to receive bytes received through PCIe interface
- `max_pkt_sz`: Pointer to receive maximum packet size

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### PCIe Replay Counter

Get PCIe replay counter information for link quality assessment.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_pci_replay_counter(amdsmi_processor_handle processor_handle, uint64_t *counter);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `counter`: Pointer to receive replay counter value

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

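A single replay counter reading says little on its own; the change between two readings is usually more informative. The sketch below assumes the counter is a monotonically increasing 64-bit value that may wrap, which this page does not confirm. `replay_delta` and `replay_rate` are hypothetical helpers that operate on values already read from the library.

```python
def replay_delta(previous: int, current: int, counter_bits: int = 64) -> int:
    """Return the number of replays between two samples.

    Modular arithmetic keeps the result correct across a single counter wrap.
    """
    modulus = 1 << counter_bits
    return (current - previous) % modulus


def replay_rate(previous: int, current: int, interval_s: float) -> float:
    """Return replays per second over a sampling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return replay_delta(previous, current) / interval_s
```

A rate that stays near zero suggests a healthy link, while a sustained non-zero rate is a reason to look at signal integrity.
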
### BDF Information

Get Bus/Device/Function identification for a GPU.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_bdf_id(amdsmi_processor_handle processor_handle, uint64_t *bdfid);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `bdfid`: Pointer to receive BDF identifier as integer

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

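The integer BDF identifier is easier to read in the `domain:bus:device.function` form used by tools such as `lspci`. The following sketch assumes the ROCm SMI-style packing (domain in bits 32 to 63, bus in bits 8 to 15, device in bits 3 to 7, function in bits 0 to 2). That layout is an assumption and should be checked against `amdsmi.h` for the installed version. `Bdf` and `decode_bdf` are hypothetical names.

```python
from typing import NamedTuple


class Bdf(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int

    def __str__(self) -> str:
        # Same textual form as lspci: dddd:bb:dd.f
        return f"{self.domain:04x}:{self.bus:02x}:{self.device:02x}.{self.function:x}"


def decode_bdf(bdfid: int) -> Bdf:
    """Split a packed BDF identifier into its fields (assumed bit layout)."""
    return Bdf(
        domain=(bdfid >> 32) & 0xFFFFFFFF,
        bus=(bdfid >> 8) & 0xFF,
        device=(bdfid >> 3) & 0x1F,
        function=bdfid & 0x7,
    )
```

Calling `str(decode_bdf(value))` on the value written to `bdfid` yields a string that can be matched against `lspci` output.
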
### NUMA Topology

Get NUMA affinity and topology information for GPU devices.

```c { .api }
amdsmi_status_t amdsmi_get_gpu_topo_numa_affinity(amdsmi_processor_handle processor_handle, uint32_t *numa_node);
amdsmi_status_t amdsmi_topo_get_numa_node_number(amdsmi_processor_handle processor_handle, uint32_t *numa_node);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `numa_node`: Pointer to receive NUMA node number

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

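Once the NUMA node of each GPU is known, grouping GPUs by node makes it straightforward to pin worker processes and host memory close to the devices they use. This is a small sketch of that grouping step only. `group_by_numa` is a hypothetical helper, and its input is a plain mapping from GPU index to NUMA node that the caller builds from the queries above.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_numa(numa_by_gpu: Dict[int, int]) -> Dict[int, List[int]]:
    """Group GPU indices by the NUMA node they are attached to."""
    groups: Dict[int, List[int]] = defaultdict(list)
    # Sort so each node's GPU list comes out in ascending index order.
    for gpu_index, node in sorted(numa_by_gpu.items()):
        groups[node].append(gpu_index)
    return dict(groups)
```

The resulting lists can then feed a CPU affinity tool such as `numactl`, one invocation per node.
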
### Topology Link Information

Get detailed topology information between processors, including link types and weights.

```c { .api }
amdsmi_status_t amdsmi_topo_get_link_weight(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *weight);
amdsmi_status_t amdsmi_topo_get_link_type(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *hops, AMDSMI_IO_LINK_TYPE *type);
```

**Parameters:**
- `processor_handle_src`: Source processor handle
- `processor_handle_dst`: Destination processor handle
- `weight`: Pointer to receive link weight/distance
- `hops`: Pointer to receive number of hops between processors
- `type`: Pointer to receive link type (PCIe, XGMI, etc.)

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### Bandwidth Between Processors

Get minimum and maximum bandwidth capabilities between two processors.

```c { .api }
amdsmi_status_t amdsmi_get_minmax_bandwith_between_processors(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, uint64_t *min_bandwidth, uint64_t *max_bandwidth);
```

**Parameters:**
- `processor_handle_src`: Source processor handle
- `processor_handle_dst`: Destination processor handle
- `min_bandwidth`: Pointer to receive minimum bandwidth
- `max_bandwidth`: Pointer to receive maximum bandwidth

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### P2P Accessibility

Check if peer-to-peer access is available between two processors.

```c { .api }
amdsmi_status_t amdsmi_is_P2P_accessible(amdsmi_processor_handle processor_handle_src, amdsmi_processor_handle processor_handle_dst, bool *accessible);
```

**Parameters:**
- `processor_handle_src`: Source processor handle
- `processor_handle_dst`: Destination processor handle
- `accessible`: Pointer to receive P2P accessibility status

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

### PCIe Bandwidth Control

Control PCIe bandwidth allocation (requires root privileges).

```c { .api }
amdsmi_status_t amdsmi_set_gpu_pci_bandwidth(amdsmi_processor_handle processor_handle, uint64_t bw_bitmask);
```

**Parameters:**
- `processor_handle`: Handle to the GPU processor
- `bw_bitmask`: Bandwidth bitmask for allowed bandwidth levels

**Returns:** `amdsmi_status_t` - AMDSMI_STATUS_SUCCESS on success, error code on failure

**Note:** This function requires root privileges and is not supported in virtual environments.

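The `bw_bitmask` argument is easiest to construct from a list of level indices. The sketch below assumes that bit `i` of the mask enables bandwidth level `i`, the convention used by ROCm SMI; this page does not spell out the encoding, so treat that as an assumption. `levels_to_bitmask` and its `num_supported` parameter are hypothetical.

```python
from typing import Iterable


def levels_to_bitmask(levels: Iterable[int], num_supported: int) -> int:
    """Build a bitmask with one bit set per allowed bandwidth level.

    Assumes bit i corresponds to level index i (unconfirmed for amdsmi).
    """
    mask = 0
    for level in levels:
        if not 0 <= level < num_supported:
            raise ValueError(f"level {level} is outside 0..{num_supported - 1}")
        mask |= 1 << level
    if mask == 0:
        raise ValueError("at least one level must be enabled")
    return mask
```

Validating against the number of supported levels before calling the setter avoids passing a mask that refers to levels the device does not have.
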
## Python API

### PCIe Information

```python { .api }
def amdsmi_get_gpu_pci_bandwidth(processor_handle):
    """
    Get PCIe bandwidth information for a GPU.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: PCIe bandwidth info with keys 'transfer_rate', 'lanes', 'max_pkt_sz'

    Raises:
        AmdSmiException: If PCIe bandwidth query fails
    """

def amdsmi_get_pcie_link_status(processor_handle):
    """
    Get current PCIe link status.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: PCIe status info

    Raises:
        AmdSmiException: If PCIe status query fails
    """
```

### PCIe Traffic

```python { .api }
def amdsmi_get_gpu_pci_throughput(processor_handle):
    """
    Get PCIe traffic throughput statistics.

    Args:
        processor_handle: GPU processor handle

    Returns:
        dict: Traffic info with keys 'sent', 'received', 'max_pkt_sz'

    Raises:
        AmdSmiException: If PCIe throughput query fails
    """

def amdsmi_get_gpu_pci_replay_counter(processor_handle):
    """
    Get PCIe replay counter.

    Args:
        processor_handle: GPU processor handle

    Returns:
        int: Replay counter value

    Raises:
        AmdSmiException: If replay counter query fails
    """
```

### Topology Information

```python { .api }
def amdsmi_get_gpu_topo_numa_affinity(processor_handle):
    """
    Get NUMA node affinity for a GPU.

    Args:
        processor_handle: GPU processor handle

    Returns:
        int: NUMA node number

    Raises:
        AmdSmiException: If NUMA query fails
    """

def amdsmi_topo_get_link_type(processor_handle_src, processor_handle_dst):
    """
    Get link type and hop count between processors.

    Args:
        processor_handle_src: Source processor handle
        processor_handle_dst: Destination processor handle

    Returns:
        dict: Link info with keys 'hops', 'type'

    Raises:
        AmdSmiException: If link type query fails
    """
```

**Python Usage Example:**

```python
import amdsmi

# Initialize and get GPU handles
amdsmi.amdsmi_init()

try:
    sockets = amdsmi.amdsmi_get_socket_handles()
    processors = amdsmi.amdsmi_get_processor_handles(sockets[0])

    for i, gpu in enumerate(processors):
        print(f"GPU {i} PCIe Information:")

        # Get PCIe bandwidth
        pcie_bw = amdsmi.amdsmi_get_gpu_pci_bandwidth(gpu)
        print(f"  Bandwidth: {pcie_bw['lanes']} lanes @ Gen{pcie_bw['transfer_rate']}")

        # Get PCIe traffic
        traffic = amdsmi.amdsmi_get_gpu_pci_throughput(gpu)
        print(f"  Traffic: {traffic['sent']} sent, {traffic['received']} received")

        # Get NUMA affinity
        numa_node = amdsmi.amdsmi_get_gpu_topo_numa_affinity(gpu)
        print(f"  NUMA Node: {numa_node}")

        # Check topology to other GPUs
        for j, other_gpu in enumerate(processors):
            if i != j:
                # Use the full function name as declared in the API section above
                link_info = amdsmi.amdsmi_topo_get_link_type(gpu, other_gpu)
                p2p_access = amdsmi.amdsmi_is_P2P_accessible(gpu, other_gpu)
                print(f"  -> GPU {j}: {link_info['hops']} hops, "
                      f"Type: {link_info['type']}, P2P: {p2p_access}")

finally:
    # Always release library resources, even if a query raised
    amdsmi.amdsmi_shut_down()
```

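The nested loop in the usage example can be factored into a reusable helper that collects pairwise link information into a lookup table. In the sketch below the query function is passed in as a parameter, so the logic can be exercised without GPU hardware. `build_link_matrix` is a hypothetical name, and the shape of the value returned per pair is whatever the supplied query returns.

```python
from typing import Any, Callable, Dict, Sequence, Tuple


def build_link_matrix(
    handles: Sequence[Any],
    query_link: Callable[[Any, Any], Any],
) -> Dict[Tuple[int, int], Any]:
    """Query every ordered pair of distinct processors.

    Returns a dict keyed by (source_index, destination_index).
    """
    matrix: Dict[Tuple[int, int], Any] = {}
    for i, src in enumerate(handles):
        for j, dst in enumerate(handles):
            if i != j:  # a processor has no link to itself
                matrix[(i, j)] = query_link(src, dst)
    return matrix
```

With the real library, `query_link` could be `amdsmi.amdsmi_topo_get_link_type`, assuming it returns a dict with `hops` and `type` keys as documented above. For `n` processors the helper issues `n * (n - 1)` queries.
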
## Types

### PCIe Bandwidth Structure

```c { .api }
typedef struct {
    uint32_t transfer_rate;    // PCIe generation/transfer rate
    uint32_t lanes;            // Number of PCIe lanes
    uint64_t max_pkt_sz;       // Maximum packet size
    uint32_t reserved[3];      // Reserved for future use
} amdsmi_pcie_bandwidth_t;
```

### PCIe Information Structure

```c { .api }
typedef struct {
    uint32_t width;           // Link width in lanes
    uint32_t speed;           // Link speed
    uint32_t reserved[6];     // Reserved for future use
} amdsmi_pcie_info_t;
```

### IO Link Types

```c { .api }
typedef enum {
    AMDSMI_IOLINK_TYPE_UNDEFINED = 0,    // Undefined link type
    AMDSMI_IOLINK_TYPE_PCIEXPRESS,       // PCIe link
    AMDSMI_IOLINK_TYPE_XGMI,             // XGMI/Infinity Fabric link
    AMDSMI_IOLINK_TYPE_NUMIOLINKTYPES,   // Number of link types
    AMDSMI_IOLINK_TYPE_SIZE = 0xFFFFFFFF // Force enum size
} AMDSMI_IO_LINK_TYPE;
```

## Important Notes

1. **Virtual Machine Limitations**: Many connectivity control functions are not supported in virtual environments.

2. **Root Privileges**: Control functions like `amdsmi_set_gpu_pci_bandwidth()` require root privileges.

3. **Multi-GPU Systems**: Topology functions are most useful in multi-GPU systems with various interconnect types.

4. **Link Types**:
   - **PCIe**: Standard PCIe connections with variable lanes and generations
   - **XGMI**: High-speed Infinity Fabric connections for GPU-to-GPU communication

5. **NUMA Awareness**: NUMA affinity information is crucial for optimal memory allocation and performance.

6. **P2P Access**: Peer-to-peer accessibility determines if GPUs can directly access each other's memory.

7. **Traffic Monitoring**: PCIe traffic counters help identify bandwidth bottlenecks and utilization patterns.

8. **Replay Counters**: High replay counter values may indicate link quality issues or signal integrity problems.
