
# Flink External Resource GPU Driver


The Flink External Resource GPU Driver provides GPU resource management capabilities for Apache Flink streaming and batch processing jobs. It implements Flink's ExternalResourceDriver interface to enable discovery, allocation, and management of GPU resources across cluster nodes using configurable discovery scripts.

## Package Information

- **Package Name**: org.apache.flink:flink-external-resource-gpu
- **Package Type**: maven
- **Language**: Java
- **Installation**: Add to Maven dependencies with groupId `org.apache.flink` and artifactId `flink-external-resource-gpu`

## Core Imports

```java
import org.apache.flink.externalresource.gpu.GPUDriverFactory;
import org.apache.flink.externalresource.gpu.GPUDriverOptions;
import org.apache.flink.externalresource.gpu.GPUInfo;
import org.apache.flink.configuration.Configuration;
```

## Basic Usage

```java
import org.apache.flink.api.common.externalresource.ExternalResourceDriver;
import org.apache.flink.api.common.externalresource.ExternalResourceInfo;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.externalresource.gpu.GPUDriverFactory;
import org.apache.flink.externalresource.gpu.GPUDriverOptions;

import java.util.Set;

public class GpuDiscoveryExample {
    public static void main(String[] args) throws Exception {
        // Configure GPU discovery (the args string is passed through to the script)
        Configuration config = new Configuration();
        config.set(GPUDriverOptions.DISCOVERY_SCRIPT_PATH, "/path/to/gpu-discovery-script.sh");
        config.set(GPUDriverOptions.DISCOVERY_SCRIPT_ARG, "--device-type nvidia");

        // Create the GPU driver through the factory; fails if the script is missing or not executable
        GPUDriverFactory factory = new GPUDriverFactory();
        ExternalResourceDriver driver = factory.createExternalResourceDriver(config);

        // Discover GPU resources (request 2 GPUs); the interface returns Set<? extends ExternalResourceInfo>
        Set<? extends ExternalResourceInfo> gpuResources = driver.retrieveResourceInfo(2L);

        for (ExternalResourceInfo gpu : gpuResources) {
            // Each GPUInfo exposes its device index under the "index" key
            String deviceIndex = gpu.getProperty("index").orElse("unknown");
            System.out.println("Available GPU: " + gpu + ", index " + deviceIndex); // e.g., "Available GPU: GPU Device(0), index 0"
        }
    }
}
```

## Architecture

The GPU driver is built around several key components:

- **GPUDriverFactory**: Factory for creating GPU driver instances from configuration
- **GPUDriver**: Main driver implementation that executes discovery scripts and manages GPU resources
- **GPUInfo**: Value object representing individual GPU devices with their properties
- **GPUDriverOptions**: Configuration options for discovery script path and arguments
- **Discovery Script Integration**: Executes external scripts to detect available GPU hardware

## Capabilities

### GPU Driver Factory

Factory for creating GPU driver instances. The configuration is validated when the driver is constructed, so an invalid script setting fails at creation time rather than at discovery time.

```java { .api }
/**
 * Factory for creating GPU driver instances
 */
public class GPUDriverFactory implements ExternalResourceDriverFactory {
    /**
     * Creates an external resource driver for GPU management
     * @param config Configuration containing GPU discovery settings
     * @return ExternalResourceDriver instance for GPU resources
     * @throws Exception if the configuration is invalid or driver creation fails (for example
     *         IllegalConfigurationException, FileNotFoundException, or FlinkException when the
     *         discovery script path is unset, the script is missing, or it is not executable)
     */
    public ExternalResourceDriver createExternalResourceDriver(Configuration config) throws Exception;
}
```

### GPU Information

Represents a single GPU device. Its only property is the device index, exposed under the key `"index"`.

```java { .api }
/**
 * Information container for GPU resource, currently including the GPU index
 * Note: Constructor is package-private, instances created through GPUDriver.retrieveResourceInfo()
 */
public class GPUInfo implements ExternalResourceInfo {

    /**
     * Gets property value by key
     * @param key Property key to retrieve (supports "index")
     * @return Optional containing property value, or empty if key not found
     */
    public Optional<String> getProperty(String key);

    /**
     * Gets all available property keys
     * @return Collection of available property keys (currently only "index")
     */
    public Collection<String> getKeys();

    /**
     * String representation of GPU device
     * @return Formatted string like "GPU Device(0)"
     */
    public String toString();

    /**
     * Hash code based on GPU index
     * @return Hash code for this GPU info
     */
    public int hashCode();

    /**
     * Equality comparison based on GPU index
     * @param obj Object to compare
     * @return true if objects represent same GPU device
     */
    public boolean equals(Object obj);
}
```
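
The value semantics described above (a single `"index"` key, a `toString()` of the form `GPU Device(<index>)`, and equality and hashing by index) can be illustrated with a small stand-alone class. This is a minimal sketch, not Flink's `GPUInfo`: the class name `GpuInfoSketch` is hypothetical, and it deliberately does not implement `ExternalResourceInfo` so that it compiles without Flink on the classpath.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical stand-in mirroring the documented GPUInfo behavior (not the Flink class). */
final class GpuInfoSketch {
    private static final String INDEX_KEY = "index";
    private final String index;

    GpuInfoSketch(String index) {
        this.index = Objects.requireNonNull(index);
    }

    /** Returns the index for the "index" key and an empty Optional for any other key. */
    public Optional<String> getProperty(String key) {
        return INDEX_KEY.equals(key) ? Optional.of(index) : Optional.empty();
    }

    /** The only supported key is "index". */
    public Collection<String> getKeys() {
        return Collections.singleton(INDEX_KEY);
    }

    @Override
    public String toString() {
        return String.format("GPU Device(%s)", index);
    }

    // Two instances describe the same GPU when their indices match.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof GpuInfoSketch)) {
            return false;
        }
        return index.equals(((GpuInfoSketch) obj).index);
    }

    @Override
    public int hashCode() {
        return index.hashCode();
    }
}
```

Because equality and hashing depend only on the index, a `Set` of such objects holds each device at most once, which matches the set-based return type of `retrieveResourceInfo`.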

### GPU Driver Configuration

Configuration options for the GPU discovery script path and arguments.

```java { .api }
/**
 * Configuration options for GPU driver
 */
@PublicEvolving
public class GPUDriverOptions {
    /**
     * Configuration option for discovery script path
     * Key: "discovery-script.path"
     * Default: "plugins/external-resource-gpu/nvidia-gpu-discovery.sh"
     *          (ConfigConstants.DEFAULT_FLINK_PLUGINS_DIRS + "/external-resource-gpu/nvidia-gpu-discovery.sh")
     * Description: Path to GPU discovery script (absolute, or relative to FLINK_HOME, or to the
     *              current directory if FLINK_HOME is not set)
     */
    public static final ConfigOption<String> DISCOVERY_SCRIPT_PATH;

    /**
     * Configuration option for discovery script arguments
     * Key: "discovery-script.args"
     * Default: No default value
     * Description: Arguments passed to the discovery script
     */
    public static final ConfigOption<String> DISCOVERY_SCRIPT_ARG;
}
```

### GPU Resource Discovery

Core functionality for discovering and retrieving GPU resources through a configurable discovery script.

```java { .api }
/**
 * Driver for GPU resource discovery and management
 * Implements ExternalResourceDriver interface for Flink integration
 * Note: Constructor is package-private; instances are created through GPUDriverFactory
 */
class GPUDriver implements ExternalResourceDriver {

    /**
     * Discovers and retrieves GPU resources by executing the discovery script.
     * Configuration problems (IllegalConfigurationException, FileNotFoundException, and
     * FlinkException for a non-executable script) are raised earlier, when GPUDriver is constructed.
     * @param gpuAmount Number of GPUs to discover (must be > 0)
     * @return Unmodifiable set of GPUInfo objects representing discovered GPUs
     * @throws IllegalArgumentException if gpuAmount <= 0
     * @throws TimeoutException if the discovery script does not finish within 10 seconds
     * @throws FlinkException if the discovery script exits with a non-zero code
     */
    public Set<GPUInfo> retrieveResourceInfo(long gpuAmount) throws Exception;
}
```
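
The documented contract of `retrieveResourceInfo` (reject a non-positive amount, turn the script's comma-separated output line into a set of trimmed, non-blank indices, and return that set unmodifiable) can be sketched without running a script. This is an illustrative sketch only: `RetrieveSketch.toIndexSet` is a hypothetical helper that receives the script's first output line as a string and returns plain index strings rather than `GPUInfo` objects.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper mirroring the documented retrieveResourceInfo contract (not Flink code). */
final class RetrieveSketch {
    private RetrieveSketch() {}

    /**
     * @param gpuAmount  requested number of GPUs; must be > 0
     * @param outputLine first line printed by the discovery script, e.g. "0, 1"
     * @return unmodifiable set of trimmed, non-blank GPU indices
     */
    static Set<String> toIndexSet(long gpuAmount, String outputLine) {
        if (gpuAmount <= 0) {
            // Same exception type the document lists for gpuAmount <= 0
            throw new IllegalArgumentException("gpuAmount must be positive, got " + gpuAmount);
        }
        Set<String> indices = new LinkedHashSet<>();
        if (outputLine != null) {
            for (String index : outputLine.split(",")) {
                String trimmed = index.trim();
                if (!trimmed.isEmpty()) { // drop empty and whitespace-only entries
                    indices.add(trimmed);
                }
            }
        }
        // gpuAmount is only validated here; it is not compared with the number of indices found
        return Collections.unmodifiableSet(indices);
    }
}
```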

## Implementation Details

The GPU driver gives the discovery script 10 seconds to finish (private constant `DISCOVERY_SCRIPT_TIMEOUT_MS = 10000L`) and identifies each GPU by the device index stored under the `"index"` property key. Script failures surface as the exceptions listed under Error Handling, and the driver logs the script's output to help debug them.

Parsing and logging behavior:

- Successfully discovered GPU resources are logged at INFO level
- A non-zero exit code is logged at WARN level together with the script's stdout and stderr
- If the script prints more than one line, the extra lines are logged at WARN level and ignored
- Empty and whitespace-only indices are filtered out during parsing
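
The execution rules above (a 10-second limit, a non-zero exit treated as a failure, and only the first stdout line used) can be sketched with `java.lang.ProcessBuilder`. This is a hedged illustration rather than the driver's code: the class `ScriptRunnerSketch`, its `run` method, killing the process on timeout, and the use of `IOException` for a non-zero exit (where the driver throws `FlinkException`) are all assumptions made so the example needs only the JDK. Extra output lines are printed to stderr as a stand-in for the WARN log.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical runner illustrating the documented timeout and output rules (not Flink code). */
final class ScriptRunnerSketch {
    static final long DISCOVERY_SCRIPT_TIMEOUT_MS = 10_000L;

    private ScriptRunnerSketch() {}

    /** Runs the command and returns its first stdout line ("" if it printed nothing). */
    static String run(List<String> command, long timeoutMs) throws Exception {
        Process process = new ProcessBuilder(command).start();
        // Assumes the script prints only a little output (a single line), so it cannot
        // block on a full stdout pipe before waitFor() returns.
        if (!process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
            process.destroyForcibly();
            throw new TimeoutException("Discovery script did not finish within " + timeoutMs + " ms");
        }
        try (BufferedReader stdout = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            if (process.exitValue() != 0) {
                throw new IOException("Discovery script exited with code " + process.exitValue());
            }
            String first = stdout.readLine();
            String extra;
            while ((extra = stdout.readLine()) != null) {
                // Stand-in for the driver's WARN log about extra output lines
                System.err.println("Ignoring extra output line: " + extra);
            }
            return first == null ? "" : first;
        }
    }
}
```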

## Types

```java { .api }
// External dependencies from flink-core
interface ExternalResourceDriver {
    Set<? extends ExternalResourceInfo> retrieveResourceInfo(long amount) throws Exception;
}

interface ExternalResourceDriverFactory {
    ExternalResourceDriver createExternalResourceDriver(Configuration config) throws Exception;
}

interface ExternalResourceInfo {
    Optional<String> getProperty(String key);
    Collection<String> getKeys();
}

// Configuration types
class Configuration {
    <T> T get(ConfigOption<T> option);
    <T> Configuration set(ConfigOption<T> option, T value);
}

class ConfigOption<T> {
    String key();
}
```

## Error Handling

The GPU driver throws specific exceptions for different error conditions:

- **IllegalConfigurationException** (driver construction): the discovery script path is not configured or is whitespace-only
- **FileNotFoundException** (driver construction): the specified discovery script file does not exist
- **FlinkException**: the discovery script is not executable (driver construction) or exits with a non-zero return code (`retrieveResourceInfo`)
- **IllegalArgumentException** (`retrieveResourceInfo`): the `gpuAmount` parameter is <= 0
- **TimeoutException** (`retrieveResourceInfo`): discovery script execution exceeds the 10-second timeout

Configuration and script validation during driver initialization:

- A discovery script path that is not already absolute is resolved relative to `FLINK_HOME` (or the current directory if `FLINK_HOME` is not set)
- The script file's existence and executable permission are verified during `GPUDriver` construction
- If no args are configured, the value is `null`, which reaches the discovery script as the literal string `"null"`
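
The validation steps above can be sketched with `java.io.File`. In this minimal sketch, `ScriptPathSketch.resolve` is a hypothetical helper: it takes the environment as a `Map` so it can be tested, and it throws `IllegalArgumentException` and `IllegalStateException` as JDK stand-ins for Flink's `IllegalConfigurationException` and `FlinkException`.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Map;

/** Hypothetical helper mirroring the documented construction-time checks (not Flink code). */
final class ScriptPathSketch {
    private ScriptPathSketch() {}

    static File resolve(String configuredPath, Map<String, String> env) throws FileNotFoundException {
        if (configuredPath == null || configuredPath.trim().isEmpty()) {
            // Stand-in for IllegalConfigurationException
            throw new IllegalArgumentException("GPU discovery script path is not configured");
        }
        File script = new File(configuredPath);
        if (!script.isAbsolute()) {
            // Relative paths resolve against FLINK_HOME, or the current directory if it is unset
            script = new File(env.getOrDefault("FLINK_HOME", "."), configuredPath);
        }
        script = script.getAbsoluteFile();
        if (!script.exists()) {
            throw new FileNotFoundException("Discovery script " + script + " does not exist");
        }
        if (!script.canExecute()) {
            // Stand-in for FlinkException
            throw new IllegalStateException("Discovery script " + script + " is not executable");
        }
        return script;
    }
}
```

In real use the caller would pass `System.getenv()` as the environment map.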

Discovery script integration expects:

- The script to receive `gpuAmount` as its first argument, followed by the configured `args` (split on whitespace), or the literal `null` if no args are configured
- The script to output comma-separated GPU indices on a single line to stdout
- The script to exit with code 0 for success
- Script execution to complete within 10 seconds (`DISCOVERY_SCRIPT_TIMEOUT_MS`)
- If the script outputs multiple lines, only the first line is processed (the others are logged as warnings)

## Discovery Script Integration

The driver integrates with external discovery scripts to detect GPU hardware:

```bash
# Example script execution (command format: <script_path> <gpuAmount> <args>)
/path/to/discovery-script.sh 2 --device-type nvidia

# Expected output format (comma-separated indices on a single line)
0,1

# If no GPUs are found, the script should output an empty line or only whitespace
```

The discovery script should:

1. Accept the GPU amount as its first argument
2. Accept the configured arguments after it (the `args` string is split on whitespace; the literal `null` is passed if no args are configured)
3. Output comma-separated GPU device indices to stdout on a single line
4. Exit with code 0 on success
5. Complete execution within 10 seconds
6. Expect surrounding whitespace in GPU indices to be tolerated (the driver trims each index during parsing)

The driver executes the script using `Runtime.exec()` with the command format `<script_absolute_path> <gpuAmount> <args>`.
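
Because `Runtime.exec(String)` splits its command string on whitespace, the configured `args` value reaches the script as separate arguments, and an unset value arrives as the literal `null`. The sketch below reproduces that splitting with `java.util.StringTokenizer`, which is how `Runtime.exec(String)` tokenizes its command; `CommandSketch.buildArgv` is a hypothetical helper for illustration, not part of the driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical helper showing how "<script_absolute_path> <gpuAmount> <args>" becomes argv. */
final class CommandSketch {
    private CommandSketch() {}

    static List<String> buildArgv(String scriptAbsolutePath, long gpuAmount, String args) {
        // String concatenation turns a null args value into the text "null"
        String cmd = scriptAbsolutePath + " " + gpuAmount + " " + args;
        // Runtime.exec(String) tokenizes its command on whitespace in the same way
        StringTokenizer tokenizer = new StringTokenizer(cmd);
        List<String> argv = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            argv.add(tokenizer.nextToken());
        }
        return argv;
    }
}
```

One consequence of this tokenization is that quoting has no effect: a script path or argument containing spaces is split into several tokens.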