tessl/maven-org-apache-flink--flink-external-resource-gpu

External resource driver for GPU management in Apache Flink streaming and batch processing jobs

Describes
mavenpkg:maven/org.apache.flink/flink-external-resource-gpu@2.1.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-flink--flink-external-resource-gpu@2.1.0


Flink External Resource GPU Driver

The Flink External Resource GPU Driver provides GPU resource management capabilities for Apache Flink streaming and batch processing jobs. It implements Flink's ExternalResourceDriver interface to enable discovery, allocation, and management of GPU resources across cluster nodes using configurable discovery scripts.

Package Information

  • Package Name: org.apache.flink:flink-external-resource-gpu
  • Package Type: maven
  • Language: Java
  • Installation: Add to Maven dependencies with groupId org.apache.flink and artifactId flink-external-resource-gpu

Core Imports

import org.apache.flink.externalresource.gpu.GPUDriverFactory;
import org.apache.flink.externalresource.gpu.GPUDriverOptions;
import org.apache.flink.externalresource.gpu.GPUInfo;
import org.apache.flink.configuration.Configuration;

Basic Usage

import org.apache.flink.api.common.externalresource.ExternalResourceDriver;
import org.apache.flink.api.common.externalresource.ExternalResourceInfo;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.externalresource.gpu.GPUDriverFactory;
import org.apache.flink.externalresource.gpu.GPUDriverOptions;
import java.util.Set;

// Configure GPU discovery
Configuration config = new Configuration();
config.set(GPUDriverOptions.DISCOVERY_SCRIPT_PATH, "/path/to/gpu-discovery-script.sh");
config.set(GPUDriverOptions.DISCOVERY_SCRIPT_ARG, "--device-type nvidia");

// Create the GPU driver through the factory (the script path is validated here)
GPUDriverFactory factory = new GPUDriverFactory();
ExternalResourceDriver driver = factory.createExternalResourceDriver(config);

// Discover GPU resources. The ExternalResourceDriver interface returns
// Set<? extends ExternalResourceInfo>; for this driver the elements are GPUInfo instances.
Set<? extends ExternalResourceInfo> gpuResources = driver.retrieveResourceInfo(2L); // Request 2 GPUs

// Use GPU information
for (ExternalResourceInfo gpu : gpuResources) {
    // Get GPU device index (GPUInfo always provides the "index" property)
    String deviceIndex = gpu.getProperty("index").orElse("unknown");
    System.out.println("Available GPU: " + gpu + ", index " + deviceIndex); // e.g., "Available GPU: GPU Device(0), index 0"
}

Architecture

The GPU driver is built around several key components:

  • GPUDriverFactory: Factory for creating GPU driver instances from configuration
  • GPUDriver: Main driver implementation that executes discovery scripts and manages GPU resources
  • GPUInfo: Value object representing individual GPU devices with their properties
  • GPUDriverOptions: Configuration options for discovery script path and arguments
  • Discovery Script Integration: Executes external scripts to detect available GPU hardware

Capabilities

GPU Driver Factory

Factory for creating GPU driver instances with proper configuration validation.

/**
 * Factory for creating GPU driver instances
 */
public class GPUDriverFactory implements ExternalResourceDriverFactory {
    /**
     * Creates an external resource driver for GPU management
     * @param config Configuration containing GPU discovery settings
     * @return ExternalResourceDriver instance for GPU resources
     * @throws Exception if configuration is invalid or driver creation fails
     */
    public ExternalResourceDriver createExternalResourceDriver(Configuration config) throws Exception;
}

GPU Information

Represents individual GPU device information including device indices and properties.

/**
 * Information container for GPU resource, currently including the GPU index
 * Note: Constructor is package-private, instances created through GPUDriver.retrieveResourceInfo()
 */
public class GPUInfo implements ExternalResourceInfo {
    
    /**
     * Gets property value by key
     * @param key Property key to retrieve (supports "index")
     * @return Optional containing property value, or empty if key not found
     */
    public Optional<String> getProperty(String key);
    
    /**
     * Gets all available property keys
     * @return Collection of available property keys (currently only "index")
     */
    public Collection<String> getKeys();
    
    /**
     * String representation of GPU device
     * @return Formatted string like "GPU Device(0)"
     */
    public String toString();
    
    /**
     * Hash code based on GPU index
     * @return Hash code for this GPU info
     */
    public int hashCode();
    
    /**
     * Equality comparison based on GPU index
     * @param obj Object to compare
     * @return true if objects represent same GPU device
     */
    public boolean equals(Object obj);
}
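
The value semantics described above (a single "index" property, equality and hashing by index) can be illustrated with a small stand-in class. This is only a sketch: GpuInfoSketch is a hypothetical name, it does not implement Flink's ExternalResourceInfo, and it is not the library's GPUInfo source.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical stand-in that mirrors the documented behavior of GPUInfo. */
final class GpuInfoSketch {
    private static final String PROPERTY_KEY_INDEX = "index";
    private final String index;

    GpuInfoSketch(String index) {
        this.index = Objects.requireNonNull(index, "index");
    }

    /** Returns the index for the key "index", otherwise an empty Optional. */
    Optional<String> getProperty(String key) {
        return PROPERTY_KEY_INDEX.equals(key) ? Optional.of(index) : Optional.empty();
    }

    /** The only available key is "index". */
    Collection<String> getKeys() {
        return Collections.singleton(PROPERTY_KEY_INDEX);
    }

    @Override
    public String toString() {
        return String.format("GPU Device(%s)", index);
    }

    @Override
    public int hashCode() {
        return index.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        return obj instanceof GpuInfoSketch && index.equals(((GpuInfoSketch) obj).index);
    }
}
```

Because equality depends only on the index, two objects created for the same device collapse to one entry when stored in a Set.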

GPU Driver Configuration

Configuration options for GPU discovery script path and arguments.

/**
 * Configuration options for GPU driver
 */
@PublicEvolving
public class GPUDriverOptions {
    /**
     * Configuration option for discovery script path
     * Key: "discovery-script.path"
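     * Default: "plugins/external-resource-gpu/nvidia-gpu-discovery.sh" (DEFAULT_FLINK_PLUGINS_DIRS + "/external-resource-gpu/nvidia-gpu-discovery.sh"); the default is a relative path, so it resolves against FLINK_HOME (for example /opt/flink/plugins/external-resource-gpu/nvidia-gpu-discovery.sh when FLINK_HOME is /opt/flink)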
     * Description: Path to GPU discovery script (absolute or relative to FLINK_HOME)
     */
    public static final ConfigOption<String> DISCOVERY_SCRIPT_PATH;
    
    /**
     * Configuration option for discovery script arguments  
     * Key: "discovery-script.args"
     * Default: No default value
     * Description: Arguments passed to the discovery script
     */
    public static final ConfigOption<String> DISCOVERY_SCRIPT_ARG;
}

GPU Resource Discovery

Core functionality for discovering and retrieving GPU resources through configurable scripts.

/**
 * Driver for GPU resource discovery and management
 * Implements ExternalResourceDriver interface for Flink integration
 * Note: Constructor is package-private, instances created through GPUDriverFactory
 */
class GPUDriver implements ExternalResourceDriver {
    
    /**
     * Discovers and retrieves GPU resources by executing the discovery script
     * @param gpuAmount Number of GPUs to discover (must be > 0)
     * @return Unmodifiable set of GPUInfo objects representing discovered GPUs
     * @throws IllegalArgumentException if gpuAmount <= 0
     * @throws TimeoutException if the discovery script times out (10 second limit)
     * @throws FlinkException if the discovery script exits with a non-zero code
     *
     * Note: IllegalConfigurationException (blank script path), FileNotFoundException (script file
     * does not exist) and FlinkException (script not executable) are raised earlier, when the
     * driver is constructed through GPUDriverFactory, not by this method.
     */
    public Set<GPUInfo> retrieveResourceInfo(long gpuAmount) throws Exception;
}

Implementation Details

The GPU driver uses a 10-second timeout for discovery script execution (defined by the private constant DISCOVERY_SCRIPT_TIMEOUT_MS = 10000L) and identifies each GPU device by the "index" property key. When the script misbehaves, the driver logs the script's stdout and stderr so that failures can be debugged.

Logging and parsing behavior:

  • Successfully discovered GPU resources are logged at INFO level
  • Script execution warnings (non-zero exit, multiple output lines) are logged at WARN level with stdout/stderr details
  • Empty indices and whitespace-only indices are automatically filtered out during parsing
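
The parsing rules above (first line only, comma-separated, trimmed, blank entries dropped) can be sketched as follows. GpuIndexParser and parseIndices are hypothetical names used for illustration; the driver's actual parsing code may be structured differently.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper that mimics the documented parsing of discovery script output. */
final class GpuIndexParser {

    /** Returns the trimmed, non-blank indices found on the first line of the script output. */
    static Set<String> parseIndices(String stdout) {
        Set<String> indices = new LinkedHashSet<>();
        if (stdout == null) {
            return Collections.unmodifiableSet(indices);
        }
        // Only the first line is considered; any further lines are ignored here.
        String firstLine = stdout.split("\\R", 2)[0];
        for (String token : firstLine.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                indices.add(trimmed); // empty and whitespace-only entries are skipped
            }
        }
        return Collections.unmodifiableSet(indices);
    }
}
```

For example, the output "0, 1,,2" followed by a second line yields the three indices 0, 1 and 2, and whitespace-only output yields an empty set.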

Types

// External dependencies from flink-core
interface ExternalResourceDriver {
    Set<? extends ExternalResourceInfo> retrieveResourceInfo(long amount) throws Exception;
}

interface ExternalResourceDriverFactory {
    ExternalResourceDriver createExternalResourceDriver(Configuration config) throws Exception;
}

interface ExternalResourceInfo {
    Optional<String> getProperty(String key);
    Collection<String> getKeys();
}

// Configuration types
class Configuration {
    <T> T get(ConfigOption<T> option);
    <T> Configuration set(ConfigOption<T> option, T value);
}

class ConfigOption<T> {
    String key();
}

Error Handling

The GPU driver throws specific exceptions for different error conditions:

  • IllegalConfigurationException: Thrown when discovery script path is not configured or is whitespace-only
  • FileNotFoundException: Thrown when the specified discovery script file does not exist
  • FlinkException: Thrown when discovery script is not executable or exits with non-zero return code
  • IllegalArgumentException: Thrown when gpuAmount parameter is <= 0
  • TimeoutException: Thrown when discovery script execution exceeds 10 second timeout

Configuration and script validation during driver initialization:

  • A relative discovery script path is resolved to an absolute path against FLINK_HOME (or against the current working directory if FLINK_HOME is not set); an absolute path is used as-is
  • Script file existence and executable permissions are verified during GPUDriver construction
  • If the args option is not set, its value is null, and the literal string "null" is passed to the discovery script
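
The path resolution and validation steps above can be sketched with plain JDK classes. ScriptPathResolver and its methods are hypothetical, and the JDK exceptions IllegalArgumentException and IllegalStateException stand in for the IllegalConfigurationException and FlinkException that the driver is documented to throw.

```java
import java.io.File;
import java.io.FileNotFoundException;

/** Hypothetical helper illustrating the documented path resolution and validation. */
final class ScriptPathResolver {

    /** Resolves a relative path against flinkHome, or the working directory if flinkHome is null. */
    static File toAbsolute(String configuredPath, String flinkHome) {
        if (configuredPath == null || configuredPath.trim().isEmpty()) {
            // Stand-in for IllegalConfigurationException
            throw new IllegalArgumentException("Discovery script path must not be blank");
        }
        File script = new File(configuredPath);
        if (script.isAbsolute()) {
            return script;
        }
        String base = flinkHome != null ? flinkHome : System.getProperty("user.dir");
        return new File(base, configuredPath);
    }

    /** Resolves the path, then checks that the file exists and is executable. */
    static File resolveAndValidate(String configuredPath, String flinkHome)
            throws FileNotFoundException {
        File script = toAbsolute(configuredPath, flinkHome);
        if (!script.exists()) {
            throw new FileNotFoundException("Discovery script does not exist: " + script);
        }
        if (!script.canExecute()) {
            // Stand-in for FlinkException
            throw new IllegalStateException("Discovery script is not executable: " + script);
        }
        return script;
    }
}
```

A caller would typically pass System.getenv("FLINK_HOME") as the second argument, which is null when the variable is not set.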

Discovery script integration expects:

  • Script to accept gpuAmount as its first argument, followed by the configured args (or the literal string "null" if no args are configured)
  • Script to output comma-separated GPU indices on a single line to stdout
  • Script to exit with code 0 for success
  • Script execution to complete within 10 seconds (DISCOVERY_SCRIPT_TIMEOUT_MS)
  • If script outputs multiple lines, only the first line is processed (others are logged as warnings)
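
The contract above (exit code 0, bounded run time, first line of stdout) can be sketched with the JDK Process API. ScriptRunner is a hypothetical name, IOException stands in for FlinkException, and the sketch assumes the script writes only a small amount of output; it is not the driver's implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that runs a command under a timeout and returns its first output line. */
final class ScriptRunner {

    static String runAndReadFirstLine(List<String> command, long timeoutMs)
            throws IOException, InterruptedException, TimeoutException {
        Process process = new ProcessBuilder(command).start();
        try (BufferedReader stdout = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            // Wait for completion, but no longer than the timeout.
            if (!process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new TimeoutException("Script did not finish within " + timeoutMs + " ms");
            }
            if (process.exitValue() != 0) {
                // Stand-in for FlinkException
                throw new IOException("Script exited with code " + process.exitValue());
            }
            // Small outputs stay buffered in the pipe, so reading after exit is safe here.
            String firstLine = stdout.readLine();
            return firstLine == null ? "" : firstLine;
        } finally {
            process.destroyForcibly(); // no effect if the process has already exited
        }
    }
}
```

With the documented limit, the call would use a timeout of 10000L milliseconds.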

Discovery Script Integration

The driver integrates with external discovery scripts to detect GPU hardware:

# Example script execution (command format: <script_path> <gpuAmount> <args>)
/path/to/discovery-script.sh 2 --device-type nvidia

# Expected output format (comma-separated indices on single line)
0,1

# If no GPUs found, script should output empty string or just whitespace

The discovery script should:

  1. Accept GPU amount as first argument
  2. Accept the optional configuration arguments after the GPU amount (the literal string "null" is passed if no args are configured)
  3. Output comma-separated GPU device indices to stdout on a single line
  4. Exit with code 0 on success
  5. Complete execution within 10 seconds
  6. Handle whitespace in GPU indices (indices are trimmed during parsing)

The driver executes the script using Runtime.exec() with command format: <script_absolute_path> <gpuAmount> <args>
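
To make the command format concrete, the sketch below builds the command string and shows how it splits into arguments. It assumes the single-string overload Runtime.exec(String), which the JDK documents as tokenizing the command on whitespace with a StringTokenizer. DiscoveryCommand and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical helper showing the documented command format and its tokenization. */
final class DiscoveryCommand {

    /** Builds "<script_absolute_path> <gpuAmount> <args>". */
    static String build(String scriptAbsolutePath, long gpuAmount, String args) {
        // String concatenation renders a null args value as the literal text "null".
        return scriptAbsolutePath + " " + gpuAmount + " " + args;
    }

    /** Splits a command the way Runtime.exec(String) does: on whitespace, without quoting rules. */
    static List<String> tokenize(String command) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(command);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }
}
```

Under that assumption, args such as "--device-type nvidia" reach the script as two separate arguments after the GPU amount, and quoting inside args is not interpreted.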