Function Module

HiveModule exposes Hive's built-in functions to Flink through Flink's module system. Once the module is loaded, hundreds of Hive functions can be called by name in Flink SQL queries, so queries written against Hive's function library keep working in Flink.

Capabilities

HiveModule

Main module class that provides access to Hive built-in functions within Flink.

/**
 * Module providing Hive built-in functions to Flink
 */
public class HiveModule implements Module {
    
    /**
     * List all available Hive functions
     * @return Set of function names
     */
    public Set<String> listFunctions();
    
    /**
     * Get function definition by name
     * @param name - Function name
     * @return Optional function definition
     */
    public Optional<FunctionDefinition> getFunctionDefinition(String name);
    
    /**
     * Get Hive version used by this module
     * @return Hive version string
     */
    public String getHiveVersion();
}

Usage Examples:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.module.hive.HiveModule;

// Create table environment (batch mode shown)
EnvironmentSettings settings = EnvironmentSettings.newInstance().inBatchMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

// Load HiveModule to access Hive functions
tableEnv.loadModule("hive", new HiveModule());

// Use Hive functions in SQL queries
tableEnv.executeSql(
    "SELECT " +
    "  regexp_replace(name, '[0-9]', '') as clean_name, " +  // Hive regex function
    "  from_unixtime(`timestamp`) as formatted_time, " +      // Hive date function; backticks escape the reserved word
    "  size(split(tags, ',')) as tag_count " +                // Hive array functions
    "FROM my_table"
);

// Use Hive aggregate functions
tableEnv.executeSql(
    "SELECT " +
    "  department, " +
    "  percentile_approx(salary, 0.5) as median_salary, " +   // Hive percentile
    "  collect_set(role) as unique_roles " +                  // Hive collection function
    "FROM employees " +
    "GROUP BY department"
);
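
To make the first query above more concrete, the following plain-Java sketch approximates what `regexp_replace`, `from_unixtime`, and `size(split(...))` compute for a single row. It is not Hive's or Flink's implementation: the class and method names are hypothetical, the time zone is passed in explicitly (Hive uses the session time zone), and the handling of trailing empty strings in `split` is an assumption.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class: approximates three Hive built-ins using only the JDK.
final class HiveRowSemantics {

    // regexp_replace(s, regex, replacement): replaces every match of a Java regex.
    static String regexpReplace(String s, String regex, String replacement) {
        return s.replaceAll(regex, replacement);
    }

    // from_unixtime(seconds): formats epoch seconds as 'yyyy-MM-dd HH:mm:ss'.
    static String fromUnixtime(long epochSeconds, ZoneId zone) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(zone)
                .format(Instant.ofEpochSecond(epochSeconds));
    }

    // size(split(s, regex)): number of pieces.
    // Assumption: trailing empty strings are kept, as with Java's split(regex, -1).
    static int sizeOfSplit(String s, String regex) {
        return s.split(regex, -1).length;
    }

    public static void main(String[] args) {
        System.out.println(regexpReplace("user42", "[0-9]", ""));
        System.out.println(fromUnixtime(0L, ZoneId.of("UTC")));
        System.out.println(sizeOfSplit("a,b,c", ","));
    }
}
```

Passing the zone as a parameter keeps the sketch deterministic; in a real job the result of `from_unixtime` depends on the configured time zone.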

HiveModuleFactory

Factory for creating HiveModule instances from configuration properties.

/**
 * Factory for creating HiveModule instances
 */
public class HiveModuleFactory implements ModuleFactory {
    
    /**
     * Create HiveModule from configuration
     * @param properties - Configuration properties
     * @return HiveModule instance
     */
    public Module createModule(Map<String, String> properties);
    
    /**
     * Get factory identifier
     * @return Factory identifier string
     */
    public String factoryIdentifier();
    
    /**
     * Get required configuration options
     * @return Set of required options
     */
    public Set<ConfigOption<?>> requiredOptions();
    
    /**
     * Get optional configuration options
     * @return Set of optional options
     */
    public Set<ConfigOption<?>> optionalOptions();
}

Configuration Options

Configuration options for HiveModule behavior and version specification.

/**
 * Configuration options for HiveModule
 */
@PublicEvolving
public class HiveModuleOptions {
    
    /** Hive version for module functionality */
    public static final ConfigOption<String> HIVE_VERSION;
}

SQL Parser Integration

Hive SQL dialect support through Flink's parser factory system.

/**
 * Parser factory for Hive SQL dialect support
 */
public class HiveParserFactory implements ParserFactory {
    
    /**
     * Create parser instance for Hive SQL dialect
     * @return Parser instance
     */
    public Parser create();
    
    /**
     * Get factory identifier
     * @return Factory identifier string
     */
    public String factoryIdentifier();
    
    /**
     * Get required configuration options
     * @return Set of required options
     */
    public Set<ConfigOption<?>> requiredOptions();
    
    /**
     * Get optional configuration options
     * @return Set of optional options
     */
    public Set<ConfigOption<?>> optionalOptions();
}

Available Function Categories

String Functions

Hive provides comprehensive string manipulation functions:

-- Pattern matching and replacement
SELECT regexp_replace(text, 'pattern', 'replacement') FROM my_table;
SELECT regexp_extract(text, 'pattern', group_index) FROM my_table;

-- String manipulation
SELECT concat_ws(',', col1, col2, col3) FROM my_table;
SELECT split(text, ',') FROM my_table;
SELECT trim(text) FROM my_table;

-- Case conversion
SELECT upper(text), lower(text) FROM my_table;
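
The `group_index` argument of `regexp_extract` selects a capture group of the first match, with 0 meaning the whole match. A minimal Java sketch of that behavior follows; the class name is hypothetical, and returning an empty string when nothing matches is an assumption about Hive's behavior rather than something stated above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: models regexp_extract with java.util.regex.
final class RegexpExtractSketch {

    // regexp_extract(text, pattern, groupIndex): capture group of the first match.
    // Assumption: an empty string is returned when the pattern does not match.
    static String regexpExtract(String text, String pattern, int groupIndex) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        return m.find() ? m.group(groupIndex) : "";
    }

    public static void main(String[] args) {
        // Group 2 is the numeric part of the first "<letters>-<digits>" match.
        System.out.println(regexpExtract("order-1234-eu", "([a-z]+)-(\\d+)", 2));
    }
}
```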

Date and Time Functions

Access to Hive's date and time functions:

-- Date formatting and parsing
SELECT from_unixtime(timestamp_col) FROM my_table;
SELECT unix_timestamp(date_string, 'yyyy-MM-dd') FROM my_table;

-- Date arithmetic
SELECT date_add(date_col, 30) FROM my_table;
SELECT datediff(end_date, start_date) FROM my_table;

-- Date extraction
SELECT year(date_col), month(date_col), day(date_col) FROM my_table;
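
The argument order of `datediff(end_date, start_date)` is easy to get backwards, so here is a small sketch of the date arithmetic using `java.time`. The class and method names are hypothetical, and the code only mirrors the calendar arithmetic, not Hive's parsing of string or timestamp inputs.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class: JDK equivalents of the date functions shown above.
final class HiveDateSemantics {

    // date_add(start, days): shift a date forward by a number of days.
    static LocalDate dateAdd(LocalDate start, int days) {
        return start.plusDays(days);
    }

    // datediff(end, start): whole days from start to end; negative if end is earlier.
    static long dateDiff(LocalDate end, LocalDate start) {
        return ChronoUnit.DAYS.between(start, end);
    }

    // year(d), month(d), day(d): extract the calendar fields.
    static int[] yearMonthDay(LocalDate d) {
        return new int[] {d.getYear(), d.getMonthValue(), d.getDayOfMonth()};
    }

    public static void main(String[] args) {
        System.out.println(dateAdd(LocalDate.parse("2024-01-15"), 30));
        System.out.println(dateDiff(LocalDate.parse("2024-03-01"), LocalDate.parse("2024-02-01")));
    }
}
```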

Mathematical Functions

Comprehensive mathematical operations:

-- Aggregation functions
SELECT percentile_approx(value, 0.5) FROM my_table;
SELECT stddev_pop(value), variance(value) FROM my_table;

-- Mathematical operations
SELECT round(value, 2), ceil(value), floor(value) FROM my_table;
SELECT abs(value), pow(base, exponent) FROM my_table;

Collection Functions

Array and map manipulation functions:

-- Array functions
SELECT size(array_col) FROM my_table;
SELECT array_contains(array_col, 'value') FROM my_table;
SELECT sort_array(array_col) FROM my_table;

-- Map functions
SELECT map_keys(map_col), map_values(map_col) FROM my_table;
SELECT size(map_col) FROM my_table;  -- size() also accepts maps

-- Collection aggregation
SELECT collect_list(col), collect_set(col) FROM my_table GROUP BY key;
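
The difference between `collect_list` and `collect_set` is whether duplicates survive the grouping. The sketch below models that with plain Java collections; the class name and the two-column row layout (`row[0]` is the key, `row[1]` the collected value) are assumptions made for illustration. The sketch keeps insertion order so its output is deterministic, which should not be read as an ordering guarantee for the SQL functions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper class: models GROUP BY key with collect_list / collect_set.
final class CollectSemantics {

    // collect_list(col) ... GROUP BY key: every value is kept, duplicates included.
    static Map<String, List<String>> collectList(List<String[]> rows) {
        Map<String, List<String>> out = new TreeMap<>();
        for (String[] row : rows) {
            out.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return out;
    }

    // collect_set(col) ... GROUP BY key: duplicates within a group are dropped.
    static Map<String, Set<String>> collectSet(List<String[]> rows) {
        Map<String, Set<String>> out = new TreeMap<>();
        for (String[] row : rows) {
            out.computeIfAbsent(row[0], k -> new LinkedHashSet<>()).add(row[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"eng", "dev"});
        rows.add(new String[] {"eng", "dev"});
        rows.add(new String[] {"eng", "sre"});
        System.out.println(collectList(rows));
        System.out.println(collectSet(rows));
    }
}
```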

Conditional Functions

Control flow and conditional logic:

-- Conditional expressions
SELECT if(condition, true_value, false_value) FROM my_table;
SELECT coalesce(col1, col2, 'default') FROM my_table;

-- Standard CASE expression
SELECT case 
  when col > 100 then 'high'
  when col > 50 then 'medium'
  else 'low'
end FROM my_table;
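
One detail worth spelling out is how NULL inputs behave: `coalesce` returns its first non-NULL argument, and in the CASE expression a comparison against NULL is never true, so a NULL `col` falls through to the ELSE branch. A minimal Java sketch of both, using hypothetical class and method names and `null` to stand in for SQL NULL:

```java
// Hypothetical helper class: models NULL handling in coalesce and CASE WHEN.
final class ConditionalSemantics {

    // coalesce(a, b, ...): first non-null argument, or null if all are null.
    @SafeVarargs
    static <T> T coalesce(T... values) {
        for (T v : values) {
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    // CASE WHEN col > 100 ... WHEN col > 50 ... ELSE 'low' END.
    // A null col makes both comparisons "not true", so the ELSE branch is taken.
    static String bucket(Integer col) {
        if (col != null && col > 100) {
            return "high";
        }
        if (col != null && col > 50) {
            return "medium";
        }
        return "low";
    }

    public static void main(String[] args) {
        System.out.println(coalesce(null, null, "default"));
        System.out.println(bucket(75));
        System.out.println(bucket(null));
    }
}
```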

Advanced Usage

Module Loading and Configuration

Configure HiveModule with specific versions and options:

// Load HiveModule for a specific Hive version
// (the constructor argument corresponds to the 'hive-version' option)
tableEnv.loadModule("hive", new HiveModule("3.1.2"));

// Set module priority: "hive" is consulted before "core".
// Both modules must already be loaded; "core" is loaded by default,
// and loading a module name twice fails.
tableEnv.useModules("hive", "core");

// List loaded modules
String[] modules = tableEnv.listModules();

Function Resolution Order

Understand how Flink resolves functions when multiple modules are loaded:

// Modules are consulted in order; the first module that defines the name wins
// "core" is loaded by default, so it has first priority
tableEnv.loadModule("hive", new HiveModule());     // Second priority

// Function names cannot be qualified with a module name.
// To prefer Hive's implementation when names conflict, put "hive" first.
tableEnv.useModules("hive", "core");
tableEnv.executeSql("SELECT size(array_col) FROM my_table");
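
The resolution rule can be pictured as a first-match lookup over an ordered list of modules. The toy model below is only a sketch of that idea, not Flink's module manager: the class, its methods, and the function names used in `main` are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model of first-match function resolution across ordered modules.
final class ModuleResolutionSketch {

    // Module name -> function names it provides, kept in resolution order.
    private final Map<String, Set<String>> modules = new LinkedHashMap<>();

    // Registers a module at the end of the resolution order.
    void load(String moduleName, Set<String> functions) {
        if (modules.putIfAbsent(moduleName, functions) != null) {
            throw new IllegalStateException("Module already loaded: " + moduleName);
        }
    }

    // Returns the name of the first module, in order, that defines the function.
    Optional<String> resolve(String functionName) {
        String wanted = functionName.toLowerCase(Locale.ROOT);
        return modules.entrySet().stream()
                .filter(e -> e.getValue().contains(wanted))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        ModuleResolutionSketch sketch = new ModuleResolutionSketch();
        // Made-up function names; real modules expose far more.
        sketch.load("core", new HashSet<>(Arrays.asList("shared_fn")));
        sketch.load("hive", new HashSet<>(Arrays.asList("shared_fn", "hive_only_fn")));
        System.out.println(sketch.resolve("shared_fn"));
        System.out.println(sketch.resolve("hive_only_fn"));
    }
}
```

In this model a name defined by both modules resolves to whichever module comes first, which is why changing the module order changes which implementation a query gets.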

Custom Function Integration

Combine Hive functions with custom UDFs:

// Register custom UDF
tableEnv.createTemporarySystemFunction("my_func", MyCustomUDF.class);

// Use both Hive and custom functions
tableEnv.executeSql(
    "SELECT " +
    "  my_func(col1) as custom_result, " +
    "  regexp_replace(col2, 'pattern', 'replacement') as hive_result " +
    "FROM my_table"
);

Performance Considerations

Optimize function usage for better performance:

-- Use Hive functions for complex string operations
SELECT regexp_replace(large_text, complex_pattern, replacement) FROM large_table;

-- Compute several percentiles in one aggregate call by passing an array
SELECT percentile_approx(value, array(0.25, 0.5, 0.75)) FROM my_table GROUP BY key;

-- Use collect functions for data restructuring
SELECT key, collect_list(struct(col1, col2, col3)) as nested_data 
FROM my_table GROUP BY key;