Apache Hudi Hadoop common utilities and components that provide core functionality for integrating Apache Hudi with the Hadoop ecosystem, including file system operations, configuration management, and Hadoop-specific utilities for managing data lakehouse operations.
—
DFS-based configuration management providing support for global properties, environment-specific settings, and Hadoop configuration integration. Enables centralized configuration management across distributed Hudi deployments.
Main configuration class providing DFS-based properties management with support for global configuration files and environment-specific overrides.
/**
* DFS-based properties configuration extending PropertiesConfig
* Provides centralized configuration management for Hudi operations
*/
public class DFSPropertiesConfiguration extends PropertiesConfig {
/** Default properties file name */
public static final String DEFAULT_PROPERTIES_FILE = "hudi-defaults.conf";
/** Environment variable for configuration directory */
public static final String CONF_FILE_DIR_ENV_NAME = "HUDI_CONF_DIR";
/** Default configuration file directory */
public static final String DEFAULT_CONF_FILE_DIR = "file:/etc/hudi/conf";
/** Default path for configuration file */
public static final StoragePath DEFAULT_PATH;
/** Create configuration with Hadoop configuration and file path */
public DFSPropertiesConfiguration(Configuration hadoopConf, StoragePath filePath);
/** Create configuration with default settings */
public DFSPropertiesConfiguration();
/** Add properties from file path */
public void addPropsFromFile(StoragePath filePath);
/** Add properties from BufferedReader stream */
public void addPropsFromStream(BufferedReader reader, StoragePath cfgFilePath);
/** Get global properties instance */
public TypedProperties getGlobalProperties();
/** Get instance properties */
public TypedProperties getProps();
/** Get instance properties with global properties option */
public TypedProperties getProps(boolean includeGlobalProps);
}

Static methods for managing global configuration properties across the application.
/**
* Load global properties from default configuration location
* Loads properties from HUDI_CONF_DIR or default location
* @return TypedProperties containing global configuration
*/
public static TypedProperties loadGlobalProps();
/**
* Get global properties (cached version)
* Returns cached global properties or loads them if not cached
* @return TypedProperties containing global configuration
*/
public static TypedProperties getGlobalProps();
/**
* Refresh global properties by reloading from file system
* Clears cache and reloads properties from configuration files
*/
public static void refreshGlobalProps();
/**
* Clear global properties cache
* Forces next access to reload properties from files
*/
public static void clearGlobalProps();
/**
* Add property to global properties
* @param key - Property key to add
* @param value - Property value to set
* @return Updated TypedProperties with new property
*/
public static TypedProperties addToGlobalProps(String key, String value);

The configuration system supports multiple approaches for locating configuration files:

- Set the HUDI_CONF_DIR environment variable to specify a custom configuration directory
- Fall back to the default directory (file:/etc/hudi/conf) when the environment variable is not set
- Pass a StoragePath to the constructor for a custom location

Seamless integration with the Hadoop Configuration system enables unified configuration management.
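The static helpers declared above can be combined into a load, override, and refresh cycle. A minimal sketch (the property key and value are illustrative, and the loaded values depend on which configuration files are actually present):

```java
import org.apache.hudi.common.config.DFSPropertiesConfiguration;
import org.apache.hudi.common.config.TypedProperties;

public class GlobalPropsLifecycle {
    public static void main(String[] args) {
        // Fetch the cached global properties, loading them from
        // HUDI_CONF_DIR (or file:/etc/hudi/conf) on first access
        TypedProperties globalProps = DFSPropertiesConfiguration.getGlobalProps();

        // Apply a runtime override; the updated global properties are returned
        TypedProperties updated =
            DFSPropertiesConfiguration.addToGlobalProps("hoodie.table.type", "MERGE_ON_READ");

        // Drop cached state and reload from the configuration files,
        // discarding the override above
        DFSPropertiesConfiguration.refreshGlobalProps();
    }
}
```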
/**
* Configuration integration patterns with Hadoop
*/
// Create with existing Hadoop configuration
Configuration hadoopConf = new Configuration();
hadoopConf.addResource("core-site.xml");
hadoopConf.addResource("hdfs-site.xml");
// Custom configuration file location
StoragePath configPath = new StoragePath("hdfs://namenode:8020/config/hudi-custom.conf");
DFSPropertiesConfiguration hudiConf = new DFSPropertiesConfiguration(hadoopConf, configPath);
// Access properties
String tableType = hudiConf.getString("hoodie.table.type", "COPY_ON_WRITE");
int parquetBlockSize = hudiConf.getInt("hoodie.parquet.block.size", 134217728);
boolean asyncCompaction = hudiConf.getBoolean("hoodie.compact.inline", false);

Inherited methods from PropertiesConfig for accessing configuration values with type safety and defaults.
/**
* Property access methods (inherited from PropertiesConfig)
*/
// String properties
public String getString(String key);
public String getString(String key, String defaultValue);
// Integer properties
public int getInt(String key);
public int getInt(String key, int defaultValue);
// Long properties
public long getLong(String key);
public long getLong(String key, long defaultValue);
// Boolean properties
public boolean getBoolean(String key);
public boolean getBoolean(String key, boolean defaultValue);
// Double properties
public double getDouble(String key);
public double getDouble(String key, double defaultValue);
// Get all properties as TypedProperties
public TypedProperties getProps();

Standard configuration properties commonly used in Hudi operations:
- hoodie.table.name - Name of the Hudi table
- hoodie.table.type - Table type (COPY_ON_WRITE or MERGE_ON_READ)
- hoodie.table.base.file.format - Base file format (PARQUET, ORC, etc.)
- hoodie.write.markers.type - Marker type for write operations
- hoodie.write.concurrency.mode - Concurrency mode for writes
- hoodie.datasource.write.operation - Write operation type (INSERT, UPSERT, etc.)
- hoodie.compact.inline - Enable inline compaction
- hoodie.compact.inline.max.delta.commits - Max delta commits before compaction
- hoodie.compact.strategy - Compaction strategy
- hoodie.filesystem.consistency.check.enabled - Enable consistency checks
- hoodie.filesystem.operation.retry.enable - Enable operation retries
- hoodie.filesystem.operation.retry.initial.interval - Initial retry interval

Hudi configuration files use standard Java properties format:
# Hudi configuration file (hudi-defaults.conf)
# Table settings
hoodie.table.type=COPY_ON_WRITE
hoodie.table.base.file.format=PARQUET
# Write settings
hoodie.write.markers.type=TIMELINE_SERVER_BASED
hoodie.write.concurrency.mode=SINGLE_WRITER
hoodie.datasource.write.operation=UPSERT
# Compaction settings
hoodie.compact.inline=false
hoodie.compact.inline.max.delta.commits=10
hoodie.compact.strategy=org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy
# File system settings
hoodie.filesystem.consistency.check.enabled=true
hoodie.filesystem.operation.retry.enable=true
hoodie.filesystem.operation.retry.initial.interval=100
# Parquet settings
hoodie.parquet.block.size=134217728
hoodie.parquet.page.size=1048576
hoodie.parquet.compression.codec=snappy

Usage Examples:
import org.apache.hudi.common.config.DFSPropertiesConfiguration;
import org.apache.hadoop.conf.Configuration;
import org.apache.hudi.storage.StoragePath;
import org.apache.hudi.common.config.TypedProperties;
// Using global configuration
TypedProperties globalProps = DFSPropertiesConfiguration.getGlobalProps();
String defaultTableType = globalProps.getString("hoodie.table.type", "COPY_ON_WRITE");
// Reading typed values from the global properties
boolean inlineCompaction = globalProps.getBoolean("hoodie.compact.inline", false);
// Creating custom configuration with Hadoop integration
Configuration hadoopConf = new Configuration();
hadoopConf.set("fs.defaultFS", "hdfs://namenode:8020");
// Custom configuration file location
StoragePath customConfigPath = new StoragePath("hdfs://namenode:8020/apps/hudi/conf/production.conf");
DFSPropertiesConfiguration customConfig = new DFSPropertiesConfiguration(hadoopConf, customConfigPath);
// Access configuration properties with defaults
String tableName = customConfig.getString("hoodie.table.name", "default_table");
int parquetBlockSize = customConfig.getInt("hoodie.parquet.block.size", 134217728);
boolean consistencyCheck = customConfig.getBoolean("hoodie.filesystem.consistency.check.enabled", true);
// Environment-based configuration directory
// Set environment variable: export HUDI_CONF_DIR=hdfs://namenode:8020/config/hudi
// Configuration will automatically load from: hdfs://namenode:8020/config/hudi/hudi-defaults.conf
DFSPropertiesConfiguration envConfig = new DFSPropertiesConfiguration();
// Working with TypedProperties for bulk operations
TypedProperties allProps = customConfig.getProps();
for (String key : allProps.stringPropertyNames()) {
String value = allProps.getString(key);
System.out.println(key + " = " + value);
}
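// Scoped vs. merged views of the same configuration: a sketch of the
// getProps(boolean includeGlobalProps) overload declared above.
// getProps() returns only the properties loaded by this instance, while
// getProps(true) also folds in the cached global properties.
TypedProperties localOnly = customConfig.getProps();
TypedProperties withGlobals = customConfig.getProps(true);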
// Combining with Hadoop configuration for unified setup
Configuration unifiedConf = new Configuration();
unifiedConf.addResource("core-site.xml");
unifiedConf.addResource("hdfs-site.xml");
DFSPropertiesConfiguration hudiConfig = new DFSPropertiesConfiguration(
unifiedConf,
new StoragePath("hdfs://namenode:8020/config/hudi-defaults.conf")
);
// Use in Hudi operations
String recordKeyField = hudiConfig.getString("hoodie.datasource.write.recordkey.field", "_row_key");
String partitionPathField = hudiConfig.getString("hoodie.datasource.write.partitionpath.field", "partition");
String precombineField = hudiConfig.getString("hoodie.datasource.write.precombine.field", "ts");

Install with Tessl CLI
npx tessl i tessl/maven-org-apache-hudi--hudi-hadoop-common