The Apache Flink SQL Parquet format package provides SQL client integration for reading and writing Parquet files in Flink applications.
The Parquet format factory provides seamless integration with Flink's table ecosystem, enabling automatic format detection and configuration for SQL table definitions.
The main factory class implements both the bulk reader and bulk writer format factories, providing complete integration with Flink's dynamic table system:

```java
/**
 * Parquet format factory for file system connector integration.
 * Implements both reading and writing capabilities for SQL tables.
 */
public class ParquetFileFormatFactory implements BulkReaderFormatFactory, BulkWriterFormatFactory {

    /**
     * Creates a bulk decoding format for reading Parquet files.
     *
     * @param context       dynamic table factory context with catalog information
     * @param formatOptions configuration options for the format
     * @return BulkDecodingFormat for RowData processing
     */
    public BulkDecodingFormat<RowData> createDecodingFormat(
            DynamicTableFactory.Context context, ReadableConfig formatOptions);

    /**
     * Creates an encoding format for writing Parquet files.
     *
     * @param context       dynamic table factory context with catalog information
     * @param formatOptions configuration options for the format
     * @return EncodingFormat for BulkWriter factory creation
     */
    public EncodingFormat<BulkWriter.Factory<RowData>> createEncodingFormat(
            DynamicTableFactory.Context context, ReadableConfig formatOptions);

    /**
     * Returns the unique identifier for this format factory.
     *
     * @return the "parquet" identifier string
     */
    public String factoryIdentifier();

    /**
     * Returns the set of required configuration options.
     *
     * @return an empty set (no required options)
     */
    public Set<ConfigOption<?>> requiredOptions();

    /**
     * Returns the set of optional configuration options.
     *
     * @return a set containing UTC_TIMEZONE and other optional configs
     */
    public Set<ConfigOption<?>> optionalOptions();
}
```

The `UTC_TIMEZONE` option controls timezone handling for timestamp conversion. It defaults to `false` (use the local timezone); when set to `true`, UTC is used for the conversion between epoch time and `LocalDateTime`:

```java
public static final ConfigOption<Boolean> UTC_TIMEZONE =
        key("utc-timezone")
                .booleanType()
                .defaultValue(false)
                .withDescription(
                        "Use UTC timezone or local timezone to the conversion between epoch "
                                + "time and LocalDateTime. Hive 0.x/1.x/2.x use local timezone. "
                                + "But Hive 3.x use UTC timezone");
```
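To illustrate what the flag changes, here is a minimal standalone sketch (plain `java.time`, not the Flink API) that converts the same epoch instant to `LocalDateTime` using the local zone versus UTC:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimezoneConversionDemo {
    public static void main(String[] args) {
        long epochMillis = 1_672_531_200_000L; // 2023-01-01T00:00:00Z

        // utc-timezone = false: interpret the epoch using the local timezone
        LocalDateTime local = LocalDateTime.ofInstant(
                Instant.ofEpochMilli(epochMillis), ZoneId.systemDefault());

        // utc-timezone = true: interpret the epoch using UTC
        LocalDateTime utc = LocalDateTime.ofInstant(
                Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);

        System.out.println("local: " + local); // depends on the system timezone
        System.out.println("utc:   " + utc);   // prints 2023-01-01T00:00
    }
}
```

A reader in a non-UTC session timezone therefore sees shifted timestamps unless the option matches the writer's convention (Hive 3.x wrote UTC; earlier Hive versions wrote local time).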
For example, create a Parquet table with UTC timezone handling:

```sql
CREATE TABLE orders (
    order_id BIGINT,
    customer_name STRING,
    order_date TIMESTAMP(3),
    amount DECIMAL(10, 2)
) WITH (
    'connector' = 'filesystem',
    'path' = '/data/orders',
    'format' = 'parquet',
    'parquet.utc-timezone' = 'true'
);
```
Querying the table exercises the read path; the format factory handles Parquet reading automatically:

```sql
SELECT customer_name, SUM(amount)
FROM orders
WHERE order_date >= TIMESTAMP '2023-01-01 00:00:00'
GROUP BY customer_name;
```

The same table definition works programmatically through the Table API:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tableEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

// The format factory is automatically discovered via service loader
tableEnv.executeSql("""
        CREATE TABLE parquet_table (
            id BIGINT,
            name STRING,
            created_at TIMESTAMP(3)
        ) WITH (
            'connector' = 'filesystem',
            'path' = '/path/to/data',
            'format' = 'parquet',
            'parquet.utc-timezone' = 'false'
        )
        """);
```

The format factory is automatically registered via Java's `ServiceLoader` mechanism:
```
# File: META-INF/services/org.apache.flink.table.factories.Factory
org.apache.flink.formats.parquet.ParquetFileFormatFactory
```

Based on the table context, the factory creates different readers and writers: `ParquetColumnarRowInputFormat` for vectorized reading, and `ParquetRowDataBuilder` for writing `RowData`.
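The discovery mechanism itself is plain `java.util.ServiceLoader` matching on the factory identifier. A toy, self-contained sketch of the pattern (the `Factory` interface and classes here are illustrative stand-ins, not Flink's):

```java
import java.util.List;

public class DiscoveryDemo {
    // Illustrative stand-in for org.apache.flink.table.factories.Factory
    public interface Factory {
        String factoryIdentifier();
    }

    public static class ParquetFactory implements Factory {
        @Override
        public String factoryIdentifier() {
            return "parquet";
        }
    }

    // Flink resolves 'format' = 'parquet' by scanning every Factory listed
    // in META-INF/services files (via ServiceLoader.load(Factory.class))
    // and matching on factoryIdentifier().
    public static Factory find(Iterable<Factory> candidates, String identifier) {
        for (Factory f : candidates) {
            if (identifier.equals(f.factoryIdentifier())) {
                return f;
            }
        }
        throw new IllegalArgumentException("No factory found for identifier: " + identifier);
    }

    public static void main(String[] args) {
        List<Factory> discovered = List.of(new ParquetFactory());
        System.out.println(find(discovered, "parquet").factoryIdentifier()); // prints "parquet"
    }
}
```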
Install with Tessl CLI:

```shell
npx tessl i tessl/maven-org-apache-flink--flink-sql-parquet-2-12
```