Manual programs for performance benchmarking, scalability testing, and resource usage validation. These standalone programs are designed for manual execution and analysis, providing insights into Flink application performance characteristics under various workloads and configurations.
Large-scale string sorting performance test for batch processing.
/**
* Large-scale string sorting test for batch processing performance
*/
public class MassiveStringSorting {
/**
* Main entry point for massive string sorting performance test
* Tests DataSet sorting performance with large string datasets
* @param args Command line arguments: [numElements] [outputPath]
* @throws Exception if sorting test fails
*/
public static void main(String[] args) throws Exception;
}Usage:
# Sort 10 million strings and output to file
java -cp flink-tests.jar org.apache.flink.test.manual.MassiveStringSorting 10000000 /tmp/sorted-output
# Monitor memory usage and execution time
java -Xmx4g -XX:+PrintGCDetails -cp flink-tests.jar \
org.apache.flink.test.manual.MassiveStringSorting 10000000 /tmp/outputLarge-scale StringValue sorting performance test optimized for Flink Value types.
/**
* Large-scale StringValue sorting test for optimized performance
*/
public class MassiveStringValueSorting {
/**
* Main entry point for massive StringValue sorting performance test
* Tests DataSet sorting performance with Flink StringValue types for reduced GC pressure
* @param args Command line arguments: [numElements] [outputPath]
* @throws Exception if sorting test fails
*/
public static void main(String[] args) throws Exception;
}Usage:
# Sort 10 million StringValues for comparison with regular strings
java -cp flink-tests.jar org.apache.flink.test.manual.MassiveStringValueSorting 10000000 /tmp/stringvalue-output
# Compare GC behavior with regular string sorting
java -Xmx4g -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
-cp flink-tests.jar org.apache.flink.test.manual.MassiveStringValueSorting 10000000 /tmp/outputComprehensive streaming performance test measuring throughput and latency characteristics.
/**
* Streaming performance test for scalability and latency measurement
*/
public class StreamingScalabilityAndLatency {
/**
* Main entry point for streaming scalability and latency test
* Tests streaming throughput, latency, and scalability under various loads
* @param args Command line arguments: [throughputTarget] [duration] [parallelism]
* @throws Exception if streaming test fails
*/
public static void main(String[] args) throws Exception;
}Usage:
# Test with 100k events/sec for 60 seconds with parallelism 8
java -cp flink-tests.jar org.apache.flink.test.manual.StreamingScalabilityAndLatency 100000 60 8
# High throughput test with monitoring
java -Xmx8g -XX:+UseG1GC -cp flink-tests.jar \
org.apache.flink.test.manual.StreamingScalabilityAndLatency 1000000 300 16Performance test for reduce operations measuring computation efficiency.
/**
* Performance test for reduce operations
*/
public class ReducePerformance {
/**
* Main entry point for reduce performance test
* Tests performance of various reduce operations and aggregations
* @param args Command line arguments: [numElements] [numKeys] [numReduceIterations]
* @throws Exception if reduce test fails
*/
public static void main(String[] args) throws Exception;
}Usage:
# Test reduce with 1M elements, 1000 keys, 10 iterations
java -cp flink-tests.jar org.apache.flink.test.manual.ReducePerformance 1000000 1000 10
# CPU-intensive reduce test
java -Xmx2g -XX:+UseParallelGC -cp flink-tests.jar \
org.apache.flink.test.manual.ReducePerformance 10000000 10000 5Test for object reuse and memory allocation patterns.
/**
* Test for object overwrite and reuse performance
*/
public class OverwriteObjects {
/**
* Main entry point for object overwrite performance test
* Tests impact of object reuse vs allocation on performance and GC
* @param args Command line arguments: [numIterations] [objectSize] [reuseObjects]
* @throws Exception if overwrite test fails
*/
public static void main(String[] args) throws Exception;
}Usage:
# Test with object reuse enabled
java -cp flink-tests.jar org.apache.flink.test.manual.OverwriteObjects 1000000 1024 true
# Test without object reuse for comparison
java -cp flink-tests.jar org.apache.flink.test.manual.OverwriteObjects 1000000 1024 falsePerformance test for hash table operations with various record widths.
/**
* Performance test for hash table operations with different record widths
*/
public class HashTableRecordWidthCombinations {
/**
* Main entry point for hash table record width performance test
* Tests hash table performance with various record sizes and configurations
* @param args Command line arguments: [numRecords] [recordWidth] [numBuckets]
* @throws Exception if hash table test fails
*/
public static void main(String[] args) throws Exception;
}Usage:
# Test hash table with 1M records, width 128 bytes, 1024 buckets
java -cp flink-tests.jar org.apache.flink.test.manual.HashTableRecordWidthCombinations 1000000 128 1024
# Memory-intensive hash table test
java -Xmx4g -XX:+PrintGCDetails -cp flink-tests.jar \
org.apache.flink.test.manual.HashTableRecordWidthCombinations 5000000 512 4096Common patterns for running and analyzing performance tests:
Batch Performance Testing:
#!/bin/bash
# Batch performance test script
# Test different data sizes
for size in 1000000 5000000 10000000 50000000; do
echo "Testing with $size elements..."
# Run string sorting test
start_time=$(date +%s)
java -Xmx4g -XX:+PrintGCDetails \
-cp flink-tests.jar org.apache.flink.test.manual.MassiveStringSorting \
$size /tmp/output-$size.txt 2> gc-$size.log
end_time=$(date +%s)
duration=$((end_time - start_time))
echo "Size: $size, Duration: ${duration}s"
# Extract GC statistics
grep "GC" gc-$size.log | tail -5
doneStreaming Performance Testing:
#!/bin/bash
# Streaming performance test script
# Test different throughput targets
for throughput in 10000 50000 100000 500000 1000000; do
echo "Testing throughput: $throughput events/sec"
# Run streaming test for 60 seconds
java -Xmx8g -XX:+UseG1GC -XX:+PrintGCDetails \
-cp flink-tests.jar org.apache.flink.test.manual.StreamingScalabilityAndLatency \
$throughput 60 8 2> streaming-gc-$throughput.log &
# Monitor system resources
pid=$!
top -p $pid -b -n 60 -d 1 > cpu-$throughput.log &
wait $pid
echo "Throughput $throughput completed"
doneMemory Usage Analysis:
#!/bin/bash
# Memory usage analysis script
# Test object reuse impact
echo "Testing object reuse vs allocation..."
# With object reuse
java -Xmx2g -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
-cp flink-tests.jar org.apache.flink.test.manual.OverwriteObjects \
1000000 1024 true 2> gc-reuse.log
# Without object reuse
java -Xmx2g -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
-cp flink-tests.jar org.apache.flink.test.manual.OverwriteObjects \
1000000 1024 false 2> gc-no-reuse.log
# Compare GC frequency and duration
echo "With object reuse:"
grep "Full GC" gc-reuse.log | wc -l
echo "Without object reuse:"
grep "Full GC" gc-no-reuse.log | wc -lPerformance Comparison Script:
#!/bin/bash
# Performance comparison between String and StringValue
sizes=(1000000 5000000 10000000)
for size in "${sizes[@]}"; do
echo "Comparing String vs StringValue for size: $size"
# Test regular strings
echo "Testing String sorting..."
time java -Xmx4g -cp flink-tests.jar \
org.apache.flink.test.manual.MassiveStringSorting \
$size /tmp/string-$size.txt
# Test StringValue
echo "Testing StringValue sorting..."
time java -Xmx4g -cp flink-tests.jar \
org.apache.flink.test.manual.MassiveStringValueSorting \
$size /tmp/stringvalue-$size.txt
echo "---"
doneResource Monitoring Script:
#!/bin/bash
# Comprehensive resource monitoring during performance tests
test_name=$1
test_command=$2
echo "Starting performance test: $test_name"
# Start resource monitoring
vmstat 1 > vmstat-$test_name.log &
vmstat_pid=$!
iostat 1 > iostat-$test_name.log &
iostat_pid=$!
# Run the test
start_time=$(date +%s)
eval $test_command
end_time=$(date +%s)
# Stop monitoring
kill $vmstat_pid $iostat_pid
duration=$((end_time - start_time))
echo "Test completed in ${duration} seconds"
# Generate summary report
echo "Performance Test Report: $test_name" > report-$test_name.txt
echo "Duration: ${duration}s" >> report-$test_name.txt
echo "Average CPU:" >> report-$test_name.txt
awk 'NR>3 {sum+=$15; count++} END {print 100-(sum/count)"%"}' vmstat-$test_name.log >> report-$test_name.txt
echo "Peak Memory Usage:" >> report-$test_name.txt
grep -o "used [0-9]*" vmstat-$test_name.log | cut -d' ' -f2 | sort -n | tail -1 >> report-$test_name.txtThese performance testing programs provide comprehensive benchmarking capabilities for evaluating Flink application performance across different dimensions including throughput, latency, memory usage, CPU utilization, and scalability characteristics.