
tessl/maven-org-apache-flink--flink-examples-streaming-2-10

Apache Flink streaming examples demonstrating various stream processing patterns and use cases


docs/wordcount.md

Word Count Examples

Basic streaming word count implementations demonstrating fundamental DataStream operations, tuple-based processing, and POJO-based alternatives.

Capabilities

WordCount

Classic streaming word count example processing text files or default sample data.

/**
 * Streaming word count program that computes word occurrence histogram
 * @param args Command line arguments (--input path, --output path)
 */
public class WordCount {
    public static void main(String[] args) throws Exception;
}

Usage Example:

// Running with file input/output
java -cp flink-examples-streaming_2.10-1.3.3.jar \
  org.apache.flink.streaming.examples.wordcount.WordCount \
  --input /path/to/input.txt \
  --output /path/to/output.txt

// Running with default sample data (prints to stdout)
java -cp flink-examples-streaming_2.10-1.3.3.jar \
  org.apache.flink.streaming.examples.wordcount.WordCount

Tokenizer Function

User-defined function that splits text into word-count pairs for stream processing.

/**
 * Splits sentences into words as a FlatMapFunction
 * Takes a line (String) and splits it into multiple (word,1) pairs
 */
public static final class Tokenizer 
    implements FlatMapFunction<String, Tuple2<String, Integer>> {
    
    /**
     * Tokenizes input text and emits word-count pairs
     * @param value Input text line
     * @param out Collector for emitting Tuple2<word, count> pairs
     */
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) 
        throws Exception;
}

The tokenizer:

  • Normalizes text to lowercase
  • Splits on non-word characters using regex \\W+
  • Emits each word as Tuple2<String, Integer>(word, 1)
  • Filters out empty tokens
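
The tokenization logic itself is plain Java and can be sketched independently of Flink. This is an illustrative sketch of the steps above; `tokenize` is a hypothetical helper, not part of the Flink example, and in the real UDF each surviving token is emitted via the Collector rather than returned in a list:

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {
    // Mirrors the Tokenizer's flatMap body: lowercase the line,
    // split on non-word characters (\W+), and drop empty tokens.
    // In the Flink UDF, each token would be emitted as Tuple2<>(token, 1).
    static List<String> tokenize(String value) {
        List<String> out = new ArrayList<>();
        for (String token : value.toLowerCase().split("\\W+")) {
            if (token.length() > 0) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("To be, or not to be"));
        // → [to, be, or, not, to, be]
    }
}
```

Note that a line starting with punctuation yields a leading empty string from `split`, which is why the empty-token filter is needed.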

PojoExample

Alternative word count implementation using Plain Old Java Objects (POJOs) instead of tuples.

/**
 * Word count example demonstrating POJO usage instead of tuples
 * @param args Command line arguments (--input path, --output path)
 */
public class PojoExample {
    public static void main(String[] args) throws Exception;
}

Usage Example:

// POJO-based word count
java -cp flink-examples-streaming_2.10-1.3.3.jar \
  org.apache.flink.streaming.examples.wordcount.PojoExample \
  --input /path/to/text.txt \
  --output /path/to/results.txt

Key Patterns Demonstrated

Basic Stream Processing

  • Creating the execution environment with StreamExecutionEnvironment.getExecutionEnvironment()
  • Reading text files with env.readTextFile(path)
  • Creating streams from in-memory collections with env.fromElements()
  • Applying transformations with flatMap(), keyBy(), and sum()

Parameter Handling

final ParameterTool params = ParameterTool.fromArgs(args);
env.getConfig().setGlobalJobParameters(params);

DataStream<String> text;

// Read from --input if given, otherwise fall back to bundled sample data
if (params.has("input")) {
    text = env.readTextFile(params.get("input"));
} else {
    // Use default data
    text = env.fromElements(WordCountData.WORDS);
}

Output Configuration

// File output
if (params.has("output")) {
    counts.writeAsText(params.get("output"));
} else {
    // Console output
    counts.print();
}

// Execute the job
env.execute("Streaming WordCount");

Data Transformation Pipeline

DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new Tokenizer())           // Split into words
    .keyBy(0)                           // Group by word (field 0)
    .sum(1);                            // Sum counts (field 1)
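
For intuition, the end-to-end semantics of this pipeline (flatMap, then keyBy, then sum) can be sketched in plain Java with a map, ignoring Flink's streaming and distribution concerns. `wordCount` is a hypothetical illustration, not Flink API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {
    // Plain-Java equivalent of flatMap(new Tokenizer()).keyBy(0).sum(1):
    // tokenize each line, then keep a running count per word.
    static Map<String, Integer> wordCount(String[] lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\W+")) {
                if (token.length() > 0) {
                    counts.merge(token, 1, Integer::sum); // the "sum(1)" per key
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(new String[] {"to be", "or not to be"}));
        // → {to=2, be=2, or=1, not=1}
    }
}
```

The streaming version differs in one important way: `sum(1)` on an unbounded stream emits an updated running count for every incoming element per key, rather than a single final total.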

Dependencies

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.10</artifactId>
    <version>1.3.3</version>
</dependency>

<!-- For WordCountData sample data -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-examples-batch_2.10</artifactId>
    <version>1.3.3</version>
</dependency>

Required Imports

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.examples.java.wordcount.util.WordCountData;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

Install with Tessl CLI

npx tessl i tessl/maven-org-apache-flink--flink-examples-streaming-2-10
