0
# Word Count Examples
1
2
Basic streaming word count implementations demonstrating fundamental DataStream operations, tuple-based processing, and POJO-based alternatives.
3
4
## Capabilities
5
6
### WordCount
7
8
Classic streaming word count example processing text files or default sample data.
9
10
```java { .api }
11
/**
12
* Streaming word count program that computes word occurrence histogram
13
* @param args Command line arguments (--input path, --output path)
14
*/
15
public class WordCount {
16
public static void main(String[] args) throws Exception;
17
}
18
```
19
20
**Usage Example:**
21
22
```java
23
// Running with file input/output
24
java -cp flink-examples-streaming_2.10-1.3.3.jar \
25
org.apache.flink.streaming.examples.wordcount.WordCount \
26
--input /path/to/input.txt \
27
--output /path/to/output.txt
28
29
// Running with default sample data (prints to stdout)
30
java -cp flink-examples-streaming_2.10-1.3.3.jar \
31
org.apache.flink.streaming.examples.wordcount.WordCount
32
```
33
34
### Tokenizer Function
35
36
User-defined function that splits text into word-count pairs for stream processing.
37
38
```java { .api }
39
/**
40
* Splits sentences into words as a FlatMapFunction
41
* Takes a line (String) and splits it into multiple (word,1) pairs
42
*/
43
public static final class Tokenizer
44
implements FlatMapFunction<String, Tuple2<String, Integer>> {
45
46
/**
47
* Tokenizes input text and emits word-count pairs
48
* @param value Input text line
49
* @param out Collector for emitting Tuple2<word, count> pairs
50
*/
51
public void flatMap(String value, Collector<Tuple2<String, Integer>> out)
52
throws Exception;
53
}
54
```
55
56
The tokenizer:
57
- Normalizes text to lowercase
58
- Splits on non-word characters using regex `\\W+`
59
- Emits each word as `Tuple2<String, Integer>(word, 1)`
60
- Filters out empty tokens
61
62
### PojoExample
63
64
Alternative word count implementation using Plain Old Java Objects (POJOs) instead of tuples.
65
66
```java { .api }
67
/**
68
* Word count example demonstrating POJO usage instead of tuples
69
* @param args Command line arguments (--input path, --output path)
70
*/
71
public class PojoExample {
72
public static void main(String[] args) throws Exception;
73
}
74
```
75
76
**Usage Example:**
77
78
```java
79
// POJO-based word count
80
java -cp flink-examples-streaming_2.10-1.3.3.jar \
81
org.apache.flink.streaming.examples.wordcount.PojoExample \
82
--input /path/to/text.txt \
83
--output /path/to/results.txt
84
```
85
86
## Key Patterns Demonstrated
87
88
### Basic Stream Processing
89
- Creating execution environment with `StreamExecutionEnvironment.getExecutionEnvironment()`
90
- Reading text files using `env.readTextFile(path)`
91
- Creating streams from collections using `env.fromElements()`
92
- Applying transformations with `flatMap()`, `keyBy()`, and `sum()`
93
94
### Parameter Handling
95
```java
96
final ParameterTool params = ParameterTool.fromArgs(args);
97
env.getConfig().setGlobalJobParameters(params);
98
99
// Check for input parameter
100
if (params.has("input")) {
101
text = env.readTextFile(params.get("input"));
102
} else {
103
// Use default data
104
text = env.fromElements(WordCountData.WORDS);
105
}
106
```
107
108
### Output Configuration
109
```java
110
// File output
111
if (params.has("output")) {
112
counts.writeAsText(params.get("output"));
113
} else {
114
// Console output
115
counts.print();
116
}
117
118
// Execute the job
119
env.execute("Streaming WordCount");
120
```
121
122
### Data Transformation Pipeline
123
```java
124
DataStream<Tuple2<String, Integer>> counts = text
125
.flatMap(new Tokenizer()) // Split into words
126
.keyBy(0) // Group by word (field 0)
127
.sum(1); // Sum counts (field 1)
128
```
129
130
## Dependencies
131
132
```xml
133
<dependency>
134
<groupId>org.apache.flink</groupId>
135
<artifactId>flink-streaming-java_2.10</artifactId>
136
<version>1.3.3</version>
137
</dependency>
138
139
<!-- For WordCountData sample data -->
140
<dependency>
141
<groupId>org.apache.flink</groupId>
142
<artifactId>flink-examples-batch_2.10</artifactId>
143
<version>1.3.3</version>
144
</dependency>
145
```
146
147
## Required Imports
148
149
```java
150
import org.apache.flink.api.common.functions.FlatMapFunction;
151
import org.apache.flink.api.java.tuple.Tuple2;
152
import org.apache.flink.api.java.utils.ParameterTool;
153
import org.apache.flink.examples.java.wordcount.util.WordCountData;
154
import org.apache.flink.streaming.api.datastream.DataStream;
155
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
156
import org.apache.flink.util.Collector;
157
```