or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

async.mdexternal-systems.mdindex.mditeration.mdjoins.mdmachine-learning.mdside-output.mdsocket.mdutilities.mdwindowing.mdwordcount.md

wordcount.mddocs/

0

# Word Count Examples

1

2

Basic streaming word count implementations demonstrating fundamental DataStream operations, tuple-based processing, and POJO-based alternatives.

3

4

## Capabilities

5

6

### WordCount

7

8

Classic streaming word count example processing text files or default sample data.

9

10

```java { .api }

11

/**

12

* Streaming word count program that computes word occurrence histogram

13

* @param args Command line arguments (--input path, --output path)

14

*/

15

public class WordCount {

16

public static void main(String[] args) throws Exception;

17

}

18

```

19

20

**Usage Example:**

21

22

```java

23

// Running with file input/output

24

java -cp flink-examples-streaming_2.10-1.3.3.jar \

25

org.apache.flink.streaming.examples.wordcount.WordCount \

26

--input /path/to/input.txt \

27

--output /path/to/output.txt

28

29

// Running with default sample data (prints to stdout)

30

java -cp flink-examples-streaming_2.10-1.3.3.jar \

31

org.apache.flink.streaming.examples.wordcount.WordCount

32

```

33

34

### Tokenizer Function

35

36

User-defined function that splits text into word-count pairs for stream processing.

37

38

```java { .api }

39

/**

40

* Splits sentences into words as a FlatMapFunction

41

* Takes a line (String) and splits it into multiple (word,1) pairs

42

*/

43

public static final class Tokenizer

44

implements FlatMapFunction<String, Tuple2<String, Integer>> {

45

46

/**

47

* Tokenizes input text and emits word-count pairs

48

* @param value Input text line

49

* @param out Collector for emitting Tuple2<word, count> pairs

50

*/

51

public void flatMap(String value, Collector<Tuple2<String, Integer>> out)

52

throws Exception;

53

}

54

```

55

56

The tokenizer:

57

- Normalizes text to lowercase

58

- Splits on non-word characters using regex `\\W+`

59

- Emits each word as `Tuple2<String, Integer>(word, 1)`

60

- Filters out empty tokens

61

62

### PojoExample

63

64

Alternative word count implementation using Plain Old Java Objects (POJOs) instead of tuples.

65

66

```java { .api }

67

/**

68

* Word count example demonstrating POJO usage instead of tuples

69

* @param args Command line arguments (--input path, --output path)

70

*/

71

public class PojoExample {

72

public static void main(String[] args) throws Exception;

73

}

74

```

75

76

**Usage Example:**

77

78

```java

79

// POJO-based word count

80

java -cp flink-examples-streaming_2.10-1.3.3.jar \

81

org.apache.flink.streaming.examples.wordcount.PojoExample \

82

--input /path/to/text.txt \

83

--output /path/to/results.txt

84

```

85

86

## Key Patterns Demonstrated

87

88

### Basic Stream Processing

89

- Creating execution environment with `StreamExecutionEnvironment.getExecutionEnvironment()`

90

- Reading text files using `env.readTextFile(path)`

91

- Creating streams from collections using `env.fromElements()`

92

- Applying transformations with `flatMap()`, `keyBy()`, and `sum()`

93

94

### Parameter Handling

95

```java

96

final ParameterTool params = ParameterTool.fromArgs(args);

97

env.getConfig().setGlobalJobParameters(params);

98

99

// Check for input parameter

100

if (params.has("input")) {

101

text = env.readTextFile(params.get("input"));

102

} else {

103

// Use default data

104

text = env.fromElements(WordCountData.WORDS);

105

}

106

```

107

108

### Output Configuration

109

```java

110

// File output

111

if (params.has("output")) {

112

counts.writeAsText(params.get("output"));

113

} else {

114

// Console output

115

counts.print();

116

}

117

118

// Execute the job

119

env.execute("Streaming WordCount");

120

```

121

122

### Data Transformation Pipeline

123

```java

124

DataStream<Tuple2<String, Integer>> counts = text

125

.flatMap(new Tokenizer()) // Split into words

126

.keyBy(0) // Group by word (field 0)

127

.sum(1); // Sum counts (field 1)

128

```

129

130

## Dependencies

131

132

```xml

133

<dependency>

134

<groupId>org.apache.flink</groupId>

135

<artifactId>flink-streaming-java_2.10</artifactId>

136

<version>1.3.3</version>

137

</dependency>

138

139

<!-- For WordCountData sample data -->

140

<dependency>

141

<groupId>org.apache.flink</groupId>

142

<artifactId>flink-examples-batch_2.10</artifactId>

143

<version>1.3.3</version>

144

</dependency>

145

```

146

147

## Required Imports

148

149

```java

150

import org.apache.flink.api.common.functions.FlatMapFunction;

151

import org.apache.flink.api.java.tuple.Tuple2;

152

import org.apache.flink.api.java.utils.ParameterTool;

153

import org.apache.flink.examples.java.wordcount.util.WordCountData;

154

import org.apache.flink.streaming.api.datastream.DataStream;

155

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

156

import org.apache.flink.util.Collector;

157

```