
# Sample Test Data

The test data package provides pre-built datasets and validation utilities for common Flink algorithms and testing scenarios. These datasets are commonly used in Flink examples and benchmarks.

## Algorithm Test Data

### PageRank Data

Test data for PageRank algorithm implementations.

```java { .api }
public class PageRankData {
    public static final int NUM_VERTICES = 5;
    public static final String VERTICES;
    public static final String EDGES;
    public static final String RANKS_AFTER_3_ITERATIONS;
    public static final String RANKS_AFTER_EPSILON_0_0001_CONVERGENCE;
}
```

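**Usage Example:**

```java
@Test
public void testPageRank() throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Use the provided test data (one record per line)
    DataSet<String> vertices = env.fromElements(PageRankData.VERTICES.split("\n"));
    DataSet<String> edges = env.fromElements(PageRankData.EDGES.split("\n"));

    // Run the PageRank implementation under test for 3 iterations;
    // runPageRank(...) is a hypothetical placeholder for that code
    DataSet<Tuple2<Long, Double>> ranks = runPageRank(vertices, edges, 3);

    // Write "vertexId rank" lines to resultPath (an output location prepared by the test)
    ranks.writeAsCsv(resultPath, "\n", " ");
    env.execute();

    // Ranks are doubles, so compare key/value pairs with a tolerance instead of exact lines
    TestBaseUtils.compareKeyValuePairsWithDelta(
        PageRankData.RANKS_AFTER_3_ITERATIONS, resultPath, " ", 0.01);
}
```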

**Data Format:**

- **Vertices**: `vertexId` (5 vertices total)
- **Edges**: `sourceVertexId targetVertexId`
- **Results**: `vertexId pageRankValue`
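
For cross-checking expected ranks outside Flink, the computation can be sketched in plain Java. This is a minimal power-iteration sketch under stated assumptions: directed `sourceVertexId targetVertexId` edges, uniform initial ranks of `1/N`, a caller-supplied damping factor (for example 0.85), rank mass of vertices without out-links spread uniformly, and every edge endpoint listed in the vertex data. Because these choices are assumptions, the output is not guaranteed to match `RANKS_AFTER_3_ITERATIONS` digit for digit. The class `PageRankSketch` is hypothetical.

```java
import java.util.*;

/** Hypothetical plain-Java PageRank sketch (not part of the test data package). */
public class PageRankSketch {

    /**
     * Runs a fixed number of power iterations.
     *
     * @param vertices newline-separated vertex IDs (VERTICES-style)
     * @param edges    newline-separated "source target" pairs (EDGES-style)
     */
    public static Map<Long, Double> pageRank(String vertices, String edges,
                                             int iterations, double damping) {
        List<Long> ids = new ArrayList<>();
        for (String line : vertices.split("\n")) {
            if (!line.trim().isEmpty()) ids.add(Long.parseLong(line.trim()));
        }
        int n = ids.size();

        // Out-neighbour list per vertex (assumes every endpoint appears in 'vertices')
        Map<Long, List<Long>> out = new HashMap<>();
        for (long id : ids) out.put(id, new ArrayList<>());
        for (String line : edges.split("\n")) {
            if (line.trim().isEmpty()) continue;
            String[] parts = line.trim().split("\\s+");
            out.get(Long.parseLong(parts[0])).add(Long.parseLong(parts[1]));
        }

        // Start from a uniform distribution
        Map<Long, Double> rank = new HashMap<>();
        for (long id : ids) rank.put(id, 1.0 / n);

        for (int i = 0; i < iterations; i++) {
            Map<Long, Double> next = new HashMap<>();
            for (long id : ids) next.put(id, 0.0);
            double dangling = 0.0;
            for (long id : ids) {
                List<Long> targets = out.get(id);
                if (targets.isEmpty()) {
                    dangling += rank.get(id);            // no out-links: collect its mass
                } else {
                    double share = rank.get(id) / targets.size();
                    for (long t : targets) next.merge(t, share, Double::sum);
                }
            }
            // Random-jump term plus uniformly redistributed dangling mass
            for (long id : ids) {
                next.put(id, (1 - damping) / n + damping * (next.get(id) + dangling / n));
            }
            rank = next;
        }
        return rank;
    }
}
```

With the dangling-mass redistribution, the ranks always sum to 1, which is a cheap sanity check before comparing individual values with a delta.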

### Word Count Data

Test data for WordCount implementations, based on German text from Goethe's tragedy *Faust*.

```java { .api }
public class WordCountData {
    public static final String TEXT;
    public static final String COUNTS;
    public static final String STREAMING_COUNTS_AS_TUPLES;
    public static final String COUNTS_AS_TUPLES;
}
```

**Usage Example:**

```java
@Test
public void testWordCount() throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Use the provided German text (one line per element)
    DataSet<String> text = env.fromElements(WordCountData.TEXT.split("\n"));

    // Tokenizer: a FlatMapFunction<String, Tuple2<String, Integer>> emitting (word, 1),
    // e.g. the Tokenizer from Flink's WordCount example
    DataSet<Tuple2<String, Integer>> wordCounts = text
        .flatMap(new Tokenizer())
        .groupBy(0)
        .sum(1);

    // collect() triggers execution and returns the result to the client
    List<Tuple2<String, Integer>> result = wordCounts.collect();

    // Compare with the expected counts
    TestBaseUtils.compareResultAsTuples(result, WordCountData.COUNTS_AS_TUPLES);
}
```

**Data Content:**

- **TEXT**: German text from Goethe's *Faust*
- **COUNTS**: Expected word counts, one `word count` pair per line
- **COUNTS_AS_TUPLES**: Expected results as `(word,count)` tuples
- **STREAMING_COUNTS_AS_TUPLES**: Expected results of a streaming WordCount, as tuples
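
To see how `(word,count)` lines arise, here is a plain-Java sketch of a word count that lower-cases each line and splits it on `\W+`, which is an assumption about the tokenizer in use. The class `WordCountSketch` is hypothetical and is not guaranteed to reproduce `WordCountData` byte for byte. One detail matters for German text: Java's `\W` is ASCII-only unless `Pattern.UNICODE_CHARACTER_CLASS` is set, so umlauts and the sharp s act as separators.

```java
import java.util.*;

/** Hypothetical plain-Java word count using a lower-case + split("\\W+") tokenizer. */
public class WordCountSketch {

    /** Counts words and returns "(word,count)" lines sorted by word. */
    public static List<String> countAsTuples(String text) {
        Map<String, Integer> counts = new TreeMap<>();   // TreeMap keeps words sorted
        for (String line : text.split("\n")) {
            // \W is ASCII-only by default, so non-ASCII letters split words apart
            for (String token : line.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {                  // leading separators yield ""
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            lines.add("(" + e.getKey() + "," + e.getValue() + ")");
        }
        return lines;
    }
}
```

For example, `countAsTuples("To be, or not\nto be!")` yields `(be,2)`, `(not,1)`, `(or,1)`, `(to,2)`.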

### K-Means Clustering Data

Test data for the K-Means clustering algorithm, with both 2D and 3D datasets.

```java { .api }
public class KMeansData {
    // 3D clustering data
    public static final String DATAPOINTS;
    public static final String INITIAL_CENTERS;
    public static final String CENTERS_AFTER_ONE_STEP;
    public static final String CENTERS_AFTER_ONE_STEP_SINGLE_DIGIT;
    public static final String CENTERS_AFTER_20_ITERATIONS_SINGLE_DIGIT;
    public static final String CENTERS_AFTER_20_ITERATIONS_DOUBLE_DIGIT;

    // 2D clustering data
    public static final String DATAPOINTS_2D;
    public static final String INITIAL_CENTERS_2D;
    public static final String CENTERS_2D_AFTER_SINGLE_ITERATION_DOUBLE_DIGIT;
    public static final String CENTERS_2D_AFTER_20_ITERATIONS_DOUBLE_DIGIT;

    // Validation utility
    public static void checkResultsWithDelta(String expectedResults, List<String> resultLines, double maxDelta);
}
```

**Usage Example:**

```java
@Test
public void testKMeans3D() throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Use 3D test data
    DataSet<String> points = env.fromElements(KMeansData.DATAPOINTS.split("\n"));
    DataSet<String> centers = env.fromElements(KMeansData.INITIAL_CENTERS.split("\n"));

    // Run K-Means for 20 iterations; runKMeans(...) is a hypothetical placeholder
    // for the implementation under test, emitting one center per line
    DataSet<String> resultCenters = runKMeans(points, centers, 20);
    List<String> finalCenters = resultCenters.collect();

    // Validate with a delta tolerance for floating-point comparison
    KMeansData.checkResultsWithDelta(
        KMeansData.CENTERS_AFTER_20_ITERATIONS_DOUBLE_DIGIT,
        finalCenters,
        0.01);
}

@Test
public void testKMeans2D() throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Use 2D test data
    DataSet<String> points = env.fromElements(KMeansData.DATAPOINTS_2D.split("\n"));
    DataSet<String> centers = env.fromElements(KMeansData.INITIAL_CENTERS_2D.split("\n"));

    // Same hypothetical runKMeans(...) placeholder, 20 iterations
    DataSet<String> resultCenters = runKMeans(points, centers, 20);
    List<String> result = resultCenters.collect();

    KMeansData.checkResultsWithDelta(
        KMeansData.CENTERS_2D_AFTER_20_ITERATIONS_DOUBLE_DIGIT,
        result,
        0.01);
}
```

**Data Format:**

- **3D Points**: `pointId x y z` (100 data points)
- **2D Points**: `pointId x y`
- **Centers**: `centerId x y z` (3D) or `centerId x y` (2D)
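
The `..._AFTER_ONE_STEP` and `..._AFTER_20_ITERATIONS` constants refer to K-Means iterations. As a plain-Java sketch of a single iteration (the classic Lloyd step: assign every point to its nearest center by Euclidean distance, then move each center to the mean of its assigned points), assuming the whitespace-separated `id x y [z]` layout listed above: the class `KMeansStepSketch` is hypothetical, and keeping an empty cluster's center in place is one common convention that may differ from the implementation that produced the constants.

```java
import java.util.*;

/** Hypothetical sketch of one K-Means (Lloyd) iteration over "id x y [z]" lines. */
public class KMeansStepSketch {

    /** Parses "id c1 c2 ..." lines into id -> coordinates, keeping input order. */
    public static Map<Integer, double[]> parse(String lines) {
        Map<Integer, double[]> result = new LinkedHashMap<>();
        for (String line : lines.split("\n")) {
            if (line.trim().isEmpty()) continue;
            String[] parts = line.trim().split("\\s+");
            double[] coords = new double[parts.length - 1];
            for (int i = 1; i < parts.length; i++) coords[i - 1] = Double.parseDouble(parts[i]);
            result.put(Integer.parseInt(parts[0]), coords);
        }
        return result;
    }

    /** One Lloyd step: nearest-center assignment followed by a mean update. */
    public static Map<Integer, double[]> step(Map<Integer, double[]> points,
                                              Map<Integer, double[]> centers) {
        Map<Integer, double[]> sums = new HashMap<>();
        Map<Integer, Integer> counts = new HashMap<>();
        for (double[] p : points.values()) {
            // Pick the center with the smallest squared Euclidean distance
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (Map.Entry<Integer, double[]> c : centers.entrySet()) {
                double d = 0;
                for (int i = 0; i < p.length; i++) {
                    double diff = p[i] - c.getValue()[i];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = c.getKey(); }
            }
            double[] sum = sums.computeIfAbsent(best, k -> new double[p.length]);
            for (int i = 0; i < p.length; i++) sum[i] += p[i];
            counts.merge(best, 1, Integer::sum);
        }
        // New center = mean of its points; a center with no points keeps its position
        Map<Integer, double[]> next = new LinkedHashMap<>();
        for (Map.Entry<Integer, double[]> c : centers.entrySet()) {
            double[] sum = sums.get(c.getKey());
            if (sum == null) { next.put(c.getKey(), c.getValue().clone()); continue; }
            int n = counts.get(c.getKey());
            double[] mean = new double[sum.length];
            for (int i = 0; i < sum.length; i++) mean[i] = sum[i] / n;
            next.put(c.getKey(), mean);
        }
        return next;
    }
}
```

Calling `step` 20 times in a loop gives the in-memory analogue of a 20-iteration run, whose output could then be compared with a delta rather than exactly.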

## Graph Algorithm Test Data

### Connected Components Data

Test data and validation utilities for the Connected Components algorithm.

```java { .api }
public class ConnectedComponentsData {
    // Generate test vertices
    public static String getEnumeratingVertices(int num);

    // Generate random edges with odd/even pattern
    public static String getRandomOddEvenEdges(int numEdges, int numVertices, long seed);

    // Validate connected components results
    public static void checkOddEvenResult(BufferedReader result) throws IOException;
    public static void checkOddEvenResult(List<Tuple2<Long, Long>> lines) throws IOException;
}
```

**Usage Example:**

```java
@Test
public void testConnectedComponents() throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Generate test data: 100 vertices, 150 random odd/even edges, fixed seed
    String vertices = ConnectedComponentsData.getEnumeratingVertices(100);
    String edges = ConnectedComponentsData.getRandomOddEvenEdges(150, 100, 12345L);

    DataSet<String> vertexData = env.fromElements(vertices.split("\n"));
    DataSet<String> edgeData = env.fromElements(edges.split("\n"));

    // Run the Connected Components implementation under test;
    // runConnectedComponents(...) is a hypothetical placeholder emitting (vertexId, componentId)
    DataSet<Tuple2<Long, Long>> result = runConnectedComponents(vertexData, edgeData);
    List<Tuple2<Long, Long>> components = result.collect();

    // Validate the odd/even component structure
    ConnectedComponentsData.checkOddEvenResult(components);
}
```
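
When an implementation fails `checkOddEvenResult`, computing reference components in memory can help locate the problem. The sketch below runs union-find over whitespace-separated `source target` edge lines and labels each component with its smallest vertex ID, a common convention for label-propagation implementations; whether it matches the labels your implementation emits is an assumption to verify. The class `UnionFindComponents` is hypothetical.

```java
import java.util.*;

/** Hypothetical in-memory reference for connected components (union-find). */
public class UnionFindComponents {

    private final Map<Long, Long> parent = new HashMap<>();

    private long find(long v) {
        parent.putIfAbsent(v, v);
        long root = v;
        while (parent.get(root) != root) root = parent.get(root);
        // Path compression: point every vertex on the path directly at the root
        while (parent.get(v) != root) {
            long next = parent.get(v);
            parent.put(v, root);
            v = next;
        }
        return root;
    }

    private void union(long a, long b) {
        long ra = find(a), rb = find(b);
        // The smaller root wins, so each root is the minimum ID of its component
        if (ra < rb) parent.put(rb, ra);
        else if (rb < ra) parent.put(ra, rb);
    }

    /** Returns vertexId -> componentId (smallest vertex ID in the component). */
    public static Map<Long, Long> components(String vertices, String edges) {
        UnionFindComponents uf = new UnionFindComponents();
        for (String line : vertices.split("\n")) {
            if (!line.trim().isEmpty()) uf.find(Long.parseLong(line.trim()));
        }
        for (String line : edges.split("\n")) {
            if (line.trim().isEmpty()) continue;
            String[] parts = line.trim().split("\\s+");
            uf.union(Long.parseLong(parts[0]), Long.parseLong(parts[1]));
        }
        Map<Long, Long> labels = new TreeMap<>();
        for (Long v : new ArrayList<>(uf.parent.keySet())) labels.put(v, uf.find(v));
        return labels;
    }
}
```

Because edges in the odd/even data only join vertices of equal parity, the odd vertices and the even vertices should never share a label in this reference output.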

### Transitive Closure Data

Test data for the Transitive Closure algorithm, with validation utilities.

```java { .api }
public class TransitiveClosureData {
    // Validate transitive closure results
    public static void checkOddEvenResult(BufferedReader result) throws IOException;
}
```

**Usage Example:**

```java
@Test
public void testTransitiveClosure() throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Create test edges with an odd/even pattern: edges only connect vertices of
    // equal parity, so the closure contains only odd-odd and even-even pairs
    DataSet<Tuple2<Long, Long>> edges = env.fromElements(
        new Tuple2<>(1L, 3L),
        new Tuple2<>(3L, 5L),
        new Tuple2<>(2L, 4L),
        new Tuple2<>(4L, 6L)
    );

    // Run the Transitive Closure implementation under test;
    // runTransitiveClosure(...) is a hypothetical placeholder
    DataSet<Tuple2<Long, Long>> result = runTransitiveClosure(edges);

    // Write space-separated "from to" lines to a single file at outputPath
    result.writeAsCsv(outputPath, "\n", " ").setParallelism(1);
    env.execute();

    // Validate closure properties; try-with-resources closes the reader
    try (BufferedReader reader = new BufferedReader(new FileReader(outputPath))) {
        TransitiveClosureData.checkOddEvenResult(reader);
    }
}
```
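
For small inputs, the expected closure can also be computed directly. This plain-Java sketch derives the transitive closure with a breadth-first search from every source vertex and checks the odd/even property the test data is built around (every reachable pair has equal parity). The class `ClosureSketch` and its methods are hypothetical; `TransitiveClosureData.checkOddEvenResult` may check further details.

```java
import java.util.*;

/** Hypothetical in-memory transitive closure (BFS from every vertex) plus an odd/even check. */
public class ClosureSketch {

    /** Returns every pair [from, to] such that "to" is reachable from "from" via one or more edges. */
    public static Set<List<Long>> closure(long[][] edges) {
        Map<Long, List<Long>> adj = new HashMap<>();
        for (long[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
        }
        Set<List<Long>> pairs = new HashSet<>();
        for (Long start : adj.keySet()) {
            Deque<Long> queue = new ArrayDeque<>(adj.get(start));
            Set<Long> seen = new HashSet<>();
            while (!queue.isEmpty()) {
                Long v = queue.poll();
                if (!seen.add(v)) continue;              // already visited from this start
                pairs.add(Arrays.asList(start, v));
                queue.addAll(adj.getOrDefault(v, Collections.emptyList()));
            }
        }
        return pairs;
    }

    /** True if every reachable pair connects two odd or two even vertices. */
    public static boolean sameParity(Set<List<Long>> pairs) {
        for (List<Long> p : pairs) {
            if (Math.floorMod(p.get(0), 2L) != Math.floorMod(p.get(1), 2L)) return false;
        }
        return true;
    }
}
```

For the four same-parity edges in the test above, the closure has six pairs, including the derived `(1, 5)` and `(2, 6)`.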

### Triangle Enumeration Data

Test data for triangle enumeration in graphs.

```java { .api }
public class EnumTriangleData {
    public static final String EDGES;
    public static final String TRIANGLES_BY_ID;
    public static final String TRIANGLES_BY_DEGREE;
}
```

**Usage Example:**

```java
@Test
public void testTriangleEnumeration() throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Use the provided edge data
    DataSet<String> edges = env.fromElements(EnumTriangleData.EDGES.split("\n"));

    // Run the triangle enumeration implementation under test;
    // runTriangleEnumeration(...) is a hypothetical placeholder emitting one triangle per line
    DataSet<String> result = runTriangleEnumeration(edges);
    List<String> triangles = result.collect();

    // Compare the collected lines with the expected triangles (ID variant);
    // compareResultAsText takes the actual list first and the expected string second
    TestBaseUtils.compareResultAsText(triangles, EnumTriangleData.TRIANGLES_BY_ID);
}
```

**Data Format:**

- **EDGES**: `vertexId1 vertexId2` representing undirected edges
- **TRIANGLES_BY_ID**: Expected triangle results sorted by vertex ID
- **TRIANGLES_BY_DEGREE**: Expected triangle results sorted by vertex degree
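
To work out expected triangles by hand for a small edge list, an ID-ordered enumeration can be sketched in memory: treat each edge as undirected, and for every vertex `u`, neighbour `v > u`, and neighbour `w > v` of `v`, report `(u, v, w)` when `w` is also adjacent to `u`. Ordering by ID ensures each triangle is found exactly once. The class `TriangleSketch` is hypothetical and does not reproduce the exact line format of `TRIANGLES_BY_ID`.

```java
import java.util.*;

/** Hypothetical in-memory triangle enumeration over "vertexId1 vertexId2" edge lines. */
public class TriangleSketch {

    /** Returns each triangle once, as [a, b, c] with a < b < c, in ascending order. */
    public static List<List<Integer>> triangles(String edges) {
        // Undirected adjacency sets; TreeSet keeps neighbours ordered by ID
        Map<Integer, TreeSet<Integer>> adj = new TreeMap<>();
        for (String line : edges.split("\n")) {
            if (line.trim().isEmpty()) continue;
            String[] parts = line.trim().split("\\s+");
            int a = Integer.parseInt(parts[0]);
            int b = Integer.parseInt(parts[1]);
            if (a == b) continue;                        // ignore self-loops
            adj.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
            adj.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (Map.Entry<Integer, TreeSet<Integer>> entry : adj.entrySet()) {
            int u = entry.getKey();
            // Only neighbours with a higher ID, so each triangle is reported once
            for (int v : entry.getValue().tailSet(u, false)) {
                for (int w : adj.get(v).tailSet(v, false)) {
                    if (entry.getValue().contains(w)) {
                        result.add(Arrays.asList(u, v, w));
                    }
                }
            }
        }
        return result;
    }
}
```

For the edges `1 2`, `2 3`, `3 1`, `3 4`, `4 1`, the sketch returns `[1, 2, 3]` and `[1, 3, 4]`.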

## Web Analytics Test Data

### Web Log Analysis Data

Test data for web log analysis and web graph algorithms.

```java { .api }
public class WebLogAnalysisData {
    public static final String DOCS;
    public static final String RANKS;
    public static final String VISITS;
    public static final String EXCEPTED_RESULT;
}
```

**Usage Example:**

```java
@Test
public void testWebLogAnalysis() throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Load the web data
    DataSet<String> documents = env.fromElements(WebLogAnalysisData.DOCS.split("\n"));
    DataSet<String> pageRanks = env.fromElements(WebLogAnalysisData.RANKS.split("\n"));
    DataSet<String> visits = env.fromElements(WebLogAnalysisData.VISITS.split("\n"));

    // Run the web log analysis under test, combining docs, ranks, and visits;
    // runWebLogAnalysis(...) is a hypothetical placeholder emitting one result per line
    DataSet<String> result = runWebLogAnalysis(documents, pageRanks, visits);
    List<String> analysis = result.collect();

    // Validate the collected analysis results against the expected lines
    TestBaseUtils.compareResultAsText(analysis, WebLogAnalysisData.EXCEPTED_RESULT);
}
```

**Data Format:**

- **DOCS**: `url|content` - Web documents with URL and content
- **RANKS**: `url rank` - Page rank values for URLs
- **VISITS**: `url visitCount` - Visit statistics for URLs
- **EXCEPTED_RESULT**: Expected combined analysis results

## Data Usage Patterns

### Loading Test Data

```java
// Split multi-line data into a DataSet with one element per line
// (TestData.SAMPLE_DATA is a placeholder for any of the constants above)
DataSet<String> dataSet = env.fromElements(TestData.SAMPLE_DATA.split("\n"));

// Parse structured "key,value" lines; returns(...) restores the tuple type
// that Java lambdas lose to type erasure
DataSet<Tuple2<String, Integer>> tuples = dataSet
    .map(line -> {
        String[] parts = line.split(",");
        return Tuple2.of(parts[0], Integer.parseInt(parts[1].trim()));
    })
    .returns(new TypeHint<Tuple2<String, Integer>>() {});
```

### Validation with Delta Tolerance

```java
// For floating-point comparisons of expected lines against collected result lines
KMeansData.checkResultsWithDelta(expectedResults, actualResults, 0.001);

// For delimiter-separated key/value pairs written to resultPath, with a tolerance
TestBaseUtils.compareKeyValuePairsWithDelta(expected, resultPath, ",", 0.01);
```
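
The idea behind delta-based validation can be sketched without Flink: parse both sides into `key -> value` maps, require identical key sets, and allow each value to differ by at most `maxDelta`. This hypothetical helper (`DeltaCompare`) is not the implementation of `compareKeyValuePairsWithDelta` or `checkResultsWithDelta`, which may differ in details such as line matching and error messages.

```java
import java.util.*;

/** Hypothetical sketch of key/value comparison with a numeric tolerance. */
public class DeltaCompare {

    /** Parses "key<delimiter>value" lines; the delimiter is matched literally. */
    static Map<String, Double> parse(String lines, String delimiter) {
        Map<String, Double> map = new HashMap<>();
        for (String line : lines.split("\n")) {
            if (line.trim().isEmpty()) continue;
            int idx = line.indexOf(delimiter);
            if (idx < 0) throw new IllegalArgumentException("Missing delimiter in: " + line);
            map.put(line.substring(0, idx).trim(),
                    Double.parseDouble(line.substring(idx + delimiter.length()).trim()));
        }
        return map;
    }

    /** Throws AssertionError unless keys match and every value is within maxDelta. */
    public static void compareWithDelta(String expected, String actual,
                                        String delimiter, double maxDelta) {
        Map<String, Double> exp = parse(expected, delimiter);
        Map<String, Double> act = parse(actual, delimiter);
        if (!exp.keySet().equals(act.keySet())) {
            throw new AssertionError("Keys differ: expected " + exp.keySet() + " but got " + act.keySet());
        }
        for (Map.Entry<String, Double> e : exp.entrySet()) {
            double diff = Math.abs(e.getValue() - act.get(e.getKey()));
            if (diff > maxDelta) {
                throw new AssertionError("Value for key " + e.getKey() + " differs by " + diff);
            }
        }
    }
}
```

Matching by key rather than by line position makes the comparison independent of output order, which matters when results come from parallel tasks.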

### Custom Validation Logic

```java
// Implement custom validation for specific algorithms
// (algorithm is a placeholder for the DataSet under test)
List<String> results = algorithm.collect();
for (String result : results) {
    // Each line must look like "<integer>,<decimal>", e.g. "1,0.237"
    assertTrue("Unexpected result format: " + result, result.matches("\\d+,\\d+\\.\\d+"));
}
```

### Generating Random Test Data

```java
// Use ConnectedComponentsData for reproducible random data (fixed seed)
String randomEdges = ConnectedComponentsData.getRandomOddEvenEdges(1000, 500, 42L);
DataSet<String> edges = env.fromElements(randomEdges.split("\n"));
```
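
If you need a generator with the same flavour for your own tests, the essential ingredients are a seeded `java.util.Random`, so runs are reproducible, and edges that only connect vertices of equal parity. The sketch below is a hypothetical stand-in (`OddEvenEdgeGenerator`), not the algorithm behind `getRandomOddEvenEdges`; in particular, it does not guarantee that all odd or all even vertices end up in one connected component.

```java
import java.util.*;

/** Hypothetical reproducible generator of same-parity edges over vertices 1..numVertices. */
public class OddEvenEdgeGenerator {

    public static String generate(int numEdges, int numVertices, long seed) {
        if (numVertices < 4) {
            // Fewer than 4 vertices leaves some vertex without a same-parity partner
            throw new IllegalArgumentException("need at least two odd and two even vertices");
        }
        Random random = new Random(seed);                // same seed -> same sequence
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numEdges; i++) {
            int source = 1 + random.nextInt(numVertices);
            int target;
            do {
                target = 1 + random.nextInt(numVertices);
            } while (target == source || (target % 2) != (source % 2));  // same parity, no self-loop
            sb.append(source).append(' ').append(target).append('\n');
        }
        return sb.toString();
    }
}
```

Rejection sampling keeps the code short; for very small vertex counts it simply retries more often.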

## Integration with Test Frameworks

The test data classes expose only constants and static helpers, so they can be used from any of the test base classes, for example `JavaProgramTestBase`:

```java
public class AlgorithmTest extends JavaProgramTestBase {
    @Override
    protected void testProgram() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Use any test data class, one element per line
        DataSet<String> input = env.fromElements(WordCountData.TEXT.split("\n"));

        // Run the algorithm and validate; processInput(...) is a hypothetical
        // placeholder for the word count under test
        List<Tuple2<String, Integer>> result = processInput(input).collect();
        TestBaseUtils.compareResultAsTuples(result, WordCountData.COUNTS_AS_TUPLES);
    }
}
```