
# Record Readers

Record readers provide the core functionality for reading structured data from various sources in DataVec. They implement a consistent iterator-based pattern and support metadata tracking for data lineage and debugging.

## Capabilities

### Core RecordReader Interface

The base interface that all record readers implement. Provides standard iteration patterns, initialization, and optional batch reading capabilities.

```java { .api }
public interface RecordReader {
    void initialize(InputSplit split) throws IOException;
    List<Writable> next();
    boolean hasNext();
    void reset();
    List<String> getLabels();
    Record nextRecord();
    boolean batchesSupported();
    List<List<Writable>> next(int numRecords);
}
```

**Usage Example:**

```java
RecordReader reader = new CSVRecordReader();
reader.initialize(new FileSplit(new File("data.csv")));

while (reader.hasNext()) {
    List<Writable> record = reader.next();
    // Process record
}
reader.reset(); // Reset for reuse
```
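
The iteration contract can be illustrated without DataVec itself. The sketch below is hypothetical: `MiniRecordReader` and `ListBackedReader` are stand-ins invented for illustration, and values are plain `String`s rather than `Writable` objects. It shows the `hasNext()`/`next()` loop and how `reset()` rewinds a reader so it can be reused.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

final class RecordReaderContractSketch {
    // Hypothetical, simplified stand-in for the RecordReader iteration contract.
    interface MiniRecordReader {
        boolean hasNext();
        List<String> next();
        void reset();
    }

    // Serves records from an in-memory list; reset() rewinds to the first record.
    static final class ListBackedReader implements MiniRecordReader {
        private final List<List<String>> records;
        private int position = 0;

        ListBackedReader(List<List<String>> records) {
            this.records = records;
        }

        @Override
        public boolean hasNext() {
            return position < records.size();
        }

        @Override
        public List<String> next() {
            if (!hasNext()) {
                throw new NoSuchElementException("No more records");
            }
            // Return a copy so callers cannot modify the backing data.
            return new ArrayList<>(records.get(position++));
        }

        @Override
        public void reset() {
            position = 0;
        }
    }

    // Drains the reader once, then resets it so it can be reused.
    static int countAndReset(MiniRecordReader reader) {
        int count = 0;
        while (reader.hasNext()) {
            reader.next();
            count++;
        }
        reader.reset();
        return count;
    }

    public static void main(String[] args) {
        MiniRecordReader reader = new ListBackedReader(Arrays.asList(
                Arrays.asList("1", "2.5"),
                Arrays.asList("2", "3.7")));
        System.out.println(countAndReset(reader)); // 2
        System.out.println(countAndReset(reader)); // 2 again, because reset() rewound the reader
    }
}
```

Calling `next()` after the data is exhausted throws `NoSuchElementException` here, following the `java.util.Iterator` convention; a specific DataVec reader may report this condition differently.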

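### CSV Record Reading

Reads comma-separated values files with a configurable delimiter and a configurable number of header lines to skip. Each field is returned as a `Text` writable, so numeric columns are converted on demand with methods such as `toInt()` and `toDouble()`. For classification, the label column is chosen downstream, for example with the `labelIndex` argument of `RecordReaderDataSetIterator`.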

```java { .api }
public class CSVRecordReader implements RecordReader {
    public CSVRecordReader();
    public CSVRecordReader(int skipLines, String delimiter);
    public CSVRecordReader(int skipLines, String delimiter, String quote);
}
```

**Constructor Parameters:**

- `skipLines` - Number of header lines to skip (default: 0)
- `delimiter` - Field separator character (default: ",")
- `quote` - Quote character for escaped fields (default: "\"")

**Usage Example:**

```java
// Read a CSV file that has one header line
RecordReader csvReader = new CSVRecordReader(1, ",");
csvReader.initialize(new FileSplit(new File("data.csv")));

while (csvReader.hasNext()) {
    List<Writable> record = csvReader.next();
    // Fields are read as Text; convert each column as needed
    int id = record.get(0).toInt();          // first column as an integer
    String name = record.get(1).toString();  // second column as a string
    double value = record.get(2).toDouble(); // third column as a double
}
```
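
A minimal sketch of what `skipLines` and `delimiter` control, written in plain Java rather than against DataVec. `CsvSkipLinesSketch.parse` is a hypothetical helper: it drops the first `skipLines` lines, splits the rest on the literal delimiter, and leaves every field as text until the caller parses it, mirroring how `Text` values are converted with `toInt()`/`toDouble()`. Unlike a real CSV reader, it does not handle quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class CsvSkipLinesSketch {
    // Skips the first `skipLines` lines, then splits each remaining line on a literal delimiter.
    // Simplification: quoted fields that contain the delimiter are NOT handled.
    static List<List<String>> parse(List<String> lines, int skipLines, String delimiter) {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        List<List<String>> records = new ArrayList<>();
        for (int i = skipLines; i < lines.size(); i++) {
            // A limit of -1 keeps trailing empty fields, e.g. "a,b," -> [a, b, ""]
            records.add(Arrays.asList(splitter.split(lines.get(i), -1)));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("id,name,value", "1,alice,0.5", "2,bob,1.25");
        for (List<String> record : parse(file, 1, ",")) {
            // Fields arrive as text; numeric columns are parsed on demand.
            int id = Integer.parseInt(record.get(0));
            String name = record.get(1);
            double value = Double.parseDouble(record.get(2));
            System.out.println(id + " " + name + " " + value);
        }
    }
}
```

Because parsing happens on demand, a non-numeric field only fails when it is converted; `Integer.parseInt` then throws `NumberFormatException`.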

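### Sequence Record Reading

Handles sequential or time-series data, where each record is a sequence of time steps. `SequenceRecordReader` extends `RecordReader` with sequence-specific methods. `CSVSequenceRecordReader` reads one sequence per file, treating each line of the file as one time step.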

```java { .api }
public interface SequenceRecordReader extends RecordReader {
    List<List<Writable>> sequenceRecord();
    List<List<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream) throws IOException;
    SequenceRecord nextSequence();
}

public class CSVSequenceRecordReader implements SequenceRecordReader {
    public CSVSequenceRecordReader();
    public CSVSequenceRecordReader(int skipLines, String delimiter);
}
```

**Usage Example:**

```java
SequenceRecordReader seqReader = new CSVSequenceRecordReader();
seqReader.initialize(new FileSplit(new File("sequence_data.csv")));

while (seqReader.hasNext()) {
    List<List<Writable>> sequence = seqReader.sequenceRecord();
    // Process sequence - each inner list is a time step
    for (List<Writable> timeStep : sequence) {
        // Process individual time step
    }
}
```
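
The sketch below illustrates a one-sequence-per-file layout in plain Java. `SequenceLayoutSketch.toSequence` is a hypothetical helper, not part of DataVec: it turns one file's content into a list of time steps, one per line, so sequences from different files can have different lengths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class SequenceLayoutSketch {
    // One "file" becomes one sequence; each line after the skipped ones is one time step.
    static List<List<String>> toSequence(String fileContent, int skipLines, String delimiter) {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        // "\\R" matches any line break; trailing empty lines are dropped by split().
        String[] lines = fileContent.split("\\R");
        List<List<String>> sequence = new ArrayList<>();
        for (int i = skipLines; i < lines.length; i++) {
            sequence.add(Arrays.asList(splitter.split(lines[i], -1)));
        }
        return sequence;
    }

    public static void main(String[] args) {
        // Two hypothetical files, each holding one sequence of a different length.
        List<String> files = Arrays.asList("0.1,0.2\n0.3,0.4\n0.5,0.6", "1.0,2.0\n3.0,4.0");
        for (String file : files) {
            List<List<String>> sequence = toSequence(file, 0, ",");
            System.out.println("time steps: " + sequence.size());
            for (List<String> timeStep : sequence) {
                System.out.println("  " + timeStep);
            }
        }
    }
}
```

Variable sequence lengths are the main reason sequences are returned as `List<List<...>>` rather than as fixed-size arrays.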

### Collection Record Reading

Reads data from in-memory Java collections, useful for testing and when data is already loaded in memory.

```java { .api }
public class CollectionRecordReader implements RecordReader {
    public CollectionRecordReader(Collection<? extends Collection<Writable>> records);
    public CollectionRecordReader(RecordReader recordReader);
}

public class CollectionSequenceRecordReader implements SequenceRecordReader {
    public CollectionSequenceRecordReader(Collection<? extends Collection<? extends Collection<Writable>>> sequences);
}
```

**Usage Example:**

```java
// Create data in memory
List<List<Writable>> data = Arrays.asList(
    Arrays.asList(new IntWritable(1), new DoubleWritable(2.5)),
    Arrays.asList(new IntWritable(2), new DoubleWritable(3.7))
);

// Records come from the collection passed to the constructor, so no InputSplit is needed
RecordReader collectionReader = new CollectionRecordReader(data);

while (collectionReader.hasNext()) {
    List<Writable> record = collectionReader.next();
    // Process in-memory record
}
```
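
A hypothetical sketch of a collection-backed reader, independent of DataVec. `InMemoryReader` accepts `Collection<? extends Collection<String>>`, so a `List<List<String>>` can be passed directly (Java generics are invariant, so a `List<List<String>>` is not a `Collection<Collection<String>>`). Resetting simply asks the backing collection for a fresh iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class CollectionReaderSketch {
    // Hypothetical reader over an in-memory collection; no InputSplit is involved.
    static final class InMemoryReader {
        private final Collection<? extends Collection<String>> source;
        private Iterator<? extends Collection<String>> iterator;

        // The wildcard lets callers pass List<List<String>>, Set<List<String>>, and so on.
        InMemoryReader(Collection<? extends Collection<String>> source) {
            this.source = source;
            this.iterator = source.iterator();
        }

        boolean hasNext() {
            return iterator.hasNext();
        }

        List<String> next() {
            if (!iterator.hasNext()) {
                throw new NoSuchElementException();
            }
            // Copy into a List so positional access works for any inner collection type.
            return new ArrayList<>(iterator.next());
        }

        // Reset asks the backing collection for a fresh iterator.
        void reset() {
            iterator = source.iterator();
        }
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(
                Arrays.asList("1", "2.5"),
                Arrays.asList("2", "3.7"));
        InMemoryReader reader = new InMemoryReader(data);
        while (reader.hasNext()) {
            System.out.println(reader.next());
        }
    }
}
```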

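### Record Metadata Support

Record readers can attach metadata to each record for data provenance and debugging. Calling `nextRecord()` instead of `next()` returns the values together with a `RecordMetaData` object that identifies the source (location and URI) and the reader class that produced the record.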

```java { .api }
public interface Record {
    List<Writable> getRecord();
    RecordMetaData getMetaData();
}

public interface RecordMetaData {
    String getLocation();
    URI getURI();
    Class<?> getReaderClass();
}
```

**Usage Example:**

```java
RecordReader reader = new CSVRecordReader();
reader.initialize(new FileSplit(new File("data.csv")));

while (reader.hasNext()) {
    Record recordWithMeta = reader.nextRecord();
    List<Writable> data = recordWithMeta.getRecord();
    RecordMetaData meta = recordWithMeta.getMetaData();

    System.out.println("Data from: " + meta.getLocation());
    // Process data with metadata context
}
```
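
To show what per-record metadata buys you, here is a plain-Java sketch that pairs each parsed line with its origin. `LineMeta` and `TracedRecord` are hypothetical classes loosely modeled on the idea of line-level metadata (source URI, line number, reader name); they are not DataVec types.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecordMetadataSketch {
    // Hypothetical metadata: where a record came from (URI + 0-based line) and who read it.
    static final class LineMeta {
        final URI uri;
        final int lineNumber;
        final String readerName;

        LineMeta(URI uri, int lineNumber, String readerName) {
            this.uri = uri;
            this.lineNumber = lineNumber;
            this.readerName = readerName;
        }

        String location() {
            return uri + ":" + lineNumber;
        }
    }

    // Hypothetical record: the values plus the metadata describing their origin.
    static final class TracedRecord {
        final List<String> values;
        final LineMeta meta;

        TracedRecord(List<String> values, LineMeta meta) {
            this.values = values;
            this.meta = meta;
        }
    }

    // Splits each line into fields and attaches the line's origin to the result.
    static List<TracedRecord> read(URI source, List<String> lines) {
        List<TracedRecord> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            List<String> values = Arrays.asList(lines.get(i).split(",", -1));
            out.add(new TracedRecord(values, new LineMeta(source, i, "RecordMetadataSketch")));
        }
        return out;
    }

    public static void main(String[] args) {
        URI source = URI.create("file:///data/data.csv");
        for (TracedRecord r : read(source, Arrays.asList("1,a", "2,b"))) {
            System.out.println(r.values + " from " + r.meta.location());
        }
    }
}
```

When a downstream step rejects a record, the attached location points straight back to the offending input line.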

### Batch Reading Support

Some record readers can return several records per call, which can improve throughput on large datasets. Check `batchesSupported()` before calling `next(int)`; the result holds one `List<Writable>` per record.

```java { .api }
// Check if batch reading is supported
if (reader.batchesSupported()) {
    List<List<Writable>> batch = reader.next(batchSize);
    // Process batch: one List<Writable> per record
}
```
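
A sketch of batch semantics in plain Java: `BatchReadSketch.nextBatch` (hypothetical) pulls up to `batchSize` records from an iterator, so the final batch can be smaller than requested. This assumes batching behaves like repeated single-record reads, which may differ from how a specific reader implements `next(int)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class BatchReadSketch {
    // Pulls up to `batchSize` records from the iterator; the last batch may be smaller.
    static List<List<String>> nextBatch(Iterator<List<String>> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && records.hasNext()) {
            batch.add(records.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        List<List<String>> all = Arrays.asList(
                Arrays.asList("1"), Arrays.asList("2"), Arrays.asList("3"),
                Arrays.asList("4"), Arrays.asList("5"));
        Iterator<List<String>> it = all.iterator();
        while (it.hasNext()) {
            System.out.println(nextBatch(it, 2).size()); // prints 2, 2, 1
        }
    }
}
```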

### Advanced Metadata Support

When records are assembled from several sources, metadata can be composed as well: `RecordMetaDataComposable` groups the metadata of each underlying source, and `RecordMetaDataComposableMap` keys it by name, so lineage is preserved for every input.

```java { .api }
public interface RecordMetaData {
    String getLocation();
    URI getURI();
    Class<?> getReaderClass();
}

public interface RecordMetaDataComposable extends RecordMetaData {
    List<RecordMetaData> getMeta();
}

public class RecordMetaDataComposableMap implements RecordMetaDataComposable {
    public RecordMetaDataComposableMap(Map<String, RecordMetaData> meta);
    public RecordMetaData getMeta(String key);
    public Set<String> getMetaKeys();
}
```

**Usage Example:**

```java
RecordReader reader = new CSVRecordReader();
reader.initialize(new FileSplit(new File("data.csv")));

while (reader.hasNext()) {
    Record recordWithMeta = reader.nextRecord();
    List<Writable> data = recordWithMeta.getRecord();
    RecordMetaData meta = recordWithMeta.getMetaData();

    // Access metadata information
    String sourceLocation = meta.getLocation();
    URI sourceURI = meta.getURI();
    Class<?> readerClass = meta.getReaderClass();

    System.out.println("Processing record from: " + sourceLocation);
    System.out.println("Read by: " + readerClass.getSimpleName());

    // For composite metadata
    if (meta instanceof RecordMetaDataComposable) {
        RecordMetaDataComposable composite = (RecordMetaDataComposable) meta;
        List<RecordMetaData> allMeta = composite.getMeta();
        System.out.println("Composite metadata contains " + allMeta.size() + " entries");
    }
}
```
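
The composite idea can be sketched without DataVec. `Meta`, `LeafMeta`, and `KeyedCompositeMeta` are hypothetical types: the composite keeps one child entry per source (for example a CSV row and an image file) and can report all locations at once. Insertion order is preserved so the combined location string is deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class CompositeMetadataSketch {
    // Hypothetical metadata contract: just a location string.
    interface Meta {
        String location();
    }

    // Metadata for a single source.
    static final class LeafMeta implements Meta {
        private final String location;

        LeafMeta(String location) {
            this.location = location;
        }

        @Override
        public String location() {
            return location;
        }
    }

    // Hypothetical composite keyed by source name: one child entry per source reader.
    static final class KeyedCompositeMeta implements Meta {
        private final Map<String, Meta> children;

        KeyedCompositeMeta(Map<String, Meta> children) {
            // LinkedHashMap keeps insertion order so location() is deterministic.
            this.children = new LinkedHashMap<>(children);
        }

        Meta get(String key) {
            return children.get(key);
        }

        Set<String> keys() {
            return children.keySet();
        }

        // Joins every child location as "key=location", separated by "; ".
        @Override
        public String location() {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, Meta> e : children.entrySet()) {
                if (sb.length() > 0) {
                    sb.append("; ");
                }
                sb.append(e.getKey()).append('=').append(e.getValue().location());
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Map<String, Meta> parts = new LinkedHashMap<>();
        parts.put("csv", new LeafMeta("labels.csv:12"));
        parts.put("image", new LeafMeta("images/cat_012.png"));
        KeyedCompositeMeta meta = new KeyedCompositeMeta(parts);
        System.out.println(meta.location()); // csv=labels.csv:12; image=images/cat_012.png
    }
}
```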

### Exception Handling in DataVec

DataVec defines specific exceptions for different error conditions during data processing:

```java { .api }
public class WritableConverterException extends Exception {
    public WritableConverterException(String message);
    public WritableConverterException(String message, Throwable cause);
}

public class ZeroLengthSequenceException extends RuntimeException {
    public ZeroLengthSequenceException(String message);
}
```

**Common Exception Scenarios:**

```java
try {
    RecordReader reader = new CSVRecordReader();
    reader.initialize(new FileSplit(new File("data.csv")));

    // CustomConverter is a placeholder for your own WritableConverter implementation
    WritableConverter converter = new CustomConverter();

    while (reader.hasNext()) {
        List<Writable> record = reader.next();

        // convert() may throw WritableConverterException
        for (int i = 0; i < record.size(); i++) {
            Writable converted = converter.convert(record.get(i));
            record.set(i, converted);
        }
    }
} catch (IOException e) {
    // Handle file I/O errors
    System.err.println("Error reading file: " + e.getMessage());
} catch (WritableConverterException e) {
    // Handle data conversion errors
    System.err.println("Error converting data: " + e.getMessage());
} catch (ZeroLengthSequenceException e) {
    // Handle empty sequence errors (relevant when working with sequence data)
    System.err.println("Empty sequence encountered: " + e.getMessage());
}
```
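
A plain-Java sketch of one way to handle conversion failures: a hypothetical checked `ConversionException` (playing the role of `WritableConverterException`) and a converter that maps `yes`/`no` to `1`/`0`. Unlike the example above, which stops at the first bad value, this version records the failing record and keeps going; whether to skip or abort is a design choice for your pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class ConverterErrorSketch {
    // Hypothetical checked exception, analogous in role to WritableConverterException.
    static final class ConversionException extends Exception {
        ConversionException(String message) {
            super(message);
        }
    }

    // Hypothetical converter: maps "yes"/"no" (any case) to "1"/"0", rejects anything else.
    static String convertFlag(String raw) throws ConversionException {
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "yes": return "1";
            case "no": return "0";
            default: throw new ConversionException("Unrecognized flag: '" + raw + "'");
        }
    }

    // Converts one column of every record; failing records are reported and skipped.
    static List<List<String>> convertColumn(List<List<String>> records, int column, List<String> errors) {
        List<List<String>> converted = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            List<String> copy = new ArrayList<>(records.get(i));
            try {
                copy.set(column, convertFlag(copy.get(column)));
                converted.add(copy);
            } catch (ConversionException e) {
                errors.add("record " + i + ": " + e.getMessage());
            }
        }
        return converted;
    }

    public static void main(String[] args) {
        List<List<String>> records = Arrays.asList(
                Arrays.asList("1", "yes"),
                Arrays.asList("2", "maybe"),
                Arrays.asList("3", "NO"));
        List<String> errors = new ArrayList<>();
        System.out.println(convertColumn(records, 1, errors)); // [[1, 1], [3, 0]]
        System.out.println(errors); // [record 1: Unrecognized flag: 'maybe']
    }
}
```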

### File-Based Record Reading

Reads data from general file inputs, treating each file in the input split as one record.

```java { .api }
public class FileRecordReader implements RecordReader {
    public FileRecordReader();
    public FileRecordReader(RecordReader wrappedReader);
}
```

**Usage Example:**

```java
FileRecordReader fileReader = new FileRecordReader();
fileReader.initialize(new FileSplit(new File("data.txt")));

while (fileReader.hasNext()) {
    List<Writable> record = fileReader.next();
    // Process file-based record
}
```
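
One common file-reading strategy treats each file as a single record. The sketch below implements that idea with `java.nio.file` only; `WholeFileRecordSketch.readAll` is hypothetical and not DataVec code. Each regular file under a directory becomes one record holding the file's full content as a single string, with files sorted by path for a stable order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WholeFileRecordSketch {
    // Every regular file under `dir` becomes one record whose single value is the file content.
    static List<List<String>> readAll(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> stream = Files.walk(dir)) {
            // Sort by path so the record order does not depend on the file system.
            files = stream.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<List<String>> records = new ArrayList<>();
        for (Path file : files) {
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            records.add(Collections.singletonList(content));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Demo data in a temporary directory (not cleaned up afterwards).
        Path dir = Files.createTempDirectory("records");
        Files.write(dir.resolve("a.txt"), "first file\nspans two lines".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"), "second file".getBytes(StandardCharsets.UTF_8));
        for (List<String> record : readAll(dir)) {
            System.out.println(record.size() + " value, " + record.get(0).length() + " chars");
        }
    }
}
```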

### Line-by-Line Record Reading

Reads text files line by line, treating each line as a single record.

```java { .api }
public class LineRecordReader implements RecordReader {
    public LineRecordReader();
    public LineRecordReader(String delimiter);
}
```

**Usage Example:**

```java
LineRecordReader lineReader = new LineRecordReader();
lineReader.initialize(new FileSplit(new File("textfile.txt")));

while (lineReader.hasNext()) {
    List<Writable> record = lineReader.next();
    String line = record.get(0).toString(); // Each record contains one line as Text
}
```
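
A plain-Java sketch of line-by-line reading with `BufferedReader`: `LineRecordSketch.readLines` (hypothetical) emits one single-value record per line. `readLine()` recognizes `\n`, `\r`, and `\r\n` terminators, and in this sketch an empty line yields a record holding an empty string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LineRecordSketch {
    // Each line of the input becomes one record holding exactly one value.
    static List<List<String>> readLines(Reader source) throws IOException {
        List<List<String>> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                records.add(Collections.singletonList(line));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> records = readLines(new StringReader("alpha\nbeta\r\ngamma"));
        for (List<String> record : records) {
            System.out.println(record.get(0));
        }
        System.out.println(records.size()); // 3
    }
}
```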

### Composable Record Reading

Combines multiple record readers side by side: each call to `next()` takes one record from every underlying reader and concatenates them into a single record. This is useful when different parts of an example, such as tabular features and an image, come from different sources.

```java { .api }
public class ComposableRecordReader implements RecordReader {
    public ComposableRecordReader(RecordReader... readers);
    public ComposableRecordReader(List<RecordReader> readers);
}
```

**Usage Example:**

```java
// Each underlying reader is initialized with its own input (hypothetical paths)
RecordReader csvReader = new CSVRecordReader();
csvReader.initialize(new FileSplit(new File("mixed_data/features.csv")));

// labelGenerator is assumed to be defined elsewhere (a PathLabelGenerator)
RecordReader imageReader = new ImageRecordReader(64, 64, 3, labelGenerator);
imageReader.initialize(new FileSplit(new File("mixed_data/images")));

ComposableRecordReader composableReader = new ComposableRecordReader(csvReader, imageReader);

while (composableReader.hasNext()) {
    List<Writable> record = composableReader.next();
    // Process combined record: values from csvReader, then values from imageReader
}
```
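
A hypothetical sketch of side-by-side composition in plain Java. `SideBySide` merges one record from each source iterator into a single record and stops as soon as any source is exhausted, so every output record is complete. It mirrors the idea of pairing a feature source with a label source; it is not DataVec's implementation, and keeping the sources row-aligned is the caller's responsibility.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ComposeRecordsSketch {
    // Joins records column-wise: output record i = record i of source 1 + record i of source 2 + ...
    static final class SideBySide implements Iterator<List<String>> {
        private final List<Iterator<List<String>>> sources;

        SideBySide(List<Iterator<List<String>>> sources) {
            this.sources = sources;
        }

        @Override
        public boolean hasNext() {
            if (sources.isEmpty()) {
                return false;
            }
            // Stop as soon as any source runs out, so every output record is complete.
            for (Iterator<List<String>> source : sources) {
                if (!source.hasNext()) {
                    return false;
                }
            }
            return true;
        }

        @Override
        public List<String> next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            List<String> combined = new ArrayList<>();
            for (Iterator<List<String>> source : sources) {
                combined.addAll(source.next());
            }
            return combined;
        }
    }

    public static void main(String[] args) {
        List<List<String>> features = Arrays.asList(Arrays.asList("0.1", "0.2"), Arrays.asList("0.3", "0.4"));
        List<List<String>> labels = Arrays.asList(Arrays.asList("cat"), Arrays.asList("dog"));
        SideBySide joined = new SideBySide(Arrays.asList(features.iterator(), labels.iterator()));
        while (joined.hasNext()) {
            System.out.println(joined.next()); // [0.1, 0.2, cat], then [0.3, 0.4, dog]
        }
    }
}
```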

### Concatenating Record Reading

Reads from multiple record readers in turn: all records from the first reader, then all records from the second, and so on. Each reader is initialized with its own input before being passed in.

```java { .api }
public class ConcatenatingRecordReader implements RecordReader {
    public ConcatenatingRecordReader(RecordReader... readers);
    public ConcatenatingRecordReader(List<RecordReader> readers);
}
```

**Usage Example:**

```java
RecordReader reader1 = new CSVRecordReader();
reader1.initialize(new FileSplit(new File("part1.csv")));

RecordReader reader2 = new CSVRecordReader();
reader2.initialize(new FileSplit(new File("part2.csv")));

ConcatenatingRecordReader concatReader = new ConcatenatingRecordReader(reader1, reader2);

while (concatReader.hasNext()) {
    List<Writable> record = concatReader.next();
    // Process records from both files sequentially
}
```
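
A hypothetical plain-Java sketch of concatenation: `Concatenated` drains each source iterator in order and skips empty sources, so the caller sees one continuous stream of records. It illustrates the idea only and is not DataVec's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ConcatRecordsSketch {
    // Drains each source in order, presenting them as one continuous stream of records.
    static final class Concatenated implements Iterator<List<String>> {
        private final Iterator<Iterator<List<String>>> remaining;
        private Iterator<List<String>> current = Collections.emptyIterator();

        Concatenated(List<Iterator<List<String>>> sources) {
            this.remaining = sources.iterator();
        }

        @Override
        public boolean hasNext() {
            // Move past exhausted (or empty) sources until one has data or none are left.
            while (!current.hasNext() && remaining.hasNext()) {
                current = remaining.next();
            }
            return current.hasNext();
        }

        @Override
        public List<String> next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    public static void main(String[] args) {
        List<List<String>> part1 = Arrays.asList(Arrays.asList("1", "a"), Arrays.asList("2", "b"));
        List<List<String>> part2 = Arrays.asList(Arrays.asList("3", "c"));
        Concatenated all = new Concatenated(Arrays.asList(part1.iterator(), part2.iterator()));
        while (all.hasNext()) {
            System.out.println(all.next()); // [1, a], [2, b], then [3, c]
        }
    }
}
```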

## Integration Patterns

### With DataSetIterator

Record readers plug directly into DL4J's `RecordReaderDataSetIterator`, which turns records into `DataSet` minibatches for training:

```java
RecordReader recordReader = new CSVRecordReader();
recordReader.initialize(new FileSplit(new File("training_data.csv")));

int batchSize = 32; // Number of examples per batch (example value)
int labelIndex = 4; // Column index of the label (example value)
int numClasses = 3; // Number of possible classes (example value)

DataSetIterator iterator = new RecordReaderDataSetIterator(
    recordReader,
    batchSize,
    labelIndex,
    numClasses
);
```
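
To make `labelIndex` and `numClasses` concrete, the sketch below shows one way a CSV record can be split into a feature vector and a one-hot label. `LabelEncodingSketch` is hypothetical and independent of DL4J; it assumes classification data where the label column holds an integer class index in `[0, numClasses)`.

```java
import java.util.Arrays;
import java.util.List;

final class LabelEncodingSketch {
    // Features: every column except labelIndex, parsed as double.
    static double[] features(List<String> record, int labelIndex) {
        double[] out = new double[record.size() - 1];
        int j = 0;
        for (int i = 0; i < record.size(); i++) {
            if (i != labelIndex) {
                out[j++] = Double.parseDouble(record.get(i));
            }
        }
        return out;
    }

    // Label: one-hot vector of length numClasses with 1.0 at the class index.
    static double[] oneHotLabel(List<String> record, int labelIndex, int numClasses) {
        int classIndex = Integer.parseInt(record.get(labelIndex).trim());
        if (classIndex < 0 || classIndex >= numClasses) {
            throw new IllegalArgumentException(
                    "class index " + classIndex + " outside [0, " + numClasses + ")");
        }
        double[] label = new double[numClasses];
        label[classIndex] = 1.0;
        return label;
    }

    public static void main(String[] args) {
        List<String> record = Arrays.asList("5.1", "3.5", "1.4", "0.2", "0");
        System.out.println(Arrays.toString(features(record, 4)));       // [5.1, 3.5, 1.4, 0.2]
        System.out.println(Arrays.toString(oneHotLabel(record, 4, 3))); // [1.0, 0.0, 0.0]
    }
}
```

A label value outside the declared number of classes is rejected early here, which is usually easier to debug than a silently mis-shaped label vector.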

### Error Handling

Record readers may throw various exceptions during operation:

- `IOException` - File I/O errors during reading
- `NumberFormatException` - Invalid numeric data in CSV files, typically raised when a non-numeric `Text` value is converted with `toInt()` or `toDouble()`
- `IllegalStateException` - Reader not properly initialized

```java
RecordReader reader = new CSVRecordReader();
try {
    reader.initialize(new FileSplit(new File("data.csv")));
    while (reader.hasNext()) {
        List<Writable> record = reader.next();
        // Process record
    }
} catch (IOException e) {
    // Handle file I/O errors
} catch (NumberFormatException e) {
    // Handle invalid numeric data
}
```

## Types

### Core Interfaces

```java { .api }
public interface RecordReader {
    void initialize(InputSplit split) throws IOException;
    List<Writable> next();
    boolean hasNext();
    void reset();
    List<String> getLabels();
    Record nextRecord();
    boolean batchesSupported();
    List<List<Writable>> next(int numRecords);
}

public interface SequenceRecordReader extends RecordReader {
    List<List<Writable>> sequenceRecord();
    List<List<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream) throws IOException;
    SequenceRecord nextSequence();
}

public interface Record {
    List<Writable> getRecord();
    RecordMetaData getMetaData();
}

public interface SequenceRecord {
    List<List<Writable>> getSequenceRecord();
    RecordMetaData getMetaData();
}
```

### RecordReader Implementations

```java { .api }
// CSV-based readers
public class CSVRecordReader implements RecordReader;
public class CSVSequenceRecordReader implements SequenceRecordReader;

// Collection-based readers
public class CollectionRecordReader implements RecordReader;
public class CollectionSequenceRecordReader implements SequenceRecordReader;

// File-based readers
public class FileRecordReader implements RecordReader;
public class LineRecordReader implements RecordReader;

// Composite readers
public class ComposableRecordReader implements RecordReader;
public class ConcatenatingRecordReader implements RecordReader;
```