tessl/maven-org-datavec--datavec-local

DataVec integration library providing data loading, transformation, and Spark processing capabilities for DeepLearning4j

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: mavenpkg:maven/org.datavec/datavec-local@0.9.x

To install, run

npx @tessl/cli install tessl/maven-org-datavec--datavec-local@0.9.0

# DataVec Local Integration

DataVec Local Integration provides data loading, transformation, and processing capabilities for DeepLearning4j. It connects DataVec's record readers to DeepLearning4j's neural network training by converting data sources such as CSV files, images, and sequences into DataSet and MultiDataSet objects.

## Package Information

- **Package Name**: org.datavec:datavec-local
- **Package Type**: maven
- **Language**: Java
- **Installation**: Add to Maven dependencies:

```xml
<dependency>
    <groupId>org.datavec</groupId>
    <artifactId>datavec-local</artifactId>
    <version>0.9.1</version>
</dependency>
```

## Core Imports

```java
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator;
import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator;
```

For Spark integration:

```java
import org.deeplearning4j.spark.datavec.DataVecDataSetFunction;
import org.deeplearning4j.spark.datavec.DataVecSequenceDataSetFunction;
```

## Basic Usage

```java
import java.io.File;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// Create a CSV record reader
// (initialize() can throw IOException and InterruptedException)
RecordReader recordReader = new CSVRecordReader();
recordReader.initialize(new FileSplit(new File("data.csv")));

// Create dataset iterator
int batchSize = 32;
int labelIndex = 4;        // Index of label column
int numPossibleLabels = 3; // Number of classes

DataSetIterator iterator = new RecordReaderDataSetIterator(
        recordReader, batchSize, labelIndex, numPossibleLabels);

// Use with DeepLearning4j training
while (iterator.hasNext()) {
    DataSet dataSet = iterator.next();
    // Train your model with dataSet
}
```
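
To make the roles of `labelIndex` and `numPossibleLabels` concrete, here is a minimal, library-free sketch of the kind of split a classification iterator performs on one record. The class `LabelSplitSketch` and its helper methods are hypothetical names used only for illustration. They are not part of DataVec or DeepLearning4j, and the real iterator works on `Writable` values and `INDArray` batches rather than plain arrays.

```java
import java.util.Arrays;

// Illustrative only: a plain-Java stand-in, not the library's implementation.
public class LabelSplitSketch {

    /** Returns every column except the one at labelIndex as the feature vector. */
    static double[] features(double[] record, int labelIndex) {
        double[] out = new double[record.length - 1];
        int j = 0;
        for (int i = 0; i < record.length; i++) {
            if (i != labelIndex) {
                out[j++] = record[i];
            }
        }
        return out;
    }

    /** One-hot encodes the class index stored at labelIndex. */
    static double[] oneHotLabel(double[] record, int labelIndex, int numPossibleLabels) {
        int cls = (int) record[labelIndex];
        if (cls < 0 || cls >= numPossibleLabels) {
            throw new IllegalArgumentException("Class index out of range: " + cls);
        }
        double[] out = new double[numPossibleLabels];
        out[cls] = 1.0;
        return out;
    }

    public static void main(String[] args) {
        // Four feature columns followed by a class index in column 4
        double[] record = {5.1, 3.5, 1.4, 0.2, 2};
        System.out.println(Arrays.toString(features(record, 4)));       // [5.1, 3.5, 1.4, 0.2]
        System.out.println(Arrays.toString(oneHotLabel(record, 4, 3))); // [0.0, 0.0, 1.0]
    }
}
```

With `labelIndex = 4` and `numPossibleLabels = 3`, as in the usage example above, a record whose last column is `2` yields four features and a three-element one-hot label.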

## Architecture

DataVec integration is built around several key components:

- **DataSet Iterators**: Convert RecordReader data into DataSet objects for single-input neural networks
- **MultiDataSet Support**: Handle complex multi-input/multi-output scenarios through RecordReaderMultiDataSetIterator
- **Sequence Processing**: Time series and sequential data handling with alignment modes
- **Spark Integration**: Distributed data processing functions for large-scale training
- **Metadata Support**: Load specific records by metadata for debugging and reproducibility

## Capabilities

### DataSet Iteration

Core functionality for converting RecordReader data into DataSet objects suitable for DeepLearning4j training. Supports various data sources including CSV, images, and custom formats.

```java { .api }
public class RecordReaderDataSetIterator implements DataSetIterator {
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize,
                                       int labelIndex, int numPossibleLabels);
    public DataSet next();
    public boolean hasNext();
    public void reset();
}
```

[DataSet Iteration](./dataset-iteration.md)
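
The `hasNext()` / `next()` / `reset()` contract and the role of `batchSize` can be sketched without the library. The class below is a hypothetical stand-in, not DeepLearning4j code. It groups in-memory records into mini-batches and assumes that the final batch may be smaller than `batchSize` when the records run out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative only: mimics the batching contract with plain Java collections.
public class BatchingSketch implements Iterator<List<double[]>> {
    private final List<double[]> records;
    private final int batchSize;
    private int cursor = 0;

    public BatchingSketch(List<double[]> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.records = records;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return cursor < records.size();
    }

    @Override
    public List<double[]> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Take up to batchSize records; the last batch may be shorter
        int end = Math.min(cursor + batchSize, records.size());
        List<double[]> batch = new ArrayList<>(records.subList(cursor, end));
        cursor = end;
        return batch;
    }

    /** Rewinds to the first record so the data can be iterated again (one epoch per pass). */
    public void reset() {
        cursor = 0;
    }

    public static void main(String[] args) {
        List<double[]> records = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            records.add(new double[] {i});
        }
        BatchingSketch it = new BatchingSketch(records, 2);
        while (it.hasNext()) {
            System.out.println(it.next().size()); // prints 2, 2, 1
        }
    }
}
```

Calling `reset()` between passes is what allows the same iterator to be reused across training epochs.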

### Sequence Processing

Time series and sequential data processing with configurable alignment modes. Handles variable-length sequences and provides multiple alignment strategies for batch processing.

```java { .api }
public class SequenceRecordReaderDataSetIterator implements DataSetIterator {
    public SequenceRecordReaderDataSetIterator(SequenceRecordReader featuresReader,
                                               SequenceRecordReader labelsReader,
                                               int miniBatchSize, int numPossibleLabels);
    public enum AlignmentMode { EQUAL_LENGTH, ALIGN_START, ALIGN_END }
}
```

[Sequence Processing](./sequence-processing.md)

### Multi-Input/Output Support

Multi-modal data processing for neural network architectures with multiple inputs and outputs. Uses a builder pattern for configuration.

```java { .api }
public class RecordReaderMultiDataSetIterator implements MultiDataSetIterator {
    public static class Builder {
        public Builder addReader(String readerName, RecordReader recordReader);
        public Builder addInput(String readerName, int columnFirst, int columnLast);
        public Builder addOutputOneHot(String readerName, int column, int numClasses);
        public RecordReaderMultiDataSetIterator build();
    }
}
```

[Multi-Input/Output](./multi-input-output.md)

### Spark Integration

Distributed data processing functions for Apache Spark, enabling large-scale data processing and training across clusters.

```java { .api }
public class DataVecDataSetFunction implements Function<List<Writable>, DataSet> {
    public DataVecDataSetFunction(int labelIndex, int numPossibleLabels, boolean regression);
    public DataSet call(List<Writable> currList);
}
```

[Spark Integration](./spark-integration.md)

## Types

### Core Interfaces

```java { .api }
public interface DataSetIterator extends Iterator<DataSet> {
    DataSet next(int num);
    int totalExamples();
    int inputColumns();
    int totalOutcomes();
    boolean resetSupported();
    void reset();
    boolean asyncSupported();
    int batch();
    int cursor();
    void setPreProcessor(DataSetPreProcessor preProcessor);
    DataSetPreProcessor getPreProcessor();
    List<String> getLabels();
    DataSet loadFromMetaData(RecordMetaData recordMetaData);
    DataSet loadFromMetaData(List<RecordMetaData> list);
}

public interface MultiDataSetIterator extends Iterator<MultiDataSet> {
    MultiDataSet next(int num);
    boolean resetSupported();
    boolean asyncSupported();
    void reset();
    void setPreProcessor(MultiDataSetPreProcessor preProcessor);
    MultiDataSetPreProcessor getPreProcessor();
    MultiDataSet loadFromMetaData(RecordMetaData recordMetaData);
    MultiDataSet loadFromMetaData(List<RecordMetaData> list);
}
```

### Alignment Modes

```java { .api }
public enum AlignmentMode {
    EQUAL_LENGTH, // Sequences must be same length
    ALIGN_START,  // Align sequences at start, pad end
    ALIGN_END     // Align sequences at end, pad start
}
```
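
As a rough illustration of what the three modes imply for padding, the following library-free sketch pads a short sequence to a target length and builds a matching mask. `AlignmentSketch`, its local `AlignmentMode` copy, and the `pad` and `mask` helpers are hypothetical. The real iterator applies alignment to feature and label sequences inside `INDArray` batches, so treat this only as a picture of where the padding lands.

```java
import java.util.Arrays;

// Illustrative only: shows where padding lands under each alignment mode.
public class AlignmentSketch {

    // Local copy of the enum so the sketch compiles without the library
    enum AlignmentMode { EQUAL_LENGTH, ALIGN_START, ALIGN_END }

    /** Pads seq with zeros up to targetLength according to the alignment mode. */
    static double[] pad(double[] seq, int targetLength, AlignmentMode mode) {
        if (seq.length > targetLength) {
            throw new IllegalArgumentException("Sequence longer than target length");
        }
        if (mode == AlignmentMode.EQUAL_LENGTH && seq.length != targetLength) {
            throw new IllegalArgumentException("EQUAL_LENGTH requires same-length sequences");
        }
        double[] out = new double[targetLength];
        // ALIGN_START keeps data at index 0; ALIGN_END shifts it to the tail
        int offset = (mode == AlignmentMode.ALIGN_END) ? targetLength - seq.length : 0;
        System.arraycopy(seq, 0, out, offset, seq.length);
        return out;
    }

    /** Builds a mask with 1.0 at real time steps and 0.0 at padded ones. */
    static double[] mask(int seqLength, int targetLength, AlignmentMode mode) {
        double[] ones = new double[seqLength];
        Arrays.fill(ones, 1.0);
        return pad(ones, targetLength, mode);
    }

    public static void main(String[] args) {
        double[] seq = {1, 2, 3};
        System.out.println(Arrays.toString(pad(seq, 5, AlignmentMode.ALIGN_START))); // [1.0, 2.0, 3.0, 0.0, 0.0]
        System.out.println(Arrays.toString(pad(seq, 5, AlignmentMode.ALIGN_END)));   // [0.0, 0.0, 1.0, 2.0, 3.0]
        System.out.println(Arrays.toString(mask(3, 5, AlignmentMode.ALIGN_END)));    // [0.0, 0.0, 1.0, 1.0, 1.0]
    }
}
```

The mask is what lets a recurrent network ignore padded time steps, which is why the choice between `ALIGN_START` and `ALIGN_END` matters when, for example, only the last time step carries a label.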

### Exception Types

```java { .api }
public class ZeroLengthSequenceException extends RuntimeException {
    public ZeroLengthSequenceException();
    public ZeroLengthSequenceException(String type);
}
```