or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

tessl/maven-co-cask-cdap--cdap-etl-proto

Protocol classes and configuration objects for ETL (Extract, Transform, Load) pipelines in CDAP

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
mavenpkg:maven/co.cask.cdap/cdap-etl-proto@5.1.x

To install, run

npx @tessl/cli install tessl/maven-co-cask-cdap--cdap-etl-proto@5.1.0

0

# CDAP ETL Protocol

1

2

CDAP ETL Protocol provides protocol classes and configuration objects for defining ETL (Extract, Transform, Load) pipelines in the Cask Data Application Platform (CDAP). It contains core data structures for representing ETL configurations, stages, plugins, and connections with comprehensive backward compatibility through versioned protocol support (v0, v1, v2).

3

4

## Package Information

5

6

- **Package Name**: cdap-etl-proto

7

- **Package Type**: maven

8

- **Language**: Java

9

- **Installation**:

10

```xml

11

<dependency>

12

<groupId>co.cask.cdap</groupId>

13

<artifactId>cdap-etl-proto</artifactId>

14

<version>5.1.2</version>

15

</dependency>

16

```

17

18

## Core Imports

19

20

```java

21

import co.cask.cdap.etl.proto.v2.*;

22

import co.cask.cdap.etl.proto.Connection;

23

import co.cask.cdap.api.Resources;

24

import co.cask.cdap.etl.api.Engine;

25

```

26

27

For legacy compatibility:

28

```java

29

import co.cask.cdap.etl.proto.v1.*;

30

import co.cask.cdap.etl.proto.v0.*;

31

```

32

33

## Basic Usage

34

35

```java

36

import co.cask.cdap.etl.proto.v2.*;

37

import co.cask.cdap.etl.proto.Connection;

38

import co.cask.cdap.api.Resources;

39

40

// Create ETL plugin configuration

41

Map<String, String> sourceProps = new HashMap<>();

42

sourceProps.put("name", "users");

43

sourceProps.put("schema.row.field", "userid");

44

45

ETLPlugin sourcePlugin = new ETLPlugin(

46

"Table",

47

"batchsource",

48

sourceProps

49

);

50

51

Map<String, String> transformProps = new HashMap<>();

52

transformProps.put("script", "function transform(input, emitter, context) { emitter.emit(input); }");

53

54

ETLPlugin transformPlugin = new ETLPlugin(

55

"JavaScript",

56

"transform",

57

transformProps

58

);

59

60

Map<String, String> sinkProps = new HashMap<>();

61

sinkProps.put("name", "processed_users");

62

63

ETLPlugin sinkPlugin = new ETLPlugin(

64

"Table",

65

"batchsink",

66

sinkProps

67

);

68

69

// Create ETL stages

70

ETLStage sourceStage = new ETLStage("users_source", sourcePlugin);

71

ETLStage transformStage = new ETLStage("process_users", transformPlugin);

72

ETLStage sinkStage = new ETLStage("users_sink", sinkPlugin);

73

74

// Create pipeline connections

75

Set<Connection> connections = new HashSet<>();

76

connections.add(new Connection("users_source", "process_users"));

77

connections.add(new Connection("process_users", "users_sink"));

78

79

// Build batch ETL configuration

80

ETLBatchConfig config = ETLBatchConfig.builder()

81

.addStage(sourceStage)

82

.addStage(transformStage)

83

.addStage(sinkStage)

84

.addConnections(connections)

85

.setResources(new Resources(1024, 2))

86

.build();

87

88

// Validate configuration

89

config.validate();

90

```

91

92

## Architecture

93

94

CDAP ETL Protocol is structured around versioned protocol packages that enable backward compatibility and smooth upgrades:

95

96

- **Version Management**: Three protocol versions (v0, v1, v2) with automatic upgrade paths

97

- **Configuration Hierarchy**: Base ETL configuration classes extended for specific pipeline types (batch, streaming)

98

- **Stage System**: Modular pipeline components with plugin-based architecture

99

- **Connection Model**: Explicit data flow definition between pipeline stages

100

- **Resource Management**: Configurable resource allocation for different pipeline components

101

- **Validation Framework**: Comprehensive configuration validation with detailed error reporting

102

103

## Capabilities

104

105

### Pipeline Configuration (v2 - Current)

106

107

Latest version ETL pipeline configuration with comprehensive features for batch and streaming data processing. Includes advanced resource management, stage logging, and extensive property support.

108

109

```java { .api }

110

public class ETLConfig extends Config implements UpgradeableConfig {

111

public String getDescription();

112

public Set<ETLStage> getStages();

113

public Set<Connection> getConnections();

114

public Resources getResources();

115

public Resources getDriverResources();

116

public Resources getClientResources();

117

public int getNumOfRecordsPreview();

118

public boolean isStageLoggingEnabled();

119

public boolean isProcessTimingEnabled();

120

public Map<String, String> getProperties();

121

public void validate();

122

public boolean canUpgrade();

123

public UpgradeableConfig upgrade(UpgradeContext upgradeContext);

124

}

125

126

public final class ETLBatchConfig extends ETLConfig {

127

public List<ETLStage> getPostActions();

128

public Engine getEngine();

129

public String getSchedule();

130

public Integer getMaxConcurrentRuns();

131

public ETLBatchConfig convertOldConfig();

132

public static Builder builder();

133

public static Builder builder(String schedule);

134

}

135

136

public final class DataStreamsConfig extends ETLConfig {

137

public String getBatchInterval();

138

public boolean isUnitTest();

139

public boolean checkpointsDisabled();

140

public String getExtraJavaOpts();

141

public Boolean getStopGracefully();

142

public String getCheckpointDir();

143

public static Builder builder();

144

}

145

```

146

147

[Pipeline Configuration](./pipeline-configuration.md)

148

149

### Stage and Plugin Management

150

151

Core components for defining individual pipeline stages and their associated plugins. Handles plugin configuration, artifact selection, and validation.

152

153

```java { .api }

154

public final class ETLStage {

155

public ETLStage(String name, ETLPlugin plugin);

156

public String getName();

157

public ETLPlugin getPlugin();

158

public void validate();

159

}

160

161

public class ETLPlugin {

162

public ETLPlugin(String name, String type, Map<String, String> properties);

163

public ETLPlugin(String name, String type, Map<String, String> properties, ArtifactSelectorConfig artifact);

164

public String getName();

165

public String getType();

166

public Map<String, String> getProperties();

167

public PluginProperties getPluginProperties();

168

public ArtifactSelectorConfig getArtifactConfig();

169

public void validate();

170

}

171

```

172

173

[Stage and Plugin Management](./stage-plugin-management.md)

174

175

### Connection and Data Flow

176

177

Connection management for defining data flow between pipeline stages, including support for conditional connections and port-based routing.

178

179

```java { .api }

180

public class Connection {

181

public Connection(String from, String to);

182

public Connection(String from, String to, String port);

183

public Connection(String from, String to, Boolean condition);

184

public String getFrom();

185

public String getTo();

186

public String getPort();

187

public Boolean getCondition();

188

}

189

```

190

191

[Connection and Data Flow](./connection-data-flow.md)

192

193

### Pipeline Triggering and Property Mapping

194

195

Advanced pipeline triggering capabilities with property mapping between triggering and triggered pipelines, supporting both argument and plugin property mappings.

196

197

```java { .api }

198

public class TriggeringPropertyMapping {

199

public TriggeringPropertyMapping(List<ArgumentMapping> arguments, List<PluginPropertyMapping> pluginProperties);

200

public List<ArgumentMapping> getArguments();

201

public List<PluginPropertyMapping> getPluginProperties();

202

}

203

204

public class ArgumentMapping {

205

public ArgumentMapping(String source, String target);

206

public String getSource();

207

public String getTarget();

208

}

209

210

public class PluginPropertyMapping extends ArgumentMapping {

211

public PluginPropertyMapping(String stageName, String source, String target);

212

public String getStageName();

213

}

214

```

215

216

[Pipeline Triggering](./pipeline-triggering.md)

217

218

### Legacy Version Support

219

220

Backward compatibility support for v0 and v1 protocol versions with automatic upgrade mechanisms to current v2 format.

221

222

```java { .api }

223

public interface UpgradeableConfig<T extends UpgradeableConfig> {

224

boolean canUpgrade();

225

T upgrade(UpgradeContext upgradeContext);

226

}

227

228

public interface UpgradeContext {

229

ArtifactSelectorConfig getPluginArtifact(String pluginType, String pluginName);

230

}

231

```

232

233

[Legacy Version Support](./legacy-version-support.md)

234

235

### Artifact and Resource Management

236

237

Configuration management for plugin artifacts and pipeline resource allocation, including driver, client, and executor resource specifications.

238

239

```java { .api }

240

public class ArtifactSelectorConfig {

241

public ArtifactSelectorConfig();

242

public ArtifactSelectorConfig(String scope, String name, String version);

243

public String getScope();

244

public String getName();

245

public String getVersion();

246

}

247

```

248

249

[Artifact and Resource Management](./artifact-resource-management.md)

250

251

## Types

252

253

```java { .api }

254

// From co.cask.cdap.etl.api.Engine

255

public enum Engine {

256

MAPREDUCE, SPARK

257

}

258

259

// From co.cask.cdap.api.Resources

260

public class Resources {

261

public Resources(int memoryMB, int virtualCores);

262

public int getMemoryMB();

263

public int getVirtualCores();

264

}

265

266

// Validation error structure (conceptual)

267

public class ValidationError {

268

public String getMessage();

269

public String getField();

270

}

271

```