
# CDAP - Cask Data Application Platform

The Cask Data Application Platform (CDAP) is an integrated, open source application development platform for the Hadoop ecosystem. It provides developers with data and application abstractions that simplify and accelerate application development, address a broad range of real-time and batch use cases, and support deploying applications into production while satisfying enterprise requirements.

## Package Information

- **Package Name**: cdap
- **Package Type**: Maven
- **Language**: Java
- **Installation**: Add the API module to your Maven dependencies:

```xml
<dependency>
  <groupId>io.cdap.cdap</groupId>
  <artifactId>cdap-api</artifactId>
  <version>6.11.0</version>
</dependency>
```
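If you build with Gradle rather than Maven, the same coordinate (taken from the dependency above) can be declared as:

```groovy
// Gradle equivalent of the Maven dependency shown above
dependencies {
    implementation 'io.cdap.cdap:cdap-api:6.11.0'
}
```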

## Overview

CDAP abstracts the complexity of the underlying infrastructure while providing developers with powerful tools for building portable, maintainable data applications. From simple MapReduce jobs to complex ETL pipelines and real-time data processing workflows, CDAP enables enterprise-ready development with features such as automatic metadata capture, operational control, security integration, and lineage tracking.

## Core Imports

```java
// Core application framework
import io.cdap.cdap.api.app.*;
import io.cdap.cdap.api.*;

// Configuration
import io.cdap.cdap.api.Config;

// Common utilities
import io.cdap.cdap.api.common.*;
```

For specific program types:

```java
// MapReduce
import io.cdap.cdap.api.mapreduce.*;

// Spark (Beta)
import io.cdap.cdap.api.spark.*;

// Services
import io.cdap.cdap.api.service.*;
import io.cdap.cdap.api.service.http.*;

// Workers
import io.cdap.cdap.api.worker.*;

// Workflows
import io.cdap.cdap.api.workflow.*;
```

## Basic Usage

```java
import io.cdap.cdap.api.app.*;
import io.cdap.cdap.api.*;
import io.cdap.cdap.api.mapreduce.*;
import io.cdap.cdap.api.dataset.table.Table;

// 1. Define application configuration
public class MyAppConfig extends Config {
  private String inputDataset = "input";
  private String outputDataset = "output";

  public String getInputDataset() { return inputDataset; }
  public String getOutputDataset() { return outputDataset; }
}

// 2. Create a simple application
public class MyDataApp extends AbstractApplication<MyAppConfig> {

  @Override
  public void configure(ApplicationConfigurer configurer,
                        ApplicationContext<MyAppConfig> context) {
    MyAppConfig config = context.getConfig();

    // Set application metadata
    configurer.setName("MyDataProcessingApp");
    configurer.setDescription("Processes data from input to output");

    // Add a simple MapReduce program
    configurer.addMapReduce(new MyMapReduce());

    // Create datasets
    configurer.createDataset(config.getInputDataset(), Table.class);
    configurer.createDataset(config.getOutputDataset(), Table.class);
  }
}

// 3. Simple MapReduce program
public class MyMapReduce extends AbstractMapReduce {
  @Override
  public void configure(MapReduceConfigurer configurer) {
    configurer.setName("MyMapReduce");
    configurer.setDescription("Simple data processing");
  }
}
```

## Package Structure

The CDAP API is organized into logical functional areas:

```java { .api }
// Core API packages
io.cdap.cdap.api.*              // Base interfaces and classes
io.cdap.cdap.api.annotation.*   // Annotations for configuration and metadata

// Application Framework
io.cdap.cdap.api.app.*          // Application building blocks
io.cdap.cdap.api.service.*      // HTTP services and handlers
io.cdap.cdap.api.worker.*       // Worker programs
io.cdap.cdap.api.workflow.*     // Workflow orchestration

// Data Processing
io.cdap.cdap.api.mapreduce.*    // MapReduce integration
io.cdap.cdap.api.spark.*        // Spark integration
io.cdap.cdap.api.customaction.* // Custom workflow actions

// Data Management
io.cdap.cdap.api.dataset.*      // Dataset abstractions and implementations
io.cdap.cdap.api.messaging.*    // Messaging system integration

// Plugin System
io.cdap.cdap.api.plugin.*       // Extensibility framework

// Operations & Governance
io.cdap.cdap.api.metrics.*      // Metrics collection
io.cdap.cdap.api.metadata.*     // Metadata management
io.cdap.cdap.api.lineage.*      // Data lineage tracking
io.cdap.cdap.api.security.*     // Authentication and authorization
```

## Architecture Overview

CDAP applications follow a component-based architecture in which different types of programs work together:

### Core Components

```java { .api }
// Application - root container
public interface Application<T extends Config> {
  void configure(ApplicationConfigurer configurer, ApplicationContext<T> context);

  default boolean isUpdateSupported() {
    return false;
  }

  default ApplicationUpdateResult<T> updateConfig(ApplicationUpdateContext applicationUpdateContext)
      throws Exception {
    throw new UnsupportedOperationException("Application config update operation is not supported.");
  }
}

// Base program types
public enum ProgramType {
  MAPREDUCE, // Batch data processing with Hadoop MapReduce
  SPARK,     // Batch/streaming processing with Apache Spark
  SERVICE,   // HTTP services for real-time data access
  WORKER,    // Long-running background processes
  WORKFLOW   // Orchestration of multiple programs
}

// Resource allocation
public final class Resources {
  public static final int DEFAULT_VIRTUAL_CORES = 1;
  public static final int DEFAULT_MEMORY_MB = 512;

  public Resources() { /* 512 MB, 1 core */ }
  public Resources(int memoryMB) { /* specified memory, 1 core */ }
  public Resources(int memoryMB, int cores) { /* specified memory and cores */ }

  public int getMemoryMB() { /* returns allocated memory */ }
  public int getVirtualCores() { /* returns allocated CPU cores */ }
}
```
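The `Resources` value object above is small enough to sketch in full. The stand-in below is illustrative only (it is not the real `io.cdap.cdap.api.Resources`), but it mirrors the listed constructors to show the default-then-override pattern:

```java
public class ResourcesDemo {
    // Minimal stand-in mirroring the Resources API listing above.
    static final class Resources {
        static final int DEFAULT_VIRTUAL_CORES = 1;
        static final int DEFAULT_MEMORY_MB = 512;

        private final int memoryMB;
        private final int virtualCores;

        Resources() { this(DEFAULT_MEMORY_MB, DEFAULT_VIRTUAL_CORES); }
        Resources(int memoryMB) { this(memoryMB, DEFAULT_VIRTUAL_CORES); }
        Resources(int memoryMB, int cores) {
            this.memoryMB = memoryMB;
            this.virtualCores = cores;
        }

        int getMemoryMB() { return memoryMB; }
        int getVirtualCores() { return virtualCores; }
    }

    public static void main(String[] args) {
        Resources defaults = new Resources();      // 512 MB, 1 core
        Resources bigger = new Resources(2048, 4); // explicit memory and cores
        System.out.println(defaults.getMemoryMB() + "," + defaults.getVirtualCores());
        System.out.println(bigger.getMemoryMB() + "," + bigger.getVirtualCores());
    }
}
```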

### Application Lifecycle

```java { .api }
// Program lifecycle interface
public interface ProgramLifecycle<T extends RuntimeContext> {
  @TransactionPolicy(TransactionControl.IMPLICIT)
  void initialize(T context) throws Exception;

  @TransactionPolicy(TransactionControl.IMPLICIT)
  void destroy();
}

// Program execution states
public enum ProgramStatus {
  INITIALIZING, // Program is starting up
  RUNNING,      // Program is executing
  STOPPING,     // Program is shutting down
  COMPLETED,    // Program finished successfully
  FAILED,       // Program failed with an error
  KILLED;       // Program was terminated

  public static final Set<ProgramStatus> TERMINAL_STATES =
      EnumSet.of(COMPLETED, FAILED, KILLED);
}
```
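Because `TERMINAL_STATES` is a plain `EnumSet`, deciding whether a run has finished is just a containment test. A self-contained sketch of the same pattern (mirroring the enum above rather than importing the real API):

```java
import java.util.EnumSet;
import java.util.Set;

public class ProgramStatusDemo {
    // Mirrors the ProgramStatus enum from the API listing above.
    enum ProgramStatus { INITIALIZING, RUNNING, STOPPING, COMPLETED, FAILED, KILLED }

    static final Set<ProgramStatus> TERMINAL_STATES =
        EnumSet.of(ProgramStatus.COMPLETED, ProgramStatus.FAILED, ProgramStatus.KILLED);

    // A run is finished once it reaches any terminal state.
    static boolean hasFinished(ProgramStatus status) {
        return TERMINAL_STATES.contains(status);
    }

    public static void main(String[] args) {
        System.out.println(hasFinished(ProgramStatus.RUNNING));
        System.out.println(hasFinished(ProgramStatus.COMPLETED));
    }
}
```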

## Getting Started

### Basic Application Structure

```java { .api }
import io.cdap.cdap.api.app.*;
import io.cdap.cdap.api.annotation.*;
import io.cdap.cdap.api.Config;

// 1. Define application configuration
public class MyAppConfig extends Config {
  @Description("Input dataset name")
  private String inputDataset = "input";

  @Description("Output dataset name")
  private String outputDataset = "output";

  public String getInputDataset() { return inputDataset; }
  public String getOutputDataset() { return outputDataset; }
}

// 2. Create the application
public class MyDataApp extends AbstractApplication<MyAppConfig> {

  @Override
  public void configure(ApplicationConfigurer configurer,
                        ApplicationContext<MyAppConfig> context) {
    MyAppConfig config = context.getConfig();

    // Set application metadata
    configurer.setName("MyDataProcessingApp");
    configurer.setDescription("Processes data from input to output");

    // Add programs
    configurer.addMapReduce(new MyMapReduce());
    configurer.addSpark(new MySparkProgram());
    configurer.addService(new MyDataService());
    configurer.addWorkflow(new MyProcessingWorkflow());

    // Create datasets
    configurer.createDataset(config.getInputDataset(), Table.class);
    configurer.createDataset(config.getOutputDataset(), Table.class);
  }
}
```

### Runtime Context Access

All CDAP programs receive a runtime context providing access to system services:

```java { .api }
// Base runtime context
public interface RuntimeContext extends FeatureFlagsProvider {
  String getNamespace();
  ApplicationSpecification getApplicationSpecification();
  String getClusterName();
  long getLogicalStartTime();
  Map<String, String> getRuntimeArguments();
  Metrics getMetrics();
}

// Dataset access context
public interface DatasetContext {
  <T extends Dataset> T getDataset(String name) throws DataSetException;
  <T extends Dataset> T getDataset(String namespace, String name) throws DataSetException;
  void releaseDataset(Dataset dataset);
  void discardDataset(Dataset dataset);
}

// Service discovery
public interface ServiceDiscoverer {
  URL getServiceURL(String applicationId, String serviceId);
  URL getServiceURL(String applicationId, String serviceId, String methodPath);
}
```
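Runtime arguments arrive as a plain `Map<String, String>`, so a common pattern is to read optional settings with a fallback. A minimal sketch of that pattern (the `batch.size` key and helper are hypothetical, not part of the CDAP API):

```java
import java.util.HashMap;
import java.util.Map;

public class RuntimeArgsDemo {
    // Hypothetical helper: read an optional integer setting from the
    // runtime-arguments map (as returned by getRuntimeArguments()),
    // falling back to a default when the key is absent.
    static int batchSize(Map<String, String> runtimeArgs) {
        return Integer.parseInt(runtimeArgs.getOrDefault("batch.size", "100"));
    }

    public static void main(String[] args) {
        Map<String, String> runtimeArgs = new HashMap<>();
        System.out.println(batchSize(runtimeArgs)); // key absent: default applies

        runtimeArgs.put("batch.size", "500");
        System.out.println(batchSize(runtimeArgs)); // key present: override wins
    }
}
```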

## Core Concepts

### Configuration and Plugins

```java { .api }
// Configuration base class
public class Config implements Serializable {
  // Base for all configuration classes
}

// Plugin configuration
public abstract class PluginConfig extends Config {
  // Base for plugin configurations
}

// Plugin interface
public interface PluginConfigurer {
  <T> T usePlugin(String pluginType, String pluginName, String pluginId,
      PluginProperties properties);
  <T> Class<T> usePluginClass(String pluginType, String pluginName, String pluginId,
      PluginProperties properties);
}
```

### Transaction Management

```java { .api }
// Transactional interface for explicit transaction control
public interface Transactional {
  void execute(TxRunnable runnable) throws TransactionFailureException;
  void execute(int timeoutInSeconds, TxRunnable runnable)
      throws TransactionFailureException;
}

// Transactional operations
public interface TxRunnable {
  void run(DatasetContext context) throws Exception;
}

public interface TxCallable<V> {
  V call(DatasetContext context) throws Exception;
}

// Utility for transaction operations
public final class Transactionals {
  // Utility methods for transaction management
}
```
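The `TxRunnable` callback shape can be exercised without the real runtime. The toy executor below (all names are stand-ins, not CDAP classes) shows how transactional code is passed as a lambda, with writes committed only if the callback completes and discarded if it throws:

```java
import java.util.HashMap;
import java.util.Map;

public class TransactionalDemo {
    // Stand-ins for the API shapes listed above.
    interface DatasetContext { void put(String key, String value); }
    interface TxRunnable { void run(DatasetContext context) throws Exception; }

    // Toy "transaction": buffer writes, apply them only on success.
    static final class ToyTransactional {
        final Map<String, String> store = new HashMap<>();

        void execute(TxRunnable runnable) {
            Map<String, String> buffer = new HashMap<>();
            try {
                runnable.run(buffer::put);
                store.putAll(buffer); // commit
            } catch (Exception e) {
                // rollback: buffered writes are discarded
            }
        }
    }

    public static void main(String[] args) {
        ToyTransactional tx = new ToyTransactional();
        tx.execute(ctx -> ctx.put("row1", "ok")); // commits
        tx.execute(ctx -> {                       // rolls back
            ctx.put("row2", "partial");
            throw new Exception("simulated failure");
        });
        System.out.println(tx.store.containsKey("row1"));
        System.out.println(tx.store.containsKey("row2"));
    }
}
```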

324

325

### Annotations

326

327

Key annotations for configuration and behavior control:

328

329

```java { .api }

330

// Core annotations

331

@Description("Provides descriptive text for API elements")

332

@Name("Specifies custom names for elements")

333

@Property // Marks fields as configuration properties

334

@Macro // Enables macro substitution in field values

335

336

// Plugin annotations

337

@Plugin(type = "source") // Marks classes as plugins of specific types

338

@Category("transform") // Categorizes elements for organization

339

340

// Metadata annotations

341

@Metadata(properties = {@MetadataProperty(key = "author", value = "team")})

342

```
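Annotations like `@Description` are ordinary runtime-retained Java annotations. Defining and reading one via reflection looks like this (a generic sketch, not the CDAP `@Description` itself):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationDemo {
    // A minimal Description-style annotation, retained at runtime
    // so frameworks can read it via reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Description { String value(); }

    @Description("Processes data from input to output")
    static class MyDataApp { }

    public static void main(String[] args) {
        // Read the annotation value back off the class, as a
        // platform would when capturing metadata.
        Description d = MyDataApp.class.getAnnotation(Description.class);
        System.out.println(d.value());
    }
}
```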

## Key Features

### Enterprise Capabilities

- **Automatic Metadata Capture**: Track data lineage and transformations
- **Security Integration**: Role-based access control and secure storage
- **Operational Control**: Metrics collection, logging, and monitoring
- **Plugin Extensibility**: Custom transformations and connectors
- **Multi-tenancy**: Namespace isolation and resource management

### Hadoop Ecosystem Integration

- **Apache Spark**: Native integration for batch and streaming processing
- **Hadoop MapReduce**: Direct MapReduce job execution and management
- **Apache Hive**: SQL-based data processing and analytics
- **Apache HBase**: NoSQL database operations and management
- **HDFS**: Distributed file system access and operations

### Development Productivity

- **Unified API**: Consistent programming model across all components
- **Configuration Management**: Type-safe configuration with validation
- **Testing Support**: Local execution and testing frameworks
- **Deployment Flexibility**: Multi-environment deployment support

## Quick Reference

### Import Statements

```java { .api }
// Core application framework
import io.cdap.cdap.api.app.*;
import io.cdap.cdap.api.*;

// Data processing
import io.cdap.cdap.api.mapreduce.*;
import io.cdap.cdap.api.spark.*;

// Data access
import io.cdap.cdap.api.dataset.*;
import io.cdap.cdap.api.dataset.lib.*;

// Services
import io.cdap.cdap.api.service.*;
import io.cdap.cdap.api.service.http.*;

// Workflows
import io.cdap.cdap.api.workflow.*;

// Plugin system
import io.cdap.cdap.api.plugin.*;

// Annotations
import io.cdap.cdap.api.annotation.*;

// Utilities
import io.cdap.cdap.api.common.*;
import io.cdap.cdap.api.metrics.*;
```

### Program Types Summary

| Program Type | Purpose | Context Interface | Use Cases |
|--------------|---------|-------------------|-----------|
| **MapReduce** | Batch processing | `MapReduceContext` | ETL, data transformation, aggregation |
| **Spark** | Batch/stream processing | `SparkClientContext` | ML, real-time analytics, complex transformations |
| **Service** | HTTP endpoints | `HttpServiceContext` | REST APIs, real-time queries, data serving |
| **Worker** | Background processing | `WorkerContext` | Data ingestion, monitoring, housekeeping |
| **Workflow** | Orchestration | `WorkflowContext` | Multi-step pipelines, conditional logic |

## Next Steps

- **[Application Framework](application-framework.md)**: Learn about building applications, services, and workflows
- **[Data Processing](data-processing.md)**: Explore MapReduce and Spark integration patterns
- **[Data Management](data-management.md)**: Understand datasets, tables, and data access patterns
- **[Plugin System](plugin-system.md)**: Build extensible applications with custom plugins
- **[Security & Metadata](security-metadata.md)**: Implement security and governance features
- **[Operational APIs](operational.md)**: Add metrics, scheduling, and operational control

The CDAP API provides a complete toolkit for enterprise data application development, combining the power of Hadoop ecosystem tools with enterprise-grade operational features.