Core application programming interface for the Cask Data Application Platform enabling development of scalable data processing applications on Hadoop ecosystems.
npx @tessl/cli install tessl/maven-co-cask-cdap--cdap-api@5.1.2
# CDAP API
The CDAP API provides a comprehensive set of Java interfaces and abstractions for developing applications on the Cask Data Application Platform (CDAP). CDAP is a unified data platform built on Apache Hadoop that enables developers to create scalable data applications, workflows, services, and batch/real-time processing programs without dealing directly with the complexity of the underlying Hadoop infrastructure.
## Package Information
- **Package Name**: cdap-api
- **Package Type**: maven
- **Language**: Java
- **Maven Coordinates**: `co.cask.cdap:cdap-api:5.1.2`
- **Installation**: Add to your Maven `pom.xml`:
```xml
<dependency>
  <groupId>co.cask.cdap</groupId>
  <artifactId>cdap-api</artifactId>
  <version>5.1.2</version>
</dependency>
```
## Core Imports
```java
import co.cask.cdap.api.app.Application;
import co.cask.cdap.api.app.AbstractApplication;
import co.cask.cdap.api.app.ApplicationConfigurer;
import co.cask.cdap.api.Config;
import co.cask.cdap.api.annotation.UseDataSet;
import co.cask.cdap.api.dataset.Dataset;
```
## Basic Usage
```java
import co.cask.cdap.api.app.AbstractApplication;
import co.cask.cdap.api.Config;

public class MyApplication extends AbstractApplication<Config> {

  // AbstractApplication declares the two-argument configure(...) as final;
  // subclasses override the no-argument configure() instead.
  @Override
  public void configure() {
    setName("MyDataApp");
    setDescription("A sample CDAP application");

    // Add datasets, programs, services, etc.
    addMapReduce(new MyMapReduceJob());
    addService(new MyService());
  }
}
```
## Architecture
The CDAP API is organized around several key architectural concepts:
- **Applications**: Top-level containers that define the complete data processing solution
- **Programs**: Executable components within applications (MapReduce, Spark, Workflows, Services, Workers)
- **Datasets**: Abstraction layer for data storage and access
- **Plugins**: Extensible components for custom functionality
- **Scheduling**: Time-based and event-driven program execution
- **Services**: HTTP-based APIs and long-running services
## Capabilities
### Application Framework
Core interfaces and classes for building CDAP applications with configuration, lifecycle management, and program organization.
```java { .api }
public interface Application<T extends Config> {
  void configure(ApplicationConfigurer configurer, ApplicationContext<T> context);
}

public abstract class AbstractApplication<T extends Config> implements Application<T> {
  public final void configure(ApplicationConfigurer configurer, ApplicationContext<T> context);
  protected abstract void configure();
  protected final void setName(String name);
  protected final void setDescription(String description);
}

public interface ApplicationConfigurer extends DatasetConfigurer, PluginConfigurer {
  void setName(String name);
  void setDescription(String description);
  void addMapReduce(MapReduce mapReduce);
  void addSpark(Spark spark);
  void addWorkflow(Workflow workflow);
  void addService(Service service);
  void addWorker(Worker worker);
  ScheduleBuilder buildSchedule(String scheduleName, ProgramType programType, String programName);
  TriggerFactory getTriggerFactory();
}
```
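CDAP invokes `configure(...)` exactly once, at deployment time, to capture the application's structure. The following self-contained sketch illustrates that flow using simplified stand-in types (the hypothetical `Configurer` and `App` interfaces below are modeled on the `ApplicationConfigurer`/`Application` signatures above, not the real CDAP classes):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigureFlowSketch {

  // Stand-in for co.cask.cdap.api.app.ApplicationConfigurer (subset)
  interface Configurer {
    void setName(String name);
    void setDescription(String description);
  }

  // Stand-in for co.cask.cdap.api.app.Application
  interface App {
    void configure(Configurer configurer);
  }

  // A recording configurer, as the deployment pipeline would provide one
  static class RecordingConfigurer implements Configurer {
    final Map<String, String> metadata = new LinkedHashMap<>();
    public void setName(String name) { metadata.put("name", name); }
    public void setDescription(String description) { metadata.put("description", description); }
  }

  static class MyDataApp implements App {
    public void configure(Configurer configurer) {
      configurer.setName("MyDataApp");
      configurer.setDescription("A sample CDAP application");
    }
  }

  public static void main(String[] args) {
    RecordingConfigurer configurer = new RecordingConfigurer();
    new MyDataApp().configure(configurer); // called once at deployment
    System.out.println(configurer.metadata);
  }
}
```

The key point is that `configure` only records metadata; no program code runs at this stage.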
[Application Framework](./application-framework.md)
### Program Types
Support for various program types including MapReduce, Spark, Workflow orchestration, HTTP services, and background workers.
```java { .api }
public interface MapReduce {
  void configure(MapReduceConfigurer configurer);
}

public interface Spark {
  void configure(SparkConfigurer configurer);
}

public interface Workflow {
  void configure(WorkflowConfigurer configurer);
}
```
[MapReduce Programs](./mapreduce-programs.md)

[Spark Programs](./spark-programs.md)

[Workflow Programs](./workflow-programs.md)

[Service Programs](./service-programs.md)

[Worker Programs](./worker-programs.md)
### Dataset Management
Comprehensive dataset APIs with built-in types (key-value, indexed tables, file sets) and support for custom dataset implementations.
```java { .api }
public interface Dataset extends Closeable {
  // Base dataset interface
}

public interface DatasetDefinition<D extends Dataset, A extends DatasetAdmin> {
  String getName();
  D getDataset(DatasetContext datasetContext, DatasetSpecification spec,
               Map<String, String> arguments, ClassLoader classLoader);
}
```
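Since `Dataset` is essentially just `Closeable`, all concrete behavior lives in implementations. Below is a minimal, self-contained sketch of what a custom in-memory key-value dataset might look like; a real implementation would implement `co.cask.cdap.api.dataset.Dataset` and be exposed through a `DatasetDefinition` as shown above:

```java
import java.io.Closeable;
import java.util.HashMap;
import java.util.Map;

// Sketch of a custom key-value dataset. The real base type is
// co.cask.cdap.api.dataset.Dataset, which extends Closeable.
public class InMemoryKeyValueDataset implements Closeable {
  private final Map<String, byte[]> store = new HashMap<>();

  public void write(String key, byte[] value) {
    store.put(key, value);
  }

  public byte[] read(String key) {
    return store.get(key); // null if absent
  }

  @Override
  public void close() {
    store.clear(); // release resources when the dataset instance is closed
  }
}
```

In production code you would use a built-in type such as a key-value table rather than rolling your own, and let the platform manage the instance lifecycle.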
[Dataset Management](./dataset-management.md)
### Plugin Framework
Extensible plugin architecture for adding custom processing logic, data sources, sinks, and transformations.
```java { .api }
public class PluginConfig {
  // Base plugin configuration
}

public interface PluginContext {
  <T> T newPluginInstance(String pluginId);
  <T> Class<T> loadPluginClass(String pluginId);
}

@Plugin(type = "source")
public class MySourcePlugin extends PluginConfig {
  // Custom plugin implementation
}
```
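Conceptually, `PluginContext.newPluginInstance` resolves a plugin id to a class and instantiates it. This hypothetical registry sketches that mechanism with plain reflection (the real implementation resolves plugins from artifact metadata under isolated class loaders):

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of plugin resolution: map a plugin id to a class
// and instantiate it reflectively, as newPluginInstance does.
public class PluginRegistrySketch {
  private final Map<String, Class<?>> plugins = new HashMap<>();

  public void register(String pluginId, Class<?> pluginClass) {
    plugins.put(pluginId, pluginClass);
  }

  @SuppressWarnings("unchecked")
  public <T> T newPluginInstance(String pluginId) {
    Class<?> clazz = plugins.get(pluginId);
    if (clazz == null) {
      throw new IllegalArgumentException("No plugin registered for id: " + pluginId);
    }
    try {
      return (T) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate plugin " + pluginId, e);
    }
  }

  // Example plugin type used for illustration only
  public static class UppercaseTransform {
    public String transform(String in) { return in.toUpperCase(); }
  }
}
```

The indirection through an id is what lets pipelines swap sources, sinks, and transforms without recompiling the application.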
[Plugin Framework](./plugin-framework.md)
### Scheduling and Triggers
Flexible scheduling system with time-based triggers, program status triggers, and partition-based triggers for automated program execution.
```java { .api }
public class ScheduleBuilder {
  public static ScheduleBuilder create(String name, Trigger trigger);
  public ScheduleBuilder setDescription(String description);
  public ScheduleBuilder setProperties(Map<String, String> properties);
}

public interface Trigger {
  // Base trigger interface
}
```
[Scheduling and Triggers](./scheduling.md)
### Transaction Management
Built-in support for ACID transactions across datasets with declarative transaction control and programmatic transaction management.
```java { .api }
public interface Transactional {
  void execute(TxRunnable runnable);
  <T> T execute(Callable<T> callable);
}

@TransactionPolicy(TransactionControl.EXPLICIT)
public class MyProgram {
  // Explicit transaction control
}
```
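A transaction gives the body an isolated view that is committed atomically on success and discarded on failure. This self-contained sketch models that contract with a hypothetical `TxWork` interface standing in for CDAP's `TxRunnable`:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Transactional.execute pattern: run work against a buffered
// view and commit atomically; if the work throws, committed state is untouched.
public class TransactionalSketch {
  private final Map<String, String> committed = new HashMap<>();

  // Stand-in for co.cask.cdap.api.TxRunnable
  interface TxWork {
    void run(Map<String, String> txView) throws Exception;
  }

  public void execute(TxWork work) {
    Map<String, String> txView = new HashMap<>(committed); // isolated working copy
    try {
      work.run(txView);
    } catch (Exception e) {
      // rollback: leave committed state untouched, surface the failure
      throw new IllegalStateException("transaction failed", e);
    }
    committed.clear();
    committed.putAll(txView); // commit atomically
  }

  public String get(String key) {
    return committed.get(key);
  }
}
```

Real CDAP transactions also span multiple datasets and enforce isolation between concurrent programs; this sketch only shows the commit-or-discard semantics.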
[Transaction Management](./transactions.md)
### Annotations and Configuration
Rich annotation-based configuration system for dependency injection, transaction control, data access patterns, and plugin metadata.
```java { .api }
// Dataset injection into a program class
@UseDataSet("myDataset")
private ObjectStore<Data> dataStore;

@Property
@Description("Configuration property description")
private String configValue;

@TransactionPolicy(TransactionControl.IMPLICIT)
public class MyTransactionalProgram {
  // Implicit transaction handling
}
```
[Annotations and Configuration](./annotations.md)
### System Services
Integration with CDAP system services including metrics collection, service discovery, administrative operations, and artifact management.
```java { .api }
public interface Metrics {
  void count(String metricName, int delta);
  void gauge(String metricName, long value);
}

public interface ServiceDiscoverer {
  Discoverable discover(String serviceName);
}
```
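To show how a program might use the `Metrics` contract, here is a self-contained in-memory stand-in; in a real program, CDAP injects a `Metrics` instance that reports to the platform's metrics system rather than accumulating locally:

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the Metrics contract shown above:
// counters accumulate deltas, gauges record the latest value.
public class InMemoryMetrics {
  private final Map<String, Long> counters = new HashMap<>();
  private final Map<String, Long> gauges = new HashMap<>();

  public void count(String metricName, int delta) {
    counters.merge(metricName, (long) delta, Long::sum);
  }

  public void gauge(String metricName, long value) {
    gauges.put(metricName, value);
  }

  public long getCount(String metricName) {
    return counters.getOrDefault(metricName, 0L);
  }

  public long getGauge(String metricName) {
    return gauges.getOrDefault(metricName, 0L);
  }
}
```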
[System Services](./system-services.md)
## Types
```java { .api }
public class Config {
  // Base configuration class for all configurable components
}

public enum ProgramType {
  FLOW, MAPREDUCE, WORKFLOW, SERVICE, SPARK, WORKER
}

public interface RuntimeContext {
  String getNamespace();
  String getApplicationName();
  ProgramType getProgramType();
  String getProgramName();
}

public interface ProgramLifecycle<T extends RuntimeContext> {
  void initialize(T context);
  void destroy();
}

public class Resources {
  private final int virtualCores;
  private final int memoryMB;

  public Resources(int memoryMB);
  public Resources(int memoryMB, int virtualCores);
}
```
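The `ProgramLifecycle` contract means `initialize` runs before the program body and `destroy` runs afterwards, even when the body fails. This self-contained sketch models that guarantee with a simplified, hypothetical `Lifecycle` interface using a `String` context in place of `RuntimeContext`:

```java
// Sketch of the ProgramLifecycle contract: the platform calls initialize
// before the program body and destroy afterwards, even on failure.
public class LifecycleSketch {

  // Simplified stand-in for ProgramLifecycle<T extends RuntimeContext>
  interface Lifecycle<T> {
    void initialize(T context);
    void destroy();
  }

  static class CountingProgram implements Lifecycle<String> {
    final StringBuilder log = new StringBuilder();
    public void initialize(String context) { log.append("init:").append(context).append(";"); }
    public void destroy() { log.append("destroy"); }
  }

  // Models how a runtime would drive a program through its lifecycle
  static void run(CountingProgram p, String context, Runnable body) {
    p.initialize(context);
    try {
      body.run();
    } finally {
      p.destroy(); // always called, mirroring the lifecycle guarantee
    }
  }
}
```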