or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

docs

annotations.mdapplication-framework.mddataset-management.mdindex.mdmapreduce-programs.mdplugin-framework.mdscheduling.mdservice-programs.mdspark-programs.mdsystem-services.mdtransactions.mdworker-programs.mdworkflow-programs.md
tile.json

tessl/maven-co-cask-cdap--cdap-api

Core application programming interface for the Cask Data Application Platform enabling development of scalable data processing applications on Hadoop ecosystems.

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
mavenpkg:maven/co.cask.cdap/cdap-api@5.1.x

To install, run

npx @tessl/cli install tessl/maven-co-cask-cdap--cdap-api@5.1.0

index.mddocs/

CDAP API

The CDAP API provides a comprehensive set of Java interfaces and abstractions for developing applications on the Cask Data Application Platform (CDAP). CDAP is a unified data platform built on Apache Hadoop that enables developers to create scalable data applications, workflows, services, and batch/real-time processing programs without dealing directly with the complexity of the underlying Hadoop infrastructure.

Package Information

  • Package Name: cdap-api
  • Package Type: maven
  • Language: Java
  • Maven Coordinates: co.cask.cdap:cdap-api:5.1.2
  • Installation: Add to your Maven pom.xml:
<dependency>
    <groupId>co.cask.cdap</groupId>
    <artifactId>cdap-api</artifactId>
    <version>5.1.2</version>
</dependency>

Core Imports

import co.cask.cdap.api.app.Application;
import co.cask.cdap.api.app.AbstractApplication;
import co.cask.cdap.api.app.ApplicationConfigurer;
import co.cask.cdap.api.Config;
import co.cask.cdap.api.annotation.UseDataSet;
import co.cask.cdap.api.dataset.Dataset;

Basic Usage

import co.cask.cdap.api.app.Application;
import co.cask.cdap.api.app.AbstractApplication;
import co.cask.cdap.api.app.ApplicationConfigurer;
import co.cask.cdap.api.Config;

public class MyApplication extends AbstractApplication<Config> {
    
    @Override
    public void configure(ApplicationConfigurer configurer, ApplicationContext<Config> context) {
        configurer.setName("MyDataApp");
        configurer.setDescription("A sample CDAP application");
        
        // Add datasets, programs, services, etc.
        configurer.addMapReduce(new MyMapReduceJob());
        configurer.addService(new MyService());
    }
}

Architecture

The CDAP API is organized around several key architectural concepts:

  • Applications: Top-level containers that define the complete data processing solution
  • Programs: Executable components within applications (MapReduce, Spark, Workflows, Services, Workers)
  • Datasets: Abstraction layer for data storage and access
  • Plugins: Extensible components for custom functionality
  • Scheduling: Time-based and event-driven program execution
  • Services: HTTP-based APIs and long-running services

Capabilities

Application Framework

Core interfaces and classes for building CDAP applications with configuration, lifecycle management, and program organization.

public interface Application<T extends Config> {
    void configure(ApplicationConfigurer configurer, ApplicationContext<T> context);
}

public abstract class AbstractApplication<T extends Config> implements Application<T> {
    public final void configure(ApplicationConfigurer configurer, ApplicationContext<T> context);
    protected abstract void configure();
    protected final void setName(String name);
    protected final void setDescription(String description);
}

public interface ApplicationConfigurer extends DatasetConfigurer, PluginConfigurer {
    void setName(String name);
    void setDescription(String description);
    void addMapReduce(MapReduce mapReduce);
    void addSpark(Spark spark);
    void addWorkflow(Workflow workflow);
    void addService(Service service);
    void addWorker(Worker worker);
    ScheduleBuilder buildSchedule(String scheduleName, ProgramType programType, String programName);
    TriggerFactory getTriggerFactory();
}

Application Framework

Program Types

Support for various program types including MapReduce, Spark, Workflow orchestration, HTTP services, and background workers.

public interface MapReduce {
    void configure(MapReduceConfigurer configurer);
}

public interface Spark {
    void configure(SparkConfigurer configurer);
}

public interface Workflow {
    void configure(WorkflowConfigurer configurer);
}

MapReduce Programs

Spark Programs

Workflow Programs

Service Programs

Worker Programs

Dataset Management

Comprehensive dataset APIs with built-in types (key-value, indexed tables, file sets) and support for custom dataset implementations.

public interface Dataset extends Closeable {
    // Base dataset interface
}

public interface DatasetDefinition<D extends Dataset, A extends DatasetAdmin> {
    String getName();
    D getDataset(DatasetContext datasetContext, DatasetSpecification spec, 
                 Map<String, String> arguments, ClassLoader classLoader);
}

Dataset Management

Plugin Framework

Extensible plugin architecture for adding custom processing logic, data sources, sinks, and transformations.

public class PluginConfig {
    // Base plugin configuration
}

public interface PluginContext {
    <T> T newPluginInstance(String pluginId);
    <T> Class<T> loadPluginClass(String pluginId);
}

@Plugin(type = "source")
public class MySourcePlugin extends PluginConfig {
    // Custom plugin implementation
}

Plugin Framework

Scheduling and Triggers

Flexible scheduling system with time-based triggers, program status triggers, and partition-based triggers for automated program execution.

public class ScheduleBuilder {
    public static ScheduleBuilder create(String name, Trigger trigger);
    public ScheduleBuilder setDescription(String description);
    public ScheduleBuilder setProperties(Map<String, String> properties);
}

public interface Trigger {
    // Base trigger interface
}

Scheduling and Triggers

Transaction Management

Built-in support for ACID transactions across datasets with declarative transaction control and programmatic transaction management.

public interface Transactional {
    void execute(TxRunnable runnable);
    <T> T execute(Callable<T> callable);
}

@TransactionPolicy(TransactionControl.EXPLICIT)
public class MyProgram {
    // Explicit transaction control
}

Transaction Management

Annotations and Configuration

Rich annotation-based configuration system for dependency injection, transaction control, data access patterns, and plugin metadata.

// Flowlet dataset injection
@UseDataSet("myDataset")
private ObjectStore<Data> dataStore;  // In Flowlet context

@Property
@Description("Configuration property description")
private String configValue;

@TransactionPolicy(TransactionControl.IMPLICIT)
public class MyTransactionalProgram {
    // Implicit transaction handling
}

Annotations and Configuration

System Services

Integration with CDAP system services including metrics collection, service discovery, administrative operations, and artifact management.

public interface Metrics {
    void count(String metricName, int delta);
    void gauge(String metricName, long value);
}

public interface ServiceDiscoverer {
    Discoverable discover(String serviceName);
}

System Services

Types

public class Config {
    // Base configuration class for all configurable components
}

public enum ProgramType {
    FLOW, MAPREDUCE, WORKFLOW, SERVICE, SPARK, WORKER
}

public interface RuntimeContext {
    String getNamespace();
    String getApplicationName();
    ProgramType getProgramType();
    String getProgramName();
}

public interface ProgramLifecycle<T extends RuntimeContext> {
    void initialize(T context);
    void destroy();
}

public class Resources {
    private final int virtualCores;
    private final int memoryMB;
    
    public Resources(int memoryMB);
    public Resources(int memoryMB, int virtualCores);
}