tessl/maven-co-cask-cdap--cdap-data-fabric

Core data management capabilities for CDAP including dataset operations, metadata management, lineage tracking, audit functionality, and data registry services for Hadoop-based applications.

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/co.cask.cdap/cdap-data-fabric@5.1.x

To install, run:

```
npx @tessl/cli install tessl/maven-co-cask-cdap--cdap-data-fabric@5.1.0
```

# CDAP Data Fabric

CDAP Data Fabric is a core component of the Cask Data Application Platform (CDAP) that provides essential data management and infrastructure services for Hadoop-based applications. It covers dataset metrics reporting and monitoring, metadata management with indexing for efficient search and discovery, data lineage tracking to follow data flow and transformations, audit trails for compliance and debugging, and a centralized data registry for managing dataset definitions and configurations.

## Package Information

- **Package Name**: cdap-data-fabric
- **Package Type**: maven
- **Group ID**: co.cask.cdap
- **Language**: Java
- **Installation**: Add to Maven dependencies:

```xml
<dependency>
  <groupId>co.cask.cdap</groupId>
  <artifactId>cdap-data-fabric</artifactId>
  <version>5.1.2</version>
</dependency>
```

## Core Imports

```java
import co.cask.cdap.data2.dataset2.DatasetFramework;
import co.cask.cdap.data2.metadata.dataset.MetadataDataset;
import co.cask.cdap.data2.registry.UsageRegistry;
import co.cask.cdap.store.NamespaceStore;
import co.cask.cdap.data2.audit.AuditPublisher;
```

## Basic Usage

```java
// Dataset management example
DatasetFramework datasetFramework = // ... obtain from DI container
DatasetId datasetId = NamespaceId.DEFAULT.dataset("myDataset");
DatasetProperties datasetProperties = DatasetProperties.builder().build();

// Create a dataset instance
datasetFramework.addInstance("keyValueTable", datasetId, datasetProperties, null);

// Access the dataset
KeyValueTable dataset = datasetFramework.getDataset(
    datasetId, null, null, null, null, AccessType.READ_WRITE);

// Metadata management example
MetadataDataset metadataDataset = // ... obtain instance
MetadataEntity entity = // ... define entity

// Set metadata properties (one key/value pair per call)
metadataDataset.setProperty(entity, "environment", "production");
metadataDataset.setProperty(entity, "owner", "team-alpha");

// Add tags
Set<String> tags = new HashSet<>(Arrays.asList("production", "critical", "team-alpha"));
metadataDataset.addTags(entity, tags);

// Search metadata
SearchRequest searchRequest = SearchRequest.of("production").build();
SearchResults results = metadataDataset.search(searchRequest);
```

## Architecture

The CDAP Data Fabric follows a layered architecture that abstracts complex data operations:

- **Dataset Framework Layer**: Provides unified APIs for dataset lifecycle management across different storage backends (HBase, LevelDB, in-memory)
- **Metadata Management Layer**: Handles metadata operations including properties, tags, lineage, and search with pluggable indexing strategies
- **Transaction Layer**: Integrates with Apache Tephra for ACID transactions across distributed datasets
- **Registry Layer**: Tracks usage relationships between programs and datasets for governance and lineage
- **Audit Layer**: Provides audit trails for compliance and debugging
- **Storage Abstraction Layer**: Supports multiple storage backends with consistent APIs

This architecture lets developers build scalable data applications without working directly against the underlying Hadoop stack, while preserving transactional guarantees and rich metadata management.

## Capabilities

### Dataset Management

Comprehensive dataset lifecycle management including creation, configuration, access, and administration across multiple storage backends with transaction support and lineage tracking.

```java { .api }
public interface DatasetFramework {
  void addInstance(String datasetTypeName, DatasetId datasetInstanceId,
                   DatasetProperties props, KerberosPrincipalId ownerPrincipal)
      throws DatasetManagementException, IOException;

  <T extends Dataset> T getDataset(DatasetId datasetInstanceId, Map<String, String> arguments,
                                   ClassLoader classLoader, DatasetClassLoaderProvider classLoaderProvider,
                                   Iterable<? extends EntityId> owners, AccessType accessType)
      throws DatasetManagementException, IOException;

  void deleteInstance(DatasetId datasetInstanceId) throws DatasetManagementException, IOException;

  Collection<DatasetSpecificationSummary> getInstances(NamespaceId namespaceId)
      throws DatasetManagementException;
}
```

[Dataset Management](./dataset-management.md)
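
The lifecycle above (add, list, delete) can be sketched with a small in-memory registry. This is an illustrative stand-in using plain strings for IDs — `SimpleDatasetRegistry` is a hypothetical name, not a CDAP class:

```java
import java.util.*;

// Illustrative in-memory dataset registry following the DatasetFramework
// lifecycle: addInstance, getInstances per namespace, deleteInstance.
class SimpleDatasetRegistry {
    // Keyed by "namespace:dataset"; the value is the dataset type name
    private final Map<String, String> instances = new LinkedHashMap<>();

    // Register a new dataset instance; duplicates are an error
    public void addInstance(String typeName, String namespace, String dataset) {
        String key = namespace + ":" + dataset;
        if (instances.containsKey(key)) {
            throw new IllegalStateException("dataset already exists: " + key);
        }
        instances.put(key, typeName);
    }

    // List dataset names in a namespace, in creation order
    public List<String> getInstances(String namespace) {
        List<String> out = new ArrayList<>();
        String prefix = namespace + ":";
        for (String key : instances.keySet()) {
            if (key.startsWith(prefix)) {
                out.add(key.substring(prefix.length()));
            }
        }
        return out;
    }

    // Remove an instance; returns false if it did not exist
    public boolean deleteInstance(String namespace, String dataset) {
        return instances.remove(namespace + ":" + dataset) != null;
    }
}
```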

### Metadata Management

Complete metadata management system for properties, tags, search, and indexing with support for custom indexing strategies and historical snapshots.

```java { .api }
public class MetadataDataset extends AbstractDataset {
  public MetadataChange setProperty(MetadataEntity metadataEntity, String key, String value);
  public MetadataChange addTags(MetadataEntity metadataEntity, Set<String> tagsToAdd);
  public Metadata getMetadata(MetadataEntity metadataEntity);
  public SearchResults search(SearchRequest request) throws BadRequestException;
  public Set<Metadata> getSnapshotBeforeTime(Set<MetadataEntity> metadataEntitys, long timeMillis);
}
```

[Metadata Management](./metadata-management.md)
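
To make the property/tag/search model concrete, here is a minimal in-memory sketch. `InMemoryMetadata` is hypothetical and uses plain strings in place of `MetadataEntity`; CDAP's real search goes through indexed `SearchRequest`s rather than a linear scan:

```java
import java.util.*;

// Illustrative in-memory metadata store mirroring the MetadataDataset
// concepts: per-entity properties, per-entity tags, and search by tag.
class InMemoryMetadata {
    private final Map<String, Map<String, String>> properties = new HashMap<>();
    private final Map<String, Set<String>> tags = new HashMap<>();

    // One key/value per call, as in setProperty(entity, key, value)
    public void setProperty(String entity, String key, String value) {
        properties.computeIfAbsent(entity, e -> new HashMap<>()).put(key, value);
    }

    public void addTags(String entity, Set<String> toAdd) {
        tags.computeIfAbsent(entity, e -> new HashSet<>()).addAll(toAdd);
    }

    // Search: entities carrying the given tag (a simplified "index" lookup)
    public Set<String> searchByTag(String tag) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : tags.entrySet()) {
            if (e.getValue().contains(tag)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public Map<String, String> getProperties(String entity) {
        return properties.getOrDefault(entity, Collections.emptyMap());
    }
}
```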

### Usage Registry

Program-dataset relationship tracking for governance, lineage analysis, and impact assessment with comprehensive query capabilities.

```java { .api }
public interface UsageRegistry extends UsageWriter {
  void unregister(ApplicationId applicationId);
  Set<DatasetId> getDatasets(ApplicationId id);
  Set<ProgramId> getPrograms(DatasetId id);
  Set<StreamId> getStreams(ProgramId id);
}
```

[Usage Registry](./usage-registry.md)
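
The forward and reverse lookups that make the registry useful for impact analysis can be sketched as follows; `SimpleUsageRegistry` is an illustrative stand-in with string IDs, not the CDAP class:

```java
import java.util.*;

// Illustrative in-memory usage registry: records which programs use which
// datasets, and answers the reverse lookup that governance and impact
// analysis need.
class SimpleUsageRegistry {
    private final Map<String, Set<String>> programToDatasets = new HashMap<>();
    private final Map<String, Set<String>> datasetToPrograms = new HashMap<>();

    // Record that a program reads or writes a dataset
    public void register(String programId, String datasetId) {
        programToDatasets.computeIfAbsent(programId, k -> new TreeSet<>()).add(datasetId);
        datasetToPrograms.computeIfAbsent(datasetId, k -> new TreeSet<>()).add(programId);
    }

    // Datasets touched by a program (forward lookup)
    public Set<String> getDatasets(String programId) {
        return programToDatasets.getOrDefault(programId, Collections.emptySet());
    }

    // Programs touching a dataset (reverse lookup: "who is impacted?")
    public Set<String> getPrograms(String datasetId) {
        return datasetToPrograms.getOrDefault(datasetId, Collections.emptySet());
    }

    // Drop all usage records for a program, keeping both maps consistent
    public void unregister(String programId) {
        Set<String> datasets = programToDatasets.remove(programId);
        if (datasets != null) {
            for (String d : datasets) {
                datasetToPrograms.getOrDefault(d, Collections.emptySet()).remove(programId);
            }
        }
    }
}
```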

### Namespace Management

Namespace lifecycle management for multi-tenancy support with metadata persistence and comprehensive administrative operations.

```java { .api }
public interface NamespaceStore {
  NamespaceMeta create(NamespaceMeta metadata);
  void update(NamespaceMeta metadata);
  NamespaceMeta get(NamespaceId id);
  NamespaceMeta delete(NamespaceId id);
  List<NamespaceMeta> list();
}
```

[Namespace Management](./namespace-management.md)
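
A minimal sketch of the create/get/delete/list lifecycle, assuming plain string IDs and a map of metadata in place of `NamespaceMeta` (`SimpleNamespaceStore` is hypothetical):

```java
import java.util.*;

// Illustrative in-memory namespace store with the create/get/delete/list
// operations of the NamespaceStore interface.
class SimpleNamespaceStore {
    // TreeMap keeps list() output in sorted order
    private final Map<String, Map<String, String>> namespaces = new TreeMap<>();

    // Returns false if the namespace already exists
    public boolean create(String id, Map<String, String> meta) {
        return namespaces.putIfAbsent(id, new HashMap<>(meta)) == null;
    }

    public Map<String, String> get(String id) {
        return namespaces.get(id);
    }

    // Returns the removed metadata, mirroring delete() returning NamespaceMeta
    public Map<String, String> delete(String id) {
        return namespaces.remove(id);
    }

    public List<String> list() {
        return new ArrayList<>(namespaces.keySet());
    }
}
```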

### Audit and Compliance

Comprehensive audit logging system for compliance, monitoring, and debugging with pluggable publishers and structured payload builders.

```java { .api }
public interface AuditPublisher {
  void publish(EntityId entityId, AuditType auditType, AuditPayload auditPayload);
  void publish(MetadataEntity metadataEntity, AuditType auditType, AuditPayload auditPayload);
}
```

[Audit and Compliance](./audit-compliance.md)
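
As a sketch of the publisher idea, the following collects audit records in memory instead of emitting them to a messaging system. `CollectingAuditPublisher` and its record shape are hypothetical simplifications of `AuditPublisher`/`AuditPayload`:

```java
import java.util.*;

// Illustrative audit publisher that collects structured audit records in
// memory; a real publisher would emit them to a messaging system.
class CollectingAuditPublisher {
    static final class AuditRecord {
        final String entityId;
        final String auditType;      // e.g. "CREATE", "ACCESS", "DELETE"
        final long timestampMillis;  // when the event was published
        AuditRecord(String entityId, String auditType, long timestampMillis) {
            this.entityId = entityId;
            this.auditType = auditType;
            this.timestampMillis = timestampMillis;
        }
    }

    private final List<AuditRecord> records = new ArrayList<>();

    public void publish(String entityId, String auditType) {
        records.add(new AuditRecord(entityId, auditType, System.currentTimeMillis()));
    }

    // All records for one entity, in publish order -- the debugging view
    public List<AuditRecord> recordsFor(String entityId) {
        List<AuditRecord> out = new ArrayList<>();
        for (AuditRecord r : records) {
            if (r.entityId.equals(entityId)) {
                out.add(r);
            }
        }
        return out;
    }
}
```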

### Transaction Management

Distributed transaction support with retry logic, consumer state management, and integration with Apache Tephra for ACID guarantees.

```java { .api }
public interface TransactionExecutorFactory extends org.apache.tephra.TransactionExecutorFactory {
  // Transaction executor creation with custom configuration
}

public interface TransactionSystemClient {
  // Transaction system client operations with distributed coordination
}
```

[Transaction Management](./transaction-management.md)
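
The retry behavior behind a strategy like `RetryStrategy.exponentialDelay(initialDelay, maxDelay, maxAttempts)` can be sketched in plain Java — an illustrative pattern, not CDAP's implementation:

```java
import java.util.concurrent.Callable;

// Illustrative retry loop with capped exponential backoff, the pattern a
// RetryStrategy configures for transactional operations.
class RetryExecutor {
    public static <T> T executeWithRetry(Callable<T> task, long initialDelayMillis,
                                         long maxDelayMillis, int maxAttempts) {
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();  // success: return immediately
            } catch (Exception e) {
                last = e;            // remember the failure and maybe retry
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
                delay = Math.min(delay * 2, maxDelayMillis);  // exponential, capped
            }
        }
        throw new RuntimeException("all " + maxAttempts + " attempts failed", last);
    }
}
```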

### Stream Processing

Real-time stream processing capabilities with coordination, file management, partitioning, and multiple decoder support for various data formats.

```java { .api }
public interface StreamAdmin {
  // Stream administration and lifecycle operations
}

public interface StreamConsumer extends Closeable, TransactionAware {
  // Stream consumption with transaction support and state management
}
```

[Stream Processing](./stream-processing.md)
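
One building block of partitioned stream consumption is assigning events to partitions so that consumer instances own disjoint subsets. A minimal hash-based sketch (hypothetical, not the CDAP partitioner):

```java
// Illustrative partitioner: events are spread across N stream partitions by
// key hash, so each consumer instance processes a disjoint subset.
class StreamPartitioner {
    private final int partitions;

    StreamPartitioner(int partitions) {
        this.partitions = partitions;
    }

    // Stable partition for an event key (floorMod keeps it non-negative)
    public int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // True if the given consumer instance should process this key
    public boolean ownedBy(String key, int consumerIndex) {
        return partitionFor(key) == consumerIndex;
    }
}
```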

## Common Types

```java { .api }
// Core entity identifiers
public final class DatasetId extends EntityId {
  public static DatasetId of(String namespace, String dataset);
}

public final class NamespaceId extends EntityId {
  public static final NamespaceId DEFAULT = new NamespaceId("default");
  public static NamespaceId of(String namespace);
}

public final class ProgramId extends EntityId {
  // Program identification with application and program type context
}

public final class ApplicationId extends EntityId {
  // Application identification within namespace context
}

// Metadata entities
public interface MetadataEntity {
  // Metadata entity representation for flexible entity types
}

// Dataset properties and specifications
public final class DatasetProperties {
  public static Builder builder();
  public Map<String, String> getProperties();
}

public interface DatasetSpecification {
  String getName();
  String getType();
  DatasetProperties getProperties();
}

// Access and security
public enum AccessType {
  READ, WRITE, ADMIN, READ_WRITE
}

public final class KerberosPrincipalId {
  public static KerberosPrincipalId of(String principal);
}

// Metadata types
public enum MetadataScope {
  USER, SYSTEM
}

public final class MetadataRecordV2 {
  public MetadataEntity getMetadataEntity();
  public Map<String, String> getProperties();
  public Set<String> getTags();
  public MetadataScope getScope();
}

public final class ViewSpecification {
  // Stream view configuration specification
  public String getFormat();
  public Schema getSchema();
  public Map<String, String> getSettings();
}

public final class ViewDetail {
  // Complete view information including metadata
  public StreamViewId getId();
  public ViewSpecification getSpec();
  public Map<String, String> getProperties();
}

public final class StreamViewId extends EntityId {
  public static StreamViewId of(String namespace, String stream, String view);
  public StreamId getParent();
  public String getView();
}

public final class RetryStrategy {
  // Configurable retry policies for operations
  public static RetryStrategy noRetry();
  public static RetryStrategy exponentialDelay(long initialDelay, long maxDelay, int maxAttempts);
}

public final class TransactionContextFactory {
  // Factory for creating transaction contexts
}

// Exceptions
public class DatasetManagementException extends Exception {
  // Dataset operation failures with detailed error context
}

public class BadRequestException extends Exception {
  // Invalid request parameter handling
}

public class NotFoundException extends Exception {
  // Resource not found handling
}
```