
tessl/maven-co-cask-cdap--cdap-api-common

Core API classes and utilities for CDAP application development, providing common data schema definitions, data format abstractions, stream event handling, and byte manipulation utilities



Schema System

A data schema definition system that supports primitive types, complex nested structures, logical types for dates, times, and timestamps, and schema compatibility checking. It provides type safety and validation for data processing pipelines and is the foundation for structured data handling in CDAP applications.

Capabilities

Schema Creation

Create schemas for various data types including primitives, complex structures, and logical types.

/**
 * Create schema for simple/primitive types
 * @param type The primitive type (NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, BYTES, STRING)
 * @return Schema for the specified type
 * @throws IllegalArgumentException if type is not a simple type
 */
public static Schema of(Schema.Type type);

/**
 * Create schema for logical types (dates, timestamps)
 * @param logicalType The logical type (DATE, TIMESTAMP_MILLIS, etc.)
 * @return Schema for the specified logical type
 */
public static Schema of(Schema.LogicalType logicalType);

/**
 * Create nullable schema (union with null)
 * @param schema Schema to make nullable
 * @return Union schema of given schema and null
 * @throws IllegalArgumentException if schema is already null type
 */
public static Schema nullableOf(Schema schema);

Usage Examples:

// Simple types
Schema stringSchema = Schema.of(Schema.Type.STRING);
Schema intSchema = Schema.of(Schema.Type.INT);
Schema nullableInt = Schema.nullableOf(intSchema);

// Logical types
Schema dateSchema = Schema.of(Schema.LogicalType.DATE);
Schema timestampSchema = Schema.of(Schema.LogicalType.TIMESTAMP_MILLIS);

Complex Type Creation

Create schemas for complex data structures including arrays, maps, records, unions, and enums.

/**
 * Create array schema
 * @param componentSchema Schema of array elements
 * @return Array schema
 */
public static Schema arrayOf(Schema componentSchema);

/**
 * Create map schema
 * @param keySchema Schema for map keys
 * @param valueSchema Schema for map values  
 * @return Map schema
 */
public static Schema mapOf(Schema keySchema, Schema valueSchema);

/**
 * Create record schema with fields
 * @param name Record name
 * @param fields Record fields
 * @return Record schema
 * @throws IllegalArgumentException if name is null or no fields provided
 */
public static Schema recordOf(String name, Schema.Field... fields);
public static Schema recordOf(String name, Iterable<Schema.Field> fields);

/**
 * Create empty record schema (for forward references)
 * @param name Record name
 * @return Empty record schema
 */
public static Schema recordOf(String name);

/**
 * Create union schema
 * @param schemas Schemas to union
 * @return Union schema
 * @throws IllegalArgumentException if no schemas provided
 */
public static Schema unionOf(Schema... schemas);
public static Schema unionOf(Iterable<Schema> schemas);

/**
 * Create enum schema
 * @param values Enum values
 * @return Enum schema
 * @throws IllegalArgumentException if values are not unique or empty
 */
public static Schema enumWith(String... values);
public static Schema enumWith(Iterable<String> values);
public static Schema enumWith(Class<Enum<?>> enumClass);

Usage Examples:

// Array of strings
Schema stringArraySchema = Schema.arrayOf(Schema.of(Schema.Type.STRING));

// Map from string to int
Schema mapSchema = Schema.mapOf(
    Schema.of(Schema.Type.STRING), 
    Schema.of(Schema.Type.INT)
);

// Record schema
Schema personSchema = Schema.recordOf("Person",
    Schema.Field.of("name", Schema.of(Schema.Type.STRING)),
    Schema.Field.of("age", Schema.of(Schema.Type.INT)),
    Schema.Field.of("emails", Schema.arrayOf(Schema.of(Schema.Type.STRING)))
);

// Union schema
Schema stringOrInt = Schema.unionOf(
    Schema.of(Schema.Type.STRING),
    Schema.of(Schema.Type.INT)
);

// Enum schema
Schema statusSchema = Schema.enumWith("ACTIVE", "INACTIVE", "PENDING");
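enumWith is documented to throw IllegalArgumentException when the values are empty or not unique, and getEnumIndex/getEnumValue (listed under Schema Information Access) map between values and positions. The sketch below models that contract with a hypothetical EnumValues class, not CDAP code. It assumes that indices follow declaration order and that an unknown value yields -1; both are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the documented enumWith contract; not part of the CDAP API.
final class EnumValues {
  private final List<String> ordered;

  EnumValues(String... values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("No enum value provided");
    }
    Set<String> seen = new LinkedHashSet<>();
    for (String value : values) {
      if (!seen.add(value)) {
        throw new IllegalArgumentException("Duplicate enum value: " + value);
      }
    }
    // Assumption: the index of a value is its position in the declaration.
    this.ordered = Collections.unmodifiableList(new ArrayList<>(seen));
  }

  // Assumption: unknown values map to -1.
  int indexOf(String value) {
    return ordered.indexOf(value);
  }

  String valueAt(int idx) {
    return ordered.get(idx);
  }
}
```

With the statuses above, new EnumValues("ACTIVE", "INACTIVE", "PENDING").indexOf("PENDING") is 2 under this sketch's ordering assumption.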

Schema Parsing

Parse schemas from JSON and SQL-like string representations.

/**
 * Parse schema from JSON representation
 * @param schemaJson JSON string representation
 * @return Parsed schema
 * @throws IOException if parsing fails
 */
public static Schema parseJson(String schemaJson) throws IOException;

/**
 * Parse schema from reader containing JSON
 * @param reader Reader for JSON schema
 * @return Parsed schema
 * @throws IOException if parsing fails
 */
public static Schema parseJson(Reader reader) throws IOException;

/**
 * Parse schema from SQL-like representation
 * @param schemaString SQL-like schema string
 * @return Parsed schema
 * @throws IOException if parsing fails
 */
public static Schema parseSQL(String schemaString) throws IOException;

Usage Examples:

// Parse from JSON
String jsonSchema = "{\"type\":\"record\",\"name\":\"User\"," +
                   "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}," +
                   "{\"name\":\"name\",\"type\":\"string\"}]}";
Schema schema = Schema.parseJson(jsonSchema);

// Parse from SQL-like syntax
String sqlSchema = "id long, name string, active boolean";
Schema recordSchema = Schema.parseSQL(sqlSchema);

Schema Information Access

Access schema type information, structure details, and metadata.

/**
 * Get schema type
 * @return Schema type enum value
 */
public Schema.Type getType();

/**
 * Get logical type (if applicable)
 * @return Logical type or null
 */
public Schema.LogicalType getLogicalType();

// For ENUM schemas
public Set<String> getEnumValues();
public int getEnumIndex(String value);
public String getEnumValue(int idx);

// For ARRAY schemas
public Schema getComponentSchema();

// For MAP schemas  
public Map.Entry<Schema, Schema> getMapSchema();

// For RECORD schemas
public String getRecordName();
public List<Schema.Field> getFields();
public Schema.Field getField(String name);
public Schema.Field getField(String name, boolean ignoreCase);

// For UNION schemas
public List<Schema> getUnionSchemas();
public Schema getUnionSchema(int idx);

Usage Examples:

Schema recordSchema = Schema.recordOf("Person",
    Schema.Field.of("name", Schema.of(Schema.Type.STRING)),
    Schema.Field.of("age", Schema.of(Schema.Type.INT))
);

// Access record information
String recordName = recordSchema.getRecordName(); // "Person"
List<Schema.Field> fields = recordSchema.getFields();
Schema.Field nameField = recordSchema.getField("name");
Schema nameFieldSchema = nameField.getSchema(); // STRING schema

// Check schema properties
boolean isSimple = nameFieldSchema.getType().isSimpleType(); // true
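The getField(String name, boolean ignoreCase) overload is listed above without its matching rules. The sketch below shows one plausible reading using a hypothetical RecordFields class that stores field names mapped to type names. It assumes that an exact match wins and that the case-insensitive fallback returns the first match in declaration order. Neither rule is confirmed by this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a record's fields (name -> type name); not CDAP code.
final class RecordFields {
  private final Map<String, String> fields = new LinkedHashMap<>();

  RecordFields add(String name, String type) {
    fields.put(name, type);
    return this;
  }

  // Returns the field's type name, or null when no field matches.
  String getField(String name, boolean ignoreCase) {
    String exact = fields.get(name);
    if (exact != null || !ignoreCase) {
      return exact;
    }
    // Assumed fallback: first case-insensitive match in declaration order.
    for (Map.Entry<String, String> e : fields.entrySet()) {
      if (e.getKey().equalsIgnoreCase(name)) {
        return e.getValue();
      }
    }
    return null;
  }
}
```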

Schema Validation and Compatibility

Validate schema compatibility and check for nullable types.

/**
 * Check if this schema is compatible with target schema
 * @param target Target schema to check compatibility against
 * @return true if compatible, false otherwise
 */
public boolean isCompatible(Schema target);

/**
 * Check if schema is nullable (union of null and one other type)
 * @return true if nullable union, false otherwise
 */
public boolean isNullable();

/**
 * Check if schema is nullable simple type
 * @return true if nullable simple type, false otherwise
 */
public boolean isNullableSimple();

/**
 * Check if schema is simple or nullable simple type
 * @return true if simple or nullable simple, false otherwise
 */
public boolean isSimpleOrNullableSimple();

/**
 * Get non-null schema from nullable union
 * @return Non-null schema from union
 * @throws IllegalStateException if not a nullable union
 */
public Schema getNonNullable();

Usage Examples:

Schema intSchema = Schema.of(Schema.Type.INT);
Schema nullableInt = Schema.nullableOf(intSchema);
Schema longSchema = Schema.of(Schema.Type.LONG);

// Compatibility checking
boolean compatible = intSchema.isCompatible(longSchema); // true (int -> long)

// Nullable checking
boolean isNullable = nullableInt.isNullable(); // true
Schema nonNull = nullableInt.getNonNullable(); // returns intSchema
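The nullable helpers above define "nullable" as a union of null and exactly one other type, and getNonNullable is documented to throw IllegalStateException otherwise. The sketch below restates those rules on a hypothetical MiniSchema model in which types are plain strings. It is an illustration of the documented semantics, not CDAP's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature schema model: a named type, or "union" with members.
final class MiniSchema {
  final String type;               // e.g. "int", "null", or "union"
  final List<MiniSchema> members;  // non-empty only for unions

  private MiniSchema(String type, List<MiniSchema> members) {
    this.type = type;
    this.members = members;
  }

  static MiniSchema of(String type) {
    return new MiniSchema(type, List.of());
  }

  static MiniSchema unionOf(MiniSchema... schemas) {
    return new MiniSchema("union", Arrays.asList(schemas));
  }

  static MiniSchema nullableOf(MiniSchema schema) {
    return unionOf(schema, of("null"));
  }

  // Nullable: a two-member union in which exactly one member is null.
  boolean isNullable() {
    long nulls = members.stream().filter(m -> m.type.equals("null")).count();
    return type.equals("union") && members.size() == 2 && nulls == 1;
  }

  // Returns the non-null member of a nullable union.
  MiniSchema getNonNullable() {
    if (!isNullable()) {
      throw new IllegalStateException("Not a nullable union: " + type);
    }
    return members.get(0).type.equals("null") ? members.get(1) : members.get(0);
  }
}
```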

Schema Utilities

Generate schema hashes and string representations.

/**
 * Get MD5 hash of schema
 * @return Schema hash object
 */
public SchemaHash getSchemaHash();

/**
 * Get JSON string representation of schema
 * @return JSON representation
 */
public String toString();

Usage Examples:

Schema schema = Schema.of(Schema.Type.STRING);
SchemaHash hash = schema.getSchemaHash();
String hashString = hash.toString(); // Hex representation

String jsonRepresentation = schema.toString(); // "\"string\""
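SchemaHash is described as an MD5 hash whose toString() gives a hex representation. The sketch below shows only the MD5-plus-hex mechanics with the JDK's MessageDigest. It digests an arbitrary string, and it does not model which canonical bytes SchemaHash itself feeds into the digest. The Md5Hex class is hypothetical, and its lowercase hex output is this sketch's own choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration of MD5 digest + hex rendering; not the SchemaHash algorithm itself.
final class Md5Hex {
  // Returns the 16-byte MD5 digest of the UTF-8 bytes of `text`.
  static byte[] md5(String text) {
    try {
      return MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      // Java SE implementations are required to provide MD5.
      throw new IllegalStateException("MD5 unavailable", e);
    }
  }

  // Renders bytes as lowercase hex, two characters per byte.
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }
}
```

An MD5 digest is always 16 bytes, so its hex form is 32 characters.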

Types

Schema.Type Enum

public enum Schema.Type {
    // Simple/primitive types
    NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, BYTES, STRING,
    
    // Complex types  
    ENUM, ARRAY, MAP, RECORD, UNION;
    
    /**
     * Check if this type is a simple/primitive type
     * @return true if simple type, false if complex type
     */
    public boolean isSimpleType();
}

Schema.LogicalType Enum

public enum Schema.LogicalType {
    DATE,              // Based on INT (days since epoch)
    TIMESTAMP_MILLIS,  // Based on LONG (milliseconds since epoch)
    TIMESTAMP_MICROS,  // Based on LONG (microseconds since epoch) 
    TIME_MILLIS,       // Based on INT (milliseconds since midnight)
    TIME_MICROS;       // Based on LONG (microseconds since midnight)
    
    /**
     * Get string token for logical type
     * @return Token string
     */
    public String getToken();
    
    /**
     * Get logical type from token string
     * @param token Token string
     * @return Logical type
     * @throws IllegalArgumentException if unknown token
     */
    public static LogicalType fromToken(String token);
}
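The comments on each logical type name the underlying INT or LONG encoding. The sketch below derives those encodings from java.time values, which makes the units concrete (for example, 2000-01-01 is day 10957). The LogicalEncodings class and its method names are hypothetical. They show the arithmetic only and do not show how CDAP reads or writes these values.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helpers producing the INT/LONG encodings described above.
final class LogicalEncodings {
  // DATE: days since 1970-01-01 (INT).
  static int date(LocalDate d) {
    return Math.toIntExact(d.toEpochDay());
  }

  // TIMESTAMP_MILLIS: milliseconds since the epoch (LONG).
  static long timestampMillis(Instant t) {
    return t.toEpochMilli();
  }

  // TIMESTAMP_MICROS: microseconds since the epoch (LONG); sub-microsecond part dropped.
  static long timestampMicros(Instant t) {
    return ChronoUnit.MICROS.between(Instant.EPOCH, t);
  }

  // TIME_MILLIS: milliseconds since midnight (INT).
  static int timeMillis(LocalTime t) {
    return Math.toIntExact(t.toNanoOfDay() / 1_000_000L);
  }

  // TIME_MICROS: microseconds since midnight (LONG).
  static long timeMicros(LocalTime t) {
    return t.toNanoOfDay() / 1_000L;
  }
}
```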

Schema.Field Class

public static final class Schema.Field {
    /**
     * Create field with name and schema
     * @param name Field name
     * @param schema Field schema
     * @return Field instance
     */
    public static Field of(String name, Schema schema);
    
    /**
     * Get field name
     * @return Field name
     */
    public String getName();
    
    /**
     * Get field schema
     * @return Field schema
     */
    public Schema getSchema();
}

SchemaHash Class

public final class SchemaHash {
    /**
     * Create hash from schema
     * @param schema Schema to hash
     */
    public SchemaHash(Schema schema);
    
    /**
     * Create hash from byte buffer
     * @param bytes Byte buffer containing hash
     */
    public SchemaHash(ByteBuffer bytes);
    
    /**
     * Get raw hash bytes
     * @return Hash as byte array
     */
    public byte[] toByteArray();
    
    /**
     * Get hex string representation
     * @return Hex string of hash
     */
    public String toString();
}

Schema Compatibility Rules

Primitive Type Compatibility

  • int → long, float, double, string
  • long → float, double, string
  • float → double, string
  • double → string
  • boolean → string
  • null → null only
  • bytes → bytes only
  • string → string only
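The primitive rules above can be encoded as a switch on the target type: each type reads as itself, numeric types widen along int → long → float → double, and everything except null and bytes can be read as string. This is a minimal sketch on a hypothetical Primitive enum that mirrors the list. It is not CDAP's isCompatible implementation.

```java
// Hypothetical mirror of the primitive compatibility list; not CDAP code.
enum Primitive {
  NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, BYTES, STRING;

  // True if a value of this (source) type can be read as `target`.
  boolean isCompatible(Primitive target) {
    if (this == target) {
      return true;  // every type is compatible with itself
    }
    switch (target) {
      case LONG:
        return this == INT;
      case FLOAT:
        return this == INT || this == LONG;
      case DOUBLE:
        return this == INT || this == LONG || this == FLOAT;
      case STRING:
        return this != NULL && this != BYTES;
      default:
        return false;  // null, boolean, int, and bytes accept only themselves
    }
  }
}
```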

Complex Type Compatibility

  • Arrays: Compatible if component schemas are compatible
  • Maps: Compatible if both key and value schemas are compatible
  • Records: Compatible if all common fields (by name) have compatible schemas
  • Unions: Compatible if at least one schema pair between unions is compatible
  • Enums: Target enum must contain all values from source enum
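Two of the complex-type rules reduce to simple collection checks: enum compatibility is set containment, and record compatibility checks only the fields whose names appear on both sides. The sketch below expresses both in a hypothetical ComplexRules class. It models field types as type-name strings and takes the per-field check as a parameter, so any primitive rule can be plugged in.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch of the enum and record rules; not CDAP code.
final class ComplexRules {
  // Enums: the target must contain every value of the source.
  static boolean enumCompatible(Set<String> source, Set<String> target) {
    return target.containsAll(source);
  }

  // Records: only fields present in both (matched by name) are compared;
  // fields that exist on one side only are ignored.
  static boolean recordCompatible(Map<String, String> source, Map<String, String> target,
                                  BiPredicate<String, String> typeCompatible) {
    for (Map.Entry<String, String> field : source.entrySet()) {
      String targetType = target.get(field.getKey());
      if (targetType != null && !typeCompatible.test(field.getValue(), targetType)) {
        return false;
      }
    }
    return true;
  }
}
```

Because unmatched fields are skipped, two records with no field names in common are compatible under this rule.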

Union Compatibility

  • Union → Union: At least one pair of schemas must be compatible
  • Union → Non-union: At least one union schema must be compatible with target
  • Non-union → Union: Source must be compatible with at least one union schema
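If a non-union schema is treated as a one-element list, the three union cases above collapse into one check: some (source member, target member) pair must be compatible. The sketch below shows that reduction with a hypothetical UnionRules class and a pluggable member check. It also shows a consequence of the rule as stated: a union of string and int is accepted against long because the int branch widens, even though the string branch does not.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch of the union rules; a non-union side is a one-element list.
final class UnionRules {
  static <T> boolean compatible(List<T> from, List<T> to, BiPredicate<T, T> memberCompatible) {
    for (T source : from) {
      for (T target : to) {
        if (memberCompatible.test(source, target)) {
          return true;  // one compatible pair is enough under the stated rule
        }
      }
    }
    return false;
  }
}
```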

Install with Tessl CLI

npx tessl i tessl/maven-co-cask-cdap--cdap-api-common
