
tessl/maven-co-cask-cdap--cdap-api-common

Core API classes and utilities for CDAP application development, providing common data schema definitions, data format abstractions, stream event handling, and byte manipulation utilities


Structured Records

StructuredRecord instances are type-safe runtime data containers that conform to a defined schema. They are constructed through a builder and expose specialized accessors for date/time logical types. CDAP applications use them for data pipeline processing, validation, and type-safe data manipulation.

Capabilities

Record Creation

Create structured records using the builder pattern with schema validation.

/**
 * Create builder for constructing structured records
 * @param schema Record schema (must be RECORD type with at least one field)
 * @return Builder instance
 * @throws UnexpectedFormatException if schema is not a valid record schema
 */
public static StructuredRecord.Builder builder(Schema schema);

Usage Example:

Schema schema = Schema.recordOf("Person",
    Schema.Field.of("name", Schema.of(Schema.Type.STRING)),
    Schema.Field.of("age", Schema.of(Schema.Type.INT)),
    Schema.Field.of("active", Schema.of(Schema.Type.BOOLEAN))
);

StructuredRecord.Builder builder = StructuredRecord.builder(schema);

Data Access

Access field data from structured records with type safety and specialized accessors.

/**
 * Get schema of the record
 * @return Record schema
 */
public Schema getSchema();

/**
 * Get value of a field (generic accessor)
 * @param fieldName Field name
 * @param <T> Expected type of field value
 * @return Field value or null
 */
public <T> T get(String fieldName);

/**
 * Get LocalDate from DATE logical type field
 * @param fieldName Date field name
 * @return LocalDate value or null
 * @throws UnexpectedFormatException if field is not DATE logical type
 */
public LocalDate getDate(String fieldName);

/**
 * Get LocalTime from TIME logical type field
 * @param fieldName Time field name  
 * @return LocalTime value or null
 * @throws UnexpectedFormatException if field is not TIME_MILLIS or TIME_MICROS
 */
public LocalTime getTime(String fieldName);

/**
 * Get ZonedDateTime with UTC timezone from TIMESTAMP logical type field
 * @param fieldName Timestamp field name
 * @return ZonedDateTime value or null
 * @throws UnexpectedFormatException if field is not TIMESTAMP logical type
 */
public ZonedDateTime getTimestamp(String fieldName);

/**
 * Get ZonedDateTime with specified timezone from TIMESTAMP logical type field
 * @param fieldName Timestamp field name
 * @param zoneId Timezone for result
 * @return ZonedDateTime value or null
 * @throws UnexpectedFormatException if field is not TIMESTAMP logical type
 */
public ZonedDateTime getTimestamp(String fieldName, ZoneId zoneId);

Usage Examples:

// Create record with date/time fields
Schema schema = Schema.recordOf("Event",
    Schema.Field.of("name", Schema.of(Schema.Type.STRING)),
    Schema.Field.of("count", Schema.of(Schema.Type.INT)),
    Schema.Field.of("eventDate", Schema.of(Schema.LogicalType.DATE)),
    Schema.Field.of("timestamp", Schema.of(Schema.LogicalType.TIMESTAMP_MILLIS))
);

StructuredRecord record = StructuredRecord.builder(schema)
    .set("name", "user_signup")
    .set("count", 42)
    .setDate("eventDate", LocalDate.of(2023, 6, 15))
    .setTimestamp("timestamp", ZonedDateTime.now())
    .build();

// Access field data
String name = record.get("name");                    // "user_signup"
Integer count = record.get("count");                 // 42
LocalDate eventDate = record.getDate("eventDate");  // 2023-06-15
ZonedDateTime timestamp = record.getTimestamp("timestamp");

// Access with specific timezone
ZonedDateTime pstTime = record.getTimestamp("timestamp", 
    ZoneId.of("America/Los_Angeles"));
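
The two getTimestamp variants describe the same instant with different wall-clock fields. The sketch below illustrates that with plain java.time, assuming the stored value is a long of epoch milliseconds (as listed for TIMESTAMP_MILLIS under Date/Time Type Storage). The class and method names are hypothetical and are not part of the CDAP API.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper, not a CDAP class: shows UTC vs. zone-specific views of one instant.
class TimestampZoneSketch {
    // Assumption: a TIMESTAMP_MILLIS value is a long holding epoch milliseconds.
    static ZonedDateTime utcFromEpochMillis(long epochMillis) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
    }

    // Same instant, expressed with the wall-clock fields of the requested zone.
    static ZonedDateTime inZone(long epochMillis, ZoneId zone) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        long stored = 1686830400000L; // 2023-06-15T12:00:00Z
        ZonedDateTime utc = utcFromEpochMillis(stored);
        ZonedDateTime la = inZone(stored, ZoneId.of("America/Los_Angeles"));
        System.out.println(utc); // 2023-06-15T12:00Z
        System.out.println(la);  // 2023-06-15T05:00-07:00[America/Los_Angeles]
        System.out.println(utc.toInstant().equals(la.toInstant())); // true
    }
}
```

Choosing a zone only changes how the value is presented; comparisons should be made on the instant, not on the local fields.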

Builder Pattern

Basic Field Assignment

Set field values with type checking and nullable field handling.

/**
 * Set field to given value
 * @param fieldName Field name (must exist in schema)
 * @param value Field value (type must match schema)
 * @return Builder instance for chaining
 * @throws UnexpectedFormatException if field not in schema or invalid value
 */
public StructuredRecord.Builder set(String fieldName, Object value);

Usage Examples:

Schema schema = Schema.recordOf("Product",
    Schema.Field.of("id", Schema.of(Schema.Type.LONG)),
    Schema.Field.of("name", Schema.of(Schema.Type.STRING)),
    Schema.Field.of("price", Schema.nullableOf(Schema.of(Schema.Type.DOUBLE))),
    Schema.Field.of("active", Schema.of(Schema.Type.BOOLEAN))
);

StructuredRecord product = StructuredRecord.builder(schema)
    .set("id", 12345L)
    .set("name", "Widget")
    .set("price", 29.99)           // Can be null for nullable fields
    .set("active", true)
    .build();

// Nullable field example
StructuredRecord productNoPrice = StructuredRecord.builder(schema)
    .set("id", 67890L)
    .set("name", "Gadget") 
    .set("price", null)            // Explicitly set null
    .set("active", false)
    .build();

Date and Time Field Assignment

Set date and time fields using java.time types (LocalDate, LocalTime, ZonedDateTime); values are converted automatically to the underlying storage format.

/**
 * Set DATE logical type field
 * @param fieldName Field name (must be DATE logical type)
 * @param localDate Date value
 * @return Builder instance
 * @throws UnexpectedFormatException if field is not DATE type or date too large
 */
public StructuredRecord.Builder setDate(String fieldName, LocalDate localDate);

/**
 * Set TIME logical type field  
 * @param fieldName Field name (must be TIME_MILLIS or TIME_MICROS)
 * @param localTime Time value
 * @return Builder instance
 * @throws UnexpectedFormatException if field is not TIME type or time too large
 */
public StructuredRecord.Builder setTime(String fieldName, LocalTime localTime);

/**
 * Set TIMESTAMP logical type field
 * @param fieldName Field name (must be TIMESTAMP_MILLIS or TIMESTAMP_MICROS)
 * @param zonedDateTime Timestamp value
 * @return Builder instance
 * @throws UnexpectedFormatException if field is not TIMESTAMP type or timestamp too large
 */
public StructuredRecord.Builder setTimestamp(String fieldName, ZonedDateTime zonedDateTime);

Usage Examples:

Schema eventSchema = Schema.recordOf("Event",
    Schema.Field.of("eventDate", Schema.of(Schema.LogicalType.DATE)),
    Schema.Field.of("eventTime", Schema.of(Schema.LogicalType.TIME_MILLIS)),
    Schema.Field.of("timestamp", Schema.of(Schema.LogicalType.TIMESTAMP_MILLIS))
);

LocalDate today = LocalDate.now();
LocalTime noon = LocalTime.of(12, 0, 0);
ZonedDateTime now = ZonedDateTime.now();

StructuredRecord event = StructuredRecord.builder(eventSchema)
    .setDate("eventDate", today)
    .setTime("eventTime", noon)
    .setTimestamp("timestamp", now)
    .build();

// Nullable date/time fields
Schema nullableEventSchema = Schema.recordOf("Event",
    Schema.Field.of("optionalDate", Schema.nullableOf(Schema.of(Schema.LogicalType.DATE)))
);

StructuredRecord eventWithNull = StructuredRecord.builder(nullableEventSchema)
    .setDate("optionalDate", null)  // Null value for nullable field
    .build();

String Conversion and Legacy Date Support

Convert string values to appropriate field types and handle legacy Date objects.

/**
 * Convert string to field type and set value
 * @param fieldName Field name
 * @param strVal String value to convert
 * @return Builder instance
 * @throws UnexpectedFormatException if conversion fails or field invalid
 */
public StructuredRecord.Builder convertAndSet(String fieldName, String strVal);

/**
 * Convert Date to field type and set value (deprecated)
 * @param fieldName Field name  
 * @param date Date value
 * @return Builder instance
 * @throws UnexpectedFormatException if conversion fails
 * @deprecated Use setDate, setTime, setTimestamp instead
 */
@Deprecated
public StructuredRecord.Builder convertAndSet(String fieldName, Date date);

/**
 * Convert Date with format to field type and set value (deprecated)
 * @param fieldName Field name
 * @param date Date value
 * @param dateFormat Format for string conversion
 * @return Builder instance  
 * @throws UnexpectedFormatException if conversion fails
 * @deprecated Use setDate, setTime, setTimestamp instead
 */
@Deprecated
public StructuredRecord.Builder convertAndSet(String fieldName, Date date, DateFormat dateFormat);

Usage Examples:

Schema schema = Schema.recordOf("Data",
    Schema.Field.of("id", Schema.of(Schema.Type.LONG)),
    Schema.Field.of("score", Schema.of(Schema.Type.DOUBLE)),
    Schema.Field.of("active", Schema.of(Schema.Type.BOOLEAN)),
    Schema.Field.of("name", Schema.of(Schema.Type.STRING))
);

// String conversion automatically handles type conversion
StructuredRecord record = StructuredRecord.builder(schema)
    .convertAndSet("id", "12345")          // String "12345" -> Long 12345
    .convertAndSet("score", "98.5")        // String "98.5" -> Double 98.5
    .convertAndSet("active", "true")       // String "true" -> Boolean true
    .convertAndSet("name", "John Doe")     // String -> String (no conversion)
    .build();

// Nullable field string conversion
Schema nullableSchema = Schema.recordOf("Data",
    Schema.Field.of("optionalValue", Schema.nullableOf(Schema.of(Schema.Type.INT)))
);

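StructuredRecord withNull = StructuredRecord.builder(nullableSchema)
    // The cast picks the String overload; a bare null is ambiguous with convertAndSet(String, Date)
    .convertAndSet("optionalValue", (String) null)  // null string -> null value
    .build();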

Record Finalization

Build the final structured record with validation.

/**
 * Build final StructuredRecord with validation
 * @return Completed StructuredRecord
 * @throws UnexpectedFormatException if non-nullable fields missing values
 */
public StructuredRecord build();

Usage Example:

Schema schema = Schema.recordOf("User",
    Schema.Field.of("id", Schema.of(Schema.Type.LONG)),              // Required
    Schema.Field.of("name", Schema.of(Schema.Type.STRING)),          // Required  
    Schema.Field.of("email", Schema.nullableOf(Schema.of(Schema.Type.STRING))) // Optional
);

// Valid record - all required fields set
StructuredRecord validUser = StructuredRecord.builder(schema)
    .set("id", 123L)
    .set("name", "Alice")
    // email not set, but nullable so gets null value
    .build();

// Invalid record - missing required field will throw exception
try {
    StructuredRecord invalidUser = StructuredRecord.builder(schema)
        .set("id", 123L)
        // Missing required "name" field
        .build(); // Throws UnexpectedFormatException
} catch (UnexpectedFormatException e) {
    // Handle validation error
}

Type Conversion Rules

String to Type Conversion

The convertAndSet(String fieldName, String strVal) method supports automatic conversion:

  • BOOLEAN: Boolean.parseBoolean(strVal)
  • INT: Integer.parseInt(strVal)
  • LONG: Long.parseLong(strVal)
  • FLOAT: Float.parseFloat(strVal)
  • DOUBLE: Double.parseDouble(strVal)
  • BYTES: Bytes.toBytesBinary(strVal) (binary-escaped format)
  • STRING: No conversion (direct assignment)
  • NULL: Always returns null
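
The rules above can be sketched as a plain switch over the target type. This is a minimal illustration using only JDK parse methods; SimpleType is a hypothetical stand-in for Schema.Type, BYTES is left out because it relies on the CDAP Bytes utility, and how the real method reports a failed parse is not shown here.

```java
// Hypothetical sketch, not the CDAP implementation.
class StringConversionSketch {
    // Stand-in for Schema.Type, limited to the simple types listed above.
    enum SimpleType { BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING, NULL }

    static Object convert(SimpleType type, String strVal) {
        switch (type) {
            case BOOLEAN: return Boolean.parseBoolean(strVal);
            case INT:     return Integer.parseInt(strVal);
            case LONG:    return Long.parseLong(strVal);
            case FLOAT:   return Float.parseFloat(strVal);
            case DOUBLE:  return Double.parseDouble(strVal);
            case STRING:  return strVal;   // no conversion
            case NULL:    return null;     // input is ignored
            default: throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert(SimpleType.LONG, "12345"));   // 12345
        System.out.println(convert(SimpleType.DOUBLE, "98.5"));  // 98.5
        // Boolean.parseBoolean never throws: anything other than "true" (any case) is false.
        System.out.println(convert(SimpleType.BOOLEAN, "yes"));  // false
    }
}
```

The numeric parse methods throw NumberFormatException on malformed input such as "12.5" for INT, while Boolean.parseBoolean silently yields false, so boolean inputs may deserve validation before conversion.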

Nullable Field Handling

  • Nullable fields (union with null) accept null values
  • Non-nullable fields throw UnexpectedFormatException for null values
  • Missing fields in build() get null for nullable fields or throw exception for required fields
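
The build-time behavior described above can be sketched with ordinary maps. This is a simplified model, not the CDAP implementation: the schema is reduced to a hypothetical map from field name to a nullable flag, and IllegalStateException stands in for UnexpectedFormatException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the missing-field check performed when a record is built.
class NullableBuildSketch {
    // nullableByField: field name -> true if the field is nullable (stand-in for a record schema).
    // values: fields that were explicitly set on the builder.
    static Map<String, Object> build(Map<String, Boolean> nullableByField, Map<String, Object> values) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> field : nullableByField.entrySet()) {
            Object value = values.get(field.getKey());
            if (value == null && !field.getValue()) {
                // Stand-in for UnexpectedFormatException.
                throw new IllegalStateException("Field " + field.getKey() + " must contain a value.");
            }
            result.put(field.getKey(), value); // unset nullable fields end up as null
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> schema = new LinkedHashMap<>();
        schema.put("id", false);    // required
        schema.put("email", true);  // nullable

        Map<String, Object> values = new LinkedHashMap<>();
        values.put("id", 123L);
        System.out.println(build(schema, values)); // {id=123, email=null}
    }
}
```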

Date/Time Type Storage

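  • DATE: Stored as INT (days since Unix epoch; a signed 32-bit day count spans roughly 5.8 million years on either side of 1970, so the year-2038 limit of 32-bit second counters does not apply)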
  • TIME_MILLIS: Stored as INT (milliseconds since midnight)
  • TIME_MICROS: Stored as LONG (microseconds since midnight)
  • TIMESTAMP_MILLIS: Stored as LONG (milliseconds since Unix epoch)
  • TIMESTAMP_MICROS: Stored as LONG (microseconds since Unix epoch)
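
To make the storage forms concrete, the following sketch derives each numeric value from its java.time counterpart using only the JDK. It follows the units listed above; the helper names are hypothetical, and whether the library rounds or truncates sub-unit precision is an assumption here (this sketch truncates).

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.concurrent.TimeUnit;

// Hypothetical helpers, not CDAP classes: java.time values to their stored numeric forms.
class LogicalTypeStorageSketch {
    static int dateToDays(LocalDate date) {                 // DATE -> INT
        return Math.toIntExact(date.toEpochDay());
    }
    static int timeToMillis(LocalTime time) {               // TIME_MILLIS -> INT
        return Math.toIntExact(TimeUnit.NANOSECONDS.toMillis(time.toNanoOfDay()));
    }
    static long timeToMicros(LocalTime time) {              // TIME_MICROS -> LONG
        return TimeUnit.NANOSECONDS.toMicros(time.toNanoOfDay());
    }
    static long timestampToMillis(ZonedDateTime ts) {       // TIMESTAMP_MILLIS -> LONG
        return ts.toInstant().toEpochMilli();
    }
    static long timestampToMicros(ZonedDateTime ts) {       // TIMESTAMP_MICROS -> LONG
        Instant instant = ts.toInstant();
        long wholeSeconds = Math.multiplyExact(instant.getEpochSecond(), 1_000_000L);
        return Math.addExact(wholeSeconds, instant.getNano() / 1_000L);
    }

    public static void main(String[] args) {
        ZonedDateTime ts = ZonedDateTime.of(2023, 6, 15, 12, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(dateToDays(LocalDate.of(2023, 6, 15))); // 19523
        System.out.println(timeToMillis(LocalTime.of(12, 0)));     // 43200000
        System.out.println(timeToMicros(LocalTime.of(12, 0)));     // 43200000000
        System.out.println(timestampToMillis(ts));                 // 1686830400000
        System.out.println(timestampToMicros(ts));                 // 1686830400000000
    }
}
```

The zone of the ZonedDateTime does not affect the stored timestamp, because the conversion goes through the instant.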

Validation and Error Handling

Field Validation

  • Field names must exist in the record schema
  • Field values must be compatible with schema types
  • Null values only allowed for nullable (union with null) fields
  • Date/time values must fit within storage type ranges
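
The range rule in the last bullet can be illustrated for DATE fields, whose day count must fit in a 32-bit INT. The sketch below is an assumption about how such a check might look, not the library's code: the method name is hypothetical and IllegalArgumentException stands in for UnexpectedFormatException.

```java
import java.time.LocalDate;

// Hypothetical range check for a DATE field stored as an INT day count.
class DateRangeSketch {
    // Returns days since the Unix epoch, or throws if the count overflows 32 bits.
    static int toEpochDayInt(String fieldName, LocalDate date) {
        try {
            return Math.toIntExact(date.toEpochDay());
        } catch (ArithmeticException e) {
            // Stand-in for UnexpectedFormatException.
            throw new IllegalArgumentException(
                "Field " + fieldName + " was set to a date that is too large.", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toEpochDayInt("eventDate", LocalDate.of(2023, 6, 15))); // 19523
        try {
            toEpochDayInt("eventDate", LocalDate.MAX); // far beyond the INT range
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```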

Common Exceptions

// Thrown when field not in schema
throw new UnexpectedFormatException("field " + fieldName + " is not in the schema.");

// Thrown when setting null to non-nullable field  
throw new UnexpectedFormatException("field " + fieldName + " cannot be set to a null value.");

// Thrown when required field missing in build()
throw new UnexpectedFormatException("Field " + fieldName + " must contain a value.");

// Thrown when date/time value too large for storage type
throw new UnexpectedFormatException("Field " + fieldName + " was set to a date that is too large.");

Performance Considerations

  • StructuredRecord instances are immutable after construction
  • Builder pattern allows efficient field-by-field construction
  • Schema validation occurs during builder operations, not at record access time
  • Date/time setters convert java.time values to the stored numeric form; getTimestamp returns UTC unless a ZoneId is supplied
  • Field access by name uses efficient hash-based lookup

Install with Tessl CLI

npx tessl i tessl/maven-co-cask-cdap--cdap-api-common
