Low-level unsafe operations and optimized data structures for Apache Spark's internal memory management and performance-critical operations.
Specialized data types including calendar intervals, byte array utilities, bitset operations, and date/time constants for temporal calculations. These utilities provide foundational support for time-based operations, efficient bit manipulation, and data type conversions commonly needed in data processing applications.
Calendar interval representation supporting months, days, and microseconds for time-based calculations and date arithmetic in data processing scenarios.
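The three components are kept separate because a month or a day has no fixed length in microseconds. The sketch below shows one way such an interval could be applied to a timestamp using only `java.time`. `IntervalSketch` and `addTo` are hypothetical names chosen for illustration; they are not part of the Spark API.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.Period;
import java.time.temporal.ChronoUnit;

// Hypothetical stand-in for an interval with months, days, and microseconds.
final class IntervalSketch {
    final int months;
    final int days;
    final long microseconds;

    IntervalSketch(int months, int days, long microseconds) {
        this.months = months;
        this.days = days;
        this.microseconds = microseconds;
    }

    // Add the calendar part first (months, then days), then the exact time part.
    LocalDateTime addTo(LocalDateTime start) {
        Period calendarPart = Period.of(0, months, days);
        Duration timePart = Duration.of(microseconds, ChronoUnit.MICROS);
        return start.plus(calendarPart).plus(timePart);
    }
}
```

Under these assumptions, adding one month to 2024-01-31 lands on 2024-02-29, while adding 30 days lands on 2024-03-01, which is why the two units cannot be folded into each other.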
public final class CalendarInterval implements Serializable {
public final int months;
public final int days;
public final long microseconds;
public CalendarInterval(int months, int days, long microseconds);
public boolean equals(Object o);
public int hashCode();
public String toString();
public Period extractAsPeriod();
public Duration extractAsDuration();
}
Usage Examples:
// Create calendar intervals
CalendarInterval interval1 = new CalendarInterval(2, 15, 3600000000L); // 2 months, 15 days, 1 hour
CalendarInterval interval2 = new CalendarInterval(0, 7, 0); // 1 week
CalendarInterval interval3 = new CalendarInterval(12, 0, 0); // 1 year
// Access interval components
int months = interval1.months; // 2
int days = interval1.days; // 15
long micros = interval1.microseconds; // 3600000000 (1 hour in microseconds)
// String representation
String description = interval1.toString(); // e.g. "2 months 15 days 1 hours" in recent Spark versions (the time part is rendered in units, not raw microseconds)
// Equality and hashing
boolean equal = interval1.equals(interval2); // false
int hash = interval1.hashCode();
// Convert to Java 8 time types
Period period = interval1.extractAsPeriod(); // P2M15D
Duration duration = interval1.extractAsDuration(); // PT1H
Comprehensive byte array manipulation utilities with sorting support, SQL-style operations, and padding functionality for text and binary data processing.
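To make the SQL-style conventions concrete, here is a minimal plain-Java sketch of a 1-based substring and a left pad that repeats its pad pattern. `ByteOpsSketch`, `substring1Based`, and `leftPad` are hypothetical helpers written for this illustration. They simplify edge cases (negative positions, for instance, are just clamped to the start) and should not be read as Spark's implementation.

```java
import java.util.Arrays;

final class ByteOpsSketch {
    // 1-based substring: pos 1 is the first byte; the range is clamped to the array bounds.
    static byte[] substring1Based(byte[] bytes, int pos, int len) {
        int start = Math.max(pos - 1, 0);
        int end = (int) Math.min((long) start + Math.max(len, 0), bytes.length);
        if (start >= end) {
            return new byte[0];
        }
        return Arrays.copyOfRange(bytes, start, end);
    }

    // Left-pad to exactly len bytes by cycling through the pad pattern;
    // input that is already long enough is truncated to len bytes.
    static byte[] leftPad(byte[] bytes, int len, byte[] pad) {
        if (len <= 0) {
            return new byte[0];
        }
        if (bytes.length >= len || pad.length == 0) {
            return Arrays.copyOf(bytes, Math.min(bytes.length, len));
        }
        byte[] result = new byte[len];
        int padLen = len - bytes.length;
        for (int i = 0; i < padLen; i++) {
            result[i] = pad[i % pad.length];
        }
        System.arraycopy(bytes, 0, result, padLen, bytes.length);
        return result;
    }
}
```

In this sketch a multi-byte pad is cycled from its first byte, so padding "Hi" to five bytes with "ab" produces "abaHi".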
public final class ByteArray {
public static final byte[] EMPTY_BYTE;
public static void writeToMemory(byte[] src, Object target, long targetOffset);
public static long getPrefix(byte[] bytes);
public static int compareBinary(byte[] leftBase, byte[] rightBase);
public static byte[] subStringSQL(byte[] bytes, int pos, int len);
public static byte[] concat(byte[]... inputs);
public static byte[] lpad(byte[] bytes, int len, byte[] pad);
public static byte[] rpad(byte[] bytes, int len, byte[] pad);
}
Usage Examples:
// Empty byte array constant
byte[] empty = ByteArray.EMPTY_BYTE; // []
// Write bytes to memory
byte[] source = "Hello World".getBytes();
byte[] target = new byte[source.length];
ByteArray.writeToMemory(source, target, Platform.BYTE_ARRAY_OFFSET);
// Get sorting prefix (first 8 bytes as long)
byte[] data = "This is a long string for testing".getBytes();
long prefix = ByteArray.getPrefix(data); // First 8 bytes as long for sorting
// Binary comparison
byte[] array1 = "apple".getBytes();
byte[] array2 = "banana".getBytes();
int comparison = ByteArray.compareBinary(array1, array2); // negative (apple < banana)
// SQL-style substring (1-indexed)
byte[] text = "Hello World".getBytes();
byte[] substring = ByteArray.subStringSQL(text, 7, 5); // "World" (start at pos 7, length 5)
// Concatenation
byte[] part1 = "Hello".getBytes();
byte[] part2 = " ".getBytes();
byte[] part3 = "World".getBytes();
byte[] combined = ByteArray.concat(part1, part2, part3); // "Hello World"
// Padding operations
byte[] shortText = "Hi".getBytes();
byte[] leftPadded = ByteArray.lpad(shortText, 10, "*".getBytes()); // "********Hi"
byte[] rightPadded = ByteArray.rpad(shortText, 10, "-".getBytes()); // "Hi--------"
High-performance bitset operations on fixed-size uncompressed bitsets with word-aligned data, optimized for columnar data processing and efficient bit manipulation.
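The word-aligned layout means bit `i` lives in 64-bit word `i / 64` at bit position `i % 64`. The following sketch shows that addressing scheme on a plain `long[]`, without any off-heap access. `WordBitSetSketch` is a hypothetical class for illustration and is not the Spark implementation.

```java
final class WordBitSetSketch {
    // index >> 6 selects the 64-bit word; index & 63 selects the bit within it.
    static void set(long[] words, int index) {
        words[index >> 6] |= 1L << (index & 63);
    }

    static void unset(long[] words, int index) {
        words[index >> 6] &= ~(1L << (index & 63));
    }

    static boolean isSet(long[] words, int index) {
        return (words[index >> 6] & (1L << (index & 63))) != 0;
    }

    // Returns the first set bit at or after fromIndex, or -1 if there is none.
    static int nextSetBit(long[] words, int fromIndex) {
        int wordIndex = fromIndex >> 6;
        if (wordIndex >= words.length) {
            return -1;
        }
        // Mask off the bits below fromIndex in the first word examined.
        long word = words[wordIndex] & (-1L << (fromIndex & 63));
        while (true) {
            if (word != 0) {
                return (wordIndex << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++wordIndex == words.length) {
                return -1;
            }
            word = words[wordIndex];
        }
    }
}
```

Scanning a word at a time with `Long.numberOfTrailingZeros` lets the search skip 64 unset bits per step, which is the main benefit of keeping the data word-aligned.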
public final class BitSetMethods {
public static void set(Object baseObject, long baseOffset, int index);
public static void unset(Object baseObject, long baseOffset, int index);
public static boolean isSet(Object baseObject, long baseOffset, int index);
public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInWords);
public static int nextSetBit(Object baseObject, long baseOffset, int fromIndex, int bitsetSizeInWords);
}
Usage Examples:
// Create bitset storage (4 words x 64 bits per word)
long[] bitsetStorage = new long[4]; // 256 bits total
Object baseObj = bitsetStorage;
long baseOffset = Platform.LONG_ARRAY_OFFSET;
// Set individual bits
BitSetMethods.set(baseObj, baseOffset, 0); // Set bit 0
BitSetMethods.set(baseObj, baseOffset, 5); // Set bit 5
BitSetMethods.set(baseObj, baseOffset, 64); // Set bit 64 (second word)
// Check bits
boolean bit0Set = BitSetMethods.isSet(baseObj, baseOffset, 0); // true
boolean bit1Set = BitSetMethods.isSet(baseObj, baseOffset, 1); // false
boolean bit5Set = BitSetMethods.isSet(baseObj, baseOffset, 5); // true
// Unset bits
BitSetMethods.unset(baseObj, baseOffset, 5);
boolean bit5After = BitSetMethods.isSet(baseObj, baseOffset, 5); // false
// Check if any bits are set
boolean hasAnyBits = BitSetMethods.anySet(baseObj, baseOffset, 4); // true (bit 0 and 64 still set)
// Find next set bit
int nextBit = BitSetMethods.nextSetBit(baseObj, baseOffset, 0, 4); // 0 (first set bit)
int afterBit0 = BitSetMethods.nextSetBit(baseObj, baseOffset, 1, 4); // 64 (next set bit after 0)
Comprehensive date and time constants for calendar calculations, time unit conversions, and temporal arithmetic operations.
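Most of these constants are products of smaller units, and the larger ones only fit in a `long`. The sketch below rebuilds a few of them locally to show the derivation and the overflow that `int` arithmetic would cause. `TimeConstantsSketch` is a hypothetical class that mirrors the values from this section and does not reference the Spark class.

```java
final class TimeConstantsSketch {
    // Base units.
    static final long SECONDS_PER_MINUTE = 60L;
    static final long MINUTES_PER_HOUR = 60L;
    static final long HOURS_PER_DAY = 24L;
    static final long MILLIS_PER_SECOND = 1000L;
    static final long MICROS_PER_MILLIS = 1000L;

    // Derived constants: each one is a product of smaller units.
    static final long SECONDS_PER_HOUR = MINUTES_PER_HOUR * SECONDS_PER_MINUTE;
    static final long SECONDS_PER_DAY = HOURS_PER_DAY * SECONDS_PER_HOUR;
    static final long MICROS_PER_SECOND = MILLIS_PER_SECOND * MICROS_PER_MILLIS;
    static final long MICROS_PER_HOUR = SECONDS_PER_HOUR * MICROS_PER_SECOND;
    static final long MICROS_PER_DAY = SECONDS_PER_DAY * MICROS_PER_SECOND;

    // The same product in int arithmetic wraps around, because
    // 3,600,000,000 is larger than Integer.MAX_VALUE.
    static int microsPerHourAsInt() {
        int microsPerSecond = 1_000_000;
        return 60 * 60 * microsPerSecond;
    }
}
```

Keeping at least one operand typed as `long` in every conversion avoids this silent wraparound.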
public class DateTimeConstants {
// Calendar constants
public static final int MONTHS_PER_YEAR = 12;
public static final byte DAYS_PER_WEEK = 7;
public static final long HOURS_PER_DAY = 24L;
public static final long MINUTES_PER_HOUR = 60L;
public static final long SECONDS_PER_MINUTE = 60L;
// Computed time constants
public static final long SECONDS_PER_HOUR; // 3600
public static final long SECONDS_PER_DAY; // 86400
// Millisecond conversions
public static final long MILLIS_PER_SECOND = 1000L;
public static final long MILLIS_PER_MINUTE; // 60000
public static final long MILLIS_PER_HOUR; // 3600000
public static final long MILLIS_PER_DAY; // 86400000
// Microsecond conversions
public static final long MICROS_PER_MILLIS = 1000L;
public static final long MICROS_PER_SECOND; // 1000000
public static final long MICROS_PER_MINUTE; // 60000000
public static final long MICROS_PER_HOUR; // 3600000000
public static final long MICROS_PER_DAY; // 86400000000
// Nanosecond conversions
public static final long NANOS_PER_MICROS = 1000L;
public static final long NANOS_PER_MILLIS; // 1000000
public static final long NANOS_PER_SECOND; // 1000000000
}
Usage Examples:
// Time unit calculations
long hoursInWeek = DateTimeConstants.DAYS_PER_WEEK * DateTimeConstants.HOURS_PER_DAY; // 168
long secondsInDay = DateTimeConstants.SECONDS_PER_DAY; // 86400
// Convert between time units
long currentTimeMillis = System.currentTimeMillis();
long currentTimeMicros = currentTimeMillis * DateTimeConstants.MICROS_PER_MILLIS;
long currentTimeNanos = currentTimeMillis * DateTimeConstants.NANOS_PER_MILLIS;
// Duration calculations
long durationMinutes = 45;
long durationMillis = durationMinutes * DateTimeConstants.MILLIS_PER_MINUTE; // 2700000
long durationMicros = durationMinutes * DateTimeConstants.MICROS_PER_MINUTE; // 2700000000
// Calendar arithmetic
int totalDaysInYear = DateTimeConstants.MONTHS_PER_YEAR * 30; // Approximate: 360 days, assuming 30-day months
long totalSecondsInYear = totalDaysInYear * DateTimeConstants.SECONDS_PER_DAY;
// Timestamp precision conversion
long timestampNanos = 1234567890123456789L;
long timestampMicros = timestampNanos / DateTimeConstants.NANOS_PER_MICROS;
long timestampMillis = timestampNanos / DateTimeConstants.NANOS_PER_MILLIS;
long timestampSeconds = timestampNanos / DateTimeConstants.NANOS_PER_SECOND;
Common patterns for working with temporal data using the date/time constants and calendar intervals in data processing scenarios.
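One such pattern is grouping events into fixed-width time buckets. The sketch below floors each timestamp to its bucket start and counts events per bucket. `BucketSketch` and its methods are hypothetical, and the minute constant is redeclared locally with the value given in this section so the example runs without Spark on the classpath.

```java
import java.util.Map;
import java.util.TreeMap;

final class BucketSketch {
    // Local copy of the value of MICROS_PER_MINUTE (an assumption made for self-containment).
    static final long MICROS_PER_MINUTE = 60_000_000L;

    // Floor a timestamp to the start of its bucket. Math.floorDiv rounds toward
    // negative infinity, so timestamps before the epoch land in the right bucket.
    static long bucketStart(long timestampMicros, long bucketMicros) {
        return Math.floorDiv(timestampMicros, bucketMicros) * bucketMicros;
    }

    // Count events per bucket, keyed by bucket start and kept in time order.
    static Map<Long, Integer> countPerBucket(long[] timestampsMicros, long bucketMicros) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMicros) {
            counts.merge(bucketStart(ts, bucketMicros), 1, Integer::sum);
        }
        return counts;
    }
}
```

Plain `/` truncates toward zero, which would place a timestamp of -1 microsecond in the bucket starting at 0 rather than the one starting a minute earlier; `Math.floorDiv` avoids that.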
Temporal Aggregation Example:
// Process time-series data with different granularities
public class TemporalProcessor {
public long roundToMinute(long timestampMicros) {
// floorDiv rounds toward negative infinity, so pre-epoch timestamps also round down
long microsPerMinute = DateTimeConstants.MICROS_PER_MINUTE;
return Math.floorDiv(timestampMicros, microsPerMinute) * microsPerMinute;
}
public long roundToHour(long timestampMicros) {
long microsPerHour = DateTimeConstants.MICROS_PER_HOUR;
return Math.floorDiv(timestampMicros, microsPerHour) * microsPerHour;
}
public long roundToDay(long timestampMicros) {
long microsPerDay = DateTimeConstants.MICROS_PER_DAY;
return Math.floorDiv(timestampMicros, microsPerDay) * microsPerDay;
}
public CalendarInterval calculateAge(long birthTimeMicros, long currentTimeMicros) {
long ageMicros = currentTimeMicros - birthTimeMicros;
// Convert to approximate months (30 days each)
long microsPerMonth = 30L * DateTimeConstants.MICROS_PER_DAY;
int months = (int) (ageMicros / microsPerMonth);
ageMicros %= microsPerMonth;
// Convert remaining to days
int days = (int) (ageMicros / DateTimeConstants.MICROS_PER_DAY);
ageMicros %= DateTimeConstants.MICROS_PER_DAY;
return new CalendarInterval(months, days, ageMicros);
}
}
Advanced binary data processing using byte array utilities for data serialization, comparison, and formatting operations.
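Comparison by sort prefix relies on packing the leading bytes into a `long` so that an unsigned numeric comparison agrees with byte-wise ordering. The sketch below illustrates the idea in plain Java. `SortPrefixSketch` is a hypothetical class, and the packing shown here is an assumption for illustration, not a copy of Spark's code.

```java
final class SortPrefixSketch {
    // Pack up to the first 8 bytes into a long, most significant byte first,
    // padding short inputs with zero bytes on the right.
    static long prefix(byte[] bytes) {
        long result = 0L;
        for (int i = 0; i < 8; i++) {
            long b = i < bytes.length ? (bytes[i] & 0xFFL) : 0L;
            result = (result << 8) | b;
        }
        return result;
    }

    // Unsigned comparison of the prefixes matches byte-wise order on the first
    // 8 bytes. A result of 0 means the prefixes tie, not that the arrays are equal.
    static int comparePrefixes(byte[] left, byte[] right) {
        return Long.compareUnsigned(prefix(left), prefix(right));
    }
}
```

A signed comparison would misorder any prefix whose first byte is 0x80 or higher, since the packed value becomes negative. When two prefixes tie, a full byte-by-byte comparison is still needed to decide the order.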
Binary Data Processing Example:
// Process binary log data with byte array utilities
public class BinaryLogProcessor {
public boolean isValidLogEntry(byte[] logEntry) {
// Check for minimum length and magic header
if (logEntry.length < 16) return false;
byte[] magicHeader = {(byte)0xCA, (byte)0xFE, (byte)0xBA, (byte)0xBE};
return ByteArrayMethods.startsWith(logEntry, magicHeader);
}
public byte[] extractPayload(byte[] logEntry) {
// Skip 16-byte header, extract rest
return ByteArray.subStringSQL(logEntry, 17, logEntry.length - 16);
}
public byte[] formatLogEntry(byte[] payload, int entryType) {
// Create formatted log entry with header
byte[] header = new byte[16];
header[0] = (byte)0xCA; header[1] = (byte)0xFE;
header[2] = (byte)0xBA; header[3] = (byte)0xBE;
// Write entry type and payload length
Platform.putInt(header, Platform.BYTE_ARRAY_OFFSET + 4, entryType);
Platform.putInt(header, Platform.BYTE_ARRAY_OFFSET + 8, payload.length);
return ByteArray.concat(header, payload);
}
public int compareLogEntries(byte[] entry1, byte[] entry2) {
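// Compare entries by their 8-byte sort prefix (the leading bytes of each entry).
// Prefixes are ordered as unsigned values, so use an unsigned comparison.
long prefix1 = ByteArray.getPrefix(entry1);
long prefix2 = ByteArray.getPrefix(entry2);
return Long.compareUnsigned(prefix1, prefix2);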
}
}
Efficient bitset operations for columnar data processing, null value tracking, and selective data access patterns common in analytical workloads.
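For null tracking, whole-word operations are usually cheaper than testing one row at a time. The sketch below counts and enumerates set bits on a plain `long[]` mask. `NullMaskSketch` is a hypothetical helper, and it assumes that any unused bits in the last word are left at zero.

```java
final class NullMaskSketch {
    // Count set bits one 64-bit word at a time instead of testing each row.
    static int countSet(long[] words) {
        int count = 0;
        for (long word : words) {
            count += Long.bitCount(word);
        }
        return count;
    }

    // Collect the indices of all set bits in ascending order by repeatedly
    // taking and then clearing the lowest set bit of each word.
    static int[] setIndices(long[] words) {
        int[] out = new int[countSet(words)];
        int n = 0;
        for (int w = 0; w < words.length; w++) {
            long word = words[w];
            while (word != 0) {
                out[n++] = (w << 6) + Long.numberOfTrailingZeros(word);
                word &= word - 1; // clear the lowest set bit
            }
        }
        return out;
    }
}
```

With this approach the cost of enumerating nulls scales with the number of words plus the number of nulls, rather than with the number of rows.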
Columnar Processing Example:
// Process columnar data with bitset for null tracking
public class ColumnarProcessor {
private final Object nullBitset;
private final long nullBitsetOffset;
private final int numRows;
public ColumnarProcessor(int numRows) {
this.numRows = numRows;
// Allocate bitset (1 bit per row, rounded up to words)
int wordsNeeded = (numRows + 63) / 64; // Round up to nearest 64-bit word
long[] storage = new long[wordsNeeded];
this.nullBitset = storage;
this.nullBitsetOffset = Platform.LONG_ARRAY_OFFSET;
}
public void markNull(int rowIndex) {
BitSetMethods.set(nullBitset, nullBitsetOffset, rowIndex);
}
public boolean isNull(int rowIndex) {
return BitSetMethods.isSet(nullBitset, nullBitsetOffset, rowIndex);
}
public boolean hasAnyNulls() {
int wordsNeeded = (numRows + 63) / 64;
return BitSetMethods.anySet(nullBitset, nullBitsetOffset, wordsNeeded);
}
public int findNextNull(int fromIndex) {
int wordsNeeded = (numRows + 63) / 64;
return BitSetMethods.nextSetBit(nullBitset, nullBitsetOffset, fromIndex, wordsNeeded);
}
public int countNulls() {
int count = 0;
for (int i = 0; i < numRows; i++) {
if (isNull(i)) count++;
}
return count;
}
}
Install with Tessl CLI
npx tessl i tessl/maven-org-apache-spark--spark-unsafe_2-13