
tessl/maven-org-apache-spark--spark-unsafe_2-13

Low-level unsafe operations and optimized data structures for Apache Spark's internal memory management and performance-critical operations.


Data Types and Utilities

Specialized data types and helpers: calendar intervals, byte array utilities, bitset operations, and date/time constants. Together they support interval arithmetic, bit-level manipulation, and byte-level data handling in data processing code.

Capabilities

Calendar Interval Support

Calendar interval representation supporting months, days, and microseconds for time-based calculations and date arithmetic in data processing scenarios.

public final class CalendarInterval implements Serializable {
    public final int months;
    public final int days;
    public final long microseconds;
    
    public CalendarInterval(int months, int days, long microseconds);
    public boolean equals(Object o);
    public int hashCode();
    public String toString();
    public Period extractAsPeriod();
    public Duration extractAsDuration();
}

Usage Examples:

// Create calendar intervals
CalendarInterval interval1 = new CalendarInterval(2, 15, 3600000000L); // 2 months, 15 days, 1 hour
CalendarInterval interval2 = new CalendarInterval(0, 7, 0);            // 1 week
CalendarInterval interval3 = new CalendarInterval(12, 0, 0);           // 1 year

// Access interval components
int months = interval1.months;        // 2
int days = interval1.days;           // 15
long micros = interval1.microseconds; // 3600000000 (1 hour in microseconds)

// String representation
String description = interval1.toString(); // human-readable units, e.g. "2 months 15 days 1 hours" (exact format may vary by Spark version)

// Equality and hashing
boolean equal = interval1.equals(interval2);  // false
int hash = interval1.hashCode();

// Convert to Java 8 time types
Period period = interval1.extractAsPeriod();     // P2M15D
Duration duration = interval1.extractAsDuration(); // PT1H
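
The `Period` and `Duration` returned by the extract methods are ordinary `java.time` values, so they can be applied to a date-time directly. The sketch below uses only the JDK and builds the two values by hand to mirror the `P2M15D` and `PT1H` results above. `IntervalArithmeticSketch` and `shift` are hypothetical names; with the library on the classpath you would presumably pass `interval1.extractAsPeriod()` and `interval1.extractAsDuration()` instead.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.Period;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of spark-unsafe: applies the calendar part
// (months and days) first, then the exact-time part (microseconds).
final class IntervalArithmeticSketch {

    static LocalDateTime shift(LocalDateTime start, Period calendarPart, Duration timePart) {
        return start.plus(calendarPart).plus(timePart);
    }

    public static void main(String[] args) {
        // Stand-ins for interval1.extractAsPeriod() and interval1.extractAsDuration()
        Period period = Period.of(0, 2, 15);                                // P2M15D
        Duration duration = Duration.of(3_600_000_000L, ChronoUnit.MICROS); // PT1H

        LocalDateTime start = LocalDateTime.of(2024, 1, 10, 8, 30);
        System.out.println(shift(start, period, duration)); // 2024-03-25T09:30
    }
}
```

Applying the calendar part before the time part is a design choice of this sketch: month arithmetic depends on the starting date, so the order of the two additions can change the result near month boundaries.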

Byte Array Processing Utilities

Static helpers for byte arrays: writing to memory, sort-prefix extraction, binary comparison, SQL-style substring, concatenation, and left/right padding for text and binary data.

public final class ByteArray {
    public static final byte[] EMPTY_BYTE;
    
    public static void writeToMemory(byte[] src, Object target, long targetOffset);
    public static long getPrefix(byte[] bytes);
    public static int compareBinary(byte[] leftBase, byte[] rightBase);
    public static byte[] subStringSQL(byte[] bytes, int pos, int len);
    public static byte[] concat(byte[]... inputs);
    public static byte[] lpad(byte[] bytes, int len, byte[] pad);
    public static byte[] rpad(byte[] bytes, int len, byte[] pad);
}

Usage Examples:

// Empty byte array constant
byte[] empty = ByteArray.EMPTY_BYTE;  // []

// Write bytes to memory
byte[] source = "Hello World".getBytes();
byte[] target = new byte[source.length];
ByteArray.writeToMemory(source, target, Platform.BYTE_ARRAY_OFFSET);

// Get sorting prefix (first 8 bytes as long)
byte[] data = "This is a long string for testing".getBytes();
long prefix = ByteArray.getPrefix(data);  // First 8 bytes as long for sorting

// Binary comparison
byte[] array1 = "apple".getBytes();
byte[] array2 = "banana".getBytes();
int comparison = ByteArray.compareBinary(array1, array2);  // negative (apple < banana)

// SQL-style substring (1-indexed)
byte[] text = "Hello World".getBytes();
byte[] substring = ByteArray.subStringSQL(text, 7, 5);  // "World" (start at pos 7, length 5)

// Concatenation
byte[] part1 = "Hello".getBytes();
byte[] part2 = " ".getBytes();
byte[] part3 = "World".getBytes();
byte[] combined = ByteArray.concat(part1, part2, part3);  // "Hello World"

// Padding operations
byte[] shortText = "Hi".getBytes();
byte[] leftPadded = ByteArray.lpad(shortText, 10, "*".getBytes());   // "********Hi"
byte[] rightPadded = ByteArray.rpad(shortText, 10, "-".getBytes());  // "Hi--------"
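
A sorting prefix lets a sort compare two arrays with a single `long` comparison before falling back to a full byte-by-byte comparison. A minimal JDK-only sketch of that idea follows. `SortPrefixSketch`, `prefixOf`, and `compare` are hypothetical names, and the packing used here (big-endian, zero-padded, compared as unsigned) is an assumption about how such a prefix is typically built, not a claim about what `ByteArray.getPrefix` returns bit for bit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of prefix-based comparison; not the library's code.
final class SortPrefixSketch {

    // Packs up to the first 8 bytes into a long, most significant byte first.
    // Shorter inputs are padded with zero bytes on the right.
    static long prefixOf(byte[] bytes) {
        long prefix = 0L;
        for (int i = 0; i < 8; i++) {
            int b = i < bytes.length ? (bytes[i] & 0xFF) : 0;
            prefix = (prefix << 8) | b;
        }
        return prefix;
    }

    // Cheap prefix comparison first; full unsigned comparison only on ties.
    static int compare(byte[] left, byte[] right) {
        int byPrefix = Long.compareUnsigned(prefixOf(left), prefixOf(right));
        return byPrefix != 0 ? byPrefix : Arrays.compareUnsigned(left, right);
    }

    public static void main(String[] args) {
        byte[] apple = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] banana = "banana".getBytes(StandardCharsets.UTF_8);
        System.out.println(compare(apple, banana) < 0); // true
    }
}
```

The unsigned comparison matters for bytes at or above `0x80`: a signed `Long.compare` would order them before ASCII text.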

Bitset Operations

High-performance bitset operations on fixed-size uncompressed bitsets with word-aligned data, optimized for columnar data processing and efficient bit manipulation.

public final class BitSetMethods {
    public static void set(Object baseObject, long baseOffset, int index);
    public static void unset(Object baseObject, long baseOffset, int index);
    public static boolean isSet(Object baseObject, long baseOffset, int index);
    public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInWords);
    public static int nextSetBit(Object baseObject, long baseOffset, int fromIndex, int bitsetSizeInWords);
}

Usage Examples:

// Create bitset storage: four 64-bit words (8 bytes each)
long[] bitsetStorage = new long[4];  // 256 bits total
Object baseObj = bitsetStorage;
long baseOffset = Platform.LONG_ARRAY_OFFSET;

// Set individual bits
BitSetMethods.set(baseObj, baseOffset, 0);    // Set bit 0
BitSetMethods.set(baseObj, baseOffset, 5);    // Set bit 5  
BitSetMethods.set(baseObj, baseOffset, 64);   // Set bit 64 (second word)

// Check bits
boolean bit0Set = BitSetMethods.isSet(baseObj, baseOffset, 0);   // true
boolean bit1Set = BitSetMethods.isSet(baseObj, baseOffset, 1);   // false
boolean bit5Set = BitSetMethods.isSet(baseObj, baseOffset, 5);   // true

// Unset bits
BitSetMethods.unset(baseObj, baseOffset, 5);
boolean bit5After = BitSetMethods.isSet(baseObj, baseOffset, 5); // false

// Check if any bits are set
boolean hasAnyBits = BitSetMethods.anySet(baseObj, baseOffset, 4); // true (bit 0 and 64 still set)

// Find next set bit
int nextBit = BitSetMethods.nextSetBit(baseObj, baseOffset, 0, 4);  // 0 (first set bit)
int afterBit0 = BitSetMethods.nextSetBit(baseObj, baseOffset, 1, 4); // 64 (next set bit after 0)
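
The word-aligned layout means bit `i` lives in word `i / 64` at position `i % 64`. The sketch below shows that arithmetic on a plain `long[]` using only the JDK. `WordBitSetSketch` is a hypothetical class that mirrors the idea, not the library code, which addresses memory through a base object and offset instead of array indexing.

```java
// Hypothetical illustration of a word-aligned, fixed-size bitset.
final class WordBitSetSketch {

    static void set(long[] words, int index) {
        words[index >> 6] |= 1L << (index & 63);
    }

    static void unset(long[] words, int index) {
        words[index >> 6] &= ~(1L << (index & 63));
    }

    static boolean isSet(long[] words, int index) {
        return (words[index >> 6] & (1L << (index & 63))) != 0;
    }

    // Returns the first set bit at or after fromIndex, or -1 if there is none.
    static int nextSetBit(long[] words, int fromIndex) {
        int wordIndex = fromIndex >> 6;
        if (wordIndex >= words.length) {
            return -1;
        }
        // Mask off the bits below fromIndex in the first word examined.
        long word = words[wordIndex] & (-1L << (fromIndex & 63));
        while (true) {
            if (word != 0) {
                return (wordIndex << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++wordIndex == words.length) {
                return -1;
            }
            word = words[wordIndex];
        }
    }

    public static void main(String[] args) {
        long[] words = new long[4]; // 256 bits
        set(words, 0);
        set(words, 64);
        System.out.println(nextSetBit(words, 1)); // 64
    }
}
```

Scanning a whole word at a time with `Long.numberOfTrailingZeros` is what makes the search cheap on sparse bitsets: empty words are skipped with one comparison each.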

Date and Time Constants

Comprehensive date and time constants for calendar calculations, time unit conversions, and temporal arithmetic operations.

public class DateTimeConstants {
    // Calendar constants
    public static final int MONTHS_PER_YEAR = 12;
    public static final byte DAYS_PER_WEEK = 7;
    public static final long HOURS_PER_DAY = 24L;
    public static final long MINUTES_PER_HOUR = 60L;
    public static final long SECONDS_PER_MINUTE = 60L;
    
    // Computed time constants
    public static final long SECONDS_PER_HOUR;      // 3600
    public static final long SECONDS_PER_DAY;       // 86400
    
    // Millisecond conversions
    public static final long MILLIS_PER_SECOND = 1000L;
    public static final long MILLIS_PER_MINUTE;     // 60000
    public static final long MILLIS_PER_HOUR;       // 3600000
    public static final long MILLIS_PER_DAY;        // 86400000
    
    // Microsecond conversions
    public static final long MICROS_PER_MILLIS = 1000L;
    public static final long MICROS_PER_SECOND;     // 1000000
    public static final long MICROS_PER_MINUTE;     // 60000000
    public static final long MICROS_PER_HOUR;       // 3600000000
    public static final long MICROS_PER_DAY;        // 86400000000
    
    // Nanosecond conversions
    public static final long NANOS_PER_MICROS = 1000L;
    public static final long NANOS_PER_MILLIS;      // 1000000
    public static final long NANOS_PER_SECOND;      // 1000000000
}

Usage Examples:

// Time unit calculations
long hoursInWeek = DateTimeConstants.DAYS_PER_WEEK * DateTimeConstants.HOURS_PER_DAY;  // 168
long secondsInDay = DateTimeConstants.SECONDS_PER_DAY;  // 86400

// Convert between time units
long currentTimeMillis = System.currentTimeMillis();
long currentTimeMicros = currentTimeMillis * DateTimeConstants.MICROS_PER_MILLIS;
long currentTimeNanos = currentTimeMillis * DateTimeConstants.NANOS_PER_MILLIS;

// Duration calculations
long durationMinutes = 45;
long durationMillis = durationMinutes * DateTimeConstants.MILLIS_PER_MINUTE;  // 2700000
long durationMicros = durationMinutes * DateTimeConstants.MICROS_PER_MINUTE;  // 2700000000

// Calendar arithmetic
int totalDaysInYear = DateTimeConstants.MONTHS_PER_YEAR * 30;  // 360, approximating every month as 30 days
long totalSecondsInYear = totalDaysInYear * DateTimeConstants.SECONDS_PER_DAY;  // 31104000

// Timestamp precision conversion
long timestampNanos = 1234567890123456789L;
long timestampMicros = timestampNanos / DateTimeConstants.NANOS_PER_MICROS;
long timestampMillis = timestampNanos / DateTimeConstants.NANOS_PER_MILLIS;
long timestampSeconds = timestampNanos / DateTimeConstants.NANOS_PER_SECOND;
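
The per-unit constants also work in the other direction, splitting a raw microsecond count into hours, minutes, and seconds by successive division and remainder. The sketch below redeclares three constants locally, with the same values as listed above, so that it runs without the library. `MicrosBreakdownSketch` and `format` are hypothetical names.

```java
import java.util.Locale;

// Hypothetical illustration; the constants are local stand-ins for DateTimeConstants.
final class MicrosBreakdownSketch {
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long MICROS_PER_MINUTE = 60L * MICROS_PER_SECOND;
    static final long MICROS_PER_HOUR = 60L * MICROS_PER_MINUTE;

    // Formats a non-negative microsecond count as "H:MM:SS.ffffff".
    static String format(long micros) {
        long hours = micros / MICROS_PER_HOUR;
        long rest = micros % MICROS_PER_HOUR;
        long minutes = rest / MICROS_PER_MINUTE;
        rest %= MICROS_PER_MINUTE;
        long seconds = rest / MICROS_PER_SECOND;
        long fraction = rest % MICROS_PER_SECOND;
        return String.format(Locale.ROOT, "%d:%02d:%02d.%06d", hours, minutes, seconds, fraction);
    }

    public static void main(String[] args) {
        System.out.println(format(5_025_500_000L)); // 1:23:45.500000
    }
}
```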

Time-Based Data Processing

Common patterns for working with temporal data using the date/time constants and calendar intervals in data processing scenarios.

Temporal Aggregation Example:

// Process time-series data with different granularities
public class TemporalProcessor {
    
    // Each method floors to the start of the enclosing unit. Math.floorDiv keeps
    // pre-epoch (negative) timestamps flooring downward instead of toward zero.
    public long roundToMinute(long timestampMicros) {
        long microsPerMinute = DateTimeConstants.MICROS_PER_MINUTE;
        return Math.floorDiv(timestampMicros, microsPerMinute) * microsPerMinute;
    }
    
    public long roundToHour(long timestampMicros) {
        long microsPerHour = DateTimeConstants.MICROS_PER_HOUR;
        return Math.floorDiv(timestampMicros, microsPerHour) * microsPerHour;
    }
    
    public long roundToDay(long timestampMicros) {
        long microsPerDay = DateTimeConstants.MICROS_PER_DAY;
        return Math.floorDiv(timestampMicros, microsPerDay) * microsPerDay;
    }
    
    public CalendarInterval calculateAge(long birthTimeMicros, long currentTimeMicros) {
        long ageMicros = currentTimeMicros - birthTimeMicros;
        
        // Convert to approximate months (30 days each)
        long microsPerMonth = 30L * DateTimeConstants.MICROS_PER_DAY;
        int months = (int) (ageMicros / microsPerMonth);
        ageMicros %= microsPerMonth;
        
        // Convert remaining to days
        int days = (int) (ageMicros / DateTimeConstants.MICROS_PER_DAY);
        ageMicros %= DateTimeConstants.MICROS_PER_DAY;
        
        return new CalendarInterval(months, days, ageMicros);
    }
}
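
The 30-day month in `calculateAge` is an approximation. When calendar-accurate months are needed, one option is to let `java.time` do the month arithmetic, as in the JDK-only sketch below. `CalendarAgeSketch` and its methods are hypothetical, and interpreting the timestamps as microseconds since the Unix epoch in UTC is an assumption of this example.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.Period;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration; assumes timestamps are epoch microseconds in UTC.
final class CalendarAgeSketch {

    // Converts microseconds since the Unix epoch to a UTC calendar date.
    static LocalDate toUtcDate(long epochMicros) {
        return Instant.EPOCH.plus(epochMicros, ChronoUnit.MICROS)
                .atZone(ZoneOffset.UTC)
                .toLocalDate();
    }

    // Whole calendar months and leftover days between two timestamps.
    static Period calendarAge(long birthMicros, long currentMicros) {
        return Period.between(toUtcDate(birthMicros), toUtcDate(currentMicros));
    }

    public static void main(String[] args) {
        long birth = LocalDate.of(2020, 1, 15).atStartOfDay(ZoneOffset.UTC).toEpochSecond() * 1_000_000L;
        long now = LocalDate.of(2021, 3, 20).atStartOfDay(ZoneOffset.UTC).toEpochSecond() * 1_000_000L;
        Period age = calendarAge(birth, now);
        System.out.println(age.toTotalMonths() + " months, " + age.getDays() + " days"); // 14 months, 5 days
    }
}
```

The resulting `(int) age.toTotalMonths()` and `age.getDays()` could then be passed to the `CalendarInterval` constructor in place of the approximate values.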

Binary Data Processing Utilities

Advanced binary data processing using byte array utilities for data serialization, comparison, and formatting operations.

Binary Data Processing Example:

// Process binary log data with byte array utilities
public class BinaryLogProcessor {
    
    public boolean isValidLogEntry(byte[] logEntry) {
        // Check for minimum length and magic header
        if (logEntry.length < 16) return false;
        
        byte[] magicHeader = {(byte)0xCA, (byte)0xFE, (byte)0xBA, (byte)0xBE};
        return ByteArrayMethods.startsWith(logEntry, magicHeader);
    }
    
    public byte[] extractPayload(byte[] logEntry) {
        // Skip 16-byte header, extract rest
        return ByteArray.subStringSQL(logEntry, 17, logEntry.length - 16);
    }
    
    public byte[] formatLogEntry(byte[] payload, int entryType) {
        // Create formatted log entry with header
        byte[] header = new byte[16];
        header[0] = (byte)0xCA; header[1] = (byte)0xFE;
        header[2] = (byte)0xBA; header[3] = (byte)0xBE;
        
        // Write entry type and payload length
        Platform.putInt(header, Platform.BYTE_ARRAY_OFFSET + 4, entryType);
        Platform.putInt(header, Platform.BYTE_ARRAY_OFFSET + 8, payload.length);
        
        return ByteArray.concat(header, payload);
    }
    
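    public int compareLogEntries(byte[] entry1, byte[] entry2) {
        // Compare the 8-byte sort prefixes as unsigned values first;
        // fall back to a full binary comparison when the prefixes tie
        long prefix1 = ByteArray.getPrefix(entry1);
        long prefix2 = ByteArray.getPrefix(entry2);
        int cmp = Long.compareUnsigned(prefix1, prefix2);
        return cmp != 0 ? cmp : ByteArray.compareBinary(entry1, entry2);
    }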
}
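
`Platform.putInt` writes in the machine's native byte order, which matters if entries are later read on a different architecture or by another tool. A JDK-only sketch of the same 16-byte header layout with an explicit byte order follows. `LogHeaderSketch`, its method names, and the choice of big-endian are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration of a fixed 16-byte header with explicit byte order.
final class LogHeaderSketch {
    static final int HEADER_SIZE = 16;
    static final int MAGIC = 0xCAFEBABE;

    // Layout: bytes 0-3 magic, 4-7 entry type, 8-11 payload length, 12-15 unused.
    static byte[] encode(byte[] payload, int entryType) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + payload.length)
                .order(ByteOrder.BIG_ENDIAN);
        buf.putInt(MAGIC).putInt(entryType).putInt(payload.length).putInt(0);
        buf.put(payload);
        return buf.array();
    }

    static boolean isValid(byte[] entry) {
        return entry.length >= HEADER_SIZE
                && ByteBuffer.wrap(entry).order(ByteOrder.BIG_ENDIAN).getInt(0) == MAGIC;
    }

    static int entryType(byte[] entry) {
        return ByteBuffer.wrap(entry).order(ByteOrder.BIG_ENDIAN).getInt(4);
    }

    // Copies out exactly the number of payload bytes recorded in the header.
    static byte[] payload(byte[] entry) {
        int length = ByteBuffer.wrap(entry).order(ByteOrder.BIG_ENDIAN).getInt(8);
        return Arrays.copyOfRange(entry, HEADER_SIZE, HEADER_SIZE + length);
    }

    public static void main(String[] args) {
        byte[] entry = encode(new byte[] {1, 2, 3}, 7);
        System.out.println(isValid(entry) + " " + entryType(entry) + " " + payload(entry).length); // true 7 3
    }
}
```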

Columnar Data Bitset Processing

Efficient bitset operations for columnar data processing, null value tracking, and selective data access patterns common in analytical workloads.

Columnar Processing Example:

// Process columnar data with bitset for null tracking
public class ColumnarProcessor {
    private final Object nullBitset;
    private final long nullBitsetOffset;
    private final int numRows;
    
    public ColumnarProcessor(int numRows) {
        this.numRows = numRows;
        // Allocate bitset (1 bit per row, rounded up to words)
        int wordsNeeded = (numRows + 63) / 64;  // Round up to nearest 64-bit word
        long[] storage = new long[wordsNeeded];
        this.nullBitset = storage;
        this.nullBitsetOffset = Platform.LONG_ARRAY_OFFSET;
    }
    
    public void markNull(int rowIndex) {
        BitSetMethods.set(nullBitset, nullBitsetOffset, rowIndex);
    }
    
    public boolean isNull(int rowIndex) {
        return BitSetMethods.isSet(nullBitset, nullBitsetOffset, rowIndex);
    }
    
    public boolean hasAnyNulls() {
        int wordsNeeded = (numRows + 63) / 64;
        return BitSetMethods.anySet(nullBitset, nullBitsetOffset, wordsNeeded);
    }
    
    public int findNextNull(int fromIndex) {
        int wordsNeeded = (numRows + 63) / 64;
        return BitSetMethods.nextSetBit(nullBitset, nullBitsetOffset, fromIndex, wordsNeeded);
    }
    
    public int countNulls() {
        int count = 0;
        for (int i = 0; i < numRows; i++) {
            if (isNull(i)) count++;
        }
        return count;
    }
}
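
Because the bitset is backed by a `long[]`, `countNulls` does not have to test one row at a time; it can count set bits a whole word at once. A minimal sketch on a plain array, using only the JDK, is shown below. `NullCountSketch` and `countSetBits` are hypothetical names, and the sketch assumes that bits at positions beyond the last row are never set.

```java
// Hypothetical illustration of word-at-a-time counting on long[] bitset storage.
final class NullCountSketch {

    // Counts set bits one 64-bit word at a time instead of one row at a time.
    // Assumes bits at positions >= numRows are never set.
    static int countSetBits(long[] words) {
        int count = 0;
        for (long word : words) {
            count += Long.bitCount(word);
        }
        return count;
    }

    public static void main(String[] args) {
        int numRows = 100;
        long[] words = new long[(numRows + 63) / 64]; // 2 words
        words[0] |= 1L << 3;         // row 3 is null
        words[1] |= 1L << (70 - 64); // row 70 is null
        System.out.println(countSetBits(words)); // 2
    }
}
```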

Install with Tessl CLI

npx tessl i tessl/maven-org-apache-spark--spark-unsafe_2-13
