Describes: mavenpkg:maven/org.apache.spark/spark-unsafe_2.13@3.5.x

tessl/maven-org-apache-spark--spark-unsafe_2-13

tessl install tessl/maven-org-apache-spark--spark-unsafe_2-13@3.5.0

Low-level unsafe operations and optimized data structures for Apache Spark's internal memory management and performance-critical operations.

Array Operations

Optimized array implementations and manipulation utilities: long arrays backed by on-heap or off-heap memory, and byte array operations with pattern matching. They read and write through raw offsets, bypassing Java's bounds checking for speed, so the caller must keep every index and length within the allocated region.

Capabilities

Long Array Implementation

Long array backed by a MemoryBlock, which may live on-heap or off-heap. Element access performs no bounds checking, and the element count is the block size in bytes divided by 8.

public final class LongArray {
    public LongArray(MemoryBlock memory);
    public MemoryBlock memoryBlock();
    public Object getBaseObject();
    public long getBaseOffset();
    public long size();
    public void zeroOut();
    public void set(int index, long value);
    public long get(int index);
}

Usage Examples:

// Create long array from memory block
MemoryAllocator allocator = MemoryAllocator.HEAP;
MemoryBlock block = allocator.allocate(8 * 100); // 100 longs
LongArray longArray = new LongArray(block);

// Access array properties
long elementCount = longArray.size();        // 100
Object baseObj = longArray.getBaseObject();  // underlying object
long baseOffset = longArray.getBaseOffset(); // base offset

// Initialize array with zeros
longArray.zeroOut();

// Set and get values (no bounds checking for performance)
longArray.set(0, 12345L);
longArray.set(99, 67890L);
long firstValue = longArray.get(0);   // 12345L
long lastValue = longArray.get(99);   // 67890L

// Access underlying memory block
MemoryBlock underlyingBlock = longArray.memoryBlock();
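
To make the memory model concrete, here is a minimal sketch of a long array layered over a block of raw bytes. It is not Spark's implementation: it uses java.nio.ByteBuffer as a stand-in for MemoryBlock (a heap buffer for on-heap memory, a direct buffer for off-heap memory), and the class name LongView and its methods are hypothetical. Unlike LongArray, ByteBuffer does check bounds on every access.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: a fixed-width long array over a raw byte block.
final class LongView {
    private static final int WIDTH = 8; // bytes per long
    private final ByteBuffer block;
    private final long length;          // number of longs that fit in the block

    LongView(ByteBuffer block) {
        this.block = block.order(ByteOrder.nativeOrder());
        this.length = block.capacity() / WIDTH;
    }

    long size() {
        return length;
    }

    // Element i lives at byte offset i * WIDTH from the start of the block.
    void set(int index, long value) {
        block.putLong(index * WIDTH, value);
    }

    long get(int index) {
        return block.getLong(index * WIDTH);
    }

    void zeroOut() {
        for (int i = 0; i < length; i++) {
            set(i, 0L);
        }
    }

    public static void main(String[] args) {
        LongView onHeap = new LongView(ByteBuffer.allocate(8 * 100));
        LongView offHeap = new LongView(ByteBuffer.allocateDirect(8 * 100));
        onHeap.set(99, 67890L);
        offHeap.set(0, 12345L);
        System.out.println(onHeap.size() + " " + onHeap.get(99) + " " + offHeap.get(0));
    }
}
```

The same index-to-offset arithmetic works for both buffers, which is why a single class can serve on-heap and off-heap storage.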

Byte Array Utilities

Static helpers for byte arrays and raw memory regions: equality checks, substring matching, and size arithmetic (power-of-two and word rounding).

public class ByteArrayMethods {
    public static final int MAX_ROUNDED_ARRAY_LENGTH;
    
    public static long nextPowerOf2(long num);
    public static int roundNumberOfBytesToNearestWord(int numBytes);
    public static long roundNumberOfBytesToNearestWord(long numBytes);
    public static boolean arrayEquals(Object leftBase, long leftOffset, 
                                    Object rightBase, long rightOffset, 
                                    long length);
    public static boolean contains(byte[] arr, byte[] sub);
    public static boolean startsWith(byte[] array, byte[] target);
    public static boolean endsWith(byte[] array, byte[] target);
    public static boolean matchAt(byte[] arr, byte[] sub, int pos);
}

Usage Examples:

// Mathematical utilities
long nextPower = ByteArrayMethods.nextPowerOf2(100);  // 128
int wordAligned = ByteArrayMethods.roundNumberOfBytesToNearestWord(13);  // 16

// Array comparison (optimized for large arrays)
byte[] array1 = "Hello World".getBytes();
byte[] array2 = "Hello World".getBytes();
boolean equal = ByteArrayMethods.arrayEquals(
    array1, Platform.BYTE_ARRAY_OFFSET,
    array2, Platform.BYTE_ARRAY_OFFSET,
    array1.length
);  // true

// Pattern matching operations
byte[] text = "The quick brown fox jumps over the lazy dog".getBytes();
byte[] pattern = "quick".getBytes();

boolean hasPattern = ByteArrayMethods.contains(text, pattern);  // true
boolean startsWithThe = ByteArrayMethods.startsWith(text, "The".getBytes());  // true
boolean endsWithDog = ByteArrayMethods.endsWith(text, "dog".getBytes());  // true
boolean matchesAt4 = ByteArrayMethods.matchAt(text, pattern, 4);  // true

// Maximum safe array length
int maxLength = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH;

Memory-Efficient Array Operations

Direct memory access operations for arrays that work with both on-heap objects and off-heap memory addresses, enabling zero-copy operations and efficient data processing.

Memory Layout Optimization:

// Word alignment for optimal memory access
int unalignedSize = 13;
int alignedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(unalignedSize);
// alignedSize = 16 (rounded up to nearest 8-byte boundary)

// Power of 2 sizing for hash tables and buffers
long desiredSize = 1000;
long optimalSize = ByteArrayMethods.nextPowerOf2(desiredSize);  // 1024

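// Allocate an off-heap block of the rounded size (in bytes)
MemoryAllocator allocator = MemoryAllocator.UNSAFE;
MemoryBlock optimizedBlock = allocator.allocate(optimalSize);
// ... use the block ...
allocator.free(optimizedBlock);  // off-heap memory must be released explicitly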
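
The rounding rules above can be reproduced with a few bit operations. The sketch below shows one common way to do it; the class and method names are hypothetical, and it is meant as an illustration of the arithmetic rather than a copy of the library's source.

```java
// Hypothetical helpers illustrating the sizing arithmetic.
final class SizingSketch {
    // Round up to the next multiple of 8 by adding 7 and clearing the low 3 bits.
    static long roundUpToWord(long numBytes) {
        return (numBytes + 7) & ~7L;
    }

    // Smallest power of two >= num, assuming num >= 1 and a result that fits in a long.
    static long nextPowerOfTwo(long num) {
        long highBit = Long.highestOneBit(num);
        return (highBit == num) ? num : highBit << 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToWord(13));    // 16
        System.out.println(roundUpToWord(16));    // 16
        System.out.println(nextPowerOfTwo(1000)); // 1024
        System.out.println(nextPowerOfTwo(1024)); // 1024
    }
}
```

Values that are already aligned, or already a power of two, are returned unchanged.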

High-Performance Array Comparison:

// Compare memory regions directly (on-heap to off-heap)
byte[] heapArray = "Large data set".getBytes();
long offHeapAddress = Platform.allocateMemory(heapArray.length);
try {
    Platform.copyMemory(heapArray, Platform.BYTE_ARRAY_OFFSET,
                        null, offHeapAddress, heapArray.length);

    // Compare the two regions in place; no extra copy is made for the comparison
    boolean identical = ByteArrayMethods.arrayEquals(
        heapArray, Platform.BYTE_ARRAY_OFFSET,  // on-heap source
        null, offHeapAddress,                   // off-heap target
        heapArray.length
    );  // true
} finally {
    // Off-heap memory is not garbage collected; always release it
    Platform.freeMemory(offHeapAddress);
}
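
Region comparisons like this are typically made fast by comparing a machine word at a time instead of byte by byte. The following is a minimal sketch of that idea for two on-heap byte arrays, using java.nio.ByteBuffer for the 8-byte reads. The class and method names are hypothetical, and it is not presented as the library's actual implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of word-at-a-time region comparison.
final class RegionCompareSketch {
    static boolean regionEquals(byte[] left, int leftOffset,
                                byte[] right, int rightOffset, int length) {
        ByteBuffer l = ByteBuffer.wrap(left);
        ByteBuffer r = ByteBuffer.wrap(right);
        int i = 0;
        // Compare 8 bytes per step while a full word remains.
        for (; i + 8 <= length; i += 8) {
            if (l.getLong(leftOffset + i) != r.getLong(rightOffset + i)) {
                return false;
            }
        }
        // Compare the remaining 0 to 7 bytes one at a time.
        for (; i < length; i++) {
            if (left[leftOffset + i] != right[rightOffset + i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because ByteBuffer checks bounds, an out-of-range offset here throws an exception, whereas an unchecked implementation would read arbitrary memory.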

Pattern Matching for Text Processing

Optimized substring search and pattern matching operations designed for text processing scenarios in data analytics and string manipulation tasks.

Complex Pattern Matching:

// Process log data with pattern matching
String logLine = "2023-10-15 14:30:22 ERROR Failed to process request ID 12345";
byte[] logBytes = logLine.getBytes();

// Check for different log levels
boolean isError = ByteArrayMethods.contains(logBytes, "ERROR".getBytes());
boolean isWarning = ByteArrayMethods.contains(logBytes, "WARNING".getBytes());
boolean isInfo = ByteArrayMethods.contains(logBytes, "INFO".getBytes());

// Check whether the line starts with the expected year prefix
boolean hasTimestamp = ByteArrayMethods.startsWith(logBytes, "2023".getBytes());

// Check for specific error patterns
byte[] errorPattern = "Failed to process".getBytes();
boolean hasProcessingError = ByteArrayMethods.contains(logBytes, errorPattern);

// Find pattern at specific position
if (ByteArrayMethods.matchAt(logBytes, "ERROR".getBytes(), 20)) {
    // ERROR found at expected position 20
    System.out.println("Error at expected position");
}
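
The semantics of these four operations can be pinned down with a straightforward reference version. The sketch below is a naive implementation written for clarity, with hypothetical class and method names; an optimized library version may use a different search strategy.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reference implementation of the byte-matching semantics.
final class ByteMatchSketch {
    // True if sub occurs in arr starting exactly at pos.
    static boolean matchAt(byte[] arr, byte[] sub, int pos) {
        if (pos < 0 || pos + sub.length > arr.length) {
            return false;
        }
        for (int i = 0; i < sub.length; i++) {
            if (arr[pos + i] != sub[i]) {
                return false;
            }
        }
        return true;
    }

    // True if sub occurs anywhere in arr (tries every start position).
    static boolean contains(byte[] arr, byte[] sub) {
        for (int pos = 0; pos + sub.length <= arr.length; pos++) {
            if (matchAt(arr, sub, pos)) {
                return true;
            }
        }
        return false;
    }

    static boolean startsWith(byte[] arr, byte[] prefix) {
        return matchAt(arr, prefix, 0);
    }

    static boolean endsWith(byte[] arr, byte[] suffix) {
        return matchAt(arr, suffix, arr.length - suffix.length);
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] log = utf8("2023-10-15 14:30:22 ERROR Failed to process request ID 12345");
        System.out.println(matchAt(log, utf8("ERROR"), 20));  // true
        System.out.println(contains(log, utf8("WARNING")));   // false
    }
}
```

Passing an explicit charset to getBytes, as the utf8 helper does, keeps the byte patterns identical across platforms with different default encodings.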

Array Factory Methods and Integration

Utilities for wrapping existing Java arrays in a MemoryBlock so they can be accessed through LongArray without copying.

Long Array from Standard Arrays:

// Create LongArray from standard Java long array
long[] standardArray = {1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L};
MemoryBlock arrayBlock = MemoryBlock.fromLongArray(standardArray);
LongArray sparkArray = new LongArray(arrayBlock);

// The LongArray wraps standardArray directly, so both share the same memory
sparkArray.set(0, 100L);
System.out.println(standardArray[0]);  // prints 100 (modified through LongArray)

// Access statistics
long totalElements = sparkArray.size();  // 8
long totalBytes = arrayBlock.size();     // 64 (8 * 8 bytes per long)
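
The shared-memory behavior is the same one the JDK exposes through buffer views. As a rough analogy that runs without Spark, the sketch below wraps a long[] in a java.nio.LongBuffer and shows that a write through the view is visible in the original array. The class and method names are hypothetical.

```java
import java.nio.LongBuffer;

// Hypothetical analogy: a view that shares its backing array instead of copying it.
final class SharedBackingSketch {
    // Writes value through a wrapping view and returns what the backing array now holds.
    static long writeThroughView(long[] backing, int index, long value) {
        LongBuffer view = LongBuffer.wrap(backing); // no copy is made
        view.put(index, value);
        return backing[index];
    }

    public static void main(String[] args) {
        long[] standardArray = {1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L};
        System.out.println(writeThroughView(standardArray, 0, 100L)); // prints 100
    }
}
```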

Performance-Critical Array Operations

Operations designed specifically for performance-critical code paths where bounds checking and safety mechanisms are deliberately bypassed for maximum speed.

Unsafe Array Access Patterns:

// High-performance array processing without bounds checking
MemoryBlock block = MemoryAllocator.HEAP.allocate(8 * 1000000); // 1M longs
LongArray bigArray = new LongArray(block);

// Bulk initialization (no bounds checking)
for (int i = 0; i < bigArray.size(); i++) {
    bigArray.set(i, i * 2L);
}

// Bulk processing (maximum performance)
long sum = 0;
for (int i = 0; i < bigArray.size(); i++) {
    sum += bigArray.get(i);
}

// Zero out array efficiently
bigArray.zeroOut();  // Sets every element to 0
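
One detail worth care when sizing large blocks: an expression such as 8 * n is evaluated in int arithmetic when n is an int, and silently wraps to a negative value once the product exceeds Integer.MAX_VALUE. The sketch below shows a hypothetical helper (not part of the library) that computes the byte count in long arithmetic and fails loudly on overflow.

```java
// Hypothetical helper for computing allocation sizes safely.
final class AllocationSizeSketch {
    private static final long BYTES_PER_LONG = 8L;

    // Returns the number of bytes needed for elementCount longs.
    static long bytesForLongs(long elementCount) {
        if (elementCount < 0) {
            throw new IllegalArgumentException("elementCount must be >= 0");
        }
        // Throws ArithmeticException instead of wrapping on overflow.
        return Math.multiplyExact(elementCount, BYTES_PER_LONG);
    }

    public static void main(String[] args) {
        int n = 300_000_000;
        System.out.println(8 * n < 0);        // true: int multiplication wrapped
        System.out.println(bytesForLongs(n)); // 2400000000
    }
}
```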

Memory Region Operations:

import java.util.Arrays;

// Process large byte arrays with minimal overhead
byte[] largeDataset = new byte[1024 * 1024]; // 1MB
byte[] searchPattern = "target_pattern".getBytes();

// Fill with test data
Arrays.fill(largeDataset, (byte) 'A');
System.arraycopy(searchPattern, 0, largeDataset, 1000, searchPattern.length);

// High-performance search across entire dataset
boolean found = ByteArrayMethods.contains(largeDataset, searchPattern);  // true

// Memory-efficient comparison of large regions
byte[] backup = largeDataset.clone();
boolean identical = ByteArrayMethods.arrayEquals(
    largeDataset, Platform.BYTE_ARRAY_OFFSET,
    backup, Platform.BYTE_ARRAY_OFFSET,
    largeDataset.length
);  // true

Iterator Pattern Support

The module includes KVIterator, an abstract base class for key-value iteration. It can serve as the foundation for custom array-based iterators that release their resources in close().

public abstract class KVIterator<K, V> {
    public abstract boolean next() throws IOException;
    public abstract K getKey();
    public abstract V getValue();
    public abstract void close();
}

Usage Pattern Example:

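// Example implementation for array-based key-value iteration.
// KVIterator as declared above is not AutoCloseable, so the subclass
// implements it to be usable in try-with-resources.
public class LongArrayIterator extends KVIterator<Integer, Long> implements AutoCloseable {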
    private final LongArray array;
    private int currentIndex = -1;
    
    public LongArrayIterator(LongArray array) {
        this.array = array;
    }
    
    @Override
    public boolean next() {
        currentIndex++;
        return currentIndex < array.size();
    }
    
    @Override
    public Integer getKey() {
        return currentIndex;
    }
    
    @Override
    public Long getValue() {
        return array.get(currentIndex);
    }
    
    @Override
    public void close() {
        // Cleanup resources if needed
    }
}

// Usage
LongArray data = new LongArray(MemoryAllocator.HEAP.allocate(8 * 10));
// ... populate data ...

try (LongArrayIterator iterator = new LongArrayIterator(data)) {
    while (iterator.next()) {
        Integer index = iterator.getKey();
        Long value = iterator.getValue();
        System.out.println("Index " + index + ": " + value);
    }
}
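
For experimenting with the iteration pattern without a Spark dependency, the sketch below reproduces it with plain JDK types. KeyValueCursor is a hypothetical stand-in that mirrors the shape of KVIterator, and LongCursor iterates over a long[] instead of a LongArray. The subclass implements AutoCloseable so that it can be used in try-with-resources.

```java
import java.io.IOException;

// Hypothetical stand-in with the same method shape as KVIterator.
abstract class KeyValueCursor<K, V> {
    public abstract boolean next() throws IOException;
    public abstract K getKey();
    public abstract V getValue();
    public abstract void close();
}

// Cursor over a long[]: the key is the index, the value is the element.
final class LongCursor extends KeyValueCursor<Integer, Long> implements AutoCloseable {
    private final long[] values;
    private int currentIndex = -1;
    private boolean closed = false;

    LongCursor(long[] values) {
        this.values = values;
    }

    @Override
    public boolean next() {
        if (closed) {
            return false;
        }
        currentIndex++;
        return currentIndex < values.length;
    }

    @Override
    public Integer getKey() {
        return currentIndex;
    }

    @Override
    public Long getValue() {
        return values[currentIndex];
    }

    @Override
    public void close() {
        closed = true; // a real implementation would release its memory here
    }

    // Sums all elements, closing the cursor automatically when done.
    static long sum(long[] values) {
        long total = 0;
        try (LongCursor cursor = new LongCursor(values)) {
            while (cursor.next()) {
                total += cursor.getValue();
            }
        }
        return total;
    }
}
```

Note that getKey and getValue are only meaningful after next has returned true, which matches the usage loop shown earlier.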