The Apache Spark Unsafe module provides low-level unsafe operations for memory management, array operations, bitset operations, hash functions, and high-performance data types.
npx @tessl/cli install tessl/maven-org-apache-spark--spark-unsafe_2-13@4.0.0
The module enables direct memory access through sun.misc.Unsafe and offers high-performance array operations, bitset manipulation, memory allocation strategies, hash function implementations, and optimized data types such as UTF8String and CalendarInterval that are used throughout the Spark engine.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-unsafe_2.13</artifactId>
<version>4.0.0</version>
</dependency>
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.memory.MemoryBlock;
import org.apache.spark.unsafe.memory.MemoryAllocator;
// Memory allocation and management
MemoryAllocator allocator = MemoryAllocator.UNSAFE;
MemoryBlock block = allocator.allocate(1024);
// Direct memory access
Platform.putLong(block.getBaseObject(), block.getBaseOffset(), 42L);
long value = Platform.getLong(block.getBaseObject(), block.getBaseOffset());
// UTF-8 string operations
UTF8String str1 = UTF8String.fromString("Hello");
UTF8String str2 = UTF8String.fromString(" World");
UTF8String result = UTF8String.concat(str1, str2);
// Clean up
allocator.free(block);
The module is organized into several key packages, each covering a different aspect of low-level functionality:
Direct memory access and platform-specific operations using sun.misc.Unsafe for maximum performance in big data processing scenarios.
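The offset-based access pattern described here can be illustrated without touching sun.misc.Unsafe. The sketch below swaps in a direct `java.nio.ByteBuffer` as a stand-in for off-heap memory; the class `OffHeapAccessSketch` and its helper methods are hypothetical and are not part of the module.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in: a direct ByteBuffer plays the role of off-heap memory.
final class OffHeapAccessSketch {

    // Writes an int at a byte offset and reads it back from the same offset.
    static int roundTripInt(ByteBuffer memory, int offset, int value) {
        memory.putInt(offset, value);
        return memory.getInt(offset);
    }

    // Copies `length` bytes from one region to another, one byte at a time.
    static void copy(ByteBuffer src, int srcOffset, ByteBuffer dst, int dstOffset, int length) {
        for (int i = 0; i < length; i++) {
            dst.put(dstOffset + i, src.get(srcOffset + i));
        }
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.allocateDirect(16);
        ByteBuffer b = ByteBuffer.allocateDirect(16);
        System.out.println(roundTripInt(a, 4, 42)); // prints 42
        copy(a, 4, b, 8, 4);
        System.out.println(b.getInt(8));            // prints 42
    }
}
```

Unlike the ByteBuffer calls used here, operations built on sun.misc.Unsafe perform no bounds checks, so an out-of-range offset can corrupt memory or crash the JVM instead of throwing an exception.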
// Platform: raw reads and writes, allocation, and bulk copies
public static int getInt(Object object, long offset);
public static void putInt(Object object, long offset, int value);
public static long allocateMemory(long size);
public static void freeMemory(long address);
public static void copyMemory(Object src, long srcOffset, Object dst, long dstOffset, long length);
public static boolean unaligned();
Memory allocation and management for both heap and off-heap memory, with object pooling for large allocations and debugging support.
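To make "object pooling for large allocations" concrete, here is a minimal plain-Java sketch of the idea: arrays at or above a size threshold are kept on free and handed back on the next allocation of the same size. The class `PoolingAllocatorSketch`, the word-count threshold, and the use of strong references are all assumptions made for illustration; they do not describe the module's actual allocator.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of size-keyed pooling for large heap arrays.
final class PoolingAllocatorSketch {
    private final int poolingThresholdWords;
    // Freed arrays, grouped by exact length so a reused array always fits.
    private final Map<Integer, ArrayDeque<long[]>> pool = new HashMap<>();

    PoolingAllocatorSketch(int poolingThresholdWords) {
        this.poolingThresholdWords = poolingThresholdWords;
    }

    // Returns a pooled array of the requested length if one exists, else a new one.
    // Note: a reused array still holds its previous contents.
    long[] allocate(int words) {
        if (words >= poolingThresholdWords) {
            ArrayDeque<long[]> free = pool.get(words);
            if (free != null && !free.isEmpty()) {
                return free.pop();
            }
        }
        return new long[words];
    }

    // Keeps large arrays for reuse; small arrays are simply left to the garbage collector.
    void free(long[] array) {
        if (array.length >= poolingThresholdWords) {
            pool.computeIfAbsent(array.length, k -> new ArrayDeque<>()).push(array);
        }
    }
}
```

Pooling only above a threshold keeps the bookkeeping cost away from small, cheap allocations while avoiding repeated allocation and zeroing of large buffers.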
// MemoryAllocator
public abstract MemoryBlock allocate(long size) throws OutOfMemoryError;
public abstract void free(MemoryBlock memory);
// MemoryBlock
public MemoryBlock(Object obj, long offset, long length);
public long size();
public void fill(byte value);
Optimized byte and long array operations over both on-heap and off-heap memory, performed without bounds checking for maximum performance.
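Two small arithmetic helpers recur in this kind of array code: rounding a size up to a power of two and rounding a byte count up to a whole 8-byte word. The following is a plain-Java sketch of that arithmetic, assuming non-negative inputs that do not overflow; `ArrayMathSketch` and `roundUpToWord` are hypothetical names, and the code is not taken from the module.

```java
// Hypothetical sketch of size-rounding helpers used around word-aligned arrays.
final class ArrayMathSketch {

    // Smallest power of two >= num; returns num itself when it already is one.
    static long nextPowerOf2(long num) {
        long highBit = Long.highestOneBit(num);
        return (highBit == num) ? num : highBit << 1;
    }

    // Rounds a byte count up to the next multiple of 8 (one 64-bit word).
    static int roundUpToWord(int numBytes) {
        int remainder = numBytes & 0x07;
        return numBytes + ((8 - remainder) & 0x07);
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOf2(1000)); // prints 1024
        System.out.println(roundUpToWord(9));   // prints 16
    }
}
```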
// ByteArrayMethods
public static boolean arrayEquals(Object leftBase, long leftOffset, Object rightBase, long rightOffset, long length);
public static long nextPowerOf2(long num);
public static int roundNumberOfBytesToNearestWord(int numBytes);
// LongArray
public void set(int index, long value);
public long get(int index);
Comprehensive UTF-8 string manipulation with extensive string operations, collation support, and storage optimized for internal Spark use.
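Because the string data is stored as UTF-8 bytes, the byte length and the character count of a string differ whenever it contains non-ASCII text. The sketch below shows one way to count code points directly from UTF-8 bytes, assuming well-formed input; `Utf8CountSketch` is a hypothetical class and not the module's implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: byte length versus code point count for UTF-8 data.
final class Utf8CountSketch {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int numBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Counts code points by skipping continuation bytes (bit pattern 10xxxxxx).
    // Assumes the input is well-formed UTF-8.
    static int numChars(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // 'e' with acute accent takes two bytes in UTF-8
        System.out.println(numBytes(s));                                    // prints 6
        System.out.println(numChars(s.getBytes(StandardCharsets.UTF_8)));   // prints 5
    }
}
```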
public static UTF8String fromString(String str);
public static UTF8String concat(UTF8String... inputs);
public int numBytes();
public int numChars();
public UTF8String substring(int start, int until);
public boolean contains(UTF8String substring);
public UTF8String toUpperCase();
public UTF8String trim();
Fast hash function implementations and bitset manipulation methods for efficient data processing and boolean operations.
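As an illustration of a seeded integer hash of this kind, here is a plain-Java sketch of the standard 32-bit MurmurHash3 applied to a single 4-byte input. It is written from the published algorithm under the assumption that this is the relevant hash family; `Murmur3Sketch` is a hypothetical class, not the module's code.

```java
// Hypothetical sketch of 32-bit MurmurHash3 for one int-sized input.
final class Murmur3Sketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    // Hashes a single int with the given seed.
    static int hashInt(int input, int seed) {
        int k1 = mixK1(input);
        int h1 = mixH1(seed, k1);
        return fmix(h1, 4); // 4 = number of input bytes
    }

    // Scrambles one 32-bit block of input.
    private static int mixK1(int k1) {
        k1 *= C1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= C2;
        return k1;
    }

    // Folds a scrambled block into the running hash state.
    private static int mixH1(int h1, int k1) {
        h1 ^= k1;
        h1 = Integer.rotateLeft(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;
        return h1;
    }

    // Final avalanche step so that every input bit affects every output bit.
    private static int fmix(int h1, int length) {
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(hashInt(0, 0))); // prints 2362f9de
    }
}
```

Every step is invertible for a fixed seed, so two different int inputs never collide under the same seed; collisions only arise for longer inputs.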
// Murmur3_x86_32
public static int hashInt(int input, int seed);
public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed);
// BitSetMethods
public static void set(Object baseObject, long baseOffset, int index);
public static boolean isSet(Object baseObject, long baseOffset, int index);
public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInWords);
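The bitset methods address individual bits inside a run of 64-bit words. A minimal sketch of that word-and-mask arithmetic follows, using a `long[]` in place of a raw (base object, offset) pair; `BitSetSketch` is a hypothetical class used only for illustration.

```java
// Hypothetical sketch of bit addressing over 64-bit words.
final class BitSetSketch {

    // Sets the bit at `index`: word = index / 64, bit within word = index % 64.
    static void set(long[] words, int index) {
        words[index >> 6] |= 1L << (index & 0x3f);
    }

    // Returns true if the bit at `index` is set.
    static boolean isSet(long[] words, int index) {
        return (words[index >> 6] & (1L << (index & 0x3f))) != 0;
    }

    // Returns true if any bit in any word is set.
    static boolean anySet(long[] words) {
        for (long word : words) {
            if (word != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] words = new long[2];        // room for 128 bits
        set(words, 70);                    // lands in words[1], bit 6
        System.out.println(isSet(words, 70)); // prints true
        System.out.println(isSet(words, 69)); // prints false
    }
}
```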
Specialized data types including calendar intervals, variant values, and utility classes for date/time operations and collation support.
// CalendarInterval
public CalendarInterval(int months, int days, long microseconds);
// VariantVal
public VariantVal(byte[] value, byte[] metadata);
public String toJson(ZoneId zoneId);
// CollationFactory
public static UTF8String getCollationKey(UTF8String input, int collationId);
public static boolean isCaseInsensitive(int collationId);
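The three-field shape of the calendar interval constructor (months, days, microseconds) is worth a short illustration: a month has no fixed length, so it cannot be folded into a single microsecond count. The sketch below uses `java.time` to show the effect; `IntervalSketch` is a hypothetical class, and applying months first, then days, then microseconds is an assumption of this example.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of an interval kept as separate month, day, and microsecond parts.
final class IntervalSketch {
    final int months;
    final int days;
    final long microseconds;

    IntervalSketch(int months, int days, long microseconds) {
        this.months = months;
        this.days = days;
        this.microseconds = microseconds;
    }

    // Applies the parts in order: months, then days, then microseconds.
    LocalDateTime addTo(LocalDateTime start) {
        return start.plusMonths(months)
                    .plusDays(days)
                    .plus(microseconds, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        LocalDateTime jan31 = LocalDateTime.of(2023, 1, 31, 0, 0);
        // One month after January 31 clamps to the last day of February.
        System.out.println(new IntervalSketch(1, 0, 0).addTo(jan31)); // prints 2023-02-28T00:00
    }
}
```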