
tessl/maven-org-apache-spark--spark-unsafe-2-13

Apache Spark Unsafe module provides low-level unsafe operations for memory management, array operations, bitset operations, hash functions, and high-performance data types.


Apache Spark Unsafe

The Apache Spark Unsafe module (spark-unsafe) provides the low-level building blocks used throughout the Spark engine: direct memory access through sun.misc.Unsafe, heap and off-heap memory allocation, array and bitset operations, hash functions, and optimized data types such as UTF8String and CalendarInterval.

Package Information

  • Package Name: spark-unsafe_2.13
  • Package Type: Maven
  • Language: Java
  • Installation:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-unsafe_2.13</artifactId>
  <version>4.0.0</version>
</dependency>

Core Imports

import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.memory.MemoryBlock;
import org.apache.spark.unsafe.memory.MemoryAllocator;

Basic Usage

import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.memory.MemoryAllocator;
import org.apache.spark.unsafe.memory.MemoryBlock;

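// Allocate 1024 bytes. MemoryAllocator.UNSAFE allocates off-heap, so
// block.getBaseObject() is null and block.getBaseOffset() is the raw address.
MemoryAllocator allocator = MemoryAllocator.UNSAFE;
MemoryBlock block = allocator.allocate(1024);
try {
  // Direct memory access: write a long at the start of the block, then read it back
  Platform.putLong(block.getBaseObject(), block.getBaseOffset(), 42L);
  long value = Platform.getLong(block.getBaseObject(), block.getBaseOffset());
} finally {
  // Off-heap memory is not garbage collected, so free it explicitly
  allocator.free(block);
}

// UTF-8 string operations
UTF8String str1 = UTF8String.fromString("Hello");
UTF8String str2 = UTF8String.fromString(" World");
UTF8String result = UTF8String.concat(str1, str2); // "Hello World"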

Architecture

The module is organized into several key packages providing different aspects of low-level functionality:

  • Platform Operations: Direct memory access and platform-specific optimizations using sun.misc.Unsafe
  • Memory Management: Heap and off-heap memory allocation with pooling and debugging support
  • Array Operations: High-performance byte and long array operations without bounds checking
  • String Processing: Comprehensive UTF-8 string manipulation with collation support
  • Hash Functions: Fast hash implementations (Murmur3, Hive-compatible) for data processing
  • Bitset Operations: Fixed-size bitset manipulation for efficient boolean operations
  • Data Types: Specialized data structures like calendar intervals and variant values

Capabilities

Platform Operations

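Direct memory access and platform-specific operations built on sun.misc.Unsafe. The static methods of Platform read and write primitives at a (base object, offset) address, allocate and free off-heap memory, copy memory between locations, and report whether the platform supports unaligned access.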

public static int getInt(Object object, long offset);
public static void putInt(Object object, long offset, int value);
public static long allocateMemory(long size);
public static void freeMemory(long address);
public static void copyMemory(Object src, long srcOffset, Object dst, long dstOffset, long length);
public static boolean unaligned();
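The (object, offset) addressing used by these signatures can be illustrated with the JDK's own sun.misc.Unsafe. The following is a minimal sketch, not Spark's implementation: the class `UnsafeAddressingSketch` and its methods are hypothetical names, and it assumes a JDK on which sun.misc.Unsafe is still accessible (recent JDK releases deprecate its memory-access methods). On-heap, the base is an array and the offset is relative to it; off-heap, the base is null and the offset is an absolute address.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical illustration class; not part of spark-unsafe.
class UnsafeAddressingSketch {
  // Obtain the Unsafe singleton reflectively (the field name "theUnsafe" is JDK-specific).
  static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("sun.misc.Unsafe is not available", e);
    }
  }

  static final Unsafe UNSAFE = loadUnsafe();

  // On-heap: base is the array, offset is the array's base offset plus a byte index.
  static int roundTripOnHeap(int value) {
    byte[] buffer = new byte[16];
    long offset = Unsafe.ARRAY_BYTE_BASE_OFFSET + 4;
    UNSAFE.putInt(buffer, offset, value);
    return UNSAFE.getInt(buffer, offset);
  }

  // Off-heap: base is null, offset is the absolute address returned by allocateMemory.
  static int roundTripOffHeap(int value) {
    long address = UNSAFE.allocateMemory(16);
    try {
      UNSAFE.putInt(null, address, value);
      return UNSAFE.getInt(null, address);
    } finally {
      // Off-heap memory must be released manually.
      UNSAFE.freeMemory(address);
    }
  }
}
```

Because both cases share one calling convention, code written against an (object, offset) pair works unchanged for heap and off-heap data.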

Full reference: platform-operations.md

Memory Management

Memory allocation and management supporting both heap and off-heap memory with object pooling for large allocations and debugging capabilities.

// MemoryAllocator
public abstract MemoryBlock allocate(long size) throws OutOfMemoryError;
public abstract void free(MemoryBlock memory);

// MemoryBlock
public MemoryBlock(Object obj, long offset, long length);
public long size();
public void fill(byte value);
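To make "object pooling for large allocations" concrete, here is a simplified sketch of a heap allocator that recycles large long[] buffers by size. It is an assumption-laden illustration, not Spark's allocator: the class name, the 1 MiB threshold, and the use of strong references in the pool are all choices made for this example.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of spark-unsafe.
class PooledHeapAllocatorSketch {
  // Assumed threshold: only buffers of at least this many bytes are pooled.
  static final long POOLING_THRESHOLD_BYTES = 1024 * 1024;

  // Freed arrays, keyed by their length in 8-byte words.
  private final Map<Integer, ArrayDeque<long[]>> pool = new HashMap<>();

  // Returns a long[] that can hold `size` bytes, reusing a pooled array of the same length if any.
  // A reused array still holds its previous contents; callers must not assume zeroed memory.
  long[] allocate(long size) {
    int words = (int) ((size + 7) / 8); // round up to whole 8-byte words
    if (size >= POOLING_THRESHOLD_BYTES) {
      ArrayDeque<long[]> free = pool.get(words);
      if (free != null && !free.isEmpty()) {
        return free.pop();
      }
    }
    return new long[words];
  }

  // Returns a large array to the pool; small arrays are simply left to the garbage collector.
  void free(long[] array) {
    if ((long) array.length * 8 >= POOLING_THRESHOLD_BYTES) {
      pool.computeIfAbsent(array.length, k -> new ArrayDeque<>()).push(array);
    }
  }
}
```

Pooling only above a threshold keeps the bookkeeping cost away from small, short-lived allocations, where the garbage collector is already efficient.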

Full reference: memory-management.md

Array Operations

Optimized byte and long array operations supporting both on-heap and off-heap memory without bounds checking for maximum performance.

// ByteArrayMethods
public static boolean arrayEquals(Object leftBase, long leftOffset, Object rightBase, long rightOffset, long length);
public static long nextPowerOf2(long num);
public static int roundNumberOfBytesToNearestWord(int numBytes);

// LongArray
public void set(int index, long value);
public long get(int index);
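The two sizing helpers above are small bit-manipulation routines. The sketch below shows one plausible way to implement them in plain Java; the class name `WordMathSketch` is hypothetical and the bodies are an illustration of the technique rather than a copy of the library source. Note that in this sketch "nearest word" means rounding up to the next multiple of 8 bytes.

```java
// Hypothetical illustration class; not part of spark-unsafe.
class WordMathSketch {
  // Smallest power of two that is >= num (for positive num within range).
  static long nextPowerOf2(long num) {
    long highBit = Long.highestOneBit(num);
    // If num is already a power of two it equals its highest set bit.
    return (highBit == num) ? num : highBit << 1;
  }

  // Rounds a byte count up to a multiple of 8, the size of one machine word.
  static int roundNumberOfBytesToNearestWord(int numBytes) {
    int remainder = numBytes & 0x07;              // numBytes mod 8
    return numBytes + ((8 - remainder) & 0x07);   // add 0 when already aligned
  }
}
```

Word-aligned sizes let data be read and compared eight bytes at a time instead of byte by byte.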

Full reference: array-operations.md

UTF-8 String Processing

Comprehensive UTF-8 string manipulation capabilities with extensive string operations, collation support, and optimized storage for internal Spark use.

public static UTF8String fromString(String str);
public static UTF8String concat(UTF8String... inputs);
public int numBytes();
public int numChars();
public UTF8String substring(int start, int until);
public boolean contains(UTF8String substring);
public UTF8String toUpperCase();
public UTF8String trim();
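The difference between numBytes() and numChars() follows from UTF-8 being a variable-width encoding. The sketch below illustrates the distinction with the JDK only; `Utf8LengthSketch` is a hypothetical class, and it assumes that a "character" means a Unicode code point, counted by skipping UTF-8 continuation bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of spark-unsafe.
class Utf8LengthSketch {
  // Encodes a Java String as UTF-8 bytes.
  static byte[] encode(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Counts code points in valid UTF-8 by counting every byte that is not a
  // continuation byte (continuation bytes have the bit pattern 10xxxxxx).
  static int numChars(byte[] utf8) {
    int count = 0;
    for (byte b : utf8) {
      if ((b & 0xC0) != 0x80) {
        count++;
      }
    }
    return count;
  }
}
```

For ASCII text the two counts are equal; for text with accented letters, CJK characters, or emoji, the byte count is larger than the character count, which matters when sizing buffers versus computing substring positions.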

Full reference: utf8-string-processing.md

Hash Functions and Bitset Operations

Fast hash function implementations and bitset manipulation methods for efficient data processing and boolean operations.

// Murmur3_x86_32
public static int hashInt(int input, int seed);
public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed);

// BitSetMethods
public static void set(Object baseObject, long baseOffset, int index);
public static boolean isSet(Object baseObject, long baseOffset, int index);
public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInWords);
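The bitset methods treat a region of memory as a sequence of 64-bit words. The following sketch shows the same word-and-mask arithmetic over a plain long[]; it is an illustration under that assumption, with a hypothetical class name, whereas the signatures above address the words through a base object and offset instead of an array.

```java
// Hypothetical illustration class; not part of spark-unsafe.
class WordBitSetSketch {
  // Sets bit `index`: the word is index / 64, the bit within the word is index % 64.
  static void set(long[] words, int index) {
    words[index >> 6] |= 1L << (index & 0x3f);
  }

  // Returns true if bit `index` is set.
  static boolean isSet(long[] words, int index) {
    return (words[index >> 6] & (1L << (index & 0x3f))) != 0;
  }

  // Returns true if any bit is set, checking 64 bits per comparison.
  static boolean anySet(long[] words) {
    for (long word : words) {
      if (word != 0) {
        return true;
      }
    }
    return false;
  }
}
```

Checking whole words is what makes anySet cheap: a bitset of N bits needs at most N / 64 comparisons, rounded up.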

Full reference: hash-bitset-operations.md

Data Types and Utilities

Specialized data types including calendar intervals, variant values, and utility classes for date/time operations and collation support.

// CalendarInterval
public CalendarInterval(int months, int days, long microseconds);

// VariantVal
public VariantVal(byte[] value, byte[] metadata);
public String toJson(ZoneId zoneId);

// Collation utilities
public static UTF8String getCollationKey(UTF8String input, int collationId);
public static boolean isCaseInsensitive(int collationId);
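The CalendarInterval constructor takes months, days, and microseconds as separate fields because months and days do not have a constant length. The sketch below illustrates why with java.time; `IntervalSketch` is a hypothetical stand-in class, and the order in which it applies the fields is an assumption of this example, not a statement about how Spark applies intervals.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration class; not part of spark-unsafe.
final class IntervalSketch {
  final int months;
  final int days;
  final long microseconds;

  IntervalSketch(int months, int days, long microseconds) {
    this.months = months;
    this.days = days;
    this.microseconds = microseconds;
  }

  // Applies months first, then days, then the fixed-length microsecond part.
  LocalDateTime addTo(LocalDateTime start) {
    return start
        .plusMonths(months)                          // calendar-aware: clamps to the month's last day
        .plusDays(days)                              // calendar-aware: crosses month boundaries
        .plus(microseconds, ChronoUnit.MICROS);      // exact elapsed time
  }
}
```

Adding one month to 31 January 2024 yields 29 February 2024, a shift of 29 days, so a single microsecond count could not represent "one month" correctly for every start date.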

Full reference: data-types-utilities.md
