tessl/maven-org-apache-spark--spark-unsafe_2-13

Apache Spark Unsafe module provides low-level unsafe operations for memory management, array operations, bitset operations, hash functions, and high-performance data types.

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.apache.spark/spark-unsafe_2.13@4.0.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-unsafe_2-13@4.0.0


Apache Spark Unsafe

The Apache Spark Unsafe module provides the low-level building blocks that the Spark engine uses for performance-critical work. It wraps sun.misc.Unsafe for direct memory access and builds on it to offer heap and off-heap memory allocation, array and bitset operations, hash functions, and optimized data types such as UTF8String and CalendarInterval.

Package Information

  • Package Name: spark-unsafe_2.13
  • Package Type: Maven
  • Language: Java
  • Installation:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-unsafe_2.13</artifactId>
  <version>4.0.0</version>
</dependency>

Core Imports

import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.memory.MemoryBlock;
import org.apache.spark.unsafe.memory.MemoryAllocator;

Basic Usage

import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.memory.MemoryAllocator;
import org.apache.spark.unsafe.memory.MemoryBlock;

// Memory allocation and management (MemoryAllocator.UNSAFE allocates off-heap memory)
MemoryAllocator allocator = MemoryAllocator.UNSAFE;
MemoryBlock block = allocator.allocate(1024);
try {
    // Direct memory access: write a long at the start of the block, then read it back
    Platform.putLong(block.getBaseObject(), block.getBaseOffset(), 42L);
    long value = Platform.getLong(block.getBaseObject(), block.getBaseOffset());
} finally {
    // Clean up: off-heap memory is not garbage collected, so free it explicitly
    allocator.free(block);
}

// UTF-8 string operations
UTF8String str1 = UTF8String.fromString("Hello");
UTF8String str2 = UTF8String.fromString(" World");
UTF8String result = UTF8String.concat(str1, str2);

Architecture

The module is organized into several key packages providing different aspects of low-level functionality:

  • Platform Operations: Direct memory access and platform-specific optimizations using sun.misc.Unsafe
  • Memory Management: Heap and off-heap memory allocation with pooling and debugging support
  • Array Operations: High-performance byte and long array operations without bounds checking
  • String Processing: Comprehensive UTF-8 string manipulation with collation support
  • Hash Functions: Fast hash implementations (Murmur3, Hive-compatible) for data processing
  • Bitset Operations: Fixed-size bitset manipulation for efficient boolean operations
  • Data Types: Specialized data structures like calendar intervals and variant values

Capabilities

Platform Operations

Direct memory access and platform-specific operations using sun.misc.Unsafe for maximum performance in big data processing scenarios.

// org.apache.spark.unsafe.Platform (all methods are static)
public static int getInt(Object object, long offset);
public static void putInt(Object object, long offset, int value);
public static long allocateMemory(long size);
public static void freeMemory(long address);
public static void copyMemory(Object src, long srcOffset, Object dst, long dstOffset, long length);
public static boolean unaligned();

Full reference: platform-operations.md
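The (object, offset) signatures above let one set of accessors address both heap and off-heap memory. As a rough illustration of that idea only, the sketch below uses java.nio.ByteBuffer as a safe stand-in for sun.misc.Unsafe: a heap buffer and a direct (off-heap) buffer are written and read through identical code. The class and method names are hypothetical and are not part of spark-unsafe.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration; not part of spark-unsafe.
final class OffsetAccessSketch {

    // Writes a long at the given byte offset and reads it back.
    // The same code path serves heap-backed and direct buffers.
    static long roundTrip(ByteBuffer buffer, int offset, long value) {
        buffer.order(ByteOrder.nativeOrder()); // use the platform's byte order
        buffer.putLong(offset, value);         // absolute write at a byte offset
        return buffer.getLong(offset);         // absolute read at the same offset
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(1024);        // backed by a byte[]
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024); // native memory

        System.out.println(roundTrip(onHeap, 0, 42L));
        System.out.println(roundTrip(offHeap, 8, 42L));
    }
}
```

Unlike this stand-in, which bounds-checks every access, raw Unsafe-style accessors perform no such checks, so the caller is responsible for keeping every offset inside the allocated region.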

Memory Management

Memory allocation and management supporting both heap and off-heap memory with object pooling for large allocations and debugging capabilities.

// org.apache.spark.unsafe.memory.MemoryAllocator
public abstract MemoryBlock allocate(long size) throws OutOfMemoryError;
public abstract void free(MemoryBlock memory);

// org.apache.spark.unsafe.memory.MemoryBlock
public MemoryBlock(Object obj, long offset, long length);
public long size();
public void fill(byte value);

Full reference: memory-management.md
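To make "object pooling for large allocations" concrete, here is a minimal sketch of size-keyed pooling. It is not the allocator shipped in spark-unsafe: the class name, the threshold value, and the use of plain long[] arrays are all assumptions chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the allocator shipped in spark-unsafe.
final class PooledAllocatorSketch {

    // Assumed threshold: only arrays at least this long are pooled.
    static final int POOLING_THRESHOLD_WORDS = 1024;

    // Freed arrays, grouped by exact length.
    private final Map<Integer, Deque<long[]>> pool = new HashMap<>();

    // Returns a pooled array of the requested length if one is available,
    // otherwise allocates a fresh one. Reused arrays are NOT zeroed.
    public long[] allocate(int words) {
        if (words >= POOLING_THRESHOLD_WORDS) {
            Deque<long[]> free = pool.get(words);
            if (free != null && !free.isEmpty()) {
                return free.pop();
            }
        }
        return new long[words];
    }

    // Hands an array back. Large arrays are kept for reuse; small ones
    // are simply dropped and left to the garbage collector.
    public void free(long[] array) {
        if (array.length >= POOLING_THRESHOLD_WORDS) {
            pool.computeIfAbsent(array.length, k -> new ArrayDeque<>()).push(array);
        }
    }
}
```

Pooling only above a threshold is a trade-off: large arrays are expensive to allocate and zero, while small ones are cheap enough that tracking them would cost more than it saves.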

Array Operations

Optimized byte and long array operations supporting both on-heap and off-heap memory without bounds checking for maximum performance.

// org.apache.spark.unsafe.array.ByteArrayMethods (static utilities)
public static boolean arrayEquals(Object leftBase, long leftOffset, Object rightBase, long rightOffset, long length);
public static long nextPowerOf2(long num);
public static int roundNumberOfBytesToNearestWord(int numBytes);

// org.apache.spark.unsafe.array.LongArray
public void set(int index, long value);
public long get(int index);

Full reference: array-operations.md
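The two sizing helpers listed above can be understood through a small standalone sketch. It assumes that a "word" is 8 bytes and that rounding always goes up; the class below is hypothetical and is not the spark-unsafe implementation.

```java
// Hypothetical illustration; not the spark-unsafe implementation.
final class ArraySizingSketch {

    // Smallest power of two that is >= num, valid for 1 <= num <= 2^62.
    static long nextPowerOf2(long num) {
        long highBit = Long.highestOneBit(num);
        return (highBit == num) ? num : highBit << 1;
    }

    // Rounds a byte count up to the next multiple of 8 (one 64-bit word).
    static int roundUpToWord(int numBytes) {
        return (numBytes + 7) & ~7;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOf2(5));   // 8
        System.out.println(nextPowerOf2(8));   // 8
        System.out.println(roundUpToWord(9));  // 16
        System.out.println(roundUpToWord(16)); // 16
    }
}
```

Word-aligned sizes matter here because data stored in long-sized slots can then be read and compared eight bytes at a time.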

UTF-8 String Processing

Comprehensive UTF-8 string manipulation capabilities with extensive string operations, collation support, and optimized storage for internal Spark use.

// org.apache.spark.unsafe.types.UTF8String
public static UTF8String fromString(String str);
public static UTF8String concat(UTF8String... inputs);
public int numBytes();
public int numChars();
public UTF8String substring(int start, int until);
public boolean contains(UTF8String substring);
public UTF8String toUpperCase();
public UTF8String trim();

Full reference: utf8-string-processing.md
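The separate numBytes() and numChars() methods reflect the fact that UTF-8 is a variable-width encoding. The sketch below shows the distinction using only the JDK, assuming that "chars" means Unicode code points; the class is hypothetical and does not use UTF8String itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; does not use UTF8String.
final class Utf8CountSketch {

    // Length of the UTF-8 encoding in bytes.
    static int numBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of code points, counted directly on the UTF-8 bytes:
    // every byte that is not a continuation byte (10xxxxxx) starts one.
    static int numChars(String s) {
        int count = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String accented = "h\u00e9llo"; // "e" with acute accent takes 2 bytes
        System.out.println(numBytes(accented)); // 6
        System.out.println(numChars(accented)); // 5
    }
}
```

Because character positions do not map to fixed byte offsets, character-indexed operations such as substring have to scan the bytes, whereas byte-length queries can be answered without decoding.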

Hash Functions and Bitset Operations

Fast hash function implementations and bitset manipulation methods for efficient data processing and boolean operations.

// org.apache.spark.unsafe.hash.Murmur3_x86_32
public static int hashInt(int input, int seed);
public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed);

// org.apache.spark.unsafe.bitset.BitSetMethods
public static void set(Object baseObject, long baseOffset, int index);
public static boolean isSet(Object baseObject, long baseOffset, int index);
public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInWords);

Full reference: hash-bitset-operations.md
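The following sketch illustrates both ideas on plain Java values. hashInt follows the standard Murmur3 x86_32 steps for a single 4-byte input; it is assumed, not confirmed, to produce the same values as the library method of the same name. The bitset helpers operate on a long[] instead of a (baseObject, baseOffset) pair, which is a simplification made here. All names are hypothetical.

```java
// Hypothetical illustration; not the spark-unsafe implementation.
final class HashAndBitSetSketch {

    // Murmur3 x86_32 applied to one 4-byte block.
    static int hashInt(int input, int seed) {
        // Mix the input block.
        int k1 = input * 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= 0x1b873593;

        // Fold the block into the running hash.
        int h1 = seed ^ k1;
        h1 = Integer.rotateLeft(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;

        // Finalization: mix in the length (4 bytes) and avalanche.
        h1 ^= 4;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Bit i lives in word i / 64, at bit position i % 64.
    static void set(long[] words, int index) {
        words[index >> 6] |= 1L << (index & 0x3f);
    }

    static boolean isSet(long[] words, int index) {
        return (words[index >> 6] & (1L << (index & 0x3f))) != 0;
    }

    // True if at least one bit is set; checks a whole word per step.
    static boolean anySet(long[] words) {
        for (long word : words) {
            if (word != 0) {
                return true;
            }
        }
        return false;
    }
}
```

Storing 64 flags per long is what makes checks like anySet cheap: a single comparison against zero covers 64 bits at once.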

Data Types and Utilities

Specialized data types including calendar intervals, variant values, and utility classes for date/time operations and collation support.

// org.apache.spark.unsafe.types.CalendarInterval
public CalendarInterval(int months, int days, long microseconds);

// org.apache.spark.unsafe.types.VariantVal
public VariantVal(byte[] value, byte[] metadata);
public String toJson(ZoneId zoneId);

// Collation utilities (static helpers)
public static UTF8String getCollationKey(UTF8String input, int collationId);
public static boolean isCaseInsensitive(int collationId);

Full reference: data-types-utilities.md
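The CalendarInterval constructor takes months, days, and microseconds as three separate fields. One plausible reason, stated here as an assumption rather than something this page confirms, is that the units cannot be converted into each other exactly: a month is not a fixed number of days. The sketch below demonstrates this with java.time; the helper class and the order in which the fields are applied are hypothetical choices, and Spark's own interval arithmetic may differ.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration; not how Spark applies intervals.
final class IntervalSketch {

    // Applies the three interval fields in order: months, days, microseconds.
    static LocalDateTime add(LocalDateTime start, int months, int days, long microseconds) {
        return start.plusMonths(months)
                    .plusDays(days)
                    .plus(microseconds, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2024, 1, 31, 0, 0);
        System.out.println(add(start, 1, 0, 0L));  // one calendar month later
        System.out.println(add(start, 0, 30, 0L)); // thirty days later
    }
}
```

Starting from 31 January 2024, adding one month lands on 29 February, while adding 30 days lands on 1 March, so collapsing months into days would change the result.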