# Apache Spark Unsafe

The Apache Spark Unsafe module provides low-level operations built on `sun.misc.Unsafe`: direct memory access, heap and off-heap memory allocation, array and bitset operations, hash functions, and optimized data types such as `UTF8String` and `CalendarInterval` that are used throughout the Spark engine.

## Package Information

- **Package Name**: spark-unsafe_2.13
- **Package Type**: Maven
- **Language**: Java
- **Installation**:

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-unsafe_2.13</artifactId>
    <version>4.0.0</version>
</dependency>
```

## Core Imports

```java
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.memory.MemoryBlock;
import org.apache.spark.unsafe.memory.MemoryAllocator;
```

## Basic Usage

```java
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.memory.MemoryAllocator;
import org.apache.spark.unsafe.memory.MemoryBlock;

// Memory allocation and management
MemoryAllocator allocator = MemoryAllocator.UNSAFE;
MemoryBlock block = allocator.allocate(1024);

// Direct memory access
Platform.putLong(block.getBaseObject(), block.getBaseOffset(), 42L);
long value = Platform.getLong(block.getBaseObject(), block.getBaseOffset());

// UTF-8 string operations
UTF8String str1 = UTF8String.fromString("Hello");
UTF8String str2 = UTF8String.fromString(" World");
UTF8String result = UTF8String.concat(str1, str2);

// Clean up
allocator.free(block);
```

## Architecture

The module is organized into packages, each covering one area of low-level functionality:

- **Platform Operations**: Direct memory access and platform-specific optimizations using `sun.misc.Unsafe`
- **Memory Management**: Heap and off-heap memory allocation with pooling and debugging support
- **Array Operations**: High-performance byte and long array operations without bounds checking
- **String Processing**: Comprehensive UTF-8 string manipulation with collation support
- **Hash Functions**: Fast hash implementations (Murmur3, Hive-compatible) for data processing
- **Bitset Operations**: Fixed-size bitset manipulation for efficient boolean operations
- **Data Types**: Specialized data structures like calendar intervals and variant values

## Capabilities

### Platform Operations

Direct memory access and platform-specific operations built on `sun.misc.Unsafe`. Each accessor takes a base object and an offset, so the same calls work for on-heap and off-heap memory.

```java { .api }
// org.apache.spark.unsafe.Platform
public static int getInt(Object object, long offset);
public static void putInt(Object object, long offset, int value);
public static long allocateMemory(long size);
public static void freeMemory(long address);
public static void copyMemory(Object src, long srcOffset, Object dst, long dstOffset, long length);
public static boolean unaligned();
```

[Platform Operations](./platform-operations.md)
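
The `(Object, long offset)` addressing used by the signatures above can be sketched with the JDK alone. The example below does not call the module's `Platform` class. It assumes `sun.misc.Unsafe` is reachable through reflection on the running JDK (recent JDKs deprecate these methods and may print warnings), and the class name `BaseOffsetSketch` is hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch of base-object + offset addressing; not part of spark-unsafe.
final class BaseOffsetSketch {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Assumption: the JDK still exposes the "theUnsafe" singleton field.
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Off-heap: the base object is null and the offset is an absolute address.
    static int offHeapRoundTrip(int value) {
        long address = UNSAFE.allocateMemory(8);
        try {
            UNSAFE.putInt(null, address, value);
            return UNSAFE.getInt(null, address);
        } finally {
            UNSAFE.freeMemory(address); // off-heap memory is never garbage collected
        }
    }

    // On-heap: the base object is an array and the offset starts at the array's base offset.
    static int onHeapRoundTrip(int value) {
        byte[] buffer = new byte[8];
        long offset = UNSAFE.arrayBaseOffset(byte[].class);
        UNSAFE.putInt(buffer, offset, value);
        return UNSAFE.getInt(buffer, offset);
    }

    public static void main(String[] args) {
        System.out.println(offHeapRoundTrip(42));
        System.out.println(onHeapRoundTrip(7));
    }
}
```

Neither path performs bounds checks, so an offset outside the allocated region can corrupt memory or crash the JVM.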

### Memory Management

Memory allocation and management for both heap and off-heap memory, with object pooling for large allocations and debugging support.

```java { .api }
// MemoryAllocator
public abstract MemoryBlock allocate(long size) throws OutOfMemoryError;
public abstract void free(MemoryBlock memory);

// MemoryBlock
public MemoryBlock(Object obj, long offset, long length);
public long size();
public void fill(byte value);
```

[Memory Management](./memory-management.md)
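
One way to pool large heap allocations is to keep freed arrays in size-keyed buckets and hand them back on the next request of the same size. The sketch below illustrates that idea only. The class name, the 1 MiB threshold, and the use of weak references are assumptions made for the example, not details taken from this document.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of size-keyed pooling for large heap buffers.
final class PoolingAllocatorSketch {
    // Assumed threshold: only allocations of at least 1 MiB are pooled.
    static final long POOLING_THRESHOLD_BYTES = 1024 * 1024;

    private final Map<Long, Deque<WeakReference<long[]>>> pool = new HashMap<>();

    synchronized long[] allocate(long sizeInBytes) {
        int words = (int) ((sizeInBytes + 7) / 8); // round up to whole 8-byte words
        long alignedSize = words * 8L;
        if (alignedSize >= POOLING_THRESHOLD_BYTES) {
            Deque<WeakReference<long[]>> candidates = pool.get(alignedSize);
            while (candidates != null && !candidates.isEmpty()) {
                long[] reused = candidates.pop().get();
                if (reused != null) {
                    return reused; // contents are stale; callers must overwrite them
                }
            }
        }
        return new long[words];
    }

    synchronized void free(long[] array) {
        long size = array.length * 8L;
        if (size >= POOLING_THRESHOLD_BYTES) {
            // Weak references let the garbage collector reclaim pooled arrays under pressure.
            pool.computeIfAbsent(size, k -> new ArrayDeque<>()).push(new WeakReference<>(array));
        }
    }

    public static void main(String[] args) {
        PoolingAllocatorSketch allocator = new PoolingAllocatorSketch();
        long[] first = allocator.allocate(2 * 1024 * 1024);
        allocator.free(first);
        long[] second = allocator.allocate(2 * 1024 * 1024);
        System.out.println(first == second);
    }
}
```

Small allocations bypass the pool in this sketch, since for them a fresh array is cheap and pooling adds only bookkeeping.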

### Array Operations

Optimized byte and long array operations for both on-heap and off-heap memory. They skip bounds checking for speed, so callers are responsible for passing valid offsets and lengths.

```java { .api }
// ByteArrayMethods
public static boolean arrayEquals(Object leftBase, long leftOffset, Object rightBase, long rightOffset, long length);
public static long nextPowerOf2(long num);
public static int roundNumberOfBytesToNearestWord(int numBytes);

// LongArray
public void set(int index, long value);
public long get(int index);
```

[Array Operations](./array-operations.md)
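
The two sizing helpers listed above reduce to short bit arithmetic. The following is a minimal JDK-only sketch of what such helpers can look like, assuming 8-byte words. The class name `WordMathSketch` is hypothetical, and the bodies are illustrative rather than copied from the module.

```java
// Hypothetical sketch of power-of-two and word-alignment arithmetic.
final class WordMathSketch {
    // Smallest power of two that is >= num (returns num itself if it already is one).
    static long nextPowerOf2(long num) {
        long highBit = Long.highestOneBit(num);
        return (highBit == num) ? num : highBit << 1;
    }

    // Rounds a byte count up to the next multiple of 8.
    static int roundNumberOfBytesToNearestWord(int numBytes) {
        int remainder = numBytes & 0x07;          // numBytes mod 8
        return numBytes + ((8 - remainder) & 0x07); // add padding, or 0 if already aligned
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOf2(5));                    // prints 8
        System.out.println(roundNumberOfBytesToNearestWord(9)); // prints 16
    }
}
```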

### UTF-8 String Processing

`UTF8String` offers a broad set of string operations, collation support, and optimized storage for internal Spark use.

```java { .api }
// UTF8String
public static UTF8String fromString(String str);
public static UTF8String concat(UTF8String... inputs);
public int numBytes();
public int numChars();
public UTF8String substring(int start, int until);
public boolean contains(UTF8String substring);
public UTF8String toUpperCase();
public UTF8String trim();
```

[UTF-8 String Processing](./utf8-string-processing.md)
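
`numBytes()` and `numChars()` differ whenever a string contains multi-byte characters. The sketch below shows the distinction on plain JDK strings, counting characters as UTF-8 code points by skipping continuation bytes. This is a stand-in technique chosen for illustration, and `Utf8CountSketch` is a hypothetical name, not part of the module.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: byte length versus code point count of UTF-8 data.
final class Utf8CountSketch {
    static int numBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Counts code points: every byte that is not a continuation byte (10xxxxxx) starts one.
    static int numChars(String s) {
        int count = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // "héllo": the accented letter takes two bytes
        System.out.println(numBytes(text)); // prints 6
        System.out.println(numChars(text)); // prints 5
    }
}
```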
125
126
### Hash Functions and Bitset Operations
127
128
Fast hash function implementations and bitset manipulation methods for efficient data processing and boolean operations.
129
130
```java { .api }
131
public static int hashInt(int input, int seed);
132
public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed);
133
public static void set(Object baseObject, long baseOffset, int index);
134
public static boolean isSet(Object baseObject, long baseOffset, int index);
135
public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInWords);
136
```
137
138
[Hash Functions and Bitset Operations](./hash-bitset-operations.md)
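
To make the two ideas concrete, here is a JDK-only sketch of a standard 32-bit Murmur3 hash of a single `int` and of a bitset stored in 64-bit words. It assumes the usual Murmur3 x86_32 constants and uses a `long[]` in place of a base object and offset. `HashBitsetSketch` is a hypothetical class, not the module's code.

```java
// Hypothetical sketch: Murmur3 (x86, 32-bit) for one int, plus a word-based bitset.
final class HashBitsetSketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    static int hashInt(int input, int seed) {
        // Mix the single 4-byte block.
        int k1 = input * C1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= C2;
        int h1 = seed ^ k1;
        h1 = Integer.rotateLeft(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;
        // Finalize with the input length in bytes (4).
        h1 ^= 4;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Bit i lives in word i / 64 at bit position i % 64.
    static void set(long[] words, int index) {
        words[index >> 6] |= 1L << (index & 63);
    }

    static boolean isSet(long[] words, int index) {
        return (words[index >> 6] & (1L << (index & 63))) != 0;
    }

    static boolean anySet(long[] words) {
        for (long word : words) {
            if (word != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(hashInt(0, 0))); // prints 2362f9de
        long[] bits = new long[2];
        set(bits, 70);
        System.out.println(isSet(bits, 70)); // prints true
    }
}
```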

### Data Types and Utilities

Specialized data types including calendar intervals, variant values, and utility classes for date/time operations and collation support.

```java { .api }
// CalendarInterval
public CalendarInterval(int months, int days, long microseconds);

// VariantVal
public VariantVal(byte[] value, byte[] metadata);
public String toJson(ZoneId zoneId);

// CollationFactory
public static UTF8String getCollationKey(UTF8String input, int collationId);
public static boolean isCaseInsensitive(int collationId);
```

[Data Types and Utilities](./data-types-utilities.md)