# Apache Spark Unsafe

Apache Spark Unsafe provides low-level memory access and high-performance data structures used internally by Apache Spark. It includes unsafe memory operations, specialized data types for efficient string and interval handling, heap and off-heap memory allocators, array utilities, hashing functions, and bitset operations.

**Important**: This library is designed for internal Spark use to achieve maximum performance by bypassing Java's safety mechanisms for direct memory access. It is not suitable for general application development but is critical for Spark's core engine performance.

## Package Information

- **Package Name**: org.apache.spark:spark-unsafe_2.11
- **Package Type**: maven
- **Language**: Java
- **Version**: 2.4.8
- **Installation**: Add to your `pom.xml`: `<dependency><groupId>org.apache.spark</groupId><artifactId>spark-unsafe_2.11</artifactId><version>2.4.8</version></dependency>`

## Core Imports

```java
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.UnsafeAlignedOffset;
import org.apache.spark.unsafe.KVIterator;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.types.CalendarInterval;
import org.apache.spark.unsafe.types.ByteArray;
import org.apache.spark.unsafe.memory.MemoryAllocator;
import org.apache.spark.unsafe.memory.MemoryBlock;
```

## Basic Usage

```java
import java.nio.charset.StandardCharsets;

import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.UnsafeAlignedOffset;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.types.ByteArray;
import org.apache.spark.unsafe.memory.HeapMemoryAllocator;
import org.apache.spark.unsafe.memory.MemoryBlock;

public class BasicUsage {
    public static void main(String[] args) {
        // Basic unsafe memory operations: a null base object means the offset is an absolute address
        long address = Platform.allocateMemory(1024);
        try {
            Platform.putLong(null, address, 42L);
            long value = Platform.getLong(null, address); // 42
        } finally {
            Platform.freeMemory(address); // off-heap memory is not garbage-collected
        }

        // UTF8String operations
        UTF8String str = UTF8String.fromString("Hello, World!");
        UTF8String upper = str.toUpperCase(); // "HELLO, WORLD!"
        boolean contains = str.contains(UTF8String.fromString("World")); // true

        // Memory allocation with an aligned length header
        HeapMemoryAllocator allocator = new HeapMemoryAllocator();
        int headerSize = UnsafeAlignedOffset.getUaoSize();
        MemoryBlock block = allocator.allocate(headerSize + 1024);
        try {
            // Record the payload length in the header; the payload starts at baseOffset + headerSize
            UnsafeAlignedOffset.putSize(block.getBaseObject(), block.getBaseOffset(), 1024);
            int size = UnsafeAlignedOffset.getSize(block.getBaseObject(), block.getBaseOffset()); // 1024
        } finally {
            allocator.free(block);
        }

        // ByteArray operations: concatenate byte arrays
        byte[] data1 = "Hello".getBytes(StandardCharsets.UTF_8);
        byte[] data2 = "World".getBytes(StandardCharsets.UTF_8);
        byte[] combined = ByteArray.concat(data1, " ".getBytes(StandardCharsets.UTF_8), data2); // "Hello World"
    }
}
```

## Architecture

The Spark Unsafe module is organized into several key functional areas:

- **Core Platform Operations**: Direct memory access via the `Platform` class
- **Memory Management**: Heap and off-heap allocators (`MemoryAllocator`) and the `MemoryBlock` regions they return
- **Specialized Data Types**: High-performance UTF-8 strings (`UTF8String`) and calendar intervals (`CalendarInterval`)
- **Array Operations**: Optimized utilities for byte arrays (`ByteArrayMethods`, `ByteArray`) and memory-backed long arrays (`LongArray`)
- **Hashing and Bitsets**: Murmur3 hashing (`Murmur3_x86_32`) and bitset operations (`BitSetMethods`)
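
## Capabilities

### Core Platform Operations

The `Platform` class wraps `sun.misc.Unsafe` to read, write, allocate, and free memory directly, bypassing the JVM's bounds checks for maximum performance. Every accessor takes a base object and an offset: pass `null` and an absolute address for off-heap memory, or an array and one of the `*_ARRAY_OFFSET` constants (plus the element position in bytes) for on-heap memory.

```java { .api }
public final class Platform {
    // Array base offsets
    public static final int BOOLEAN_ARRAY_OFFSET;
    public static final int BYTE_ARRAY_OFFSET;
    public static final int SHORT_ARRAY_OFFSET;
    public static final int INT_ARRAY_OFFSET;
    public static final int LONG_ARRAY_OFFSET;
    public static final int FLOAT_ARRAY_OFFSET;
    public static final int DOUBLE_ARRAY_OFFSET;

    // Platform capabilities
    public static boolean unaligned();

    // Typed accessors (get/put pairs also exist for the other primitive types)
    public static int getInt(Object object, long offset);
    public static void putInt(Object object, long offset, int value);
    public static long getLong(Object object, long offset);
    public static void putLong(Object object, long offset, long value);
    public static void copyMemory(Object src, long srcOffset, Object dst, long dstOffset, long length);

    // Memory operations
    public static long allocateMemory(long size);
    public static void freeMemory(long address);
    public static long reallocateMemory(long address, long oldSize, long newSize);
    public static java.nio.ByteBuffer allocateDirectBuffer(int size);
}
```

[Core Platform Operations](./platform.md)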
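
To make the (base object, offset) addressing concrete, the sketch below performs the same kinds of reads and writes directly against `sun.misc.Unsafe`, the JDK-internal class that `Platform` wraps. It is an illustration, not `Platform`'s source: `UnsafeAddressingSketch` and its methods are hypothetical, and it assumes a JDK on which `sun.misc.Unsafe` is still reachable through reflection (recent JDKs deprecate these methods and may print warnings).

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch of the (base object, offset) addressing model used by Platform.
// It calls sun.misc.Unsafe directly and is not Platform's actual implementation.
public class UnsafeAddressingSketch {
    static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            // The singleton lives in a private static field, so reflection is needed.
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Off-heap: a null base object means the offset is an absolute address. */
    static long offHeapRoundTrip(long value) {
        long address = UNSAFE.allocateMemory(8);
        try {
            UNSAFE.putLong(null, address, value);
            return UNSAFE.getLong(null, address);
        } finally {
            UNSAFE.freeMemory(address); // off-heap memory is never garbage-collected
        }
    }

    /** On-heap: the base object is the array, and offsets start at the array base offset. */
    static void putLongAt(long[] array, int index, long value) {
        long offset = Unsafe.ARRAY_LONG_BASE_OFFSET + (long) index * Unsafe.ARRAY_LONG_INDEX_SCALE;
        UNSAFE.putLong(array, offset, value);
    }

    public static void main(String[] args) {
        System.out.println(offHeapRoundTrip(42L)); // prints 42
        long[] words = new long[4];
        putLongAt(words, 2, 7L);
        System.out.println(words[2]); // prints 7
    }
}
```

Code built on this library would call `Platform.putLong`, `Platform.getLong`, and `Platform.LONG_ARRAY_OFFSET` instead. The point of the sketch is that one accessor serves both memory locations, which is what lets the same data structure code run over heap or off-heap storage.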

### Memory Management

`MemoryAllocator` has two implementations: `HEAP` (`HeapMemoryAllocator`) allocates `long[]` arrays on the JVM heap, and `UNSAFE` (`UnsafeMemoryAllocator`) allocates off-heap memory through `Platform.allocateMemory`. Both return a `MemoryBlock`, which is a (base object, offset, length) triple, and both can optionally fill memory with debug patterns on allocation and free.

```java { .api }
public interface MemoryAllocator {
    public static final MemoryAllocator UNSAFE;
    public static final MemoryAllocator HEAP;

    MemoryBlock allocate(long size);
    void free(MemoryBlock memory);
}

public class MemoryBlock extends MemoryLocation {
    public int pageNumber;

    public MemoryBlock(Object obj, long offset, long length);
    public static MemoryBlock fromLongArray(long[] array);
    public long size();
    public void fill(byte value);
}
```

[Memory Management](./memory.md)
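
Backing heap blocks with `long[]` rather than `byte[]` keeps the storage 8-byte aligned for word-sized reads and writes. The sketch below imitates that allocation strategy with plain JDK types. `HeapBlockSketch` is a hypothetical stand-in for `MemoryBlock`, and the real `HeapMemoryAllocator` does extra bookkeeping (such as reusing large buffers) that is omitted here.

```java
import java.util.Arrays;

// Hypothetical sketch of word-aligned heap allocation in the style of HeapMemoryAllocator.
// HeapBlockSketch stands in for MemoryBlock: backing storage plus a logical length in bytes.
public final class HeapBlockSketch {
    final long[] words; // backing storage, playing the role of MemoryBlock's base object
    final long length;  // requested size in bytes; may be smaller than the capacity

    private HeapBlockSketch(long[] words, long length) {
        this.words = words;
        this.length = length;
    }

    /** Rounds the request up to whole 8-byte words and allocates a long[] of that many words. */
    static HeapBlockSketch allocate(long size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative: " + size);
        }
        int numWords = Math.toIntExact((size + 7) / 8);
        return new HeapBlockSketch(new long[numWords], size);
    }

    /** Bytes actually reserved: always a multiple of 8. */
    long capacityInBytes() {
        return words.length * 8L;
    }

    /** Writes the same byte into every byte position of the backing words. */
    void fill(byte value) {
        long pattern = (value & 0xFFL) * 0x0101010101010101L; // copy the byte into all 8 byte lanes
        Arrays.fill(words, pattern);
    }

    public static void main(String[] args) {
        HeapBlockSketch block = allocate(13);
        // prints "13 bytes requested, 16 reserved"
        System.out.println(block.length + " bytes requested, " + block.capacityInBytes() + " reserved");
    }
}
```

The logical length is kept separately from the capacity because callers ask for arbitrary byte counts while the backing array can only grow in whole words.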

### UTF8 String Operations

`UTF8String` stores text as raw UTF-8 bytes addressed by a base object, an offset, and a byte count, so Spark SQL can compare, search, and slice strings without decoding them into `java.lang.String`.

```java { .api }
public final class UTF8String implements Comparable<UTF8String> {
    public static final UTF8String EMPTY_UTF8;

    // Creation methods
    public static UTF8String fromString(String str);
    public static UTF8String fromBytes(byte[] bytes);
    public static UTF8String concat(UTF8String... inputs);

    // Core operations
    public int numBytes(); // length in bytes
    public int numChars(); // length in Unicode code points
    public UTF8String substring(int start, int until); // character positions, until is exclusive
    public boolean contains(UTF8String substring);
    public UTF8String toUpperCase();
    public UTF8String toLowerCase();
}
```

[UTF8 String Operations](./utf8-strings.md)
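
`numBytes()` and `numChars()` diverge as soon as the text contains non-ASCII characters, because a character count over UTF-8 can be obtained from lead bytes alone, without decoding. The sketch below reproduces that counting technique over a plain `byte[]`. `Utf8CountSketch` is a hypothetical name, it assumes well-formed UTF-8 input, and it is not `UTF8String`'s source.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: count code points in UTF-8 bytes by reading lead bytes only,
// the technique behind a numChars()-style method. Assumes well-formed UTF-8 input.
public class Utf8CountSketch {
    /** Number of bytes in the UTF-8 sequence that starts with lead byte b. */
    static int bytesForLeadByte(byte b) {
        int u = b & 0xFF;
        if (u < 0x80) return 1;  // 0xxxxxxx: ASCII
        if (u >= 0xF0) return 4; // 11110xxx
        if (u >= 0xE0) return 3; // 1110xxxx
        if (u >= 0xC0) return 2; // 110xxxxx
        return 1;                // stray continuation byte: step over it so the loop always advances
    }

    static int numChars(byte[] utf8) {
        int count = 0;
        for (int i = 0; i < utf8.length; i += bytesForLeadByte(utf8[i])) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'h', e-acute, 'l', 'l', 'o', a space, then an emoji outside the Basic Multilingual Plane
        String s = "h\u00e9llo \uD83D\uDE00";
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);    // 11 bytes
        System.out.println(numChars(bytes)); // 7 code points
        System.out.println(s.length());      // 8 UTF-16 code units
    }
}
```

The last line shows why a byte-level string type needs its own character count: `String.length()` counts UTF-16 code units, so a character outside the Basic Multilingual Plane counts twice there but once in the UTF-8 scan.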

### Calendar Intervals

`CalendarInterval` represents a time period as a whole number of months plus a number of microseconds. Months are kept separate because their length in days varies, whereas days and smaller units convert to microseconds through the fixed constants below.

```java { .api }
public final class CalendarInterval {
    // Time constants
    public static final long MICROS_PER_SECOND = 1000000L;
    public static final long MICROS_PER_MINUTE = 60000000L;
    public static final long MICROS_PER_HOUR = 3600000000L;
    public static final long MICROS_PER_DAY = 86400000000L;

    public final int months;
    public final long microseconds;

    public CalendarInterval(int months, long microseconds);
    public static CalendarInterval fromString(String s);
    public CalendarInterval add(CalendarInterval that); // returns a new interval
}
```

[Calendar Intervals](./intervals.md)
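
Because the two components never mix, adding intervals is component-wise: months add to months and microseconds to microseconds. The sketch below mirrors that two-field representation with a hypothetical `IntervalSketch` class that reuses the constant values listed in the API above. It is not `CalendarInterval` itself, and its overflow-checked addition is a choice made for the sketch.

```java
// Hypothetical sketch of CalendarInterval's two-field representation. Months and
// microseconds are kept apart because a month has no fixed length in microseconds.
public final class IntervalSketch {
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long MICROS_PER_MINUTE = MICROS_PER_SECOND * 60;
    static final long MICROS_PER_HOUR = MICROS_PER_MINUTE * 60;
    static final long MICROS_PER_DAY = MICROS_PER_HOUR * 24;

    final int months;
    final long microseconds;

    IntervalSketch(int months, long microseconds) {
        this.months = months;
        this.microseconds = microseconds;
    }

    /** Component-wise addition; overflow-checked here, which is a choice made for this sketch. */
    IntervalSketch add(IntervalSketch that) {
        return new IntervalSketch(
            Math.addExact(months, that.months),
            Math.addExact(microseconds, that.microseconds));
    }

    @Override
    public String toString() {
        return months + " months, " + microseconds + " microseconds";
    }

    public static void main(String[] args) {
        IntervalSketch a = new IntervalSketch(1, 2 * MICROS_PER_DAY);   // 1 month 2 days
        IntervalSketch b = new IntervalSketch(13, 3 * MICROS_PER_HOUR); // 1 year 1 month 3 hours
        System.out.println(a.add(b)); // 14 months, 183600000000 microseconds
    }
}
```

Converting the month component to days is deferred until the interval is applied to a concrete date, which is the only point at which the month length is known.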

### Array Operations

`ByteArrayMethods` provides word-alignment helpers and a fast `arrayEquals` that compares two raw memory regions, a word at a time where possible. `LongArray` is a fixed-size array of `long` values backed by a `MemoryBlock`, on-heap or off-heap.

```java { .api }
public class ByteArrayMethods {
    public static final int MAX_ROUNDED_ARRAY_LENGTH;

    public static long nextPowerOf2(long num);
    public static int roundNumberOfBytesToNearestWord(int numBytes);
    public static boolean arrayEquals(Object leftBase, long leftOffset,
                                      Object rightBase, long rightOffset, long length);
}

public final class LongArray {
    public LongArray(MemoryBlock memory);
    public long size(); // number of elements, not bytes
    public void set(int index, long value);
    public long get(int index);
}
```

[Array Operations](./arrays.md)
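
The two alignment helpers reduce to simple bit arithmetic. The sketch below reimplements them with JDK calls only, so the arithmetic is visible. `AlignmentSketch` is a hypothetical name, and the methods are written from the behavior the method names describe (round up to a power of two, round up to a multiple of 8 bytes) rather than copied from `ByteArrayMethods`.

```java
// Hypothetical sketch of the arithmetic behind nextPowerOf2 and
// roundNumberOfBytesToNearestWord, using only JDK methods.
public final class AlignmentSketch {
    /** Smallest power of two >= num, for 1 <= num <= 2^62. */
    static long nextPowerOf2(long num) {
        long highBit = Long.highestOneBit(num); // keeps only the most significant set bit
        return highBit == num ? num : highBit << 1;
    }

    /** Rounds a byte count up to the next multiple of 8 (one 64-bit word). */
    static int roundNumberOfBytesToNearestWord(int numBytes) {
        int remainder = numBytes & 0x07; // same as numBytes % 8 for non-negative input
        return remainder == 0 ? numBytes : numBytes + (8 - remainder);
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOf2(5));                     // 8
        System.out.println(nextPowerOf2(64));                    // 64
        System.out.println(roundNumberOfBytesToNearestWord(13)); // 16
        System.out.println(roundNumberOfBytesToNearestWord(24)); // 24
    }
}
```

Masking with `0x07` instead of using `%` works because 8 is a power of two; the same trick underlies `index & 63` in bitset code.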

### Hashing and Bitsets

`Murmur3_x86_32` implements the 32-bit x86 variant of MurmurHash3, with instance methods bound to a fixed seed and static variants that take the seed explicitly. `BitSetMethods` manipulates bitsets stored as 64-bit words at a (base object, offset) location.

```java { .api }
public final class Murmur3_x86_32 {
    public Murmur3_x86_32(int seed);
    public int hashInt(int input);
    public int hashLong(long input);
    public static int hashInt(int input, int seed);
    public static int hashLong(long input, int seed);
}

public final class BitSetMethods {
    public static void set(Object baseObject, long baseOffset, int index);
    public static boolean isSet(Object baseObject, long baseOffset, int index);
    public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInWords);
}
```

[Hashing and Bitsets](./hashing-bitsets.md)
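
Both halves of this section work on fixed-width words. The sketch below reimplements the standard MurmurHash3 x86_32 steps for a single 4-byte block (the case `hashInt` covers) and the word/mask arithmetic that bitset operations use, here over a plain `long[]` instead of a (base object, offset) pair. `HashBitSetSketch` is hypothetical; the constants are the published MurmurHash3 constants, and its outputs have not been checked bit for bit against `Murmur3_x86_32`.

```java
// Hypothetical sketch: MurmurHash3 x86_32 for one int, plus bitset word/mask arithmetic.
public final class HashBitSetSketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    /** Scrambles one 4-byte block before it is mixed into the running hash. */
    private static int mixK1(int k1) {
        k1 *= C1;
        k1 = Integer.rotateLeft(k1, 15);
        return k1 * C2;
    }

    /** Mixes a scrambled block into the running hash h1. */
    private static int mixH1(int h1, int k1) {
        h1 ^= k1;
        h1 = Integer.rotateLeft(h1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    /** Final avalanche step; length is the number of input bytes. */
    private static int fmix(int h1, int length) {
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** MurmurHash3 x86_32 of a single int treated as one 4-byte block. */
    static int hashInt(int input, int seed) {
        return fmix(mixH1(seed, mixK1(input)), 4);
    }

    /** Sets bit `index`: word = index / 64, bit within the word = index % 64. */
    static void set(long[] words, int index) {
        words[index >>> 6] |= 1L << (index & 63);
    }

    static boolean isSet(long[] words, int index) {
        return (words[index >>> 6] & (1L << (index & 63))) != 0;
    }

    /** True if any bit in the bitset is set; checks a whole word at a time. */
    static boolean anySet(long[] words) {
        for (long w : words) {
            if (w != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(hashInt(42, 0)));
        long[] bits = new long[2]; // room for 128 bits
        set(bits, 70);
        System.out.println(isSet(bits, 70) + " " + isSet(bits, 6)); // true false
    }
}
```

Every step in `hashInt` (xor with a constant or a shifted copy, rotation, multiplication by an odd constant) is invertible, so for a fixed seed distinct inputs always produce distinct 32-bit hashes.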

## Types

### Core Memory Types

```java { .api }
public class MemoryLocation {
    public MemoryLocation(Object obj, long offset);
    public Object getBaseObject();
    public long getBaseOffset();
    public void setObjAndOffset(Object newObj, long newOffset);
}

public abstract class KVIterator<K, V> {
    public abstract boolean next();
    public abstract K getKey();
    public abstract V getValue();
    public abstract void close();
}
```
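
`KVIterator` is a cursor rather than a `java.util.Iterator`: `next()` advances and reports whether a pair is available, and `getKey()`/`getValue()` read the current pair. The sketch below declares a hypothetical stand-in, `KVCursor`, with the same four methods (so it compiles without Spark on the classpath) and drives it over a `TreeMap` through a hypothetical `SortedMapCursor` adapter.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class KVCursorDemo {
    /** Hypothetical stand-in with the same four methods as KVIterator. */
    abstract static class KVCursor<K, V> {
        public abstract boolean next();
        public abstract K getKey();
        public abstract V getValue();
        public abstract void close();
    }

    /** Adapts a sorted map to the cursor protocol: call next() before each getKey()/getValue(). */
    static final class SortedMapCursor<K, V> extends KVCursor<K, V> {
        private final Iterator<Map.Entry<K, V>> it;
        private Map.Entry<K, V> current;

        SortedMapCursor(TreeMap<K, V> map) {
            this.it = map.entrySet().iterator();
        }

        @Override
        public boolean next() {
            current = it.hasNext() ? it.next() : null;
            return current != null;
        }

        @Override
        public K getKey() {
            return current.getKey();
        }

        @Override
        public V getValue() {
            return current.getValue();
        }

        @Override
        public void close() {
            current = null; // a real iterator might release memory pages or files here
        }
    }

    /** Typical consumption loop: advance, read the current pair, and always close. */
    static String drain(KVCursor<String, Integer> cursor) {
        StringBuilder out = new StringBuilder();
        try {
            while (cursor.next()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(cursor.getKey()).append('=').append(cursor.getValue());
            }
        } finally {
            cursor.close();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        counts.put("b", 2);
        counts.put("a", 1);
        System.out.println(drain(new SortedMapCursor<>(counts))); // a=1 b=2
    }
}
```

The cursor shape lets an implementation reuse a single key object and a single value object across iterations instead of allocating a new pair per element, and the explicit `close()` gives it a place to release resources early.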

### Specialized Memory Allocators

```java { .api }
public class HeapMemoryAllocator implements MemoryAllocator {
    public MemoryBlock allocate(long size);
    public void free(MemoryBlock memory);
}

public class UnsafeMemoryAllocator implements MemoryAllocator {
    public MemoryBlock allocate(long size);
    public void free(MemoryBlock memory);
}
```

### Utility Classes

```java { .api }
public class UnsafeAlignedOffset {
    public static int getUaoSize();
    public static int getSize(Object object, long offset);
    public static void putSize(Object object, long offset, int value);
}
```
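
`UnsafeAlignedOffset` exists because the width of a record's length header depends on the platform: 4 bytes where `Platform.unaligned()` reports that unaligned access is supported, 8 bytes otherwise, so that the data following the header stays 8-byte aligned. The sketch below models that choice with a `ByteBuffer`; `LengthPrefixSketch` and its `uaoSize` parameter are hypothetical stand-ins for `getUaoSize()`, `putSize`, and `getSize`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a length-prefixed record whose header is 4 or 8 bytes wide,
// modeling what putSize/getSize do with the width returned by getUaoSize().
public final class LengthPrefixSketch {
    /** Writes the header (an int or a long, depending on uaoSize) followed by the payload. */
    static ByteBuffer writeRecord(byte[] payload, int uaoSize) {
        ByteBuffer buf = ByteBuffer.allocate(uaoSize + payload.length);
        if (uaoSize == 4) {
            buf.putInt(payload.length);
        } else if (uaoSize == 8) {
            buf.putLong(payload.length); // payload then starts on an 8-byte boundary
        } else {
            throw new IllegalArgumentException("uaoSize must be 4 or 8: " + uaoSize);
        }
        buf.put(payload);
        buf.flip();
        return buf;
    }

    /** Reads the header back with the same width it was written with. */
    static int readSize(ByteBuffer record, int uaoSize) {
        return uaoSize == 4 ? record.getInt(0) : (int) record.getLong(0);
    }

    /** Copies out the payload that starts right after the header. */
    static byte[] readPayload(ByteBuffer record, int uaoSize) {
        byte[] out = new byte[readSize(record, uaoSize)];
        ByteBuffer view = record.duplicate();
        view.position(uaoSize);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = "spark".getBytes(StandardCharsets.UTF_8);
        for (int uaoSize : new int[] {4, 8}) {
            ByteBuffer record = writeRecord(payload, uaoSize);
            // prints "4-byte header: 9 bytes total, size field = 5", then the 8-byte variant (13 bytes)
            System.out.println(uaoSize + "-byte header: " + record.limit()
                + " bytes total, size field = " + readSize(record, uaoSize));
        }
    }
}
```

Writers and readers must agree on the header width, which is why both sides go through the same `getUaoSize()` value rather than hard-coding 4 or 8.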