# Apache Spark Unsafe

Apache Spark Unsafe is a low-level memory management and unsafe operations library that provides direct memory access, optimized data structures, and platform-specific functionality for high-performance data processing within Apache Spark. This library enables bypassing standard Java memory safety mechanisms when necessary, offering direct access to off-heap memory, efficient serialization operations, and optimized data structures for large-scale distributed computing workloads.

## Package Information

- **Package Name**: spark-unsafe_2.12
- **Package Type**: Maven
- **Language**: Java
- **Group ID**: org.apache.spark
- **Artifact ID**: spark-unsafe_2.12
- **Installation**: Add to Maven `pom.xml`:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-unsafe_2.12</artifactId>
  <version>3.5.6</version>
</dependency>
```

## Core Imports

```java
// Memory operations and platform utilities
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.memory.MemoryAllocator;
import org.apache.spark.unsafe.memory.MemoryBlock;
import org.apache.spark.unsafe.memory.MemoryLocation;
import org.apache.spark.unsafe.memory.HeapMemoryAllocator;
import org.apache.spark.unsafe.memory.UnsafeMemoryAllocator;

// Data structures and arrays
import org.apache.spark.unsafe.array.LongArray;
import org.apache.spark.unsafe.array.ByteArrayMethods;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.UTF8StringBuilder;
import org.apache.spark.unsafe.types.CalendarInterval;
import org.apache.spark.unsafe.types.ByteArray;

// Hashing and utilities
import org.apache.spark.unsafe.hash.Murmur3_x86_32;
import org.apache.spark.unsafe.bitset.BitSetMethods;
import org.apache.spark.unsafe.UnsafeAlignedOffset;
import org.apache.spark.sql.catalyst.expressions.HiveHasher;
import org.apache.spark.sql.catalyst.util.DateTimeConstants;
```

## Basic Usage

```java
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.memory.*;
import org.apache.spark.unsafe.types.UTF8String;

// Memory allocation example
MemoryAllocator allocator = MemoryAllocator.HEAP;
MemoryBlock block = allocator.allocate(1024);

// Direct memory operations
Platform.putLong(block.getBaseObject(), block.getBaseOffset(), 42L);
long value = Platform.getLong(block.getBaseObject(), block.getBaseOffset());

// UTF8String operations
UTF8String str = UTF8String.fromString("Hello Spark");
UTF8String upper = str.toUpperCase();
int length = str.numChars();

// Clean up
allocator.free(block);
```

## Architecture

Apache Spark Unsafe is built around several key components:

- **Memory Management**: Allocation strategies for heap and off-heap memory with automatic pooling
- **Platform Layer**: Low-level unsafe operations that provide direct memory access across different platforms
- **Data Structures**: Memory-efficient arrays, strings, and bitsets optimized for Spark's use cases
- **Type System**: Specialized types like UTF8String and CalendarInterval designed for internal Spark operations
- **Hashing**: Fast hashing implementations for data distribution and partitioning
- **Safety Features**: Debug modes and alignment handling for platform-specific memory requirements

## Capabilities

### Memory Management

Core memory allocation and management functionality providing both heap and off-heap memory strategies with automatic pooling and debug support.

```java { .api }
interface MemoryAllocator {
  MemoryBlock allocate(long size) throws OutOfMemoryError;
  void free(MemoryBlock memory);

  static final MemoryAllocator HEAP = new HeapMemoryAllocator();
  static final MemoryAllocator UNSAFE = new UnsafeMemoryAllocator();
}

class MemoryBlock extends MemoryLocation {
  public MemoryBlock(Object obj, long offset, long length);
  public long size();
  public void fill(byte value);
  public static MemoryBlock fromLongArray(long[] array);
}
```

[Memory Management](./memory-management.md)
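
The heap versus off-heap distinction can be tried without this library on the classpath. The sketch below is only an analogy built on the JDK's `java.nio.ByteBuffer`; it does not use `MemoryAllocator`, and the class name `HeapVsOffHeapSketch` is hypothetical. It writes a `long` at position 0 and reads it back, once from a buffer backed by a heap `byte[]` and once from a buffer backed by native memory.

```java
import java.nio.ByteBuffer;

// JDK-only analogy for heap vs off-heap storage; not part of spark-unsafe.
final class HeapVsOffHeapSketch {
    // Writes a long at position 0 of the buffer and reads it back.
    static long roundTrip(ByteBuffer buffer, long value) {
        buffer.putLong(0, value);
        return buffer.getLong(0);
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(1024);        // backed by a byte[] on the Java heap
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024); // backed by native (off-heap) memory

        System.out.println(onHeap.isDirect() + " " + roundTrip(onHeap, 42L));   // false 42
        System.out.println(offHeap.isDirect() + " " + roundTrip(offHeap, 42L)); // true 42
    }
}
```

Unlike the calls in this library, `ByteBuffer` checks every index, so it is a safe way to experiment with the idea before moving to unchecked access.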

### Platform Operations

Low-level unsafe memory operations and platform-specific functionality for direct memory access, copying, and platform feature detection.

```java { .api }
final class Platform {
  // Memory access methods for all primitive types
  public static int getInt(Object object, long offset);
  public static void putInt(Object object, long offset, int value);
  public static long getLong(Object object, long offset);
  public static void putLong(Object object, long offset, long value);
  public static byte getByte(Object object, long offset);
  public static void putByte(Object object, long offset, byte value);
  public static short getShort(Object object, long offset);
  public static void putShort(Object object, long offset, short value);
  public static float getFloat(Object object, long offset);
  public static void putFloat(Object object, long offset, float value);
  public static double getDouble(Object object, long offset);
  public static void putDouble(Object object, long offset, double value);
  public static boolean getBoolean(Object object, long offset);
  public static void putBoolean(Object object, long offset, boolean value);

  // Volatile operations
  public static Object getObjectVolatile(Object object, long offset);
  public static void putObjectVolatile(Object object, long offset, Object value);

  // Memory allocation and management
  public static long allocateMemory(long size);
  public static void freeMemory(long address);
  public static long reallocateMemory(long address, long oldSize, long newSize);
  public static void copyMemory(Object src, long srcOffset, Object dst, long dstOffset, long length);
  public static void setMemory(Object object, long offset, long size, byte value);
  public static void setMemory(long address, byte value, long size);

  // Platform detection and utilities
  public static boolean unaligned();
  public static boolean cleanerCreateMethodIsDefined();
  public static void throwException(Throwable t);
  public static ByteBuffer allocateDirectBuffer(int size);
}
```

[Platform Operations](./platform-operations.md)

### UTF8 String Operations

Memory-efficient UTF-8 string implementation with comprehensive string manipulation, parsing, and comparison operations optimized for Spark's internal use.

```java { .api }
final class UTF8String implements Comparable<UTF8String>, Externalizable, KryoSerializable, Cloneable {
  // Factory methods
  public static UTF8String fromString(String str);
  public static UTF8String fromBytes(byte[] bytes);
  public static UTF8String concat(UTF8String... inputs);

  // String operations
  public UTF8String substring(int start, int until);
  public UTF8String toUpperCase();
  public UTF8String trim();
  public boolean contains(UTF8String substring);

  // Parsing
  public long toLongExact();
  public int toIntExact();
}
```

[UTF8 String Operations](./utf8-string-operations.md)
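
Because the string is stored as UTF-8 bytes, its character count and its byte count differ whenever it contains non-ASCII text, which is worth keeping in mind when reading a value such as `numChars()` from Basic Usage. The JDK-only sketch below shows one common way to count code points in valid UTF-8 by skipping continuation bytes. It is an illustration, not this library's implementation, and `Utf8CharCountSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// JDK-only sketch: byte length vs code point count of a UTF-8 encoded string.
final class Utf8CharCountSketch {
    // Counts code points in well-formed UTF-8 by skipping
    // continuation bytes, which have the bit pattern 10xxxxxx.
    static int countCodePoints(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "h\u00e9llo" contains one two-byte character (e with acute accent).
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);           // 6
        System.out.println(countCodePoints(bytes)); // 5
    }
}
```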

### Array Operations

Efficient array implementations and utility methods for byte arrays and long arrays with support for both heap and off-heap memory storage.

```java { .api }
final class LongArray {
  public LongArray(MemoryBlock memory);
  public long get(int index);
  public void set(int index, long value);
  public long size();
  public void zeroOut();
}

class ByteArrayMethods {
  public static boolean arrayEquals(Object leftBase, long leftOffset, Object rightBase, long rightOffset, long length);
  public static boolean contains(byte[] arr, byte[] sub);
  public static long nextPowerOf2(long num);
}
```

[Array Operations](./array-operations.md)
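
The listing gives only the signature of `nextPowerOf2(long num)`. Assuming it rounds its argument up to the nearest power of two, as the name suggests, the behaviour can be sketched in plain Java as follows. The class name `NextPowerOf2Sketch` is hypothetical and the code is not taken from the library.

```java
// JDK-only sketch of rounding up to a power of two; not the library's code.
final class NextPowerOf2Sketch {
    // Returns the smallest power of two that is >= num, for 1 <= num <= 2^62.
    static long nextPowerOf2(long num) {
        long highBit = Long.highestOneBit(num);
        // A value that is already a power of two is returned unchanged.
        return (highBit == num) ? num : highBit << 1;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOf2(5));    // 8
        System.out.println(nextPowerOf2(8));    // 8
        System.out.println(nextPowerOf2(1000)); // 1024
    }
}
```

Power-of-two capacities are convenient for hash tables and growable arrays because an index can be reduced with a bit mask instead of a modulo.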

### Hashing and Utilities

High-performance hashing implementations and utility classes including Murmur3 hashing, bitset operations, date/time constants, and platform alignment utilities.

```java { .api }
final class Murmur3_x86_32 {
  public Murmur3_x86_32(int seed);
  public static int hashInt(int input, int seed);
  public static int hashLong(long input, int seed);
  public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes, int seed);
}

final class BitSetMethods {
  public static void set(Object baseObject, long baseOffset, int index);
  public static boolean isSet(Object baseObject, long baseOffset, int index);
  public static int nextSetBit(Object baseObject, long baseOffset, int fromIndex, int bitsetSizeInWords);
}

class HiveHasher {
  public static int hashInt(int input);
  public static int hashLong(long input);
  public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes);
}

class DateTimeConstants {
  public static final int MONTHS_PER_YEAR = 12;
  public static final long HOURS_PER_DAY = 24L;
  public static final long SECONDS_PER_MINUTE = 60L;
  public static final long MILLIS_PER_SECOND = 1000L;
  public static final long MICROS_PER_MILLIS = 1000L;
  public static final long NANOS_PER_MICROS = 1000L;
}

class UnsafeAlignedOffset {
  public static int getUaoSize();
  public static int getSize(Object object, long offset);
  public static void putSize(Object object, long offset, int value);
}
```

[Hashing and Utilities](./hashing-utilities.md)
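
The `bitsetSizeInWords` parameter suggests that the bitset is laid out as a run of 8-byte words with 64 bits each. Under that assumption, the arithmetic behind `set`, `isSet`, and `nextSetBit` can be sketched over a plain `long[]`. This is a JDK-only illustration with a hypothetical class name (`WordBitSetSketch`); the real methods address memory through a base object and offset rather than an array.

```java
// JDK-only sketch of a word-addressed bitset; not the library's code.
final class WordBitSetSketch {
    // Sets the bit at the given index: word = index / 64, bit = index % 64.
    static void set(long[] words, int index) {
        words[index >> 6] |= 1L << (index & 63);
    }

    static boolean isSet(long[] words, int index) {
        return (words[index >> 6] & (1L << (index & 63))) != 0;
    }

    // Returns the index of the first set bit at or after fromIndex, or -1 if none.
    static int nextSetBit(long[] words, int fromIndex) {
        int wordIndex = fromIndex >> 6;
        if (wordIndex >= words.length) {
            return -1;
        }
        // Ignore bits below fromIndex in the first word examined.
        long word = words[wordIndex] & (-1L << (fromIndex & 63));
        while (true) {
            if (word != 0) {
                return (wordIndex << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++wordIndex == words.length) {
                return -1;
            }
            word = words[wordIndex];
        }
    }

    public static void main(String[] args) {
        long[] words = new long[2]; // room for 128 bits
        set(words, 70);
        System.out.println(isSet(words, 70));      // true
        System.out.println(nextSetBit(words, 0));  // 70
        System.out.println(nextSetBit(words, 71)); // -1
    }
}
```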

## Types

### Core Memory Types

```java { .api }
class MemoryLocation {
  public MemoryLocation(Object obj, long offset);
  public final Object getBaseObject();
  public final long getBaseOffset();
}

abstract class KVIterator<K, V> {
  public abstract boolean next() throws IOException;
  public abstract K getKey();
  public abstract V getValue();
  public abstract void close();
}
```

### Specialized Data Types

```java { .api }
final class CalendarInterval implements Serializable {
  public final int months;
  public final int days;
  public final long microseconds;

  public CalendarInterval(int months, int days, long microseconds);
}

class UTF8StringBuilder {
  public UTF8StringBuilder();
  public UTF8StringBuilder(int initialSize);
  public void append(UTF8String value);
  public UTF8String build();
}

final class ByteArray {
  public static final byte[] EMPTY_BYTE;
  public static void writeToMemory(byte[] src, Object target, long targetOffset);
  public static long getPrefix(byte[] bytes);
  public static int compareBinary(byte[] leftBase, byte[] rightBase);
  public static byte[] subStringSQL(byte[] bytes, int pos, int len);
  public static byte[] concat(byte[]... inputs);
}
```

## Error Handling

The library throws standard Java exceptions:
- `OutOfMemoryError` - When memory allocation fails
- `IOException` - During iterator operations
- `NumberFormatException` - During string parsing operations
- `IndexOutOfBoundsException` - For array access violations

Most operations skip bounds checking for performance reasons, so an invalid offset or length can corrupt memory or crash the JVM instead of raising an exception. Validate inputs before calling unsafe operations.
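
Since validation is left to the caller, a small guard in front of each unchecked access is a reasonable habit. The sketch below is one possible shape for such a guard in plain Java; `RangeCheckSketch`, `inBounds`, and `checkBounds` are hypothetical names, not part of this library. The offset is taken relative to the start of a block, so a caller would add `block.getBaseOffset()` only after the check passes.

```java
// JDK-only sketch of caller-side validation before an unchecked memory access.
final class RangeCheckSketch {
    // True when the byte range [offset, offset + length) lies inside a block of
    // blockSize bytes. The sum offset + length is never computed, so the test
    // cannot overflow for any long inputs.
    static boolean inBounds(long offset, long length, long blockSize) {
        return offset >= 0 && length >= 0 && blockSize >= 0
            && offset <= blockSize - length;
    }

    // Throws instead of letting an invalid range reach an unchecked operation.
    static void checkBounds(long offset, long length, long blockSize) {
        if (!inBounds(offset, length, blockSize)) {
            throw new IndexOutOfBoundsException(
                "offset=" + offset + ", length=" + length + ", blockSize=" + blockSize);
        }
    }

    public static void main(String[] args) {
        System.out.println(inBounds(1016, 8, 1024)); // true: last 8 bytes of the block
        System.out.println(inBounds(1017, 8, 1024)); // false: runs one byte past the end
        checkBounds(0, 8, 1024);                     // passes silently
    }
}
```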

## Performance Considerations

- **Zero-Copy Operations**: Many string and array operations avoid memory copying
- **Direct Memory Access**: Bypasses JVM safety checks for maximum performance
- **Memory Pooling**: Automatic pooling for large heap allocations
- **Platform Optimization**: Platform-specific optimizations for memory alignment
- **Lazy Evaluation**: Some operations use lazy evaluation patterns for efficiency
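
As a concrete picture of what alignment handling involves, sizes are often padded to a multiple of the 8-byte word so that every `long` access starts on a word boundary. The sketch below shows that rounding in plain Java. It is an assumption about the general technique, not a description of this library's internals, and `WordAlignSketch` is a hypothetical name.

```java
// JDK-only sketch of padding a byte count to an 8-byte word boundary.
final class WordAlignSketch {
    // Rounds numBytes up to the next multiple of 8 (for non-negative inputs
    // small enough that adding 7 does not overflow).
    static long roundUpToWord(long numBytes) {
        return (numBytes + 7L) & ~7L;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToWord(0));  // 0
        System.out.println(roundUpToWord(1));  // 8
        System.out.println(roundUpToWord(8));  // 8
        System.out.println(roundUpToWord(13)); // 16
    }
}
```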