# Apache Spark Unsafe

Apache Spark Unsafe provides the low-level memory operations and compact data structures that Apache Spark uses internally. It includes raw memory access, specialized data types for UTF-8 strings and calendar intervals, heap and off-heap memory allocators, array utilities, hashing functions, and bitset operations.

**Important**: This library is designed for internal Spark use. It gets its performance by bypassing Java's safety mechanisms (bounds checks and managed memory) through `sun.misc.Unsafe`. As a result, a wrong offset or a use-after-free can corrupt memory or crash the JVM instead of throwing an exception. It is not suitable for general application development, but it is critical to the performance of Spark's core engine.

## Package Information

- **Package Name**: org.apache.spark:spark-unsafe_2.11
- **Package Type**: maven
- **Language**: Java
- **Version**: 2.4.8
- **Installation**: Add to your `pom.xml`: `<dependency><groupId>org.apache.spark</groupId><artifactId>spark-unsafe_2.11</artifactId><version>2.4.8</version></dependency>`

## Core Imports

```java
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.UnsafeAlignedOffset;
import org.apache.spark.unsafe.KVIterator;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.types.CalendarInterval;
import org.apache.spark.unsafe.types.ByteArray;
import org.apache.spark.unsafe.memory.MemoryAllocator;
import org.apache.spark.unsafe.memory.MemoryBlock;
import org.apache.spark.unsafe.array.ByteArrayMethods;
import org.apache.spark.unsafe.array.LongArray;
import org.apache.spark.unsafe.hash.Murmur3_x86_32;
import org.apache.spark.unsafe.bitset.BitSetMethods;
```

## Basic Usage

```java
import java.nio.charset.StandardCharsets;

import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.UnsafeAlignedOffset;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.types.ByteArray;
import org.apache.spark.unsafe.memory.HeapMemoryAllocator;
import org.apache.spark.unsafe.memory.MemoryBlock;

// Basic unsafe memory operations: off-heap memory is addressed with a null
// base object plus an absolute address, and must always be freed explicitly
long address = Platform.allocateMemory(1024);
try {
    Platform.putLong(null, address, 42L);
    long value = Platform.getLong(null, address); // 42
} finally {
    Platform.freeMemory(address);
}

// UTF8String operations
UTF8String str = UTF8String.fromString("Hello, World!");
UTF8String upper = str.toUpperCase();                            // "HELLO, WORLD!"
boolean contains = str.contains(UTF8String.fromString("World")); // true

// Memory allocation with a size header whose width comes from UnsafeAlignedOffset
HeapMemoryAllocator allocator = new HeapMemoryAllocator();
int headerSize = UnsafeAlignedOffset.getUaoSize();
MemoryBlock block = allocator.allocate(headerSize + 1024);
try {
    UnsafeAlignedOffset.putSize(block.getBaseObject(), block.getBaseOffset(), 1024);
    int storedSize = UnsafeAlignedOffset.getSize(block.getBaseObject(), block.getBaseOffset()); // 1024
} finally {
    allocator.free(block);
}

// ByteArray operations (explicit charset instead of the platform default)
byte[] data1 = "Hello".getBytes(StandardCharsets.UTF_8);
byte[] data2 = "World".getBytes(StandardCharsets.UTF_8);
byte[] combined = ByteArray.concat(data1, " ".getBytes(StandardCharsets.UTF_8), data2); // "Hello World"
```

## Architecture

The Spark Unsafe module is organized into five functional areas:

- **Core Platform Operations**: Direct memory access via the `Platform` class
- **Memory Management**: Allocators and memory blocks for heap and off-heap storage
- **Specialized Data Types**: High-performance UTF-8 strings and calendar intervals
- **Array Operations**: Optimized utilities for byte arrays and long arrays
- **Hashing and Bitsets**: High-performance hash functions and bitset operations

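## Capabilities

### Core Platform Operations

`Platform` wraps `sun.misc.Unsafe` to provide direct memory access that bypasses Java's bounds checks and managed-memory safety. Every accessor takes a base object and an offset. For on-heap data, the base object is a Java array and the offset starts at the matching `*_ARRAY_OFFSET` constant. For off-heap data, the base object is `null` and the offset is an absolute address, such as one returned by `allocateMemory`. `unaligned()` reports whether the platform supports unaligned memory access.

```java { .api }
public final class Platform {
    // Array base offsets
    public static final int BOOLEAN_ARRAY_OFFSET;
    public static final int BYTE_ARRAY_OFFSET;
    public static final int SHORT_ARRAY_OFFSET;
    public static final int INT_ARRAY_OFFSET;
    public static final int LONG_ARRAY_OFFSET;
    public static final int FLOAT_ARRAY_OFFSET;
    public static final int DOUBLE_ARRAY_OFFSET;

    // Platform capabilities
    public static boolean unaligned();

    // Typed accessors (the int, byte, short, float, and double variants follow the same pattern)
    public static long getLong(Object object, long offset);
    public static void putLong(Object object, long offset, long value);
    public static void copyMemory(Object src, long srcOffset, Object dst, long dstOffset, long length);

    // Memory operations
    public static long allocateMemory(long size);
    public static void freeMemory(long address);
    public static long reallocateMemory(long address, long oldSize, long newSize);
    public static java.nio.ByteBuffer allocateDirectBuffer(int size);
}
```

[Core Platform Operations](./platform.md)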
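The base-object-plus-offset addressing that `Platform` exposes can be sketched by calling `sun.misc.Unsafe` directly, so the example runs without Spark on the classpath. This illustrates the model only and is not Spark's code. `AddressingSketch` and its methods are hypothetical. Obtaining `Unsafe` through its private `theUnsafe` field is a JDK-internal trick that newer JDKs warn about.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch of Platform-style addressing using sun.misc.Unsafe directly.
final class AddressingSketch {
    static final Unsafe UNSAFE = loadUnsafe();
    // Analogous to Platform.LONG_ARRAY_OFFSET: where element 0 of a long[] starts.
    static final long LONG_ARRAY_OFFSET = UNSAFE.arrayBaseOffset(long[].class);

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // On-heap: the base object is the array, the offset is base offset + 8 * index.
    static long onHeapWrite(long value) {
        long[] array = new long[4];
        UNSAFE.putLong(array, LONG_ARRAY_OFFSET + 8L * 2, value); // writes element 2
        return array[2];                                         // read back with normal indexing
    }

    // Off-heap: the base object is null and the offset is an absolute address.
    static long offHeapRoundTrip(long value) {
        long address = UNSAFE.allocateMemory(16);
        try {
            UNSAFE.putLong(null, address + 8, value);
            return UNSAFE.getLong(null, address + 8);
        } finally {
            UNSAFE.freeMemory(address); // off-heap memory is never garbage collected
        }
    }

    public static void main(String[] args) {
        System.out.println(onHeapWrite(42L));     // 42
        System.out.println(offHeapRoundTrip(7L)); // 7
    }
}
```

`Platform` performs the reflection step once and exposes both addressing modes through methods such as `Platform.putLong(Object, long, long)`.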

### Memory Management

`MemoryAllocator` has two built-in implementations:

- `HEAP` backs each allocation with a `long[]` on the JVM heap.
- `UNSAFE` allocates off-heap memory through `Platform.allocateMemory`.

Both return a `MemoryBlock` that records the base object, offset, and length of the allocation. For debugging, allocators can optionally fill memory with marker bytes when it is allocated and freed. This helps expose reads of uninitialized or already-freed memory.

```java { .api }
public interface MemoryAllocator {
    public static final MemoryAllocator UNSAFE;
    public static final MemoryAllocator HEAP;

    MemoryBlock allocate(long size);
    void free(MemoryBlock memory);
}

public class MemoryBlock extends MemoryLocation {
    public int pageNumber;

    public MemoryBlock(Object obj, long offset, long length);
    public long size();
    public void fill(byte value);
}
```

[Memory Management](./memory.md)
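The following stdlib-only sketch makes the heap strategy concrete. It rounds a request up to whole 8-byte words and backs it with a `long[]`, as the `HEAP` allocator does, and it adds a debug-style fill. `HeapBlock` and `HeapAllocatorSketch` are hypothetical stand-ins for `MemoryBlock` and `HeapMemoryAllocator`. The fill byte `0xa5` is an arbitrary marker chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in for MemoryBlock: a long[] base object plus a byte length.
final class HeapBlock {
    final long[] words; // backing storage: the requested size rounded up to 8-byte words
    final long size;    // requested size in bytes

    HeapBlock(long[] words, long size) {
        this.words = words;
        this.size = size;
    }
}

// Hypothetical sketch of heap allocation in the style of the HEAP allocator.
final class HeapAllocatorSketch {
    static HeapBlock allocate(long size) {
        if (size < 0) {
            throw new IllegalArgumentException("negative size: " + size);
        }
        long numWords = (size + 7) / 8; // ceil(size / 8)
        if (numWords > Integer.MAX_VALUE) {
            throw new OutOfMemoryError("cannot back " + size + " bytes with a single long[]");
        }
        return new HeapBlock(new long[(int) numWords], size);
    }

    // Debug-style fill: replicate one byte into all 8 byte lanes of every word.
    static void fill(HeapBlock block, byte value) {
        long pattern = (value & 0xFFL) * 0x0101010101010101L;
        Arrays.fill(block.words, pattern);
    }

    public static void main(String[] args) {
        HeapBlock block = allocate(10);
        System.out.println(block.words.length);               // 2
        fill(block, (byte) 0xa5);
        System.out.println(Long.toHexString(block.words[0])); // a5a5a5a5a5a5a5a5
    }
}
```

Backing the block with `long[]` rather than `byte[]` keeps the start of the data 8-byte aligned, which suits the word-at-a-time reads used elsewhere in the module.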

### UTF8 String Operations

`UTF8String` stores UTF-8 encoded bytes directly, as a base object, an offset, and a byte length, instead of decoding them into a `java.lang.String`. This lets Spark SQL compare, hash, search, and slice strings without conversion. Because UTF-8 is a variable-width encoding, `numBytes()` and `numChars()` differ for non-ASCII text. `substring(start, until)` takes character positions, not byte offsets.

```java { .api }
public final class UTF8String implements Comparable<UTF8String> {
    public static final UTF8String EMPTY_UTF8;

    // Creation methods
    public static UTF8String fromString(String str);
    public static UTF8String fromBytes(byte[] bytes);
    public static UTF8String concat(UTF8String... inputs);

    // Core operations
    public int numBytes();
    public int numChars();
    public UTF8String substring(int start, int until);
    public boolean contains(UTF8String substring);
    public UTF8String toUpperCase();
    public UTF8String toLowerCase();
}
```

[UTF8 String Operations](./utf8-strings.md)
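Because `UTF8String` keeps the raw bytes, a character count has to be derived from UTF-8 lead bytes rather than read from a `String`. The stdlib-only sketch below shows that idea by counting code points. `Utf8CountSketch` is hypothetical and is not Spark's implementation, which may treat malformed input differently.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: count code points by walking UTF-8 lead bytes,
// without decoding the bytes into a java.lang.String.
final class Utf8CountSketch {
    // Length of the UTF-8 sequence that starts with this byte.
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;  // 0xxxxxxx: ASCII
        if (b >= 0xF0) return 4; // 11110xxx (and invalid 0xF8-0xFF bytes)
        if (b >= 0xE0) return 3; // 1110xxxx
        if (b >= 0xC0) return 2; // 110xxxxx
        return 1;                // stray continuation byte: count it on its own
    }

    static int numChars(byte[] utf8) {
        int count = 0;
        for (int i = 0; i < utf8.length; i += sequenceLength(utf8[i])) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] bytes = "na\u00efve \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);    // 10
        System.out.println(numChars(bytes)); // 7
    }
}
```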

### Calendar Intervals

`CalendarInterval` represents a time period as two independent components: a number of months and a number of microseconds. Months are kept separate because their length in days varies, so an interval such as "1 month 1 day" cannot be reduced to a single microsecond count.

```java { .api }
public final class CalendarInterval {
    // Time constants
    public static final long MICROS_PER_SECOND = 1000000L;
    public static final long MICROS_PER_MINUTE = 60000000L;
    public static final long MICROS_PER_HOUR = 3600000000L;
    public static final long MICROS_PER_DAY = 86400000000L;

    public final int months;
    public final long microseconds;

    public CalendarInterval(int months, long microseconds);
    public static CalendarInterval fromString(String s);
    public CalendarInterval add(CalendarInterval that);
}
```

[Calendar Intervals](./intervals.md)
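The two-component design can be sketched with `java.time`, which shows why the month part cannot be folded into microseconds. `IntervalSketch` is a hypothetical stand-in for `CalendarInterval`. It mirrors the two fields and a component-wise `add`. Applying months before microseconds in `addTo` is a choice made for this sketch only.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical stand-in for CalendarInterval: two independent components.
final class IntervalSketch {
    static final long MICROS_PER_DAY = 86_400_000_000L;

    final int months;
    final long microseconds;

    IntervalSketch(int months, long microseconds) {
        this.months = months;
        this.microseconds = microseconds;
    }

    // Component-wise addition; months are never converted to microseconds.
    IntervalSketch add(IntervalSketch that) {
        return new IntervalSketch(months + that.months, microseconds + that.microseconds);
    }

    // This sketch applies calendar months first, then the fixed-length part.
    LocalDateTime addTo(LocalDateTime start) {
        return start.plusMonths(months).plus(microseconds, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        IntervalSketch oneMonthOneDay =
            new IntervalSketch(1, 0).add(new IntervalSketch(0, MICROS_PER_DAY));
        System.out.println(oneMonthOneDay.addTo(LocalDateTime.of(2021, 1, 31, 0, 0))); // 2021-03-01T00:00
    }
}
```

The same one-month interval advances January 31, 2021 by 28 days but March 31 by 30 days. A single microsecond count would lose exactly this information.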

### Array Operations

`ByteArrayMethods` provides two sizing helpers and a fast equality check. The helpers round a byte count up to whole 8-byte words and find the next power of two. The equality check compares two memory regions, each addressed by a base object and offset. `LongArray` is a fixed-length array of `long` values backed by a `MemoryBlock`, so the same class works on-heap and off-heap. Its `size()` returns the number of elements, not bytes.

```java { .api }
public class ByteArrayMethods {
    public static final int MAX_ROUNDED_ARRAY_LENGTH;

    public static long nextPowerOf2(long num);
    public static int roundNumberOfBytesToNearestWord(int numBytes);
    public static boolean arrayEquals(Object leftBase, long leftOffset,
                                      Object rightBase, long rightOffset, long length);
}

public final class LongArray {
    public LongArray(MemoryBlock memory);
    public long size();
    public void set(int index, long value);
    public long get(int index);
}
```

[Array Operations](./arrays.md)
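The sizing arithmetic behind `nextPowerOf2` and `roundNumberOfBytesToNearestWord` fits in a few lines of plain Java. This is a sketch of the behavior described above, not code copied from Spark. `ArrayMathSketch` and `roundToWord` are hypothetical names.

```java
// Hypothetical sketch of word rounding and power-of-two sizing.
final class ArrayMathSketch {
    // Smallest power of two >= num (for num > 0); powers of two map to themselves.
    static long nextPowerOf2(long num) {
        long highBit = Long.highestOneBit(num);
        return (highBit == num) ? num : highBit << 1;
    }

    // Round a non-negative byte count up to a multiple of 8 (one 64-bit word).
    static long roundToWord(long numBytes) {
        long remainder = numBytes & 0x7;           // numBytes % 8 for non-negative input
        return numBytes + ((8 - remainder) & 0x7); // adds 0 when already aligned
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOf2(100)); // 128
        System.out.println(nextPowerOf2(64));  // 64
        System.out.println(roundToWord(13));   // 16
        System.out.println(roundToWord(16));   // 16
    }
}
```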

### Hashing and Bitsets

`Murmur3_x86_32` implements the 32-bit x86 variant of MurmurHash3. It offers instance methods that use the seed passed to the constructor and static methods that take the seed explicitly. `BitSetMethods` treats consecutive 64-bit words at a base object and offset as a bitset: bit `index` lives in word `index / 64` at bit position `index % 64`.

```java { .api }
public final class Murmur3_x86_32 {
    public Murmur3_x86_32(int seed);
    public int hashInt(int input);
    public int hashLong(long input);
    public static int hashInt(int input, int seed);
    public static int hashLong(long input, int seed);
}

public final class BitSetMethods {
    public static void set(Object baseObject, long baseOffset, int index);
    public static boolean isSet(Object baseObject, long baseOffset, int index);
    public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInWords);
}
```

[Hashing and Bitsets](./hashing-bitsets.md)
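The word-and-mask arithmetic that `BitSetMethods` applies at a base object and offset can be shown over a plain `long[]`. `BitSetSketch` is hypothetical. It indexes the array directly instead of going through `Platform`.

```java
// Hypothetical sketch of bitset word/mask arithmetic over a plain long[].
final class BitSetSketch {
    static void set(long[] words, int index) {
        words[index >> 6] |= 1L << (index & 0x3f); // word index / 64, bit index % 64
    }

    static void unset(long[] words, int index) {
        words[index >> 6] &= ~(1L << (index & 0x3f));
    }

    static boolean isSet(long[] words, int index) {
        return (words[index >> 6] & (1L << (index & 0x3f))) != 0;
    }

    static boolean anySet(long[] words) {
        for (long word : words) {
            if (word != 0) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] bits = new long[2]; // room for 128 bits
        set(bits, 70);
        System.out.println(isSet(bits, 70)); // true
        System.out.println(bits[1]);         // 64 (bit 6 of word 1)
        unset(bits, 70);
        System.out.println(anySet(bits));    // false
    }
}
```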

## Types

### Core Memory Types

```java { .api }
public class MemoryLocation {
    public MemoryLocation(Object obj, long offset);
    public Object getBaseObject();
    public long getBaseOffset();
    public void setObjAndOffset(Object newObj, long newOffset);
}

public abstract class KVIterator<K, V> {
    public abstract boolean next() throws java.io.IOException;
    public abstract K getKey();
    public abstract V getValue();
    public abstract void close();
}
```
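The `KVIterator` signatures imply a cursor-style protocol:

1. Call `next()` to advance.
2. Read the current pair with `getKey()` and `getValue()`.
3. Call `close()` when finished.

To show the consuming loop without depending on Spark, the sketch below defines a hypothetical `KVIteratorSketch` with the same shape, plus a hypothetical map-backed `MapKVIterator`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical mirror of KVIterator's shape, so the sketch runs without Spark.
abstract class KVIteratorSketch<K, V> {
    public abstract boolean next() throws IOException;
    public abstract K getKey();
    public abstract V getValue();
    public abstract void close();
}

// Map-backed implementation: next() moves the cursor, the getters read the current entry.
final class MapKVIterator<K, V> extends KVIteratorSketch<K, V> {
    private final Iterator<Map.Entry<K, V>> it;
    private Map.Entry<K, V> current;

    MapKVIterator(Map<K, V> map) {
        this.it = map.entrySet().iterator();
    }

    @Override public boolean next() {
        current = it.hasNext() ? it.next() : null;
        return current != null;
    }

    @Override public K getKey() { return current.getKey(); }
    @Override public V getValue() { return current.getValue(); }
    @Override public void close() { current = null; }

    // Typical consuming loop: advance, read the current pair, and always close.
    static <K, V> String drain(KVIteratorSketch<K, V> iter) {
        StringBuilder out = new StringBuilder();
        try {
            while (iter.next()) {
                out.append(iter.getKey()).append('=').append(iter.getValue()).append(';');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            iter.close();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("b", 2);
        counts.put("a", 1);
        System.out.println(drain(new MapKVIterator<>(counts))); // a=1;b=2;
    }
}
```

Unlike `java.util.Iterator`, this style reuses one cursor position instead of allocating a pair object per step, which suits iterators over packed binary records.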

### Specialized Memory Allocators

```java { .api }
public class HeapMemoryAllocator implements MemoryAllocator {
    public MemoryBlock allocate(long size);
    public void free(MemoryBlock memory);
}

public class UnsafeMemoryAllocator implements MemoryAllocator {
    public MemoryBlock allocate(long size);
    public void free(MemoryBlock memory);
}
```

### Utility Classes

```java { .api }
public class UnsafeAlignedOffset {
    public static int getUaoSize();
    public static int getSize(Object object, long offset);
    public static void putSize(Object object, long offset, int value);
}
```