# Apache Spark Unsafe

The Apache Spark Unsafe module provides low-level operations for memory management, arrays, bitsets, hashing, and high-performance data types. It accesses memory directly through `sun.misc.Unsafe` and supplies the memory allocation strategies, hash function implementations, and optimized data types such as `UTF8String` and `CalendarInterval` that are used throughout the Spark engine.

## Package Information

- **Package Name**: spark-unsafe_2.13
- **Package Type**: Maven
- **Language**: Java
- **Installation**:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-unsafe_2.13</artifactId>
  <version>4.0.0</version>
</dependency>
```

## Core Imports

```java
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.memory.MemoryBlock;
import org.apache.spark.unsafe.memory.MemoryAllocator;
```

## Basic Usage

```java
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.types.UTF8String;
import org.apache.spark.unsafe.memory.MemoryAllocator;
import org.apache.spark.unsafe.memory.MemoryBlock;

// Memory allocation and management
MemoryAllocator allocator = MemoryAllocator.UNSAFE;
MemoryBlock block = allocator.allocate(1024);

// Direct memory access
Platform.putLong(block.getBaseObject(), block.getBaseOffset(), 42L);
long value = Platform.getLong(block.getBaseObject(), block.getBaseOffset());

// UTF-8 string operations
UTF8String str1 = UTF8String.fromString("Hello");
UTF8String str2 = UTF8String.fromString(" World");
UTF8String result = UTF8String.concat(str1, str2);

// Clean up
allocator.free(block);
```

## Architecture

The module is organized into several functional areas, each covering a different aspect of low-level functionality:

- **Platform Operations**: Direct memory access and platform-specific optimizations using `sun.misc.Unsafe`
- **Memory Management**: Heap and off-heap memory allocation with pooling and debugging support
- **Array Operations**: High-performance byte and long array operations without bounds checking
- **String Processing**: Comprehensive UTF-8 string manipulation with collation support
- **Hash Functions**: Fast hash implementations (Murmur3, Hive-compatible) for data processing
- **Bitset Operations**: Fixed-size bitset manipulation for efficient boolean operations
- **Data Types**: Specialized data structures like calendar intervals and variant values

## Capabilities

### Platform Operations

Direct memory access and platform-specific operations built on `sun.misc.Unsafe`, intended for performance-critical paths in large-scale data processing.

```java { .api }
public static int getInt(Object object, long offset);
public static void putInt(Object object, long offset, int value);
public static long allocateMemory(long size);
public static void freeMemory(long address);
public static void copyMemory(Object src, long srcOffset, Object dst, long dstOffset, long length);
public static boolean unaligned();
```
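
To make the idea of reading and writing raw memory at a byte offset concrete without pulling in the Spark dependency, the following minimal sketch uses the JDK's own `ByteBuffer.allocateDirect` as a stand-in. It is not Spark's `Platform` class: the class name `OffHeapSketch` and the method `roundTrip` are hypothetical, and unlike `Platform`, a `ByteBuffer` still checks bounds on every access.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// JDK-only stand-in for off-heap access; this is NOT Spark's Platform class.
final class OffHeapSketch {
    // Writes a long at the given byte offset, then reads it back from the same offset.
    static long roundTrip(ByteBuffer buffer, int offset, long value) {
        buffer.putLong(offset, value);
        return buffer.getLong(offset);
    }

    public static void main(String[] args) {
        // allocateDirect reserves memory outside the Java heap.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024).order(ByteOrder.nativeOrder());
        System.out.println(roundTrip(buffer, 0, 42L)); // prints 42
        System.out.println(buffer.isDirect());         // prints true
    }
}
```

The absolute `putLong(int, long)` and `getLong(int)` calls mirror the offset-based style of the signatures above, with the buffer playing the role of the base address.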

[Platform Operations](./platform-operations.md)

### Memory Management

Memory allocation and management for both heap and off-heap memory, with object pooling for large allocations and debugging support.

```java { .api }
public abstract MemoryBlock allocate(long size) throws OutOfMemoryError;
public abstract void free(MemoryBlock memory);
public MemoryBlock(Object obj, long offset, long length);
public long size();
public void fill(byte value);
```
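
The idea of pooling large allocations can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the class `PoolingAllocatorSketch`, the threshold constant, and the choice to key the pool by array length are hypothetical and are not taken from Spark's allocator source.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical heap allocator that recycles large long[] blocks instead of discarding them.
final class PoolingAllocatorSketch {
    // Illustrative threshold; blocks of at least this many bytes are pooled on free.
    static final long POOL_THRESHOLD_BYTES = 1024;

    // Free blocks, keyed by their length in 8-byte words.
    private final Map<Integer, ArrayDeque<long[]>> pool = new HashMap<>();

    long[] allocate(int sizeBytes) {
        int words = (sizeBytes + 7) / 8; // round up to whole 8-byte words
        if (words * 8L >= POOL_THRESHOLD_BYTES) {
            ArrayDeque<long[]> free = pool.get(words);
            if (free != null && !free.isEmpty()) {
                return free.pop(); // reuse; contents are NOT zeroed
            }
        }
        return new long[words];
    }

    void free(long[] block) {
        if (block.length * 8L >= POOL_THRESHOLD_BYTES) {
            pool.computeIfAbsent(block.length, k -> new ArrayDeque<>()).push(block);
        }
        // Small blocks are simply left to the garbage collector.
    }
}
```

In this sketch a freed 2048-byte block is handed back by the next `allocate(2048)` call, while small blocks always get a fresh array. A reused block keeps its old contents, which is one reason a `fill`-style method is useful when debugging.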

[Memory Management](./memory-management.md)

### Array Operations

Optimized byte and long array operations for both on-heap and off-heap memory. They skip bounds checking for speed, so callers are responsible for passing valid offsets and lengths.

```java { .api }
public static boolean arrayEquals(Object leftBase, long leftOffset, Object rightBase, long rightOffset, long length);
public static long nextPowerOf2(long num);
public static int roundNumberOfBytesToNearestWord(int numBytes);
public void set(int index, long value);
public long get(int index);
```
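
The arithmetic behind the two sizing helpers can be sketched in plain Java. This is a standalone illustration, not the library's source: the class `WordMathSketch` and its method names are hypothetical, and it assumes that "nearest word" means rounding up to a multiple of 8 bytes.

```java
// Standalone sketch of word-alignment and power-of-two sizing arithmetic.
final class WordMathSketch {
    // Rounds a non-negative byte count up to the next multiple of 8 (one 64-bit word).
    static int roundUpToWord(int numBytes) {
        int remainder = numBytes & 0x07;
        return remainder == 0 ? numBytes : numBytes + (8 - remainder);
    }

    // Smallest power of two that is >= num, valid for 1 <= num <= 2^62.
    static long nextPowerOfTwo(long num) {
        long highBit = Long.highestOneBit(num);
        return highBit == num ? num : highBit << 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToWord(13));    // prints 16
        System.out.println(nextPowerOfTwo(1000)); // prints 1024
    }
}
```

Word-aligned sizes let data be copied and compared eight bytes at a time, and power-of-two capacities allow index masking instead of a modulo.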

[Array Operations](./array-operations.md)

### UTF-8 String Processing

UTF-8 string manipulation with an extensive set of string operations, collation support, and storage optimized for internal Spark use.

```java { .api }
public static UTF8String fromString(String str);
public static UTF8String concat(UTF8String... inputs);
public int numBytes();
public int numChars();
public UTF8String substring(int start, int until);
public boolean contains(UTF8String substring);
public UTF8String toUpperCase();
public UTF8String trim();
```
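
The distinction between a byte count and a character count matters for any UTF-8 string type. The sketch below shows it with the JDK's `String` only; the class `Utf8LengthSketch` is hypothetical, and it assumes that a character count means Unicode code points rather than Java `char` units.

```java
import java.nio.charset.StandardCharsets;

// JDK-only illustration of byte length versus code point length in UTF-8.
final class Utf8LengthSketch {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points in the string.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String accented = "h\u00e9llo";         // 'e' with acute accent takes 2 bytes in UTF-8
        System.out.println(byteLength(accented));      // prints 6
        System.out.println(codePointLength(accented)); // prints 5
    }
}
```

Because the two counts diverge for any non-ASCII text, operations such as `substring` need to know which unit their indices are expressed in.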

[UTF-8 String Processing](./utf8-string-processing.md)

### Hash Functions and Bitset Operations

Fast hash function implementations and bitset manipulation methods for efficient data processing and boolean operations.

```java { .api }
public static int hashInt(int input, int seed);
public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed);
public static void set(Object baseObject, long baseOffset, int index);
public static boolean isSet(Object baseObject, long baseOffset, int index);
public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInWords);
```
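
A fixed-size bitset stored in 64-bit words can be sketched over a plain `long[]`. This is an illustration of the general technique under stated assumptions, not the library's implementation: the class `WordBitSetSketch` is hypothetical, and it uses an array where the API above takes a base object and offset.

```java
// Sketch of a fixed-size bitset packed into 64-bit words.
final class WordBitSetSketch {
    // Sets the bit at the given index; index >> 6 selects the word, index & 63 the bit.
    static void set(long[] words, int index) {
        words[index >> 6] |= 1L << (index & 0x3f);
    }

    // Returns true if the bit at the given index is set.
    static boolean isSet(long[] words, int index) {
        return (words[index >> 6] & (1L << (index & 0x3f))) != 0;
    }

    // Returns true if any bit in any word is set.
    static boolean anySet(long[] words) {
        for (long word : words) {
            if (word != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] words = new long[2]; // room for 128 bits
        set(words, 70);
        System.out.println(isSet(words, 70)); // prints true
        System.out.println(isSet(words, 6));  // prints false
    }
}
```

Checking whole words at once is what makes `anySet`-style queries cheap: 64 flags are tested with a single comparison.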

[Hash Functions and Bitset Operations](./hash-bitset-operations.md)

### Data Types and Utilities

Specialized data types including calendar intervals, variant values, and utility classes for date/time operations and collation support.

```java { .api }
public CalendarInterval(int months, int days, long microseconds);
public VariantVal(byte[] value, byte[] metadata);
public String toJson(ZoneId zoneId);
public static UTF8String getCollationKey(UTF8String input, int collationId);
public static boolean isCaseInsensitive(int collationId);
```
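
One way to see why an interval might carry months, days, and microseconds as separate fields is that months and days do not have a fixed length. The sketch below uses `java.time` only; the class `IntervalSketch` and its `addTo` method are hypothetical, and the order in which the fields are applied is an assumption, not a statement about how Spark evaluates intervals.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical three-field interval, mirroring the (months, days, microseconds) shape.
final class IntervalSketch {
    final int months;
    final int days;
    final long microseconds;

    IntervalSketch(int months, int days, long microseconds) {
        this.months = months;
        this.days = days;
        this.microseconds = microseconds;
    }

    // Applies months first, then days, then microseconds (assumed order).
    LocalDateTime addTo(LocalDateTime start) {
        return start.plusMonths(months)
                    .plusDays(days)
                    .plus(microseconds, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        LocalDateTime jan31 = LocalDateTime.of(2024, 1, 31, 0, 0);
        // One calendar month lands on the last day of February.
        System.out.println(new IntervalSketch(1, 0, 0L).addTo(jan31));
        // Thirty days lands on a different date.
        System.out.println(new IntervalSketch(0, 30, 0L).addTo(jan31));
    }
}
```

Keeping the fields apart preserves the caller's intent: "one month" and "thirty days" produce different results from the same starting point.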

[Data Types and Utilities](./data-types-utilities.md)