# Apache Spark SQL API

Apache Spark SQL API provides the core SQL data types, row representations, and foundational APIs for Spark SQL operations. This library serves as the foundation for DataFrame and Dataset operations, SQL query execution, and structured streaming in Apache Spark's distributed computing framework.

## Package Information

- **Package Name**: org.apache.spark:spark-sql-api_2.12
- **Package Type**: maven
- **Language**: Scala
- **Installation**: `maven: org.apache.spark:spark-sql-api_2.12:3.5.6`

## Core Imports

```scala
// Core row and type imports
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Streaming state management
import org.apache.spark.sql.streaming.GroupState

// Encoding support
import org.apache.spark.sql.Encoder

// Error handling
import org.apache.spark.sql.AnalysisException
```

## Basic Usage

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Create a schema for structured data
val schema = StructType(Array(
  StructField("name", StringType, nullable = false),
  StructField("age", IntegerType, nullable = false),
  StructField("salary", DecimalType(10, 2), nullable = true)
))

// Create rows of data
val row1 = Row("Alice", 25, BigDecimal("55000.00"))
val row2 = Row.fromSeq(Seq("Bob", 30, BigDecimal("65000.00")))

// Access row data (positional; name-based getAs requires a row with
// an attached schema, such as one returned from a DataFrame)
val name: String = row1.getAs[String](0)
val age: Int = row1.getInt(1)
val hasNullSalary: Boolean = row1.isNullAt(2)

// Work with complex data types
val arrayType = ArrayType(StringType, containsNull = true)
val mapType = MapType(StringType, IntegerType, valueContainsNull = false)
val nestedSchema = StructType(Array(
  StructField("addresses", arrayType, nullable = true),
  StructField("scores", mapType, nullable = false)
))
```

## Architecture

The Spark SQL API is built around several key components:

- **Type System**: Comprehensive data type hierarchy supporting primitives, collections, and user-defined types
- **Row Interface**: Structured data representation with type-safe access methods
- **Schema Management**: Dynamic schema creation, validation, and evolution support
- **Streaming State**: Stateful operations for complex streaming analytics
- **Encoding Framework**: Type-safe conversion between JVM objects and Spark SQL representations
- **Error Handling**: Structured exception hierarchy with detailed error reporting

## Capabilities

### Core Data Types

Comprehensive type system including primitives, collections, and complex nested structures. Essential for defining schemas and working with structured data.

```scala { .api }
// Base type hierarchy
abstract class DataType extends AbstractDataType
abstract class AbstractDataType

// Primitive types (singleton case objects; each extends a
// same-named class in Spark's source)
case object StringType extends StringType
case object IntegerType extends IntegerType
case object LongType extends LongType
case object DoubleType extends DoubleType
case object BooleanType extends BooleanType

// Complex types
case class DecimalType(precision: Int, scale: Int) extends FractionalType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean) extends DataType
case class StructType(fields: Array[StructField]) extends DataType
```

[Data Types](./data-types.md)
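Schemas built from these types can be rendered as strings and round-tripped through JSON. A minimal sketch, assuming a Spark 3.5.x classpath (`simpleString`, `toDDL`, `json`, and `DataType.fromJson` are part of the public types API):

```scala
import org.apache.spark.sql.types._

// Build a schema programmatically
val schema = StructType(Array(
  StructField("name", StringType, nullable = false),
  StructField("age", IntegerType, nullable = true)
))

// Human-readable renderings of the same schema
println(schema.simpleString) // compact struct<...> form
println(schema.toDDL)        // SQL DDL form, e.g. for CREATE TABLE

// JSON round-trip preserves the full schema, including nullability
val restored = DataType.fromJson(schema.json).asInstanceOf[StructType]
assert(restored == schema)
```

The DDL form is convenient for persisting schemas alongside SQL definitions; the JSON form is lossless and is what Spark itself uses internally for schema serialization.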

### Row Operations

Structured data representation and manipulation with type-safe access methods for distributed data processing.

```scala { .api }
trait Row extends Serializable {
  def length: Int
  def apply(i: Int): Any
  def get(i: Int): Any
  def isNullAt(i: Int): Boolean
  def getAs[T](i: Int): T
  def getAs[T](fieldName: String): T
  def getString(i: Int): String
  def getInt(i: Int): Int
  def getLong(i: Int): Long
  def getDouble(i: Int): Double
  def getBoolean(i: Int): Boolean
}

object Row {
  def apply(values: Any*): Row
  def fromSeq(values: Seq[Any]): Row
  def fromTuple(tuple: Product): Row
}
```

[Row Operations](./row-operations.md)
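Note that rows constructed with `Row(...)` carry no schema, so the name-based `getAs[T](fieldName)` overload only works on schema-aware rows (e.g. those returned by a DataFrame); standalone rows are accessed positionally. A short sketch:

```scala
import org.apache.spark.sql.Row

val row = Row("Alice", 25, null)

// Positional access with typed getters
val name = row.getString(0)
val age  = row.getInt(1)

// Check for null before using a typed getter on a nullable field
val salary: Option[BigDecimal] =
  if (row.isNullAt(2)) None else Some(row.getAs[BigDecimal](2))

// Rows also support pattern matching via Row.unapplySeq
row match {
  case Row(n: String, a: Int, _) => println(s"$n is $a")
}
```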

### Streaming State Management

Stateful operations for complex streaming analytics with timeout support and watermark handling.

```scala { .api }
trait GroupState[S] extends LogicalGroupState[S] {
  def exists: Boolean
  def get: S
  def getOption: Option[S]
  def update(newState: S): Unit
  def remove(): Unit
  def hasTimedOut: Boolean
  def setTimeoutDuration(durationMs: Long): Unit
  def setTimeoutTimestamp(timestampMs: Long): Unit
  def getCurrentWatermarkMs(): Long
  def getCurrentProcessingTimeMs(): Long
}
```

[Streaming Operations](./streaming-operations.md)
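`GroupState` is typically driven by `mapGroupsWithState` on a grouped streaming Dataset (that entry point lives in the full spark-sql module, not this API module). A sketch of an update function that keeps a running count per key, assuming a query configured with processing-time timeouts; `updateCount` is an illustrative name:

```scala
import org.apache.spark.sql.streaming.GroupState

// Counts events per key; idle keys expire after 30 minutes
// of processing time.
def updateCount(
    key: String,
    events: Iterator[String],
    state: GroupState[Long]): (String, Long) = {
  if (state.hasTimedOut) {
    // Timeout fired with no new data for this key: drop its state
    state.remove()
    (key, 0L)
  } else {
    val newCount = state.getOption.getOrElse(0L) + events.size
    state.update(newCount)
    state.setTimeoutDuration(30 * 60 * 1000L) // reset the idle timer
    (key, newCount)
  }
}
```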

### Encoding Framework

Type-safe conversion between JVM objects and Spark SQL representations for distributed serialization.

```scala { .api }
trait Encoder[T] extends Serializable {
  def schema: StructType
  def clsTag: ClassTag[T]
}

trait AgnosticEncoder[T] extends Encoder[T] {
  def isPrimitive: Boolean
  def nullable: Boolean
  def dataType: DataType
}
```

[Encoders](./encoders.md)
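Encoders are usually obtained from the `Encoders` factory object rather than implemented by hand. A sketch deriving one for a case class and inspecting the schema it implies (assuming `org.apache.spark.sql.Encoders` is available on your classpath):

```scala
import org.apache.spark.sql.{Encoder, Encoders}

case class Person(name: String, age: Int)

// Derive an encoder for a Scala case class (product type)
val personEncoder: Encoder[Person] = Encoders.product[Person]

// The encoder exposes the Spark SQL schema it maps Person to;
// the primitive Int field becomes a non-nullable column
println(personEncoder.schema.treeString)

// Factories for primitive types are available as well
val longEncoder: Encoder[java.lang.Long] = Encoders.LONG
```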

### Analysis and Error Handling

Structured exception handling with detailed error information for query analysis and execution.

```scala { .api }
class AnalysisException(
    message: String,
    line: Option[Int] = None,
    startPosition: Option[Int] = None,
    errorClass: Option[String] = None,
    messageParameters: Map[String, String] = Map.empty,
    context: Array[QueryContext] = Array.empty
) extends Exception with SparkThrowable {
  def withPosition(origin: Origin): AnalysisException
  def getSimpleMessage: String
}
```

[Error Handling](./error-handling.md)
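A common pattern is to catch `AnalysisException` around query analysis and inspect its structured fields. A sketch, where `describeFailure` is an illustrative helper and the wrapped call stands in for any Spark SQL operation:

```scala
import org.apache.spark.sql.AnalysisException

def describeFailure(run: => Unit): Unit =
  try run
  catch {
    case e: AnalysisException =>
      // Structured error information from SparkThrowable
      println(s"error class: ${e.getErrorClass}")
      println(s"message:     ${e.getMessage}")
      // Source position, when the failing SQL fragment is known
      e.line.foreach(l => println(s"at line $l"))
  }
```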

### Utility Functions

Helper utilities for data type conversions and integrations with external systems.

```scala { .api }
object ArrowUtils {
  def toArrowType(dt: DataType, timeZoneId: String, largeVarTypes: Boolean = false): ArrowType
  def fromArrowType(dt: ArrowType): DataType
  def toArrowSchema(schema: StructType, timeZoneId: String, errorOnDuplicatedFieldNames: Boolean, largeVarTypes: Boolean = false): Schema
  def fromArrowSchema(schema: Schema): StructType
}
```

[Utilities](./utilities.md)