# Apache Spark SQL API

The Apache Spark SQL API provides the core SQL data types, row representations, and foundational interfaces for Spark SQL operations. It serves as the foundation for DataFrame and Dataset operations, SQL query execution, and Structured Streaming in Apache Spark's distributed computing framework.

## Package Information

- **Package Name**: org.apache.spark:spark-sql-api_2.12
- **Package Type**: maven
- **Language**: Scala
- **Installation**: `maven: org.apache.spark:spark-sql-api_2.12:3.5.6`

## Core Imports

```scala
// Core row and type imports
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Streaming state management
import org.apache.spark.sql.streaming.GroupState

// Encoding support
import org.apache.spark.sql.Encoder

// Error handling
import org.apache.spark.sql.AnalysisException
```

## Basic Usage

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Create a schema for structured data
val schema = StructType(Array(
  StructField("name", StringType, nullable = false),
  StructField("age", IntegerType, nullable = false),
  StructField("salary", DecimalType(10, 2), nullable = true)
))

// Create rows of data
val row1 = Row("Alice", 25, BigDecimal("55000.00"))
val row2 = Row.fromSeq(Seq("Bob", 30, BigDecimal("65000.00")))

// Access row data positionally (name-based getAs[T]("field") works only on
// rows that carry a schema, such as rows returned by Spark queries; rows
// built with Row(...) have no schema attached)
val name: String = row1.getString(0)
val age: Int = row1.getInt(1)
val hasNullSalary: Boolean = row1.isNullAt(2)

// Work with complex data types
val arrayType = ArrayType(StringType, containsNull = true)
val mapType = MapType(StringType, IntegerType, valueContainsNull = false)
val nestedSchema = StructType(Array(
  StructField("addresses", arrayType, nullable = true),
  StructField("scores", mapType, nullable = false)
))
```

## Architecture

The Spark SQL API is built around several key components:

- **Type System**: Comprehensive data type hierarchy supporting primitives, collections, and user-defined types
- **Row Interface**: Structured data representation with type-safe access methods
- **Schema Management**: Dynamic schema creation, validation, and evolution support
- **Streaming State**: Stateful operations for complex streaming analytics
- **Encoding Framework**: Type-safe conversion between JVM objects and Spark SQL representations
- **Error Handling**: Structured exception hierarchy with detailed error reporting

## Capabilities

### Core Data Types

Comprehensive type system including primitives, collections, and complex nested structures. Essential for defining schemas and working with structured data.

```scala { .api }
// Base type hierarchy
abstract class DataType extends AbstractDataType
abstract class AbstractDataType

// Primitive types (in Spark, each case object extends a same-named abstract
// class, so the type and its singleton instance share one name)
case object StringType extends StringType
case object IntegerType extends IntegerType
case object LongType extends LongType
case object DoubleType extends DoubleType
case object BooleanType extends BooleanType

// Complex types
case class DecimalType(precision: Int, scale: Int) extends FractionalType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean) extends DataType
case class StructType(fields: Array[StructField]) extends DataType
```

[Data Types](./data-types.md)

### Row Operations

Structured data representation and manipulation with type-safe access methods for distributed data processing.

```scala { .api }
trait Row extends Serializable {
  def length: Int
  def apply(i: Int): Any
  def get(i: Int): Any
  def isNullAt(i: Int): Boolean
  def getAs[T](i: Int): T
  def getAs[T](fieldName: String): T
  def getString(i: Int): String
  def getInt(i: Int): Int
  def getLong(i: Int): Long
  def getDouble(i: Int): Double
  def getBoolean(i: Int): Boolean
}

object Row {
  def apply(values: Any*): Row
  def fromSeq(values: Seq[Any]): Row
  def fromTuple(tuple: Product): Row
}
```

[Row Operations](./row-operations.md)
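
The accessor pattern above can be illustrated with a small self-contained sketch. `ToyRow` is a hypothetical stand-in, not Spark's implementation; it shows how name-based access resolves a field name to an index and then delegates to positional access.

```scala
// ToyRow is a hypothetical, simplified model of the Row accessors above --
// not Spark's implementation. Name-based access resolves the field name to
// an index, then delegates to positional access.
case class ToyRow(fieldNames: Seq[String], values: Seq[Any]) {
  def length: Int = values.length
  def get(i: Int): Any = values(i)
  def isNullAt(i: Int): Boolean = values(i) == null
  def getAs[T](i: Int): T = values(i).asInstanceOf[T]
  def getAs[T](fieldName: String): T = getAs[T](fieldNames.indexOf(fieldName))
}

val toyRow = ToyRow(Seq("name", "age", "salary"), Seq("Alice", 25, null))
```

In Spark itself, name-based `getAs` works only on rows that carry a schema: rows produced by queries do, while rows built directly with `Row(...)` do not.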

### Streaming State Management

Stateful operations for complex streaming analytics with timeout support and watermark handling.

```scala { .api }
trait GroupState[S] extends LogicalGroupState[S] {
  def exists: Boolean
  def get: S
  def getOption: Option[S]
  def update(newState: S): Unit
  def remove(): Unit
  def hasTimedOut: Boolean
  def setTimeoutDuration(durationMs: Long): Unit
  def setTimeoutTimestamp(timestampMs: Long): Unit
  def getCurrentWatermarkMs(): Long
  def getCurrentProcessingTimeMs(): Long
}
```

[Streaming Operations](./streaming-operations.md)
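
To clarify the state lifecycle, here is a toy in-memory sketch. `ToyGroupState` and `updateRunningCount` are hypothetical names, not part of the API; in real code Spark constructs the `GroupState` instance itself and passes it to the update function you supply to `mapGroupsWithState`.

```scala
// Hypothetical in-memory stand-in for GroupState, illustrating only the
// exists/get/update/remove lifecycle (no timeouts or watermarks).
class ToyGroupState[S] {
  private var state: Option[S] = None
  def exists: Boolean = state.isDefined
  def get: S = state.getOrElse(throw new NoSuchElementException("state not set"))
  def getOption: Option[S] = state
  def update(newState: S): Unit = { state = Some(newState) }
  def remove(): Unit = { state = None }
}

// A typical update-function shape: fold new events into the running state.
def updateRunningCount(events: Iterator[String], state: ToyGroupState[Long]): Long = {
  val count = state.getOption.getOrElse(0L) + events.size
  state.update(count)
  count
}
```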

### Encoding Framework

Type-safe conversion between JVM objects and Spark SQL representations for distributed serialization.

```scala { .api }
trait Encoder[T] extends Serializable {
  def schema: StructType
  def clsTag: ClassTag[T]
}

trait AgnosticEncoder[T] extends Encoder[T] {
  def isPrimitive: Boolean
  def nullable: Boolean
  def dataType: DataType
}
```

[Encoders](./encoders.md)
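
A simplified sketch of the encoder contract: the schema describes the columnar shape while the `ClassTag` identifies the JVM class to reconstruct. `ToyEncoder` and `ToyField` are hypothetical stand-ins for `Encoder` and `StructField`, kept dependency-free for illustration.

```scala
import scala.reflect.ClassTag

// ToyField stands in for StructField; ToyEncoder mirrors the Encoder trait
// above. Both are illustrative only, not Spark's types.
case class ToyField(name: String, dataType: String, nullable: Boolean)

trait ToyEncoder[T] extends Serializable {
  def schema: Seq[ToyField]
  def clsTag: ClassTag[T]
}

case class Person(name: String, age: Int)

// An encoder instance for Person, pairing its field layout with its class.
val personEncoder: ToyEncoder[Person] = new ToyEncoder[Person] {
  def schema: Seq[ToyField] = Seq(
    ToyField("name", "string", nullable = true),
    ToyField("age", "int", nullable = false)
  )
  def clsTag: ClassTag[Person] = implicitly[ClassTag[Person]]
}
```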

### Analysis and Error Handling

Structured exception handling with detailed error information for query analysis and execution.

```scala { .api }
class AnalysisException(
    message: String,
    line: Option[Int] = None,
    startPosition: Option[Int] = None,
    errorClass: Option[String] = None,
    messageParameters: Map[String, String] = Map.empty,
    context: Array[QueryContext] = Array.empty
) extends Exception with SparkThrowable {
  def withPosition(origin: Origin): AnalysisException
  def getSimpleMessage: String
}
```

[Error Handling](./error-handling.md)
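
The interplay of `errorClass` and `messageParameters` can be sketched as a small message-composition function. `formatError` is hypothetical (not Spark's actual error formatter); it only illustrates how named parameters fill a message template keyed by an error class.

```scala
// Hypothetical sketch of error-class message composition, mirroring the
// errorClass/messageParameters fields above -- not Spark's implementation.
def formatError(errorClass: String,
                params: Map[String, String],
                template: String): String = {
  // Substitute each <key> placeholder with its parameter value.
  val filled = params.foldLeft(template) { case (msg, (key, value)) =>
    msg.replace(s"<$key>", value)
  }
  s"[$errorClass] $filled"
}

val message = formatError(
  "UNRESOLVED_COLUMN",
  Map("objectName" -> "`user_name`"),
  "A column or function parameter with name <objectName> cannot be resolved."
)
```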

### Utility Functions

Helper utilities for data type conversions and integrations with external systems.

```scala { .api }
object ArrowUtils {
  def toArrowType(dt: DataType, timeZoneId: String, largeVarTypes: Boolean = false): ArrowType
  def fromArrowType(dt: ArrowType): DataType
  def toArrowSchema(schema: StructType, timeZoneId: String, errorOnDuplicatedFieldNames: Boolean, largeVarTypes: Boolean = false): Schema
  def fromArrowSchema(schema: Schema): StructType
}
```

[Utilities](./utilities.md)
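
The kind of mapping `toArrowType` performs can be sketched over type names rather than real `DataType`/`ArrowType` instances (which would require the spark-sql-api and Arrow dependencies). `toArrowTypeName` is a hypothetical helper; the pairings reflect Arrow's corresponding type names.

```scala
// Illustrative only: a name-level sketch of the Spark-to-Arrow type mapping,
// not the real ArrowUtils (which returns ArrowType instances).
def toArrowTypeName(sparkType: String): String = sparkType match {
  case "BooleanType" => "Bool"
  case "IntegerType" => "Int(32, signed)"
  case "LongType"    => "Int(64, signed)"
  case "DoubleType"  => "FloatingPoint(DOUBLE)"
  case "StringType"  => "Utf8"
  case other =>
    // Types outside this toy table are rejected, mirroring how unsupported
    // conversions surface as exceptions.
    throw new UnsupportedOperationException(s"Unsupported type: $other")
}
```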