Hive-compatible Thrift server for Spark SQL that enables JDBC/ODBC connectivity
npx @tessl/cli install tessl/maven-org-apache-spark--spark-hive-thriftserver-2-12@3.0.00
# Spark Hive Thrift Server
The Spark Hive Thrift Server is a component that provides JDBC/ODBC connectivity to Spark SQL through the HiveServer2 protocol. It enables SQL clients to connect to Spark using standard database connectivity protocols while maintaining compatibility with existing Hive-based tools and applications.

## Package Information

- **Package Name**: spark-hive-thriftserver_2.12
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Include as a dependency in your Spark application or use the pre-built server

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive-thriftserver_2.12</artifactId>
  <version>3.0.1</version>
</dependency>
```

## Core Imports

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.SQLContext
```
## Basic Usage

### Starting the Thrift Server Programmatically

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.SparkSession

// Create a Spark session with Hive support
val spark = SparkSession.builder()
  .appName("ThriftServerExample")
  .enableHiveSupport()
  .getOrCreate()

// Start the thrift server with the SQL context
val server = HiveThriftServer2.startWithContext(spark.sqlContext)
```
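Once the server is up, any HiveServer2-compatible JDBC client can connect to it. A minimal sketch, assuming the server listens on the default port 10000, the Hive JDBC driver (`org.apache.hive:hive-jdbc`) is on the client classpath, and `user` is a placeholder credential:

```scala
import java.sql.DriverManager

// Connect over the HiveServer2 protocol (default port 10000; adjust as needed)
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "user", "")
val stmt = conn.createStatement()

// Run a query through Spark SQL and read the results like any JDBC ResultSet
val rs = stmt.executeQuery("SHOW DATABASES")
while (rs.next()) {
  println(rs.getString(1))
}
conn.close()
```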
### Using the CLI Driver

```scala
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

// Launch the interactive SQL CLI (the entry point behind bin/spark-sql).
// Note: Spark-level options such as --conf are handled by spark-submit,
// so pass Hive settings with --hiveconf when calling main directly.
SparkSQLCLIDriver.main(Array("--hiveconf", "hive.metastore.warehouse.dir=/tmp/warehouse"))
```
### Command Line Usage

```bash
# Start the thrift server
./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10000

# Use beeline to connect
./bin/beeline -u jdbc:hive2://localhost:10000
```
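Beeline can also run statements non-interactively using its standard batch flags; for example:

```bash
# Run one statement and exit; -n sets the user name if the server expects one
./bin/beeline -u jdbc:hive2://localhost:10000 -n user -e "SHOW TABLES"
```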
## Architecture

The Spark Hive Thrift Server consists of several key components:

- **Server Components**: Core thrift server implementation and CLI driver for different usage modes
- **Service Layer**: Thrift protocol implementation providing a HiveServer2-compatible interface
- **Session Management**: Multi-session support with isolated SQL contexts and configuration management
- **Operation Management**: SQL execution engine with support for DDL, DML, and metadata operations
- **Authentication**: Multiple authentication mechanisms including Kerberos, LDAP, and custom providers
- **Web UI**: Built-in monitoring interface for tracking sessions, queries, and performance metrics
## Capabilities

### Server Management

Core server lifecycle management including startup, configuration, and shutdown operations.

```scala { .api }
object HiveThriftServer2 {
  def main(args: Array[String]): Unit
  def startWithContext(sqlContext: SQLContext): HiveThriftServer2

  var uiTab: Option[ThriftServerTab]
  var listener: HiveThriftServer2Listener
  var eventManager: HiveThriftServer2EventManager
}

class HiveThriftServer2(sqlContext: SQLContext) extends HiveServer2 {
  def init(hiveConf: HiveConf): Unit
  def start(): Unit
  def stop(): Unit
}
```
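A short sketch of the lifecycle using the API above: start a server against an existing session's `SQLContext`, then stop it during application shutdown.

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("EmbeddedThriftServer")
  .enableHiveSupport()
  .getOrCreate()

// Start, then serve clients for the lifetime of the application...
val server = HiveThriftServer2.startWithContext(spark.sqlContext)

// ...and shut the server down before stopping Spark itself.
sys.addShutdownHook {
  server.stop()
  spark.stop()
}
```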
[Server Management](./server-management.md)

### CLI Operations

Command-line interface for interactive SQL query execution and batch processing.

```scala { .api }
object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
  def isRemoteMode(state: CliSessionState): Boolean
}

class SparkSQLCLIDriver extends CliDriver {
  def setHiveVariables(hiveVariables: java.util.Map[String, String]): Unit
  def printMasterAndAppId(): Unit
  def processCmd(cmd: String): Int
  def processLine(line: String, allowInterrupting: Boolean): Int
}
```
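The CLI accepts the standard Hive CLI batch flags in addition to interactive use: `-e` runs a single statement and exits, and `-f` runs a script file. A minimal sketch:

```scala
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

// Batch mode: execute one statement and exit (Hive CLI "-e" flag)
SparkSQLCLIDriver.main(Array("-e", "SELECT 1"))

// Script mode: execute all statements in a file (Hive CLI "-f" flag)
// SparkSQLCLIDriver.main(Array("-f", "/path/to/queries.sql"))
```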
[CLI Operations](./cli-operations.md)

### Session Management

Multi-client session handling with isolated contexts and configuration management.

```scala { .api }
class SparkSQLSessionManager(
    hiveServer: HiveServer2,
    sqlContext: SQLContext
  ) extends SessionManager {
  def init(hiveConf: HiveConf): Unit
  def openSession(
      protocol: ThriftserverShimUtils.TProtocolVersion,
      username: String,
      passwd: String,
      ipAddress: String,
      sessionConf: java.util.Map[String, String],
      withImpersonation: Boolean,
      delegationToken: String
    ): SessionHandle
  def closeSession(sessionHandle: SessionHandle): Unit
  def setConfMap(conf: SQLContext, confMap: java.util.Map[String, String]): Unit
}
```
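By default each connection gets its own isolated session state (temporary views, current database, `SET` values). If all clients should share one state instead, Spark provides a server-level setting that must be applied at startup; a sketch:

```scala
import org.apache.spark.sql.SparkSession

// All JDBC/ODBC connections share a single session (and thus temp views)
val spark = SparkSession.builder()
  .appName("SharedSessionServer")
  .config("spark.sql.hive.thriftServer.singleSession", "true")
  .enableHiveSupport()
  .getOrCreate()
```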
[Session Management](./session-management.md)

### SQL Operations

SQL statement execution and metadata operations for database introspection.

```scala { .api }
class SparkSQLOperationManager extends OperationManager {
  val handleToOperation: JMap[OperationHandle, Operation]
  val sessionToContexts: ConcurrentHashMap[SessionHandle, SQLContext]

  def newExecuteStatementOperation(
      parentSession: HiveSession,
      statement: String,
      confOverlay: JMap[String, String],
      async: Boolean
    ): ExecuteStatementOperation
}

class SparkExecuteStatementOperation(
    sqlContext: SQLContext,
    parentSession: HiveSession,
    statement: String,
    confOverlay: JMap[String, String],
    runInBackground: Boolean
  ) extends ExecuteStatementOperation
```
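A hedged sketch of how the pieces above fit together when a statement arrives: the operation manager builds an execute-statement operation for the owning session, optionally running it on a background thread (`operationManager` and `parentSession` are assumed to be in scope here):

```scala
// Assumed in scope: operationManager: SparkSQLOperationManager,
//                   parentSession: HiveSession
val confOverlay = new java.util.HashMap[String, String]()

// async = true runs the statement on a background thread so the client
// can poll status and fetch results incrementally
val op = operationManager.newExecuteStatementOperation(
  parentSession, "SELECT * FROM src LIMIT 10", confOverlay, async = true)
op.run()  // Operation.run() comes from the Hive service API
```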
[SQL Operations](./sql-operations.md)

### Metadata Operations

Database metadata introspection including catalogs, schemas, tables, columns, and functions.

```scala { .api }
class SparkGetCatalogsOperation(
    sqlContext: SQLContext,
    parentSession: HiveSession
  ) extends GetCatalogsOperation

class SparkGetSchemasOperation(
    sqlContext: SQLContext,
    parentSession: HiveSession,
    catalogName: String,
    schemaName: String
  ) extends GetSchemasOperation

class SparkGetTablesOperation(
    sqlContext: SQLContext,
    parentSession: HiveSession,
    catalogName: String,
    schemaName: String,
    tableName: String,
    tableTypes: JList[String]
  ) extends MetadataOperation
```
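On the client side these operations back the standard JDBC `DatabaseMetaData` calls. For example, listing tables in the `default` database (a sketch; connection URL and credentials are placeholders as before):

```scala
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "user", "")

// getTables is served by SparkGetTablesOperation on the server side
val tables = conn.getMetaData.getTables(null, "default", "%", null)
while (tables.next()) {
  println(tables.getString("TABLE_NAME"))
}
conn.close()
```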
[Metadata Operations](./metadata-operations.md)

### Web UI Integration

Built-in web interface for monitoring active sessions, query execution, and server metrics.

```scala { .api }
class ThriftServerTab(
    store: HiveThriftServer2AppStatusStore,
    sparkUI: SparkUI
  ) extends SparkUITab {
  val name: String
  def detach(): Unit
}

object ThriftServerTab {
  def getSparkUI(sparkContext: SparkContext): SparkUI
}

class HiveThriftServer2Listener extends SparkListener
class HiveThriftServer2EventManager
class HiveThriftServer2AppStatusStore
```
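While the server is running, this tab appears in the driver's Spark UI (typically `http://<driver-host>:4040`) as a "JDBC/ODBC Server" page listing active sessions and recently executed SQL statements.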
[Web UI Integration](./web-ui.md)

## Environment Management

```scala { .api }
object SparkSQLEnv {
  var sqlContext: SQLContext
  var sparkContext: SparkContext

  def init(): Unit
  def stop(): Unit
}
```
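`SparkSQLEnv` is the shared environment behind the CLI and server entry points. A minimal sketch of its lifecycle (it is an internal helper, so prefer `SparkSession` in application code):

```scala
// Initialize the shared SparkContext/SQLContext pair...
SparkSQLEnv.init()

// ...use it to run a statement...
SparkSQLEnv.sqlContext.sql("SELECT 1").show()

// ...and tear everything down when done
SparkSQLEnv.stop()
```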
## Service Layer

```scala { .api }
class SparkSQLCLIService(
    hiveServer: HiveServer2,
    sqlContext: SQLContext
  ) extends CLIService {
  def init(hiveConf: HiveConf): Unit
  def getInfo(sessionHandle: SessionHandle, getInfoType: GetInfoType): GetInfoValue
}
```
## Utility Components

```scala { .api }
// Base trait for Spark operations with session management
trait SparkOperation extends Operation with Logging {
  protected def sqlContext: SQLContext
  protected var statementId: String
  protected def cleanup(): Unit

  def withLocalProperties[T](f: => T): T
  def tableTypeString(tableType: CatalogTableType): String
}

// Reflection utilities for internal server operations
object ReflectionUtils {
  def setSuperField(obj: Object, fieldName: String, fieldValue: Object): Unit
  def setAncestorField(obj: AnyRef, level: Int, fieldName: String, fieldValue: AnyRef): Unit
  def getSuperField[T](obj: AnyRef, fieldName: String): T
  def getAncestorField[T](clazz: Object, level: Int, fieldName: String): T
  def invokeStatic(clazz: Class[_], methodName: String, args: (Class[_], AnyRef)*): AnyRef
  def invoke(clazz: Class[_], obj: AnyRef, methodName: String, args: (Class[_], AnyRef)*): AnyRef
}
```
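A self-contained toy showing the `getSuperField`/`setSuperField` semantics (the server uses these internally to reach private state in Hive's service classes; the `Base`/`Derived` names here are purely illustrative):

```scala
import org.apache.spark.sql.hive.thriftserver.ReflectionUtils

class Base { private var greeting: String = "hello" }
class Derived extends Base

val d = new Derived

// Read a private field declared on the superclass of d's class
val before = ReflectionUtils.getSuperField[String](d, "greeting")  // "hello"

// Overwrite it, then read it back
ReflectionUtils.setSuperField(d, "greeting", "hi")
val after = ReflectionUtils.getSuperField[String](d, "greeting")   // "hi"
```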
## Common Types

```scala { .api }
// From Hive libraries
import org.apache.hive.service.cli.SessionHandle
import org.apache.hive.service.cli.OperationHandle
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hive.service.cli.session.HiveSession

// Execution states
object ExecutionState extends Enumeration {
  val STARTED, COMPILED, CANCELED, FAILED, FINISHED, CLOSED = Value
  type ExecutionState = Value
}

// Authentication and transport
import org.apache.hive.service.cli.thrift.ThriftBinaryCLIService
import org.apache.hive.service.cli.thrift.ThriftHttpCLIService
```