# Apache Spark Hive Thrift Server

Apache Spark Hive Thrift Server provides JDBC and ODBC access to Apache Spark SQL through a distributed SQL engine compatible with the HiveServer2 protocol. It lets external applications connect to Spark clusters and execute SQL queries over standard database connectivity protocols, while leveraging Spark's distributed execution for large-scale workloads.
## Package Information

- **Package Name**: spark-hive-thriftserver_2.12
- **Package Type**: Maven
- **Language**: Scala (with Java support)
- **Group ID**: org.apache.spark
- **Version**: 2.4.8
- **Installation**: Maven dependency or part of the Spark distribution
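
The coordinates above translate to a standard Maven dependency declaration; a minimal `pom.xml` fragment would look like:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive-thriftserver_2.12</artifactId>
  <version>2.4.8</version>
</dependency>
```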
## Core Imports

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.SQLContext
```

For Java usage:

```java
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2;
import org.apache.spark.sql.SQLContext;
```
## Basic Usage

### Starting the Thrift Server Programmatically

```scala
import org.apache.spark.sql.hive.thriftserver.{HiveThriftServer2, SparkSQLEnv}

// Initialize the singleton Spark and SQL contexts
SparkSQLEnv.init()

// Start the Thrift server with the current SQL context
HiveThriftServer2.startWithContext(SparkSQLEnv.sqlContext)
```

### Running as a Standalone Server

```bash
# Start the Thrift server using its main class
spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
  spark-hive-thriftserver_2.12-2.4.8.jar
```

### CLI Access

```bash
# Start the Spark SQL CLI
spark-submit --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver \
  spark-hive-thriftserver_2.12-2.4.8.jar
```
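
Once the server is running, any HiveServer2-compatible client can connect over JDBC. For example, using the Beeline client shipped with Spark and Hive, assuming the server is listening on the default HiveServer2 port of 10000:

```bash
# Connect to the Thrift server over JDBC with Beeline
beeline -u jdbc:hive2://localhost:10000
```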
## Architecture

The Spark Hive Thrift Server is built around several key components:

- **Server Core**: `HiveThriftServer2` provides the main server functionality and lifecycle management
- **Environment Management**: `SparkSQLEnv` manages the Spark and SQL contexts as singletons
- **CLI Interface**: `SparkSQLCLIDriver` provides a command-line interface similar to the Hive CLI
- **Session Management**: Handles client sessions with authentication and authorization
- **Query Execution**: Executes SQL statements through Spark SQL with result streaming
- **Thrift Protocol**: Implements the HiveServer2 protocol for JDBC/ODBC compatibility
- **Web UI Integration**: Provides monitoring and management through Spark's web interface
## Capabilities

### Server Management

Core server lifecycle management, including startup, configuration, and shutdown operations for the Thrift server.

```scala { .api }
object HiveThriftServer2 {
  def startWithContext(sqlContext: SQLContext): Unit
  def main(args: Array[String]): Unit
  var uiTab: Option[ThriftServerTab]
  var listener: HiveThriftServer2Listener
}
```

[Server Management](./server-management.md)
### Environment Management

Singleton environment management for the Spark and SQL contexts, providing initialization and cleanup operations.

```scala { .api }
object SparkSQLEnv {
  var sqlContext: SQLContext
  var sparkContext: SparkContext
  def init(): Unit
  def stop(): Unit
}
```

[Environment Management](./environment-management.md)
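
The singleton lifecycle that `SparkSQLEnv` exposes can be sketched in plain Java: `init()` creates the shared context only if it does not already exist, and `stop()` releases it. The `Env` class below and its `String` stand-in for the context are illustrative, not Spark's API.

```java
// Illustrative sketch (hypothetical Env class): mirrors the idempotent
// init()/stop() lifecycle that SparkSQLEnv applies to its singleton contexts.
public class Env {
    static String sqlContext = null; // stand-in for the shared SQLContext

    static void init() {
        if (sqlContext == null) {    // calls after initialization are no-ops
            sqlContext = "sqlContext";
        }
    }

    static void stop() {
        sqlContext = null;           // release the singleton on shutdown
    }
}
```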
### Service Management

Core service-layer components providing the CLI service implementation and composite service functionality.

```scala { .api }
class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLContext)
  extends CLIService(hiveServer) with ReflectedCompositeService {
  def init(hiveConf: HiveConf): Unit
  def getInfo(sessionHandle: SessionHandle, getInfoType: GetInfoType): GetInfoValue
}

trait ReflectedCompositeService {
  def initCompositeService(hiveConf: HiveConf): Unit
}
```

[Service Management](./service-management.md)
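
`ReflectedCompositeService` is a reflection-based workaround: the Hive service classes keep the state it needs in private superclass members, so initialization reaches them through the reflection API. The technique itself can be shown with plain Java reflection; the `Base` and `ReflectiveInit` classes below are hypothetical stand-ins, not the Hive types.

```java
// Illustrative reflection sketch: accessing a private field declared on a
// superclass, the same kind of trick ReflectedCompositeService relies on to
// initialize private Hive service state. All names here are hypothetical.
import java.lang.reflect.Field;

class Base {
    private String state = "uninitialized";
}

class ReflectiveInit extends Base {
    void initState(String value) {
        try {
            Field f = Base.class.getDeclaredField("state");
            f.setAccessible(true);      // bypass the private modifier
            f.set(this, value);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    String readState() {
        try {
            Field f = Base.class.getDeclaredField("state");
            f.setAccessible(true);
            return (String) f.get(this);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```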
### CLI Operations

Command-line interface functionality providing interactive SQL execution and Hive-compatible CLI operations.

```scala { .api }
object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
  def isRemoteMode(state: CliSessionState): Boolean
}

class SparkSQLCLIDriver extends CliDriver {
  def setHiveVariables(hiveVariables: java.util.Map[String, String]): Unit
  def printMasterAndAppId(): Unit
  def processCmd(cmd: String): Int
}
```

[CLI Operations](./cli-operations.md)
### SQL Execution

SQL statement execution engine with result processing, schema management, and query lifecycle handling.

```scala { .api }
class SparkSQLDriver(context: SQLContext = SparkSQLEnv.sqlContext) extends Driver {
  def init(): Unit
  def run(command: String): CommandProcessorResponse
  def close(): Int
  def getResults(res: JList[_]): Boolean
  def getSchema: Schema
  def destroy(): Unit
}
```

[SQL Execution](./sql-execution.md)
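
The `run`/`getResults`/`close` trio follows the Hive `Driver` contract: `run` executes a statement, `getResults` moves rows into a caller-supplied list and reports whether anything was fetched, and `close` releases result state. A minimal sketch of that shape in plain Java; the `PagingDriver` class is hypothetical, and splitting a string into tokens stands in for executing SQL.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical driver mirroring the run/getResults/close lifecycle.
class PagingDriver {
    private final Deque<String> pending = new ArrayDeque<>();

    // Stand-in for executing SQL: the "rows" are just the command's tokens.
    void run(String command) {
        pending.addAll(Arrays.asList(command.split("\\s+")));
    }

    // Moves up to `batch` pending rows into `out`; returns false when
    // there was nothing left to fetch.
    boolean getResults(List<String> out, int batch) {
        if (pending.isEmpty()) {
            return false;
        }
        for (int i = 0; i < batch && !pending.isEmpty(); i++) {
            out.add(pending.poll());
        }
        return true;
    }

    // Clears unconsumed results; 0 signals success, as in the Hive API.
    int close() {
        pending.clear();
        return 0;
    }
}
```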
### Session Management

Client session lifecycle management with authentication, authorization, and per-session SQL context handling.

```scala { .api }
class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext: SQLContext)
  extends SessionManager {
  def init(hiveConf: HiveConf): Unit
  def openSession(
      protocol: TProtocolVersion,
      username: String,
      passwd: String,
      ipAddress: String,
      sessionConf: java.util.Map[String, String],
      withImpersonation: Boolean,
      delegationToken: String
  ): SessionHandle
  def closeSession(sessionHandle: SessionHandle): Unit
}
```

[Session Management](./session-management.md)
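
The per-session bookkeeping described here amounts to a concurrent map from session handles to contexts, populated on `openSession` and cleared on `closeSession`. A plain-Java sketch of that shape; the `SessionRegistry` class and its `String` stand-ins for handles and contexts are illustrative, not Spark's types.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry mirroring per-session context tracking.
class SessionRegistry {
    final ConcurrentHashMap<String, String> sessionToContext = new ConcurrentHashMap<>();

    // Opening a session allocates a handle and an isolated per-session context.
    String openSession(String username) {
        String handle = UUID.randomUUID().toString(); // stand-in for SessionHandle
        sessionToContext.put(handle, "context-for-" + username);
        return handle;
    }

    // Closing a session releases its context.
    void closeSession(String handle) {
        sessionToContext.remove(handle);
    }
}
```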
### Operation Management

Management of SQL operations, including statement execution, result streaming, and operation lifecycle tracking.

```scala { .api }
class SparkSQLOperationManager extends OperationManager {
  val handleToOperation: JMap[OperationHandle, Operation]
  val sessionToActivePool: ConcurrentHashMap[SessionHandle, String]
  val sessionToContexts: ConcurrentHashMap[SessionHandle, SQLContext]

  def newExecuteStatementOperation(
      parentSession: HiveSession,
      statement: String,
      confOverlay: JMap[String, String],
      async: Boolean
  ): ExecuteStatementOperation
  def setConfMap(conf: SQLConf, confMap: java.util.Map[String, String]): Unit
}
```

[Operation Management](./operation-management.md)
### UI Components

Web UI integration providing a monitoring and management interface for Thrift server operations and sessions.

```scala { .api }
class ThriftServerTab(sparkContext: SparkContext) extends SparkUITab {
  override val name: String
  val parent: SparkUI
  val listener: HiveThriftServer2Listener
  def detach(): Unit
}

class ThriftServerPage(parent: ThriftServerTab) extends WebUIPage
class ThriftServerSessionPage(parent: ThriftServerTab) extends WebUIPage
```

[UI Components](./ui-components.md)
## Types

### Core Server Types

```scala { .api }
// Server lifecycle management
class HiveThriftServer2(sqlContext: SQLContext) extends HiveServer2 with ReflectedCompositeService {
  def init(hiveConf: HiveConf): Unit
  def start(): Unit
  def stop(): Unit
}

// Session information tracking
class SessionInfo(
    sessionId: String,
    startTimestamp: Long,
    ip: String,
    userName: String
) {
  var finishTimestamp: Long
  var totalExecution: Int
  def totalTime: Long
}

// Execution state enumeration
object ExecutionState extends Enumeration {
  val STARTED, COMPILED, FAILED, FINISHED = Value
  type ExecutionState = Value
}

// Execution information tracking
class ExecutionInfo(
    statement: String,
    sessionId: String,
    startTimestamp: Long,
    userName: String
) {
  var finishTimestamp: Long
  var executePlan: String
  var detail: String
  var state: ExecutionState.Value
  val jobId: ArrayBuffer[String]
  var groupId: String
  def totalTime: Long
}
```
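
The `totalTime` accessor on both tracking classes is simply a duration derived from the start and finish timestamps. A Java sketch of the execution-tracking shape; the enum and class below are illustrative mirrors rather than Spark's types, and the fallback to the current time for a still-running statement is an assumption.

```java
// Illustrative mirror of the execution-tracking types above; not Spark's code.
enum ExecutionState { STARTED, COMPILED, FAILED, FINISHED }

class ExecutionInfo {
    final String statement;
    final long startTimestamp;
    long finishTimestamp;                  // 0 while the statement is running
    ExecutionState state = ExecutionState.STARTED;

    ExecutionInfo(String statement, long startTimestamp) {
        this.statement = statement;
        this.startTimestamp = startTimestamp;
    }

    // Duration of the statement; falls back to "now" while it is still
    // running (an assumption about the unfinished case).
    long totalTime() {
        long end = finishTimestamp == 0L ? System.currentTimeMillis() : finishTimestamp;
        return end - startTimestamp;
    }
}
```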