# Spark Hive Thrift Server

Spark Hive Thrift Server provides a Thrift-based JDBC/ODBC interface for Spark SQL, making it compatible with HiveServer2 clients. It enables remote access to Spark SQL through standard database connectivity protocols, allowing users to connect with JDBC drivers and execute SQL queries against Spark datasets and tables.

The server implements the HiveServer2 Thrift interface but uses Spark SQL as the execution engine instead of Hive, providing better performance and broader data source support. It supports concurrent sessions, query execution management, and a web UI for monitoring active connections and queries.

## Package Information

- **Package Name**: spark-hive-thriftserver_2.11
- **Package Type**: Maven
- **Language**: Scala
- **Group ID**: org.apache.spark
- **Version**: 1.6.3
- **Installation**: Add the Maven dependency or use a pre-built Spark distribution

## Core Imports

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIService
import org.apache.spark.sql.hive.thriftserver.ReflectionUtils
import org.apache.spark.sql.hive.HiveContext
```

## Basic Usage

### Server Mode

```scala
// Start the thrift server as a standalone application
object MyThriftServer extends App {
  HiveThriftServer2.main(args)
}

// Or start it programmatically against an existing context
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sparkContext)
HiveThriftServer2.startWithContext(hiveContext)
```

### CLI Mode

```scala
// Start the interactive SQL CLI
object MySQLCLI extends App {
  SparkSQLCLIDriver.main(args)
}
```

### Environment Setup

```scala
// Initialize the shared Spark SQL environment
SparkSQLEnv.init()

// Access the shared contexts
val sparkContext = SparkSQLEnv.sparkContext
val hiveContext = SparkSQLEnv.hiveContext

// Clean shutdown
SparkSQLEnv.stop()
```

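With the server running (port 10000 by default), any HiveServer2-compatible JDBC client can connect. The sketch below only builds the connection URL; `ThriftClientSketch` and `buildJdbcUrl` are illustrative names, and an actual connection additionally requires the Hive JDBC driver on the client classpath:

```scala
// Sketch of a JDBC client for the thrift server. The helper below is
// illustrative; connecting for real needs the Hive JDBC driver
// (org.apache.hive:hive-jdbc), which is not bundled here.
object ThriftClientSketch {
  // HiveServer2-style JDBC URL: jdbc:hive2://<host>:<port>/<database>
  def buildJdbcUrl(host: String, port: Int, database: String = "default"): String =
    s"jdbc:hive2://$host:$port/$database"

  def main(args: Array[String]): Unit = {
    val url = buildJdbcUrl("localhost", 10000)
    println(url)
    // With the driver on the classpath, a connection would look like:
    // val conn = java.sql.DriverManager.getConnection(url, "user", "")
    // val rs = conn.createStatement().executeQuery("SHOW TABLES")
  }
}
```

Because the URL scheme matches HiveServer2, clients such as Beeline can connect the same way.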
## Architecture

The Spark Hive Thrift Server is built around several key components:

- **Server Entry Points**: `HiveThriftServer2` and `SparkSQLCLIDriver` provide the main application entry points for server and CLI modes
- **Environment Management**: `SparkSQLEnv` manages shared Spark and Hive contexts with optimized configurations
- **Session Management**: `SparkSQLSessionManager` handles client session lifecycle and isolation
- **Query Execution**: `SparkExecuteStatementOperation` and `SparkSQLDriver` process SQL statements and manage results
- **Service Layer**: `SparkSQLCLIService` implements the Thrift service interface compatible with HiveServer2
- **Web UI Integration**: monitoring and statistics through Spark's web UI with a dedicated JDBC/ODBC server tab
- **Reflection Utilities**: `ReflectionUtils` provides a compatibility layer for Hive integration

## Capabilities

### Server Management

Core server lifecycle management, including startup, configuration, and shutdown operations.

```scala { .api }
object HiveThriftServer2 {
  def main(args: Array[String]): Unit

  @DeveloperApi
  def startWithContext(sqlContext: HiveContext): Unit

  var LOG: Log
  var uiTab: Option[ThriftServerTab]
  var listener: HiveThriftServer2Listener
}
```

[Server Management](./server-management.md)

### CLI Interface

Interactive command-line interface for executing SQL queries with Hive CLI compatibility.

```scala { .api }
object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}

private[hive] class SparkSQLCLIDriver extends CliDriver {
  override def processCmd(cmd: String): Int
}
```

[CLI Interface](./cli-interface.md)

### Environment Management

Centralized management of Spark and Hive execution contexts with optimized configurations.

```scala { .api }
object SparkSQLEnv {
  var hiveContext: HiveContext
  var sparkContext: SparkContext

  def init(): Unit
  def stop(): Unit
}
```

[Environment Management](./environment-management.md)

### Session Management

Client session lifecycle management with isolation and resource cleanup.

```scala { .api }
private[hive] class SparkSQLSessionManager(
    hiveServer: HiveServer2,
    hiveContext: HiveContext)
  extends SessionManager {
  override def openSession(...): SessionHandle
  override def closeSession(sessionHandle: SessionHandle): Unit
}
```

[Session Management](./session-management.md)

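The session manager's job can be pictured as a small registry: hand out a handle on `openSession`, drop the associated state on `closeSession`. A minimal sketch of that bookkeeping (class and method names are illustrative, not the real `SparkSQLSessionManager` internals):

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Illustrative session registry: open returns a fresh handle, close
// releases it. Real sessions also carry per-session SQL state.
class SessionRegistrySketch {
  private val sessions = new ConcurrentHashMap[String, String]() // handle -> user

  def openSession(user: String): String = {
    val handle = UUID.randomUUID().toString
    sessions.put(handle, user)
    handle
  }

  def closeSession(handle: String): Unit = sessions.remove(handle)

  def onlineSessionCount: Int = sessions.size
}
```

The concurrent map mirrors the fact that sessions are opened and closed from multiple Thrift worker threads.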
### Query Execution

SQL statement execution with result management and schema introspection.

```scala { .api }
private[hive] class SparkExecuteStatementOperation(
    parentSession: HiveSession,
    statement: String,
    confOverlay: JMap[String, String],
    runInBackground: Boolean)
  extends ExecuteStatementOperation {
  def close(): Unit
  def getNextRowSet(order: FetchOrientation, maxRowsL: Long): RowSet
  def getResultSetSchema: TableSchema
  def cancel(): Unit
}

private[hive] class SparkSQLDriver(
    context: HiveContext = SparkSQLEnv.hiveContext)
  extends Driver {
  def init(): Unit
  def run(command: String): CommandProcessorResponse
  def close(): Int
  def getResults(res: JList[_]): Boolean
  def getSchema: Schema
  def destroy(): Unit
}
```

[Query Execution](./query-execution.md)

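The `getNextRowSet` contract is easiest to see in miniature: `FETCH_NEXT` returns the next batch of up to `maxRows` rows, while `FETCH_FIRST` rewinds to the start of the buffered result. A hypothetical sketch of that cursor logic (the types here are stand-ins, not the Thrift `FetchOrientation`):

```scala
// Sketch of fetch-orientation semantics: FETCH_NEXT streams the next batch,
// FETCH_FIRST rewinds to the start of the buffered result.
object FetchSketch {
  sealed trait Orientation
  case object FetchNext extends Orientation
  case object FetchFirst extends Orientation

  class BufferedResult[A](rows: Seq[A]) {
    private var cursor = 0

    def next(order: Orientation, maxRows: Int): Seq[A] = {
      if (order == FetchFirst) cursor = 0          // rewind on FETCH_FIRST
      val batch = rows.slice(cursor, cursor + maxRows)
      cursor += batch.size
      batch
    }
  }
}
```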
### Monitoring and UI

Web-based monitoring interface with session tracking and query statistics.

```scala { .api }
private[thriftserver] class HiveThriftServer2Listener(
    server: HiveServer2,
    conf: SQLConf)
  extends SparkListener {
  def getOnlineSessionNum: Int
  def getTotalRunning: Int
  def getSessionList: Seq[SessionInfo]
  def getSession(sessionId: String): Option[SessionInfo]
  def getExecutionList: Seq[ExecutionInfo]
}

private[thriftserver] class ThriftServerTab(
    sparkContext: SparkContext)
  extends SparkUITab {
  def detach(): Unit
}
```

[Monitoring and UI](./monitoring-ui.md)

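The listener's statistics boil down to bookkeeping over `SessionInfo`-like records, where a missing finish timestamp means the session is still online. A hedged sketch of that accounting (the tracker class itself is hypothetical):

```scala
import scala.collection.mutable

// Illustrative session tracker: a finish timestamp of 0 marks a session
// as still online, so the online count is just a filter over the records.
class ListenerSketch {
  private case class Session(start: Long, var finish: Long = 0L)
  private val sessions = mutable.LinkedHashMap[String, Session]()

  def onSessionCreated(id: String, now: Long): Unit =
    sessions(id) = Session(now)

  def onSessionClosed(id: String, now: Long): Unit =
    sessions.get(id).foreach(_.finish = now)

  def getOnlineSessionNum: Int = sessions.values.count(_.finish == 0L)
}
```

A `LinkedHashMap` keeps insertion order, which is what a UI listing sessions chronologically would want.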
### Service Layer Integration

Core Thrift service implementation providing the HiveServer2 compatibility layer.

```scala { .api }
private[hive] class SparkSQLCLIService(
    hiveServer: HiveServer2,
    hiveContext: HiveContext)
  extends CLIService(hiveServer) {
  override def init(hiveConf: HiveConf): Unit
  override def start(): Unit
  override def stop(): Unit
}
```

### Reflection Utilities

Utility methods for accessing private fields and methods of Hive classes for compatibility.

```scala { .api }
private[hive] object ReflectionUtils {
  def setSuperField(obj: Object, fieldName: String, fieldValue: Object): Unit
  def setAncestorField(obj: AnyRef, level: Int, fieldName: String, fieldValue: AnyRef): Unit
  def getSuperField[T](obj: AnyRef, fieldName: String): T
  def getAncestorField[T](clazz: Object, level: Int, fieldName: String): T
  def invokeStatic(clazz: Class[_], methodName: String, args: (Class[_], AnyRef)*): AnyRef
  def invoke(clazz: Class[_], obj: AnyRef, methodName: String, args: (Class[_], AnyRef)*): AnyRef
}
```

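What `getSuperField` does can be sketched with plain `java.lang.reflect`: walk to the superclass, open up the private field, and read it. This illustrates the technique under simple assumptions; it is not the actual `ReflectionUtils` implementation:

```scala
// Minimal reflection sketch: read a private field declared on the
// superclass of the given object. Base/Derived are toy classes for the demo.
object ReflectionSketch {
  class Base {
    private val secret: String = "hive"
  }
  class Derived extends Base

  def getSuperField[T](obj: AnyRef, fieldName: String): T = {
    val field = obj.getClass.getSuperclass.getDeclaredField(fieldName)
    field.setAccessible(true)   // bypass the private modifier
    field.get(obj).asInstanceOf[T]
  }
}
```

This kind of access is fragile by nature (field names are not a stable API), which is why the real utilities are kept private to the package.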
## Configuration

### Spark Configuration Properties

- `spark.app.name` - Application name (default: "SparkSQL::{hostname}")
- `spark.serializer` - Serializer class (default: KryoSerializer)
- `spark.kryo.referenceTracking` - Kryo reference tracking (default: false)
- `spark.ui.enabled` - Enable the Spark web UI (default: true)

### Hive Server Configuration Properties

- `hive.server2.transport.mode` - Transport mode ("binary" or "http")
- `hive.server2.async.exec.threads` - Background execution thread pool size
- `hive.server2.logging.operation.enabled` - Enable operation logging

### SQL Configuration Properties

- `SQLConf.THRIFTSERVER_POOL.key` - Scheduler pool for query execution
- `SQLConf.THRIFTSERVER_UI_STATEMENT_LIMIT` - Maximum statements retained in the UI
- `SQLConf.THRIFTSERVER_UI_SESSION_LIMIT` - Maximum sessions retained in the UI

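In a pre-built Spark distribution, these properties are typically passed at launch time through the bundled scripts; the invocation below assumes the stock `sbin`/`bin` layout:

```shell
# Launch the thrift server, overriding Hive server properties with --hiveconf
./sbin/start-thriftserver.sh \
  --master local[*] \
  --hiveconf hive.server2.thrift.port=10000 \
  --hiveconf hive.server2.transport.mode=binary

# Connect with the bundled Beeline client
./bin/beeline -u jdbc:hive2://localhost:10000
```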
## Common Types

```scala { .api }
private[thriftserver] class SessionInfo(
    val sessionId: String,
    val startTimestamp: Long,
    val ip: String,
    val userName: String) {
  var finishTimestamp: Long
  var totalExecution: Int
  def totalTime: Long
}

private[thriftserver] class ExecutionInfo(
    val statement: String,
    val sessionId: String,
    val startTimestamp: Long,
    val userName: String) {
  var finishTimestamp: Long
  var executePlan: String
  var detail: String
  var state: ExecutionState.Value
  val jobId: ArrayBuffer[String]
  var groupId: String
  def totalTime: Long
}

private[thriftserver] object ExecutionState extends Enumeration {
  val STARTED, COMPILED, FAILED, FINISHED = Value
  type ExecutionState = Value
}
```
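The `totalTime` accessors on both types follow the same convention: a `finishTimestamp` of 0 means the session or statement is still running, so elapsed time is measured against the current clock instead. A small sketch of that logic (the standalone helper is illustrative):

```scala
// Sketch of the totalTime convention: finishTimestamp == 0 means
// "still running", so fall back to the current clock.
object TimingSketch {
  def totalTime(startTimestamp: Long,
                finishTimestamp: Long,
                now: Long = System.currentTimeMillis()): Long =
    if (finishTimestamp == 0L) now - startTimestamp
    else finishTimestamp - startTimestamp
}
```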