Spark Project Hive Thrift Server - A Thrift server implementation that provides JDBC/ODBC access to Spark SQL
npx @tessl/cli install tessl/maven-org-apache-spark--spark-hive-thriftserver_2-11@2.4.00
# Spark Hive Thrift Server
Apache Spark Hive Thrift Server provides JDBC/ODBC access to Spark SQL through the HiveServer2 protocol, enabling remote clients to execute SQL queries against Spark clusters using standard database connectivity tools and BI applications.

## Package Information

- **Package Name**: spark-hive-thriftserver_2.11
- **Package Type**: maven
- **Language**: Scala
- **Artifact ID**: org.apache.spark:spark-hive-thriftserver_2.11:2.4.8
- **Installation**: Include as a Maven dependency or as part of a Spark distribution
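Using the Maven coordinates above, the dependency can be declared in a `pom.xml` like the following sketch (adjust the version to match your Spark distribution):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive-thriftserver_2.11</artifactId>
  <version>2.4.8</version>
</dependency>
```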
## Core Imports
```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.SQLContext
```
## Basic Usage
### Starting the Thrift Server Programmatically

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

// Create a Spark SQL context
val conf = new SparkConf().setAppName("ThriftServer")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Start the Thrift Server against this context
HiveThriftServer2.startWithContext(sqlContext)
```
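In Spark 2.x, applications more commonly build a `SparkSession`; its underlying `SQLContext` satisfies the same `startWithContext` signature. A minimal sketch, assuming a Hive-enabled Spark build:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Build a Hive-enabled session instead of constructing SQLContext directly
val spark = SparkSession.builder()
  .appName("ThriftServer")
  .enableHiveSupport()
  .getOrCreate()

// startWithContext accepts the session's SQLContext
HiveThriftServer2.startWithContext(spark.sqlContext)
```

Queries sent by remote JDBC/ODBC clients then run against this session's catalog and temporary views.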
### Starting from Command Line

```bash
# Start Thrift Server
$SPARK_HOME/sbin/start-thriftserver.sh --master spark://master:7077

# Start CLI
$SPARK_HOME/bin/spark-sql
```
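Once the server is listening (default port 10000), any HiveServer2-compatible client can connect. The following sketch uses plain JDBC with the Hive JDBC driver (`hive-jdbc`) on the classpath; the URL, user, and query are illustrative:

```scala
import java.sql.DriverManager

// Register the HiveServer2 JDBC driver (provided by the hive-jdbc artifact)
Class.forName("org.apache.hive.jdbc.HiveDriver")

// Connect to the Thrift Server on its default binary-mode port
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
val stmt = conn.createStatement()

// Execute a query through Spark SQL and iterate the result set
val rs = stmt.executeQuery("SHOW TABLES")
while (rs.next()) {
  println(rs.getString(1))
}

rs.close(); stmt.close(); conn.close()
```

The bundled `$SPARK_HOME/bin/beeline` tool accepts the same `jdbc:hive2://` URL for interactive use.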
## Architecture
The Spark Hive Thrift Server consists of several key components:

- **HiveThriftServer2**: Main server entry point and lifecycle management
- **Service Layer**: CLI service, session management, and operation handling
- **Transport Layer**: HTTP and binary Thrift protocol support
- **Web UI**: Monitoring interface for sessions and queries
- **Authentication**: Kerberos and delegation token support
## Capabilities
### Server Management

Main entry points for starting and managing the Thrift Server with lifecycle control and configuration.

```scala { .api }
object HiveThriftServer2 {
  def startWithContext(sqlContext: SQLContext): Unit
  def main(args: Array[String]): Unit
  var uiTab: Option[ThriftServerTab]
  var listener: HiveThriftServer2Listener
}
```

[Server Management](./server-management.md)
### CLI Interface

Command-line interface for interactive SQL execution with Spark SQL integration.

```scala { .api }
object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}
```

[CLI Interface](./cli-interface.md)
### Session Management

Session lifecycle management with SQL context handling and client connection management.

```scala { .api }
class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext: SQLContext) extends SessionManager {
  def openSession(protocol: TProtocolVersion, username: String, passwd: String,
      ipAddress: String, sessionConf: java.util.Map[String, String],
      withImpersonation: Boolean, delegationToken: String): SessionHandle
  def closeSession(sessionHandle: SessionHandle): Unit
}
```

[Session Management](./session-management.md)
### Query Operations

SQL query execution operations with result handling and asynchronous processing support.

```scala { .api }
class SparkSQLOperationManager extends OperationManager {
  val sessionToActivePool: ConcurrentHashMap[SessionHandle, String]
  val sessionToContexts: ConcurrentHashMap[SessionHandle, SQLContext]
  def newExecuteStatementOperation(parentSession: HiveSession, statement: String,
      confOverlay: JMap[String, String], async: Boolean): ExecuteStatementOperation
}
```

[Query Operations](./query-operations.md)
### Web UI Monitoring

Web-based monitoring interface for active sessions, query execution, and server performance metrics.

```scala { .api }
class ThriftServerTab(sparkContext: SparkContext) extends SparkUITab {
  val name: String = "JDBC/ODBC Server"
  def detach(): Unit
}
```

[Web UI Monitoring](./web-ui-monitoring.md)
### Environment Management

Spark SQL environment initialization and cleanup with configuration management.

```scala { .api }
object SparkSQLEnv {
  var sqlContext: SQLContext
  var sparkContext: SparkContext
  def init(): Unit
  def stop(): Unit
}
```

[Environment Management](./environment-management.md)
## Types
### Core Types

```scala { .api }
// Session information tracking
class SessionInfo(sessionId: String, startTimestamp: Long, ip: String, userName: String) {
  var finishTimestamp: Long
  var totalExecution: Int
  def totalTime: Long
}

// Query execution tracking
class ExecutionInfo(statement: String, sessionId: String, startTimestamp: Long, userName: String) {
  var finishTimestamp: Long
  var executePlan: String
  var detail: String
  var state: ExecutionState.Value
  val jobId: ArrayBuffer[String]
  var groupId: String
  def totalTime: Long
}

// Execution states
object ExecutionState extends Enumeration {
  val STARTED, COMPILED, FAILED, FINISHED = Value
  type ExecutionState = Value
}

// Server listener for events
class HiveThriftServer2Listener(server: HiveServer2, conf: SQLConf) extends SparkListener {
  def getOnlineSessionNum: Int
  def getTotalRunning: Int
  def getSessionList: Seq[SessionInfo]
  def getSession(sessionId: String): Option[SessionInfo]
  def getExecutionList: Seq[ExecutionInfo]
}
```
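Once the server has been started in-process, the `HiveThriftServer2.listener` field exposes these counters for lightweight monitoring. A sketch, assuming the server was started with `startWithContext` in the same JVM:

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Poll the listener that startWithContext installs
val listener = HiveThriftServer2.listener
println(s"Online sessions: ${listener.getOnlineSessionNum}")
println(s"Running statements: ${listener.getTotalRunning}")

// Inspect per-session details
listener.getSessionList.foreach { s =>
  println(s"Session ${s.sessionId} from ${s.ip}, total time ${s.totalTime} ms")
}
```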
### Hive Integration Types

```scala { .api }
// From the Hive Service API
import org.apache.hive.service.cli.SessionHandle
import org.apache.hive.service.cli.OperationHandle
import org.apache.hive.service.cli.thrift.TProtocolVersion
import org.apache.hive.service.server.HiveServer2
import org.apache.hadoop.hive.conf.HiveConf
```
## Configuration
### Transport Modes

- **Binary**: Default TCP transport using the Thrift binary protocol
- **HTTP**: HTTP-based transport for firewall-friendly connections

### Authentication

- **Kerberos**: Enterprise authentication with keytab support
- **SPNEGO**: HTTP authentication for web-based access
- **Delegation Tokens**: Secure token-based authentication

### Key Configuration Properties

- `hive.server2.transport.mode`: "binary" or "http"
- `hive.server2.thrift.port`: Server port (default: 10000)
- `hive.server2.thrift.bind.host`: Bind address
- `spark.sql.hive.thriftServer.singleSession`: Share a single session (temporary views, SQLConf, current database) across all connections
- `spark.sql.thriftServer.incrementalCollect`: Collect query results incrementally to reduce driver memory pressure
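
These properties can be passed at startup via `--hiveconf` (Hive properties) and `--conf` (Spark properties); the port and host values below are illustrative:

```bash
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master spark://master:7077 \
  --hiveconf hive.server2.transport.mode=binary \
  --hiveconf hive.server2.thrift.port=10001 \
  --hiveconf hive.server2.thrift.bind.host=0.0.0.0 \
  --conf spark.sql.hive.thriftServer.singleSession=true
```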