Apache Spark Hive Thrift Server provides HiveServer2 compatibility for Spark SQL, enabling JDBC/ODBC connectivity and Hive CLI compatibility for Spark SQL queries
npx @tessl/cli install tessl/maven-org-apache-spark--spark-hive-thriftserver_2-12@3.5.6
# Spark Hive Thrift Server

Apache Spark Hive Thrift Server provides HiveServer2 compatibility for Spark SQL, enabling JDBC/ODBC connectivity and Hive CLI compatibility for Spark SQL queries. It offers a complete thrift-based server implementation with session management, authentication, and comprehensive metadata operations.

## Package Information

- **Package Name**: spark-hive-thriftserver_2.12
- **Package Type**: maven
- **Language**: Scala/Java
- **GroupId**: org.apache.spark
- **ArtifactId**: spark-hive-thriftserver_2.12
- **Version**: 3.5.6
- **Installation**: Add to Maven POM or use with Spark distribution
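Using the coordinates above, the dependency can be declared in a Maven POM as follows (in practice the thrift server more often ships as part of a Spark distribution rather than as a direct dependency):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive-thriftserver_2.12</artifactId>
  <version>3.5.6</version>
</dependency>
```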
## Core Imports

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIService
import org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager
import org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager
import org.apache.spark.sql.SQLContext
```

For Java usage (requires Hive dependencies):

```java
import org.apache.hive.service.cli.ICLIService;
import org.apache.hive.service.cli.SessionHandle;
import org.apache.hive.service.cli.OperationHandle;
```

**Note**: Many interfaces (`ICLIService`, `SessionHandle`, etc.) are provided by the Apache Hive library, which is included as a dependency of this module.
## Basic Usage

### Starting the Thrift Server

```scala
import org.apache.spark.sql.hive.thriftserver.{HiveThriftServer2, SparkSQLEnv}

// Initialize the Spark SQL environment (SparkContext and SQLContext)
SparkSQLEnv.init()

// Start the thrift server with the SQL context
val server = HiveThriftServer2.startWithContext(SparkSQLEnv.sqlContext)
```
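In a binary Spark distribution, the same server is more commonly launched with the bundled start script rather than programmatically. A typical invocation (paths relative to the Spark installation root; host and port values are placeholders):

```bash
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=10000 \
  --hiveconf hive.server2.thrift.bind.host=localhost \
  --master local[*]
```

The script accepts the same `--master` option as `spark-submit` and HiveServer2-style `--hiveconf` properties; `./sbin/stop-thriftserver.sh` stops the server.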
### Using the CLI Driver

```scala
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

// Start the interactive SQL CLI
SparkSQLCLIDriver.main(Array("--hiveconf", "hive.server2.thrift.port=10000"))
```
### JDBC Connection (from client applications)

```java
import java.sql.*;

// Standard JDBC connection to the Spark Thrift Server
// (requires the Hive JDBC driver on the client classpath)
String url = "jdbc:hive2://localhost:10000/default";
Connection conn = DriverManager.getConnection(url, "username", "password");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM my_table");
```
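The server can also be exercised from a shell with Beeline, the thin JDBC command-line client shipped with Spark and Hive (the user name and query here are placeholders):

```bash
./bin/beeline -u "jdbc:hive2://localhost:10000/default" -n username -e "SHOW TABLES"
```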
## Architecture

The Spark Hive Thrift Server is built around several key components:

- **Server Management**: `HiveThriftServer2` provides the main server lifecycle and initialization
- **CLI Services**: `SparkSQLCLIService` implements the core CLI service interface with Spark SQL integration
- **Session Management**: `SparkSQLSessionManager` handles client sessions and their associated SQL contexts
- **Operation Management**: `SparkSQLOperationManager` creates and manages SQL operations and metadata operations
- **SQL Execution**: `SparkSQLDriver` and `SparkExecuteStatementOperation` execute SQL queries using the Spark SQL engine
- **CLI Interface**: `SparkSQLCLIDriver` provides an interactive command-line interface
- **Web UI**: Integration with the Spark Web UI for monitoring sessions and queries
- **Transport Protocols**: Support for both binary Thrift and HTTP transport modes
- **Authentication**: Kerberos, SPNEGO, and custom authentication provider support
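On the client side, the choice of transport mode shows up only in the JDBC URL. A small sketch of the URL convention: the `transportMode=http;httpPath=...` suffix follows the standard HiveServer2 JDBC URL format, `cliservice` is the customary default HTTP path, and the hosts and ports below are placeholders.

```java
public class ThriftServerUrls {
    // Build a HiveServer2-style JDBC URL for binary or HTTP transport mode.
    static String jdbcUrl(String host, int port, String db, boolean http, String httpPath) {
        String base = "jdbc:hive2://" + host + ":" + port + "/" + db;
        return http ? base + ";transportMode=http;httpPath=" + httpPath : base;
    }

    public static void main(String[] args) {
        // Binary thrift transport (the default)
        System.out.println(jdbcUrl("localhost", 10000, "default", false, null));
        // HTTP transport, e.g. when routing through load balancers or proxies
        System.out.println(jdbcUrl("localhost", 10001, "default", true, "cliservice"));
    }
}
```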
## Capabilities

### Server Management

Core server lifecycle management and initialization with Spark SQL integration.

```scala { .api }
object HiveThriftServer2 {
  def startWithContext(sqlContext: SQLContext): HiveThriftServer2
  def main(args: Array[String]): Unit

  // Note: ExecutionState is private[thriftserver] - not part of the public API
  private[thriftserver] object ExecutionState extends Enumeration {
    val STARTED, COMPILED, CANCELED, TIMEDOUT, FAILED, FINISHED, CLOSED = Value
  }
}
```

[Server Management](./server-management.md)
### CLI Services

Comprehensive CLI service implementation providing HiveServer2 compatibility with Spark SQL enhancements.

```scala { .api }
class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLContext) extends CLIService(hiveServer) {
  override def init(hiveConf: HiveConf): Unit
  override def start(): Unit
  override def getInfo(sessionHandle: SessionHandle, getInfoType: GetInfoType): GetInfoValue
}
```

[CLI Services](./cli-services.md)
### Session Management

Client session management with SQL context association and configuration handling.

```scala { .api }
class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext: SQLContext) extends SessionManager(hiveServer) {
  override def openSession(
      protocol: TProtocolVersion,
      username: String,
      passwd: String,
      ipAddress: String,
      sessionConf: java.util.Map[String, String],
      withImpersonation: Boolean,
      delegationToken: String): SessionHandle
  override def closeSession(sessionHandle: SessionHandle): Unit
  def setConfMap(conf: SQLContext, confMap: java.util.Map[String, String]): Unit
}
```

[Session Management](./session-management.md)
### Operation Management

Manages SQL operations and metadata operations with session context mapping.

```scala { .api }
class SparkSQLOperationManager extends OperationManager {
  val sessionToContexts: ConcurrentHashMap[SessionHandle, SQLContext]

  override def newExecuteStatementOperation(
      parentSession: HiveSession,
      statement: String,
      confOverlay: java.util.Map[String, String],
      async: Boolean,
      queryTimeout: Long): ExecuteStatementOperation

  override def newGetTablesOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String,
      tableName: String,
      tableTypes: java.util.List[String]): MetadataOperation

  override def newGetColumnsOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String,
      tableName: String,
      columnName: String): GetColumnsOperation

  override def newGetSchemasOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String): GetSchemasOperation

  override def newGetFunctionsOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String,
      functionName: String): GetFunctionsOperation

  override def newGetTypeInfoOperation(parentSession: HiveSession): GetTypeInfoOperation
  override def newGetCatalogsOperation(parentSession: HiveSession): GetCatalogsOperation
  override def newGetTableTypesOperation(parentSession: HiveSession): GetTableTypesOperation
}
```

[Operation Management](./operation-management.md)
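The `sessionToContexts` map is the central piece of state in the operation manager: every open session is bound to the `SQLContext` it executes against, and operations look their context up through it. A minimal, self-contained sketch of that pattern, with `UUID` and `String` standing in for the Hive session handle and Spark context types (illustration only, not the real API):

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SessionRegistry {
    // UUID stands in for SessionHandle, String for SQLContext
    private final ConcurrentHashMap<UUID, String> sessionToContexts = new ConcurrentHashMap<>();

    public UUID openSession(String context) {
        UUID handle = UUID.randomUUID();
        sessionToContexts.put(handle, context); // bind the new session to its context
        return handle;
    }

    public String contextFor(UUID handle) {
        return sessionToContexts.get(handle); // operations resolve their context here
    }

    public void closeSession(UUID handle) {
        sessionToContexts.remove(handle); // drop the mapping when the session closes
    }

    public static void main(String[] args) {
        SessionRegistry registry = new SessionRegistry();
        UUID h = registry.openSession("session-level SQLContext");
        System.out.println(registry.contextFor(h)); // prints "session-level SQLContext"
        registry.closeSession(h);
    }
}
```

A `ConcurrentHashMap` is used because sessions are opened and closed from concurrent client threads.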
### SQL Execution

SQL statement execution with Spark SQL engine integration and result handling.

```scala { .api }
class SparkExecuteStatementOperation {
  def getNextRowSet(order: FetchOrientation, maxRowsL: Long): TRowSet
  def getResultSetSchema: TTableSchema
  def runInternal(): Unit
  def cancel(): Unit
  def timeoutCancel(): Unit
}

class SparkSQLDriver(context: SQLContext) extends Driver {
  override def init(): Unit
  override def run(command: String): CommandProcessorResponse
  override def close(): Int
  override def getResults(res: JList[_]): Boolean
  override def getSchema: Schema
  override def destroy(): Unit
}
```

[SQL Execution](./sql-execution.md)
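From a JDBC client, the `queryTimeout`/`timeoutCancel` behavior above is reached through the standard JDBC timeout API. A hedged sketch: `setQueryTimeout` is the real `java.sql.Statement` method, but the `Connection` is assumed to point at a running thrift server, so the helper is only defined here, never invoked.

```java
import java.sql.*;

public class TimedQuery {
    // Run a query with a timeout in seconds; when the timeout elapses,
    // the server-side operation is cancelled rather than left running.
    public static ResultSet runWithTimeout(Connection conn, String sql, int seconds)
            throws SQLException {
        Statement stmt = conn.createStatement();
        stmt.setQueryTimeout(seconds);
        return stmt.executeQuery(sql);
    }
}
```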
### Metadata Operations

Comprehensive metadata operations for catalogs, schemas, tables, columns, functions, and type information.

```java { .api }
interface ICLIService {
  OperationHandle getCatalogs(SessionHandle sessionHandle);
  OperationHandle getSchemas(SessionHandle sessionHandle, String catalogName, String schemaName);
  OperationHandle getTables(SessionHandle sessionHandle, String catalogName, String schemaName, String tableName, List<String> tableTypes);
  OperationHandle getColumns(SessionHandle sessionHandle, String catalogName, String schemaName, String tableName, String columnName);
  OperationHandle getFunctions(SessionHandle sessionHandle, String catalogName, String schemaName, String functionName);
  OperationHandle getTypeInfo(SessionHandle sessionHandle);
}
```

[Metadata Operations](./metadata-operations.md)
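From JDBC clients, these operations surface through `java.sql.DatabaseMetaData`, which is how most BI tools reach them. A sketch of the correspondence: the `DatabaseMetaData` method names and result-set columns are the real JDBC API, but the `Connection` is assumed to point at a running thrift server, so nothing here opens one.

```java
import java.sql.*;

public class MetadataLookup {
    // DatabaseMetaData.getTables corresponds to ICLIService.getTables;
    // null arguments mean "no filter", "%" matches any name
    public static void printTables(Connection conn) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        try (ResultSet rs = md.getTables(null, "default", "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                System.out.println(rs.getString("TABLE_NAME"));
            }
        }
    }

    // DatabaseMetaData.getColumns corresponds to ICLIService.getColumns
    public static void printColumns(Connection conn, String table) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        try (ResultSet rs = md.getColumns(null, "default", table, "%")) {
            while (rs.next()) {
                System.out.println(rs.getString("COLUMN_NAME") + " " + rs.getString("TYPE_NAME"));
            }
        }
    }
}
```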
### CLI Driver

Interactive command-line interface with SQL completion, history, and signal handling.

```scala { .api }
object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
  def printUsage(): Unit
}

class SparkSQLCLIDriver {
  def processCmd(cmd: String): Int
  def processLine(line: String, allowInterrupting: Boolean): Int
  def printMasterAndAppId(): Unit
}
```

[CLI Driver](./cli-driver.md)
### Web UI Integration

Spark Web UI integration for monitoring thrift server sessions, queries, and performance metrics.

```scala { .api }
class ThriftServerTab {
  def detach(): Unit
}

class HiveThriftServer2Listener {
  // Event listener for UI display and metrics collection
}
```

[Web UI Integration](./web-ui.md)
## Types

### Core Handle Types

```java { .api }
class SessionHandle extends Handle {
  // Identifies client sessions
}

class OperationHandle extends Handle {
  // Identifies operations (queries, metadata calls)
}

abstract class Handle {
  HandleIdentifier getHandleIdentifier()
}
```
### Operation Types

```java { .api }
enum OperationType {
  EXECUTE_STATEMENT,
  GET_TYPE_INFO,
  GET_CATALOGS,
  GET_SCHEMAS,
  GET_TABLES,
  GET_COLUMNS,
  GET_FUNCTIONS,
  GET_PRIMARY_KEYS,
  GET_CROSS_REFERENCE
}

enum OperationState {
  INITIALIZED,
  RUNNING,
  FINISHED,
  CANCELED,
  CLOSED,
  ERROR,
  UNKNOWN
}
```
### Data Transfer Types

```java { .api }
abstract class RowSet {
  // Base class for result sets
}

class RowBasedSet extends RowSet {
  // Row-based result set implementation
}

class ColumnBasedSet extends RowSet {
  // Column-based result set implementation
}

class TableSchema {
  List<ColumnDescriptor> getColumns()
}

class ColumnDescriptor {
  String getName()
  TypeDescriptor getTypeDescriptor()
  String getComment()
}
```
### Configuration Types

```java { .api }
enum FetchOrientation {
  FETCH_NEXT,
  FETCH_PRIOR,
  FETCH_RELATIVE,
  FETCH_ABSOLUTE,
  FETCH_FIRST,
  FETCH_LAST
}

enum FetchType {
  QUERY_OUTPUT,
  LOG
}

class GetInfoValue {
  String getStringValue()
  short getShortValue()
  int getIntValue()
  long getLongValue()
}
```