Apache Spark Hive Thrift Server provides HiveServer2 compatibility for Spark SQL, enabling JDBC/ODBC connectivity and Hive CLI compatibility for Spark SQL queries
npx @tessl/cli install tessl/maven-org-apache-spark--spark-hive-thriftserver_2-12@3.5.6
# Spark Hive Thrift Server

Apache Spark Hive Thrift Server provides HiveServer2 compatibility for Spark SQL, enabling JDBC/ODBC connectivity and Hive CLI compatibility for Spark SQL queries. It offers a complete thrift-based server implementation with session management, authentication, and comprehensive metadata operations.

## Package Information

- **Package Name**: spark-hive-thriftserver_2.12
- **Package Type**: maven
- **Language**: Scala/Java
- **GroupId**: org.apache.spark
- **ArtifactId**: spark-hive-thriftserver_2.12
- **Version**: 3.5.6
- **Installation**: Add to Maven POM or use with Spark distribution
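Using the coordinates above, the dependency can be declared in a Maven POM as follows (in practice the thrift server more often ships as part of a Spark distribution rather than as a direct dependency):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive-thriftserver_2.12</artifactId>
  <version>3.5.6</version>
</dependency>
```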
## Core Imports

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIService
import org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager
import org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager
import org.apache.spark.sql.SQLContext
```

For Java usage (requires Hive dependencies):

```java
import org.apache.hive.service.cli.ICLIService;
import org.apache.hive.service.cli.SessionHandle;
import org.apache.hive.service.cli.OperationHandle;
```

**Note**: Many interfaces (`ICLIService`, `SessionHandle`, etc.) are provided by the Apache Hive library, which is included as a dependency of this module.
## Basic Usage

### Starting the Thrift Server

```scala
import org.apache.spark.sql.hive.thriftserver.{HiveThriftServer2, SparkSQLEnv}

// Initialize the Spark SQL environment (SparkContext and SQLContext)
SparkSQLEnv.init()

// Start the thrift server with the SQL context
val server = HiveThriftServer2.startWithContext(SparkSQLEnv.sqlContext)
```
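In a binary Spark distribution, the same server is more commonly launched with the bundled start script rather than programmatically. A typical invocation (paths relative to the Spark installation root; host and port values are placeholders):

```bash
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=10000 \
  --hiveconf hive.server2.thrift.bind.host=localhost \
  --master local[*]
```

The script accepts the same `--master` option as `spark-submit` and HiveServer2-style `--hiveconf` properties; `./sbin/stop-thriftserver.sh` stops the server.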
### Using the CLI Driver

```scala
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

// Start the interactive SQL CLI
SparkSQLCLIDriver.main(Array("--hiveconf", "hive.server2.thrift.port=10000"))
```
### JDBC Connection (from client applications)

```java
import java.sql.*;

// Standard JDBC connection to the Spark Thrift Server
// (requires the Hive JDBC driver on the client classpath)
String url = "jdbc:hive2://localhost:10000/default";
Connection conn = DriverManager.getConnection(url, "username", "password");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM my_table");
```
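The server can also be exercised from a shell with Beeline, the thin JDBC command-line client shipped with Spark and Hive (the user name and query here are placeholders):

```bash
./bin/beeline -u "jdbc:hive2://localhost:10000/default" -n username -e "SHOW TABLES"
```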
## Architecture

The Spark Hive Thrift Server is built around several key components:

- **Server Management**: `HiveThriftServer2` provides the main server lifecycle and initialization
- **CLI Services**: `SparkSQLCLIService` implements the core CLI service interface with Spark SQL integration
- **Session Management**: `SparkSQLSessionManager` handles client sessions and their associated SQL contexts
- **Operation Management**: `SparkSQLOperationManager` creates and manages SQL operations and metadata operations
- **SQL Execution**: `SparkSQLDriver` and `SparkExecuteStatementOperation` execute SQL queries using the Spark SQL engine
- **CLI Interface**: `SparkSQLCLIDriver` provides an interactive command-line interface
- **Web UI**: Integration with the Spark Web UI for monitoring sessions and queries
- **Transport Protocols**: Support for both binary Thrift and HTTP transport modes
- **Authentication**: Kerberos, SPNEGO, and custom authentication provider support
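On the client side, the choice of transport mode shows up only in the JDBC URL. A small sketch of the URL convention: the `transportMode=http;httpPath=...` suffix follows the standard HiveServer2 JDBC URL format, `cliservice` is the customary default HTTP path, and the hosts and ports below are placeholders.

```java
public class ThriftServerUrls {
    // Build a HiveServer2-style JDBC URL for binary or HTTP transport mode.
    static String jdbcUrl(String host, int port, String db, boolean http, String httpPath) {
        String base = "jdbc:hive2://" + host + ":" + port + "/" + db;
        return http ? base + ";transportMode=http;httpPath=" + httpPath : base;
    }

    public static void main(String[] args) {
        // Binary thrift transport (the default)
        System.out.println(jdbcUrl("localhost", 10000, "default", false, null));
        // HTTP transport, e.g. when routing through load balancers or proxies
        System.out.println(jdbcUrl("localhost", 10001, "default", true, "cliservice"));
    }
}
```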
## Capabilities

### Server Management

Core server lifecycle management and initialization with Spark SQL integration.

```scala { .api }
object HiveThriftServer2 {
  def startWithContext(sqlContext: SQLContext): HiveThriftServer2
  def main(args: Array[String]): Unit

  // Note: ExecutionState is private[thriftserver] - not part of the public API
  private[thriftserver] object ExecutionState extends Enumeration {
    val STARTED, COMPILED, CANCELED, TIMEDOUT, FAILED, FINISHED, CLOSED = Value
  }
}
```

[Server Management](./server-management.md)
### CLI Services

Comprehensive CLI service implementation providing HiveServer2 compatibility with Spark SQL enhancements.

```scala { .api }
class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLContext) extends CLIService(hiveServer) {
  override def init(hiveConf: HiveConf): Unit
  override def start(): Unit
  override def getInfo(sessionHandle: SessionHandle, getInfoType: GetInfoType): GetInfoValue
}
```

[CLI Services](./cli-services.md)
### Session Management

Client session management with SQL context association and configuration handling.

```scala { .api }
class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext: SQLContext) extends SessionManager(hiveServer) {
  override def openSession(
      protocol: TProtocolVersion,
      username: String,
      passwd: String,
      ipAddress: String,
      sessionConf: java.util.Map[String, String],
      withImpersonation: Boolean,
      delegationToken: String): SessionHandle
  override def closeSession(sessionHandle: SessionHandle): Unit
  def setConfMap(conf: SQLContext, confMap: java.util.Map[String, String]): Unit
}
```

[Session Management](./session-management.md)
### Operation Management

Manages SQL operations and metadata operations with session context mapping.

```scala { .api }
class SparkSQLOperationManager extends OperationManager {
  val sessionToContexts: ConcurrentHashMap[SessionHandle, SQLContext]

  override def newExecuteStatementOperation(
      parentSession: HiveSession,
      statement: String,
      confOverlay: java.util.Map[String, String],
      async: Boolean,
      queryTimeout: Long): ExecuteStatementOperation

  override def newGetTablesOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String,
      tableName: String,
      tableTypes: java.util.List[String]): MetadataOperation

  override def newGetColumnsOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String,
      tableName: String,
      columnName: String): GetColumnsOperation

  override def newGetSchemasOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String): GetSchemasOperation

  override def newGetFunctionsOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String,
      functionName: String): GetFunctionsOperation

  override def newGetTypeInfoOperation(parentSession: HiveSession): GetTypeInfoOperation
  override def newGetCatalogsOperation(parentSession: HiveSession): GetCatalogsOperation
  override def newGetTableTypesOperation(parentSession: HiveSession): GetTableTypesOperation
}
```

[Operation Management](./operation-management.md)
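The `sessionToContexts` map is the central piece of state in the operation manager: every open session is bound to the `SQLContext` it executes against, and operations look their context up through it. A minimal, self-contained sketch of that pattern, with `UUID` and `String` standing in for the Hive session handle and Spark context types (illustration only, not the real API):

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SessionRegistry {
    // UUID stands in for SessionHandle, String for SQLContext
    private final ConcurrentHashMap<UUID, String> sessionToContexts = new ConcurrentHashMap<>();

    public UUID openSession(String context) {
        UUID handle = UUID.randomUUID();
        sessionToContexts.put(handle, context); // bind the new session to its context
        return handle;
    }

    public String contextFor(UUID handle) {
        return sessionToContexts.get(handle); // operations resolve their context here
    }

    public void closeSession(UUID handle) {
        sessionToContexts.remove(handle); // drop the mapping when the session closes
    }

    public static void main(String[] args) {
        SessionRegistry registry = new SessionRegistry();
        UUID h = registry.openSession("session-level SQLContext");
        System.out.println(registry.contextFor(h)); // prints "session-level SQLContext"
        registry.closeSession(h);
    }
}
```

A `ConcurrentHashMap` is used because sessions are opened and closed from concurrent client threads.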
### SQL Execution

SQL statement execution with Spark SQL engine integration and result handling.

```scala { .api }
class SparkExecuteStatementOperation {
  def getNextRowSet(order: FetchOrientation, maxRowsL: Long): TRowSet
  def getResultSetSchema: TTableSchema
  def runInternal(): Unit
  def cancel(): Unit
  def timeoutCancel(): Unit
}

class SparkSQLDriver(context: SQLContext) extends Driver {
  override def init(): Unit
  override def run(command: String): CommandProcessorResponse
  override def close(): Int
  override def getResults(res: JList[_]): Boolean
  override def getSchema: Schema
  override def destroy(): Unit
}
```

[SQL Execution](./sql-execution.md)
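From a JDBC client, the `queryTimeout`/`timeoutCancel` behavior above is reached through the standard JDBC timeout API. A hedged sketch: `setQueryTimeout` is the real `java.sql.Statement` method, but the `Connection` is assumed to point at a running thrift server, so the helper is only defined here, never invoked.

```java
import java.sql.*;

public class TimedQuery {
    // Run a query with a timeout in seconds; when the timeout elapses,
    // the server-side operation is cancelled rather than left running.
    public static ResultSet runWithTimeout(Connection conn, String sql, int seconds)
            throws SQLException {
        Statement stmt = conn.createStatement();
        stmt.setQueryTimeout(seconds);
        return stmt.executeQuery(sql);
    }
}
```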
### Metadata Operations

Comprehensive metadata operations for catalogs, schemas, tables, columns, functions, and type information.

```java { .api }
interface ICLIService {
  OperationHandle getCatalogs(SessionHandle sessionHandle);
  OperationHandle getSchemas(SessionHandle sessionHandle, String catalogName, String schemaName);
  OperationHandle getTables(SessionHandle sessionHandle, String catalogName, String schemaName, String tableName, List<String> tableTypes);
  OperationHandle getColumns(SessionHandle sessionHandle, String catalogName, String schemaName, String tableName, String columnName);
  OperationHandle getFunctions(SessionHandle sessionHandle, String catalogName, String schemaName, String functionName);
  OperationHandle getTypeInfo(SessionHandle sessionHandle);
}
```

[Metadata Operations](./metadata-operations.md)
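From JDBC clients, these operations surface through `java.sql.DatabaseMetaData`, which is how most BI tools reach them. A sketch of the correspondence: the `DatabaseMetaData` method names and result-set columns are the real JDBC API, but the `Connection` is assumed to point at a running thrift server, so nothing here opens one.

```java
import java.sql.*;

public class MetadataLookup {
    // DatabaseMetaData.getTables corresponds to ICLIService.getTables;
    // null arguments mean "no filter", "%" matches any name
    public static void printTables(Connection conn) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        try (ResultSet rs = md.getTables(null, "default", "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                System.out.println(rs.getString("TABLE_NAME"));
            }
        }
    }

    // DatabaseMetaData.getColumns corresponds to ICLIService.getColumns
    public static void printColumns(Connection conn, String table) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        try (ResultSet rs = md.getColumns(null, "default", table, "%")) {
            while (rs.next()) {
                System.out.println(rs.getString("COLUMN_NAME") + " " + rs.getString("TYPE_NAME"));
            }
        }
    }
}
```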
### CLI Driver

Interactive command-line interface with SQL completion, history, and signal handling.

```scala { .api }
object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
  def printUsage(): Unit
}

class SparkSQLCLIDriver {
  def processCmd(cmd: String): Int
  def processLine(line: String, allowInterrupting: Boolean): Int
  def printMasterAndAppId(): Unit
}
```

[CLI Driver](./cli-driver.md)
### Web UI Integration

Spark Web UI integration for monitoring thrift server sessions, queries, and performance metrics.

```scala { .api }
class ThriftServerTab {
  def detach(): Unit
}

class HiveThriftServer2Listener {
  // Event listener for UI display and metrics collection
}
```

[Web UI Integration](./web-ui.md)
## Types

### Core Handle Types

```java { .api }
class SessionHandle extends Handle {
  // Identifies client sessions
}

class OperationHandle extends Handle {
  // Identifies operations (queries, metadata calls)
}

abstract class Handle {
  HandleIdentifier getHandleIdentifier()
}
```
### Operation Types

```java { .api }
enum OperationType {
  EXECUTE_STATEMENT,
  GET_TYPE_INFO,
  GET_CATALOGS,
  GET_SCHEMAS,
  GET_TABLES,
  GET_COLUMNS,
  GET_FUNCTIONS,
  GET_PRIMARY_KEYS,
  GET_CROSS_REFERENCE
}

enum OperationState {
  INITIALIZED,
  RUNNING,
  FINISHED,
  CANCELED,
  CLOSED,
  ERROR,
  UNKNOWN
}
```
### Data Transfer Types

```java { .api }
abstract class RowSet {
  // Base class for result sets
}

class RowBasedSet extends RowSet {
  // Row-based result set implementation
}

class ColumnBasedSet extends RowSet {
  // Column-based result set implementation
}

class TableSchema {
  List<ColumnDescriptor> getColumns()
}

class ColumnDescriptor {
  String getName()
  TypeDescriptor getTypeDescriptor()
  String getComment()
}
```
### Configuration Types

```java { .api }
enum FetchOrientation {
  FETCH_NEXT,
  FETCH_PRIOR,
  FETCH_RELATIVE,
  FETCH_ABSOLUTE,
  FETCH_FIRST,
  FETCH_LAST
}

enum FetchType {
  QUERY_OUTPUT,
  LOG
}

class GetInfoValue {
  String getStringValue()
  short getShortValue()
  int getIntValue()
  long getLongValue()
}
```