Spark Project Hive Thrift Server - A Thrift server implementation that provides JDBC/ODBC access to Spark SQL
npx @tessl/cli install tessl/maven-org-apache-spark--spark-hive-thriftserver_2-11@2.4.00
# Spark Hive Thrift Server
Apache Spark Hive Thrift Server provides JDBC/ODBC access to Spark SQL through the HiveServer2 protocol, enabling remote clients to execute SQL queries against Spark clusters using standard database connectivity tools and BI applications.

## Package Information

- **Package Name**: spark-hive-thriftserver_2.11
- **Package Type**: maven
- **Language**: Scala
- **Artifact ID**: org.apache.spark:spark-hive-thriftserver_2.11:2.4.8
- **Installation**: Include as a Maven dependency or as part of a Spark distribution
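Using the Maven coordinates above, the dependency can be declared in a `pom.xml` like the following sketch (adjust the version to match your Spark distribution):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive-thriftserver_2.11</artifactId>
  <version>2.4.8</version>
</dependency>
```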
## Core Imports
```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.SQLContext
```
## Basic Usage
### Starting the Thrift Server Programmatically

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

// Create a Spark SQL context
val conf = new SparkConf().setAppName("ThriftServer")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Start the Thrift Server against this context
HiveThriftServer2.startWithContext(sqlContext)
```
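In Spark 2.x, applications more commonly build a `SparkSession`; its underlying `SQLContext` satisfies the same `startWithContext` signature. A minimal sketch, assuming a Hive-enabled Spark build:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Build a Hive-enabled session instead of constructing SQLContext directly
val spark = SparkSession.builder()
  .appName("ThriftServer")
  .enableHiveSupport()
  .getOrCreate()

// startWithContext accepts the session's SQLContext
HiveThriftServer2.startWithContext(spark.sqlContext)
```

Queries sent by remote JDBC/ODBC clients then run against this session's catalog and temporary views.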
### Starting from Command Line

```bash
# Start Thrift Server
$SPARK_HOME/sbin/start-thriftserver.sh --master spark://master:7077

# Start CLI
$SPARK_HOME/bin/spark-sql
```
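Once the server is listening (default port 10000), any HiveServer2-compatible client can connect. The following sketch uses plain JDBC with the Hive JDBC driver (`hive-jdbc`) on the classpath; the URL, user, and query are illustrative:

```scala
import java.sql.DriverManager

// Register the HiveServer2 JDBC driver (provided by the hive-jdbc artifact)
Class.forName("org.apache.hive.jdbc.HiveDriver")

// Connect to the Thrift Server on its default binary-mode port
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
val stmt = conn.createStatement()

// Execute a query through Spark SQL and iterate the result set
val rs = stmt.executeQuery("SHOW TABLES")
while (rs.next()) {
  println(rs.getString(1))
}

rs.close(); stmt.close(); conn.close()
```

The bundled `$SPARK_HOME/bin/beeline` tool accepts the same `jdbc:hive2://` URL for interactive use.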
## Architecture
The Spark Hive Thrift Server consists of several key components:

- **HiveThriftServer2**: Main server entry point and lifecycle management
- **Service Layer**: CLI service, session management, and operation handling
- **Transport Layer**: HTTP and binary Thrift protocol support
- **Web UI**: Monitoring interface for sessions and queries
- **Authentication**: Kerberos and delegation token support
## Capabilities
### Server Management

Main entry points for starting and managing the Thrift Server with lifecycle control and configuration.

```scala { .api }
object HiveThriftServer2 {
  def startWithContext(sqlContext: SQLContext): Unit
  def main(args: Array[String]): Unit
  var uiTab: Option[ThriftServerTab]
  var listener: HiveThriftServer2Listener
}
```

[Server Management](./server-management.md)
### CLI Interface

Command-line interface for interactive SQL execution with Spark SQL integration.

```scala { .api }
object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}
```

[CLI Interface](./cli-interface.md)
### Session Management

Session lifecycle management with SQL context handling and client connection management.

```scala { .api }
class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext: SQLContext) extends SessionManager {
  def openSession(protocol: TProtocolVersion, username: String, passwd: String,
      ipAddress: String, sessionConf: java.util.Map[String, String],
      withImpersonation: Boolean, delegationToken: String): SessionHandle
  def closeSession(sessionHandle: SessionHandle): Unit
}
```

[Session Management](./session-management.md)
### Query Operations

SQL query execution operations with result handling and asynchronous processing support.

```scala { .api }
class SparkSQLOperationManager extends OperationManager {
  val sessionToActivePool: ConcurrentHashMap[SessionHandle, String]
  val sessionToContexts: ConcurrentHashMap[SessionHandle, SQLContext]
  def newExecuteStatementOperation(parentSession: HiveSession, statement: String,
      confOverlay: JMap[String, String], async: Boolean): ExecuteStatementOperation
}
```

[Query Operations](./query-operations.md)
### Web UI Monitoring

Web-based monitoring interface for active sessions, query execution, and server performance metrics.

```scala { .api }
class ThriftServerTab(sparkContext: SparkContext) extends SparkUITab {
  val name: String = "JDBC/ODBC Server"
  def detach(): Unit
}
```

[Web UI Monitoring](./web-ui-monitoring.md)
### Environment Management

Spark SQL environment initialization and cleanup with configuration management.

```scala { .api }
object SparkSQLEnv {
  var sqlContext: SQLContext
  var sparkContext: SparkContext
  def init(): Unit
  def stop(): Unit
}
```

[Environment Management](./environment-management.md)
## Types
### Core Types

```scala { .api }
// Session information tracking
class SessionInfo(sessionId: String, startTimestamp: Long, ip: String, userName: String) {
  var finishTimestamp: Long
  var totalExecution: Int
  def totalTime: Long
}

// Query execution tracking
class ExecutionInfo(statement: String, sessionId: String, startTimestamp: Long, userName: String) {
  var finishTimestamp: Long
  var executePlan: String
  var detail: String
  var state: ExecutionState.Value
  val jobId: ArrayBuffer[String]
  var groupId: String
  def totalTime: Long
}

// Execution states
object ExecutionState extends Enumeration {
  val STARTED, COMPILED, FAILED, FINISHED = Value
  type ExecutionState = Value
}

// Server listener for events
class HiveThriftServer2Listener(server: HiveServer2, conf: SQLConf) extends SparkListener {
  def getOnlineSessionNum: Int
  def getTotalRunning: Int
  def getSessionList: Seq[SessionInfo]
  def getSession(sessionId: String): Option[SessionInfo]
  def getExecutionList: Seq[ExecutionInfo]
}
```
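Once the server has been started in-process, the `HiveThriftServer2.listener` field exposes these counters for lightweight monitoring. A sketch, assuming the server was started with `startWithContext` in the same JVM:

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Poll the listener that startWithContext installs
val listener = HiveThriftServer2.listener
println(s"Online sessions: ${listener.getOnlineSessionNum}")
println(s"Running statements: ${listener.getTotalRunning}")

// Inspect per-session details
listener.getSessionList.foreach { s =>
  println(s"Session ${s.sessionId} from ${s.ip}, total time ${s.totalTime} ms")
}
```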
### Hive Integration Types

```scala { .api }
// From the Hive Service API
import org.apache.hive.service.cli.SessionHandle
import org.apache.hive.service.cli.OperationHandle
import org.apache.hive.service.cli.thrift.TProtocolVersion
import org.apache.hive.service.server.HiveServer2
import org.apache.hadoop.hive.conf.HiveConf
```
## Configuration
### Transport Modes

- **Binary**: Default TCP transport using the Thrift binary protocol
- **HTTP**: HTTP-based transport for firewall-friendly connections

### Authentication

- **Kerberos**: Enterprise authentication with keytab support
- **SPNEGO**: HTTP authentication for web-based access
- **Delegation Tokens**: Secure token-based authentication

### Key Configuration Properties

- `hive.server2.transport.mode`: "binary" or "http"
- `hive.server2.thrift.port`: Server port (default: 10000)
- `hive.server2.thrift.bind.host`: Bind address
- `spark.sql.hive.thriftServer.singleSession`: Share a single session (temporary views, SQLConf, current database) across all connections
- `spark.sql.thriftServer.incrementalCollect`: Collect query results incrementally to reduce driver memory pressure
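
These properties can be passed at startup via `--hiveconf` (Hive properties) and `--conf` (Spark properties); the port and host values below are illustrative:

```bash
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master spark://master:7077 \
  --hiveconf hive.server2.transport.mode=binary \
  --hiveconf hive.server2.thrift.port=10001 \
  --hiveconf hive.server2.thrift.bind.host=0.0.0.0 \
  --conf spark.sql.hive.thriftServer.singleSession=true
```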