tessl/maven-org-apache-spark--spark-hive-thriftserver-2-11

Spark Project Hive Thrift Server - A Thrift server implementation that provides JDBC/ODBC access to Spark SQL


Spark Hive Thrift Server

Apache Spark Hive Thrift Server provides JDBC/ODBC access to Spark SQL through the HiveServer2 protocol, enabling remote clients to execute SQL queries against Spark clusters using standard database connectivity tools and BI applications.

Package Information

  • Package Name: spark-hive-thriftserver_2.11
  • Package Type: maven
  • Language: Scala
  • Maven Coordinates: org.apache.spark:spark-hive-thriftserver_2.11:2.4.8
  • Installation: Add as a Maven dependency, or use the copy bundled with the Spark distribution

Core Imports

import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.SQLContext

Basic Usage

Starting the Thrift Server Programmatically

import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

// Create Spark SQL context
val conf = new SparkConf().setAppName("ThriftServer")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Start the thrift server
HiveThriftServer2.startWithContext(sqlContext)
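With the server running, a remote client can connect over plain JDBC. A minimal sketch, assuming a Thrift Server already listening on localhost:10000 and the Hive JDBC driver (org.apache.hive:hive-jdbc) on the client's classpath — both the host/port and the driver dependency are assumptions, not part of this package:

```scala
import java.sql.DriverManager

// Register the Hive JDBC driver (assumed to be on the classpath)
Class.forName("org.apache.hive.jdbc.HiveDriver")

// Connect to the Thrift Server on the default port (10000);
// username/password handling depends on the configured authentication
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "spark", "")

try {
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SELECT 1")
  while (rs.next()) {
    println(rs.getInt(1))
  }
} finally {
  conn.close()
}
```

Any JDBC-capable tool (Beeline, BI applications, SQL IDEs) can use the same connection URL.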

Starting from Command Line

# Start Thrift Server
$SPARK_HOME/sbin/start-thriftserver.sh --master spark://master:7077

# Start CLI
$SPARK_HOME/bin/spark-sql
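Once the server is up, the Beeline client bundled with Spark can connect to it; localhost and port 10000 below assume the default bind host and port:

```shell
# Connect with the Beeline client shipped with Spark
# (localhost:10000 assumes the default bind host and port)
$SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n $USER

# Then run SQL interactively, e.g.:
#   SHOW TABLES;
#   SELECT count(*) FROM my_table;
```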

Architecture

The Spark Hive Thrift Server consists of several key components:

  • HiveThriftServer2: Main server entry point and lifecycle management
  • Service Layer: CLI service, session management, and operation handling
  • Transport Layer: HTTP and binary Thrift protocol support
  • Web UI: Monitoring interface for sessions and queries
  • Authentication: Kerberos and delegation token support

Capabilities

Server Management

Main entry points for starting and managing the Thrift Server with lifecycle control and configuration.

object HiveThriftServer2 {
  def startWithContext(sqlContext: SQLContext): Unit
  def main(args: Array[String]): Unit
  var uiTab: Option[ThriftServerTab]
  var listener: HiveThriftServer2Listener
}


CLI Interface

Command-line interface for interactive SQL execution with Spark SQL integration.

object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}


Session Management

Session lifecycle management with SQL context handling and client connection management.

class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext: SQLContext) extends SessionManager {
  def openSession(protocol: TProtocolVersion, username: String, passwd: String, 
                 ipAddress: String, sessionConf: java.util.Map[String, String], 
                 withImpersonation: Boolean, delegationToken: String): SessionHandle
  def closeSession(sessionHandle: SessionHandle): Unit
}


Query Operations

SQL query execution operations with result handling and asynchronous processing support.

class SparkSQLOperationManager extends OperationManager {
  val sessionToActivePool: ConcurrentHashMap[SessionHandle, String]
  val sessionToContexts: ConcurrentHashMap[SessionHandle, SQLContext]  
  def newExecuteStatementOperation(parentSession: HiveSession, statement: String,
                                  confOverlay: JMap[String, String], async: Boolean): ExecuteStatementOperation
}


Web UI Monitoring

Web-based monitoring interface for active sessions, query execution, and server performance metrics.

class ThriftServerTab(sparkContext: SparkContext) extends SparkUITab {
  val name: String = "JDBC/ODBC Server"
  def detach(): Unit
}


Environment Management

Spark SQL environment initialization and cleanup with configuration management.

object SparkSQLEnv {
  var sqlContext: SQLContext
  var sparkContext: SparkContext
  def init(): Unit
  def stop(): Unit
}


Types

Core Types

// Session information tracking
class SessionInfo(sessionId: String, startTimestamp: Long, ip: String, userName: String) {
  var finishTimestamp: Long
  var totalExecution: Int
  def totalTime: Long
}

// Query execution tracking  
class ExecutionInfo(statement: String, sessionId: String, startTimestamp: Long, userName: String) {
  var finishTimestamp: Long
  var executePlan: String
  var detail: String
  var state: ExecutionState.Value
  val jobId: ArrayBuffer[String]
  var groupId: String
  def totalTime: Long
}

// Execution states
object ExecutionState extends Enumeration {
  val STARTED, COMPILED, FAILED, FINISHED = Value
  type ExecutionState = Value
}

// Server listener for events
class HiveThriftServer2Listener(server: HiveServer2, conf: SQLConf) extends SparkListener {
  def getOnlineSessionNum: Int
  def getTotalRunning: Int
  def getSessionList: Seq[SessionInfo]
  def getSession(sessionId: String): Option[SessionInfo]
  def getExecutionList: Seq[ExecutionInfo]
}

Hive Integration Types

// From Hive Service API
import org.apache.hive.service.cli.SessionHandle
import org.apache.hive.service.cli.OperationHandle  
import org.apache.hive.service.cli.thrift.TProtocolVersion
import org.apache.hive.service.server.HiveServer2
import org.apache.hadoop.hive.conf.HiveConf

Configuration

Transport Modes

  • Binary: Default TCP transport using Thrift binary protocol
  • HTTP: HTTP-based transport for firewall-friendly connections
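In HTTP mode the transport parameters are carried in the JDBC URL itself. A connection sketch, where the port (10001) and the HTTP path ("cliservice", the conventional default) are illustrative assumptions:

```shell
# Binary mode (default):
#   jdbc:hive2://<host>:10000/<database>

# HTTP mode — transport mode and HTTP path go in the URL
# (port 10001 and path "cliservice" are example values)
$SPARK_HOME/bin/beeline -u \
  "jdbc:hive2://localhost:10001/default?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice"
```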

Authentication

  • Kerberos: Enterprise authentication with keytab support
  • SPNEGO: HTTP authentication for web-based access
  • Delegation Tokens: Secure token-based authentication

Key Configuration Properties

  • hive.server2.transport.mode: "binary" or "http"
  • hive.server2.thrift.port: Server port (default: 10000)
  • hive.server2.thrift.bind.host: Bind address
  • spark.sql.hive.thriftServer.singleSession: Share single session
  • spark.sql.thriftServer.incrementalCollect: Incremental result collection
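The Hive-side properties above can be passed as --hiveconf overrides and the Spark-side ones as --conf options when starting the server; the values below are illustrative:

```shell
# Override port and bind host at startup (example values)
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master spark://master:7077 \
  --hiveconf hive.server2.thrift.port=10001 \
  --hiveconf hive.server2.thrift.bind.host=0.0.0.0 \
  --conf spark.sql.hive.thriftServer.singleSession=true
```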

docs

  • cli-interface.md
  • environment-management.md
  • index.md
  • query-operations.md
  • server-management.md
  • session-management.md
  • web-ui-monitoring.md

tile.json