tessl/maven-org-apache-spark--spark-hive-thriftserver-2-11

Spark Project Hive Thrift Server - A Thrift server implementation that provides JDBC/ODBC access to Spark SQL


Spark Hive Thrift Server

Apache Spark Hive Thrift Server provides JDBC/ODBC access to Spark SQL through the HiveServer2 protocol, enabling remote clients to execute SQL queries against Spark clusters using standard database connectivity tools and BI applications.

Package Information

  • Package Name: spark-hive-thriftserver_2.11
  • Package Type: maven
  • Language: Scala
  • Maven Coordinates: org.apache.spark:spark-hive-thriftserver_2.11:2.4.8
  • Installation: Add as a Maven dependency, or use the copy bundled with the Spark distribution

Core Imports

import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.SQLContext

Basic Usage

Starting the Thrift Server Programmatically

import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

// Create Spark SQL context
val conf = new SparkConf().setAppName("ThriftServer")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Start the thrift server
HiveThriftServer2.startWithContext(sqlContext)
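With the server running, a remote client can connect over plain JDBC. A minimal sketch, assuming a Thrift Server already listening on localhost:10000 and the Hive JDBC driver (org.apache.hive:hive-jdbc) on the client's classpath — both the host/port and the driver dependency are assumptions, not part of this package:

```scala
import java.sql.DriverManager

// Register the Hive JDBC driver (assumed to be on the classpath)
Class.forName("org.apache.hive.jdbc.HiveDriver")

// Connect to the Thrift Server on the default port (10000);
// username/password handling depends on the configured authentication
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "spark", "")

try {
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SELECT 1")
  while (rs.next()) {
    println(rs.getInt(1))
  }
} finally {
  conn.close()
}
```

Any JDBC-capable tool (Beeline, BI applications, SQL IDEs) can use the same connection URL.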

Starting from Command Line

# Start Thrift Server
$SPARK_HOME/sbin/start-thriftserver.sh --master spark://master:7077

# Start CLI
$SPARK_HOME/bin/spark-sql
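Once the server is up, the Beeline client bundled with Spark can connect to it; localhost and port 10000 below assume the default bind host and port:

```shell
# Connect with the Beeline client shipped with Spark
# (localhost:10000 assumes the default bind host and port)
$SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n $USER

# Then run SQL interactively, e.g.:
#   SHOW TABLES;
#   SELECT count(*) FROM my_table;
```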

Architecture

The Spark Hive Thrift Server consists of several key components:

  • HiveThriftServer2: Main server entry point and lifecycle management
  • Service Layer: CLI service, session management, and operation handling
  • Transport Layer: HTTP and binary Thrift protocol support
  • Web UI: Monitoring interface for sessions and queries
  • Authentication: Kerberos and delegation token support

Capabilities

Server Management

Main entry points for starting and managing the Thrift Server with lifecycle control and configuration.

object HiveThriftServer2 {
  def startWithContext(sqlContext: SQLContext): Unit
  def main(args: Array[String]): Unit
  var uiTab: Option[ThriftServerTab]
  var listener: HiveThriftServer2Listener
}


CLI Interface

Command-line interface for interactive SQL execution with Spark SQL integration.

object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}


Session Management

Session lifecycle management with SQL context handling and client connection management.

class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext: SQLContext) extends SessionManager {
  def openSession(protocol: TProtocolVersion, username: String, passwd: String, 
                 ipAddress: String, sessionConf: java.util.Map[String, String], 
                 withImpersonation: Boolean, delegationToken: String): SessionHandle
  def closeSession(sessionHandle: SessionHandle): Unit
}


Query Operations

SQL query execution operations with result handling and asynchronous processing support.

class SparkSQLOperationManager extends OperationManager {
  val sessionToActivePool: ConcurrentHashMap[SessionHandle, String]
  val sessionToContexts: ConcurrentHashMap[SessionHandle, SQLContext]  
  def newExecuteStatementOperation(parentSession: HiveSession, statement: String,
                                  confOverlay: JMap[String, String], async: Boolean): ExecuteStatementOperation
}


Web UI Monitoring

Web-based monitoring interface for active sessions, query execution, and server performance metrics.

class ThriftServerTab(sparkContext: SparkContext) extends SparkUITab {
  val name: String = "JDBC/ODBC Server"
  def detach(): Unit
}


Environment Management

Spark SQL environment initialization and cleanup with configuration management.

object SparkSQLEnv {
  var sqlContext: SQLContext
  var sparkContext: SparkContext
  def init(): Unit
  def stop(): Unit
}


Types

Core Types

// Session information tracking
class SessionInfo(sessionId: String, startTimestamp: Long, ip: String, userName: String) {
  var finishTimestamp: Long
  var totalExecution: Int
  def totalTime: Long
}

// Query execution tracking  
class ExecutionInfo(statement: String, sessionId: String, startTimestamp: Long, userName: String) {
  var finishTimestamp: Long
  var executePlan: String
  var detail: String
  var state: ExecutionState.Value
  val jobId: ArrayBuffer[String]
  var groupId: String
  def totalTime: Long
}

// Execution states
object ExecutionState extends Enumeration {
  val STARTED, COMPILED, FAILED, FINISHED = Value
  type ExecutionState = Value
}

// Server listener for events
class HiveThriftServer2Listener(server: HiveServer2, conf: SQLConf) extends SparkListener {
  def getOnlineSessionNum: Int
  def getTotalRunning: Int
  def getSessionList: Seq[SessionInfo]
  def getSession(sessionId: String): Option[SessionInfo]
  def getExecutionList: Seq[ExecutionInfo]
}

Hive Integration Types

// From Hive Service API
import org.apache.hive.service.cli.SessionHandle
import org.apache.hive.service.cli.OperationHandle  
import org.apache.hive.service.cli.thrift.TProtocolVersion
import org.apache.hive.service.server.HiveServer2
import org.apache.hadoop.hive.conf.HiveConf

Configuration

Transport Modes

  • Binary: Default TCP transport using Thrift binary protocol
  • HTTP: HTTP-based transport for firewall-friendly connections
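In HTTP mode the transport parameters are carried in the JDBC URL itself. A connection sketch, where the port (10001) and the HTTP path ("cliservice", the conventional default) are illustrative assumptions:

```shell
# Binary mode (default):
#   jdbc:hive2://<host>:10000/<database>

# HTTP mode — transport mode and HTTP path go in the URL
# (port 10001 and path "cliservice" are example values)
$SPARK_HOME/bin/beeline -u \
  "jdbc:hive2://localhost:10001/default?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice"
```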

Authentication

  • Kerberos: Enterprise authentication with keytab support
  • SPNEGO: HTTP authentication for web-based access
  • Delegation Tokens: Secure token-based authentication

Key Configuration Properties

  • hive.server2.transport.mode: "binary" or "http"
  • hive.server2.thrift.port: Server port (default: 10000)
  • hive.server2.thrift.bind.host: Bind address
  • spark.sql.hive.thriftServer.singleSession: Share single session
  • spark.sql.thriftServer.incrementalCollect: Incremental result collection
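The Hive-side properties above can be passed as --hiveconf overrides and the Spark-side ones as --conf options when starting the server; the values below are illustrative:

```shell
# Override port and bind host at startup (example values)
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master spark://master:7077 \
  --hiveconf hive.server2.thrift.port=10001 \
  --hiveconf hive.server2.thrift.bind.host=0.0.0.0 \
  --conf spark.sql.hive.thriftServer.singleSession=true
```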

docs

  • cli-interface.md
  • environment-management.md
  • index.md
  • query-operations.md
  • server-management.md
  • session-management.md
  • web-ui-monitoring.md

tile.json