Spark Hive Thrift Server

Apache Spark Hive Thrift Server provides HiveServer2 compatibility for Spark SQL, enabling JDBC/ODBC connectivity and Hive CLI compatibility for Spark SQL queries. It offers a complete Thrift-based server implementation with session management, authentication, and comprehensive metadata operations.

Package Information

  • Package Name: spark-hive-thriftserver_2.12
  • Package Type: maven
  • Language: Scala/Java
  • GroupId: org.apache.spark
  • ArtifactId: spark-hive-thriftserver_2.12
  • Version: 3.5.6
  • Installation: Add as a Maven dependency, or use the thrift server bundled with the Spark distribution

Core Imports

import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIService
import org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager
import org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager
import org.apache.spark.sql.SQLContext

For Java usage (requires Hive dependencies):

import org.apache.hive.service.cli.ICLIService;
import org.apache.hive.service.cli.SessionHandle;
import org.apache.hive.service.cli.OperationHandle;

Note: Many interfaces (ICLIService, SessionHandle, etc.) are provided by the Apache Hive library, which is included as a dependency of this module.

Basic Usage

Starting the Thrift Server

import org.apache.spark.sql.hive.thriftserver.{HiveThriftServer2, SparkSQLEnv}

// Initialize the shared Spark SQL environment
SparkSQLEnv.init()

// Start the thrift server against the environment's SQL context
val server = HiveThriftServer2.startWithContext(SparkSQLEnv.sqlContext)
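
The server listens on port 10000 by default (hive.server2.thrift.port); a quick way to verify it is up is a Beeline connection:

beeline -u jdbc:hive2://localhost:10000 -e "SHOW TABLES"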

Using the CLI Driver

import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

// Start interactive SQL CLI
SparkSQLCLIDriver.main(Array("--hiveconf", "hive.server2.thrift.port=10000"))

JDBC Connection (from client applications)

import java.sql.*;

// Standard JDBC connection to the Spark Thrift Server
// (requires the Hive JDBC driver on the classpath)
String url = "jdbc:hive2://localhost:10000/default";
Connection conn = DriverManager.getConnection(url, "username", "password");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM my_table");

Architecture

The Spark Hive Thrift Server is built around several key components:

  • Server Management: HiveThriftServer2 provides the main server lifecycle and initialization
  • CLI Services: SparkSQLCLIService implements the core CLI service interface with Spark SQL integration
  • Session Management: SparkSQLSessionManager handles client sessions and their associated SQL contexts
  • Operation Management: SparkSQLOperationManager creates and manages SQL operations and metadata operations
  • SQL Execution: SparkSQLDriver and SparkExecuteStatementOperation execute SQL queries using Spark SQL engine
  • CLI Interface: SparkSQLCLIDriver provides interactive command-line interface
  • Web UI: Integration with Spark Web UI for monitoring sessions and queries
  • Transport Protocols: Support for both binary Thrift and HTTP transport modes (see the sketch after this list)
  • Authentication: Kerberos, SPNEGO, and custom authentication provider support
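
A minimal sketch of selecting the HTTP transport, assuming Hive configuration keys set on the SQL context before startup propagate to the embedded server (the documented route is passing the same keys as --hiveconf to sbin/start-thriftserver.sh); the port value is illustrative.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val spark = SparkSession.builder()
  .appName("thriftserver-http")
  .enableHiveSupport()
  .getOrCreate()

// "binary" is the default transport; "http" enables the HTTP transport
spark.sqlContext.setConf("hive.server2.transport.mode", "http")
spark.sqlContext.setConf("hive.server2.thrift.http.port", "10001")

HiveThriftServer2.startWithContext(spark.sqlContext)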

Capabilities

Server Management

Core server lifecycle management and initialization with Spark SQL integration.

object HiveThriftServer2 {
  def startWithContext(sqlContext: SQLContext): HiveThriftServer2
  def main(args: Array[String]): Unit
  
  // Note: ExecutionState is private[thriftserver] - not part of public API
  private[thriftserver] object ExecutionState extends Enumeration {
    val STARTED, COMPILED, CANCELED, TIMEDOUT, FAILED, FINISHED, CLOSED = Value
  }
}
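
A minimal start/stop sketch, assuming an existing SparkSession; the temp view name is illustrative. Because startWithContext returns the running server instance, an embedding application can shut it down explicitly.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val spark = SparkSession.builder()
  .appName("embedded-thriftserver")
  .config("spark.sql.hive.thriftServer.singleSession", "true") // share temp views with clients
  .enableHiveSupport()
  .getOrCreate()

// With singleSession enabled, views registered here are visible over JDBC/ODBC
spark.range(10).createOrReplaceTempView("numbers")

val server = HiveThriftServer2.startWithContext(spark.sqlContext)
// ... serve clients ...
server.stop() // lifecycle method inherited from HiveServer2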

Full reference: docs/server-management.md

CLI Services

Comprehensive CLI service implementation providing HiveServer2 compatibility with Spark SQL enhancements.

class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLContext) extends CLIService(hiveServer) {
  override def init(hiveConf: HiveConf): Unit
  override def start(): Unit
  override def getInfo(sessionHandle: SessionHandle, getInfoType: GetInfoType): GetInfoValue
}
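
For illustration, a hedged sketch of querying server information once a session is open; CLI_DBMS_NAME and CLI_DBMS_VER are standard GetInfoType values, and Spark's getInfo override reports the Spark SQL engine rather than Hive. These classes are internal to Spark, so this is illustrative rather than typical application code.

import org.apache.hive.service.cli.{GetInfoType, SessionHandle}
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIService

def describeServer(service: SparkSQLCLIService, session: SessionHandle): Unit = {
  val name    = service.getInfo(session, GetInfoType.CLI_DBMS_NAME).getStringValue
  val version = service.getInfo(session, GetInfoType.CLI_DBMS_VER).getStringValue
  println(s"$name $version") // e.g. "Spark SQL 3.5.6"
}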

Full reference: docs/cli-services.md

Session Management

Client session management with SQL context association and configuration handling.

class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext: SQLContext) extends SessionManager(hiveServer) {
  override def openSession(
    protocol: TProtocolVersion,
    username: String,
    passwd: String,
    ipAddress: String,
    sessionConf: java.util.Map[String, String],
    withImpersonation: Boolean,
    delegationToken: String
  ): SessionHandle
  override def closeSession(sessionHandle: SessionHandle): Unit
  def setConfMap(conf: SQLContext, confMap: java.util.Map[String, String]): Unit
}
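
A hedged sketch of the session lifecycle, assuming an initialized manager; the protocol constant and the package location of TProtocolVersion match the Hive 2.3.x line that Spark 3.5 builds against.

import java.util.Collections
import org.apache.hive.service.rpc.thrift.TProtocolVersion
import org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager

def openAndClose(manager: SparkSQLSessionManager): Unit = {
  val handle = manager.openSession(
    TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V8, // negotiated by real clients
    "user",
    "",                                     // password; unused without authentication
    "127.0.0.1",
    Collections.emptyMap[String, String](), // per-session configuration overrides
    false,                                  // withImpersonation
    null                                    // delegationToken
  )
  manager.closeSession(handle)
}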

Full reference: docs/session-management.md

Operation Management

Manages SQL operations and metadata operations with session context mapping.

class SparkSQLOperationManager extends OperationManager {
  val sessionToContexts: ConcurrentHashMap[SessionHandle, SQLContext]
  
  override def newExecuteStatementOperation(
    parentSession: HiveSession,
    statement: String,
    confOverlay: java.util.Map[String, String],
    async: Boolean,
    queryTimeout: Long
  ): ExecuteStatementOperation
  
  override def newGetTablesOperation(
    parentSession: HiveSession,
    catalogName: String,
    schemaName: String,
    tableName: String,
    tableTypes: java.util.List[String]
  ): MetadataOperation
  
  override def newGetColumnsOperation(
    parentSession: HiveSession,
    catalogName: String,
    schemaName: String,
    tableName: String,
    columnName: String
  ): GetColumnsOperation
  
  override def newGetSchemasOperation(
    parentSession: HiveSession,
    catalogName: String,
    schemaName: String
  ): GetSchemasOperation
  
  override def newGetFunctionsOperation(
    parentSession: HiveSession,
    catalogName: String,
    schemaName: String,
    functionName: String
  ): GetFunctionsOperation
  
  override def newGetTypeInfoOperation(parentSession: HiveSession): GetTypeInfoOperation
  override def newGetCatalogsOperation(parentSession: HiveSession): GetCatalogsOperation
  override def newGetTableTypesOperation(parentSession: HiveSession): GetTableTypesOperation
}
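
The sessionToContexts map is what binds each session to the SQLContext its operations run against; the session manager populates it when a session opens. A hedged sketch (the HiveSession is assumed to come from an already open session, and the statement is illustrative):

import java.util.Collections
import org.apache.hive.service.cli.session.HiveSession
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager

def runStatement(manager: SparkSQLOperationManager,
                 session: HiveSession,
                 ctx: SQLContext): Unit = {
  // Bind the session to its SQLContext, as SparkSQLSessionManager does on openSession
  manager.sessionToContexts.put(session.getSessionHandle, ctx)

  val op = manager.newExecuteStatementOperation(
    session,
    "SELECT count(*) FROM numbers",
    Collections.emptyMap[String, String](), // no per-query conf overlay
    true,                                   // async execution
    0L                                      // no query timeout
  )
  op.run() // drives the operation through its state machine
}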

Full reference: docs/operation-management.md

SQL Execution

SQL statement execution with Spark SQL engine integration and result handling.

class SparkExecuteStatementOperation {
  def getNextRowSet(order: FetchOrientation, maxRowsL: Long): TRowSet
  def getResultSetSchema: TTableSchema
  def runInternal(): Unit
  def cancel(): Unit
  def timeoutCancel(): Unit
}

class SparkSQLDriver(context: SQLContext) extends Driver {
  override def init(): Unit
  override def run(command: String): CommandProcessorResponse
  override def close(): Int
  override def getResults(res: JList[_]): Boolean
  override def getSchema: Schema
  override def destroy(): Unit
}
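
A hedged sketch of the driver round-trip: initialize, run a command, then drain the formatted results. SparkSQLDriver is internal to Spark, so this mirrors how the CLI uses it rather than typical application code.

import java.util.{ArrayList => JArrayList}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.thriftserver.SparkSQLDriver

def runCommand(ctx: SQLContext, command: String): Unit = {
  val driver = new SparkSQLDriver(ctx)
  driver.init()
  val response = driver.run(command)
  if (response.getResponseCode == 0) {
    val rows = new JArrayList[String]()
    driver.getResults(rows) // fills `rows` with formatted result lines
    rows.forEach(row => println(row))
  }
  driver.close()
}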

Full reference: docs/sql-execution.md

Metadata Operations

Comprehensive metadata operations for catalogs, schemas, tables, columns, functions, and type information.

interface ICLIService {
  OperationHandle getCatalogs(SessionHandle sessionHandle);
  OperationHandle getSchemas(SessionHandle sessionHandle, String catalogName, String schemaName);
  OperationHandle getTables(SessionHandle sessionHandle, String catalogName, String schemaName, String tableName, List<String> tableTypes);
  OperationHandle getColumns(SessionHandle sessionHandle, String catalogName, String schemaName, String tableName, String columnName);
  OperationHandle getFunctions(SessionHandle sessionHandle, String catalogName, String schemaName, String functionName);
  OperationHandle getTypeInfo(SessionHandle sessionHandle);
}
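
On the client side, the same operations surface through standard JDBC metadata calls; for example, DatabaseMetaData.getTables maps to the server's getTables operation. A sketch against a locally running server:

import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val meta = conn.getMetaData

// Served by the getTables metadata operation
val tables = meta.getTables(null, "default", "%", Array("TABLE", "VIEW"))
while (tables.next()) {
  println(tables.getString("TABLE_NAME"))
}
conn.close()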

Full reference: docs/metadata-operations.md

CLI Driver

Interactive command-line interface with SQL completion, history, and signal handling.

object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
  def printUsage(): Unit
}

class SparkSQLCLIDriver {
  def processCmd(cmd: String): Int
  def processLine(line: String, allowInterrupting: Boolean): Int
  def printMasterAndAppId(): Unit
}
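
For non-interactive use, the driver accepts the standard Hive CLI options, so a one-off statement can be run with -e:

import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

// Equivalent to: bin/spark-sql -e "SHOW TABLES"
SparkSQLCLIDriver.main(Array("-e", "SHOW TABLES"))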

Full reference: docs/cli-driver.md

Web UI Integration

Spark Web UI integration for monitoring thrift server sessions, queries, and performance metrics.

class ThriftServerTab {
  def detach(): Unit
}

class HiveThriftServer2Listener {
  // Event listener for UI display and metrics collection
}
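
The listener feeds the "JDBC/ODBC Server" tab of the Spark Web UI; how much finished history the tab retains is bounded by two documented properties, sketched here:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.thriftserver.ui.retainedSessions", "200")   // finished sessions kept
  .config("spark.sql.thriftserver.ui.retainedStatements", "200") // finished statements kept
  .enableHiveSupport()
  .getOrCreate()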

Full reference: docs/web-ui.md

Types

Core Handle Types

class SessionHandle extends Handle {
  // Identifies client sessions
}

class OperationHandle extends Handle {
  // Identifies operations (queries, metadata calls)
}

abstract class Handle {
  HandleIdentifier getHandleIdentifier()
}

Operation Types

enum OperationType {
  EXECUTE_STATEMENT,
  GET_TYPE_INFO,
  GET_CATALOGS,
  GET_SCHEMAS,
  GET_TABLES,
  GET_COLUMNS,
  GET_FUNCTIONS,
  GET_PRIMARY_KEYS,
  GET_CROSS_REFERENCE
}

enum OperationState {
  INITIALIZED,
  RUNNING,
  FINISHED,
  CANCELED,
  CLOSED,
  ERROR,
  UNKNOWN
}

Data Transfer Types

abstract class RowSet {
  // Base class for result sets
}

class RowBasedSet extends RowSet {
  // Row-based result set implementation
}

class ColumnBasedSet extends RowSet {
  // Column-based result set implementation
}

class TableSchema {
  List<ColumnDescriptor> getColumns()
}

class ColumnDescriptor {
  String getName()
  TypeDescriptor getTypeDescriptor()
  String getComment()
}

Configuration Types

enum FetchOrientation {
  FETCH_NEXT,
  FETCH_PRIOR,
  FETCH_RELATIVE,
  FETCH_ABSOLUTE,
  FETCH_FIRST,
  FETCH_LAST
}

enum FetchType {
  QUERY_OUTPUT,
  LOG
}

class GetInfoValue {
  String getStringValue()
  short getShortValue()
  int getIntValue()
  long getLongValue()
}