Environment Management

Environment management handles the initialization, configuration, and lifecycle of the shared Spark SQL environment for the Thrift Server, ensuring proper resource allocation and cleanup.

Environment Controller

SparkSQLEnv

Singleton environment manager that provides centralized Spark context and SQL context management.

private[hive] object SparkSQLEnv extends Logging {
  var sqlContext: SQLContext = _
  var sparkContext: SparkContext = _

  def init(): Unit
  def stop(): Unit
}

Environment Initialization

The init method creates and configures the Spark environment for Thrift Server operations:

Usage Example:

import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv

// Initialize environment (typically called by server startup)
SparkSQLEnv.init()

// Environment is now available
val sqlContext = SparkSQLEnv.sqlContext
val sparkContext = SparkSQLEnv.sparkContext

Initialization Process:

  1. Configuration Setup: Loads Spark configuration with defaults
  2. Application Naming: Sets appropriate application name
  3. Spark Session Creation: Creates Spark session with Hive support
  4. Context Assignment: Assigns Spark and SQL contexts to singleton
  5. Session State Initialization: Forces session state initialization
  6. Hive Integration: Configures Hive metastore integration
  7. Version Configuration: Sets Hive version compatibility
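
Condensed from the code excerpts in the following sections (and omitting the class-name filtering and Hive client stream wiring), the initialization roughly takes this shape; this is a sketch, not the exact implementation:

def init(): Unit = {
  if (sqlContext == null) {
    // 1-2. Load defaults and resolve the application name
    val sparkConf = new SparkConf(loadDefaults = true)
    sparkConf.setAppName(
      sparkConf.getOption("spark.app.name")
        .getOrElse(s"SparkSQL::${Utils.localHostName()}"))

    // 3. Create the Spark session with Hive support
    val sparkSession = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()

    // 4. Assign the shared contexts to the singleton
    sparkContext = sparkSession.sparkContext
    sqlContext = sparkSession.sqlContext

    // 5. Force session state initialization with the Spark class loader (SPARK-29604)
    sparkSession.sessionState

    // 6-7. Hive metastore client wiring (see "Hive Integration Setup") and version compatibility
    sparkSession.conf.set(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
  }
}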

Configuration Management

The environment automatically handles configuration from multiple sources:

val sparkConf = new SparkConf(loadDefaults = true)

// Application name resolution
val maybeAppName = sparkConf
  .getOption("spark.app.name")
  .filterNot(_ == classOf[SparkSQLCLIDriver].getName)
  .filterNot(_ == classOf[HiveThriftServer2].getName)

sparkConf.setAppName(maybeAppName.getOrElse(s"SparkSQL::${Utils.localHostName()}"))

Configuration Sources:

  • Default Configuration: System-wide Spark defaults
  • User Configuration: Spark configuration files and system properties
  • Application Overrides: Thrift Server specific settings
  • Runtime Parameters: Command-line and programmatic overrides
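
As an illustration of how a user-level setting reaches the environment, any JVM system property prefixed with spark. (or an entry in spark-defaults.conf) is picked up because defaults are loaded; the property and value below are purely illustrative:

// Illustrative override supplied before init runs,
// e.g. via -Dspark.sql.shuffle.partitions=64 on the JVM command line
System.setProperty("spark.sql.shuffle.partitions", "64")

SparkSQLEnv.init()

// The value is now visible through the shared SQL context
assert(SparkSQLEnv.sqlContext.getConf("spark.sql.shuffle.partitions") == "64")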

Hive Integration Setup

The environment ensures proper Hive integration for SQL compatibility:

val sparkSession = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()

// Force session state initialization with correct class loader
sparkSession.sessionState

// Configure Hive metastore client
val metadataHive = sparkSession
  .sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
metadataHive.setOut(new PrintStream(System.out, true, "UTF-8"))
metadataHive.setInfo(new PrintStream(System.err, true, "UTF-8"))  
metadataHive.setError(new PrintStream(System.err, true, "UTF-8"))

// Set Hive version compatibility
sparkSession.conf.set(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)

Hive Integration Features:

  • Metastore Access: Full access to Hive metastore for table metadata
  • UDF Support: Hive user-defined functions available in SQL
  • SerDe Support: Hive serialization/deserialization formats
  • Compatibility: Maintains compatibility with existing Hive queries
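
Once initialized, metastore-backed objects are reachable through the shared contexts. A minimal illustration (the table name is a placeholder):

SparkSQLEnv.init()

// List databases registered in the Hive metastore
SparkSQLEnv.sqlContext.sql("SHOW DATABASES").show()

// Run a HiveQL query against an existing metastore table
SparkSQLEnv.sqlContext.sql("SELECT COUNT(*) FROM default.example_table").show()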

Environment Cleanup

The stop method provides comprehensive cleanup of all resources:

def stop(): Unit = {
  logDebug("Shutting down Spark SQL Environment")
  // Stop the SparkContext  
  if (SparkSQLEnv.sparkContext != null) {
    sparkContext.stop()
    sparkContext = null
    sqlContext = null
  }
}

Cleanup Process:

  1. Context Shutdown: Stops Spark context and releases cluster resources
  2. Variable Reset: Clears singleton references to prevent memory leaks
  3. Resource Release: Ensures all system resources are properly released
  4. Logging: Records shutdown events for debugging
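
When embedding the environment outside the server (for example in tests or tooling), pairing init with stop in a try/finally preserves these cleanup guarantees; a minimal sketch:

SparkSQLEnv.init()
try {
  // Use the shared contexts while the environment is up
  SparkSQLEnv.sqlContext.sql("SELECT 1").show()
} finally {
  // Always stop the SparkContext and clear the singleton references
  SparkSQLEnv.stop()
}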

Application Naming

Dynamic Name Resolution

The environment resolves the application name based on how the server was started:

val maybeAppName = sparkConf
  .getOption("spark.app.name")
  .filterNot(_ == classOf[SparkSQLCLIDriver].getName)
  .filterNot(_ == classOf[HiveThriftServer2].getName)

sparkConf.setAppName(maybeAppName.getOrElse(s"SparkSQL::${Utils.localHostName()}"))

Naming Strategy:

  • User-Specified: Uses explicitly configured application name
  • Filtered Names: Excludes default class names for cleaner identification
  • Dynamic Default: Generates name based on hostname for uniqueness

Application Name Examples:

  • Custom: "MyThriftServerApp"
  • Default: "SparkSQL::worker-node-01"
  • CLI Mode: "SparkSQL::dev-machine"

Resource Management

Context Lifecycle

The environment manages the complete lifecycle of Spark contexts:

Initialization Phase:

  • Configuration validation and merging
  • Resource allocation and cluster connection
  • Service registration and startup
  • Integration component setup

Runtime Phase:

  • Context sharing across sessions
  • Resource monitoring and management
  • Configuration updates and refreshes
  • Performance optimization

Shutdown Phase:

  • Graceful service shutdown
  • Resource deallocation and cleanup
  • Cluster disconnection
  • Memory and handle cleanup
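
To tie the shutdown phase to JVM exit, stop can be invoked from a shutdown hook; a minimal sketch using the standard Scala hook (the server itself registers a comparable hook through Spark's internal shutdown-hook manager):

// Ensure the environment is torn down even on abrupt JVM exit
sys.addShutdownHook {
  SparkSQLEnv.stop()
}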

Session State Management

Critical session state initialization ensures proper operation:

// SPARK-29604: force initialization of the session state with the Spark class loader,
// instead of having it happen during the initialization of the Hive client (which may use a
// different class loader).
sparkSession.sessionState

This prevents class loading issues that can occur when Hive clients use different class loaders.

Configuration Integration

Multi-Source Configuration

The environment integrates configuration from various sources:

Configuration Hierarchy:

  1. System Defaults: Built-in Spark and Hive defaults
  2. Configuration Files: spark-defaults.conf, hive-site.xml
  3. Environment Variables: SPARK_* environment variables
  4. Command Line: Runtime parameters and overrides
  5. Programmatic: API-specified configuration values
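
Later sources take precedence over earlier ones; for example, a programmatic setting overrides whatever was loaded from files or system properties (the key and values here are illustrative):

val conf = new SparkConf(loadDefaults = true)   // picks up spark-defaults.conf and -D properties
  .set("spark.sql.shuffle.partitions", "128")   // programmatic override wins

// The effective value is the last one applied
assert(conf.get("spark.sql.shuffle.partitions") == "128")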

Hive Compatibility Settings

Specific configuration ensures Hive compatibility:

sparkSession.conf.set(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)

Compatibility Features:

  • Version Emulation: Reports compatible Hive version to clients
  • Metadata Compatibility: Ensures metastore schema compatibility
  • Query Compatibility: Maintains HiveQL query behavior
  • Function Compatibility: Preserves Hive function semantics
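
Clients can read the reported version back after initialization; assuming FAKE_HIVE_VERSION resolves to the spark.sql.hive.version key, a quick check looks like this:

// The emulated Hive version is exposed as a SQL configuration value
val reported = SparkSQLEnv.sqlContext.getConf("spark.sql.hive.version")
println(s"Reported Hive version: $reported")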

Integration Points

Cluster Integration

The environment handles various cluster deployment modes:

Local Mode:

val conf = new SparkConf().setMaster("local[*]")

Standalone Cluster:

val conf = new SparkConf().setMaster("spark://master:7077")

YARN Integration:

val conf = new SparkConf().setMaster("yarn").setDeployMode("cluster")

Kubernetes Support:

val conf = new SparkConf().setMaster("k8s://api-server:8443")

Security Integration

Environment initialization includes security configuration:

Authentication:

  • Kerberos principal and keytab configuration
  • Delegation token management
  • Secure cluster communication

Authorization:

  • Hive metastore authorization integration
  • Spark SQL authorization policies
  • Resource access control

Encryption:

  • Network communication encryption
  • Data-at-rest encryption configuration
  • Temporary file security
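
As an illustration of where these settings plug in, a few of the standard Spark security keys can be supplied through the same configuration sources before init runs; exact keys depend on the Spark version and deployment, and the values below are placeholders:

val conf = new SparkConf(loadDefaults = true)
  // Authentication: Kerberos identity used by the application (Spark 3.x key names)
  .set("spark.kerberos.principal", "thriftserver/_HOST@EXAMPLE.COM")
  .set("spark.kerberos.keytab", "/etc/security/keytabs/thriftserver.keytab")
  // Encryption: RPC encryption and local disk spill/shuffle encryption
  .set("spark.network.crypto.enabled", "true")
  .set("spark.io.encryption.enabled", "true")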

Monitoring Integration

The environment provides hooks for monitoring systems:

Metrics Collection:

  • JVM metrics and garbage collection
  • Spark context and executor metrics
  • SQL execution and performance metrics

Event Generation:

  • Application lifecycle events
  • Context creation and destruction events
  • Configuration change notifications

Health Checks:

  • Context availability and responsiveness
  • Resource utilization monitoring
  • Error rate and exception tracking
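
One way to consume the lifecycle events listed above is to attach a SparkListener to the shared context; a minimal sketch:

import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

SparkSQLEnv.sparkContext.addSparkListener(new SparkListener {
  override def onApplicationEnd(event: SparkListenerApplicationEnd): Unit = {
    // Forward the shutdown event to an external monitoring system
    println(s"Spark application ended at ${event.time}")
  }
})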