CLI Interface

The Spark SQL CLI driver provides an interactive command-line interface for executing SQL queries directly against Spark SQL, analogous to the legacy Hive CLI tool but backed by Spark's execution engine and extended SQL capabilities.

CLI Driver

SparkSQLCLIDriver

Main CLI driver object that provides interactive SQL session management with Spark SQL integration.

private[hive] object SparkSQLCLIDriver extends Logging {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}

main

Entry point for interactive CLI sessions. Processes command-line arguments and starts an interactive SQL shell.

Usage Example:

# Start interactive CLI
$SPARK_HOME/bin/spark-sql

# With specific options
$SPARK_HOME/bin/spark-sql --master local[4] --conf spark.sql.warehouse.dir=/tmp/warehouse

The main method performs the following initialization:

  1. Argument Processing: Uses Hive's OptionsProcessor to parse command-line options
  2. Configuration Setup: Merges Spark configuration with Hadoop and Hive settings
  3. Session State: Creates CLI session state with merged configuration
  4. Environment Setup: Initializes Spark SQL environment
  5. CLI Loop: Starts interactive command processing
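
Reduced to a hedged, much-simplified sketch (the real main also wires prompts, logging, and remote-mode handling, and SparkSQLEnv and SparkSQLCLIDriver are package-private to Spark's hive-thriftserver module):

import org.apache.hadoop.hive.cli.{CliSessionState, OptionsProcessor}
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

def mainSketch(args: Array[String]): Unit = {
  installSignalHandler()                            // register Ctrl+C handling up front
  val oproc = new OptionsProcessor()                // 1. argument processing
  if (!oproc.process_stage1(args)) System.exit(1)

  val cliConf = new HiveConf(classOf[SessionState]) // 2. configuration merging (see below)
  val sessionState = new CliSessionState(cliConf)   // 3. session state
  SessionState.start(sessionState)

  SparkSQLEnv.init()                                // 4. Spark SQL environment
  val cli = new SparkSQLCLIDriver                   // 5. interactive CLI loop
  // ... read input lines and dispatch via cli.processLine(...) until the user exits
}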

Configuration Merging

The CLI automatically merges configuration from multiple sources:

import scala.collection.JavaConverters._

val sparkConf = new SparkConf(loadDefaults = true)
val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
val extraConfigs = HiveUtils.formatTimeVarsForHiveClient(hadoopConf)

val cliConf = new HiveConf(classOf[SessionState])
(hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue)
  ++ sparkConf.getAll.toMap ++ extraConfigs).foreach {
  case (k, v) => cliConf.set(k, v)
}

This ensures CLI sessions have access to:

  • Spark configuration properties
  • Hadoop cluster settings
  • Hive compatibility configurations
  • User-specified overrides
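
Because keys are applied in order, later sources overwrite earlier ones: Spark properties win over Hadoop settings with the same key, and the Hive time variables win over both. A minimal sketch of that last-write-wins behavior (the key and values below are illustrative):

import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

val cliConf = new HiveConf(classOf[SessionState])
cliConf.set("spark.sql.warehouse.dir", "/hadoop/derived/default") // as if from hadoopConf
cliConf.set("spark.sql.warehouse.dir", "/tmp/warehouse")          // as if from sparkConf, applied later
assert(cliConf.get("spark.sql.warehouse.dir") == "/tmp/warehouse") // last write wins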

installSignalHandler

Installs interrupt handlers for graceful query cancellation during interactive sessions.

Signal Handling:

def installSignalHandler(): Unit = {
  HiveInterruptUtils.add(new HiveInterruptCallback {
    override def interrupt(): Unit = {
      // Handle remote execution mode
      if (SparkSQLEnv.sparkContext != null) {
        SparkSQLEnv.sparkContext.cancelAllJobs()
      } else {
        if (transport != null) {
          // Force closing of TCP connection upon session termination
          transport.getSocket.close()
        }
      }
    }
  })
}

When users press Ctrl+C during query execution:

  1. Local Mode: Cancels all running Spark jobs
  2. Remote Mode: Closes TCP transport connection to server
  3. Cleanup: Ensures resources are properly released
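
The local-mode behavior can be reproduced outside the CLI. A hedged illustration of cancelAllJobs interrupting a running action (the query size and sleep are illustrative, not CLI code):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("cancel-demo").getOrCreate()
val slowQuery = spark.range(10000000000L).selectExpr("sum(id)")

// Simulate Ctrl+C: cancel all jobs from another thread while collect() is running
new Thread(() => { Thread.sleep(500); spark.sparkContext.cancelAllJobs() }).start()

try slowQuery.collect()
catch { case e: Exception => println(s"Query cancelled: ${e.getMessage}") }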

CLI Session Management

Session State Integration

The CLI integrates with Hive's session state management while adding Spark-specific enhancements:

val sessionState = new CliSessionState(cliConf)

Session Features:

  • Command History: Persistent command history across sessions
  • Variable Management: Set/get session variables and configuration
  • Database Context: Current database and catalog management
  • Query Results: Formatted output with configurable display options
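
A minimal sketch of standing up this session state directly, assuming the Hive classes are on the classpath (the CLI performs the equivalent internally):

import org.apache.hadoop.hive.cli.CliSessionState
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

val sessionState = new CliSessionState(new HiveConf(classOf[SessionState]))
SessionState.start(sessionState)               // bind the session to the current thread
println(SessionState.get().getCurrentDatabase) // database context, "default" initially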

Interactive Commands

The CLI supports standard HiveQL commands plus Spark SQL extensions:

Database Operations:

-- Show databases
SHOW DATABASES;

-- Use database  
USE my_database;

-- Show tables
SHOW TABLES;

Configuration Management:

-- Set configuration
SET spark.sql.adaptive.enabled=true;

-- Show configuration
SET spark.sql.adaptive.enabled;

-- Show all configuration
SET;

Query Execution:

-- Standard SQL queries
SELECT * FROM my_table WHERE condition = 'value';

-- Spark SQL specific features
SELECT explode(array_column) FROM my_table;

Authentication and Security

The CLI supports the same authentication mechanisms as the Thrift Server:

Kerberos Authentication

# Kinit before starting CLI
kinit user@REALM.COM
$SPARK_HOME/bin/spark-sql

Configuration Properties

# Kerberos keytab and principal for delegation tokens
spark.yarn.keytab=/path/to/keytab
spark.yarn.principal=user@REALM.COM

# Hive metastore authentication
hive.metastore.sasl.enabled=true
hive.metastore.kerberos.principal=hive/_HOST@REALM.COM
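
For programmatic use, Hadoop's UserGroupInformation provides a keytab-based alternative to running kinit first (a hedged sketch; the principal and keytab path are placeholders):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration()
conf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(conf)
// Log in from a keytab instead of relying on an existing ticket cache
UserGroupInformation.loginUserFromKeytab("user@REALM.COM", "/path/to/keytab")
println(UserGroupInformation.getLoginUser)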

Connection Management

Remote Connections

While the current implementation focuses on local CLI usage, it maintains compatibility with remote connection patterns:

private var transport: TSocket = _

Connection Lifecycle:

  1. Transport Creation: TCP socket to remote Thrift Server
  2. Protocol Negotiation: Thrift protocol version agreement
  3. Authentication: Credential exchange if security enabled
  4. Session Establishment: CLI session setup on server
  5. Command Processing: Interactive query execution
  6. Cleanup: Proper connection and session cleanup
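
The shape of that lifecycle at the Thrift layer, as a hedged sketch (host and port are placeholders; the authenticated session and query steps would go through the HiveServer2 TCLIService client rather than raw Thrift calls):

import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.transport.TSocket

val transport = new TSocket("thrift-server-host", 10000) // 1. transport creation
transport.open()
val protocol = new TBinaryProtocol(transport)            // 2. protocol selection
// 3-5. authentication, session establishment, and command processing use the
//      HiveServer2 TCLIService client generated from Hive's Thrift IDL
transport.close()                                        // 6. cleanup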

Error Handling

The CLI includes comprehensive error handling for common scenarios:

Network Issues:

  • Connection timeouts and retries
  • Transport layer failures
  • Server unavailability

Authentication Failures:

  • Invalid credentials
  • Expired tokens
  • Insufficient permissions

Query Errors:

  • SQL parsing errors
  • Execution failures
  • Resource constraints
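
A hedged sketch of distinguishing these classes of failure when driving queries programmatically (AnalysisException covers parsing and analysis errors; execution failures typically surface as SparkException):

import org.apache.spark.SparkException
import org.apache.spark.sql.{AnalysisException, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
try {
  spark.sql("SELECT * FROM missing_table").show()
} catch {
  case e: AnalysisException => println(s"Parse/analysis error: ${e.getMessage}") // SQL parsing errors
  case e: SparkException    => println(s"Execution failure: ${e.getMessage}")    // runtime failures
}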

Performance Considerations

Query Result Streaming:

  • Large result sets streamed incrementally
  • Configurable fetch size for memory management
  • Progress indicators for long-running queries

Resource Management:

  • Automatic cleanup of temporary resources
  • Connection pooling for multiple sessions
  • Memory-efficient result processing

Configuration Tuning:

# Stream large results incrementally instead of collecting them on the driver
spark.sql.thriftServer.incrementalCollect=true

# UI retention limits
spark.sql.thriftServer.ui.retainedSessions=200
spark.sql.thriftServer.ui.retainedStatements=1000