CLI Interface

The Spark SQL CLI driver provides an interactive command-line interface for executing SQL queries directly against Spark SQL, analogous to the legacy Hive CLI tool but backed by Spark's execution engine and extended SQL capabilities.

CLI Driver

SparkSQLCLIDriver

Main CLI driver object that provides interactive SQL session management with Spark SQL integration.

private[hive] object SparkSQLCLIDriver extends Logging {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}

main

Entry point for interactive CLI sessions. Processes command-line arguments and starts an interactive SQL shell.

Usage Example:

# Start interactive CLI
$SPARK_HOME/bin/spark-sql

# With specific options
$SPARK_HOME/bin/spark-sql --master local[4] --conf spark.sql.warehouse.dir=/tmp/warehouse

The main method performs the following initialization:

  1. Argument Processing: Uses Hive's OptionsProcessor to parse command-line options
  2. Configuration Setup: Merges Spark configuration with Hadoop and Hive settings
  3. Session State: Creates CLI session state with merged configuration
  4. Environment Setup: Initializes Spark SQL environment
  5. CLI Loop: Starts interactive command processing
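
Reduced to a hedged, much-simplified sketch (the real main also wires prompts, logging, and remote-mode handling, and SparkSQLEnv and SparkSQLCLIDriver are package-private to Spark's hive-thriftserver module):

import org.apache.hadoop.hive.cli.{CliSessionState, OptionsProcessor}
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

def mainSketch(args: Array[String]): Unit = {
  installSignalHandler()                            // register Ctrl+C handling up front
  val oproc = new OptionsProcessor()                // 1. argument processing
  if (!oproc.process_stage1(args)) System.exit(1)

  val cliConf = new HiveConf(classOf[SessionState]) // 2. configuration merging (see below)
  val sessionState = new CliSessionState(cliConf)   // 3. session state
  SessionState.start(sessionState)

  SparkSQLEnv.init()                                // 4. Spark SQL environment
  val cli = new SparkSQLCLIDriver                   // 5. interactive CLI loop
  // ... read input lines and dispatch via cli.processLine(...) until the user exits
}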

Configuration Merging

The CLI automatically merges configuration from multiple sources:

import scala.collection.JavaConverters._

val sparkConf = new SparkConf(loadDefaults = true)
val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
val extraConfigs = HiveUtils.formatTimeVarsForHiveClient(hadoopConf)

val cliConf = new HiveConf(classOf[SessionState])
(hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue)
  ++ sparkConf.getAll.toMap ++ extraConfigs).foreach {
  case (k, v) => cliConf.set(k, v)
}

This ensures CLI sessions have access to:

  • Spark configuration properties
  • Hadoop cluster settings
  • Hive compatibility configurations
  • User-specified overrides
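
Because keys are applied in order, later sources overwrite earlier ones: Spark properties win over Hadoop settings with the same key, and the Hive time variables win over both. A minimal sketch of that last-write-wins behavior (the key and values below are illustrative):

import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

val cliConf = new HiveConf(classOf[SessionState])
cliConf.set("spark.sql.warehouse.dir", "/hadoop/derived/default") // as if from hadoopConf
cliConf.set("spark.sql.warehouse.dir", "/tmp/warehouse")          // as if from sparkConf, applied later
assert(cliConf.get("spark.sql.warehouse.dir") == "/tmp/warehouse") // last write wins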

installSignalHandler

Installs interrupt handlers for graceful query cancellation during interactive sessions.

Signal Handling:

def installSignalHandler(): Unit = {
  HiveInterruptUtils.add(new HiveInterruptCallback {
    override def interrupt(): Unit = {
      // Handle remote execution mode
      if (SparkSQLEnv.sparkContext != null) {
        SparkSQLEnv.sparkContext.cancelAllJobs()
      } else {
        if (transport != null) {
          // Force closing of TCP connection upon session termination
          transport.getSocket.close()
        }
      }
    }
  })
}

When users press Ctrl+C during query execution:

  1. Local Mode: Cancels all running Spark jobs
  2. Remote Mode: Closes TCP transport connection to server
  3. Cleanup: Ensures resources are properly released
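
The local-mode behavior can be reproduced outside the CLI. A hedged illustration of cancelAllJobs interrupting a running action (the query size and sleep are illustrative, not CLI code):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("cancel-demo").getOrCreate()
val slowQuery = spark.range(10000000000L).selectExpr("sum(id)")

// Simulate Ctrl+C: cancel all jobs from another thread while collect() is running
new Thread(() => { Thread.sleep(500); spark.sparkContext.cancelAllJobs() }).start()

try slowQuery.collect()
catch { case e: Exception => println(s"Query cancelled: ${e.getMessage}") }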

CLI Session Management

Session State Integration

The CLI integrates with Hive's session state management while adding Spark-specific enhancements:

val sessionState = new CliSessionState(cliConf)

Session Features:

  • Command History: Persistent command history across sessions
  • Variable Management: Set/get session variables and configuration
  • Database Context: Current database and catalog management
  • Query Results: Formatted output with configurable display options
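
A minimal sketch of standing up this session state directly, assuming the Hive classes are on the classpath (the CLI performs the equivalent internally):

import org.apache.hadoop.hive.cli.CliSessionState
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

val sessionState = new CliSessionState(new HiveConf(classOf[SessionState]))
SessionState.start(sessionState)               // bind the session to the current thread
println(SessionState.get().getCurrentDatabase) // database context, "default" initially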

Interactive Commands

The CLI supports standard HiveQL commands plus Spark SQL extensions:

Database Operations:

-- Show databases
SHOW DATABASES;

-- Use database  
USE my_database;

-- Show tables
SHOW TABLES;

Configuration Management:

-- Set configuration
SET spark.sql.adaptive.enabled=true;

-- Show configuration
SET spark.sql.adaptive.enabled;

-- Show all configuration
SET;

Query Execution:

-- Standard SQL queries
SELECT * FROM my_table WHERE condition = 'value';

-- Spark SQL specific features
SELECT explode(array_column) FROM my_table;

Authentication and Security

The CLI supports the same authentication mechanisms as the Thrift Server:

Kerberos Authentication

# Kinit before starting CLI
kinit user@REALM.COM
$SPARK_HOME/bin/spark-sql

Configuration Properties

# Kerberos keytab and principal for delegation tokens
spark.yarn.keytab=/path/to/keytab
spark.yarn.principal=user@REALM.COM

# Hive metastore authentication
hive.metastore.sasl.enabled=true
hive.metastore.kerberos.principal=hive/_HOST@REALM.COM
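
For programmatic use, Hadoop's UserGroupInformation provides a keytab-based alternative to running kinit first (a hedged sketch; the principal and keytab path are placeholders):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration()
conf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(conf)
// Log in from a keytab instead of relying on an existing ticket cache
UserGroupInformation.loginUserFromKeytab("user@REALM.COM", "/path/to/keytab")
println(UserGroupInformation.getLoginUser)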

Connection Management

Remote Connections

While the current implementation focuses on local CLI usage, it maintains compatibility with remote connection patterns:

private var transport: TSocket = _

Connection Lifecycle:

  1. Transport Creation: TCP socket to remote Thrift Server
  2. Protocol Negotiation: Thrift protocol version agreement
  3. Authentication: Credential exchange if security enabled
  4. Session Establishment: CLI session setup on server
  5. Command Processing: Interactive query execution
  6. Cleanup: Proper connection and session cleanup
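
The shape of that lifecycle at the Thrift layer, as a hedged sketch (host and port are placeholders; the authenticated session and query steps would go through the HiveServer2 TCLIService client rather than raw Thrift calls):

import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.transport.TSocket

val transport = new TSocket("thrift-server-host", 10000) // 1. transport creation
transport.open()
val protocol = new TBinaryProtocol(transport)            // 2. protocol selection
// 3-5. authentication, session establishment, and command processing use the
//      HiveServer2 TCLIService client generated from Hive's Thrift IDL
transport.close()                                        // 6. cleanup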

Error Handling

The CLI includes comprehensive error handling for common scenarios:

Network Issues:

  • Connection timeouts and retries
  • Transport layer failures
  • Server unavailability

Authentication Failures:

  • Invalid credentials
  • Expired tokens
  • Insufficient permissions

Query Errors:

  • SQL parsing errors
  • Execution failures
  • Resource constraints
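
A hedged sketch of distinguishing these classes of failure when driving queries programmatically (AnalysisException covers parsing and analysis errors; execution failures typically surface as SparkException):

import org.apache.spark.SparkException
import org.apache.spark.sql.{AnalysisException, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
try {
  spark.sql("SELECT * FROM missing_table").show()
} catch {
  case e: AnalysisException => println(s"Parse/analysis error: ${e.getMessage}") // SQL parsing errors
  case e: SparkException    => println(s"Execution failure: ${e.getMessage}")    // runtime failures
}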

Performance Considerations

Query Result Streaming:

  • Large result sets streamed incrementally
  • Configurable fetch size for memory management
  • Progress indicators for long-running queries

Resource Management:

  • Automatic cleanup of temporary resources
  • Connection pooling for multiple sessions
  • Memory-efficient result processing

Configuration Tuning:

# Stream large results incrementally instead of collecting them on the driver
spark.sql.thriftServer.incrementalCollect=true

# UI retention limits
spark.sql.thriftServer.ui.retainedSessions=200
spark.sql.thriftServer.ui.retainedStatements=1000