The Spark SQL CLI driver provides an interactive command-line interface for executing SQL queries directly against Spark SQL. It plays a role similar to Hive's own CLI tools (the Hive CLI and Beeline), but queries are planned and executed by Spark's engine.
Main CLI driver object that provides interactive SQL session management with Spark SQL integration.
```scala
private[hive] object SparkSQLCLIDriver extends Logging {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}
```

`main` is the entry point for interactive CLI sessions. It processes command-line arguments and starts an interactive SQL shell.
Usage Example:

```bash
# Start interactive CLI
$SPARK_HOME/bin/spark-sql

# With specific options
$SPARK_HOME/bin/spark-sql --master local[4] --conf spark.sql.warehouse.dir=/tmp/warehouse
```

The `main` method performs the following initialization.
It first creates a Hive `OptionsProcessor` to parse the command-line options; a sketch of that step follows.
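A minimal sketch of stage-1 option parsing, assuming it runs inside `main` (so `args` is the `Array[String]` passed in); `OptionsProcessor` is Hive's parser class:

```scala
import org.apache.hadoop.hive.cli.OptionsProcessor

// Stage-1 parsing copies -hiveconf style arguments into system properties
// before any configuration objects are built; bail out on bad arguments.
val oproc = new OptionsProcessor()
if (!oproc.process_stage1(args)) {
  System.exit(1)
}
```

After option parsing, the CLI automatically merges configuration from multiple sources: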
```scala
import scala.collection.JavaConverters._

val sparkConf = new SparkConf(loadDefaults = true)
val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
val extraConfigs = HiveUtils.formatTimeVarsForHiveClient(hadoopConf)

val cliConf = new HiveConf(classOf[SessionState])
(hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue)
    ++ sparkConf.getAll.toMap ++ extraConfigs).foreach {
  case (k, v) => cliConf.set(k, v)
}
```

This ensures CLI sessions see the Hadoop configuration, all Spark settings, and the Hive client time variables through a single `HiveConf`.
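Because the three sources are applied in order, later entries overwrite earlier ones: Spark settings override Hadoop settings, and the Hive time variables override both. A self-contained illustration of that precedence (the key name is hypothetical):

```scala
// Later sources win when duplicate keys are applied in order, mirroring
// how the merged pairs above are written into cliConf one by one.
val hadoopSide = Map("some.key" -> "from-hadoop")
val sparkSide  = Map("some.key" -> "from-spark")
val extraSide  = Map("some.key" -> "from-extra")

val merged = collection.mutable.Map.empty[String, String]
(hadoopSide ++ sparkSide ++ extraSide).foreach { case (k, v) => merged(k) = v }
assert(merged("some.key") == "from-extra")
```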
Installs interrupt handlers for graceful query cancellation during interactive sessions.
Signal Handling:
```scala
def installSignalHandler(): Unit = {
  HiveInterruptUtils.add(new HiveInterruptCallback {
    override def interrupt(): Unit = {
      // Handle remote execution mode
      if (SparkSQLEnv.sparkContext != null) {
        SparkSQLEnv.sparkContext.cancelAllJobs()
      } else {
        if (transport != null) {
          // Force closing of TCP connection upon session termination
          transport.getSocket.close()
        }
      }
    }
  })
}
```

When users press Ctrl+C during query execution, the registered callback cancels all running Spark jobs if a `SparkContext` is active, or force-closes the remote Thrift connection otherwise; the shell itself keeps running.
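To see the cancellation primitive in isolation, here is a standalone sketch (not part of the CLI source) that launches a long-running job on a local `SparkContext` and cancels it the way the interrupt callback does:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CancelAllJobsDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("cancel-demo"))

    // A deliberately long-running job standing in for a user query.
    // (Thread from a lambda requires Scala 2.12+.)
    val worker = new Thread(() => {
      try sc.range(0L, 1000000000000L).count()
      catch { case e: Exception => println(s"Job cancelled: ${e.getMessage}") }
    })
    worker.start()

    Thread.sleep(2000)   // let the job get going
    sc.cancelAllJobs()   // what the Ctrl+C callback invokes
    worker.join()
    sc.stop()
  }
}
```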
The CLI integrates with Hive's session state management while adding Spark-specific enhancements:

```scala
val sessionState = new CliSessionState(cliConf)
```
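After construction, the session state is typically attached to the current thread via Hive's `SessionState.start`; a minimal sketch, assuming `sessionState` from the line above:

```scala
import org.apache.hadoop.hive.ql.session.SessionState

// Attach the CliSessionState to the current thread so downstream Hive
// code (current database, added resources, etc.) can look it up.
SessionState.start(sessionState)
```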
The CLI supports standard HiveQL commands plus Spark SQL extensions:
Database Operations:
```sql
-- Show databases
SHOW DATABASES;

-- Use database
USE my_database;

-- Show tables
SHOW TABLES;
```

Configuration Management:
```sql
-- Set configuration
SET spark.sql.adaptive.enabled=true;

-- Show configuration
SET spark.sql.adaptive.enabled;

-- Show all configuration
SET;
```

Query Execution:
```sql
-- Standard SQL queries
SELECT * FROM my_table WHERE condition = 'value';

-- Spark SQL specific features
SELECT explode(array_column) FROM my_table;
```

The CLI supports the same authentication mechanisms as the Thrift Server:
```bash
# Kinit before starting CLI
kinit user@REALM.COM
$SPARK_HOME/bin/spark-sql
```

```properties
# Kerberos principal and keytab for delegation tokens
spark.yarn.keytab=/path/to/keytab
spark.yarn.principal=user@REALM.COM

# Hive metastore authentication
hive.metastore.sasl.enabled=true
hive.metastore.kerberos.principal=hive/_HOST@REALM.COM
```

While the current implementation focuses on local CLI usage, it maintains compatibility with remote connection patterns:
```scala
private var transport: TSocket = _
```

Connection Lifecycle:
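A sketch of what that lifecycle could look like with Thrift's `TSocket` (the host and port below are hypothetical):

```scala
import org.apache.thrift.transport.TSocket

// Hypothetical remote session: open a Thrift socket transport, use it,
// and force-close the underlying TCP socket on teardown, mirroring the
// interrupt handler above.
val transport = new TSocket("thrift-host.example.com", 10000)
transport.open()
try {
  // ... drive the remote session over `transport` ...
} finally {
  transport.getSocket.close()
}
```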
The CLI includes error handling for common scenarios: network issues, authentication failures, and query errors.
Performance hinges on query result streaming and resource management, both tunable through configuration:

```properties
# Collect results incrementally, one partition at a time, instead of
# materializing the full result set on the driver
spark.sql.thriftServer.incrementalCollect=true

# UI retention limits
spark.sql.thriftServer.ui.retainedSessions=200
spark.sql.thriftServer.ui.retainedStatements=1000
```
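For completeness, the same knobs could be set programmatically on a `SparkConf` when embedding Spark, equivalent to passing `--conf` flags at launch (a sketch, not CLI code):

```scala
import org.apache.spark.SparkConf

// Equivalent to the properties above, applied before the context starts.
val conf = new SparkConf()
  .set("spark.sql.thriftServer.incrementalCollect", "true")
  .set("spark.sql.thriftServer.ui.retainedSessions", "200")
  .set("spark.sql.thriftServer.ui.retainedStatements", "1000")
```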