# CLI Interface

The Spark SQL CLI driver provides an interactive command-line interface for executing SQL queries directly against Spark SQL, similar to the Hive CLI but backed by Spark's execution engine.

## CLI Driver

### SparkSQLCLIDriver

Main CLI driver object that provides interactive SQL session management with Spark SQL integration.

```scala { .api }
private[hive] object SparkSQLCLIDriver extends Logging {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}
```

#### main

Entry point for interactive CLI sessions. Processes command-line arguments and starts an interactive SQL shell.

**Usage Example:**

```bash
# Start interactive CLI
$SPARK_HOME/bin/spark-sql

# With specific options
$SPARK_HOME/bin/spark-sql --master local[4] --conf spark.sql.warehouse.dir=/tmp/warehouse
```

The main method performs the following initialization:

1. **Argument Processing**: Uses Hive's `OptionsProcessor` to parse command-line options
2. **Configuration Setup**: Merges Spark configuration with Hadoop and Hive settings
3. **Session State**: Creates CLI session state with the merged configuration
4. **Environment Setup**: Initializes the Spark SQL environment
5. **CLI Loop**: Starts interactive command processing

#### Configuration Merging

The CLI automatically merges configuration from multiple sources:

```scala
val sparkConf = new SparkConf(loadDefaults = true)
val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
val extraConfigs = HiveUtils.formatTimeVarsForHiveClient(hadoopConf)

val cliConf = new HiveConf(classOf[SessionState])
(hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue)
  ++ sparkConf.getAll.toMap ++ extraConfigs).foreach {
  case (k, v) => cliConf.set(k, v)
}
```

This ensures CLI sessions have access to:
- Spark configuration properties
- Hadoop cluster settings
- Hive compatibility configurations
- User-specified overrides

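The `++` merge above applies sources left to right, so later sources win on key collisions: Spark settings override Hadoop values, and the Hive extras override both. A minimal sketch of that precedence using plain maps (the keys and values here are illustrative, not the real `Configuration`/`HiveConf` types):

```scala
// Illustrative stand-ins for the three configuration sources
// (not the real Hadoop/Spark/Hive configuration types).
val hadoopProps = Map("fs.defaultFS" -> "hdfs://cluster", "io.file.buffer.size" -> "4096")
val sparkProps  = Map("io.file.buffer.size" -> "65536", "spark.master" -> "local[4]")
val extraProps  = Map("hive.exec.scratchdir" -> "/tmp/hive")

// ++ keeps the right-hand value on duplicate keys, matching the
// left-to-right precedence of the CLI merge: Spark settings override
// Hadoop defaults, and the Hive extras override both.
val merged = hadoopProps ++ sparkProps ++ extraProps

assert(merged("io.file.buffer.size") == "65536") // Spark value wins
```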
#### installSignalHandler

Installs interrupt handlers for graceful query cancellation during interactive sessions.

**Signal Handling:**

```scala
def installSignalHandler(): Unit = {
  HiveInterruptUtils.add(new HiveInterruptCallback {
    override def interrupt(): Unit = {
      // Handle remote execution mode
      if (SparkSQLEnv.sparkContext != null) {
        SparkSQLEnv.sparkContext.cancelAllJobs()
      } else {
        if (transport != null) {
          // Force closing of TCP connection upon session termination
          transport.getSocket.close()
        }
      }
    }
  })
}
```

When users press Ctrl+C during query execution:
1. **Local Mode**: Cancels all running Spark jobs
2. **Remote Mode**: Closes the TCP transport connection to the server
3. **Cleanup**: Ensures resources are properly released

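The `HiveInterruptUtils` mechanism is essentially a callback registry: handlers are registered once and all fire on interrupt. A self-contained sketch of that pattern (the `InterruptRegistry` name is a hypothetical stand-in, not a Hive class):

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the HiveInterruptUtils callback registry.
class InterruptRegistry {
  private val callbacks = ArrayBuffer.empty[() => Unit]
  def add(cb: () => Unit): Unit = callbacks += cb
  def interrupt(): Unit = callbacks.foreach(_.apply())
}

val registry = new InterruptRegistry
val jobsCancelled = new AtomicBoolean(false)

// Register a cancellation handler, as installSignalHandler does
// via HiveInterruptUtils.add.
registry.add(() => jobsCancelled.set(true))

// Simulate Ctrl+C: every registered callback fires.
registry.interrupt()
assert(jobsCancelled.get)
```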
## CLI Session Management

### Session State Integration

The CLI integrates with Hive's session state management while adding Spark-specific enhancements:

```scala
val sessionState = new CliSessionState(cliConf)
```

**Session Features:**
- **Command History**: Persistent command history across sessions
- **Variable Management**: Set and get session variables and configuration
- **Database Context**: Current database and catalog management
- **Query Results**: Formatted output with configurable display options

### Interactive Commands

The CLI supports standard HiveQL commands plus Spark SQL extensions:

**Database Operations:**
```sql
-- Show databases
SHOW DATABASES;

-- Use database
USE my_database;

-- Show tables
SHOW TABLES;
```

**Configuration Management:**
```sql
-- Set configuration
SET spark.sql.adaptive.enabled=true;

-- Show configuration
SET spark.sql.adaptive.enabled;

-- Show all configuration
SET;
```

**Query Execution:**
```sql
-- Standard SQL queries
SELECT * FROM my_table WHERE condition = 'value';

-- Spark SQL specific features
SELECT explode(array_column) FROM my_table;
```

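The three `SET` forms above (assign a value, inspect one key, list everything) can be told apart by a simple dispatch on the command text. A hedged sketch of that dispatch logic, for illustration only (this is not the actual CLI parser):

```scala
// Illustrative model of the three SET command forms.
sealed trait SetCommand
case class Assign(key: String, value: String) extends SetCommand
case class Show(key: String) extends SetCommand
case object ShowAll extends SetCommand

// Hypothetical parser for the SET syntax shown above.
def parseSet(command: String): SetCommand = {
  val body = command.stripPrefix("SET").stripSuffix(";").trim
  if (body.isEmpty) ShowAll
  else body.split("=", 2) match {
    case Array(k, v) => Assign(k.trim, v.trim)
    case Array(k)    => Show(k.trim)
  }
}

assert(parseSet("SET spark.sql.adaptive.enabled=true;") ==
  Assign("spark.sql.adaptive.enabled", "true"))
assert(parseSet("SET spark.sql.adaptive.enabled;") == Show("spark.sql.adaptive.enabled"))
assert(parseSet("SET;") == ShowAll)
```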
### Authentication and Security

The CLI supports the same authentication mechanisms as the Thrift Server:

#### Kerberos Authentication

```bash
# Obtain a Kerberos ticket before starting the CLI
kinit user@REALM.COM
$SPARK_HOME/bin/spark-sql
```

#### Configuration Properties

```properties
# Kerberos principal and keytab for delegation tokens
spark.yarn.keytab=/path/to/keytab
spark.yarn.principal=user@REALM.COM

# Hive metastore authentication
hive.metastore.sasl.enabled=true
hive.metastore.kerberos.principal=hive/_HOST@REALM.COM
```

## Connection Management

### Remote Connections

While the current implementation focuses on local CLI usage, it maintains compatibility with remote connection patterns:

```scala
private var transport: TSocket = _
```

**Connection Lifecycle:**
1. **Transport Creation**: TCP socket to the remote Thrift Server
2. **Protocol Negotiation**: Thrift protocol version agreement
3. **Authentication**: Credential exchange if security is enabled
4. **Session Establishment**: CLI session setup on the server
5. **Command Processing**: Interactive query execution
6. **Cleanup**: Proper connection and session cleanup

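Since Thrift's `TSocket` wraps a plain TCP socket, the transport-creation and cleanup steps above can be sketched with `java.net` sockets alone; the server here is a local stand-in for a remote Thrift Server, and no Thrift dependency is involved:

```scala
import java.net.{InetAddress, ServerSocket, Socket}

// Local stand-in for a remote Thrift Server endpoint (port 0 = any free port).
val server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress)

// Step 1, transport creation: open a TCP connection to the server.
val transport = new Socket(InetAddress.getLoopbackAddress, server.getLocalPort)
assert(transport.isConnected)

// Step 6, cleanup: close the connection, as the interrupt handler
// does with transport.getSocket.close() in remote mode.
transport.close()
server.close()
assert(transport.isClosed)
```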
### Error Handling

The CLI includes comprehensive error handling for common scenarios:

**Network Issues:**
- Connection timeouts and retries
- Transport layer failures
- Server unavailability

**Authentication Failures:**
- Invalid credentials
- Expired tokens
- Insufficient permissions

**Query Errors:**
- SQL parsing errors
- Execution failures
- Resource constraints

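A common mitigation for the transient network failures listed above is retry with backoff. This is a generic sketch of that pattern, not code from the CLI itself:

```scala
// Generic retry helper: re-invokes `op` until it succeeds or attempts
// run out. A production client would also sleep with exponential
// backoff between attempts (omitted here for brevity).
def retry[T](maxAttempts: Int)(op: () => T): T =
  try op() catch {
    case _: Exception if maxAttempts > 1 => retry(maxAttempts - 1)(op)
  }

// Simulated flaky connection that succeeds on the third attempt.
var attempts = 0
val result = retry(3) { () =>
  attempts += 1
  if (attempts < 3) throw new RuntimeException("connection refused")
  "connected"
}
assert(result == "connected" && attempts == 3)
```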
### Performance Considerations

**Query Result Streaming:**
- Large result sets streamed incrementally
- Configurable fetch size for memory management
- Progress indicators for long-running queries

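Incremental collection amounts to consuming results in fixed-size batches rather than materializing them all at once. A minimal sketch of the idea with a plain iterator (illustrative only; the actual mechanism is controlled by `spark.sql.thriftServer.incrementalCollect`):

```scala
// A lazily generated result set standing in for a large query result.
val rows = Iterator.range(0, 10000).map(i => s"row-$i")

// Consume in fetch-size batches, as an incremental collector would,
// holding at most `fetchSize` rows in memory at a time.
val fetchSize = 1000
var batches = 0
rows.grouped(fetchSize).foreach { batch =>
  batches += 1
  assert(batch.size <= fetchSize)
}
assert(batches == 10)
```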
**Resource Management:**
- Automatic cleanup of temporary resources
- Connection pooling for multiple sessions
- Memory-efficient result processing

**Configuration Tuning:**
```properties
# Stream results incrementally rather than collecting them all at once
spark.sql.thriftServer.incrementalCollect=true

# UI retention limits
spark.sql.thriftServer.ui.retainedSessions=200
spark.sql.thriftServer.ui.retainedStatements=1000
```