# CLI Interface

The Spark SQL CLI driver provides an interactive command-line interface for executing SQL queries directly against Spark SQL, similar to the Hive CLI but backed by Spark's execution engine.

## CLI Driver

### SparkSQLCLIDriver

Main CLI driver object that provides interactive SQL session management with Spark SQL integration.

```scala { .api }
private[hive] object SparkSQLCLIDriver extends Logging {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}
```

#### main

Entry point for interactive CLI sessions. Processes command-line arguments and starts an interactive SQL shell.

**Usage Example:**

```bash
# Start interactive CLI
$SPARK_HOME/bin/spark-sql

# With specific options
$SPARK_HOME/bin/spark-sql --master local[4] --conf spark.sql.warehouse.dir=/tmp/warehouse
```

The main method performs the following initialization:

1. **Argument Processing**: Uses Hive's `OptionsProcessor` to parse command-line options
2. **Configuration Setup**: Merges Spark configuration with Hadoop and Hive settings
3. **Session State**: Creates CLI session state with the merged configuration
4. **Environment Setup**: Initializes the Spark SQL environment
5. **CLI Loop**: Starts interactive command processing

#### Configuration Merging

The CLI automatically merges configuration from multiple sources:

```scala
val sparkConf = new SparkConf(loadDefaults = true)
val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
val extraConfigs = HiveUtils.formatTimeVarsForHiveClient(hadoopConf)

val cliConf = new HiveConf(classOf[SessionState])
(hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue)
  ++ sparkConf.getAll.toMap ++ extraConfigs).foreach {
  case (k, v) => cliConf.set(k, v)
}
```

This ensures CLI sessions have access to:
- Spark configuration properties
- Hadoop cluster settings
- Hive compatibility configurations
- User-specified overrides

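The `++` merge above applies sources left to right, so later sources win on key collisions: Spark settings override Hadoop values, and the Hive extras override both. A minimal sketch of that precedence using plain maps (the keys and values here are illustrative, not the real `Configuration`/`HiveConf` types):

```scala
// Illustrative stand-ins for the three configuration sources
// (not the real Hadoop/Spark/Hive configuration types).
val hadoopProps = Map("fs.defaultFS" -> "hdfs://cluster", "io.file.buffer.size" -> "4096")
val sparkProps  = Map("io.file.buffer.size" -> "65536", "spark.master" -> "local[4]")
val extraProps  = Map("hive.exec.scratchdir" -> "/tmp/hive")

// ++ keeps the right-hand value on duplicate keys, matching the
// left-to-right precedence of the CLI merge: Spark settings override
// Hadoop defaults, and the Hive extras override both.
val merged = hadoopProps ++ sparkProps ++ extraProps

assert(merged("io.file.buffer.size") == "65536") // Spark value wins
```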
#### installSignalHandler

Installs interrupt handlers for graceful query cancellation during interactive sessions.

**Signal Handling:**

```scala
def installSignalHandler(): Unit = {
  HiveInterruptUtils.add(new HiveInterruptCallback {
    override def interrupt(): Unit = {
      // Handle remote execution mode
      if (SparkSQLEnv.sparkContext != null) {
        SparkSQLEnv.sparkContext.cancelAllJobs()
      } else {
        if (transport != null) {
          // Force closing of TCP connection upon session termination
          transport.getSocket.close()
        }
      }
    }
  })
}
```

When users press Ctrl+C during query execution:
1. **Local Mode**: Cancels all running Spark jobs
2. **Remote Mode**: Closes the TCP transport connection to the server
3. **Cleanup**: Ensures resources are properly released

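The `HiveInterruptUtils` mechanism is essentially a callback registry: handlers are registered once and all fire on interrupt. A self-contained sketch of that pattern (the `InterruptRegistry` name is a hypothetical stand-in, not a Hive class):

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the HiveInterruptUtils callback registry.
class InterruptRegistry {
  private val callbacks = ArrayBuffer.empty[() => Unit]
  def add(cb: () => Unit): Unit = callbacks += cb
  def interrupt(): Unit = callbacks.foreach(_.apply())
}

val registry = new InterruptRegistry
val jobsCancelled = new AtomicBoolean(false)

// Register a cancellation handler, as installSignalHandler does
// via HiveInterruptUtils.add.
registry.add(() => jobsCancelled.set(true))

// Simulate Ctrl+C: every registered callback fires.
registry.interrupt()
assert(jobsCancelled.get)
```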
## CLI Session Management

### Session State Integration

The CLI integrates with Hive's session state management while adding Spark-specific enhancements:

```scala
val sessionState = new CliSessionState(cliConf)
```

**Session Features:**
- **Command History**: Persistent command history across sessions
- **Variable Management**: Set and get session variables and configuration
- **Database Context**: Current database and catalog management
- **Query Results**: Formatted output with configurable display options

### Interactive Commands

The CLI supports standard HiveQL commands plus Spark SQL extensions:

**Database Operations:**
```sql
-- Show databases
SHOW DATABASES;

-- Use database
USE my_database;

-- Show tables
SHOW TABLES;
```

**Configuration Management:**
```sql
-- Set configuration
SET spark.sql.adaptive.enabled=true;

-- Show configuration
SET spark.sql.adaptive.enabled;

-- Show all configuration
SET;
```

**Query Execution:**
```sql
-- Standard SQL queries
SELECT * FROM my_table WHERE condition = 'value';

-- Spark SQL specific features
SELECT explode(array_column) FROM my_table;
```

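The three `SET` forms above (assign a value, inspect one key, list everything) can be told apart by a simple dispatch on the command text. A hedged sketch of that dispatch logic, for illustration only (this is not the actual CLI parser):

```scala
// Illustrative model of the three SET command forms.
sealed trait SetCommand
case class Assign(key: String, value: String) extends SetCommand
case class Show(key: String) extends SetCommand
case object ShowAll extends SetCommand

// Hypothetical parser for the SET syntax shown above.
def parseSet(command: String): SetCommand = {
  val body = command.stripPrefix("SET").stripSuffix(";").trim
  if (body.isEmpty) ShowAll
  else body.split("=", 2) match {
    case Array(k, v) => Assign(k.trim, v.trim)
    case Array(k)    => Show(k.trim)
  }
}

assert(parseSet("SET spark.sql.adaptive.enabled=true;") ==
  Assign("spark.sql.adaptive.enabled", "true"))
assert(parseSet("SET spark.sql.adaptive.enabled;") == Show("spark.sql.adaptive.enabled"))
assert(parseSet("SET;") == ShowAll)
```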
### Authentication and Security

The CLI supports the same authentication mechanisms as the Thrift Server:

#### Kerberos Authentication

```bash
# Obtain a Kerberos ticket before starting the CLI
kinit user@REALM.COM
$SPARK_HOME/bin/spark-sql
```

#### Configuration Properties

```properties
# Kerberos principal and keytab for delegation tokens
spark.yarn.keytab=/path/to/keytab
spark.yarn.principal=user@REALM.COM

# Hive metastore authentication
hive.metastore.sasl.enabled=true
hive.metastore.kerberos.principal=hive/_HOST@REALM.COM
```

## Connection Management

### Remote Connections

While the current implementation focuses on local CLI usage, it maintains compatibility with remote connection patterns:

```scala
private var transport: TSocket = _
```

**Connection Lifecycle:**
1. **Transport Creation**: TCP socket to the remote Thrift Server
2. **Protocol Negotiation**: Thrift protocol version agreement
3. **Authentication**: Credential exchange if security is enabled
4. **Session Establishment**: CLI session setup on the server
5. **Command Processing**: Interactive query execution
6. **Cleanup**: Proper connection and session cleanup

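Since Thrift's `TSocket` wraps a plain TCP socket, the transport-creation and cleanup steps above can be sketched with `java.net` sockets alone; the server here is a local stand-in for a remote Thrift Server, and no Thrift dependency is involved:

```scala
import java.net.{InetAddress, ServerSocket, Socket}

// Local stand-in for a remote Thrift Server endpoint (port 0 = any free port).
val server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress)

// Step 1, transport creation: open a TCP connection to the server.
val transport = new Socket(InetAddress.getLoopbackAddress, server.getLocalPort)
assert(transport.isConnected)

// Step 6, cleanup: close the connection, as the interrupt handler
// does with transport.getSocket.close() in remote mode.
transport.close()
server.close()
assert(transport.isClosed)
```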
### Error Handling

The CLI includes comprehensive error handling for common scenarios:

**Network Issues:**
- Connection timeouts and retries
- Transport layer failures
- Server unavailability

**Authentication Failures:**
- Invalid credentials
- Expired tokens
- Insufficient permissions

**Query Errors:**
- SQL parsing errors
- Execution failures
- Resource constraints

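A common mitigation for the transient network failures listed above is retry with backoff. This is a generic sketch of that pattern, not code from the CLI itself:

```scala
// Generic retry helper: re-invokes `op` until it succeeds or attempts
// run out. A production client would also sleep with exponential
// backoff between attempts (omitted here for brevity).
def retry[T](maxAttempts: Int)(op: () => T): T =
  try op() catch {
    case _: Exception if maxAttempts > 1 => retry(maxAttempts - 1)(op)
  }

// Simulated flaky connection that succeeds on the third attempt.
var attempts = 0
val result = retry(3) { () =>
  attempts += 1
  if (attempts < 3) throw new RuntimeException("connection refused")
  "connected"
}
assert(result == "connected" && attempts == 3)
```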
### Performance Considerations

**Query Result Streaming:**
- Large result sets streamed incrementally
- Configurable fetch size for memory management
- Progress indicators for long-running queries

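Incremental collection amounts to consuming results in fixed-size batches rather than materializing them all at once. A minimal sketch of the idea with a plain iterator (illustrative only; the actual mechanism is controlled by `spark.sql.thriftServer.incrementalCollect`):

```scala
// A lazily generated result set standing in for a large query result.
val rows = Iterator.range(0, 10000).map(i => s"row-$i")

// Consume in fetch-size batches, as an incremental collector would,
// holding at most `fetchSize` rows in memory at a time.
val fetchSize = 1000
var batches = 0
rows.grouped(fetchSize).foreach { batch =>
  batches += 1
  assert(batch.size <= fetchSize)
}
assert(batches == 10)
```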
**Resource Management:**
- Automatic cleanup of temporary resources
- Connection pooling for multiple sessions
- Memory-efficient result processing

**Configuration Tuning:**
```properties
# Stream results incrementally rather than collecting them all at once
spark.sql.thriftServer.incrementalCollect=true

# UI retention limits
spark.sql.thriftServer.ui.retainedSessions=200
spark.sql.thriftServer.ui.retainedStatements=1000
```