# CLI Interface

The Spark SQL CLI driver provides an interactive command-line interface for executing SQL queries directly against Spark SQL, analogous to the Hive CLI but backed by Spark's execution engine and SQL extensions.

## CLI Driver

### SparkSQLCLIDriver

Main CLI driver object that provides interactive SQL session management with Spark SQL integration.

```scala { .api }
private[hive] object SparkSQLCLIDriver extends Logging {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}
```

#### main

Entry point for interactive CLI sessions. Processes command-line arguments and starts an interactive SQL shell.

**Usage Example:**

```bash
# Start interactive CLI
$SPARK_HOME/bin/spark-sql

# With specific options
$SPARK_HOME/bin/spark-sql --master local[4] --conf spark.sql.warehouse.dir=/tmp/warehouse
```

The main method performs the following initialization:

1. **Argument Processing**: Uses Hive's `OptionsProcessor` to parse command-line options
2. **Configuration Setup**: Merges Spark configuration with Hadoop and Hive settings
3. **Session State**: Creates CLI session state with the merged configuration
4. **Environment Setup**: Initializes the Spark SQL environment
5. **CLI Loop**: Starts interactive command processing
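Besides the interactive shell, the entry point accepts the familiar Hive-style batch options, so queries can also be run non-interactively (a usage sketch; `queries.sql` is a placeholder file):

```bash
# Run a single statement and exit
$SPARK_HOME/bin/spark-sql -e "SELECT 1"

# Run statements from a file
$SPARK_HOME/bin/spark-sql -f queries.sql
```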

#### Configuration Merging

The CLI automatically merges configuration from multiple sources:

```scala
val sparkConf = new SparkConf(loadDefaults = true)
val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
val extraConfigs = HiveUtils.formatTimeVarsForHiveClient(hadoopConf)

val cliConf = new HiveConf(classOf[SessionState])
(hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue)
  ++ sparkConf.getAll.toMap ++ extraConfigs).foreach {
  case (k, v) => cliConf.set(k, v)
}
```

This ensures CLI sessions have access to:

- Spark configuration properties
- Hadoop cluster settings
- Hive compatibility configurations
- User-specified overrides
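Merge order matters here: entries appended later override earlier ones on key collisions, so Spark and Hive extra settings win over raw Hadoop values. A minimal, dependency-free sketch of that precedence (keys and values are made up):

```scala
object MergePrecedence extends App {
  // Hypothetical stand-ins for the three configuration sources
  val hadoopEntries = Map("fs.defaultFS" -> "hdfs://nn:8020", "shared.key" -> "from-hadoop")
  val sparkEntries  = Map("spark.master" -> "local[4]",       "shared.key" -> "from-spark")
  val extraConfigs  = Map("hive.exec.scratchdir" -> "/tmp/hive")

  // Later operands of ++ win on duplicate keys, mirroring the CLI's merge order
  val merged = hadoopEntries ++ sparkEntries ++ extraConfigs
  println(merged("shared.key")) // prints "from-spark"
}
```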

#### installSignalHandler

Installs interrupt handlers for graceful query cancellation during interactive sessions.

**Signal Handling:**

```scala
def installSignalHandler(): Unit = {
  HiveInterruptUtils.add(new HiveInterruptCallback {
    override def interrupt(): Unit = {
      // Handle remote execution mode
      if (SparkSQLEnv.sparkContext != null) {
        SparkSQLEnv.sparkContext.cancelAllJobs()
      } else {
        if (transport != null) {
          // Force closing of TCP connection upon session termination
          transport.getSocket.close()
        }
      }
    }
  })
}
```

When users press Ctrl+C during query execution:

1. **Local Mode**: Cancels all running Spark jobs
2. **Remote Mode**: Closes the TCP transport connection to the server
3. **Cleanup**: Ensures resources are properly released
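The callback-registry pattern above is easy to isolate: handlers register themselves once, and the signal handler simply walks the list. A self-contained sketch with the Hive and Spark types replaced by hypothetical stand-ins:

```scala
object InterruptDemo extends App {
  // Stand-in for Hive's HiveInterruptCallback
  trait InterruptCallback { def interrupt(): Unit }

  // Stand-in for HiveInterruptUtils' global callback registry
  object InterruptRegistry {
    private var callbacks = List.empty[InterruptCallback]
    def add(cb: InterruptCallback): Unit = callbacks ::= cb
    def interruptAll(): Unit = callbacks.foreach(_.interrupt())
  }

  var cancelled = false
  InterruptRegistry.add(new InterruptCallback {
    // In the real driver this would call sparkContext.cancelAllJobs()
    override def interrupt(): Unit = cancelled = true
  })

  InterruptRegistry.interruptAll() // what Ctrl+C would trigger
  println(cancelled) // prints "true"
}
```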

## CLI Session Management

### Session State Integration

The CLI integrates with Hive's session state management while adding Spark-specific enhancements:

```scala
val sessionState = new CliSessionState(cliConf)
```

**Session Features:**

- **Command History**: Persistent command history across sessions
- **Variable Management**: Set and get session variables and configuration
- **Database Context**: Current database and catalog management
- **Query Results**: Formatted output with configurable display options

### Interactive Commands


The CLI supports standard HiveQL commands plus Spark SQL extensions:

**Database Operations:**

```sql
-- Show databases
SHOW DATABASES;

-- Use a database
USE my_database;

-- Show tables
SHOW TABLES;
```

**Configuration Management:**

```sql
-- Set configuration
SET spark.sql.adaptive.enabled=true;

-- Show a specific configuration value
SET spark.sql.adaptive.enabled;

-- Show all configuration
SET;
```
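Spark SQL also supports a verbose variant that lists each property together with its description, which is handy for discovering tunables from inside the CLI:

```sql
-- Show all configuration with descriptions
SET -v;
```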

**Query Execution:**

```sql
-- Standard SQL queries
SELECT * FROM my_table WHERE condition = 'value';

-- Spark SQL specific features
SELECT explode(array_column) FROM my_table;
```
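Beyond query syntax, Spark SQL adds session-level commands with no HiveQL counterpart; table caching is a common example (`my_table` as above):

```sql
-- Pin a table in memory for repeated queries (Spark SQL extension)
CACHE TABLE my_table;

-- Release the cached data when finished
UNCACHE TABLE my_table;
```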

### Authentication and Security

The CLI supports the same authentication mechanisms as the Thrift Server:

#### Kerberos Authentication

```bash
# Obtain a Kerberos ticket before starting the CLI
kinit user@REALM.COM
$SPARK_HOME/bin/spark-sql
```

#### Configuration Properties

```properties
# Kerberos principal and keytab for delegation tokens
spark.yarn.keytab=/path/to/keytab
spark.yarn.principal=user@REALM.COM

# Hive metastore authentication
hive.metastore.sasl.enabled=true
hive.metastore.kerberos.principal=hive/_HOST@REALM.COM
```

## Connection Management

### Remote Connections

While the current implementation focuses on local CLI usage, it maintains compatibility with remote connection patterns:

```scala
private var transport: TSocket = _
```

**Connection Lifecycle:**

1. **Transport Creation**: TCP socket to the remote Thrift Server
2. **Protocol Negotiation**: Thrift protocol version agreement
3. **Authentication**: Credential exchange if security is enabled
4. **Session Establishment**: CLI session setup on the server
5. **Command Processing**: Interactive query execution
6. **Cleanup**: Proper connection and session cleanup
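Steps 1 and 6 bracket everything else, so they are naturally expressed as a loan-pattern helper. A sketch using a plain `java.net.Socket` as a stand-in for Thrift's `TSocket` (host, port, and the helper name are placeholders):

```scala
import java.net.Socket

// Hypothetical helper mirroring steps 1 (transport creation) and 6 (cleanup)
def withConnection[T](host: String, port: Int)(body: Socket => T): T = {
  val socket = new Socket(host, port) // step 1: open the TCP transport
  try body(socket)                    // steps 2-5 happen inside `body`
  finally socket.close()              // step 6: always release the connection
}
```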

### Error Handling

The CLI includes comprehensive error handling for common scenarios:

**Network Issues:**

- Connection timeouts and retries
- Transport layer failures
- Server unavailability

**Authentication Failures:**

- Invalid credentials
- Expired tokens
- Insufficient permissions

**Query Errors:**

- SQL parsing errors
- Execution failures
- Resource constraints
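For the transient network cases, a retry-with-backoff wrapper is the usual shape of the fix. A generic sketch (attempt counts and delays are illustrative, not CLI settings):

```scala
// Hypothetical retry helper for transient connection failures
def retry[T](attempts: Int, delayMs: Long)(op: => T): T =
  try op
  catch {
    case e: Exception if attempts > 1 =>
      Thread.sleep(delayMs)
      retry(attempts - 1, delayMs * 2)(op) // exponential backoff between attempts
  }
```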

### Performance Considerations

**Query Result Streaming:**

- Large result sets streamed incrementally
- Configurable fetch size for memory management
- Progress indicators for long-running queries

**Resource Management:**

- Automatic cleanup of temporary resources
- Connection pooling for multiple sessions
- Memory-efficient result processing

**Configuration Tuning:**

```properties
# Stream results to the client incrementally instead of collecting them all at once
spark.sql.thriftServer.incrementalCollect=true

# UI retention limits
spark.sql.thriftServer.ui.retainedSessions=200
spark.sql.thriftServer.ui.retainedStatements=1000
```