# Spark Hive Thrift Server

Spark Hive Thrift Server provides a Thrift-based JDBC/ODBC interface for Spark SQL, making it compatible with HiveServer2 clients. It enables remote access to Spark SQL through standard database connectivity protocols, allowing users to connect using JDBC drivers and execute SQL queries against Spark datasets and tables.

The server implements the HiveServer2 Thrift interface but uses Spark SQL as the execution engine instead of Hive, providing better performance and broader data source support. It includes support for concurrent sessions, query execution management, and a web UI for monitoring active connections and queries.

## Package Information

- **Package Name**: spark-hive-thriftserver_2.11
- **Package Type**: Maven
- **Language**: Scala
- **Group ID**: org.apache.spark
- **Version**: 1.6.3
- **Installation**: Add the Maven dependency (or the sbt equivalent sketched below) or use a pre-built Spark distribution

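For sbt-based builds, the same artifact can be declared directly — a minimal sketch using the coordinates listed above:

```scala
// build.sbt — the Thrift server module for Scala 2.11, per the package info above
libraryDependencies += "org.apache.spark" % "spark-hive-thriftserver_2.11" % "1.6.3"
```
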
## Core Imports

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIService
import org.apache.spark.sql.hive.thriftserver.ReflectionUtils
import org.apache.spark.sql.hive.HiveContext
```

## Basic Usage

### Server Mode

```scala
// Start thrift server standalone
object MyThriftServer extends App {
  HiveThriftServer2.main(args)
}

// Or programmatically with existing context
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sparkContext)
HiveThriftServer2.startWithContext(hiveContext)
```

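Once the server is running, any HiveServer2-compatible client can connect. A minimal sketch of a JDBC client, assuming the Hive JDBC driver (`org.apache.hive.jdbc.HiveDriver`) is on the classpath and the server listens on the default HiveServer2 port 10000:

```scala
import java.sql.DriverManager

// Register the HiveServer2 JDBC driver and open a connection
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "user", "")

// Execute a query against Spark SQL through the Thrift interface
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SHOW TABLES")
while (rs.next()) {
  println(rs.getString(1))
}
conn.close()
```
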
### CLI Mode

```scala
// Start interactive SQL CLI
object MySQLCLI extends App {
  SparkSQLCLIDriver.main(args)
}
```

### Environment Setup

```scala
// Initialize Spark SQL environment
SparkSQLEnv.init()

// Access shared contexts
val sparkContext = SparkSQLEnv.sparkContext
val hiveContext = SparkSQLEnv.hiveContext

// Clean shutdown
SparkSQLEnv.stop()
```

## Architecture

The Spark Hive Thrift Server is built around several key components:

- **Server Entry Points**: `HiveThriftServer2` and `SparkSQLCLIDriver` provide the main application entry points for server and CLI modes
- **Environment Management**: `SparkSQLEnv` manages the shared Spark and Hive contexts with optimized configurations
- **Session Management**: `SparkSQLSessionManager` handles client session lifecycle and isolation
- **Query Execution**: `SparkExecuteStatementOperation` and `SparkSQLDriver` process SQL statements and manage results
- **Service Layer**: `SparkSQLCLIService` implements the Thrift service interface compatible with HiveServer2
- **Web UI Integration**: Monitoring and statistics through Spark's web UI via a dedicated JDBC/ODBC server tab
- **Reflection Utilities**: `ReflectionUtils` provides a compatibility layer for Hive integration

## Capabilities

### Server Management

Core server lifecycle management including startup, configuration, and shutdown operations.

```scala { .api }
object HiveThriftServer2 {
  def main(args: Array[String]): Unit

  @DeveloperApi
  def startWithContext(sqlContext: HiveContext): Unit

  var LOG: Log
  var uiTab: Option[ThriftServerTab]
  var listener: HiveThriftServer2Listener
}
```

[Server Management](./server-management.md)

### CLI Interface

Interactive command-line interface for executing SQL queries with Hive CLI compatibility.

```scala { .api }
object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
}

private[hive] class SparkSQLCLIDriver extends CliDriver {
  override def processCmd(cmd: String): Int
}
```

[CLI Interface](./cli-interface.md)

### Environment Management

Centralized management of Spark and Hive execution contexts with optimized configurations.

```scala { .api }
object SparkSQLEnv {
  var hiveContext: HiveContext
  var sparkContext: SparkContext

  def init(): Unit
  def stop(): Unit
}
```

[Environment Management](./environment-management.md)

### Session Management

Client session lifecycle management with isolation and resource cleanup.

```scala { .api }
private[hive] class SparkSQLSessionManager(
    hiveServer: HiveServer2,
    hiveContext: HiveContext
) extends SessionManager {
  override def openSession(...): SessionHandle
  override def closeSession(sessionHandle: SessionHandle): Unit
}
```

[Session Management](./session-management.md)

### Query Execution

SQL statement execution with result management and schema introspection.

```scala { .api }
private[hive] class SparkExecuteStatementOperation(
    parentSession: HiveSession,
    statement: String,
    confOverlay: JMap[String, String],
    runInBackground: Boolean
) extends ExecuteStatementOperation {
  def close(): Unit
  def getNextRowSet(order: FetchOrientation, maxRowsL: Long): RowSet
  def getResultSetSchema: TableSchema
  def cancel(): Unit
}

private[hive] class SparkSQLDriver(
    context: HiveContext = SparkSQLEnv.hiveContext
) extends Driver {
  def init(): Unit
  def run(command: String): CommandProcessorResponse
  def close(): Int
  def getResults(res: JList[_]): Boolean
  def getSchema: Schema
  def destroy(): Unit
}
```

[Query Execution](./query-execution.md)

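To illustrate the driver API above, a hedged sketch of running a statement through `SparkSQLDriver`; since the class is `private[hive]`, this only compiles from within the `org.apache.spark.sql.hive` package, and it assumes `SparkSQLEnv` supplies the default context as shown:

```scala
import scala.collection.JavaConverters._

SparkSQLEnv.init()

// Uses the shared SparkSQLEnv.hiveContext via the default parameter
val driver = new SparkSQLDriver()
driver.init()

val response = driver.run("SELECT 1")
if (response.getResponseCode == 0) {
  // Drain the buffered result rows into a Java list
  val results = new java.util.ArrayList[String]()
  driver.getResults(results)
  results.asScala.foreach(println)
}
driver.close()
```
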
### Monitoring and UI

Web-based monitoring interface with session tracking and query statistics.

```scala { .api }
private[thriftserver] class HiveThriftServer2Listener(
    server: HiveServer2,
    conf: SQLConf
) extends SparkListener {
  def getOnlineSessionNum: Int
  def getTotalRunning: Int
  def getSessionList: Seq[SessionInfo]
  def getSession(sessionId: String): Option[SessionInfo]
  def getExecutionList: Seq[ExecutionInfo]
}

private[thriftserver] class ThriftServerTab(
    sparkContext: SparkContext
) extends SparkUITab {
  def detach(): Unit
}
```

[Monitoring and UI](./monitoring-ui.md)

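A hedged sketch of polling these statistics through the `listener` field on the `HiveThriftServer2` companion object (see Server Management above); it assumes the server has already been started:

```scala
// Valid only after HiveThriftServer2.startWithContext(...) has been called
val listener = HiveThriftServer2.listener

println(s"Open sessions:      ${listener.getOnlineSessionNum}")
println(s"Running statements: ${listener.getTotalRunning}")

// Enumerate tracked sessions using the SessionInfo fields in Common Types
listener.getSessionList.foreach { info =>
  println(s"session ${info.sessionId}: ${info.userName}@${info.ip}")
}
```
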
### Service Layer Integration

Core Thrift service implementation providing the HiveServer2 compatibility layer.

```scala { .api }
private[hive] class SparkSQLCLIService(
    hiveServer: HiveServer2,
    hiveContext: HiveContext
) extends CLIService(hiveServer) {
  override def init(hiveConf: HiveConf): Unit
  override def start(): Unit
  override def stop(): Unit
}
```

### Reflection Utilities

Utility methods for accessing private fields and methods in Hive classes for compatibility.

```scala { .api }
private[hive] object ReflectionUtils {
  def setSuperField(obj: Object, fieldName: String, fieldValue: Object): Unit
  def setAncestorField(obj: AnyRef, level: Int, fieldName: String, fieldValue: AnyRef): Unit
  def getSuperField[T](obj: AnyRef, fieldName: String): T
  def getAncestorField[T](clazz: Object, level: Int, fieldName: String): T
  def invokeStatic(clazz: Class[_], methodName: String, args: (Class[_], AnyRef)*): AnyRef
  def invoke(clazz: Class[_], obj: AnyRef, methodName: String, args: (Class[_], AnyRef)*): AnyRef
}
```

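A hedged illustration of the lookup semantics, with hypothetical classes and field names; as a `private[hive]` object, `ReflectionUtils` is only reachable from code in the `org.apache.spark.sql.hive` package:

```scala
// Hypothetical hierarchy: the private field lives on the superclass
class Base { private val greeting: String = "hello" }
class Derived extends Base

val d = new Derived

// getSuperField reads a field declared on the immediate superclass
val g = ReflectionUtils.getSuperField[String](d, "greeting")

// getAncestorField walks `level` superclasses up before the lookup;
// level 1 from Derived also lands on Base here
val g2 = ReflectionUtils.getAncestorField[String](d, 1, "greeting")

println(g == g2) // true
```
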
## Configuration

### Spark Configuration Properties

- `spark.app.name` - Application name (default: "SparkSQL::{hostname}")
- `spark.serializer` - Serializer class (default: KryoSerializer)
- `spark.kryo.referenceTracking` - Kryo reference tracking (default: false)
- `spark.ui.enabled` - Enable the Spark web UI (default: true)

### Hive Server Configuration Properties

- `hive.server2.transport.mode` - Transport mode ("binary" or "http")
- `hive.server2.async.exec.threads` - Background execution thread pool size
- `hive.server2.logging.operation.enabled` - Enable operation logging

### SQL Configuration Properties

- `SQLConf.THRIFTSERVER_POOL.key` - Scheduler pool for query execution
- `SQLConf.THRIFTSERVER_UI_STATEMENT_LIMIT` - Maximum statements retained in the UI
- `SQLConf.THRIFTSERVER_UI_SESSION_LIMIT` - Maximum sessions retained in the UI

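A minimal sketch of applying a few of these properties programmatically before starting the server with an existing context; the values are illustrative, and it assumes `hive.server2.*` settings applied through the context's configuration are picked up when the server starts:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Spark-level properties go on the SparkConf
val conf = new SparkConf()
  .setAppName("MyThriftServer") // overrides the SparkSQL::{hostname} default
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.referenceTracking", "false")

val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)

// Hive server properties can be set through the SQL configuration
hiveContext.setConf("hive.server2.transport.mode", "binary")

HiveThriftServer2.startWithContext(hiveContext)
```
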
## Common Types

```scala { .api }
private[thriftserver] class SessionInfo(
    val sessionId: String,
    val startTimestamp: Long,
    val ip: String,
    val userName: String
) {
  var finishTimestamp: Long
  var totalExecution: Int
  def totalTime: Long
}

private[thriftserver] class ExecutionInfo(
    val statement: String,
    val sessionId: String,
    val startTimestamp: Long,
    val userName: String
) {
  var finishTimestamp: Long
  var executePlan: String
  var detail: String
  var state: ExecutionState.Value
  val jobId: ArrayBuffer[String]
  var groupId: String
  def totalTime: Long
}

private[thriftserver] object ExecutionState extends Enumeration {
  val STARTED, COMPILED, FAILED, FINISHED = Value
  type ExecutionState = Value
}
```
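
A hedged sketch tying these types together: summarizing failed statements from the listener, assuming timestamps and `totalTime` are in milliseconds:

```scala
// Collect executions that reached the FAILED state
val failed = HiveThriftServer2.listener.getExecutionList
  .filter(_.state == ExecutionState.FAILED)

failed.foreach { e =>
  println(s"[${e.sessionId}] ${e.statement} took ${e.totalTime} ms: ${e.detail}")
}
```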