# Spark Hive Thrift Server

The Apache Spark Hive Thrift Server provides HiveServer2 compatibility for Spark SQL, enabling JDBC/ODBC clients and the Hive CLI to run Spark SQL queries. It offers a complete Thrift-based server implementation with session management, authentication, and comprehensive metadata operations.

## Package Information

- **Package Name**: spark-hive-thriftserver_2.12
- **Package Type**: maven
- **Language**: Scala/Java
- **GroupId**: org.apache.spark
- **ArtifactId**: spark-hive-thriftserver_2.12
- **Version**: 3.5.6
- **Installation**: Add to a Maven POM or use with a Spark distribution
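The coordinates above correspond to the following Maven dependency (version as listed; in practice this module is usually consumed as part of a Spark distribution rather than added directly to an application):

```xml
<!-- Maven coordinates taken from the package information above -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive-thriftserver_2.12</artifactId>
  <version>3.5.6</version>
</dependency>
```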

## Core Imports

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIService
import org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager
import org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager
import org.apache.spark.sql.SQLContext
```

For Java usage (requires Hive dependencies):

```java
import org.apache.hive.service.cli.ICLIService;
import org.apache.hive.service.cli.SessionHandle;
import org.apache.hive.service.cli.OperationHandle;
```

**Note**: Many interfaces (`ICLIService`, `SessionHandle`, etc.) are provided by the Apache Hive library, which is included as a dependency of this module.

## Basic Usage

### Starting the Thrift Server

```scala
import org.apache.spark.sql.hive.thriftserver.{HiveThriftServer2, SparkSQLEnv}

// Initialize the Spark SQL environment (creates the underlying contexts)
SparkSQLEnv.init()

// Start the Thrift server with the SQL context
val server = HiveThriftServer2.startWithContext(SparkSQLEnv.sqlContext)
```
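In a Spark distribution, the same server is typically launched through the bundled scripts rather than from code (script names as shipped with Spark; the port and host values here are illustrative):

```shell
# Launch the Thrift server from the Spark installation root
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=10000 \
  --hiveconf hive.server2.thrift.bind.host=localhost

# Shut it down again
./sbin/stop-thriftserver.sh
```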

### Using the CLI Driver

```scala
import org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

// Start the interactive SQL CLI
SparkSQLCLIDriver.main(Array("--hiveconf", "hive.server2.thrift.port=10000"))
```

### JDBC Connection (from client applications)

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Standard JDBC connection to the Spark Thrift Server (uses the Hive JDBC driver)
String url = "jdbc:hive2://localhost:10000/default";
Connection conn = DriverManager.getConnection(url, "username", "password");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM my_table");
```
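The same endpoint can be exercised from the command line with Beeline, the JDBC client bundled with Spark and Hive (connection values illustrative):

```shell
# Connect with Beeline from the Spark installation root
./bin/beeline -u "jdbc:hive2://localhost:10000/default" -n username -p password
```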

## Architecture

The Spark Hive Thrift Server is built around several key components:

- **Server Management**: `HiveThriftServer2` provides the main server lifecycle and initialization
- **CLI Services**: `SparkSQLCLIService` implements the core CLI service interface with Spark SQL integration
- **Session Management**: `SparkSQLSessionManager` handles client sessions and their associated SQL contexts
- **Operation Management**: `SparkSQLOperationManager` creates and manages SQL and metadata operations
- **SQL Execution**: `SparkSQLDriver` and `SparkExecuteStatementOperation` execute SQL queries using the Spark SQL engine
- **CLI Interface**: `SparkSQLCLIDriver` provides the interactive command-line interface
- **Web UI**: Integration with the Spark Web UI for monitoring sessions and queries
- **Transport Protocols**: Support for both binary Thrift and HTTP transport modes
- **Authentication**: Kerberos, SPNEGO, and custom authentication provider support
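Transport and authentication are selected through standard HiveServer2 configuration keys, for example (key names are standard Hive configuration; values are illustrative):

```properties
# HTTP transport instead of the default binary Thrift mode
hive.server2.transport.mode=http
hive.server2.thrift.http.port=10001

# Authentication mechanism (e.g. NONE, KERBEROS, LDAP, CUSTOM)
hive.server2.authentication=KERBEROS
hive.server2.authentication.kerberos.principal=hive/_HOST@EXAMPLE.COM
hive.server2.authentication.kerberos.keytab=/etc/security/keytabs/hive.keytab
```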

## Capabilities

### Server Management

Core server lifecycle management and initialization with Spark SQL integration.

```scala { .api }
object HiveThriftServer2 {
  def startWithContext(sqlContext: SQLContext): HiveThriftServer2
  def main(args: Array[String]): Unit

  // Note: ExecutionState is private[thriftserver] - not part of the public API
  private[thriftserver] object ExecutionState extends Enumeration {
    val STARTED, COMPILED, CANCELED, TIMEDOUT, FAILED, FINISHED, CLOSED = Value
  }
}
```

[Server Management](./server-management.md)

### CLI Services

Comprehensive CLI service implementation providing HiveServer2 compatibility with Spark SQL enhancements.

```scala { .api }
class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLContext) extends CLIService(hiveServer) {
  override def init(hiveConf: HiveConf): Unit
  override def start(): Unit
  override def getInfo(sessionHandle: SessionHandle, getInfoType: GetInfoType): GetInfoValue
}
```

[CLI Services](./cli-services.md)

### Session Management

Client session management with SQL context association and configuration handling.

```scala { .api }
class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext: SQLContext) extends SessionManager(hiveServer) {
  override def openSession(
      protocol: TProtocolVersion,
      username: String,
      passwd: String,
      ipAddress: String,
      sessionConf: java.util.Map[String, String],
      withImpersonation: Boolean,
      delegationToken: String): SessionHandle
  override def closeSession(sessionHandle: SessionHandle): Unit
  def setConfMap(conf: SQLContext, confMap: java.util.Map[String, String]): Unit
}
```

[Session Management](./session-management.md)

### Operation Management

Manages SQL operations and metadata operations with session context mapping.

```scala { .api }
class SparkSQLOperationManager extends OperationManager {
  val sessionToContexts: ConcurrentHashMap[SessionHandle, SQLContext]

  override def newExecuteStatementOperation(
      parentSession: HiveSession,
      statement: String,
      confOverlay: java.util.Map[String, String],
      async: Boolean,
      queryTimeout: Long): ExecuteStatementOperation

  override def newGetTablesOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String,
      tableName: String,
      tableTypes: java.util.List[String]): MetadataOperation

  override def newGetColumnsOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String,
      tableName: String,
      columnName: String): GetColumnsOperation

  override def newGetSchemasOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String): GetSchemasOperation

  override def newGetFunctionsOperation(
      parentSession: HiveSession,
      catalogName: String,
      schemaName: String,
      functionName: String): GetFunctionsOperation

  override def newGetTypeInfoOperation(parentSession: HiveSession): GetTypeInfoOperation
  override def newGetCatalogsOperation(parentSession: HiveSession): GetCatalogsOperation
  override def newGetTableTypesOperation(parentSession: HiveSession): GetTableTypesOperation
}
```

[Operation Management](./operation-management.md)
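The `sessionToContexts` map pairs each session handle with its SQL context. A minimal, dependency-free sketch of that bookkeeping pattern, using hypothetical stand-in types (`Handle`, `Context`) in place of the real Hive/Spark classes:

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SessionRegistry {
    // Hypothetical stand-ins for SessionHandle / SQLContext, for illustration only
    record Handle(UUID id) {}
    record Context(String name) {}

    private final ConcurrentHashMap<Handle, Context> sessionToContexts = new ConcurrentHashMap<>();

    // Open a session: mint a handle and associate it with a context
    Handle open(String contextName) {
        Handle h = new Handle(UUID.randomUUID());
        sessionToContexts.put(h, new Context(contextName));
        return h;
    }

    // Look up the context for an operation arriving on a session
    Context lookup(Handle h) {
        return sessionToContexts.get(h);
    }

    // Close a session: drop its context mapping
    void close(Handle h) {
        sessionToContexts.remove(h);
    }

    public static void main(String[] args) {
        SessionRegistry reg = new SessionRegistry();
        Handle h = reg.open("default");
        System.out.println(reg.lookup(h).name()); // prints "default"
        reg.close(h);
        System.out.println(reg.lookup(h)); // prints "null"
    }
}
```

A `ConcurrentHashMap` is used because sessions are opened, queried, and closed from concurrent client threads.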

### SQL Execution

SQL statement execution with Spark SQL engine integration and result handling.

```scala { .api }
class SparkExecuteStatementOperation {
  def getNextRowSet(order: FetchOrientation, maxRowsL: Long): TRowSet
  def getResultSetSchema: TTableSchema
  def runInternal(): Unit
  def cancel(): Unit
  def timeoutCancel(): Unit
}

class SparkSQLDriver(context: SQLContext) extends Driver {
  override def init(): Unit
  override def run(command: String): CommandProcessorResponse
  override def close(): Int
  override def getResults(res: JList[_]): Boolean
  override def getSchema: Schema
  override def destroy(): Unit
}
```

[SQL Execution](./sql-execution.md)

### Metadata Operations

Comprehensive metadata operations for catalogs, schemas, tables, columns, functions, and type information.

```java { .api }
interface ICLIService {
  OperationHandle getCatalogs(SessionHandle sessionHandle);
  OperationHandle getSchemas(SessionHandle sessionHandle, String catalogName, String schemaName);
  OperationHandle getTables(SessionHandle sessionHandle, String catalogName, String schemaName,
      String tableName, List<String> tableTypes);
  OperationHandle getColumns(SessionHandle sessionHandle, String catalogName, String schemaName,
      String tableName, String columnName);
  OperationHandle getFunctions(SessionHandle sessionHandle, String catalogName, String schemaName,
      String functionName);
  OperationHandle getTypeInfo(SessionHandle sessionHandle);
}
```

[Metadata Operations](./metadata-operations.md)

### CLI Driver

Interactive command-line interface with SQL completion, history, and signal handling.

```scala { .api }
object SparkSQLCLIDriver {
  def main(args: Array[String]): Unit
  def installSignalHandler(): Unit
  def printUsage(): Unit
}

class SparkSQLCLIDriver {
  def processCmd(cmd: String): Int
  def processLine(line: String, allowInterrupting: Boolean): Int
  def printMasterAndAppId(): Unit
}
```

[CLI Driver](./cli-driver.md)

### Web UI Integration

Spark Web UI integration for monitoring Thrift server sessions, queries, and performance metrics.

```scala { .api }
class ThriftServerTab {
  def detach(): Unit
}

class HiveThriftServer2Listener {
  // Event listener for UI display and metrics collection
}
```

[Web UI Integration](./web-ui.md)

## Types

### Core Handle Types

```java { .api }
class SessionHandle extends Handle {
  // Identifies client sessions
}

class OperationHandle extends Handle {
  // Identifies operations (queries, metadata calls)
}

abstract class Handle {
  HandleIdentifier getHandleIdentifier()
}
```

### Operation Types

```java { .api }
enum OperationType {
  EXECUTE_STATEMENT,
  GET_TYPE_INFO,
  GET_CATALOGS,
  GET_SCHEMAS,
  GET_TABLES,
  GET_COLUMNS,
  GET_FUNCTIONS,
  GET_PRIMARY_KEYS,
  GET_CROSS_REFERENCE
}

enum OperationState {
  INITIALIZED,
  RUNNING,
  FINISHED,
  CANCELED,
  CLOSED,
  ERROR,
  UNKNOWN
}
```

### Data Transfer Types

```java { .api }
abstract class RowSet {
  // Base class for result sets
}

class RowBasedSet extends RowSet {
  // Row-based result set implementation
}

class ColumnBasedSet extends RowSet {
  // Column-based result set implementation
}

class TableSchema {
  List<ColumnDescriptor> getColumns()
}

class ColumnDescriptor {
  String getName()
  TypeDescriptor getTypeDescriptor()
  String getComment()
}
```

### Configuration Types

```java { .api }
enum FetchOrientation {
  FETCH_NEXT,
  FETCH_PRIOR,
  FETCH_RELATIVE,
  FETCH_ABSOLUTE,
  FETCH_FIRST,
  FETCH_LAST
}

enum FetchType {
  QUERY_OUTPUT,
  LOG
}

class GetInfoValue {
  String getStringValue()
  short getShortValue()
  int getIntValue()
  long getLongValue()
}
```