
# Session Management

Core functionality for creating and managing Spark sessions with Hive integration, providing both the modern SparkSession-based approach and legacy HiveContext support.

## Capabilities

### SparkSession with Hive Support (Recommended)

The modern approach for enabling Hive integration, using the SparkSession builder pattern.

```scala { .api }
/**
 * Enable Hive support for SparkSession, providing access to the Hive metastore,
 * HiveQL query execution, and Hive UDF/UDAF/UDTF functions
 */
def enableHiveSupport(): SparkSession.Builder
```

**Usage Examples:**

```scala
import org.apache.spark.sql.SparkSession

// Basic Hive-enabled session
val spark = SparkSession.builder()
  .appName("Hive Integration App")
  .enableHiveSupport()
  .getOrCreate()

// With additional configuration
val configuredSpark = SparkSession.builder()
  .appName("Advanced Hive App")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .config("spark.sql.hive.metastore.version", "2.3.0")
  .enableHiveSupport()
  .getOrCreate()

// Execute HiveQL
spark.sql("SHOW DATABASES").show()
spark.sql("USE my_database")
val result = spark.sql("SELECT * FROM my_table LIMIT 10")
```
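To verify that Hive support actually took effect, you can inspect the session's catalog implementation. A minimal sketch (the app name is illustrative; reading `spark.sql.catalogImplementation` this way assumes a running session):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Hive Check")
  .enableHiveSupport()
  .getOrCreate()

// Reports "hive" when Hive support is enabled, "in-memory" otherwise
val catalogImpl = spark.conf.get("spark.sql.catalogImplementation")
println(s"Catalog implementation: $catalogImpl")
```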

### HiveContext (Legacy - Fully Deprecated)

**⚠️ DEPRECATED**: HiveContext is fully deprecated since Spark 2.0.0 and should not be used in new applications. All functionality has been replaced by `SparkSession.builder().enableHiveSupport()`.

```scala { .api }
/**
 * Legacy Hive integration context - FULLY DEPRECATED since 2.0.0
 * This class is a thin wrapper around SparkSession and will be removed in future versions
 * Use SparkSession.builder.enableHiveSupport instead
 */
@deprecated("Use SparkSession.builder.enableHiveSupport instead", "2.0.0")
class HiveContext private[hive](_sparkSession: SparkSession) extends SQLContext(_sparkSession) {

  /**
   * Create HiveContext from SparkContext
   */
  def this(sc: SparkContext)

  /**
   * Create HiveContext from JavaSparkContext
   */
  def this(sc: JavaSparkContext)

  /**
   * Create new HiveContext session with separated SQLConf, UDF/UDAF,
   * temporary tables and SessionState, but sharing CacheManager,
   * IsolatedClientLoader and Hive client
   */
  override def newSession(): HiveContext

  /**
   * Invalidate and refresh cached metadata for the given table
   * @param tableName - Name of table to refresh
   */
  def refreshTable(tableName: String): Unit
}
```

**Usage Examples (DO NOT USE - Deprecated):**

```scala
// ❌ DEPRECATED - DO NOT USE IN NEW CODE
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Create HiveContext (deprecated approach)
val conf = new SparkConf().setAppName("Hive Legacy App")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)

// Execute queries
val result = hiveContext.sql("SELECT * FROM my_table")
result.show()

// Refresh table metadata
hiveContext.refreshTable("my_table")

// Create new session
val newSession = hiveContext.newSession()
```

**✅ Use This Instead:**

```scala
import org.apache.spark.sql.SparkSession

// Modern approach (recommended)
val spark = SparkSession.builder()
  .appName("Modern Hive App")
  .enableHiveSupport()
  .getOrCreate()

// Execute queries (same API)
val result = spark.sql("SELECT * FROM my_table")
result.show()

// Refresh table metadata
spark.catalog.refreshTable("my_table")

// Create new session
val newSession = spark.newSession()
```

### Session State and Resource Management

Components for managing Hive-aware session state and resources.

```scala { .api }
/**
 * Builder for Hive-aware SessionState
 */
class HiveSessionStateBuilder(
    session: SparkSession,
    parentState: Option[SessionState] = None)
  extends BaseSessionStateBuilder(session, parentState)

/**
 * Hive-aware resource loader for adding JARs to both Spark and Hive
 */
class HiveSessionResourceLoader(sparkSession: SparkSession) extends SessionResourceLoader(sparkSession) {
  /**
   * Add JAR to both Spark SQL and Hive client classpaths
   * @param path - Path to JAR file
   */
  override def addJar(path: String): Unit
}
```
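The resource loader above is what backs the `ADD JAR` SQL command in a Hive-enabled session. A brief sketch of exercising it from user code (the JAR path and UDF class are hypothetical; this assumes a running Hive-enabled session):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Add JAR Example")
  .enableHiveSupport()
  .getOrCreate()

// Routes through the session's resource loader, making the JAR visible
// to both Spark SQL and the Hive client (path is hypothetical)
spark.sql("ADD JAR /tmp/my-udfs.jar")

// Classes in the JAR can then be registered as Hive functions
spark.sql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf'")
```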

**Configuration Integration:**

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.HiveUtils

// Configure with Hive-specific settings
val spark = SparkSession.builder()
  .appName("Configured Hive App")
  .config(HiveUtils.HIVE_METASTORE_VERSION.key, "2.3.0")
  .config(HiveUtils.CONVERT_METASTORE_PARQUET.key, "true")
  .config(HiveUtils.CONVERT_METASTORE_ORC.key, "true")
  .enableHiveSupport()
  .getOrCreate()

// Access session-level catalog
val catalog = spark.catalog
catalog.listDatabases().show()
catalog.listTables("default").show()
```

### Session Utility Methods

Helper methods for session management and configuration.

```scala { .api }
object HiveUtils {
  /**
   * Configure SparkContext with Hive external catalog support
   * @param sc - SparkContext to configure
   * @return Configured SparkContext
   */
  def withHiveExternalCatalog(sc: SparkContext): SparkContext
}
```
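A minimal sketch of using `withHiveExternalCatalog` to flag an existing SparkContext for the Hive external catalog (the app name is illustrative; in most applications `enableHiveSupport()` on the builder is the simpler route):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveUtils

val conf = new SparkConf().setAppName("Catalog Setup App")

// Marks the context so that SparkSessions built on top of it
// use the Hive external catalog for persistent tables
val sc = HiveUtils.withHiveExternalCatalog(new SparkContext(conf))
```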

**Session Lifecycle Management:**

```scala
import org.apache.spark.sql.SparkSession

// Create session
val spark = SparkSession.builder()
  .appName("Hive Session Lifecycle")
  .enableHiveSupport()
  .getOrCreate()

try {
  // Use session for Hive operations
  spark.sql("SHOW TABLES").show()

  // Create new session (shares metastore connection)
  val newSession = spark.newSession()
  newSession.sql("USE another_database")
} finally {
  // Clean up
  spark.stop()
}
```
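One point worth illustrating: sessions created via `newSession()` share the underlying SparkContext, cached data, and metastore connection, but keep configuration and temporary views isolated. A sketch assuming a running Hive-enabled session (the view name `t` is arbitrary):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Session Isolation")
  .enableHiveSupport()
  .getOrCreate()

val other = spark.newSession()

// A temp view registered in one session...
spark.range(3).createOrReplaceTempView("t")

// ...is not visible from the other: the query below would throw an
// AnalysisException, while persistent Hive tables remain shared
// other.sql("SELECT * FROM t")
```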

## Error Handling

Common exceptions and error handling patterns:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.catalyst.analysis.NoSuchTableException

try {
  val spark = SparkSession.builder()
    .enableHiveSupport()
    .getOrCreate()

  spark.sql("SELECT * FROM non_existent_table")
} catch {
  // NoSuchTableException extends AnalysisException, so match it first
  case e: NoSuchTableException =>
    println(s"Table not found: ${e.getMessage}")
  case e: AnalysisException =>
    println(s"Analysis error: ${e.getMessage}")
  case e: Exception =>
    println(s"Unexpected error: ${e.getMessage}")
}
```

## Migration from HiveContext to SparkSession

For migrating legacy code from HiveContext to SparkSession:

```scala
// OLD (Deprecated)
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sparkContext)
val df = hiveContext.sql("SELECT * FROM table")

// NEW (Recommended)
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
  .enableHiveSupport()
  .getOrCreate() // getOrCreate() reuses an existing SparkContext if one is running
val df = spark.sql("SELECT * FROM table")
```