# Session Management

Core functionality for creating and managing Spark sessions with Hive integration, providing both the modern SparkSession-based approach and legacy HiveContext support.

## Capabilities

### SparkSession with Hive Support (Recommended)

The modern approach to enabling Hive integration, using the SparkSession builder pattern.

```scala { .api }
/**
 * Enable Hive support for SparkSession, providing access to Hive metastore,
 * HiveQL query execution, and Hive UDF/UDAF/UDTF functions
 */
def enableHiveSupport(): SparkSession.Builder
```

**Usage Examples:**

```scala
import org.apache.spark.sql.SparkSession

// Basic Hive-enabled session
val spark = SparkSession.builder()
  .appName("Hive Integration App")
  .enableHiveSupport()
  .getOrCreate()

// With additional configuration. Note: getOrCreate() returns the existing
// session if one is already active in this JVM, so set configs before the
// first session is created.
val configuredSpark = SparkSession.builder()
  .appName("Advanced Hive App")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .config("spark.sql.hive.metastore.version", "2.3.0")
  .enableHiveSupport()
  .getOrCreate()

// Execute HiveQL
spark.sql("SHOW DATABASES").show()
spark.sql("USE my_database")
val result = spark.sql("SELECT * FROM my_table LIMIT 10")
```

### HiveContext (Legacy - Deprecated)

**⚠️ DEPRECATED**: HiveContext has been deprecated since Spark 2.0.0 and should not be used in new applications. All of its functionality is available through `SparkSession.builder().enableHiveSupport()`.

```scala { .api }
/**
 * Legacy Hive integration context - DEPRECATED since 2.0.0.
 * This class is a thin wrapper around SparkSession and will be removed in a future version.
 * Use SparkSession.builder.enableHiveSupport instead.
 */
@deprecated("Use SparkSession.builder.enableHiveSupport instead", "2.0.0")
class HiveContext private[hive](_sparkSession: SparkSession) extends SQLContext(_sparkSession) {

  /**
   * Create a HiveContext from a SparkContext
   */
  def this(sc: SparkContext)

  /**
   * Create a HiveContext from a JavaSparkContext
   */
  def this(sc: JavaSparkContext)

  /**
   * Create a new HiveContext session with separated SQLConf, UDF/UDAF,
   * temporary tables and SessionState, but sharing the CacheManager,
   * IsolatedClientLoader and Hive client
   */
  override def newSession(): HiveContext

  /**
   * Invalidate and refresh cached metadata for the given table
   * @param tableName - Name of the table to refresh
   */
  def refreshTable(tableName: String): Unit
}
```

**Usage Examples (DO NOT USE - Deprecated):**

```scala
// ❌ DEPRECATED - DO NOT USE IN NEW CODE
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Create HiveContext (deprecated approach)
val conf = new SparkConf().setAppName("Hive Legacy App")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)

// Execute queries
val result = hiveContext.sql("SELECT * FROM my_table")
result.show()

// Refresh table metadata
hiveContext.refreshTable("my_table")

// Create new session
val newSession = hiveContext.newSession()
```

**✅ Use This Instead:**

```scala
import org.apache.spark.sql.SparkSession

// Modern approach (recommended)
val spark = SparkSession.builder()
  .appName("Modern Hive App")
  .enableHiveSupport()
  .getOrCreate()

// Execute queries (same API)
val result = spark.sql("SELECT * FROM my_table")
result.show()

// Refresh table metadata
spark.catalog.refreshTable("my_table")

// Create new session
val newSession = spark.newSession()
```

### Session State and Resource Management

Components for managing Hive-aware session state and resources.

```scala { .api }
/**
 * Builder for Hive-aware SessionState
 */
class HiveSessionStateBuilder(
    session: SparkSession,
    parentState: Option[SessionState] = None)
  extends BaseSessionStateBuilder(session, parentState)

/**
 * Hive-aware resource loader for adding JARs to both Spark and Hive
 */
class HiveSessionResourceLoader(sparkSession: SparkSession) extends SessionResourceLoader(sparkSession) {
  /**
   * Add JAR to both Spark SQL and Hive client classpaths
   * @param path - Path to JAR file
   */
  override def addJar(path: String): Unit
}
```
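
You rarely call the resource loader directly; it backs the SQL `ADD JAR` statement in a Hive-enabled session. A minimal sketch (the JAR path and UDF class name are hypothetical placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Add JAR Example")
  .enableHiveSupport()
  .getOrCreate()

// ADD JAR is routed through the session's resource loader; in a Hive-enabled
// session the JAR is registered with both Spark SQL and the Hive client
spark.sql("ADD JAR /path/to/my-udfs.jar")  // hypothetical path

// Hive UDFs packaged in the JAR can then be registered and used in HiveQL
spark.sql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf'")  // hypothetical class
```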

**Configuration Integration:**

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.HiveUtils

// Configure with Hive-specific settings
val spark = SparkSession.builder()
  .appName("Configured Hive App")
  .config(HiveUtils.HIVE_METASTORE_VERSION.key, "2.3.0")
  .config(HiveUtils.CONVERT_METASTORE_PARQUET.key, "true")
  .config(HiveUtils.CONVERT_METASTORE_ORC.key, "true")
  .enableHiveSupport()
  .getOrCreate()

// Access session-level catalog
val catalog = spark.catalog
catalog.listDatabases().show()
catalog.listTables("default").show()
```

### Session Utility Methods

Helper methods for session management and configuration.

```scala { .api }
object HiveUtils {
  /**
   * Configure SparkContext with Hive external catalog support
   * @param sc - SparkContext to configure
   * @return Configured SparkContext
   */
  def withHiveExternalCatalog(sc: SparkContext): SparkContext
}
```
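
A brief sketch of how `withHiveExternalCatalog` might be used when a SparkContext is constructed directly; this sets the Hive catalog implementation on the context's configuration. Most applications should prefer `enableHiveSupport()` instead:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveUtils

// Marks the context's configuration to use the Hive external catalog,
// so that a SparkSession later built on top of this SparkContext is Hive-aware
val conf = new SparkConf().setAppName("Catalog Setup")
val sc = HiveUtils.withHiveExternalCatalog(new SparkContext(conf))
```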

**Session Lifecycle Management:**

```scala
import org.apache.spark.sql.SparkSession

// Create session
val spark = SparkSession.builder()
  .appName("Hive Session Lifecycle")
  .enableHiveSupport()
  .getOrCreate()

try {
  // Use session for Hive operations
  spark.sql("SHOW TABLES").show()

  // Create a new session (shares the metastore connection)
  val newSession = spark.newSession()
  newSession.sql("USE another_database")

} finally {
  // Clean up
  spark.stop()
}
```

## Error Handling

Common exceptions and error handling patterns. Note that `NoSuchTableException` is a subclass of `AnalysisException`, so its case must come first or it will never match:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.catalyst.analysis.NoSuchTableException

try {
  val spark = SparkSession.builder()
    .enableHiveSupport()
    .getOrCreate()

  spark.sql("SELECT * FROM non_existent_table")
} catch {
  case e: NoSuchTableException =>
    println(s"Table not found: ${e.getMessage}")
  case e: AnalysisException =>
    println(s"Analysis error: ${e.getMessage}")
  case e: Exception =>
    println(s"Unexpected error: ${e.getMessage}")
}
```

## Migration from HiveContext to SparkSession

For migrating legacy code from HiveContext to SparkSession:

```scala
// OLD (Deprecated)
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sparkContext)
val df = hiveContext.sql("SELECT * FROM table")

// NEW (Recommended)
import org.apache.spark.sql.SparkSession
// No need to pass the existing SparkContext explicitly:
// getOrCreate() reuses any SparkContext already running in the JVM
// (Builder.sparkContext(...) is internal to Spark, not a public API)
val spark = SparkSession.builder()
  .enableHiveSupport()
  .getOrCreate()
val df = spark.sql("SELECT * FROM table")
```