# Environment Management

Environment management handles the initialization, configuration, and lifecycle of Spark SQL environments for the Thrift Server, ensuring proper resource allocation and cleanup.

## Environment Controller

### SparkSQLEnv

A singleton environment manager that provides centralized management of the shared Spark context and SQL context.

```scala { .api }
private[hive] object SparkSQLEnv extends Logging {
  var sqlContext: SQLContext
  var sparkContext: SparkContext

  def init(): Unit
  def stop(): Unit
}
```

#### Environment Initialization

The `init` method creates and configures the Spark environment for Thrift Server operations.

**Usage Example:**

```scala
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv

// Initialize the environment (typically called during server startup)
SparkSQLEnv.init()

// The environment is now available
val sqlContext = SparkSQLEnv.sqlContext
val sparkContext = SparkSQLEnv.sparkContext
```

**Initialization Process:**

1. **Configuration Setup**: Loads the Spark configuration with defaults
2. **Application Naming**: Sets an appropriate application name
3. **Spark Session Creation**: Creates a Spark session with Hive support
4. **Context Assignment**: Assigns the Spark and SQL contexts to the singleton
5. **Session State Initialization**: Forces session state initialization
6. **Hive Integration**: Configures Hive metastore integration
7. **Version Configuration**: Sets Hive version compatibility

#### Configuration Management

The environment automatically handles configuration from multiple sources:

```scala
val sparkConf = new SparkConf(loadDefaults = true)

// Application name resolution
val maybeAppName = sparkConf
  .getOption("spark.app.name")
  .filterNot(_ == classOf[SparkSQLCLIDriver].getName)
  .filterNot(_ == classOf[HiveThriftServer2].getName)

sparkConf.setAppName(maybeAppName.getOrElse(s"SparkSQL::${Utils.localHostName()}"))
```

**Configuration Sources:**

- **Default Configuration**: System-wide Spark defaults
- **User Configuration**: Spark configuration files and system properties
- **Application Overrides**: Thrift Server-specific settings
- **Runtime Parameters**: Command-line and programmatic overrides

#### Hive Integration Setup

The environment ensures proper Hive integration for SQL compatibility:

```scala
val sparkSession = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()

// Force session state initialization with the correct class loader
sparkSession.sessionState

// Configure the Hive metastore client's output streams
val metadataHive = sparkSession
  .sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
metadataHive.setOut(new PrintStream(System.out, true, "UTF-8"))
metadataHive.setInfo(new PrintStream(System.err, true, "UTF-8"))
metadataHive.setError(new PrintStream(System.err, true, "UTF-8"))

// Set Hive version compatibility
sparkSession.conf.set(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
```

**Hive Integration Features:**

- **Metastore Access**: Full access to the Hive metastore for table metadata
- **UDF Support**: Hive user-defined functions available in SQL
- **SerDe Support**: Hive serialization/deserialization formats
- **Compatibility**: Maintains compatibility with existing Hive queries
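
Once the environment is initialized, these capabilities are available through the shared context. A minimal sketch (the table, UDF, and class names below are hypothetical):

```scala
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv

SparkSQLEnv.init()
val sql = SparkSQLEnv.sqlContext

// Metastore access: list tables registered in the Hive metastore
sql.sql("SHOW TABLES").show()

// UDF support: register and call a Hive UDF (class name is hypothetical)
sql.sql("CREATE FUNCTION my_upper AS 'com.example.hive.udf.MyUpper'")
sql.sql("SELECT my_upper(name) FROM people").show()

// SerDe support: a table stored with a Hive-native file format
sql.sql("CREATE TABLE logs (line STRING) STORED AS TEXTFILE")
```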

#### Environment Cleanup

The `stop` method provides comprehensive cleanup of all resources:

```scala
def stop(): Unit = {
  logDebug("Shutting down Spark SQL Environment")
  // Stop the SparkContext
  if (SparkSQLEnv.sparkContext != null) {
    sparkContext.stop()
    sparkContext = null
    sqlContext = null
  }
}
```

**Cleanup Process:**

1. **Context Shutdown**: Stops the Spark context and releases cluster resources
2. **Variable Reset**: Clears singleton references to prevent memory leaks
3. **Resource Release**: Ensures all system resources are properly released
4. **Logging**: Records shutdown events for debugging
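
Callers that embed the environment should pair `init` with `stop`; one way to guarantee cleanup is a JVM shutdown hook (a sketch, not the server's actual shutdown path):

```scala
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv

SparkSQLEnv.init()

// Ensure contexts are stopped even on abnormal termination
sys.addShutdownHook {
  SparkSQLEnv.stop()
}
```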

## Application Naming

### Dynamic Name Resolution

The environment chooses the application name based on how it was started:

```scala
val maybeAppName = sparkConf
  .getOption("spark.app.name")
  .filterNot(_ == classOf[SparkSQLCLIDriver].getName)
  .filterNot(_ == classOf[HiveThriftServer2].getName)

sparkConf.setAppName(maybeAppName.getOrElse(s"SparkSQL::${Utils.localHostName()}"))
```

**Naming Strategy:**

- **User-Specified**: Uses an explicitly configured application name
- **Filtered Names**: Excludes the default class names for cleaner identification
- **Dynamic Default**: Generates a hostname-based name for uniqueness

**Application Name Examples:**

- Custom: `"MyThriftServerApp"`
- Default: `"SparkSQL::worker-node-01"`
- CLI Mode: `"SparkSQL::dev-machine"`
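
The resolution rule can be restated as a small standalone function (a hypothetical re-creation for illustration, not the actual code):

```scala
// Class names that should not be used as user-visible application names
val excluded = Set(
  "org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver",
  "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2")

def resolveAppName(configured: Option[String], hostname: String): String =
  configured.filterNot(excluded).getOrElse(s"SparkSQL::$hostname")

resolveAppName(Some("MyThriftServerApp"), "dev-machine")  // "MyThriftServerApp"
resolveAppName(None, "worker-node-01")                    // "SparkSQL::worker-node-01"
```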
## Resource Management

### Context Lifecycle

The environment manages the complete lifecycle of Spark contexts:

**Initialization Phase:**

- Configuration validation and merging
- Resource allocation and cluster connection
- Service registration and startup
- Integration component setup

**Runtime Phase:**

- Context sharing across sessions
- Resource monitoring and management
- Configuration updates and refreshes
- Performance optimization

**Shutdown Phase:**

- Graceful service shutdown
- Resource deallocation and cleanup
- Cluster disconnection
- Memory and handle cleanup

### Session State Management

Critical session state initialization ensures proper operation:

```scala
// SPARK-29604: force initialization of the session state with the Spark class loader,
// instead of having it happen during the initialization of the Hive client (which may use a
// different class loader).
sparkSession.sessionState
```

This prevents class-loading issues that can occur when the Hive client uses a different class loader than Spark.

## Configuration Integration

### Multi-Source Configuration

The environment integrates configuration from several sources, listed here roughly from lowest to highest precedence:

**Configuration Hierarchy:**

1. **System Defaults**: Built-in Spark and Hive defaults
2. **Configuration Files**: `spark-defaults.conf`, `hive-site.xml`
3. **Environment Variables**: `SPARK_*` environment variables
4. **Command Line**: Runtime parameters and overrides
5. **Programmatic**: API-specified configuration values
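
As a sketch of how the later sources win (the property names are standard Spark keys; the values shown are illustrative):

```scala
import org.apache.spark.SparkConf

// loadDefaults = true pulls in spark.* system properties, which spark-submit
// populates from spark-defaults.conf and command-line flags
val conf = new SparkConf(loadDefaults = true)

// A programmatic set(...) overrides any value supplied by the lower layers
conf.set("spark.app.name", "MyThriftServerApp")

// Reads fall back to an explicit default when the key is set nowhere
val partitions = conf.get("spark.sql.shuffle.partitions", "200")
```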

### Hive Compatibility Settings

A dedicated setting ensures Hive version compatibility:

```scala
sparkSession.conf.set(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
```

**Compatibility Features:**

- **Version Emulation**: Reports a compatible Hive version to clients
- **Metadata Compatibility**: Ensures metastore schema compatibility
- **Query Compatibility**: Maintains HiveQL query behavior
- **Function Compatibility**: Preserves Hive function semantics

## Integration Points

### Cluster Integration

The environment handles the standard Spark cluster deployment modes:

**Local Mode:**

```scala
val conf = new SparkConf().setMaster("local[*]")
```

**Standalone Cluster:**

```scala
val conf = new SparkConf().setMaster("spark://master:7077")
```

**YARN Integration:**

```scala
// SparkConf has no setDeployMode method; deploy mode is an ordinary property
val conf = new SparkConf().setMaster("yarn").set("spark.submit.deployMode", "cluster")
```

**Kubernetes Support:**

```scala
val conf = new SparkConf().setMaster("k8s://api-server:8443")
```

### Security Integration

Environment initialization includes security configuration:

**Authentication:**

- Kerberos principal and keytab configuration
- Delegation token management
- Secure cluster communication

**Authorization:**

- Hive metastore authorization integration
- Spark SQL authorization policies
- Resource access control

**Encryption:**

- Network communication encryption
- Data-at-rest encryption configuration
- Temporary file security
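
A configuration sketch covering these areas (Spark 3.x property names; the principal and keytab values are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Authentication: Kerberos identity for the server process
  .set("spark.kerberos.principal", "hive/_HOST@EXAMPLE.COM")
  .set("spark.kerberos.keytab", "/etc/security/keytabs/hive.keytab")
  // Encryption: authenticate and encrypt internal network traffic
  .set("spark.authenticate", "true")
  .set("spark.network.crypto.enabled", "true")
  // Encryption: encrypt shuffle and spill files written to local disk
  .set("spark.io.encryption.enabled", "true")
```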

### Monitoring Integration

The environment provides hooks for monitoring systems:

**Metrics Collection:**

- JVM metrics and garbage collection
- Spark context and executor metrics
- SQL execution and performance metrics

**Event Generation:**

- Application lifecycle events
- Context creation and destruction events
- Configuration change notifications

**Health Checks:**

- Context availability and responsiveness
- Resource utilization monitoring
- Error rate and exception tracking
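
Application lifecycle events, for example, can be observed by attaching a listener to the shared context; a sketch against the public `SparkListener` API:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart}
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv

SparkSQLEnv.init()

SparkSQLEnv.sparkContext.addSparkListener(new SparkListener {
  override def onApplicationStart(event: SparkListenerApplicationStart): Unit =
    println(s"Application started: ${event.appName}")

  override def onApplicationEnd(event: SparkListenerApplicationEnd): Unit =
    println(s"Application ended at ${event.time}")
})
```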