# Environment Management

Singleton environment management for Spark and SQL contexts, providing initialization and cleanup operations.

## Capabilities

### SparkSQLEnv Object

Singleton object for managing the master program's SparkSQL environment. Provides centralized initialization and cleanup of the Spark and SQL contexts.

```scala { .api }
/**
 * A singleton object for the master program. The slaves should not access this.
 */
private[hive] object SparkSQLEnv extends Logging {
  /** The current SQL context; null until initialized. */
  var sqlContext: SQLContext

  /** The current Spark context; null until initialized. */
  var sparkContext: SparkContext

  /**
   * Initialize the SparkSQL environment.
   * Creates a SparkSession with Hive support and sets up both contexts.
   * Safe to call multiple times; only the first call initializes.
   */
  def init(): Unit

  /**
   * Cleans up and shuts down the Spark SQL environment.
   * Stops the SparkContext and nullifies both references.
   */
  def stop(): Unit
}
```

**Usage Examples:**

```scala
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv

// Initialize the environment before using either context
SparkSQLEnv.init()

// Access the SQL context
val sqlContext = SparkSQLEnv.sqlContext
val df = sqlContext.sql("SELECT * FROM my_table")

// Access the Spark context
val sparkContext = SparkSQLEnv.sparkContext
println(s"Application ID: ${sparkContext.applicationId}")

// Clean shutdown when done
SparkSQLEnv.stop()
```

### Environment Initialization Details

The `init()` method performs the following initialization steps:

1. **Spark Configuration**: Creates a SparkConf with `loadDefaults = true`
2. **Application Naming**: Sets the application name to `SparkSQL::<hostname>` if none is specified
3. **Hive Support**: Enables Hive support in the SparkSession
4. **Context Setup**: Initializes both the SparkContext and SQLContext references
5. **Session State**: Forces initialization of the session state with the proper class loader
6. **Metadata Setup**: Configures the Hive metadata client with the proper output streams
7. **Version Configuration**: Sets a fake Hive version for compatibility

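The steps above can be sketched as follows. This is a simplified, hypothetical reconstruction, not the exact Spark source: it assumes `SparkConf`, `SparkSession`, and Spark's internal `Utils.localHostName()` helper, and omits the Hive metadata-stream and version wiring.

```scala
// Simplified sketch of init(); illustrative only.
def init(): Unit = {
  if (sqlContext == null) {
    // 1. Load defaults from spark.* system properties.
    val sparkConf = new SparkConf(loadDefaults = true)

    // 2. Default the application name when the user did not set one.
    if (!sparkConf.contains("spark.app.name")) {
      sparkConf.setAppName(s"SparkSQL::${Utils.localHostName()}")
    }

    // 3-4. Build a Hive-enabled session and capture both contexts.
    val sparkSession = SparkSession.builder()
      .config(sparkConf)
      .enableHiveSupport()
      .getOrCreate()
    sparkContext = sparkSession.sparkContext
    sqlContext = sparkSession.sqlContext

    // 5. Touch the session state so it materializes with the proper class loader.
    sparkSession.sessionState
  }
}
```
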
### Environment Configuration

The environment respects standard Spark configuration properties:

**Application Configuration:**

```scala
// Custom application name (optional)
sparkConf.set("spark.app.name", "MyThriftServer")

// Enable or disable the Spark UI
sparkConf.set("spark.ui.enabled", "true")
```

**Hive Configuration:**

```scala
// Hive warehouse directory
sparkConf.set("spark.sql.warehouse.dir", "/path/to/warehouse")

// Hive metastore URIs (a Hive/Hadoop property, passed through via the spark.hadoop. prefix)
sparkConf.set("spark.hadoop.hive.metastore.uris", "thrift://localhost:9083")
```

**Session Configuration:**

```scala
// Single-session mode: set to "true" to share one context across all
// sessions (the default is "false", i.e. separate session state per connection)
sparkConf.set("spark.sql.hive.thriftServer.singleSession", "true")
```
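
Because `init()` builds its `SparkConf` with `loadDefaults = true`, these properties can also be supplied as JVM system properties (or `--conf` flags) before initialization. A minimal sketch, with illustrative values:

```scala
// Set spark.* system properties before init(); the default-loading
// SparkConf picks them up when the environment is created.
System.setProperty("spark.app.name", "MyThriftServer")
System.setProperty("spark.sql.warehouse.dir", "/path/to/warehouse")

SparkSQLEnv.init()
```
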

### Lifecycle Management

**Initialization Safety:**
- `init()` can be called multiple times; only the first call initializes, and subsequent calls are no-ops
- The guard is a simple null check on the context reference rather than explicit synchronization, so first-time initialization should happen from a single thread

**Shutdown Behavior:**
- `stop()` shuts down the SparkContext if it exists
- Nullifies the context references to prevent memory leaks
- Integrates with shutdown hooks for a clean exit

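The shutdown-hook integration can be approximated with a plain JVM hook. This is a sketch; the server itself registers its hook through Spark's shutdown machinery:

```scala
// Register a hook so stop() runs on JVM exit.
sys.addShutdownHook {
  if (SparkSQLEnv.sparkContext != null) {
    SparkSQLEnv.stop()
  }
}
```
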
**State Checking:**

```scala
// Check whether the environment is initialized
if (SparkSQLEnv.sparkContext != null) {
  // Environment is ready
  println("SparkSQL environment is initialized")
} else {
  // Need to initialize first
  SparkSQLEnv.init()
}

// Check whether the environment has been stopped (guard against null first)
if (SparkSQLEnv.sparkContext != null && SparkSQLEnv.sparkContext.isStopped) {
  println("SparkContext has been stopped")
}
```

### Integration with Other Components

`SparkSQLEnv` is used throughout the Thrift server components:

**CLI Integration:**

```scala
// The CLI driver uses the environment contexts and forces
// initialization when not connecting to a remote HiveServer2
if (!isRemoteMode) {
  SparkSQLEnv.init()
}
```

**Server Integration:**

```scala
// The server's main method initializes the environment before setup
object HiveThriftServer2 {
  def main(args: Array[String]): Unit = {
    SparkSQLEnv.init()
    // ... server setup
  }
}
```

**Driver Integration:**

```scala
// The SQL driver defaults to the environment's SQLContext
class SparkSQLDriver(val context: SQLContext = SparkSQLEnv.sqlContext)
```

### Error Handling

Common environment initialization errors:

- **Configuration Errors**: Invalid Spark configuration parameters
- **Hive Errors**: Metastore connection or configuration issues
- **Resource Errors**: Insufficient memory or inability to acquire resources
- **Classpath Errors**: Missing dependencies or version conflicts

The environment provides logging for troubleshooting initialization issues and integrates with Spark's standard error handling mechanisms.
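
A defensive initialization sketch; the catch clause and error message here are illustrative, not part of the API:

```scala
// Surface initialization failures with context before aborting startup.
try {
  SparkSQLEnv.init()
} catch {
  case e: Exception =>
    System.err.println(s"Failed to initialize SparkSQL environment: ${e.getMessage}")
    throw e
}
```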