# Environment Management

Environment management handles the initialization, configuration, and lifecycle of Spark SQL environments for the Thrift Server, ensuring proper resource allocation and cleanup.

## Environment Controller

### SparkSQLEnv

Singleton environment manager that provides centralized Spark context and SQL context management.

```scala { .api }
private[hive] object SparkSQLEnv extends Logging {
  var sqlContext: SQLContext
  var sparkContext: SparkContext

  def init(): Unit
  def stop(): Unit
}
```

#### Environment Initialization

The `init` method creates and configures the Spark environment for Thrift Server operations:

**Usage Example:**

```scala
import org.apache.spark.sql.hive.thriftserver.SparkSQLEnv

// Initialize environment (typically called by server startup)
SparkSQLEnv.init()

// Environment is now available
val sqlContext = SparkSQLEnv.sqlContext
val sparkContext = SparkSQLEnv.sparkContext
```

**Initialization Process:**

1. **Configuration Setup**: Loads Spark configuration with defaults
2. **Application Naming**: Sets an appropriate application name
3. **Spark Session Creation**: Creates a Spark session with Hive support
4. **Context Assignment**: Assigns Spark and SQL contexts to the singleton
5. **Session State Initialization**: Forces session state initialization
6. **Hive Integration**: Configures Hive metastore integration
7. **Version Configuration**: Sets Hive version compatibility
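
The steps above can be assembled into a sketch of `init`, built from the snippets quoted elsewhere in this document. This is a simplified illustration, not the verbatim Spark source; in particular, the `sqlContext == null` guard is an assumption mirroring the null check shown in `stop`:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: assembled from this document's snippets, not the verbatim Spark source.
def init(): Unit = {
  if (sqlContext == null) { // assumed guard, mirroring the null check in stop()
    // 1-2. Configuration setup and application naming
    val sparkConf = new SparkConf(loadDefaults = true)
    val maybeAppName = sparkConf
      .getOption("spark.app.name")
      .filterNot(_ == classOf[SparkSQLCLIDriver].getName)
      .filterNot(_ == classOf[HiveThriftServer2].getName)
    sparkConf.setAppName(maybeAppName.getOrElse(s"SparkSQL::${Utils.localHostName()}"))

    // 3-4. Spark session creation with Hive support, context assignment
    val sparkSession = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()
    sparkContext = sparkSession.sparkContext
    sqlContext = sparkSession.sqlContext

    // 5. Force session state initialization with the Spark class loader
    sparkSession.sessionState

    // 6-7. Hive metastore integration and version compatibility
    sparkSession.conf.set(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
  }
}
```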

#### Configuration Management

The environment automatically handles configuration from multiple sources:

```scala
val sparkConf = new SparkConf(loadDefaults = true)

// Application name resolution
val maybeAppName = sparkConf
  .getOption("spark.app.name")
  .filterNot(_ == classOf[SparkSQLCLIDriver].getName)
  .filterNot(_ == classOf[HiveThriftServer2].getName)

sparkConf.setAppName(maybeAppName.getOrElse(s"SparkSQL::${Utils.localHostName()}"))
```

**Configuration Sources:**

- **Default Configuration**: System-wide Spark defaults
- **User Configuration**: Spark configuration files and system properties
- **Application Overrides**: Thrift Server-specific settings
- **Runtime Parameters**: Command-line and programmatic overrides

#### Hive Integration Setup

The environment ensures proper Hive integration for SQL compatibility:

```scala
val sparkSession = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()

// Force session state initialization with correct class loader
sparkSession.sessionState

// Configure Hive metastore client
val metadataHive = sparkSession
  .sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
metadataHive.setOut(new PrintStream(System.out, true, "UTF-8"))
metadataHive.setInfo(new PrintStream(System.err, true, "UTF-8"))
metadataHive.setError(new PrintStream(System.err, true, "UTF-8"))

// Set Hive version compatibility
sparkSession.conf.set(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
```

**Hive Integration Features:**

- **Metastore Access**: Full access to the Hive metastore for table metadata
- **UDF Support**: Hive user-defined functions available in SQL
- **SerDe Support**: Hive serialization/deserialization formats
- **Compatibility**: Maintains compatibility with existing Hive queries

#### Environment Cleanup

The `stop` method provides comprehensive cleanup of all resources:

```scala
def stop(): Unit = {
  logDebug("Shutting down Spark SQL Environment")
  // Stop the SparkContext
  if (SparkSQLEnv.sparkContext != null) {
    sparkContext.stop()
    sparkContext = null
    sqlContext = null
  }
}
```

**Cleanup Process:**

1. **Context Shutdown**: Stops the Spark context and releases cluster resources
2. **Variable Reset**: Clears singleton references to prevent memory leaks
3. **Resource Release**: Ensures all system resources are properly released
4. **Logging**: Records shutdown events for debugging
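
The null-guarded init/stop pattern above makes repeated `stop` calls safe. A minimal, Spark-free stand-in for that lifecycle pattern (all names here are illustrative, not from the Spark source):

```scala
// Minimal stand-in for the SparkSQLEnv lifecycle pattern (no Spark required).
// A String field stands in for the SparkContext; null means "not initialized",
// mirroring the null check in SparkSQLEnv.stop.
object EnvLifecycle {
  var context: String = null

  def init(): Unit =
    if (context == null) context = "running" // initialize only once

  def stop(): Unit =
    if (context != null) {
      // release resources, then clear the singleton reference to prevent leaks
      context = null
    }
}
```

Because both methods check the current state first, calling `stop` twice (or before `init`) is a no-op rather than an error.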

## Application Naming

### Dynamic Name Resolution

The environment uses intelligent application naming based on the startup method:

```scala
val maybeAppName = sparkConf
  .getOption("spark.app.name")
  .filterNot(_ == classOf[SparkSQLCLIDriver].getName)
  .filterNot(_ == classOf[HiveThriftServer2].getName)

sparkConf.setAppName(maybeAppName.getOrElse(s"SparkSQL::${Utils.localHostName()}"))
```

**Naming Strategy:**

- **User-Specified**: Uses an explicitly configured application name
- **Filtered Names**: Excludes default class names for cleaner identification
- **Dynamic Default**: Generates a name based on the hostname for uniqueness

**Application Name Examples:**

- Custom: `"MyThriftServerApp"`
- Default: `"SparkSQL::worker-node-01"`
- CLI Mode: `"SparkSQL::dev-machine"`
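
The resolution logic itself is plain `Option` plumbing and can be exercised without Spark. A standalone sketch (the object name and hard-coded class names are illustrative stand-ins for the real `classOf[...]` comparisons):

```scala
// Standalone sketch of the application-name resolution logic (no Spark required).
object AppNameResolution {
  // Stand-ins for classOf[SparkSQLCLIDriver].getName and classOf[HiveThriftServer2].getName
  val defaultNames: Set[String] = Set(
    "org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver",
    "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2")

  def resolve(configured: Option[String], hostName: String): String =
    configured
      .filterNot(defaultNames.contains)  // drop default class names
      .getOrElse(s"SparkSQL::$hostName") // hostname-based fallback
}
```

A user-set name passes through unchanged, while an absent or default class name falls back to `SparkSQL::<hostname>`.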

## Resource Management

### Context Lifecycle

The environment manages the complete lifecycle of Spark contexts:

**Initialization Phase:**

- Configuration validation and merging
- Resource allocation and cluster connection
- Service registration and startup
- Integration component setup

**Runtime Phase:**

- Context sharing across sessions
- Resource monitoring and management
- Configuration updates and refreshes
- Performance optimization

**Shutdown Phase:**

- Graceful service shutdown
- Resource deallocation and cleanup
- Cluster disconnection
- Memory and handle cleanup

### Session State Management

Critical session state initialization ensures proper operation:

```scala
// SPARK-29604: force initialization of the session state with the Spark class loader,
// instead of having it happen during the initialization of the Hive client (which may use a
// different class loader).
sparkSession.sessionState
```

This prevents class loading issues that can occur when Hive clients use different class loaders.
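
The reason a bare `sparkSession.sessionState` reference has any effect is that the session state is initialized lazily, so merely touching it forces construction in the current context. A minimal, Spark-free analogue of that behavior (the class and field names here are illustrative, not Spark's):

```scala
// Spark-free analogue: referencing a lazy val forces its initialization
// at the point of reference, in whatever context is then active.
class SessionLike {
  @volatile var initThread: String = null

  lazy val sessionState: String = {
    // Records which thread (i.e. which context) triggered initialization;
    // in Spark the analogous concern is which class loader is active.
    initThread = Thread.currentThread.getName
    "initialized"
  }
}
```

Nothing runs until the field is first referenced; afterwards, the recorded context shows who triggered it.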

## Configuration Integration

### Multi-Source Configuration

The environment integrates configuration from various sources:

**Configuration Hierarchy:**

1. **System Defaults**: Built-in Spark and Hive defaults
2. **Configuration Files**: `spark-defaults.conf`, `hive-site.xml`
3. **Environment Variables**: `SPARK_*` environment variables
4. **Command Line**: Runtime parameters and overrides
5. **Programmatic**: API-specified configuration values
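
The net effect of a hierarchy like this is "later sources win". A minimal sketch of that merge semantics (the source maps and keys are hypothetical, for illustration only):

```scala
// Illustration of configuration precedence: merge sources from lowest to
// highest priority; ++ lets later maps override earlier keys.
object ConfHierarchy {
  def merge(sources: Seq[Map[String, String]]): Map[String, String] =
    sources.reduce(_ ++ _)
}
```

For example, merging defaults, file, and command-line maps in that order leaves the command-line value for any key all three define, while untouched keys keep their default.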

### Hive Compatibility Settings

Specific configuration ensures Hive compatibility:

```scala
sparkSession.conf.set(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
```

**Compatibility Features:**

- **Version Emulation**: Reports a compatible Hive version to clients
- **Metadata Compatibility**: Ensures metastore schema compatibility
- **Query Compatibility**: Maintains HiveQL query behavior
- **Function Compatibility**: Preserves Hive function semantics

## Integration Points

### Cluster Integration

The environment handles various cluster deployment modes:

**Local Mode:**

```scala
val conf = new SparkConf().setMaster("local[*]")
```

**Standalone Cluster:**

```scala
val conf = new SparkConf().setMaster("spark://master:7077")
```

**YARN Integration:**

```scala
// Deploy mode is configured via spark.submit.deployMode (SparkConf has no setDeployMode method)
val conf = new SparkConf().setMaster("yarn").set("spark.submit.deployMode", "cluster")
```

**Kubernetes Support:**

```scala
val conf = new SparkConf().setMaster("k8s://api-server:8443")
```

### Security Integration

Environment initialization includes security configuration:

**Authentication:**

- Kerberos principal and keytab configuration
- Delegation token management
- Secure cluster communication

**Authorization:**

- Hive metastore authorization integration
- Spark SQL authorization policies
- Resource access control

**Encryption:**

- Network communication encryption
- Data-at-rest encryption configuration
- Temporary file security

### Monitoring Integration

The environment provides hooks for monitoring systems:

**Metrics Collection:**

- JVM metrics and garbage collection
- Spark context and executor metrics
- SQL execution and performance metrics

**Event Generation:**

- Application lifecycle events
- Context creation and destruction events
- Configuration change notifications

**Health Checks:**

- Context availability and responsiveness
- Resource utilization monitoring
- Error rate and exception tracking