tessl/maven-org-apache-spark--spark-hive_2-11

Apache Spark Hive integration module that provides support for Hive tables, queries, and SerDes

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: mavenpkg:maven/org.apache.spark/spark-hive_2.11@2.4.x

To install, run

```
npx @tessl/cli install tessl/maven-org-apache-spark--spark-hive_2-11@2.4.0
```

# Apache Spark Hive Integration

Apache Spark Hive integration module that provides comprehensive support for accessing and manipulating Hive tables, executing HiveQL queries, and leveraging Hive SerDes. This module serves as the bridge between Apache Spark SQL and Apache Hive, enabling Spark applications to work seamlessly with existing Hive infrastructure, metastore, and data formats while maintaining full compatibility with Hive features including UDFs, partitioning, and complex data types.

## Package Information

- **Package Name**: org.apache.spark:spark-hive_2.11
- **Package Type**: Maven
- **Language**: Scala
- **Installation**: Add to your Maven dependencies or include in Spark classpath
- **Version**: 2.4.8

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.11</artifactId>
  <version>2.4.8</version>
</dependency>
```
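If you build with sbt instead of Maven, the equivalent dependency can be declared as follows (the `%%` operator appends the Scala version suffix, resolving to `spark-hive_2.11` when `scalaVersion` is 2.11.x):

```scala
// build.sbt -- %% resolves to spark-hive_2.11 under a 2.11.x scalaVersion
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.4.8"
```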

## Core Imports

For Scala applications:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.HiveContext // Deprecated - use SparkSession
```

For enabling Hive support (modern approach):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Hive Integration App")
  .enableHiveSupport()
  .getOrCreate()
```

## Basic Usage

```scala
import org.apache.spark.sql.SparkSession

// Create SparkSession with Hive support
val spark = SparkSession.builder()
  .appName("Hive Integration Example")
  .enableHiveSupport()
  .getOrCreate()

// Execute HiveQL queries
spark.sql("CREATE TABLE IF NOT EXISTS users (id INT, name STRING, age INT)")
spark.sql("INSERT INTO users VALUES (1, 'Alice', 25), (2, 'Bob', 30)")

// Query Hive tables
val result = spark.sql("SELECT * FROM users WHERE age > 25")
result.show()

// Access Hive metastore
spark.catalog.listTables().show()
spark.catalog.listDatabases().show()

// Work with Hive partitioned tables
spark.sql("""
  CREATE TABLE IF NOT EXISTS partitioned_sales (
    product STRING,
    amount DOUBLE
  ) PARTITIONED BY (year INT, month INT)
""")

// Load data into partitioned table
spark.sql("INSERT INTO partitioned_sales PARTITION(year=2023, month=12) VALUES ('laptop', 999.99)")
```

## Architecture

The Spark Hive integration is built around several key components:

- **HiveExternalCatalog**: Persistent catalog implementation using Hive metastore for database, table, and partition metadata management
- **HiveClient Interface**: Low-level interface for direct Hive metastore operations and HiveQL execution
- **Session Integration**: HiveSessionCatalog and HiveSessionStateBuilder for Hive-aware Spark SQL sessions
- **UDF Support**: Comprehensive wrappers for Hive UDFs, UDAFs, and UDTFs with automatic type conversion
- **Data Format Integration**: Native support for Hive SerDes, ORC files, and table format conversion
- **Query Planning**: Hive-specific optimization strategies and execution operators

## Capabilities

### Session Management

Core functionality for creating and managing Spark sessions with Hive integration, including legacy HiveContext support and modern SparkSession configuration.

```scala { .api }
// Modern approach (recommended)
def enableHiveSupport(): SparkSession.Builder

// Legacy approach, deprecated since Spark 2.0.0
@deprecated("Use SparkSession.builder.enableHiveSupport instead", "2.0.0")
class HiveContext(sc: SparkContext) extends SQLContext
```

[Session Management](./session-management.md)

### Hive Metastore Operations

Direct access to the Hive metastore for programmatic database, table, partition, and function management through the HiveClient interface.

```scala { .api }
trait HiveClient {
  def listDatabases(pattern: String): Seq[String]
  def getDatabase(name: String): CatalogDatabase
  def listTables(dbName: String): Seq[String]
  def getTable(dbName: String, tableName: String): CatalogTable
  def createTable(table: CatalogTable, ignoreIfExists: Boolean): Unit
  def getPartitions(catalogTable: CatalogTable, partialSpec: Option[TablePartitionSpec] = None): Seq[CatalogTablePartition]
  def runSqlHive(sql: String): Seq[String]
}
```

[Metastore Operations](./metastore-operations.md)
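HiveClient is an internal interface; most applications reach these operations indirectly, either through `spark.catalog` or through SQL statements that route to the same metastore calls. A sketch, assuming a Hive-enabled session and the tables from Basic Usage:

```scala
// Each statement below is ultimately served by HiveExternalCatalog / HiveClient
spark.sql("SHOW DATABASES").show()
spark.sql("SHOW TABLES IN default").show()
spark.sql("DESCRIBE EXTENDED users").show(truncate = false)
```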

### Hive UDF Integration

Comprehensive support for Hive User-Defined Functions including simple UDFs, generic UDFs, table-generating functions (UDTFs), and aggregate functions (UDAFs).

```scala { .api }
case class HiveSimpleUDF(funcWrapper: HiveFunctionWrapper, children: Seq[Expression]) extends Expression
case class HiveGenericUDF(funcWrapper: HiveFunctionWrapper, children: Seq[Expression]) extends Expression
case class HiveGenericUDTF(funcWrapper: HiveFunctionWrapper, children: Seq[Expression]) extends Generator
case class HiveUDAFFunction(funcWrapper: HiveFunctionWrapper, children: Seq[Expression]) extends TypedImperativeAggregate[Any]
```

[UDF Integration](./udf-integration.md)
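With Hive support enabled, existing Hive UDF jars can be registered through HiveQL's `CREATE FUNCTION` syntax; a sketch in which the class name `com.example.MyUpper` and the jar path are hypothetical placeholders:

```scala
// Register a Hive UDF from a jar (class and path are placeholders), then call it
spark.sql("CREATE TEMPORARY FUNCTION my_upper AS 'com.example.MyUpper' USING JAR '/path/to/udfs.jar'")
spark.sql("SELECT my_upper(name) FROM users").show()
```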

### Configuration and Utilities

Configuration options, utilities, and constants for customizing Hive integration behavior including metastore settings, file format conversion, and compatibility options.

```scala { .api }
object HiveUtils {
  val builtinHiveVersion: String = "1.2.1"
  val HIVE_METASTORE_VERSION: ConfigEntry[String]
  val CONVERT_METASTORE_PARQUET: ConfigEntry[Boolean]
  val CONVERT_METASTORE_ORC: ConfigEntry[Boolean]
  def newTemporaryConfiguration(useInMemoryDerby: Boolean): Map[String, String]
}
```

[Configuration](./configuration.md)
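These HiveUtils entries are exposed as `spark.sql.hive.*` configuration keys; a sketch of a `spark-defaults.conf` fragment (values are illustrative, not recommendations):

```properties
# Version of the Hive metastore client and where its jars come from
spark.sql.hive.metastore.version       1.2.1
spark.sql.hive.metastore.jars          builtin
# Convert metastore Parquet/ORC tables to Spark's native readers
spark.sql.hive.convertMetastoreParquet true
spark.sql.hive.convertMetastoreOrc     true
```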

### Data Type Conversion

Utilities for converting between Hive and Catalyst data types, handling ObjectInspectors, and managing SerDe operations.

```scala { .api }
trait HiveInspectors {
  def javaTypeToDataType(clz: Type): DataType
  def toInspector(dataType: DataType): ObjectInspector
  def inspectorToDataType(inspector: ObjectInspector): DataType
  def wrapperFor(oi: ObjectInspector, dataType: DataType): Any => Any
  def unwrapperFor(objectInspector: ObjectInspector): Any => Any
}
```

[Data Type Conversion](./data-type-conversion.md)

### File Format Support

Native support for Hive file formats including traditional Hive tables and optimized ORC files with Hive compatibility.

```scala { .api }
class HiveFileFormat extends FileFormat with DataSourceRegister {
  override def shortName(): String = "hive"
}

class OrcFileFormat extends FileFormat with DataSourceRegister {
  override def shortName(): String = "orc"
}
```

[File Formats](./file-formats.md)
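Because both formats register short names via DataSourceRegister, they are reachable through the standard reader/writer API; a minimal sketch, assuming a Hive-enabled `spark` session and the `users` table from Basic Usage:

```scala
// Write out as ORC using the "orc" short name, then read the files back
spark.table("users").write.mode("overwrite").format("orc").save("/tmp/users_orc")
val orcDf = spark.read.format("orc").load("/tmp/users_orc")
orcDf.show()
```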

## Types

### Core Catalog Types

```scala { .api }
// From Spark SQL Catalyst - used throughout Hive integration
case class CatalogDatabase(
  name: String,
  description: String,
  locationUri: String,
  properties: Map[String, String]
)

case class CatalogTable(
  identifier: TableIdentifier,
  tableType: CatalogTableType,
  storage: CatalogStorageFormat,
  schema: StructType,
  partitionColumnNames: Seq[String] = Seq.empty,
  properties: Map[String, String] = Map.empty
)

case class CatalogTablePartition(
  spec: TablePartitionSpec,
  storage: CatalogStorageFormat,
  parameters: Map[String, String] = Map.empty
)

case class CatalogFunction(
  identifier: FunctionIdentifier,
  className: String,
  resources: Seq[FunctionResource]
)
```

### Hive-Specific Types

```scala { .api }
// Hive version support
abstract class HiveVersion(
  val fullVersion: String,
  val extraDeps: Seq[String] = Nil,
  val exclusions: Seq[String] = Nil
)

// Configuration for Hive data sources
class HiveOptions(parameters: Map[String, String]) {
  val fileFormat: Option[String]
  val inputFormat: Option[String]
  val outputFormat: Option[String]
  val serde: Option[String]
  def serdeProperties: Map[String, String]
}
```
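The HiveOptions keys map onto the `OPTIONS` clause when a table is created with the Hive source; a sketch, assuming a Hive-enabled `spark` session:

```scala
// fileFormat here is parsed into HiveOptions; serde/inputFormat/outputFormat work similarly
spark.sql("""
  CREATE TABLE hive_parquet_tbl (id INT, name STRING)
  USING HIVE
  OPTIONS(fileFormat 'parquet')
""")
```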