Apache Spark Hive integration module that provides support for Hive tables, queries, and SerDes
npx @tessl/cli install tessl/maven-org-apache-spark--spark-hive_2-11@2.4.8
# Apache Spark Hive Integration
Apache Spark Hive integration module that provides comprehensive support for accessing and manipulating Hive tables, executing HiveQL queries, and leveraging Hive SerDes. The module bridges Apache Spark SQL and Apache Hive, letting Spark applications work seamlessly with existing Hive infrastructure, metastores, and data formats while remaining fully compatible with Hive features such as UDFs, partitioning, and complex data types.
## Package Information

- **Package Name**: org.apache.spark:spark-hive_2.11
- **Package Type**: Maven
- **Language**: Scala
- **Installation**: Add to your Maven dependencies or include it on the Spark classpath
- **Version**: 2.4.8

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.11</artifactId>
  <version>2.4.8</version>
</dependency>
```

## Core Imports

For Scala applications:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.HiveContext // Deprecated - use SparkSession
```

For enabling Hive support (modern approach):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Hive Integration App")
  .enableHiveSupport()
  .getOrCreate()
```

## Basic Usage
41
42
```scala
43
import org.apache.spark.sql.SparkSession
44
45
// Create SparkSession with Hive support
46
val spark = SparkSession.builder()
47
.appName("Hive Integration Example")
48
.enableHiveSupport()
49
.getOrCreate()
50
51
// Execute HiveQL queries
52
spark.sql("CREATE TABLE IF NOT EXISTS users (id INT, name STRING, age INT)")
53
spark.sql("INSERT INTO users VALUES (1, 'Alice', 25), (2, 'Bob', 30)")
54
55
// Query Hive tables
56
val result = spark.sql("SELECT * FROM users WHERE age > 25")
57
result.show()
58
59
// Access Hive metastore
60
spark.catalog.listTables().show()
61
spark.catalog.listDatabases().show()
62
63
// Work with Hive partitioned tables
64
spark.sql("""
65
CREATE TABLE IF NOT EXISTS partitioned_sales (
66
product STRING,
67
amount DOUBLE
68
) PARTITIONED BY (year INT, month INT)
69
""")
70
71
// Load data into partitioned table
72
spark.sql("INSERT INTO partitioned_sales PARTITION(year=2023, month=12) VALUES ('laptop', 999.99)")
73
```
74
75
## Architecture

The Spark Hive integration is built around several key components:

- **HiveExternalCatalog**: Persistent catalog implementation that uses the Hive metastore for database, table, and partition metadata management
- **HiveClient Interface**: Low-level interface for direct Hive metastore operations and HiveQL execution
- **Session Integration**: HiveSessionCatalog and HiveSessionStateBuilder for Hive-aware Spark SQL sessions
- **UDF Support**: Comprehensive wrappers for Hive UDFs, UDAFs, and UDTFs with automatic type conversion
- **Data Format Integration**: Native support for Hive SerDes, ORC files, and table format conversion
- **Query Planning**: Hive-specific optimization strategies and execution operators

## Capabilities

### Session Management

Core functionality for creating and managing Spark sessions with Hive integration, including legacy HiveContext support and modern SparkSession configuration.

```scala { .api }
// Modern approach (recommended)
def enableHiveSupport(): SparkSession.Builder

// Legacy approach (deprecated since 2.0.0)
@deprecated("Use SparkSession.builder.enableHiveSupport instead", "2.0.0")
class HiveContext(sc: SparkContext) extends SQLContext
```

[Session Management](./session-management.md)

### Hive Metastore Operations

Direct access to the Hive metastore for programmatic database, table, partition, and function management through the HiveClient interface.

```scala { .api }
trait HiveClient {
  def listDatabases(pattern: String): Seq[String]
  def getDatabase(name: String): CatalogDatabase
  def listTables(dbName: String): Seq[String]
  def getTable(dbName: String, tableName: String): CatalogTable
  def createTable(table: CatalogTable, ignoreIfExists: Boolean): Unit
  def getPartitions(catalogTable: CatalogTable, partialSpec: Option[TablePartitionSpec] = None): Seq[CatalogTablePartition]
  def runSqlHive(sql: String): Seq[String]
}
```

[Metastore Operations](./metastore-operations.md)
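
HiveClient is an internal interface; applications typically reach the same metadata through the public `spark.catalog` API, which is backed by the Hive metastore when Hive support is enabled. A minimal sketch (the `default.users` table is illustrative and assumed to exist):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Metastore Browsing")
  .enableHiveSupport()
  .getOrCreate()

// Public catalog API, backed by HiveExternalCatalog when Hive support is on
spark.catalog.listDatabases().show()
spark.catalog.listTables("default").show()

// Metadata for a single table (assumes default.users exists)
val table = spark.catalog.getTable("default", "users")
println(s"${table.name}: ${table.tableType}")
```
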

### Hive UDF Integration

Comprehensive support for Hive user-defined functions, including simple UDFs, generic UDFs, table-generating functions (UDTFs), and aggregate functions (UDAFs).

```scala { .api }
case class HiveSimpleUDF(funcWrapper: HiveFunctionWrapper, children: Seq[Expression]) extends Expression
case class HiveGenericUDF(funcWrapper: HiveFunctionWrapper, children: Seq[Expression]) extends Expression
case class HiveGenericUDTF(funcWrapper: HiveFunctionWrapper, children: Seq[Expression]) extends Generator
case class HiveUDAFFunction(funcWrapper: HiveFunctionWrapper, children: Seq[Expression]) extends TypedImperativeAggregate[Any]
```

[UDF Integration](./udf-integration.md)
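
These wrappers are applied automatically when a Hive UDF is registered through HiveQL. A hedged sketch of registering and calling one; the jar path and class name are placeholders for your own UDF implementation, and the `users` table is assumed to exist:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Hive UDF Example")
  .enableHiveSupport()
  .getOrCreate()

// Register a Hive UDF from a jar (placeholder path and class name)
spark.sql("""
  CREATE TEMPORARY FUNCTION my_upper
  AS 'com.example.hive.udf.MyUpper'
  USING JAR '/path/to/my-udfs.jar'
""")

// Once registered, the function is usable in SQL like any built-in
spark.sql("SELECT my_upper(name) FROM users").show()
```
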

### Configuration and Utilities

Configuration options, utilities, and constants for customizing Hive integration behavior, including metastore settings, file format conversion, and compatibility options.

```scala { .api }
object HiveUtils {
  val builtinHiveVersion: String = "1.2.1"
  val HIVE_METASTORE_VERSION: ConfigEntry[String]
  val CONVERT_METASTORE_PARQUET: ConfigEntry[Boolean]
  val CONVERT_METASTORE_ORC: ConfigEntry[Boolean]
  def newTemporaryConfiguration(useInMemoryDerby: Boolean): Map[String, String]
}
```

[Configuration](./configuration.md)
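
These entries correspond to the `spark.sql.hive.*` configuration keys, which can be set when building the session. A sketch; the metastore version shown matches the built-in Hive client, and you should match it to your deployment:

```scala
import org.apache.spark.sql.SparkSession

// Point Spark at a specific metastore version and control format conversion
val spark = SparkSession.builder()
  .appName("Hive Config Example")
  .config("spark.sql.hive.metastore.version", "1.2.1")
  .config("spark.sql.hive.metastore.jars", "builtin")
  .config("spark.sql.hive.convertMetastoreParquet", "true")
  .config("spark.sql.hive.convertMetastoreOrc", "true")
  .enableHiveSupport()
  .getOrCreate()
```
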

### Data Type Conversion

Utilities for converting between Hive and Catalyst data types, handling ObjectInspectors, and managing SerDe operations.

```scala { .api }
trait HiveInspectors {
  def javaTypeToDataType(clz: Type): DataType
  def toInspector(dataType: DataType): ObjectInspector
  def inspectorToDataType(inspector: ObjectInspector): DataType
  def wrapperFor(oi: ObjectInspector, dataType: DataType): Any => Any
  def unwrapperFor(objectInspector: ObjectInspector): Any => Any
}
```

[Data Type Conversion](./data-type-conversion.md)

### File Format Support

Native support for Hive file formats, including traditional Hive tables and optimized ORC files with Hive compatibility.

```scala { .api }
class HiveFileFormat extends FileFormat with DataSourceRegister {
  override def shortName(): String = "hive"
}

class OrcFileFormat extends FileFormat with DataSourceRegister {
  override def shortName(): String = "orc"
}
```

[File Formats](./file-formats.md)
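
The registered short names mean the formats can be used directly with the DataFrame reader/writer API. A minimal sketch using the `"orc"` short name; the output path is illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ORC Example")
  .enableHiveSupport()
  .getOrCreate()

// Write a DataFrame as ORC via the short name registered by OrcFileFormat
val df = spark.range(10).toDF("id")
df.write.mode("overwrite").format("orc").save("/tmp/orc_demo")

// Read it back
spark.read.format("orc").load("/tmp/orc_demo").show()
```
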

## Types

### Core Catalog Types

```scala { .api }
// From Spark SQL Catalyst - used throughout the Hive integration
case class CatalogDatabase(
  name: String,
  description: String,
  locationUri: String,
  properties: Map[String, String]
)

case class CatalogTable(
  identifier: TableIdentifier,
  tableType: CatalogTableType,
  storage: CatalogStorageFormat,
  schema: StructType,
  partitionColumnNames: Seq[String] = Seq.empty,
  properties: Map[String, String] = Map.empty
)

case class CatalogTablePartition(
  spec: TablePartitionSpec,
  storage: CatalogStorageFormat,
  parameters: Map[String, String] = Map.empty
)

case class CatalogFunction(
  identifier: FunctionIdentifier,
  className: String,
  resources: Seq[FunctionResource]
)
```

### Hive-Specific Types

```scala { .api }
// Hive version support
abstract class HiveVersion(
  val fullVersion: String,
  val extraDeps: Seq[String] = Nil,
  val exclusions: Seq[String] = Nil
)

// Configuration for Hive data sources
class HiveOptions(parameters: Map[String, String]) {
  val fileFormat: Option[String]
  val inputFormat: Option[String]
  val outputFormat: Option[String]
  val serde: Option[String]
  def serdeProperties: Map[String, String]
}
```