tessl/maven-org-apache-spark--spark-docker-integration-tests_2-11

Docker-based integration testing framework for Apache Spark JDBC connectivity with multiple database systems

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.apache.spark/spark-docker-integration-tests_2.11@4.1.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-docker-integration-tests_2-11@4.1.0

# Spark Docker Integration Tests

A Docker-based integration testing framework for Apache Spark's JDBC connectivity features. It automates testing of Spark's ability to connect to and interact with PostgreSQL, MySQL, MariaDB, Oracle, DB2, and Microsoft SQL Server by starting a Docker container for each database type.

## Package Information

- **Package Name**: spark-docker-integration-tests_2.11
- **Package Type**: maven
- **Language**: Scala
- **Build System**: Maven
- **Version**: 4.1.0-SNAPSHOT
- **Installation**: Add Maven dependency:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-docker-integration-tests_2.11</artifactId>
  <version>4.1.0-SNAPSHOT</version>
  <scope>test</scope>
</dependency>
```

## Core Imports

```scala
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite
import org.apache.spark.sql.jdbc.DatabaseContainerManager
import org.apache.spark.sql.jdbc.JDBCConnectionUtil
import org.apache.spark.sql.jdbc.TestDataGenerator
```

## Basic Usage

```scala
import org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite
import org.apache.spark.sql.DataFrame
import java.sql.Connection

class MyDatabaseTest extends DockerJDBCIntegrationSuite {
  val databaseType = "postgresql"
  val databaseImage = "postgres:13"

  // Start the container and load test data once, before any test runs.
  override def beforeAll(): Unit = {
    super.beforeAll()
    startDockerContainer()
    setupTestData()
  }

  // Stop the container even if a test failed, then release the Spark session.
  override def afterAll(): Unit = {
    try {
      stopDockerContainer()
    } finally {
      super.afterAll()
    }
  }

  test("basic JDBC connectivity") {
    // Open a direct JDBC connection to confirm the database is reachable.
    val connection: Connection = getJdbcConnection()
    try {
      assert(!connection.isClosed())

      // Read the same table through Spark's JDBC data source.
      val df: DataFrame = spark.read
        .format("jdbc")
        .option("url", getJdbcUrl())
        .option("dbtable", "test_table")
        .load()

      assert(df.count() > 0)
    } finally {
      // Close the connection so it does not leak across tests.
      connection.close()
    }
  }
}
```

## Architecture

The framework is built around six components:

- **Base Test Framework**: `DockerJDBCIntegrationSuite` provides the foundational testing infrastructure with Docker container lifecycle management
- **Container Management**: `DatabaseContainerManager` handles Docker container creation, startup, shutdown, and resource cleanup
- **Database-Specific Suites**: Specialized test classes for each supported database system with database-specific test scenarios
- **Connection Utilities**: `JDBCConnectionUtil` provides JDBC connection management and query execution
- **Configuration System**: `DockerTestConfig` manages Docker images, timeouts, network settings, and test data configurations
- **Data Generation**: `TestDataGenerator` creates consistent test datasets across different database systems

This design lets Spark's JDBC integration be tested systematically against each database in an isolated, reproducible Docker environment, covering connectivity, data operations, authentication, and performance optimizations.
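
The timeouts mentioned above imply some form of readiness check after a container starts. The following is a minimal sketch of one way to poll with a deadline. `Readiness` and `waitUntilReady` are hypothetical names chosen for illustration; they are not part of the documented API, and the framework's actual mechanism may differ.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

// Hypothetical helper, not part of the documented API.
object Readiness {
  // Polls `isReady` every `interval` until it returns true or `timeout`
  // elapses. Returns true if the check succeeded before the deadline.
  def waitUntilReady(timeout: FiniteDuration, interval: FiniteDuration)(
      isReady: () => Boolean): Boolean = {
    val deadline = timeout.fromNow

    @tailrec
    def loop(): Boolean =
      if (isReady()) true
      else if (deadline.isOverdue()) false
      else {
        Thread.sleep(interval.toMillis)
        loop()
      }

    loop()
  }
}
```

In a real suite the `isReady` check would typically try to open a JDBC connection and report whether that succeeded; here it is left as a plain function so the sketch runs without Docker.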

## Capabilities

### Core Test Framework

Base testing infrastructure providing Docker container lifecycle management and shared test utilities for database integration testing.

```scala { .api }
abstract class DockerJDBCIntegrationSuite extends SharedSparkSession {
  def startDockerContainer(): Unit
  def stopDockerContainer(): Unit
  def getJdbcUrl(): String
  def getJdbcConnection(): Connection
  def setupTestData(): Unit
}
```

[Core Framework](./core-framework.md)
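
The start/stop pairing in this API can be illustrated without Docker. The sketch below uses a stand-in trait that copies only the two lifecycle method names from the listing above; `ContainerLifecycle`, `ContainerLoan`, `withContainer`, and `RecordingLifecycle` are hypothetical names introduced for illustration. It shows the loan pattern: the container is always stopped, even when the test body throws.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the start/stop half of DockerJDBCIntegrationSuite.
trait ContainerLifecycle {
  def startDockerContainer(): Unit
  def stopDockerContainer(): Unit
}

object ContainerLoan {
  // Loan pattern: start the container, run the body, and always stop the
  // container afterwards, even when the body throws.
  def withContainer[A](lifecycle: ContainerLifecycle)(body: => A): A = {
    lifecycle.startDockerContainer()
    try body
    finally lifecycle.stopDockerContainer()
  }
}

// In-memory fake that records calls instead of talking to Docker.
class RecordingLifecycle extends ContainerLifecycle {
  val events: ListBuffer[String] = ListBuffer.empty[String]
  def startDockerContainer(): Unit = { events += "start"; () }
  def stopDockerContainer(): Unit = { events += "stop"; () }
}
```

This is the same guarantee the `try`/`finally` in `afterAll()` provides in the Basic Usage example, expressed as a reusable helper.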

### Container Management

Docker container lifecycle management for database testing environments with support for multiple database systems and configurable timeouts.

```scala { .api }
class DatabaseContainerManager {
  def createContainer(dbType: String, imageTag: String): String
  def startContainer(containerId: String): ContainerInfo
  def stopContainer(containerId: String): Unit
  def getConnectionInfo(containerId: String): ConnectionInfo
  def configureTimeout(timeout: Duration): Unit
}

case class ContainerInfo(
  containerId: String,
  jdbcUrl: String,
  hostPort: Int,
  username: String,
  password: String
)
```

[Container Management](./container-management.md)
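
`ContainerInfo` carries both a `hostPort` and a `jdbcUrl`, so the URL presumably has to be derived from the mapped port. The sketch below shows how such a URL could be assembled in each driver's standard format. `JdbcUrls` and `jdbcUrlFor` are hypothetical names, and apart from `"postgresql"` (used in the Basic Usage example) the `dbType` keys are assumptions; the documentation does not list the accepted values.

```scala
// Hypothetical helper, not part of the documented API.
object JdbcUrls {
  // Builds a JDBC URL in each driver's standard format.
  def jdbcUrlFor(dbType: String, host: String, port: Int, database: String): String =
    dbType.toLowerCase match {
      case "postgresql" => s"jdbc:postgresql://$host:$port/$database"
      case "mysql"      => s"jdbc:mysql://$host:$port/$database"
      case "mariadb"    => s"jdbc:mariadb://$host:$port/$database"
      case "db2"        => s"jdbc:db2://$host:$port/$database"
      // Oracle's thin driver addresses a service name rather than a database.
      case "oracle"     => s"jdbc:oracle:thin:@//$host:$port/$database"
      // SQL Server passes the database as a connection property.
      case "sqlserver"  => s"jdbc:sqlserver://$host:$port;databaseName=$database"
      case other =>
        throw new IllegalArgumentException(s"Unsupported database type: $other")
    }
}
```

For example, `JdbcUrls.jdbcUrlFor("postgresql", "localhost", 5432, "test")` yields `jdbc:postgresql://localhost:5432/test`.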

### Database-Specific Testing

Specialized test suites for individual database systems with database-specific functionality testing, SQL dialect compatibility, and unique feature validation.

```scala { .api }
class PostgreSQLIntegrationSuite extends DockerJDBCIntegrationSuite
class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite
class MariaDBIntegrationSuite extends DockerJDBCIntegrationSuite
class OracleIntegrationSuite extends DockerJDBCIntegrationSuite
class DB2IntegrationSuite extends DockerJDBCIntegrationSuite
class SQLServerIntegrationSuite extends DockerJDBCIntegrationSuite
```

[Database Testing](./database-testing.md)

### JDBC Connection Utilities

Utility functions for JDBC connection management, query execution, and resource cleanup with robust error handling and connection validation.

```scala { .api }
object JDBCConnectionUtil {
  def createConnection(url: String, properties: Properties): Connection
  def executeQuery(connection: Connection, sql: String): ResultSet
  def validateConnection(connection: Connection): Boolean
  def closeResources(resources: AutoCloseable*): Unit
}
```

[JDBC Utilities](./jdbc-utilities.md)
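
The documentation does not say how `closeResources` behaves when one `close()` call fails. A reasonable approach, sketched below under that assumption, is to keep closing the remaining resources and collect the failures. `ResourceCleanup` is a hypothetical name, and unlike the documented signature (which returns `Unit`) this version returns the collected exceptions so the behavior is observable.

```scala
import scala.util.Try

// Hypothetical sketch, not the framework's implementation.
object ResourceCleanup {
  // Closes every resource in order, continuing past failures and skipping
  // nulls. Returns the non-fatal exceptions raised by close().
  def closeResources(resources: AutoCloseable*): Seq[Throwable] =
    resources.flatMap { resource =>
      Try(if (resource != null) resource.close()).failed.toOption
    }
}
```

With JDBC objects the usual order is `ResultSet`, then `Statement`, then `Connection`, so that dependent resources are released before the connection they belong to.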

### Advanced Testing Features

Specialized testing capabilities including cross-database compatibility, DataSource V2 integration, Kerberos authentication, and join pushdown optimization testing.

```scala { .api }
class CrossDatabaseQuerySuite extends DockerJDBCIntegrationSuite
class DataSourceV2TestSuite extends DockerJDBCIntegrationSuite
class KerberosTestSuite extends DockerJDBCIntegrationSuite
class JoinPushdownTestSuite extends DockerJDBCIntegrationSuite
```

[Advanced Features](./advanced-features.md)

## Types

```scala { .api }
case class DockerTestConfig(
  dockerImageVersions: Map[String, String],
  containerTimeouts: Map[String, Duration],
  testDataSources: List[TestDataSource],
  networkSettings: NetworkConfig
)

case class TestDataSource(
  name: String,
  schema: StructType,
  data: List[Row]
)

case class NetworkConfig(
  networkName: String,
  driverClass: String,
  portRange: Range
)

case class ConnectionInfo(
  jdbcUrl: String,
  username: String,
  password: String,
  driverClass: String
)
```
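
`JDBCConnectionUtil.createConnection` takes a `Properties` object, while `ConnectionInfo` stores credentials as plain fields, so some conversion is needed between the two. The sketch below shows one way to do it. `ConnectionProps` and `toProperties` are hypothetical names, and the `ConnectionInfo` definition is repeated only so the example is self-contained.

```scala
import java.util.Properties

// Mirrors the ConnectionInfo type listed above.
case class ConnectionInfo(
  jdbcUrl: String,
  username: String,
  password: String,
  driverClass: String
)

// Hypothetical helper, not part of the documented API.
object ConnectionProps {
  // "user" and "password" are the standard property keys read by
  // java.sql.DriverManager.getConnection(url, properties).
  def toProperties(info: ConnectionInfo): Properties = {
    val props = new Properties()
    props.setProperty("user", info.username)
    props.setProperty("password", info.password)
    props
  }
}
```

The resulting `Properties` could then be passed, together with `info.jdbcUrl`, to `createConnection` or to Spark's `spark.read.jdbc(url, table, properties)`.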