Docker-based integration testing framework for Apache Spark JDBC connectivity with multiple database systems
npx @tessl/cli install tessl/maven-org-apache-spark--spark-docker-integration-tests_2-11@4.1.00
# Spark Docker Integration Tests

A Docker-based integration testing framework for Apache Spark's JDBC connectivity. It automates testing of Spark's ability to connect to and interact with PostgreSQL, MySQL, MariaDB, Oracle, DB2, and Microsoft SQL Server by starting a Docker container for each database type.

## Package Information

- **Package Name**: spark-docker-integration-tests_2.11
- **Package Type**: maven
- **Language**: Scala
- **Build System**: Maven
- **Version**: 4.1.0-SNAPSHOT
- **Installation**: Add the Maven dependency:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-docker-integration-tests_2.11</artifactId>
  <version>4.1.0-SNAPSHOT</version>
  <scope>test</scope>
</dependency>
```

## Core Imports

```scala
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite
import org.apache.spark.sql.jdbc.DatabaseContainerManager
import org.apache.spark.sql.jdbc.JDBCConnectionUtil
import org.apache.spark.sql.jdbc.TestDataGenerator
```

## Basic Usage

```scala
import java.sql.Connection

import org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite

class MyDatabaseTest extends DockerJDBCIntegrationSuite {
  // Database type and Docker image this suite targets
  val databaseType = "postgresql"
  val databaseImage = "postgres:13"

  override def beforeAll(): Unit = {
    super.beforeAll()
    startDockerContainer() // launch the database container
    setupTestData()        // load the data the tests below read
  }

  override def afterAll(): Unit = {
    try {
      stopDockerContainer()
    } finally {
      super.afterAll() // always tear down the shared SparkSession
    }
  }

  test("basic JDBC connectivity") {
    // Check that a plain JDBC connection works, and close it afterwards
    val connection: Connection = getJdbcConnection()
    try {
      assert(connection.isValid(5)) // 5-second validation timeout
    } finally {
      connection.close()
    }

    // Read the test table through Spark's JDBC data source
    val df = spark.read
      .format("jdbc")
      .option("url", getJdbcUrl())
      .option("dbtable", "test_table")
      .load()

    assert(df.count() > 0)
  }
}
```

## Architecture

The framework is organized into the following components:

- **Base Test Framework**: `DockerJDBCIntegrationSuite` provides the base test class and manages the Docker container lifecycle
- **Container Management**: `DatabaseContainerManager` creates, starts, and stops Docker containers and cleans up their resources
- **Database-Specific Suites**: one test class per supported database system, covering scenarios specific to that database
- **Connection Utilities**: `JDBCConnectionUtil` opens and validates JDBC connections, executes queries, and closes resources
- **Configuration System**: `DockerTestConfig` holds Docker image versions, container timeouts, network settings, and test data sources
- **Data Generation**: `TestDataGenerator` creates the same test datasets for every database system

Because each database runs in its own isolated Docker container, Spark's JDBC integration can be tested reproducibly and in the same way across database systems, covering connectivity, data operations, authentication, and performance optimizations such as join pushdown.

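
`TestDataGenerator`'s API is not listed on this page, so the following is only a minimal sketch of one way to produce "consistent test datasets": a fixed random seed yields the same rows on every run, and the rows are rendered as plain `INSERT` statements that each database can execute. `TestRow`, `DeterministicTestData`, and both methods are hypothetical names, not part of the framework.

```scala
import scala.util.Random

// Hypothetical row shape for a simple test table (not framework API)
final case class TestRow(id: Int, name: String, amount: BigDecimal)

// Hypothetical generator illustrating seed-based, repeatable test data
object DeterministicTestData {
  // The same seed always yields the same rows, so every database receives identical data
  def generateRows(count: Int, seed: Long = 42L): Seq[TestRow] = {
    val rng = new Random(seed)
    (1 to count).map { i =>
      val cents = rng.nextInt(100000) // 0 to 99999 cents
      TestRow(i, s"name_$i", BigDecimal(cents) / 100)
    }
  }

  // One single-row INSERT per row, using only portable SQL literals
  def insertStatements(table: String, rows: Seq[TestRow]): Seq[String] =
    rows.map { r =>
      val escapedName = r.name.replace("'", "''") // standard SQL quote escaping
      val amount = r.amount.bigDecimal.toPlainString
      s"INSERT INTO $table (id, name, amount) VALUES (${r.id}, '$escapedName', $amount)"
    }
}
```

Emitting one single-row `INSERT` per row avoids multi-row `VALUES` lists, which older Oracle releases do not accept.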
## Capabilities

### Core Test Framework

Base testing infrastructure providing Docker container lifecycle management and shared test utilities for database integration testing.

```scala { .api }
abstract class DockerJDBCIntegrationSuite extends SharedSparkSession {
  def startDockerContainer(): Unit
  def stopDockerContainer(): Unit
  def getJdbcUrl(): String
  def getJdbcConnection(): Connection
  def setupTestData(): Unit
}
```

[Core Framework](./core-framework.md)

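
The lifecycle contract behind `DockerJDBCIntegrationSuite` (start the container, load test data, run the tests, and stop the container even when something fails) can be modeled without Spark or Docker. This sketch reuses the method names from the listing above in a hypothetical `SuiteLifecycleSketch` class; `runAll` and `RecordingSuite` are illustrative additions, not framework API.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, Spark-free model of the lifecycle managed by DockerJDBCIntegrationSuite.
// Method names mirror the API listing; subclasses supply the bodies.
abstract class SuiteLifecycleSketch {
  def startDockerContainer(): Unit
  def stopDockerContainer(): Unit
  def setupTestData(): Unit

  // Start the container, load data, run the tests, and always stop the container
  final def runAll(tests: () => Unit): Unit = {
    startDockerContainer()
    try {
      setupTestData()
      tests()
    } finally {
      stopDockerContainer() // executes even when setup or a test throws
    }
  }
}

// Recording implementation used to observe the call order
final class RecordingSuite extends SuiteLifecycleSketch {
  val events: ListBuffer[String] = ListBuffer.empty[String]
  def startDockerContainer(): Unit = events += "start"
  def stopDockerContainer(): Unit = events += "stop"
  def setupTestData(): Unit = events += "setup"
}
```

`startDockerContainer()` sits outside the `try` because there is nothing to stop if it fails. A real implementation might still need to clean up a container that was created but never became ready.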
### Container Management

Docker container lifecycle management for database testing environments with support for multiple database systems and configurable timeouts.

```scala { .api }
class DatabaseContainerManager {
  def createContainer(dbType: String, imageTag: String): String
  def startContainer(containerId: String): ContainerInfo
  def stopContainer(containerId: String): Unit
  def getConnectionInfo(containerId: String): ConnectionInfo
  def configureTimeout(timeout: Duration): Unit
}

case class ContainerInfo(
  containerId: String,
  jdbcUrl: String,
  hostPort: Int,
  username: String,
  password: String
)
```

[Container Management](./container-management.md)

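
A started container is not necessarily ready: the database inside may need several seconds before it accepts connections, which is where a configurable timeout matters. Below is a minimal polling sketch, assuming a readiness probe built on `java.sql.DriverManager` and `Connection.isValid`. `ReadinessWait`, `waitUntilReady`, and `jdbcProbe` are hypothetical and do not describe how `DatabaseContainerManager` is actually implemented.

```scala
import java.sql.DriverManager

import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical readiness helper; not part of the framework's API
object ReadinessWait {
  /** Polls `probe` until it returns true or `timeout` elapses.
    * A probe that throws a non-fatal exception counts as "not ready yet". */
  def waitUntilReady(timeout: FiniteDuration, interval: FiniteDuration)(probe: () => Boolean): Boolean = {
    val deadline = timeout.fromNow
    @tailrec
    def loop(): Boolean =
      if (Try(probe()).getOrElse(false)) true
      else if (deadline.isOverdue()) false
      else {
        Thread.sleep(interval.toMillis)
        loop()
      }
    loop()
  }

  /** Probe that succeeds once a JDBC connection can be opened and validated. */
  def jdbcProbe(url: String, user: String, password: String): () => Boolean = () => {
    val connection = DriverManager.getConnection(url, user, password)
    try connection.isValid(2) // 2-second validation timeout
    finally connection.close()
  }
}
```

With a `ContainerInfo` in hand, a call such as `ReadinessWait.waitUntilReady(2.minutes, 1.second)(ReadinessWait.jdbcProbe(info.jdbcUrl, info.username, info.password))` would block until the database answers or the timeout expires.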
### Database-Specific Testing

Specialized test suites for individual database systems with database-specific functionality testing, SQL dialect compatibility, and unique feature validation.

```scala { .api }
class PostgreSQLIntegrationSuite extends DockerJDBCIntegrationSuite
class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite
class MariaDBIntegrationSuite extends DockerJDBCIntegrationSuite
class OracleIntegrationSuite extends DockerJDBCIntegrationSuite
class DB2IntegrationSuite extends DockerJDBCIntegrationSuite
class SQLServerIntegrationSuite extends DockerJDBCIntegrationSuite
```

[Database Testing](./database-testing.md)

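
One visible difference between the database-specific suites is the connection string, since each JDBC driver expects its own URL syntax. The sketch below shows the common URL forms for the six drivers. The helper and its `dbType` labels are hypothetical, and real suites may append driver-specific properties.

```scala
import java.util.Locale

// Hypothetical helper; the framework's own URL construction is not documented here
object JdbcUrls {
  // Builds a JDBC URL for each supported database; returns None for unknown types
  def jdbcUrl(dbType: String, host: String, port: Int, database: String): Option[String] =
    dbType.toLowerCase(Locale.ROOT) match {
      case "postgresql" => Some(s"jdbc:postgresql://$host:$port/$database")
      case "mysql"      => Some(s"jdbc:mysql://$host:$port/$database")
      case "mariadb"    => Some(s"jdbc:mariadb://$host:$port/$database")
      case "oracle"     => Some(s"jdbc:oracle:thin:@//$host:$port/$database") // database = service name
      case "db2"        => Some(s"jdbc:db2://$host:$port/$database")
      case "sqlserver"  => Some(s"jdbc:sqlserver://$host:$port;databaseName=$database")
      case _            => None
    }
}
```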
### JDBC Connection Utilities

Utility functions for JDBC connection management, query execution, and resource cleanup with robust error handling and connection validation.

```scala { .api }
object JDBCConnectionUtil {
  def createConnection(url: String, properties: Properties): Connection
  def executeQuery(connection: Connection, sql: String): ResultSet
  def validateConnection(connection: Connection): Boolean
  def closeResources(resources: AutoCloseable*): Unit
}
```

[JDBC Utilities](./jdbc-utilities.md)

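
The `closeResources(resources: AutoCloseable*)` signature suggests closing several JDBC objects in one call. One reasonable behavior, assumed here rather than taken from the framework, is to skip `null`s, attempt every close even if an earlier one fails, and rethrow the first failure with later ones attached as suppressed exceptions. `ResourceCleanup` is a hypothetical name.

```scala
import scala.util.Try

// Hypothetical sketch of closeResources semantics; not the framework's implementation
object ResourceCleanup {
  /** Closes every non-null resource in the order given.
    * Non-fatal failures are collected; the first is rethrown with the rest as suppressed. */
  def closeResources(resources: AutoCloseable*): Unit = {
    val failures: List[Throwable] =
      resources.toList.filter(_ != null).flatMap(r => Try(r.close()).failed.toOption)
    failures match {
      case first :: rest =>
        rest.foreach(e => first.addSuppressed(e))
        throw first
      case Nil => ()
    }
  }
}
```

Passing resources innermost first, as in `closeResources(resultSet, statement, connection)`, releases them in the reverse of the order in which they were opened.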
### Advanced Testing Features

Specialized testing capabilities including cross-database compatibility, DataSource V2 integration, Kerberos authentication, and join pushdown optimization testing.

```scala { .api }
class CrossDatabaseQuerySuite extends DockerJDBCIntegrationSuite
class DataSourceV2TestSuite extends DockerJDBCIntegrationSuite
class KerberosTestSuite extends DockerJDBCIntegrationSuite
class JoinPushdownTestSuite extends DockerJDBCIntegrationSuite
```

[Advanced Features](./advanced-features.md)

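
For DataSource V2 testing, Spark exposes JDBC databases through its built-in catalog class `org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog`, which is registered with `spark.sql.catalog.<name>` configuration entries. The sketch below only builds those entries as a `Map`. The helper name, the catalog name, and the choice to enable the `pushDownAggregate` and `pushDownLimit` options are illustrative assumptions.

```scala
// Hypothetical helper that builds Spark configuration entries for a JDBC V2 catalog
object JdbcCatalogConf {
  // Spark's built-in DataSource V2 catalog for JDBC databases
  val JdbcCatalogClass = "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog"

  /** Returns `spark.sql.catalog.<name>` entries that register a JDBC catalog with
    * aggregate and limit pushdown enabled; `extra` adds further catalog options. */
  def catalogConf(name: String, jdbcUrl: String, extra: Map[String, String] = Map.empty): Map[String, String] = {
    val prefix = s"spark.sql.catalog.$name"
    Map(
      prefix -> JdbcCatalogClass,
      s"$prefix.url" -> jdbcUrl,
      s"$prefix.pushDownAggregate" -> "true",
      s"$prefix.pushDownLimit" -> "true"
    ) ++ extra.map { case (k, v) => s"$prefix.$k" -> v }
  }
}
```

In a suite based on `SharedSparkSession`, these entries could be applied by overriding `sparkConf` and calling `.set(key, value)` for each one; tables are then referenced in SQL through the catalog name.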
## Types

```scala { .api }
case class DockerTestConfig(
  dockerImageVersions: Map[String, String],
  containerTimeouts: Map[String, Duration],
  testDataSources: List[TestDataSource],
  networkSettings: NetworkConfig
)

case class TestDataSource(
  name: String,
  schema: StructType,
  data: List[Row]
)

case class NetworkConfig(
  networkName: String,
  driverClass: String,
  portRange: Range
)

case class ConnectionInfo(
  jdbcUrl: String,
  username: String,
  password: String,
  driverClass: String
)
```
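
`NetworkConfig.portRange` suggests that host ports are drawn from a configured range. How the framework uses it is not documented here. As one hypothetical approach, a helper can return the first port in the range that can currently be bound. `PortPicker` and `firstFreePort` are illustrative names.

```scala
import java.net.ServerSocket

import scala.util.Try

// Hypothetical helper for choosing a host port from NetworkConfig.portRange
object PortPicker {
  /** Returns the first port in `range` that can be bound right now on all interfaces,
    * or None if none can. The probe socket is closed immediately. */
  def firstFreePort(range: Range): Option[Int] =
    range.find { port =>
      Try {
        val socket = new ServerSocket(port) // throws if the port is in use or invalid
        socket.close()
      }.isSuccess
    }
}
```

The check is inherently racy: the port is released before Docker binds it, so callers should still handle a failed container start.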