Docker-based integration testing framework for Apache Spark JDBC connectivity with multiple database systems
npx @tessl/cli install tessl/maven-org-apache-spark--spark-docker-integration-tests_2-11@4.1.0
A Docker-based integration testing framework for Apache Spark's JDBC connectivity features. It automates testing of Spark's ability to connect to and interact with PostgreSQL, MySQL, MariaDB, Oracle, DB2, and Microsoft SQL Server by spinning up a Docker container for each database type.
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-docker-integration-tests_2.11</artifactId>
  <version>4.1.0-SNAPSHOT</version>
  <scope>test</scope>
</dependency>
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite
import org.apache.spark.sql.jdbc.DatabaseContainerManager
import org.apache.spark.sql.jdbc.JDBCConnectionUtil
import org.apache.spark.sql.jdbc.TestDataGenerator
import org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite
import org.apache.spark.sql.DataFrame
import java.sql.Connection
class MyDatabaseTest extends DockerJDBCIntegrationSuite {
  val databaseType = "postgresql"
  val databaseImage = "postgres:13"

  // Start the database container and load test data once, before any test runs.
  override def beforeAll(): Unit = {
    super.beforeAll()
    startDockerContainer()
    setupTestData()
  }

  // Always release the Spark session, even if stopping the container fails.
  override def afterAll(): Unit = {
    try {
      stopDockerContainer()
    } finally {
      super.afterAll()
    }
  }

  test("basic JDBC connectivity") {
    // Check the raw JDBC connection first, and close it so it does not leak.
    val connection = getJdbcConnection()
    try {
      assert(connection.isValid(5))
    } finally {
      connection.close()
    }

    // Read the test table through Spark's JDBC data source.
    val df = spark.read
      .format("jdbc")
      .option("url", getJdbcUrl())
      .option("dbtable", "test_table")
      .load()
    assert(df.count() > 0)
  }
}
The framework is built around several key components:
DockerJDBCIntegrationSuite provides the foundational testing infrastructure with Docker container lifecycle management.
DatabaseContainerManager handles Docker container creation, startup, shutdown, and resource cleanup.
JDBCConnectionUtil provides robust JDBC connection management and query execution.
DockerTestConfig manages Docker images, timeouts, network settings, and test data configurations.
TestDataGenerator creates consistent test datasets across different database systems.
This design enables systematic testing of Spark's JDBC integration across multiple database systems in isolated, reproducible Docker environments, with validation of connectivity, data operations, authentication, and performance optimizations.
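The container lifecycle handling described above (start, run the tests, always clean up) can be sketched without Docker at all. The following is a minimal illustration, not the framework's implementation: `ContainerManager`, `RecordingManager`, and `Lifecycle.withContainer` are hypothetical names introduced here, and the fake manager only records events instead of talking to a Docker daemon.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a container manager; not the framework's real class.
trait ContainerManager {
  def start(image: String): String // returns a container id
  def stop(containerId: String): Unit
}

// In-memory fake that records lifecycle events instead of calling Docker.
final class RecordingManager extends ContainerManager {
  val events: ArrayBuffer[String] = ArrayBuffer.empty[String]

  def start(image: String): String = {
    events += s"start:$image"
    "container-1"
  }

  def stop(containerId: String): Unit = {
    events += s"stop:$containerId"
    ()
  }
}

object Lifecycle {
  // Loan pattern: start the container, hand its id to the body,
  // and stop the container even if the body throws.
  def withContainer[A](manager: ContainerManager, image: String)(body: String => A): A = {
    val id = manager.start(image)
    try body(id)
    finally manager.stop(id)
  }
}
```

Calling `Lifecycle.withContainer(new RecordingManager, "postgres:13")(id => ...)` runs the body between a start and a stop event. This is the same guarantee that a `try`/`finally` in `afterAll()` gives a whole suite, scoped to a single block.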
Base testing infrastructure providing Docker container lifecycle management and shared test utilities for database integration testing.
abstract class DockerJDBCIntegrationSuite extends SharedSparkSession {
  def startDockerContainer(): Unit
  def stopDockerContainer(): Unit
  def getJdbcUrl(): String
  def getJdbcConnection(): Connection
  def setupTestData(): Unit
}
Docker container lifecycle management for database testing environments with support for multiple database systems and configurable timeouts.
class DatabaseContainerManager {
  def createContainer(dbType: String, imageTag: String): String
  def startContainer(containerId: String): ContainerInfo
  def stopContainer(containerId: String): Unit
  def getConnectionInfo(containerId: String): ConnectionInfo
  def configureTimeout(timeout: Duration): Unit
}
case class ContainerInfo(
  containerId: String,
  jdbcUrl: String,
  hostPort: Int,
  username: String,
  password: String
)
Specialized test suites for individual database systems with database-specific functionality testing, SQL dialect compatibility, and unique feature validation.
class PostgreSQLIntegrationSuite extends DockerJDBCIntegrationSuite
class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite
class MariaDBIntegrationSuite extends DockerJDBCIntegrationSuite
class OracleIntegrationSuite extends DockerJDBCIntegrationSuite
class DB2IntegrationSuite extends DockerJDBCIntegrationSuite
class SQLServerIntegrationSuite extends DockerJDBCIntegrationSuite
Utility functions for JDBC connection management, query execution, and resource cleanup with robust error handling and connection validation.
object JDBCConnectionUtil {
  def createConnection(url: String, properties: Properties): Connection
  def executeQuery(connection: Connection, sql: String): ResultSet
  def validateConnection(connection: Connection): Boolean
  def closeResources(resources: AutoCloseable*): Unit
}
Specialized testing capabilities including cross-database compatibility, DataSource V2 integration, Kerberos authentication, and join pushdown optimization testing.
class CrossDatabaseQuerySuite extends DockerJDBCIntegrationSuite
class DataSourceV2TestSuite extends DockerJDBCIntegrationSuite
class KerberosTestSuite extends DockerJDBCIntegrationSuite
class JoinPushdownTestSuite extends DockerJDBCIntegrationSuite
case class DockerTestConfig(
  dockerImageVersions: Map[String, String],
  containerTimeouts: Map[String, Duration],
  testDataSources: List[TestDataSource],
  networkSettings: NetworkConfig
)
case class TestDataSource(
  name: String,
  schema: StructType,
  data: List[Row]
)
case class NetworkConfig(
  networkName: String,
  driverClass: String,
  portRange: Range
)
case class ConnectionInfo(
  jdbcUrl: String,
  username: String,
  password: String,
  driverClass: String
)
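As an illustration of how `ConnectionInfo` might feed a `Properties`-based call such as `createConnection(url, properties)`, here is a minimal sketch. The `ConnectionProps.toProperties` helper is hypothetical and not part of the listed API, and the case class is redeclared only to keep the snippet self-contained. The `user` and `password` keys are the ones `java.sql.DriverManager.getConnection(url, props)` reads, and `driver` is the option name Spark's JDBC data source uses for the driver class.

```scala
import java.util.Properties

// Redeclared copy of the ConnectionInfo case class listed above.
case class ConnectionInfo(
  jdbcUrl: String,
  username: String,
  password: String,
  driverClass: String
)

// Hypothetical helper, introduced for illustration only.
object ConnectionProps {
  // Copy credentials and driver class into a java.util.Properties object.
  def toProperties(info: ConnectionInfo): Properties = {
    val props = new Properties()
    props.setProperty("user", info.username)
    props.setProperty("password", info.password)
    props.setProperty("driver", info.driverClass)
    props
  }
}
```

The resulting object can be passed together with `info.jdbcUrl` wherever a URL and `Properties` pair is expected, which keeps credentials in one place instead of repeating them per test.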