Docker integration tests for Apache Spark providing automated JDBC database testing with containerized environments.
```
npx @tessl/cli install tessl/maven-org-apache-spark--spark-docker-integration-tests_2-10@1.6.0
```

Spark Docker Integration Tests provides a Docker-based testing framework for validating Apache Spark's JDBC functionality with various database systems. It automates the creation and management of database containers, enabling comprehensive integration testing of Spark's SQL capabilities in isolated, reproducible environments.
```
org.apache.spark:spark-docker-integration-tests_2.10:1.6.3
```

```scala
import org.apache.spark.sql.jdbc.{DockerJDBCIntegrationSuite, DatabaseOnDocker}
import org.apache.spark.util.DockerUtils
import org.apache.spark.tags.DockerTest
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.test.SharedSQLContext
import org.scalatest.{BeforeAndAfterAll, Eventually}
import com.spotify.docker.client.DockerClient
import java.sql.Connection
import java.util.Properties
```

```scala
import org.apache.spark.sql.jdbc.{DockerJDBCIntegrationSuite, DatabaseOnDocker}
import java.sql.Connection
import java.util.Properties

// Define a database configuration
val mysqlConfig = new DatabaseOnDocker {
  override val imageName = "mysql:5.7.9"
  override val env = Map("MYSQL_ROOT_PASSWORD" -> "rootpass")
  override val jdbcPort = 3306
  override def getJdbcUrl(ip: String, port: Int): String =
    s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
}

// Create an integration test suite
class MyIntegrationSuite extends DockerJDBCIntegrationSuite {
  override val db = mysqlConfig

  override def dataPreparation(conn: Connection): Unit = {
    conn.prepareStatement("CREATE TABLE test (id INT, name VARCHAR(50))").executeUpdate()
    conn.prepareStatement("INSERT INTO test VALUES (1, 'test')").executeUpdate()
  }

  test("Basic connectivity") {
    val df = sqlContext.read.jdbc(jdbcUrl, "test", new Properties)
    assert(df.collect().length > 0)
  }
}
```

The framework is built around several key components:
The `DatabaseOnDocker` abstract class provides database-specific configuration: it is the interface for defining how each database's Docker container is launched and connected to.

```scala
abstract class DatabaseOnDocker {
  /**
   * The docker image to be pulled.
   */
  val imageName: String

  /**
   * Environment variables to set inside of the Docker container while launching it.
   */
  val env: Map[String, String]

  /**
   * The container-internal JDBC port that the database listens on.
   */
  val jdbcPort: Int

  /**
   * Return a JDBC URL that connects to the database running at the given IP address and port.
   */
  def getJdbcUrl(ip: String, port: Int): String
}
```

- `imageName`: Docker image name to pull (e.g., "mysql:5.7.9", "postgres:9.4.5")
- `env`: Environment variables to set in the container (e.g., database passwords)
- `jdbcPort`: Port number the database listens on inside the container
- `getJdbcUrl()`: Constructs the JDBC URL for connecting to the database

`DockerJDBCIntegrationSuite` is the base class providing complete Docker-based integration testing infrastructure for JDBC databases.

```scala
abstract class DockerJDBCIntegrationSuite
  extends SparkFunSuite
  with BeforeAndAfterAll
  with Eventually
  with SharedSQLContext {

  val db: DatabaseOnDocker

  private var docker: DockerClient
  private var containerId: String
  protected var jdbcUrl: String

  /**
   * Prepare databases and tables for testing.
   */
  def dataPreparation(connection: Connection): Unit

  override def beforeAll(): Unit
  override def afterAll(): Unit
}
```

- `db`: Database configuration implementing `DatabaseOnDocker`
- `docker`: DockerClient instance for container management (private field)
- `containerId`: Unique identifier for the created container (private field)
- `jdbcUrl`: JDBC URL available after container setup (protected field)
- `dataPreparation()`: Abstract method for setting up test data and schema
- `beforeAll()`: Handles Docker client setup, image pulling, container creation, and connection establishment
- `afterAll()`: Cleans up Docker containers and closes connections

The `beforeAll()` method performs these operations:
1. Initializes the Docker client and verifies the daemon is reachable
2. Pulls the Docker image if it is not already present locally
3. Creates and starts the database container
4. Waits until a JDBC connection can be established
5. Calls `dataPreparation()` for test data setup

The `afterAll()` method ensures proper cleanup:

1. Kills and removes the database container
2. Closes connections and the Docker client
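For illustration, here is a minimal sketch of that lifecycle written against the `com.spotify.docker-client` API summarized at the end of this document. This is not the suite's actual source: port discovery, retry logic (`Eventually`), and error handling are elided, and the host address is a placeholder.

```scala
import scala.collection.JavaConverters._
import java.sql.{Connection, DriverManager}
import com.spotify.docker.client.{DefaultDockerClient, DockerClient}
import com.spotify.docker.client.messages.ContainerConfig
import org.apache.spark.sql.jdbc.DatabaseOnDocker

def withDatabase(db: DatabaseOnDocker)(f: Connection => Unit): Unit = {
  val docker: DockerClient = DefaultDockerClient.fromEnv().build()
  var containerId: String = null
  try {
    docker.ping() // fail fast if the Docker daemon is unreachable
    // Pull the image only if it is not already available locally.
    try docker.inspectImage(db.imageName)
    catch { case _: Exception => docker.pull(db.imageName) }
    // Launch the container with the database's environment variables.
    val config = ContainerConfig.builder()
      .image(db.imageName)
      .env(db.env.map { case (k, v) => s"$k=$v" }.toList.asJava)
      .build()
    containerId = docker.createContainer(config).id()
    docker.startContainer(containerId)
    // The real suite discovers the mapped host/port; "127.0.0.1" and the
    // container-internal port are placeholders here.
    val conn = DriverManager.getConnection(db.getJdbcUrl("127.0.0.1", db.jdbcPort))
    try f(conn) finally conn.close()
  } finally {
    // Mirrors afterAll(): always remove the container and close the client.
    if (containerId != null) {
      docker.killContainer(containerId)
      docker.removeContainer(containerId)
    }
    docker.close()
  }
}
```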
`MySQLIntegrationSuite` is the concrete implementation providing MySQL-specific integration testing capabilities.

```scala
class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}
```

The MySQL implementation uses:
- Image: `mysql:5.7.9`
- Environment: `MYSQL_ROOT_PASSWORD=rootpass`
- JDBC URL: `jdbc:mysql://host:port/mysql?user=root&password=rootpass`

Test coverage includes basic connectivity plus MySQL numeric, date, and string type mappings, as well as basic write operations, as in the sketch below.
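For example, a basic write test can round-trip a DataFrame through JDBC. This is a hedged sketch meant to live inside a `DockerJDBCIntegrationSuite` subclass (so `sqlContext` and `jdbcUrl` are in scope); `test_copy` is a hypothetical table name, not one the shipped suites use:

```scala
import java.util.Properties

test("Basic write test (illustrative)") {
  val df = sqlContext.read.jdbc(jdbcUrl, "test", new Properties)
  // Round-trip: write the frame to a new table, then read it back.
  df.write.jdbc(jdbcUrl, "test_copy", new Properties)
  assert(sqlContext.read.jdbc(jdbcUrl, "test_copy", new Properties).count() === df.count())
}
```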
`PostgresIntegrationSuite` is the concrete implementation providing PostgreSQL-specific integration testing capabilities.

```scala
class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}
```

The PostgreSQL implementation uses:
- Image: `postgres:9.4.5`
- Environment: `POSTGRES_PASSWORD=rootpass`
- JDBC URL: `jdbc:postgresql://host:port/postgres?user=postgres&password=rootpass`

Test coverage includes type mapping for PostgreSQL-specific types and basic write operations. A configuration matching these values is sketched below.
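The following is an illustrative `DatabaseOnDocker` configuration built from the values above, not the suite's exact source (`5432` is the standard PostgreSQL port):

```scala
val postgresConfig = new DatabaseOnDocker {
  override val imageName = "postgres:9.4.5"
  override val env = Map("POSTGRES_PASSWORD" -> "rootpass")
  override val jdbcPort = 5432
  override def getJdbcUrl(ip: String, port: Int): String =
    s"jdbc:postgresql://$ip:$port/postgres?user=postgres&password=rootpass"
}
```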
`OracleIntegrationSuite` is the concrete implementation providing Oracle-specific integration testing capabilities (typically disabled due to licensing).

```scala
class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLContext {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}
```

The Oracle implementation uses:
- Image: `wnameless/oracle-xe-11g:latest`
- Environment: `ORACLE_ROOT_PASSWORD=oracle`
- JDBC URL: `jdbc:oracle:thin:system/oracle@//host:port/xe`

Note: Oracle tests are typically ignored in standard builds due to Oracle JDBC driver licensing restrictions. The implementation requires manual installation of the Oracle JDBC driver (ojdbc6-11.2.0.2.0.jar) in the local Maven repository. An illustrative configuration appears below.
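This is a hedged sketch of the Oracle configuration implied by the values above; the suite's actual source may differ (`1521` is the Oracle default listener port):

```scala
val oracleConfig = new DatabaseOnDocker {
  override val imageName = "wnameless/oracle-xe-11g:latest"
  override val env = Map("ORACLE_ROOT_PASSWORD" -> "oracle")
  override val jdbcPort = 1521
  override def getJdbcUrl(ip: String, port: Int): String =
    s"jdbc:oracle:thin:system/oracle@//$ip:$port/xe"
}
```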
To enable Oracle testing:
```
docker pull wnameless/oracle-xe-11g
./build/sbt "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"
```

The framework validates Spark's JDBC type mappings for various database-specific types, as in the illustrative check below.
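A minimal sketch of such a check, using the `test` table created in the earlier usage example (the shipped suites assert database-specific types; these assertions are illustrative):

```scala
import java.util.Properties
import org.apache.spark.sql.types.{IntegerType, StringType}

test("Type mapping (illustrative)") {
  val df = sqlContext.read.jdbc(jdbcUrl, "test", new Properties)
  // MySQL INT should surface as Catalyst IntegerType, VARCHAR as StringType.
  assert(df.schema("id").dataType === IntegerType)
  assert(df.schema("name").dataType === StringType)
}
```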
```scala
@DockerTest
class YourIntegrationSuite extends DockerJDBCIntegrationSuite {
  // Test implementation
}
```

The `@DockerTest` annotation categorizes tests that require Docker infrastructure, allowing for selective test execution in environments where Docker may not be available.
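For example, one way to skip Docker-dependent tests is ScalaTest's tag-exclusion flag (`-l`); the exact invocation depends on the build setup:

```
./build/sbt "test-only org.apache.spark.sql.jdbc.* -- -l org.apache.spark.tags.DockerTest"
```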
```scala
// External types from com.spotify.docker-client
trait DockerClient {
  def ping(): String
  def inspectImage(image: String): ImageInfo
  def pull(image: String): Unit
  def createContainer(config: ContainerConfig): ContainerCreation
  def startContainer(containerId: String): Unit
  def killContainer(containerId: String): Unit
  def removeContainer(containerId: String): Unit
  def close(): Unit
}
// Standard Java types
class Properties extends java.util.Hashtable[Object, Object]
// Scala collections
type Map[K, V] = scala.collection.immutable.Map[K, V]
type Seq[T] = scala.collection.immutable.Seq[T]
```

The framework provides comprehensive error handling for:

- An unreachable Docker daemon (detected via `ping()` before any test runs)
- Missing images (pulled automatically before container creation)
- Database startup delays (connection attempts are retried via `Eventually`)
- Container cleanup, which runs even when tests fail