tessl/maven-org-apache-spark--spark-docker-integration-tests_2-10

Docker integration tests for Apache Spark providing automated JDBC database testing with containerized environments.

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.apache.spark/spark-docker-integration-tests_2.10@1.6.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-docker-integration-tests_2-10@1.6.0


Spark Docker Integration Tests

Spark Docker Integration Tests provides a Docker-based testing framework for validating Apache Spark's JDBC functionality with various database systems. It automates the creation and management of database containers, enabling comprehensive integration testing of Spark's SQL capabilities in isolated, reproducible environments.

Package Information

  • Package Name: spark-docker-integration-tests_2.10
  • Package Type: maven
  • Language: Scala
  • Installation: Available as part of the Apache Spark source distribution
  • Maven Coordinates: org.apache.spark:spark-docker-integration-tests_2.10:1.6.3

Core Imports

import org.apache.spark.sql.jdbc.{DockerJDBCIntegrationSuite, DatabaseOnDocker}
import org.apache.spark.util.DockerUtils
import org.apache.spark.tags.DockerTest
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.test.SharedSQLContext
import org.scalatest.BeforeAndAfterAll
import org.scalatest.concurrent.Eventually
import com.spotify.docker.client.DockerClient
import java.sql.Connection
import java.util.Properties

Basic Usage

import org.apache.spark.sql.jdbc.{DockerJDBCIntegrationSuite, DatabaseOnDocker}
import java.sql.Connection
import java.util.Properties

// Define a database configuration
val mysqlConfig = new DatabaseOnDocker {
  override val imageName = "mysql:5.7.9"
  override val env = Map("MYSQL_ROOT_PASSWORD" -> "rootpass")
  override val jdbcPort = 3306
  override def getJdbcUrl(ip: String, port: Int): String =
    s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
}

// Create an integration test suite
class MyIntegrationSuite extends DockerJDBCIntegrationSuite {
  override val db = mysqlConfig
  
  override def dataPreparation(conn: Connection): Unit = {
    conn.prepareStatement("CREATE TABLE test (id INT, name VARCHAR(50))").executeUpdate()
    conn.prepareStatement("INSERT INTO test VALUES (1, 'test')").executeUpdate()
  }
  
  test("Basic connectivity") {
    val df = sqlContext.read.jdbc(jdbcUrl, "test", new Properties)
    assert(df.collect().length > 0)
  }
}

Architecture

The framework is built around several key components:

  • Docker Management: Automated container lifecycle management using Spotify Docker Client
  • Database Abstraction: The DatabaseOnDocker abstract class holds the database-specific configuration
  • Test Framework Integration: Seamless integration with ScalaTest and Spark test utilities
  • Network Configuration: Automatic port binding and IP address discovery for various Docker environments
  • Resource Management: Comprehensive cleanup of Docker containers and database connections

Capabilities

Database Configuration Interface

Abstract interface for defining database-specific Docker container configurations.

abstract class DatabaseOnDocker {
  /**
   * The docker image to be pulled.
   */
  val imageName: String
  
  /**
   * Environment variables to set inside of the Docker container while launching it.
   */
  val env: Map[String, String]
  
  /**
   * The container-internal JDBC port that the database listens on.
   */
  val jdbcPort: Int
  
  /**
   * Return a JDBC URL that connects to the database running at the given IP address and port.
   */
  def getJdbcUrl(ip: String, port: Int): String
}
  • imageName: Docker image name to pull (e.g., "mysql:5.7.9", "postgres:9.4.5")
  • env: Environment variables to set in the container (e.g., database passwords)
  • jdbcPort: Port number the database listens on inside the container
  • getJdbcUrl(): Constructs JDBC URL for connecting to the database
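
To make the contract concrete, here is a minimal sketch of a PostgreSQL configuration that uses the image, environment variable, port, and URL format listed in the PostgreSQL section of this document. DatabaseOnDocker is re-declared locally as a stand-in so the example compiles without Spark on the classpath. PostgresConfig, ConfigDemo, and envStrings are hypothetical names, and the IP address and host port are made-up values. It also assumes that the port passed to getJdbcUrl is the host-side port the container is bound to, which may differ from jdbcPort.

```scala
// Stand-in copy of the abstract class shown above (assumption: lets the
// sketch compile without Spark on the classpath).
abstract class DatabaseOnDocker {
  val imageName: String
  val env: Map[String, String]
  val jdbcPort: Int
  def getJdbcUrl(ip: String, port: Int): String
}

// Hypothetical configuration built from the PostgreSQL values in this document.
object PostgresConfig extends DatabaseOnDocker {
  override val imageName = "postgres:9.4.5"
  override val env = Map("POSTGRES_PASSWORD" -> "rootpass")
  override val jdbcPort = 5432
  override def getJdbcUrl(ip: String, port: Int): String =
    s"jdbc:postgresql://$ip:$port/postgres?user=postgres&password=rootpass"
}

object ConfigDemo {
  // Render the env map as "KEY=value" strings, the form Docker uses for
  // container environment variables.
  def envStrings(db: DatabaseOnDocker): Seq[String] =
    db.env.map { case (k, v) => s"$k=$v" }.toSeq

  def main(args: Array[String]): Unit = {
    // 127.0.0.1 and 32768 are illustration values, not framework defaults.
    println(PostgresConfig.getJdbcUrl("127.0.0.1", 32768))
    println(envStrings(PostgresConfig))
  }
}
```

Keeping the URL construction inside the configuration object means a suite only needs to supply the discovered IP address and bound port at runtime.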

Integration Test Framework

Base class providing complete Docker-based integration testing infrastructure for JDBC databases.

abstract class DockerJDBCIntegrationSuite 
  extends SparkFunSuite 
  with BeforeAndAfterAll 
  with Eventually 
  with SharedSQLContext {
  
  val db: DatabaseOnDocker
  private var docker: DockerClient
  private var containerId: String
  protected var jdbcUrl: String
  
  /**
   * Prepare databases and tables for testing.
   */
  def dataPreparation(connection: Connection): Unit
  
  override def beforeAll(): Unit
  override def afterAll(): Unit
}
  • db: Database configuration, supplied as a DatabaseOnDocker subclass
  • docker: DockerClient instance for container management (private field)
  • containerId: Unique identifier for the created container (private field)
  • jdbcUrl: JDBC URL available after container setup (protected field)
  • dataPreparation(): Abstract method for setting up test data and schema
  • beforeAll(): Handles Docker client setup, image pulling, container creation, and connection establishment
  • afterAll(): Cleans up Docker containers and closes connections

The beforeAll() method performs these operations:

  1. Initializes Docker client and verifies Docker connectivity
  2. Pulls the specified Docker image if not already available
  3. Configures networking with automatic port binding
  4. Creates and starts the database container
  5. Waits for the database to accept JDBC connections, retrying for up to 60 seconds
  6. Calls dataPreparation() for test data setup
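
Steps 3 and 5 can be illustrated with a dependency-free sketch. It is not the suite's code: the suite mixes in ScalaTest's Eventually, whereas retryUntil below is a hand-rolled stand-in. Choosing the host port by binding to port 0 is an assumption about how "automatic port binding" might be done. StartupSketch, findFreePort, and retryUntil are hypothetical names.

```scala
import java.net.ServerSocket
import scala.util.Try

object StartupSketch {
  // Ask the OS for a free ephemeral port by binding to port 0, then release it
  // so the container can be bound to that port.
  def findFreePort(): Int = {
    val sock = new ServerSocket(0)
    try sock.getLocalPort finally sock.close()
  }

  // Re-run `attempt` until it succeeds or `timeoutMs` has elapsed.
  // On timeout, the last failure is rethrown.
  def retryUntil[T](timeoutMs: Long, intervalMs: Long)(attempt: => T): T = {
    val deadline = System.currentTimeMillis() + timeoutMs
    var result = Try(attempt)
    while (result.isFailure && System.currentTimeMillis() < deadline) {
      Thread.sleep(intervalMs)
      result = Try(attempt)
    }
    result.get // the value on success, otherwise rethrows the last exception
  }

  def main(args: Array[String]): Unit = {
    println(s"free port: ${findFreePort()}")

    // Simulate a database that only accepts connections on the third attempt.
    var calls = 0
    val status = retryUntil(timeoutMs = 5000, intervalMs = 10) {
      calls += 1
      if (calls < 3) throw new java.sql.SQLException("not ready") else "connected"
    }
    println(s"$status after $calls attempts")
  }
}
```

In a real suite the attempt body would open and close a JDBC connection against jdbcUrl, with the 60-second timeout described above.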

The afterAll() method ensures proper cleanup:

  1. Kills and removes the Docker container
  2. Closes the Docker client connection
  3. Handles cleanup errors gracefully with logging
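
The cleanup order above maps naturally onto nested try/catch/finally blocks. The following is a sketch of that pattern, not the suite's actual code: ContainerClient is a hypothetical stand-in for the three DockerClient calls involved, and CleanupSketch.cleanup is an illustrative name.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for the DockerClient calls used during cleanup.
trait ContainerClient {
  def killContainer(id: String): Unit
  def removeContainer(id: String): Unit
  def close(): Unit
}

object CleanupSketch {
  def cleanup(client: ContainerClient, containerId: String): Unit = {
    // Both values may be null if setup failed before they were assigned.
    if (client != null) {
      try {
        if (containerId != null) {
          client.killContainer(containerId)
          client.removeContainer(containerId)
        }
      } catch {
        case NonFatal(e) =>
          // Log and continue so a failed kill/remove does not mask test results.
          System.err.println(s"Could not stop container $containerId: ${e.getMessage}")
      } finally {
        // Always release the client, even if kill/remove failed.
        client.close()
      }
    }
  }
}
```

The null checks matter because cleanup can run after a partially completed setup, when the client exists but no container was created.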

MySQL Integration Testing

Concrete implementation providing MySQL-specific integration testing capabilities.

class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}

The MySQL implementation uses:

  • Docker image: mysql:5.7.9
  • Environment: MYSQL_ROOT_PASSWORD=rootpass
  • JDBC Port: 3306
  • JDBC URL format: jdbc:mysql://host:port/mysql?user=root&password=rootpass

Test coverage includes:

  • Basic JDBC connectivity and data reading
  • Numeric type mappings (BIT, SMALLINT, MEDIUMINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE)
  • Date/time type mappings (DATE, TIME, DATETIME, TIMESTAMP, YEAR)
  • String type mappings (CHAR, VARCHAR, TEXT variants, BINARY, VARBINARY, BLOB)
  • Write operations and data persistence

PostgreSQL Integration Testing

Concrete implementation providing PostgreSQL-specific integration testing capabilities.

class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}

The PostgreSQL implementation uses:

  • Docker image: postgres:9.4.5
  • Environment: POSTGRES_PASSWORD=rootpass
  • JDBC Port: 5432
  • JDBC URL format: jdbc:postgresql://host:port/postgres?user=postgres&password=rootpass

Test coverage includes:

  • PostgreSQL-specific type mappings
  • Array types (integer[], text[], real[])
  • Network types (inet, cidr)
  • Binary data (bytea)
  • Boolean types

Oracle Integration Testing

Concrete implementation providing Oracle-specific integration testing capabilities (typically disabled due to licensing).

class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLContext {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}

The Oracle implementation uses:

  • Docker image: wnameless/oracle-xe-11g:latest
  • Environment: ORACLE_ROOT_PASSWORD=oracle
  • JDBC Port: 1521
  • JDBC URL format: jdbc:oracle:thin:system/oracle@//host:port/xe

Note: Oracle tests are typically ignored in standard builds due to Oracle JDBC driver licensing restrictions. The implementation requires manual installation of the Oracle JDBC driver (ojdbc6-11.2.0.2.0.jar) in the local Maven repository.

To enable Oracle testing:

  1. Pull Oracle Docker image: docker pull wnameless/oracle-xe-11g
  2. Download and install Oracle JDBC driver in Maven local repository
  3. Increase timeout values for Oracle container startup
  4. Run with: ./build/sbt "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"

Type Mappings

The framework validates Spark's JDBC type mappings for various database-specific types:

MySQL Type Mappings

  • Numeric: BIT(1) → Boolean, BIT(n) with n > 1 → Long, SMALLINT → Integer, MEDIUMINT → Integer, INT → Integer, BIGINT → Long, DECIMAL → BigDecimal, FLOAT → Double, DOUBLE → Double
  • Date/Time: DATE → java.sql.Date, TIME → java.sql.Timestamp, DATETIME → java.sql.Timestamp, TIMESTAMP → java.sql.Timestamp, YEAR → java.sql.Date
  • String/Binary: CHAR → String, VARCHAR → String, TEXT variants → String, BINARY → byte[], VARBINARY → byte[], BLOB → byte[]

PostgreSQL Type Mappings

  • Array Types: integer[] → Java arrays, text[] → String arrays, real[] → Float arrays
  • Network Types: inet → String representation, cidr → String representation
  • Binary: bytea → byte[]
  • Boolean: boolean → Boolean
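
One plausible way to assert mappings like these is to compare the runtime classes of the values in a collected row. The sketch below shows the idea without Spark: a plain Seq[Any] stands in for a row, and TypeMappingSketch and runtimeTypes are hypothetical names. The sample values mirror the Java types listed for MySQL BIT, BIGINT, DECIMAL, DATETIME, and BLOB columns.

```scala
object TypeMappingSketch {
  // Runtime class name of each value in a row, in column order.
  def runtimeTypes(row: Seq[Any]): Seq[String] = row.map(_.getClass.getName)

  def main(args: Array[String]): Unit = {
    // Stand-in for a row read over JDBC (assumption: illustration values only).
    val row: Seq[Any] = Seq(
      java.lang.Boolean.TRUE,                             // BIT
      java.lang.Long.valueOf(42L),                        // BIGINT
      new java.math.BigDecimal("1.5"),                    // DECIMAL
      java.sql.Timestamp.valueOf("2015-01-01 00:00:00"),  // DATETIME
      Array[Byte](1, 2))                                  // BLOB

    val types = runtimeTypes(row)
    assert(types(0) == "java.lang.Boolean")
    assert(types(1) == "java.lang.Long")
    assert(types(2) == "java.math.BigDecimal")
    assert(types(3) == "java.sql.Timestamp")
    types.foreach(println)
  }
}
```

In an actual suite the row would come from something like sqlContext.read.jdbc(jdbcUrl, tableName, new Properties).collect(), as in the Basic Usage example.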

Dependencies

External Dependencies

  • com.spotify:docker-client (shaded classifier) - Docker container management and lifecycle
  • mysql:mysql-connector-java - MySQL JDBC driver
  • org.postgresql:postgresql - PostgreSQL JDBC driver
  • org.apache.httpcomponents:httpclient (4.5) - HTTP client for Docker API communication
  • org.apache.httpcomponents:httpcore (4.4.1) - HTTP core components
  • com.google.guava:guava (18.0) - Utility libraries
  • com.sun.jersey:jersey-server (1.19) - Jersey server components
  • com.sun.jersey:jersey-core (1.19) - Jersey core components
  • com.sun.jersey:jersey-servlet (1.19) - Jersey servlet support
  • com.sun.jersey:jersey-json (1.19) - Jersey JSON support

Internal Spark Dependencies

  • org.apache.spark:spark-core_2.10 - Core Spark functionality
  • org.apache.spark:spark-sql_2.10 - Spark SQL engine
  • org.apache.spark:spark-test-tags_2.10 - Test categorization (@DockerTest annotation)

Environment Requirements

  • Docker must be installed and running
  • Docker images for target databases must be available or pullable
  • Network access for Docker registry and container networking
  • Sufficient memory for running database containers alongside Spark tests
  • Available ports for container port binding

Test Annotations

@DockerTest
class YourIntegrationSuite extends DockerJDBCIntegrationSuite {
  // Test implementation
}

The @DockerTest annotation categorizes tests that require Docker infrastructure, allowing for selective test execution in environments where Docker may not be available.

Type Definitions

// External types from com.spotify.docker-client
trait DockerClient {
  def ping(): String
  def inspectImage(image: String): ImageInfo
  def pull(image: String): Unit
  def createContainer(config: ContainerConfig): ContainerCreation
  def startContainer(containerId: String): Unit
  def killContainer(containerId: String): Unit
  def removeContainer(containerId: String): Unit
  def close(): Unit
}

// Standard Java types
class Properties extends java.util.Hashtable[Object, Object]

// Scala collections
type Map[K, V] = scala.collection.immutable.Map[K, V]
type Seq[T] = scala.collection.Seq[T]  // the default Seq alias in Scala 2.10

Error Handling

The framework provides comprehensive error handling for:

  • Docker connectivity issues - Graceful failure when Docker is unavailable
  • Image availability - Automatic pulling of missing Docker images
  • Container startup failures - Proper cleanup and error reporting
  • Database connection timeouts - Connection attempts are retried until the 60-second startup timeout elapses
  • Resource cleanup - Containers and the Docker client are cleaned up in afterAll(), including when tests fail
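
The image-availability handling can be pictured as an inspect-then-pull fallback. This is a sketch under assumptions, not the framework's code: ImageClient and ImageNotFound are hypothetical stand-ins for the Docker client and its "image not found" error, and ensureImage is an illustrative helper.

```scala
// Hypothetical stand-ins so the sketch runs without the Docker client library.
class ImageNotFound(image: String) extends RuntimeException(s"Image not found: $image")

trait ImageClient {
  def inspectImage(image: String): Unit // throws ImageNotFound when the image is absent
  def pull(image: String): Unit
}

object ImageSketch {
  // Pull the image only when inspecting it fails. Returns true if a pull happened.
  def ensureImage(client: ImageClient, image: String): Boolean =
    try {
      client.inspectImage(image)
      false
    } catch {
      case _: ImageNotFound =>
        client.pull(image)
        true
    }
}
```

Catching only the "not found" case keeps other failures, such as an unreachable Docker daemon, visible to the caller instead of triggering a pull.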

Limitations

  • Oracle licensing - Oracle JDBC driver must be manually installed due to licensing restrictions
  • Docker dependency - Tests require Docker to be available and running
  • Platform compatibility - Tested primarily on Linux and macOS environments
  • Resource requirements - Database containers require additional memory and CPU resources
  • Network configuration - May require additional configuration in complex Docker networking environments