tessl/maven-org-apache-spark--spark-docker-integration-tests_2-11

Docker-based integration testing framework for Apache Spark JDBC connectivity with multiple database systems

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.apache.spark/spark-docker-integration-tests_2.11@4.1.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-docker-integration-tests_2-11@4.1.0

Spark Docker Integration Tests

A Docker-based integration testing framework for Apache Spark's JDBC connectivity. It starts a Docker container for each database under test (PostgreSQL, MySQL, MariaDB, Oracle, DB2, or Microsoft SQL Server) and runs automated tests of Spark's ability to connect to and interact with that database.

Package Information

  • Package Name: spark-docker-integration-tests_2.11
  • Package Type: maven
  • Language: Scala
  • Build System: Maven
  • Version: 4.1.0-SNAPSHOT
  • Installation: Add Maven dependency:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-docker-integration-tests_2.11</artifactId>
    <version>4.1.0-SNAPSHOT</version>
    <scope>test</scope>
</dependency>

Core Imports

import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite
import org.apache.spark.sql.jdbc.DatabaseContainerManager
import org.apache.spark.sql.jdbc.JDBCConnectionUtil
import org.apache.spark.sql.jdbc.TestDataGenerator

Basic Usage

import org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite

class MyDatabaseTest extends DockerJDBCIntegrationSuite {
  val databaseType = "postgresql"
  val databaseImage = "postgres:13"

  // Start the database container and load test data once, before any test runs.
  override def beforeAll(): Unit = {
    super.beforeAll()
    startDockerContainer()
    setupTestData()
  }

  // Always run the parent teardown, even if stopping the container fails.
  override def afterAll(): Unit = {
    try {
      stopDockerContainer()
    } finally {
      super.afterAll()
    }
  }

  test("basic JDBC connectivity") {
    // Check the raw JDBC connection first, and close it when done.
    val connection = getJdbcConnection()
    try {
      assert(connection.isValid(5)) // timeout in seconds
    } finally {
      connection.close()
    }

    // Then read the table through Spark's JDBC data source.
    val df = spark.read
      .format("jdbc")
      .option("url", getJdbcUrl())
      .option("dbtable", "test_table")
      .load()

    assert(df.count() > 0)
  }
}

Architecture

The framework consists of the following components:

  • Base Test Framework: DockerJDBCIntegrationSuite provides the foundational testing infrastructure with Docker container lifecycle management
  • Container Management: DatabaseContainerManager handles Docker container creation, startup, shutdown, and resource cleanup
  • Database-Specific Suites: Specialized test classes for each supported database system with database-specific test scenarios
  • Connection Utilities: JDBCConnectionUtil provides robust JDBC connection management and query execution
  • Configuration System: DockerTestConfig manages Docker images, timeouts, network settings, and test data configurations
  • Data Generation: TestDataGenerator creates consistent test datasets across different database systems

Together, these components run each test against an isolated, reproducible Docker environment. The suites validate connectivity, data operations, authentication, and performance optimizations across the supported database systems.
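
The Data Generation component is described as producing consistent datasets across databases. One way to get that property is a fixed random seed, sketched below. `SampleRow` and `SampleDataSketch` are hypothetical names for illustration only; they are not the `TestDataGenerator` API, which may work differently (for example, by producing Spark rows).

```scala
import scala.util.Random

// Hypothetical row type, used only in this sketch.
final case class SampleRow(id: Int, name: String, amount: Double)

object SampleDataSketch {
  // A fixed seed makes every call yield the same rows, so each database
  // under test can be loaded with identical data.
  def generate(rowCount: Int, seed: Long = 42L): Seq[SampleRow] = {
    val rng = new Random(seed)
    (1 to rowCount).map { id =>
      val name = "user_" + rng.alphanumeric.take(6).mkString
      // Keep two decimal places so values are easy to compare after a round trip.
      val amount = math.round(rng.nextDouble() * 10000) / 100.0
      SampleRow(id, name, amount)
    }
  }
}
```

Calling `SampleDataSketch.generate(5)` twice returns equal sequences, which is what lets the same assertions run against every database.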

Capabilities

Core Test Framework

Base testing infrastructure providing Docker container lifecycle management and shared test utilities for database integration testing.

abstract class DockerJDBCIntegrationSuite extends SharedSparkSession {
  def startDockerContainer(): Unit
  def stopDockerContainer(): Unit
  def getJdbcUrl(): String
  def getJdbcConnection(): Connection
  def setupTestData(): Unit
}

Details: core-framework.md
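
The start and stop hooks above pair naturally with a loan pattern, which guarantees that the container is stopped even when a test body throws. The following is a minimal sketch under that assumption; the `Lifecycle` trait and `withContainer` helper are hypothetical and not part of the documented API.

```scala
object ContainerLifecycleSketch {
  // Hypothetical stand-in for the suite's start/stop hooks.
  trait Lifecycle {
    def start(): Unit
    def stop(): Unit
  }

  // Runs `body` between start() and stop(); stop() runs even if `body` throws.
  def withContainer[A](lifecycle: Lifecycle)(body: => A): A = {
    lifecycle.start()
    try body
    finally lifecycle.stop()
  }
}
```

A suite could wrap a single test in `withContainer` when it needs a fresh container, instead of sharing one through `beforeAll` and `afterAll`.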

Container Management

Docker container lifecycle management for database testing environments with support for multiple database systems and configurable timeouts.

class DatabaseContainerManager {
  def createContainer(dbType: String, imageTag: String): String
  def startContainer(containerId: String): ContainerInfo
  def stopContainer(containerId: String): Unit
  def getConnectionInfo(containerId: String): ConnectionInfo
  def configureTimeout(timeout: Duration): Unit
}

case class ContainerInfo(
  containerId: String,
  jdbcUrl: String,
  hostPort: Int,
  username: String,
  password: String
)

Details: container-management.md
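
A database container usually needs some time after startup before it accepts connections, which is where a configurable timeout comes in. The sketch below polls a readiness probe until a deadline passes. `ReadinessSketch` and `waitUntilReady` are hypothetical names, and this is not necessarily how `DatabaseContainerManager` applies its timeout.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.Try

object ReadinessSketch {
  // Polls `probe` until it returns true or `timeout` elapses.
  // A probe that throws (for example, connection refused) counts as "not ready".
  // Returns true if the probe succeeded before the deadline.
  def waitUntilReady(timeout: FiniteDuration, interval: FiniteDuration = 100.millis)(
      probe: () => Boolean): Boolean = {
    val deadline = timeout.fromNow

    @tailrec
    def loop(): Boolean =
      if (Try(probe()).getOrElse(false)) true
      else if (deadline.isOverdue()) false
      else {
        Thread.sleep(interval.toMillis)
        loop()
      }

    loop()
  }
}
```

In a real suite the probe could open a JDBC connection and call `isValid` on it; here it is left as a plain function so the sketch stays self-contained.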

Database-Specific Testing

Specialized test suites for individual database systems with database-specific functionality testing, SQL dialect compatibility, and unique feature validation.

class PostgreSQLIntegrationSuite extends DockerJDBCIntegrationSuite
class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite
class MariaDBIntegrationSuite extends DockerJDBCIntegrationSuite
class OracleIntegrationSuite extends DockerJDBCIntegrationSuite
class DB2IntegrationSuite extends DockerJDBCIntegrationSuite
class SQLServerIntegrationSuite extends DockerJDBCIntegrationSuite

Details: database-testing.md
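
One concrete difference between these databases is the JDBC URL format. The sketch below builds a URL for each supported system using the standard URL syntax of the respective JDBC drivers. The `JdbcUrlSketch` object and its lowercase type keys (other than "postgresql", which appears in Basic Usage) are assumptions made for illustration, not part of the framework.

```scala
object JdbcUrlSketch {
  // Default listener port of each database server.
  val defaultPorts: Map[String, Int] = Map(
    "postgresql" -> 5432,
    "mysql" -> 3306,
    "mariadb" -> 3306,
    "oracle" -> 1521,
    "db2" -> 50000,
    "sqlserver" -> 1433
  )

  // Builds a JDBC URL for the given database type.
  // For Oracle, `database` is interpreted as the service name.
  def jdbcUrl(dbType: String, host: String, port: Int, database: String): String =
    dbType match {
      case "postgresql" | "mysql" | "mariadb" | "db2" =>
        s"jdbc:$dbType://$host:$port/$database"
      case "oracle" =>
        s"jdbc:oracle:thin:@//$host:$port/$database"
      case "sqlserver" =>
        s"jdbc:sqlserver://$host:$port;databaseName=$database"
      case other =>
        throw new IllegalArgumentException(s"Unsupported database type: $other")
    }
}
```

When a container maps the database port to a random host port, the mapped port would be passed as `port` rather than the default.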

JDBC Connection Utilities

Utility functions for JDBC connection management, query execution, and resource cleanup with robust error handling and connection validation.

object JDBCConnectionUtil {
  def createConnection(url: String, properties: Properties): Connection
  def executeQuery(connection: Connection, sql: String): ResultSet
  def validateConnection(connection: Connection): Boolean
  def closeResources(resources: AutoCloseable*): Unit
}

Details: jdbc-utilities.md
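
To illustrate what robust resource cleanup can look like, here is a minimal sketch of a `closeResources`-style helper. It is an assumption about the behavior, not the library's implementation. Unlike the documented signature, which returns `Unit`, this version returns the failures it collected so the effect is observable.

```scala
import scala.util.Try

object CloseResourcesSketch {
  // Closes resources in the order given, skipping nulls. A failure on one
  // resource does not prevent the remaining ones from being closed; the
  // non-fatal exceptions are returned so the caller can log them.
  def closeResources(resources: AutoCloseable*): Seq[Throwable] =
    resources
      .filter(_ != null)
      .flatMap(resource => Try(resource.close()).failed.toOption)
}
```

With JDBC objects, the usual order is `ResultSet`, then `Statement`, then `Connection`, for example `closeResources(resultSet, statement, connection)`.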

Advanced Testing Features

Specialized testing capabilities including cross-database compatibility, DataSource V2 integration, Kerberos authentication, and join pushdown optimization testing.

class CrossDatabaseQuerySuite extends DockerJDBCIntegrationSuite
class DataSourceV2TestSuite extends DockerJDBCIntegrationSuite
class KerberosTestSuite extends DockerJDBCIntegrationSuite
class JoinPushdownTestSuite extends DockerJDBCIntegrationSuite

Details: advanced-features.md

Types

case class DockerTestConfig(
  dockerImageVersions: Map[String, String],
  containerTimeouts: Map[String, Duration],
  testDataSources: List[TestDataSource],
  networkSettings: NetworkConfig
)

case class TestDataSource(
  name: String,
  schema: StructType,
  data: List[Row]
)

case class NetworkConfig(
  networkName: String,
  driverClass: String,
  portRange: Range
)

case class ConnectionInfo(
  jdbcUrl: String,
  username: String,
  password: String,
  driverClass: String
)
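
As an example of how these types might be consumed, the sketch below turns a `ConnectionInfo` into the option map accepted by Spark's JDBC data source. The case class is redeclared locally so the snippet is self-contained, and `JdbcOptionsSketch.readerOptions` is a hypothetical helper rather than part of the framework. The keys `url`, `dbtable`, `user`, `password`, and `driver` are option names of Spark's JDBC data source.

```scala
// Local copy of the ConnectionInfo type listed above, for a self-contained sketch.
final case class ConnectionInfo(
  jdbcUrl: String,
  username: String,
  password: String,
  driverClass: String
)

object JdbcOptionsSketch {
  // Maps connection details to Spark JDBC reader options for one table.
  def readerOptions(info: ConnectionInfo, table: String): Map[String, String] = Map(
    "url" -> info.jdbcUrl,
    "dbtable" -> table,
    "user" -> info.username,
    "password" -> info.password,
    "driver" -> info.driverClass
  )
}
```

Inside a suite, the map could be passed to the reader as `spark.read.format("jdbc").options(JdbcOptionsSketch.readerOptions(info, "test_table")).load()`.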