Docker integration tests for Apache Spark providing automated JDBC database testing with containerized environments.
npx @tessl/cli install tessl/maven-org-apache-spark--spark-docker-integration-tests_2-10@1.6.00
# Spark Docker Integration Tests
Spark Docker Integration Tests provides a Docker-based testing framework for validating Apache Spark's JDBC functionality with various database systems. It automates the creation and management of database containers, enabling comprehensive integration testing of Spark's SQL capabilities in isolated, reproducible environments.
## Package Information

- **Package Name**: spark-docker-integration-tests_2.10
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Available as part of the Apache Spark source distribution
- **Maven Coordinates**: `org.apache.spark:spark-docker-integration-tests_2.10:1.6.3`
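For builds that consume the artifact directly, the Maven coordinates above translate to an sbt dependency along these lines (a sketch only; the scope and any required resolvers depend on your build setup):

```scala
// Hypothetical sbt wiring for the coordinates listed above (test scope assumed)
libraryDependencies += "org.apache.spark" % "spark-docker-integration-tests_2.10" % "1.6.3" % Test
```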
## Core Imports
```scala
import org.apache.spark.sql.jdbc.{DockerJDBCIntegrationSuite, DatabaseOnDocker}
import org.apache.spark.util.DockerUtils
import org.apache.spark.tags.DockerTest
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.test.SharedSQLContext
import org.scalatest.{BeforeAndAfterAll, Eventually}
import com.spotify.docker.client.DockerClient
import java.sql.Connection
import java.util.Properties
```
## Basic Usage

```scala
import org.apache.spark.sql.jdbc.{DockerJDBCIntegrationSuite, DatabaseOnDocker}
import java.sql.Connection
import java.util.Properties

// Create an integration test suite with an inline database configuration
class MyIntegrationSuite extends DockerJDBCIntegrationSuite {

  // Define the database configuration
  override val db = new DatabaseOnDocker {
    override val imageName = "mysql:5.7.9"
    override val env = Map("MYSQL_ROOT_PASSWORD" -> "rootpass")
    override val jdbcPort = 3306
    override def getJdbcUrl(ip: String, port: Int): String =
      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
  }

  override def dataPreparation(conn: Connection): Unit = {
    conn.prepareStatement("CREATE TABLE test (id INT, name VARCHAR(50))").executeUpdate()
    conn.prepareStatement("INSERT INTO test VALUES (1, 'test')").executeUpdate()
  }

  test("Basic connectivity") {
    val df = sqlContext.read.jdbc(jdbcUrl, "test", new Properties)
    assert(df.collect().length > 0)
  }
}
```
## Architecture

The framework is built around several key components:

- **Docker Management**: Automated container lifecycle management using the Spotify Docker Client
- **Database Abstraction**: `DatabaseOnDocker` trait provides database-specific configuration
- **Test Framework Integration**: Seamless integration with ScalaTest and Spark test utilities
- **Network Configuration**: Automatic port binding and IP address discovery for various Docker environments
- **Resource Management**: Comprehensive cleanup of Docker containers and database connections
## Capabilities

### Database Configuration Interface

Abstract interface for defining database-specific Docker container configurations.

```scala { .api }
abstract class DatabaseOnDocker {
  /**
   * The docker image to be pulled.
   */
  val imageName: String

  /**
   * Environment variables to set inside of the Docker container while launching it.
   */
  val env: Map[String, String]

  /**
   * The container-internal JDBC port that the database listens on.
   */
  val jdbcPort: Int

  /**
   * Return a JDBC URL that connects to the database running at the given IP address and port.
   */
  def getJdbcUrl(ip: String, port: Int): String
}
```

- `imageName`: Docker image name to pull (e.g., "mysql:5.7.9", "postgres:9.4.5")
- `env`: Environment variables to set in the container (e.g., database passwords)
- `jdbcPort`: Port number the database listens on inside the container
- `getJdbcUrl()`: Constructs the JDBC URL for connecting to the database
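As an illustration, the interface can be implemented for PostgreSQL using the values documented in the PostgreSQL section later in this document. The abstract class is restated here so the sketch stands alone; in a real suite you would import it instead:

```scala
// Self-contained sketch: DatabaseOnDocker is restated from the API block
// above so this example compiles on its own. The configuration values
// mirror the PostgreSQL implementation described later in this document.
abstract class DatabaseOnDocker {
  val imageName: String
  val env: Map[String, String]
  val jdbcPort: Int
  def getJdbcUrl(ip: String, port: Int): String
}

val postgresConfig = new DatabaseOnDocker {
  override val imageName = "postgres:9.4.5"
  override val env = Map("POSTGRES_PASSWORD" -> "rootpass")
  override val jdbcPort = 5432
  override def getJdbcUrl(ip: String, port: Int): String =
    s"jdbc:postgresql://$ip:$port/postgres?user=postgres&password=rootpass"
}
```

The framework supplies the container's actual IP and host-mapped port at runtime, so `getJdbcUrl` only needs to format them into a driver-specific URL.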
### Integration Test Framework

Base class providing complete Docker-based integration testing infrastructure for JDBC databases.

```scala { .api }
abstract class DockerJDBCIntegrationSuite
  extends SparkFunSuite
  with BeforeAndAfterAll
  with Eventually
  with SharedSQLContext {

  val db: DatabaseOnDocker
  private var docker: DockerClient
  private var containerId: String
  protected var jdbcUrl: String

  /**
   * Prepare databases and tables for testing.
   */
  def dataPreparation(connection: Connection): Unit

  override def beforeAll(): Unit
  override def afterAll(): Unit
}
```

- `db`: Database configuration implementing `DatabaseOnDocker`
- `docker`: DockerClient instance for container management (private field)
- `containerId`: Unique identifier for the created container (private field)
- `jdbcUrl`: JDBC URL available after container setup (protected field)
- `dataPreparation()`: Abstract method for setting up test data and schema
- `beforeAll()`: Handles Docker client setup, image pulling, container creation, and connection establishment
- `afterAll()`: Cleans up Docker containers and closes connections
The `beforeAll()` method performs these operations:

1. Initializes the Docker client and verifies Docker connectivity
2. Pulls the specified Docker image if not already available
3. Configures networking with automatic port binding
4. Creates and starts the database container
5. Waits for the database to accept connections (with a 60-second timeout)
6. Calls `dataPreparation()` for test data setup
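Step 5 is essentially a poll-until-ready loop. A minimal sketch of that idea, independent of Docker (the real suite uses ScalaTest's `Eventually` with a 60-second timeout; `waitForConnection` is a name invented here for illustration):

```scala
// Hedged sketch of the "wait for connections" step: retry the connection
// attempt until it succeeds or the deadline passes. `connect` stands in
// for e.g. java.sql.DriverManager.getConnection(jdbcUrl).
def waitForConnection[A](timeoutMs: Long, intervalMs: Long)(connect: => A): A = {
  val deadline = System.currentTimeMillis() + timeoutMs
  def attempt(): A =
    try connect
    catch {
      case e: Exception if System.currentTimeMillis() < deadline =>
        Thread.sleep(intervalMs)
        attempt()
    }
  attempt()
}

// Usage sketch: a "database" that refuses the first two attempts
var tries = 0
val result = waitForConnection(timeoutMs = 1000, intervalMs = 10) {
  tries += 1
  if (tries < 3) throw new RuntimeException("not ready") else "connected"
}
```

Once the deadline passes, the last failure propagates to the caller, which is what surfaces container-startup problems as test failures.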
The `afterAll()` method ensures proper cleanup:

1. Kills and removes the Docker container
2. Closes the Docker client connection
3. Handles cleanup errors gracefully with logging
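The cleanup contract above — attempt every step, log failures instead of rethrowing — can be sketched generically (`runCleanup` is a hypothetical helper for illustration, not part of the suite's API):

```scala
// Hedged sketch of the cleanup contract described above: every step is
// attempted even if an earlier one fails, and failures are logged rather
// than rethrown, so one broken teardown step cannot leak the others.
def runCleanup(steps: (String, () => Unit)*): Unit =
  steps.foreach { case (name, step) =>
    try step()
    catch { case e: Exception => println(s"Cleanup step '$name' failed: ${e.getMessage}") }
  }

// Usage sketch: the first step fails, the second still runs
var clientClosed = false
runCleanup(
  "kill and remove container" -> (() => throw new RuntimeException("container already gone")),
  "close docker client"       -> (() => { clientClosed = true })
)
```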
### MySQL Integration Testing

Concrete implementation providing MySQL-specific integration testing capabilities.

```scala { .api }
class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}
```

The MySQL implementation uses:

- Docker image: `mysql:5.7.9`
- Environment: `MYSQL_ROOT_PASSWORD=rootpass`
- JDBC Port: 3306
- JDBC URL format: `jdbc:mysql://host:port/mysql?user=root&password=rootpass`

Test coverage includes:

- Basic JDBC connectivity and data reading
- Numeric type mappings (BIT, SMALLINT, MEDIUMINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE)
- Date/time type mappings (DATE, TIME, DATETIME, TIMESTAMP, YEAR)
- String type mappings (CHAR, VARCHAR, TEXT variants, BINARY, VARBINARY, BLOB)
- Write operations and data persistence
### PostgreSQL Integration Testing

Concrete implementation providing PostgreSQL-specific integration testing capabilities.

```scala { .api }
class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}
```

The PostgreSQL implementation uses:

- Docker image: `postgres:9.4.5`
- Environment: `POSTGRES_PASSWORD=rootpass`
- JDBC Port: 5432
- JDBC URL format: `jdbc:postgresql://host:port/postgres?user=postgres&password=rootpass`

Test coverage includes:

- PostgreSQL-specific type mappings
- Array types (integer[], text[], real[])
- Network types (inet, cidr)
- Binary data (bytea)
- Boolean types
### Oracle Integration Testing

Concrete implementation providing Oracle-specific integration testing capabilities (typically disabled due to licensing).

```scala { .api }
class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLContext {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}
```

The Oracle implementation uses:

- Docker image: `wnameless/oracle-xe-11g:latest`
- Environment: `ORACLE_ROOT_PASSWORD=oracle`
- JDBC Port: 1521
- JDBC URL format: `jdbc:oracle:thin:system/oracle@//host:port/xe`

**Note**: Oracle tests are typically ignored in standard builds due to Oracle JDBC driver licensing restrictions. The implementation requires manual installation of the Oracle JDBC driver (`ojdbc6-11.2.0.2.0.jar`) in the local Maven repository.

To enable Oracle testing:

1. Pull the Oracle Docker image: `docker pull wnameless/oracle-xe-11g`
2. Download and install the Oracle JDBC driver in the local Maven repository
3. Increase timeout values for Oracle container startup
4. Run with: `./build/sbt "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"`
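Putting the Oracle values above together, a configuration might look like the following sketch (the abstract class is restated from the Capabilities section so the example stands alone):

```scala
// Hedged sketch assembling the Oracle values listed above. DatabaseOnDocker
// is restated here so the example compiles without the Spark test jars.
abstract class DatabaseOnDocker {
  val imageName: String
  val env: Map[String, String]
  val jdbcPort: Int
  def getJdbcUrl(ip: String, port: Int): String
}

val oracleConfig = new DatabaseOnDocker {
  override val imageName = "wnameless/oracle-xe-11g:latest"
  override val env = Map("ORACLE_ROOT_PASSWORD" -> "oracle")
  override val jdbcPort = 1521
  override def getJdbcUrl(ip: String, port: Int): String =
    s"jdbc:oracle:thin:system/oracle@//$ip:$port/xe"
}
```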
## Type Mappings

The framework validates Spark's JDBC type mappings for various database-specific types:

### MySQL Type Mappings

- **Numeric**: BIT → Boolean/Long, SMALLINT → Integer, MEDIUMINT → Integer, INT → Integer, BIGINT → Long, DECIMAL → BigDecimal, FLOAT → Double, DOUBLE → Double
- **Date/Time**: DATE → java.sql.Date, TIME → java.sql.Timestamp, DATETIME → java.sql.Timestamp, TIMESTAMP → java.sql.Timestamp, YEAR → java.sql.Date
- **String/Binary**: CHAR → String, VARCHAR → String, TEXT variants → String, BINARY → byte[], VARBINARY → byte[], BLOB → byte[]

### PostgreSQL Type Mappings

- **Array Types**: integer[] → Java arrays, text[] → String arrays, real[] → Float arrays
- **Network Types**: inet → String representation, cidr → String representation
- **Binary**: bytea → byte[]
- **Boolean**: boolean → Boolean
## Dependencies

### External Dependencies

- **com.spotify:docker-client** (shaded classifier) - Docker container management and lifecycle
- **mysql:mysql-connector-java** - MySQL JDBC driver
- **org.postgresql:postgresql** - PostgreSQL JDBC driver
- **org.apache.httpcomponents:httpclient** (4.5) - HTTP client for Docker API communication
- **org.apache.httpcomponents:httpcore** (4.4.1) - HTTP core components
- **com.google.guava:guava** (18.0) - Utility libraries
- **com.sun.jersey:jersey-server** (1.19) - Jersey server components
- **com.sun.jersey:jersey-core** (1.19) - Jersey core components
- **com.sun.jersey:jersey-servlet** (1.19) - Jersey servlet support
- **com.sun.jersey:jersey-json** (1.19) - Jersey JSON support

### Internal Spark Dependencies

- **org.apache.spark:spark-core_2.10** - Core Spark functionality
- **org.apache.spark:spark-sql_2.10** - Spark SQL engine
- **org.apache.spark:spark-test-tags_2.10** - Test categorization (`@DockerTest` annotation)
## Environment Requirements

- **Docker** must be installed and running
- **Docker images** for target databases must be available or pullable
- **Network access** for Docker registry and container networking
- **Sufficient memory** for running database containers alongside Spark tests
- **Available ports** for container port binding
## Test Annotations

```scala { .api }
@DockerTest
class YourIntegrationSuite extends DockerJDBCIntegrationSuite {
  // Test implementation
}
```

The `@DockerTest` annotation categorizes tests that require Docker infrastructure, allowing for selective test execution in environments where Docker may not be available.
## Type Definitions

```scala { .api }
// External types from com.spotify:docker-client (simplified signatures
// for reference; see the library's own API documentation for exact types)
trait DockerClient {
  def ping(): Unit
  def inspectImage(image: String): Image
  def pull(image: String): Unit
  def createContainer(config: ContainerConfig): ContainerCreation
  def startContainer(containerId: String): Unit
  def killContainer(containerId: String): Unit
  def removeContainer(containerId: String): Unit
  def close(): Unit
}

// Standard Java types
class Properties extends java.util.Hashtable[Object, Object]

// Scala collections
type Map[K, V] = scala.collection.immutable.Map[K, V]
type Seq[T] = scala.collection.immutable.Seq[T]
```
## Error Handling

The framework provides comprehensive error handling for:

- **Docker connectivity issues** - Graceful failure when Docker is unavailable
- **Image availability** - Automatic pulling of missing Docker images
- **Container startup failures** - Proper cleanup and error reporting
- **Database connection timeouts** - Configurable timeout periods with retry logic
- **Resource cleanup** - Guaranteed cleanup even when tests fail or are interrupted
## Limitations

- **Oracle licensing** - The Oracle JDBC driver must be manually installed due to licensing restrictions
- **Docker dependency** - Tests require Docker to be available and running
- **Platform compatibility** - Tested primarily on Linux and macOS environments
- **Resource requirements** - Database containers require additional memory and CPU resources
- **Network configuration** - May require additional configuration in complex Docker networking environments