tessl/maven-org-apache-spark--spark-docker-integration-tests_2-10

Docker integration tests for Apache Spark providing automated JDBC database testing with containerized environments.

Describes: `mavenpkg:maven/org.apache.spark/spark-docker-integration-tests_2.10@1.6.x`

To install, run:

```
npx @tessl/cli install tessl/maven-org-apache-spark--spark-docker-integration-tests_2-10@1.6.0
```

# Spark Docker Integration Tests

Spark Docker Integration Tests provides a Docker-based testing framework for validating Apache Spark's JDBC functionality against various database systems. It automates the creation and management of database containers, enabling comprehensive integration testing of Spark's SQL capabilities in isolated, reproducible environments.

## Package Information

- **Package Name**: spark-docker-integration-tests_2.10
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Available as part of the Apache Spark source distribution
- **Maven Coordinates**: `org.apache.spark:spark-docker-integration-tests_2.10:1.6.3`

## Core Imports

```scala
import org.apache.spark.sql.jdbc.{DockerJDBCIntegrationSuite, DatabaseOnDocker}
import org.apache.spark.util.DockerUtils
import org.apache.spark.tags.DockerTest
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.test.SharedSQLContext
import org.scalatest.{BeforeAndAfterAll, Eventually}
import com.spotify.docker.client.DockerClient
import java.sql.Connection
import java.util.Properties
```

## Basic Usage

```scala
import java.sql.Connection
import java.util.Properties

import org.apache.spark.sql.jdbc.{DockerJDBCIntegrationSuite, DatabaseOnDocker}

class MyIntegrationSuite extends DockerJDBCIntegrationSuite {
  // Define the database configuration
  override val db = new DatabaseOnDocker {
    override val imageName = "mysql:5.7.9"
    override val env = Map("MYSQL_ROOT_PASSWORD" -> "rootpass")
    override val jdbcPort = 3306
    override def getJdbcUrl(ip: String, port: Int): String =
      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
  }

  // Prepare schema and test data once the container is up
  override def dataPreparation(conn: Connection): Unit = {
    conn.prepareStatement("CREATE TABLE test (id INT, name VARCHAR(50))").executeUpdate()
    conn.prepareStatement("INSERT INTO test VALUES (1, 'test')").executeUpdate()
  }

  test("Basic connectivity") {
    val df = sqlContext.read.jdbc(jdbcUrl, "test", new Properties)
    assert(df.collect().length > 0)
  }
}
```

## Architecture

The framework is built around several key components:

- **Docker Management**: Automated container lifecycle management using the Spotify Docker client
- **Database Abstraction**: The `DatabaseOnDocker` trait provides database-specific configuration
- **Test Framework Integration**: Seamless integration with ScalaTest and Spark test utilities
- **Network Configuration**: Automatic port binding and IP address discovery for various Docker environments
- **Resource Management**: Comprehensive cleanup of Docker containers and database connections

## Capabilities

### Database Configuration Interface

Abstract interface for defining database-specific Docker container configurations.

```scala { .api }
abstract class DatabaseOnDocker {
  /**
   * The docker image to be pulled.
   */
  val imageName: String

  /**
   * Environment variables to set inside of the Docker container while launching it.
   */
  val env: Map[String, String]

  /**
   * The container-internal JDBC port that the database listens on.
   */
  val jdbcPort: Int

  /**
   * Return a JDBC URL that connects to the database running at the given IP address and port.
   */
  def getJdbcUrl(ip: String, port: Int): String
}
```

- `imageName`: Docker image name to pull (e.g., `mysql:5.7.9`, `postgres:9.4.5`)
- `env`: Environment variables to set in the container (e.g., database passwords)
- `jdbcPort`: Port number the database listens on inside the container
- `getJdbcUrl()`: Constructs a JDBC URL for connecting to the database

### Integration Test Framework

Base class providing complete Docker-based integration testing infrastructure for JDBC databases.

```scala { .api }
abstract class DockerJDBCIntegrationSuite
  extends SparkFunSuite
  with BeforeAndAfterAll
  with Eventually
  with SharedSQLContext {

  val db: DatabaseOnDocker
  private var docker: DockerClient
  private var containerId: String
  protected var jdbcUrl: String

  /**
   * Prepare databases and tables for testing.
   */
  def dataPreparation(connection: Connection): Unit

  override def beforeAll(): Unit
  override def afterAll(): Unit
}
```

- `db`: Database configuration implementing `DatabaseOnDocker`
- `docker`: `DockerClient` instance for container management (private field)
- `containerId`: Unique identifier for the created container (private field)
- `jdbcUrl`: JDBC URL available after container setup (protected field)
- `dataPreparation()`: Abstract method for setting up test data and schema
- `beforeAll()`: Handles Docker client setup, image pulling, container creation, and connection establishment
- `afterAll()`: Cleans up Docker containers and closes connections

The `beforeAll()` method performs these operations:

1. Initializes the Docker client and verifies Docker connectivity
2. Pulls the specified Docker image if not already available
3. Configures networking with automatic port binding
4. Creates and starts the database container
5. Waits for the database to accept connections (with a 60-second timeout)
6. Calls `dataPreparation()` for test data setup

The `afterAll()` method ensures proper cleanup:

1. Kills and removes the Docker container
2. Closes the Docker client connection
3. Handles cleanup errors gracefully with logging
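The connection wait in step 5 can be pictured as a retry-with-deadline loop. The following is a hedged, stdlib-only sketch; the suite itself relies on ScalaTest's `Eventually`, and `retryUntil` is an illustrative name, not an API of this package.

```scala
// Hedged sketch of the connection-wait step: retry an operation until it
// succeeds or a deadline passes. The real suite uses ScalaTest's Eventually;
// `retryUntil` is an illustrative helper, not part of this package.
def retryUntil[A](timeoutMs: Long, intervalMs: Long = 100)(op: => A): A = {
  val deadline = System.currentTimeMillis() + timeoutMs
  def attempt(): A =
    try op catch {
      case e: Exception if System.currentTimeMillis() < deadline =>
        Thread.sleep(intervalMs)
        attempt()
    }
  attempt()
}

// Against a real container this would look like:
//   val conn = retryUntil(60000) { java.sql.DriverManager.getConnection(jdbcUrl) }
```

If the deadline passes, the last exception propagates, which is how a container that never comes up surfaces as a test failure.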

### MySQL Integration Testing

Concrete implementation providing MySQL-specific integration testing capabilities.

```scala { .api }
class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}
```

The MySQL implementation uses:

- Docker image: `mysql:5.7.9`
- Environment: `MYSQL_ROOT_PASSWORD=rootpass`
- JDBC Port: 3306
- JDBC URL format: `jdbc:mysql://host:port/mysql?user=root&password=rootpass`

Test coverage includes:

- Basic JDBC connectivity and data reading
- Numeric type mappings (BIT, SMALLINT, MEDIUMINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE)
- Date/time type mappings (DATE, TIME, DATETIME, TIMESTAMP, YEAR)
- String type mappings (CHAR, VARCHAR, TEXT variants, BINARY, VARBINARY, BLOB)
- Write operations and data persistence

### PostgreSQL Integration Testing

Concrete implementation providing PostgreSQL-specific integration testing capabilities.

```scala { .api }
class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}
```

The PostgreSQL implementation uses:

- Docker image: `postgres:9.4.5`
- Environment: `POSTGRES_PASSWORD=rootpass`
- JDBC Port: 5432
- JDBC URL format: `jdbc:postgresql://host:port/postgres?user=postgres&password=rootpass`
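The URL format above can be expressed as a small function. This is an illustrative sketch: `buildPostgresJdbcUrl` is not part of the package; in the suite, this logic lives in the `getJdbcUrl` override of the PostgreSQL `DatabaseOnDocker` configuration.

```scala
// Hedged sketch: the URL-construction logic for the configuration above,
// written as a plain function so it is easy to check in isolation.
// `buildPostgresJdbcUrl` is an illustrative name, not part of this package.
def buildPostgresJdbcUrl(ip: String, port: Int): String =
  s"jdbc:postgresql://$ip:$port/postgres?user=postgres&password=rootpass"
```

Note that the `ip` and `port` arguments are the host-visible address and mapped port discovered at container startup, not the container-internal port 5432.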

Test coverage includes:

- PostgreSQL-specific type mappings
- Array types (integer[], text[], real[])
- Network types (inet, cidr)
- Binary data (bytea)
- Boolean types

### Oracle Integration Testing

Concrete implementation providing Oracle-specific integration testing capabilities (typically disabled due to licensing).

```scala { .api }
class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLContext {
  override val db: DatabaseOnDocker
  override def dataPreparation(connection: Connection): Unit
}
```

The Oracle implementation uses:

- Docker image: `wnameless/oracle-xe-11g:latest`
- Environment: `ORACLE_ROOT_PASSWORD=oracle`
- JDBC Port: 1521
- JDBC URL format: `jdbc:oracle:thin:system/oracle@//host:port/xe`

**Note**: Oracle tests are typically ignored in standard builds due to Oracle JDBC driver licensing restrictions. The implementation requires manually installing the Oracle JDBC driver (`ojdbc6-11.2.0.2.0.jar`) in the local Maven repository.

To enable Oracle testing:

1. Pull the Oracle Docker image: `docker pull wnameless/oracle-xe-11g`
2. Download and install the Oracle JDBC driver in the local Maven repository
3. Increase timeout values for Oracle container startup
4. Run with: `./build/sbt "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"`

## Type Mappings

The framework validates Spark's JDBC type mappings for various database-specific types:

### MySQL Type Mappings

- **Numeric**: BIT → Boolean/Long, SMALLINT → Integer, MEDIUMINT → Integer, INT → Integer, BIGINT → Long, DECIMAL → BigDecimal, FLOAT → Double, DOUBLE → Double
- **Date/Time**: DATE → java.sql.Date, TIME → java.sql.Timestamp, DATETIME → java.sql.Timestamp, TIMESTAMP → java.sql.Timestamp, YEAR → java.sql.Date
- **String/Binary**: CHAR → String, VARCHAR → String, TEXT variants → String, BINARY → byte[], VARBINARY → byte[], BLOB → byte[]

### PostgreSQL Type Mappings

- **Array Types**: integer[] → Java arrays, text[] → String arrays, real[] → Float arrays
- **Network Types**: inet → String representation, cidr → String representation
- **Binary**: bytea → byte[]
- **Boolean**: boolean → Boolean

## Dependencies

### External Dependencies

- **com.spotify:docker-client** (shaded classifier) - Docker container management and lifecycle
- **mysql:mysql-connector-java** - MySQL JDBC driver
- **org.postgresql:postgresql** - PostgreSQL JDBC driver
- **org.apache.httpcomponents:httpclient** (4.5) - HTTP client for Docker API communication
- **org.apache.httpcomponents:httpcore** (4.4.1) - HTTP core components
- **com.google.guava:guava** (18.0) - Utility libraries
- **com.sun.jersey:jersey-server** (1.19) - Jersey server components
- **com.sun.jersey:jersey-core** (1.19) - Jersey core components
- **com.sun.jersey:jersey-servlet** (1.19) - Jersey servlet support
- **com.sun.jersey:jersey-json** (1.19) - Jersey JSON support

### Internal Spark Dependencies

- **org.apache.spark:spark-core_2.10** - Core Spark functionality
- **org.apache.spark:spark-sql_2.10** - Spark SQL engine
- **org.apache.spark:spark-test-tags_2.10** - Test categorization (`@DockerTest` annotation)

## Environment Requirements

- **Docker** must be installed and running
- **Docker images** for target databases must be available or pullable
- **Network access** for the Docker registry and container networking
- **Sufficient memory** for running database containers alongside Spark tests
- **Available ports** for container port binding

## Test Annotations

```scala { .api }
@DockerTest
class YourIntegrationSuite extends DockerJDBCIntegrationSuite {
  // Test implementation
}
```

The `@DockerTest` annotation categorizes tests that require Docker infrastructure, allowing selective test execution in environments where Docker may not be available.

## Type Definitions

```scala { .api }
// External types from com.spotify:docker-client
trait DockerClient {
  def ping(): Unit
  def inspectImage(image: String): Image
  def pull(image: String): Unit
  def createContainer(config: ContainerConfig): ContainerCreation
  def startContainer(containerId: String): Unit
  def killContainer(containerId: String): Unit
  def removeContainer(containerId: String): Unit
  def close(): Unit
}

// Standard Java types
class Properties extends java.util.Hashtable[Object, Object]

// Scala collections
type Map[K, V] = scala.collection.immutable.Map[K, V]
type Seq[T] = scala.collection.immutable.Seq[T]
```
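To make the container lifecycle concrete, the sketch below drives a simplified stand-in for the client through the same pull/create/start/kill/remove/close sequence described earlier. `SimpleDockerClient` and `runLifecycle` are illustrative names with simplified types (image and container config are plain strings), not the real Spotify API.

```scala
// Local, simplified stand-in for the Spotify DockerClient shown above,
// so the lifecycle can be sketched without a Docker daemon.
trait SimpleDockerClient {
  def pull(image: String): Unit
  def createContainer(config: String): String // returns a container id
  def startContainer(containerId: String): Unit
  def killContainer(containerId: String): Unit
  def removeContainer(containerId: String): Unit
  def close(): Unit
}

// The order of operations the suite follows in beforeAll()/afterAll().
def runLifecycle(docker: SimpleDockerClient, image: String): Unit = {
  docker.pull(image)
  val id = docker.createContainer(image)
  docker.startContainer(id)
  try {
    // ... run tests against the started container ...
  } finally {
    docker.killContainer(id)
    docker.removeContainer(id)
    docker.close()
  }
}
```

The try/finally ensures the container is torn down even when a test body throws.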

## Error Handling

The framework provides comprehensive error handling for:

- **Docker connectivity issues** - Graceful failure when Docker is unavailable
- **Image availability** - Automatic pulling of missing Docker images
- **Container startup failures** - Proper cleanup and error reporting
- **Database connection timeouts** - Configurable timeout periods with retry logic
- **Resource cleanup** - Guaranteed cleanup even when tests fail or are interrupted
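The cleanup guarantee can be sketched with a try/finally pattern that runs every cleanup action and reports, rather than rethrows, individual cleanup failures. `withCleanup` is an illustrative helper under those assumptions, not part of the package.

```scala
// Hedged sketch of the "guaranteed cleanup" behavior: run each cleanup action
// in a finally block and log (but swallow) individual cleanup failures, so one
// failed teardown step does not prevent the others from running.
def withCleanup[A](cleanups: (() => Unit)*)(body: => A): A =
  try body
  finally cleanups.foreach { cleanup =>
    try cleanup()
    catch { case e: Exception => Console.err.println(s"Cleanup failed: ${e.getMessage}") }
  }

// In the suite, the equivalent call would be roughly:
//   withCleanup(() => docker.killContainer(containerId),
//               () => docker.removeContainer(containerId),
//               () => docker.close()) { runTests() }
```

The original exception from the body still propagates, so test failures are reported even though cleanup always runs.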

## Limitations

- **Oracle licensing** - The Oracle JDBC driver must be manually installed due to licensing restrictions
- **Docker dependency** - Tests require Docker to be available and running
- **Platform compatibility** - Tested primarily on Linux and macOS environments
- **Resource requirements** - Database containers require additional memory and CPU resources
- **Network configuration** - May require additional configuration in complex Docker networking environments