tessl/pypi-pytest-spark

pytest plugin to run tests with support for pyspark.

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/pytest-spark@0.8.x

To install, run

npx @tessl/cli install tessl/pypi-pytest-spark@0.8.0

# pytest-spark

A pytest plugin that integrates Apache Spark (PySpark) with the pytest testing framework. It provides session-scoped fixtures, including spark_context and spark_session, that are reused across a test session; supports flexible Spark configuration through pytest.ini, including external library loading via spark.jars.packages; and works with both traditional Spark deployments and the newer Spark Connect architecture.

## Package Information

- **Package Name**: pytest-spark
- **Language**: Python
- **Installation**: `pip install pytest-spark`
- **Plugin Entry Point**: automatically discovered by pytest as the 'spark' plugin

## Core Imports

```python
import pytest
```

The plugin automatically registers its fixtures when installed:

```python
def test_my_case(spark_context):
    # spark_context fixture available automatically
    pass


def test_spark_session_dataframe(spark_session):
    # spark_session fixture available automatically
    pass
```

## Basic Usage

```python
# Example test using the spark_context fixture
def test_rdd_operations(spark_context):
    test_rdd = spark_context.parallelize([1, 2, 3, 4])
    result = test_rdd.map(lambda x: x * 2).collect()
    assert result == [2, 4, 6, 8]


# Example test using the spark_session fixture (Spark 2.0+)
def test_dataframe_operations(spark_session):
    test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
    result = test_df.select("a").collect()
    assert len(result) == 2
```

## Configuration

### Command Line Options

```bash
# Specify the Spark installation directory
pytest --spark_home=/opt/spark

# Specify the Spark Connect server URL
pytest --spark_connect_url=sc://localhost:15002
```

### pytest.ini Configuration

```ini
[pytest]
spark_home = /opt/spark
spark_connect_url = sc://localhost:15002
spark_options =
    spark.app.name: my-pytest-spark-tests
    spark.executor.instances: 1
    spark.jars.packages: com.databricks:spark-xml_2.12:0.5.0
```

## Capabilities

### Spark Context Fixture

Creates a SparkContext instance with reduced logging that persists across the entire test session.

```python { .api }
@pytest.fixture(scope='session')
def spark_context(_spark_session):
    """
    Return a SparkContext instance with reduced logging (session scope).

    Note: not supported with Spark Connect.

    Returns:
        SparkContext: configured SparkContext instance

    Raises:
        NotImplementedError: if used in Spark Connect mode
    """
```

### Spark Session Fixture

Creates a Hive-enabled SparkSession instance with reduced logging that persists across the entire test session.

```python { .api }
@pytest.fixture(scope='session')
def spark_session(_spark_session):
    """
    Return a Hive-enabled SparkSession instance with reduced logging (session scope).

    Available from Spark 2.0 onwards.

    Returns:
        SparkSession: configured SparkSession instance with Hive support

    Raises:
        Exception: if used with Spark versions < 2.0
    """
```

### Pytest Integration Hooks

Integration hooks that pytest calls automatically to configure Spark support.

```python { .api }
def pytest_addoption(parser):
    """
    Add command-line and ini options for Spark configuration.

    Args:
        parser: pytest argument parser
    """

def pytest_configure(config):
    """
    Configure Spark based on the pytest configuration.

    Args:
        config: pytest configuration object
    """

def pytest_report_header(config):
    """
    Add the Spark version and configuration to the pytest report header.

    Args:
        config: pytest configuration object

    Returns:
        str: header lines with Spark information
    """
```

## Spark Connect Support

For remote Spark server execution (requires Spark 3.4+ with pyspark[connect] or pyspark-connect):

- Supports Spark Connect mode for remote server execution
- Automatically disables incompatible configuration options in Connect mode
- The spark_context fixture raises NotImplementedError in Connect mode (the RDD API is not supported)
- The spark_session fixture works normally with Connect servers

### Spark Connect Configuration

```ini
[pytest]
spark_connect_url = sc://remote-spark-server:15002
```

Or via environment variable:

```bash
export SPARK_REMOTE=sc://remote-spark-server:15002
```
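Before pointing tests at a Connect server, it can help to probe whether the optional Connect client is importable at all. This is a hedged sketch, not part of the plugin's API; the `pyspark.sql.connect.session` module path assumed here is the one pyspark 3.4+ ships with the `connect` extra:

```python
# Sketch: detect whether the optional Spark Connect client is importable.
# The module path assumed here is the one used by pyspark 3.4+ with the
# "connect" extra installed; a plain ImportError means the extra is missing.
try:
    from pyspark.sql.connect.session import SparkSession as ConnectSession  # noqa: F401
    HAS_SPARK_CONNECT = True
except ImportError:
    HAS_SPARK_CONNECT = False
```

A conftest.py could use such a flag to skip Connect-only tests when the extra is not installed.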

## Default Configuration

The plugin provides defaults optimized for test environments, minimizing resource usage while preserving functionality:

```python { .api }
DEFAULTS = {
    'spark.app.name': 'pytest-spark',
    'spark.default.parallelism': 1,
    'spark.dynamicAllocation.enabled': 'false',
    'spark.executor.cores': 1,
    'spark.executor.instances': 1,
    'spark.io.compression.codec': 'lz4',
    'spark.rdd.compress': 'false',
    'spark.sql.shuffle.partitions': 1,
    'spark.shuffle.compress': 'false',
    'spark.sql.catalogImplementation': 'hive',
}
```

Any of these can be overridden via spark_options in pytest.ini.
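The override semantics can be illustrated in plain Python: entries from spark_options take precedence over matching keys in DEFAULTS. The dicts below are an illustrative sketch, not the plugin's actual merging code:

```python
# Illustrative sketch of how user-provided spark_options override the
# plugin defaults: a plain dict merge in which user keys win.
DEFAULTS = {
    'spark.app.name': 'pytest-spark',
    'spark.executor.instances': 1,
    'spark.sql.shuffle.partitions': 1,
}

# Hypothetical values a user might set under spark_options in pytest.ini.
user_options = {
    'spark.app.name': 'my-pytest-spark-tests',
}

effective = {**DEFAULTS, **user_options}
print(effective['spark.app.name'])             # my-pytest-spark-tests
print(effective['spark.sql.shuffle.partitions'])  # 1
```

Unmentioned keys keep their default values, so tests stay lightweight unless a setting is deliberately overridden.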

## Error Handling

Common exceptions and error conditions:

- **OSError**: raised if the specified SPARK_HOME path does not exist
- **Exception**: raised if the spark_session fixture is used with an unsupported Spark version (< 2.0)
- **NotImplementedError**: raised if spark_context is used in Spark Connect mode
- **ImportError**: handled gracefully when pyspark components are unavailable

## Dependencies

- **pytest**: core testing framework (required)
- **findspark**: Spark installation discovery (required)
- **pyspark**: Apache Spark Python API (runtime dependency, must be available)
- **pyspark[connect]** or **pyspark-connect**: for Spark Connect functionality (optional)