pytest plugin to run the tests with support of pyspark.
npx @tessl/cli install tessl/pypi-pytest-spark@0.8.00
# pytest-spark

A pytest plugin that integrates Apache Spark (PySpark) with the pytest testing framework. It provides session-scoped fixtures, `spark_context` and `spark_session`, that are reused across the whole test session; supports flexible Spark configuration through `pytest.ini`, including external library loading via `spark.jars.packages`; and works with both traditional Spark deployments and the newer Spark Connect architecture.

## Package Information

- **Package Name**: pytest-spark
- **Language**: Python
- **Installation**: `pip install pytest-spark`
- **Plugin Entry Point**: automatically discovered by pytest as the 'spark' plugin

## Core Imports

```python
import pytest
```

The plugin automatically registers its fixtures when installed:

```python
def test_my_case(spark_context):
    # spark_context fixture available automatically
    pass


def test_spark_session_dataframe(spark_session):
    # spark_session fixture available automatically
    pass
```

## Basic Usage

```python
# Example test using the spark_context fixture
def test_rdd_operations(spark_context):
    test_rdd = spark_context.parallelize([1, 2, 3, 4])
    result = test_rdd.map(lambda x: x * 2).collect()
    assert result == [2, 4, 6, 8]


# Example test using the spark_session fixture (Spark 2.0+)
def test_dataframe_operations(spark_session):
    test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
    result = test_df.select("a").collect()
    assert len(result) == 2
```

## Configuration

### Command Line Options

```bash
# Specify the Spark installation directory
pytest --spark_home=/opt/spark

# Specify the Spark Connect server URL
pytest --spark_connect_url=sc://localhost:15002
```

### pytest.ini Configuration

```ini
[pytest]
spark_home = /opt/spark
spark_connect_url = sc://localhost:15002
spark_options =
    spark.app.name: my-pytest-spark-tests
    spark.executor.instances: 1
    spark.jars.packages: com.databricks:spark-xml_2.12:0.5.0
```

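The multi-line `spark_options` value holds one `key: value` pair per line. A minimal sketch of how such a block can be parsed into a plain dict (the `parse_spark_options` helper is illustrative, not part of the plugin's API; splitting on only the first colon keeps Maven coordinates in `spark.jars.packages` intact):

```python
def parse_spark_options(raw):
    """Parse a multi-line 'key: value' block into a dict of Spark options."""
    options = {}
    for line in raw.strip().splitlines():
        # partition() splits at the first colon only, so values may contain colons
        key, _, value = line.strip().partition(":")
        if key:
            options[key.strip()] = value.strip()
    return options

raw = """
spark.app.name: my-pytest-spark-tests
spark.executor.instances: 1
spark.jars.packages: com.databricks:spark-xml_2.12:0.5.0
"""
print(parse_spark_options(raw)["spark.jars.packages"])
# → com.databricks:spark-xml_2.12:0.5.0
```
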
## Capabilities

### Spark Context Fixture

Creates a SparkContext instance with reduced logging that persists across the entire test session.

```python { .api }
@pytest.fixture(scope='session')
def spark_context(_spark_session):
    """
    Return a SparkContext instance with reduced logging (session scope).

    Note: Not supported with Spark Connect functionality.

    Returns:
        SparkContext: Configured SparkContext instance

    Raises:
        NotImplementedError: If used in Spark Connect mode
    """
```

### Spark Session Fixture

Creates a Hive-enabled SparkSession instance with reduced logging that persists across the entire test session.

```python { .api }
@pytest.fixture(scope='session')
def spark_session(_spark_session):
    """
    Return a Hive-enabled SparkSession instance with reduced logging (session scope).

    Available from Spark 2.0 onwards.

    Returns:
        SparkSession: Configured SparkSession instance with Hive support

    Raises:
        Exception: If used with Spark versions < 2.0
    """
```

### Pytest Integration Hooks

Integration hooks that pytest calls automatically to configure Spark support.

```python { .api }
def pytest_addoption(parser):
    """
    Add command-line and ini options for Spark configuration.

    Args:
        parser: pytest argument parser
    """

def pytest_configure(config):
    """
    Configure Spark based on the pytest configuration.

    Args:
        config: pytest configuration object
    """

def pytest_report_header(config):
    """
    Add the Spark version and configuration to the pytest report header.

    Args:
        config: pytest configuration object

    Returns:
        str: Header lines with Spark information
    """
```

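As a sketch of how such an addoption hook registers each setting both as a command-line flag and an ini key, the example below runs it against a minimal stand-in parser (`StubParser` is purely illustrative; the real hook receives pytest's `Parser` object, whose `addoption` and `addini` methods accept further arguments such as `type` and `default`):

```python
class StubParser:
    """Minimal stand-in for pytest's Parser, for illustration only."""
    def __init__(self):
        self.options = {}
        self.ini = {}

    def addoption(self, name, **kwargs):
        self.options[name] = kwargs

    def addini(self, name, help=None, **kwargs):
        self.ini[name] = help

def pytest_addoption(parser):
    # Register each setting as both a command-line flag and an ini key.
    parser.addoption("--spark_home", help="Spark installation directory")
    parser.addini("spark_home", help="Spark installation directory")
    parser.addoption("--spark_connect_url", help="Spark Connect server URL")
    parser.addini("spark_connect_url", help="Spark Connect server URL")
    parser.addini("spark_options", help="Additional Spark configuration options")

parser = StubParser()
pytest_addoption(parser)
print(sorted(parser.ini))
# → ['spark_connect_url', 'spark_home', 'spark_options']
```
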
## Spark Connect Support

For remote Spark server execution (requires Spark 3.4+ with pyspark[connect] or pyspark-connect):

- Supports Spark Connect mode for remote server execution
- Automatically disables incompatible configuration options in Connect mode
- The spark_context fixture raises NotImplementedError in Connect mode (the RDD API is not supported)
- The spark_session fixture works normally with Connect servers

### Spark Connect Configuration

```ini
[pytest]
spark_connect_url = sc://remote-spark-server:15002
```

Or via environment variable:

```bash
export SPARK_REMOTE=sc://remote-spark-server:15002
```

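Since the Connect URL can arrive from several places, the lookup can be pictured as a simple fallback chain. A minimal sketch under the assumption that a command-line option beats the ini setting, which beats `SPARK_REMOTE` (the helper name and exact precedence are illustrative, not the plugin's documented behavior):

```python
import os

def resolve_connect_url(cli_url=None, ini_url=None, environ=None):
    """Pick the Spark Connect URL: CLI option, then ini value, then SPARK_REMOTE."""
    environ = environ if environ is not None else os.environ
    return cli_url or ini_url or environ.get("SPARK_REMOTE")

# ini value wins when no CLI option is given and SPARK_REMOTE is unset
print(resolve_connect_url(ini_url="sc://remote-spark-server:15002", environ={}))
# → sc://remote-spark-server:15002
```
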
## Default Configuration

The plugin provides optimized defaults for testing environments that minimize resource usage while maintaining functionality:

```python { .api }
DEFAULTS = {
    'spark.app.name': 'pytest-spark',
    'spark.default.parallelism': 1,
    'spark.dynamicAllocation.enabled': 'false',
    'spark.executor.cores': 1,
    'spark.executor.instances': 1,
    'spark.io.compression.codec': 'lz4',
    'spark.rdd.compress': 'false',
    'spark.sql.shuffle.partitions': 1,
    'spark.shuffle.compress': 'false',
    'spark.sql.catalogImplementation': 'hive',
}
```

These defaults can be overridden via `spark_options` in pytest.ini.

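Conceptually, user-supplied `spark_options` are merged over the defaults, with the user value winning on any key collision. A minimal sketch of that merge, using an abbreviated copy of the defaults (`effective_options` is an illustrative helper, not plugin API):

```python
# Abbreviated copy of the plugin's defaults, for illustration
DEFAULTS = {
    'spark.app.name': 'pytest-spark',
    'spark.sql.shuffle.partitions': 1,
}

def effective_options(user_options):
    """Merge user-supplied spark_options over the defaults."""
    merged = dict(DEFAULTS)       # start from the defaults
    merged.update(user_options)   # user-supplied keys override
    return merged

print(effective_options({'spark.app.name': 'my-pytest-spark-tests'}))
# → {'spark.app.name': 'my-pytest-spark-tests', 'spark.sql.shuffle.partitions': 1}
```
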
## Error Handling

Common exceptions and error conditions:

- **OSError**: Raised if the specified SPARK_HOME path doesn't exist
- **Exception**: Raised if the spark_session fixture is used with an unsupported Spark version (< 2.0)
- **NotImplementedError**: Raised if spark_context is used in Spark Connect mode
- **ImportError**: Handled gracefully when pyspark components are unavailable

## Dependencies

- **pytest**: Core testing framework (required)
- **findspark**: Spark installation discovery (required)
- **pyspark**: Apache Spark Python API (runtime dependency, must be available)
- **pyspark[connect]** or **pyspark-connect**: For Spark Connect functionality (optional)