A decoupled client-server architecture component for Apache Spark that enables remote connectivity to Spark clusters using the DataFrame API and gRPC protocol.
npx @tessl/cli install tessl/maven-org-apache-spark--spark-connect_2-13@3.5.00
# Apache Spark Connect Server

Apache Spark Connect Server provides a decoupled client-server architecture that enables remote connectivity to Spark clusters, using the DataFrame API and unresolved logical plans as its protocol. The server acts as a gRPC service that receives requests from Spark Connect clients and executes them on the Spark cluster, allowing Spark to be used from environments such as modern data applications, IDEs, notebooks, and other programming languages.

## Package Information

- **Package Name**: spark-connect_2.13
- **Package Type**: maven
- **Language**: Scala
- **Group ID**: org.apache.spark
- **Installation**: Add the dependency to your build.sbt or pom.xml

Maven:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-connect_2.13</artifactId>
  <version>3.5.6</version>
</dependency>
```

SBT:

```scala
libraryDependencies += "org.apache.spark" %% "spark-connect" % "3.5.6"
```

## Core Imports

```scala
import org.apache.spark.sql.connect.service.{SparkConnectService, SparkConnectServer}
import org.apache.spark.sql.connect.plugin.{RelationPlugin, ExpressionPlugin, CommandPlugin}
import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.config.Connect
```

## Basic Usage

### Starting the Server

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connect.service.SparkConnectService

// Start Spark session
val session = SparkSession.builder.getOrCreate()

// Start Connect server
SparkConnectService.start(session.sparkContext)

// Server is now listening on configured port (default: 15002)
```

### Standalone Server Application

```scala
import org.apache.spark.sql.connect.service.SparkConnectServer

// Run standalone server
SparkConnectServer.main(Array.empty)
```

## Architecture

The Spark Connect Server architecture consists of several key layers:

- **gRPC Service Layer**: Handles client requests via protocol buffer definitions
- **Request Handlers**: Process different types of requests (execute, analyze, artifacts, etc.)
- **Planning Layer**: Converts protocol buffer plans to Catalyst logical plans
- **Plugin System**: Extensible architecture for custom functionality
- **Session Management**: Manages client sessions and execution state
- **Artifact Management**: Handles JAR uploads and dynamic class loading
- **Monitoring & UI**: Web interface for server monitoring and debugging
- **Monitoring & UI**: Web interface for server monitoring and debugging

## Capabilities

### Server Management and Configuration

Core server functionality including startup, configuration, and lifecycle management.

```scala { .api }
object SparkConnectService {
  def start(sc: SparkContext): Unit
  def stop(timeout: Option[Long] = None, unit: Option[TimeUnit] = None): Unit
}

object SparkConnectServer {
  def main(args: Array[String]): Unit
}
```
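
A minimal embedding sketch based on the API above, assuming the `spark-connect` artifact is on the application classpath (`EmbeddedConnectServer` is a hypothetical wrapper object, not part of the package):

```scala
import java.util.concurrent.TimeUnit

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connect.service.SparkConnectService

// Hypothetical wrapper showing the start/stop lifecycle
object EmbeddedConnectServer {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder.appName("connect-demo").getOrCreate()

    // Expose the session's cluster over gRPC (default port 15002)
    SparkConnectService.start(session.sparkContext)

    // ... serve clients until it is time to shut down ...

    // Stop gracefully, waiting up to 30 seconds for in-flight requests
    SparkConnectService.stop(Some(30L), Some(TimeUnit.SECONDS))
    session.stop()
  }
}
```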

[Server Management](./server-management.md)

### Plugin System

Extensible plugin architecture for custom relations, expressions, and commands.

```scala { .api }
trait RelationPlugin {
  def transform(relation: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[LogicalPlan]
}

trait ExpressionPlugin {
  def transform(expression: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[Expression]
}

trait CommandPlugin {
  def process(command: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[Unit]
}
```
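
As an illustrative sketch, a `CommandPlugin` typically checks whether the packed `Any` carries its own message type, handles it if so, and otherwise returns `None` so other handlers can try. `MyCustomCommand` below is a hypothetical class generated from a user-defined .proto file:

```scala
import com.google.protobuf.{Any => ProtoAny}

import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.plugin.CommandPlugin

class MyCommandPlugin extends CommandPlugin {
  override def process(command: ProtoAny, planner: SparkConnectPlanner): Option[Unit] =
    if (command.is(classOf[MyCustomCommand])) {
      // Unpack our own protobuf message and act on it
      val cmd = command.unpack(classOf[MyCustomCommand])
      // ... perform the side effect the command describes ...
      Some(())
    } else {
      // Not our message type; leave it to other plugins or the default planner
      None
    }
}
```

Registration is expected to happen by class name through a `spark.connect.extensions.command.classes`-style configuration entry; verify the exact key for your Spark version.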

[Plugin System](./plugin-system.md)

### Plan Processing and Execution

Convert protocol buffer plans to Catalyst plans and manage execution lifecycle.

```scala { .api }
class SparkConnectPlanner(sessionHolder: SessionHolder) {
  def transformRelation(rel: proto.Relation): LogicalPlan
  def transformExpression(exp: proto.Expression): Expression
  def process(command: proto.Command, responseObserver: StreamObserver[ExecutePlanResponse], executeHolder: ExecuteHolder): Unit
}
```
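
To show how the planner is meant to be reused, here is a hedged sketch of a `RelationPlugin` that unpacks a hypothetical `MyFilteredScan` message (a stand-in for a user-defined proto wrapping a child `proto.Relation`) and delegates the child back to `transformRelation`:

```scala
import com.google.protobuf.{Any => ProtoAny}

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.plugin.RelationPlugin

class MyRelationPlugin extends RelationPlugin {
  override def transform(relation: ProtoAny, planner: SparkConnectPlanner): Option[LogicalPlan] =
    if (relation.is(classOf[MyFilteredScan])) {
      val scan = relation.unpack(classOf[MyFilteredScan])
      // Recursively convert the wrapped child relation into a Catalyst plan;
      // a real plugin would wrap the result in its own logical operator
      Some(planner.transformRelation(scan.getChild))
    } else {
      None
    }
}
```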

[Plan Processing](./plan-processing.md)

### Session and State Management

Manage client sessions, execution state, and concurrent operations.

```scala { .api }
object SparkConnectService {
  def getOrCreateIsolatedSession(userId: String, sessionId: String): SessionHolder
  def getIsolatedSession(userId: String, sessionId: String): SessionHolder
  def listActiveExecutions: Either[Long, Seq[ExecuteInfo]]
}
```
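
A sketch of how server-side code might resolve a session and inspect activity (the reading of the `Left` value as a last-activity timestamp is an assumption):

```scala
import org.apache.spark.sql.connect.service.SparkConnectService

// Each (userId, sessionId) pair maps to an isolated SessionHolder wrapping
// its own SparkSession state
val holder = SparkConnectService.getOrCreateIsolatedSession("alice", "session-1")

// Enumerate running executions; Right carries the active ExecuteInfo entries
SparkConnectService.listActiveExecutions match {
  case Right(executions)  => executions.foreach(println)
  case Left(lastActivity) => println(s"No active executions (last activity: $lastActivity)")
}
```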

[Session Management](./session-management.md)

### Artifact Management

Handle JAR uploads, file management, and dynamic class loading for user code.

```scala { .api }
class SparkConnectArtifactManager(sessionHolder: SessionHolder) {
  def getSparkConnectAddedJars: Seq[URL]
  def getSparkConnectPythonIncludes: Seq[String]
  def classloader: ClassLoader
}
```

[Artifact Management](./artifact-management.md)

### Monitoring and Web UI

Web interface components for server monitoring, session tracking, and debugging.

```scala { .api }
class SparkConnectServerTab(sparkContext: SparkContext, store: SparkConnectServerAppStatusStore, appName: String) {
  def detach(): Unit
  def displayOrder: Int
}
```

[Monitoring and UI](./monitoring-ui.md)

### Configuration System

Comprehensive configuration options for server behavior, security, and performance tuning.

```scala { .api }
object Connect {
  val CONNECT_GRPC_BINDING_PORT: ConfigEntry[Int]
  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE: ConfigEntry[Int]
  val CONNECT_EXECUTE_REATTACHABLE_ENABLED: ConfigEntry[Boolean]
  // ... additional configuration entries
}
```
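
These entries resolve to ordinary `spark.connect.*` configuration keys, so they can be set like any other Spark conf. A sketch (the key names shown are believed to match the entries above, but verify them against your Spark version):

```scala
import org.apache.spark.sql.SparkSession

val session = SparkSession.builder
  // CONNECT_GRPC_BINDING_PORT: port the gRPC service binds to
  .config("spark.connect.grpc.binding.port", "15999")
  // CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE: cap on inbound request size (bytes)
  .config("spark.connect.grpc.maxInboundMessageSize", "134217728")
  .getOrCreate()
```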

[Configuration](./configuration.md)

## Error Handling

The server uses centralized error handling through the ErrorUtils object, which converts Spark exceptions to appropriate gRPC status codes and error messages for client consumption.

## Security Considerations

- Configure authentication and authorization via Spark security settings
- Use TLS for encrypted communication between clients and server
- Implement custom interceptors for request validation and logging
- Manage artifacts securely with proper sandboxing and validation

## Performance and Scalability

- Supports concurrent client sessions with isolated execution contexts
- Implements reattachable executions for fault tolerance
- Provides streaming responses for large result sets
- Configurable resource limits and timeouts
- Efficient artifact caching and class loading