tessl/maven-org-apache-spark--spark-connect_2-13

A decoupled client-server architecture component for Apache Spark that enables remote connectivity to Spark clusters using the DataFrame API and gRPC protocol.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: mavenpkg:maven/org.apache.spark/spark-connect_2.13@3.5.x

To install, run:

npx @tessl/cli install tessl/maven-org-apache-spark--spark-connect_2-13@3.5.0

# Apache Spark Connect Server

Apache Spark Connect Server provides a decoupled client-server architecture that enables remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol. The server acts as a gRPC service that receives requests from Spark Connect clients and executes them on the Spark cluster, enabling Spark to be leveraged from various environments including modern data applications, IDEs, notebooks, and different programming languages.

## Package Information

- **Package Name**: spark-connect_2.13
- **Package Type**: maven
- **Language**: Scala
- **Group ID**: org.apache.spark
- **Installation**: Add the dependency to your build.sbt or pom.xml

Maven:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-connect_2.13</artifactId>
  <version>3.5.6</version>
</dependency>
```

SBT:

```scala
libraryDependencies += "org.apache.spark" %% "spark-connect" % "3.5.6"
```

## Core Imports

```scala
import org.apache.spark.sql.connect.service.{SparkConnectService, SparkConnectServer}
import org.apache.spark.sql.connect.plugin.{RelationPlugin, ExpressionPlugin, CommandPlugin}
import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.config.Connect
```

## Basic Usage

### Starting the Server

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connect.service.SparkConnectService

// Start (or reuse) a Spark session
val session = SparkSession.builder.getOrCreate()

// Start the Connect server on this session's SparkContext
SparkConnectService.start(session.sparkContext)

// The server is now listening on the configured port (default: 15002)
```

### Standalone Server Application

```scala
import org.apache.spark.sql.connect.service.SparkConnectServer

// Run the standalone server
SparkConnectServer.main(Array.empty)
```
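Once a server is running, a Spark Connect client attaches to it over gRPC using a `sc://host:port` connection string. A minimal sketch, assuming the Spark Connect Scala client artifact is on the client's classpath and the server listens on the default port:

```scala
import org.apache.spark.sql.SparkSession

// Connect to a running Spark Connect server; 15002 is the default port.
val spark = SparkSession.builder
  .remote("sc://localhost:15002")
  .getOrCreate()

// DataFrame operations are shipped to the server as unresolved logical plans.
val evens = spark.range(100).filter("id % 2 = 0").count()
```

Because the client only builds and sends plans, it needs no local Spark cluster, which is what enables use from IDEs, notebooks, and lightweight applications.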

## Architecture

The Spark Connect Server architecture consists of several key layers:

- **gRPC Service Layer**: Handles client requests via protocol buffer definitions
- **Request Handlers**: Process different types of requests (execute, analyze, artifacts, etc.)
- **Planning Layer**: Converts protocol buffer plans to Catalyst logical plans
- **Plugin System**: Extensible architecture for custom functionality
- **Session Management**: Manages client sessions and execution state
- **Artifact Management**: Handles JAR uploads and dynamic class loading
- **Monitoring & UI**: Web interface for server monitoring and debugging

## Capabilities

### Server Management and Configuration

Core server functionality including startup, configuration, and lifecycle management.

```scala { .api }
object SparkConnectService {
  def start(sc: SparkContext): Unit
  def stop(timeout: Option[Long] = None, unit: Option[TimeUnit] = None): Unit
}

object SparkConnectServer {
  def main(args: Array[String]): Unit
}
```

[Server Management](./server-management.md)

### Plugin System

Extensible plugin architecture for custom relations, expressions, and commands.

```scala { .api }
trait RelationPlugin {
  def transform(relation: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[LogicalPlan]
}

trait ExpressionPlugin {
  def transform(expression: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[Expression]
}

trait CommandPlugin {
  def process(command: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[Unit]
}
```

[Plugin System](./plugin-system.md)
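As an illustration, a relation plugin might recognize one custom protobuf message type (identified here by a hypothetical type URL) and translate it into a logical plan, returning `None` for anything else so other plugins can try. Plugin classes are registered through the server's extension configuration (a `spark.connect.extensions.*` key; check your Spark version for the exact name):

```scala
import com.google.protobuf.{Any => ProtoAny}
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.plugin.RelationPlugin

// Hypothetical plugin: handles a single custom message type.
class MyRelationPlugin extends RelationPlugin {
  override def transform(relation: ProtoAny, planner: SparkConnectPlanner): Option[LogicalPlan] = {
    // The type URL identifies which message the protobuf Any wraps;
    // "example.MyCustomRelation" is a placeholder for your own message.
    if (relation.getTypeUrl.endsWith("example.MyCustomRelation")) {
      // Decode the message here and build the corresponding plan;
      // an empty LocalRelation stands in for the real translation.
      Some(LocalRelation(Nil))
    } else {
      None // not ours; let other registered plugins handle it
    }
  }
}
```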

### Plan Processing and Execution

Convert protocol buffer plans to Catalyst plans and manage execution lifecycle.

```scala { .api }
class SparkConnectPlanner(sessionHolder: SessionHolder) {
  def transformRelation(rel: proto.Relation): LogicalPlan
  def transformExpression(exp: proto.Expression): Expression
  def process(command: proto.Command, responseObserver: StreamObserver[ExecutePlanResponse], executeHolder: ExecuteHolder): Unit
}
```

[Plan Processing](./plan-processing.md)

### Session and State Management

Manage client sessions, execution state, and concurrent operations.

```scala { .api }
object SparkConnectService {
  def getOrCreateIsolatedSession(userId: String, sessionId: String): SessionHolder
  def getIsolatedSession(userId: String, sessionId: String): SessionHolder
  def listActiveExecutions: Either[Long, Seq[ExecuteInfo]]
}
```

[Session Management](./session-management.md)
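On the server side, the session registry can be queried directly. A sketch with placeholder user and session IDs (real IDs come from client request metadata) and assuming the `SessionHolder` exposes its underlying `SparkSession`:

```scala
import org.apache.spark.sql.connect.service.SparkConnectService

// Look up (or create) the isolated session for a given client;
// "alice" / "session-1" are placeholder identifiers.
val holder = SparkConnectService.getOrCreateIsolatedSession("alice", "session-1")
val spark = holder.session // the isolated SparkSession backing this client

// Inspect running executions: Right(list) when there are active sessions,
// Left(timestamp) otherwise.
SparkConnectService.listActiveExecutions match {
  case Right(executions) => executions.foreach(println)
  case Left(lastActive)  => println(s"no active executions since $lastActive")
}
```

The `Either` return type lets callers distinguish "no sessions at all" from "sessions with an empty execution list".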

### Artifact Management

Handle JAR uploads, file management, and dynamic class loading for user code.

```scala { .api }
class SparkConnectArtifactManager(sessionHolder: SessionHolder) {
  def getSparkConnectAddedJars: Seq[URL]
  def getSparkConnectPythonIncludes: Seq[String]
  def classloader: ClassLoader
}
```

[Artifact Management](./artifact-management.md)

### Monitoring and Web UI

Web interface components for server monitoring, session tracking, and debugging.

```scala { .api }
class SparkConnectServerTab(sparkContext: SparkContext, store: SparkConnectServerAppStatusStore, appName: String) {
  def detach(): Unit
  def displayOrder: Int
}
```

[Monitoring and UI](./monitoring-ui.md)

### Configuration System

Comprehensive configuration options for server behavior, security, and performance tuning.

```scala { .api }
object Connect {
  val CONNECT_GRPC_BINDING_PORT: ConfigEntry[Int]
  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE: ConfigEntry[Int]
  val CONNECT_EXECUTE_REATTACHABLE_ENABLED: ConfigEntry[Boolean]
  // ... additional configuration entries
}
```

[Configuration](./configuration.md)
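These entries correspond to `spark.connect.*` configuration keys that can be set like any other Spark conf, for example in `spark-defaults.conf` or via `--conf`. A small fragment (key names shown here follow the Spark 3.5 convention; verify them against your version's documentation):

```
spark.connect.grpc.binding.port             15002
spark.connect.grpc.maxInboundMessageSize    134217728
```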

## Error Handling

The server uses centralized error handling through the ErrorUtils object, which converts Spark exceptions to appropriate gRPC status codes and error messages for client consumption.

## Security Considerations

- Configure authentication and authorization via Spark security settings
- Use TLS for encrypted communication between clients and server
- Implement custom interceptors for request validation and logging
- Manage artifacts securely with proper sandboxing and validation
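As a sketch of the interceptor point, a standard gRPC `ServerInterceptor` can validate requests before they reach the service. The header name here is illustrative, and the mechanism for registering interceptor classes with the server is configuration-driven (check your Spark version for the exact key):

```scala
import io.grpc.{Metadata, ServerCall, ServerCallHandler, ServerInterceptor, Status}

// Illustrative interceptor: reject calls missing an auth token header.
class AuthInterceptor extends ServerInterceptor {
  private val TokenKey =
    Metadata.Key.of("x-auth-token", Metadata.ASCII_STRING_MARSHALLER)

  override def interceptCall[ReqT, RespT](
      call: ServerCall[ReqT, RespT],
      headers: Metadata,
      next: ServerCallHandler[ReqT, RespT]): ServerCall.Listener[ReqT] = {
    if (headers.get(TokenKey) == null) {
      // Close the call with an UNAUTHENTICATED status before it reaches Spark.
      call.close(Status.UNAUTHENTICATED.withDescription("missing x-auth-token"), new Metadata())
      new ServerCall.Listener[ReqT] {} // no-op listener for the rejected call
    } else {
      next.startCall(call, headers)
    }
  }
}
```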

## Performance and Scalability

- Supports concurrent client sessions with isolated execution contexts
- Implements reattachable executions for fault tolerance
- Provides streaming responses for large result sets
- Configurable resource limits and timeouts
- Efficient artifact caching and class loading