tessl/maven-org-apache-spark--spark-connect_2-13

A decoupled client-server architecture component for Apache Spark that enables remote connectivity to Spark clusters using the DataFrame API and gRPC protocol.

Describes: pkg:maven/org.apache.spark/spark-connect_2.13@3.5.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-connect_2-13@3.5.0


Apache Spark Connect Server

Apache Spark Connect Server provides a decoupled client-server architecture that enables remote connectivity to Spark clusters, using the DataFrame API and unresolved logical plans as the protocol. The server is a gRPC service that receives requests from Spark Connect clients and executes them on the Spark cluster, allowing Spark to be used from modern data applications, IDEs, notebooks, and a range of programming languages.

Package Information

  • Package Name: spark-connect_2.13
  • Package Type: maven
  • Language: Scala
  • Group ID: org.apache.spark
  • Installation: Add the dependency to your build.sbt or pom.xml

Maven:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-connect_2.13</artifactId>
    <version>3.5.6</version>
</dependency>

SBT:

// %% appends the Scala binary version, resolving to spark-connect_2.13
libraryDependencies += "org.apache.spark" %% "spark-connect" % "3.5.6"

Core Imports

import org.apache.spark.sql.connect.service.{SparkConnectService, SparkConnectServer}
import org.apache.spark.sql.connect.plugin.{RelationPlugin, ExpressionPlugin, CommandPlugin}
import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.config.Connect

Basic Usage

Starting the Server

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connect.service.SparkConnectService

// Start Spark session
val session = SparkSession.builder.getOrCreate()

// Start Connect server
SparkConnectService.start(session.sparkContext)

// Server is now listening on configured port (default: 15002)

Standalone Server Application

import org.apache.spark.sql.connect.service.SparkConnectServer

// Run the standalone server entry point (starts a Spark session and
// blocks serving client requests until terminated)
SparkConnectServer.main(Array.empty)

Architecture

The Spark Connect Server architecture consists of several key layers:

  • gRPC Service Layer: Handles client requests via protocol buffer definitions
  • Request Handlers: Process different types of requests (execute, analyze, artifacts, etc.)
  • Planning Layer: Converts protocol buffer plans to Catalyst logical plans
  • Plugin System: Extensible architecture for custom functionality
  • Session Management: Manages client sessions and execution state
  • Artifact Management: Handles JAR uploads and dynamic class loading
  • Monitoring & UI: Web interface for server monitoring and debugging

Capabilities

Server Management and Configuration

Core server functionality including startup, configuration, and lifecycle management.

object SparkConnectService {
  def start(sc: SparkContext): Unit
  def stop(timeout: Option[Long] = None, unit: Option[TimeUnit] = None): Unit
}

object SparkConnectServer {
  def main(args: Array[String]): Unit
}

Server Management
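A minimal lifecycle sketch based on the signatures above: start the service against an existing SparkContext, then stop it with a drain timeout. The port and timeout values here are illustrative, not requirements.

```scala
import java.util.concurrent.TimeUnit

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connect.service.SparkConnectService

object ConnectServerLifecycle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("connect-server-demo")
      .getOrCreate()

    // Start the gRPC endpoint; it listens on the configured port
    // (default 15002) until stopped.
    SparkConnectService.start(spark.sparkContext)

    // ... serve clients ...

    // Shut down, waiting up to 30 seconds for in-flight work to drain.
    SparkConnectService.stop(Some(30L), Some(TimeUnit.SECONDS))
    spark.stop()
  }
}
```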

Plugin System

Extensible plugin architecture for custom relations, expressions, and commands.

trait RelationPlugin {
  def transform(relation: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[LogicalPlan]
}

trait ExpressionPlugin {
  def transform(expression: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[Expression]
}

trait CommandPlugin {
  def process(command: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[Unit]
}

Plugin System
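As a sketch of the extension contract above, a RelationPlugin inspects the incoming protobuf Any, claims it if it recognizes the message type, and maps it to a Catalyst plan. `MyRangeProto` is a hypothetical client-defined protobuf message, and the mapping to Catalyst's built-in `Range` plan is purely illustrative.

```scala
import com.google.protobuf.{Any => ProtoAny}

import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Range}
import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.plugin.RelationPlugin

// Hypothetical plugin: recognizes a custom protobuf message and rewrites
// it into a built-in Range plan. Returning None signals "not mine", so
// other plugins or the default planner can handle the relation.
class RangeRelationPlugin extends RelationPlugin {
  override def transform(
      relation: ProtoAny,
      planner: SparkConnectPlanner): Option[LogicalPlan] = {
    if (relation.is(classOf[MyRangeProto])) { // MyRangeProto is a placeholder
      val msg = relation.unpack(classOf[MyRangeProto])
      Some(Range(msg.getStart, msg.getEnd, step = 1L, numSlices = None))
    } else {
      None
    }
  }
}
```

Plugins are registered via server configuration; in Spark 3.5 the relevant keys are of the form spark.connect.extensions.relation.classes (and the analogous expression/command keys), each taking a comma-separated list of plugin class names.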

Plan Processing and Execution

Convert protocol buffer plans to Catalyst plans and manage execution lifecycle.

class SparkConnectPlanner(sessionHolder: SessionHolder) {
  def transformRelation(rel: proto.Relation): LogicalPlan
  def transformExpression(exp: proto.Expression): Expression
  def process(command: proto.Command, responseObserver: StreamObserver[ExecutePlanResponse], executeHolder: ExecuteHolder): Unit
}

Plan Processing

Session and State Management

Manage client sessions, execution state, and concurrent operations.

object SparkConnectService {
  def getOrCreateIsolatedSession(userId: String, sessionId: String): SessionHolder
  def getIsolatedSession(userId: String, sessionId: String): SessionHolder
  def listActiveExecutions: Either[Long, Seq[ExecuteInfo]]
}

Session Management
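A short sketch of the session API above. Sessions are keyed by (userId, sessionId); the `session` field on SessionHolder and the meaning of the Left branch (a timestamp when nothing is running) are assumptions to verify against your Spark version.

```scala
import org.apache.spark.sql.connect.service.SparkConnectService

// Get (or lazily create) an isolated session for this user/session pair.
val holder = SparkConnectService.getOrCreateIsolatedSession("alice", "session-1")

// The underlying SparkSession, isolated from other clients' sessions.
val spark = holder.session

// Inspect currently running executions.
SparkConnectService.listActiveExecutions match {
  case Right(executions) =>
    executions.foreach(info => println(s"active execution: $info"))
  case Left(lastActivity) =>
    println(s"no active executions (last activity marker: $lastActivity)")
}
```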

Artifact Management

Handle JAR uploads, file management, and dynamic class loading for user code.

class SparkConnectArtifactManager(sessionHolder: SessionHolder) {
  def getSparkConnectAddedJars: Seq[URL]
  def getSparkConnectPythonIncludes: Seq[String]
  def classloader: ClassLoader
}

Artifact Management
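A sketch of consuming client-uploaded artifacts through the manager above. That the manager hangs off a SessionHolder as `artifactManager`, and the `com.example.MyUdf` class name, are assumptions for illustration.

```scala
// Assuming a SessionHolder obtained as shown under Session Management.
val artifacts = sessionHolder.artifactManager

// JARs the connected client has uploaded for this session.
val jars: Seq[java.net.URL] = artifacts.getSparkConnectAddedJars

// Class loader that resolves classes from those uploaded JARs,
// enabling dynamic loading of user code (e.g. UDF implementations).
val loader: ClassLoader = artifacts.classloader
val udfClass = loader.loadClass("com.example.MyUdf") // hypothetical client class
```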

Monitoring and Web UI

Web interface components for server monitoring, session tracking, and debugging.

class SparkConnectServerTab(sparkContext: SparkContext, store: SparkConnectServerAppStatusStore, appName: String) {
  def detach(): Unit
  def displayOrder: Int
}

Monitoring and UI

Configuration System

Comprehensive configuration options for server behavior, security, and performance tuning.

object Connect {
  val CONNECT_GRPC_BINDING_PORT: ConfigEntry[Int]
  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE: ConfigEntry[Int]
  val CONNECT_EXECUTE_REATTACHABLE_ENABLED: ConfigEntry[Boolean]
  // ... additional configuration entries
}

Configuration
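The config entries above are set through ordinary Spark configuration. The string keys below are the ones these entries are bound to in Spark 3.5 (e.g. spark.connect.grpc.binding.port); verify them against your Spark version before relying on them.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("configured-connect-server")
  // Port the gRPC service binds to (default 15002).
  .config("spark.connect.grpc.binding.port", "15002")
  // Cap on inbound gRPC message size, here 128 MiB.
  .config("spark.connect.grpc.maxInboundMessageSize", (128 * 1024 * 1024).toString)
  // Allow clients to reattach to long-running executions.
  .config("spark.connect.execute.reattachable.enabled", "true")
  .getOrCreate()
```

The same keys can equally be passed via --conf on spark-submit or set in spark-defaults.conf.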

Error Handling

The server uses centralized error handling through the ErrorUtils object, which converts Spark exceptions to appropriate gRPC status codes and error messages for client consumption.

Security Considerations

  • Configure authentication and authorization via Spark security settings
  • Use TLS for encrypted communication between clients and server
  • Implement custom interceptors for request validation and logging
  • Manage artifacts securely with proper sandboxing and validation

Performance and Scalability

  • Supports concurrent client sessions with isolated execution contexts
  • Implements reattachable executions for fault tolerance
  • Provides streaming responses for large result sets
  • Configurable resource limits and timeouts
  • Efficient artifact caching and class loading