# Monitoring and Web UI

The Spark Connect Server provides web-based monitoring and debugging through its integration with Spark's web UI system, covering server status, session monitoring, execution tracking, and performance metrics.

## Core UI Components

### SparkConnectServerTab

Main web UI tab for Connect server monitoring.

```scala { .api }
class SparkConnectServerTab(
    store: SparkConnectServerAppStatusStore,
    sparkUI: SparkUI
) extends SparkUITab(sparkUI, "connect") {
  def detach(): Unit
  def displayOrder: Int
}
```

**Key Methods:**
- `detach()`: Remove the tab from the Spark UI
- `displayOrder`: Determines tab ordering in the UI

### SparkConnectServerPage

Main server monitoring page showing overall server status.

```scala { .api }
class SparkConnectServerPage(
    parent: SparkConnectServerTab,
    store: SparkConnectServerAppStatusStore
) extends WebUIPage("") {
  def render(request: HttpServletRequest): Seq[Node]
}
```

### SparkConnectServerSessionPage

Session-specific monitoring page with detailed session information.

```scala { .api }
class SparkConnectServerSessionPage(
    parent: SparkConnectServerTab,
    store: SparkConnectServerAppStatusStore
) extends WebUIPage("session") {
  def render(request: HttpServletRequest): Seq[Node]
}
```

## Event Listening and Data Collection

### SparkConnectServerListener

Event listener that collects data for the web UI.

```scala { .api }
class SparkConnectServerListener(
    store: SparkConnectServerAppStatusStore,
    sparkConf: SparkConf
) extends SparkListener {
  def onSessionStart(event: SparkConnectSessionStartEvent): Unit
  def onSessionEnd(event: SparkConnectSessionEndEvent): Unit
  def onExecutionStart(event: SparkConnectExecutionStartEvent): Unit
  def onExecutionEnd(event: SparkConnectExecutionEndEvent): Unit
}
```

**Event Handlers:**
- `onSessionStart`: Records new session creation
- `onSessionEnd`: Records session termination
- `onExecutionStart`: Tracks the start of a new execution
- `onExecutionEnd`: Records execution completion
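
Taken together, these handlers reduce the event stream to the "active" counters the UI displays. A minimal self-contained sketch of that bookkeeping (the event case classes here are simplified stand-ins, not the real Spark Connect event types):

```scala
// Simplified stand-ins for the Connect event types above (assumptions,
// not the real Spark classes).
sealed trait ConnectEvent
case class SessionStart(sessionId: String) extends ConnectEvent
case class SessionEnd(sessionId: String) extends ConnectEvent
case class ExecutionStart(executeId: String) extends ConnectEvent
case class ExecutionEnd(executeId: String) extends ConnectEvent

// Fold the event stream into (active sessions, active executions).
def activeCounts(events: Seq[ConnectEvent]): (Int, Int) =
  events.foldLeft((0, 0)) {
    case ((sessions, execs), _: SessionStart)   => (sessions + 1, execs)
    case ((sessions, execs), _: SessionEnd)     => (sessions - 1, execs)
    case ((sessions, execs), _: ExecutionStart) => (sessions, execs + 1)
    case ((sessions, execs), _: ExecutionEnd)   => (sessions, execs - 1)
  }

val events = Seq(
  SessionStart("s1"),
  ExecutionStart("e1"),
  ExecutionEnd("e1"),
  SessionStart("s2"),
  SessionEnd("s1")
)
// One session (s2) still active, no executions running.
assert(activeCounts(events) == (1, 0))
```
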

### SparkConnectServerAppStatusStore

Data store for UI information with configurable retention policies.

```scala { .api }
class SparkConnectServerAppStatusStore(
    sparkConf: SparkConf,
    store: ElementTrackingStore
) {
  def getSessionInfo(sessionId: String): Option[SessionInfo]
  def getAllSessions: Seq[SessionInfo]
  def getExecutionInfo(executeId: String): Option[ExecutionInfo]
  def getActiveExecutions: Seq[ExecutionInfo]
  def getServerMetrics: ServerMetrics
}
```

**Key Methods:**
- `getSessionInfo`: Get detailed information about a specific session
- `getAllSessions`: Get information about all sessions
- `getExecutionInfo`: Get details about a specific execution
- `getActiveExecutions`: Get all currently running executions
- `getServerMetrics`: Get overall server performance metrics

## History Server Integration

### SparkConnectServerHistoryServerPlugin

Plugin for Spark History Server integration.

```scala { .api }
class SparkConnectServerHistoryServerPlugin extends AppHistoryServerPlugin {
  def createApplicationInfo(info: ApplicationInfo): ConnectApplicationInfo
  def setupUI(ui: ApplicationHistoryUI): Unit
}
```

## Data Models

### Session Information

```scala { .api }
case class SessionInfo(
    sessionId: String,
    userId: String,
    startTime: Long,
    endTime: Option[Long],
    executionCount: Int,
    artifactCount: Int,
    status: SessionStatus
)

sealed trait SessionStatus
case object SessionActive extends SessionStatus
case object SessionIdle extends SessionStatus
case object SessionClosed extends SessionStatus
```
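
For illustration, a status summary like the one shown on the server page can be derived directly from this model. The sketch below re-declares a trimmed-down `SessionInfo` so it runs standalone:

```scala
// Mirrors the SessionStatus model above so the example is self-contained.
sealed trait SessionStatus
case object SessionActive extends SessionStatus
case object SessionIdle extends SessionStatus
case object SessionClosed extends SessionStatus

// Trimmed-down mirror of SessionInfo with just the fields we need here.
case class SessionInfo(sessionId: String, status: SessionStatus)

// Group sessions the way the overview page summarizes them.
def countByStatus(sessions: Seq[SessionInfo]): Map[SessionStatus, Int] =
  sessions.groupBy(_.status).map { case (k, v) => k -> v.size }

val sessions = Seq(
  SessionInfo("s1", SessionActive),
  SessionInfo("s2", SessionIdle),
  SessionInfo("s3", SessionActive)
)
assert(countByStatus(sessions) == Map(SessionActive -> 2, SessionIdle -> 1))
```
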

### Execution Information

```scala { .api }
case class ExecutionInfo(
    executeId: String,
    sessionId: String,
    userId: String,
    startTime: Long,
    endTime: Option[Long],
    status: ExecutionStatus,
    planType: String,
    metrics: ExecutionMetrics
)

sealed trait ExecutionStatus
case object ExecutionRunning extends ExecutionStatus
case object ExecutionCompleted extends ExecutionStatus
case object ExecutionFailed extends ExecutionStatus
case object ExecutionCancelled extends ExecutionStatus
```
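
Because `endTime` is `None` while an execution is still running, duration calculations need to fall back to the current time. A standalone sketch using a trimmed mirror of the model above:

```scala
sealed trait ExecutionStatus
case object ExecutionRunning extends ExecutionStatus
case object ExecutionCompleted extends ExecutionStatus

// Trimmed-down mirror of ExecutionInfo: endTime is None while running.
case class ExecutionInfo(executeId: String, startTime: Long,
                         endTime: Option[Long], status: ExecutionStatus)

// Wall-clock duration: use endTime if finished, otherwise measure against "now".
def durationMs(info: ExecutionInfo, now: Long): Long =
  info.endTime.getOrElse(now) - info.startTime

val done    = ExecutionInfo("e1", 1000L, Some(1500L), ExecutionCompleted)
val running = ExecutionInfo("e2", 1200L, None, ExecutionRunning)
assert(durationMs(done, now = 9999L) == 500L)
assert(durationMs(running, now = 2000L) == 800L)
```
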

### Server Metrics

```scala { .api }
case class ServerMetrics(
    uptime: Long,
    totalSessions: Long,
    activeSessions: Int,
    totalExecutions: Long,
    activeExecutions: Int,
    totalArtifacts: Long,
    memoryUsage: MemoryUsage,
    requestRate: Double
)

case class MemoryUsage(
    used: Long,
    committed: Long,
    max: Long
)
```
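
A typical derived figure is heap utilization as a percentage of `max`. A self-contained sketch using the `MemoryUsage` shape above:

```scala
// Mirrors the MemoryUsage model above.
case class MemoryUsage(used: Long, committed: Long, max: Long)

// Percentage of the maximum heap currently in use; guards against max == 0.
def usedPercent(m: MemoryUsage): Double =
  if (m.max <= 0) 0.0 else m.used.toDouble / m.max * 100

val usage = MemoryUsage(used = 256L * 1024 * 1024,
                        committed = 512L * 1024 * 1024,
                        max = 1024L * 1024 * 1024)
assert(usedPercent(usage) == 25.0)
```
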

## Usage Examples

### Setting Up UI Monitoring

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connect.ui.{SparkConnectServerTab, SparkConnectServerListener, SparkConnectServerAppStatusStore}

// Create Spark session with UI enabled
val spark = SparkSession.builder()
  .appName("MyConnectApp")
  .config("spark.ui.enabled", "true")
  .config("spark.ui.port", "4040")
  .getOrCreate()

// Set up Connect server UI components, following the constructor signatures
// above: the status store wraps an ElementTrackingStore (`elementStore` here),
// the listener takes the SparkConf, and the tab takes the store plus the
// server's SparkUI instance (`sparkUI` here; an internal API).
val sparkConf = spark.sparkContext.getConf
val store = new SparkConnectServerAppStatusStore(sparkConf, elementStore)
val listener = new SparkConnectServerListener(store, sparkConf)
val tab = new SparkConnectServerTab(store, sparkUI)

// Register listener to collect events
spark.sparkContext.addSparkListener(listener)

// UI is now available at http://localhost:4040/connect/
```

### Accessing Server Metrics

```scala
import org.apache.spark.sql.connect.ui.SparkConnectServerAppStatusStore

// Get server metrics
val metrics = store.getServerMetrics
println(s"Server uptime: ${metrics.uptime / 1000} seconds")
println(s"Active sessions: ${metrics.activeSessions}")
println(s"Active executions: ${metrics.activeExecutions}")
println(s"Memory usage: ${metrics.memoryUsage.used / 1024 / 1024} MB")
```

### Monitoring Sessions

```scala
import java.util.Date

// Get all sessions
val allSessions = store.getAllSessions
allSessions.foreach { session =>
  println(s"Session ${session.sessionId} (${session.userId}): ${session.status}")
  println(s"  Started: ${new Date(session.startTime)}")
  println(s"  Executions: ${session.executionCount}")
  println(s"  Artifacts: ${session.artifactCount}")
}

// Get specific session details
val sessionInfo = store.getSessionInfo("session123")
sessionInfo.foreach { info =>
  println(s"Session details: $info")
}
```

### Tracking Executions

```scala
// Get active executions
val activeExecutions = store.getActiveExecutions
println(s"Currently running ${activeExecutions.length} executions")

activeExecutions.foreach { execution =>
  val duration = System.currentTimeMillis() - execution.startTime
  println(s"Execution ${execution.executeId}: ${execution.planType} (${duration}ms)")
}

// Get execution history; endTime is only defined once the execution finished
val executionInfo = store.getExecutionInfo("exec456")
executionInfo.foreach { info =>
  info.endTime.foreach { end =>
    println(s"Execution completed in ${end - info.startTime}ms")
  }
  println(s"Status: ${info.status}")
  println(s"Metrics: ${info.metrics}")
}
```

## Web UI Features

### Server Overview Page

The main server page displays:

- **Server Status**: Uptime, version, configuration
- **Active Sessions**: Current session count and details
- **Execution Summary**: Running and completed execution statistics
- **Resource Usage**: Memory, CPU, and network metrics
- **Request Statistics**: Request rates and response times

### Session Detail Pages

Individual session pages show:

- **Session Information**: User, start time, duration, status
- **Execution History**: All executions within the session
- **Artifact Management**: Uploaded JARs and files
- **Configuration**: Session-specific Spark configuration
- **Streaming Queries**: Active streaming operations
- **Error History**: Any errors or exceptions encountered

### Execution Detail Pages

Execution detail pages include:

- **Plan Information**: Query plan type and complexity
- **Timing Metrics**: Start time, duration, stages
- **Data Metrics**: Rows processed, data size, partitions
- **Resource Usage**: CPU time, memory consumption
- **Error Details**: Stack traces and error context (if failed)

## Configuration Options

### UI Configuration

Key configuration parameters for monitoring:

```properties
# UI enablement and port
spark.ui.enabled=true
spark.ui.port=4040

# Connect-specific UI settings
spark.connect.ui.enabled=true
spark.connect.ui.retainedSessions=200
spark.connect.ui.retainedExecutions=1000
spark.connect.ui.retainedQueries=100

# History retention
spark.connect.ui.session.timeout=7d
spark.connect.ui.execution.timeout=1d
```

### Metrics Collection

```properties
# Event collection settings
spark.connect.ui.listener.enabled=true
spark.connect.ui.metrics.collection.interval=10s
spark.connect.ui.metrics.retention.period=24h

# Performance monitoring
spark.connect.ui.monitoring.detailed=true
spark.connect.ui.monitoring.query.plans=true
```

## Custom Monitoring

### Custom Event Listeners

`SparkListener` has no Connect-specific handler methods; custom Connect server events are delivered through the generic `onOtherEvent` hook, so a custom listener pattern-matches on the event types it cares about (the helper methods called below are placeholders for your own logic):

```scala
import org.apache.spark.internal.Logging
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.connect.ui.SparkConnectServerAppStatusStore

class CustomConnectListener(store: SparkConnectServerAppStatusStore)
    extends SparkListener with Logging {

  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: SparkConnectSessionStartEvent =>
      // Custom session start handling
      logInfo(s"New Connect session: ${e.sessionId}")
      // Update custom metrics
      incrementSessionCounter()
      // Send to external monitoring system
      sendToMetricsSystem(e)

    case e: SparkConnectExecutionEndEvent =>
      // Track execution patterns
      analyzeExecutionPattern(e)
      // Update performance metrics
      updatePerformanceStats(e)

    case _ => // Ignore unrelated events
  }
}
```

### External Metrics Integration

```scala
// Integration with external monitoring systems; `prometheusRegistry` and
// `statsd` stand in for whichever metrics client objects are in use.
class ExternalMetricsReporter(store: SparkConnectServerAppStatusStore) {
  def reportMetrics(): Unit = {
    val metrics = store.getServerMetrics

    // Send to Prometheus
    prometheusRegistry.gauge("spark_connect_active_sessions").set(metrics.activeSessions)
    prometheusRegistry.gauge("spark_connect_active_executions").set(metrics.activeExecutions)

    // Send to DataDog
    statsd.gauge("spark.connect.memory.used", metrics.memoryUsage.used)
    statsd.gauge("spark.connect.request.rate", metrics.requestRate)
  }
}
```

## Troubleshooting and Debugging

### Common UI Issues

- **UI Not Loading**: Check that `spark.ui.enabled=true` and the port is accessible
- **Missing Data**: Verify the listener is registered and event collection is enabled
- **Performance Issues**: Tune retention settings and collection intervals
- **Memory Usage**: Configure appropriate data retention limits

### Debug Information

The UI provides detailed debug information for:

- **Request Processing**: gRPC request/response details
- **Plan Conversion**: Protocol buffer to Catalyst plan conversion
- **Execution Stages**: Detailed execution phase timing
- **Error Context**: Full stack traces and error propagation
- **Resource Allocation**: Memory and CPU usage patterns

### Log Integration

The monitoring system integrates with Spark's logging:

```properties
# Log levels for debugging
spark.sql.connect.ui.logLevel=DEBUG
spark.sql.connect.listener.logLevel=INFO

# Custom log appenders for UI events
spark.sql.connect.ui.logAppender=UIEventAppender
```

## Performance Considerations

### Data Retention

- Configure appropriate retention limits to avoid memory issues
- Use time-based cleanup for old session and execution data
- Implement custom retention policies for different data types
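
A count-based retention policy like `spark.connect.ui.retainedSessions` amounts to keeping the newest N records and dropping the rest. A standalone sketch of that trimming step (the `SessionRecord` type is illustrative):

```scala
// Illustrative record type: just an id and when the session ended.
case class SessionRecord(sessionId: String, endTime: Long)

// Keep only the `retained` most recently ended sessions.
def trimRetained(records: Seq[SessionRecord], retained: Int): Seq[SessionRecord] =
  records.sortBy(-_.endTime).take(retained)

val old = Seq(SessionRecord("a", 100L), SessionRecord("b", 300L), SessionRecord("c", 200L))
// With a limit of 2, only the two newest sessions (b, c) survive.
assert(trimRetained(old, 2).map(_.sessionId) == Seq("b", "c"))
```
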

### Collection Overhead

- Monitor the performance impact of event collection
- Tune collection intervals based on monitoring requirements
- Consider sampling for high-volume environments
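
Sampling can be as simple as keeping every Nth event; a deterministic standalone sketch:

```scala
// Keep every Nth element (a simple deterministic 1-in-N sampler).
def sample[A](events: Seq[A], everyN: Int): Seq[A] =
  events.zipWithIndex.collect { case (e, i) if i % everyN == 0 => e }

// Indices 0, 3, 6, 9 survive a 1-in-3 sample of 1..10.
assert(sample((1 to 10).toSeq, 3) == Seq(1, 4, 7, 10))
```
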

### UI Responsiveness

- Optimize page rendering for large datasets
- Implement pagination for long lists
- Use asynchronous loading for detailed views
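
Pagination reduces to slicing the backing list per page. A standalone sketch with 1-based page numbers:

```scala
// Return one fixed-size page of a list; pageNum is 1-based.
def page[A](items: Seq[A], pageSize: Int, pageNum: Int): Seq[A] =
  items.slice((pageNum - 1) * pageSize, pageNum * pageSize)

val ids = (1 to 95).toSeq
assert(page(ids, 20, 1) == (1 to 20).toSeq)
// The final page is short: items 81..95.
assert(page(ids, 20, 5) == (81 to 95).toSeq)
```
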