Ganglia integration module for Apache Spark metrics system
npx @tessl/cli install tessl/maven-org-apache-spark--spark-ganglia-lgpl_2-11@2.4.00
# Spark Ganglia LGPL

Apache Spark metrics sink for the Ganglia monitoring system. This module provides a metrics sink that reports Spark application performance metrics to Ganglia clusters, enabling monitoring and visualization of Spark applications through existing Ganglia infrastructure.

## Package Information

- **Package Name**: spark-ganglia-lgpl_2.11
- **Package Type**: Maven
- **Language**: Scala
- **Group ID**: org.apache.spark
- **Artifact ID**: spark-ganglia-lgpl_2.11
- **Version**: 2.4.8
- **Installation**: Add the dependency to your project's pom.xml or include the JAR on the Spark classpath

### Maven Dependency

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-ganglia-lgpl_2.11</artifactId>
  <version>2.4.8</version>
</dependency>
```

## Core Imports

```scala
import org.apache.spark.metrics.sink.GangliaSink
```

Required for configuration:

```scala
import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.{SecurityManager, SparkConf}
```

## Basic Usage

```scala
import org.apache.spark.metrics.sink.GangliaSink
import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.{SecurityManager, SparkConf}

// Set up configuration properties
val properties = new Properties()
properties.setProperty("host", "ganglia.example.com")
properties.setProperty("port", "8649")
properties.setProperty("period", "10")
properties.setProperty("unit", "SECONDS")
properties.setProperty("mode", "MULTICAST")
properties.setProperty("ttl", "1")

// Create metrics registry and security manager
val registry = new MetricRegistry()
val sparkConf = new SparkConf()
val securityMgr = new SecurityManager(sparkConf)

// Create and start the Ganglia sink
val gangliaSink = new GangliaSink(properties, registry, securityMgr)
gangliaSink.start()

// The sink will now automatically report metrics to Ganglia
// Stop when done
gangliaSink.stop()
```

## Configuration

The GangliaSink requires the following configuration properties to connect to a Ganglia cluster:

### Required Properties

- **host**: Ganglia server hostname or IP address
- **port**: Ganglia server port number (typically 8649)

### Optional Properties

- **period**: Reporting interval (default: 10)
- **unit**: Time unit for the reporting interval (default: SECONDS)
- **mode**: UDP addressing mode, MULTICAST or UNICAST (default: MULTICAST)
- **ttl**: Time-to-live for multicast packets (default: 1)
- **dmax**: Ganglia dmax parameter, the lifetime in seconds before a stale metric is expired; 0 means never expire (default: 0)

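In a real deployment these properties are usually not passed programmatically; Spark reads them from `conf/metrics.properties`, with each key prefixed by the instance and sink name. A typical configuration might look like this (the hostname is a placeholder):

```properties
# Register the Ganglia sink for all Spark instances (driver, executors, ...)
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=ganglia.example.com
*.sink.ganglia.port=8649
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
*.sink.ganglia.mode=multicast
*.sink.ganglia.ttl=1
```

The `*.` prefix applies the sink to every Spark component; replacing it with `driver.` or `executor.` restricts reporting to that component only.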
## Capabilities

### Metrics Sink

Core metrics sink functionality for reporting Spark metrics to the Ganglia monitoring system.

```scala { .api }
class GangliaSink(
  val property: Properties,
  val registry: MetricRegistry,
  securityMgr: SecurityManager
) extends Sink
```

### Lifecycle Management

Control the reporting lifecycle of the Ganglia sink.

```scala { .api }
def start(): Unit
def stop(): Unit
def report(): Unit
```

### Configuration Utility

Helper method for accessing configuration properties safely.

```scala { .api }
def propertyToOption(prop: String): Option[String]
```

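Based on the signature, this helper wraps `Properties.getProperty` in an `Option`. A minimal standalone sketch of the equivalent logic (the explicit `props` parameter is added here for self-containment; the real method reads the sink's own `property` field):

```scala
import java.util.Properties

val props = new Properties()
props.setProperty("host", "ganglia.example.com")

// Option(...) converts the null that Properties.getProperty returns
// for a missing key into None
def propertyToOption(props: Properties)(prop: String): Option[String] =
  Option(props.getProperty(prop))

propertyToOption(props)("host")  // Some("ganglia.example.com")
propertyToOption(props)("port")  // None
```

This pattern is why missing optional keys can fall back cleanly to defaults via `getOrElse`, while missing required keys can be detected and rejected.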
### Configuration Constants

Predefined configuration keys and default values for Ganglia sink setup.

```scala { .api }
val GANGLIA_KEY_PERIOD: String
val GANGLIA_DEFAULT_PERIOD: Int
val GANGLIA_KEY_UNIT: String
val GANGLIA_DEFAULT_UNIT: TimeUnit
val GANGLIA_KEY_MODE: String
val GANGLIA_DEFAULT_MODE: UDPAddressingMode
val GANGLIA_KEY_TTL: String
val GANGLIA_DEFAULT_TTL: Int
val GANGLIA_KEY_HOST: String
val GANGLIA_KEY_PORT: String
val GANGLIA_KEY_DMAX: String
val GANGLIA_DEFAULT_DMAX: Int
```

### Runtime Configuration Properties

Configuration values extracted from the properties during initialization.

```scala { .api }
val host: String
val port: Int
val ttl: Int
val dmax: Int
val mode: UDPAddressingMode
val pollPeriod: Int
val pollUnit: TimeUnit
```

### Internal Components

Core Ganglia integration components managed by the sink.

```scala { .api }
val ganglia: GMetric
val reporter: GangliaReporter
```

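For context, these two components are wired together roughly as follows. This is a sketch of the pattern using the public Dropwizard `metrics-ganglia` and gmetric4j APIs, not the module's actual source; the hostname and settings are placeholders:

```scala
import java.util.concurrent.TimeUnit
import com.codahale.metrics.MetricRegistry
import com.codahale.metrics.ganglia.GangliaReporter
import info.ganglia.gmetric4j.gmetric.GMetric
import info.ganglia.gmetric4j.gmetric.GMetric.UDPAddressingMode

val registry = new MetricRegistry()

// GMetric is the gmetric4j connection to the Ganglia daemon (gmond)
val ganglia = new GMetric("ganglia.example.com", 8649, UDPAddressingMode.MULTICAST, 1)

// GangliaReporter is the Dropwizard reporter that periodically pushes
// everything in the registry over that connection
val reporter = GangliaReporter.forRegistry(registry).build(ganglia)

reporter.start(10, TimeUnit.SECONDS)  // what GangliaSink.start() delegates to
reporter.stop()
```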
## Types

### From Spark Core

```scala { .api }
private[spark] trait Sink {
  def start(): Unit
  def stop(): Unit
  def report(): Unit
}

class SecurityManager(sparkConf: SparkConf)

class SparkConf {
  def this()
}
```

### From Dropwizard Metrics

```scala { .api }
class MetricRegistry

class GangliaReporter {
  def start(period: Long, unit: TimeUnit): Unit
  def stop(): Unit
  def report(): Unit
}
```

### From gmetric4j

```scala { .api }
class GMetric(host: String, port: Int, mode: UDPAddressingMode, ttl: Int)

// Java enum, nested as GMetric.UDPAddressingMode in gmetric4j
enum UDPAddressingMode {
  MULTICAST, UNICAST
}
```

### From Java Standard Library

```scala { .api }
class Properties {
  def getProperty(key: String): String
  def setProperty(key: String, value: String): Object
}

enum TimeUnit {
  NANOSECONDS, MICROSECONDS, MILLISECONDS, SECONDS, MINUTES, HOURS, DAYS
}
```

## Error Handling

The GangliaSink throws exceptions during initialization when the configuration is invalid:

- **Exception**: thrown if the 'host' property is not provided
- **Exception**: thrown if the 'port' property is not provided
- **Validation**: the polling period is checked with `MetricsSystem.checkMinimalPollingPeriod` to enforce a minimum acceptable reporting interval

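Callers that construct the sink programmatically can fail fast with clearer messages by validating the required keys up front. A hypothetical sketch (`requireKey` is not part of the module):

```scala
import java.util.Properties

val props = new Properties()
props.setProperty("port", "8649")

// Hypothetical helper: fail with a descriptive message when a
// required sink property is missing
def requireKey(props: Properties, key: String): String =
  Option(props.getProperty(key)).getOrElse(
    throw new IllegalArgumentException(s"Ganglia sink requires '$key' property")
  )

val port = requireKey(props, "port").toInt  // 8649
// requireKey(props, "host") would throw IllegalArgumentException here,
// since 'host' was never set
```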
## Integration Notes

- This module is distributed separately from core Spark because its Ganglia dependencies are LGPL-licensed
- Include this module only when LGPL licensing terms are acceptable for your deployment
- The sink plugs into Spark's existing metrics system and automatically reports the registered metrics
- Both multicast and unicast modes are supported, for different network configurations
- The TTL setting matters in multicast mode: it must be at least the number of network hops between the reporting node and the Ganglia listeners
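
Because the module ships separately, it must be added to the application's classpath explicitly; one common way is the `--packages` flag of `spark-submit` (`your-app.jar` and the metrics file path are placeholders):

```shell
spark-submit \
  --packages org.apache.spark:spark-ganglia-lgpl_2.11:2.4.8 \
  --conf spark.metrics.conf=conf/metrics.properties \
  your-app.jar
```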