
tessl/maven-org-apache-spark--spark-ganglia-lgpl_2-11

Ganglia integration module for Apache Spark metrics system

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.apache.spark/spark-ganglia-lgpl_2.11@2.4.x

To install, run:

npx @tessl/cli install tessl/maven-org-apache-spark--spark-ganglia-lgpl_2-11@2.4.0

# Spark Ganglia LGPL

Apache Spark metrics sink for integration with the Ganglia monitoring system. This module provides a metrics sink that reports Spark application performance metrics to a Ganglia cluster, enabling monitoring and visualization of Spark applications through existing Ganglia infrastructure.

## Package Information

- **Package Name**: spark-ganglia-lgpl_2.11
- **Package Type**: Maven
- **Language**: Scala
- **Group ID**: org.apache.spark
- **Artifact ID**: spark-ganglia-lgpl_2.11
- **Version**: 2.4.8
- **Installation**: Add the dependency to your project's pom.xml or include the jar on the Spark classpath

### Maven Dependency

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-ganglia-lgpl_2.11</artifactId>
  <version>2.4.8</version>
</dependency>
```
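For sbt builds, an equivalent dependency declaration (assuming the same artifact and version) would be:

```scala
// Equivalent sbt dependency; the full Scala-suffixed artifact name is used
// with %, so this does not depend on the project's scalaVersion setting
libraryDependencies += "org.apache.spark" % "spark-ganglia-lgpl_2.11" % "2.4.8"
```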

## Core Imports

```scala
import org.apache.spark.metrics.sink.GangliaSink
```

Required for configuration:

```scala
import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.{SecurityManager, SparkConf}
```

## Basic Usage

```scala
import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.metrics.sink.GangliaSink
import org.apache.spark.{SecurityManager, SparkConf}

// Set up configuration properties for the Ganglia cluster connection
val properties = new Properties()
properties.setProperty("host", "ganglia.example.com")
properties.setProperty("port", "8649")
properties.setProperty("period", "10")
properties.setProperty("unit", "SECONDS")
properties.setProperty("mode", "MULTICAST")
properties.setProperty("ttl", "1")

// Create the metrics registry and security manager
val registry = new MetricRegistry()
val sparkConf = new SparkConf()
val securityMgr = new SecurityManager(sparkConf)

// Create and start the Ganglia sink; it then reports metrics
// to Ganglia automatically on the configured period
val gangliaSink = new GangliaSink(properties, registry, securityMgr)
gangliaSink.start()

// Stop reporting when done
gangliaSink.stop()
```

## Configuration

The GangliaSink requires the following configuration properties to connect to a Ganglia cluster:

### Required Properties

- **host**: Ganglia server hostname or IP address
- **port**: Ganglia server port number (typically 8649)

### Optional Properties

- **period**: Reporting interval (default: 10)
- **unit**: Time unit for the reporting interval (default: SECONDS)
- **mode**: UDP addressing mode, MULTICAST or UNICAST (default: MULTICAST)
- **ttl**: Time-to-live for multicast messages (default: 1)
- **dmax**: Dmax parameter passed to Ganglia (default: 0)
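In a typical deployment, Spark instantiates the sink itself from conf/metrics.properties rather than from application code. A sketch of such a configuration, using the property names above (the hostname is a placeholder):

```properties
# Enable the Ganglia sink for all Spark instances (driver, executors, master, worker)
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=ganglia.example.com
*.sink.ganglia.port=8649
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
*.sink.ganglia.mode=multicast
*.sink.ganglia.ttl=1
```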

## Capabilities

### Metrics Sink

Core metrics sink functionality for reporting Spark metrics to the Ganglia monitoring system.

```scala { .api }
class GangliaSink(
  val property: Properties,
  val registry: MetricRegistry,
  securityMgr: SecurityManager
) extends Sink
```

### Lifecycle Management

Control the reporting lifecycle of the Ganglia sink.

```scala { .api }
def start(): Unit
def stop(): Unit
def report(): Unit
```

### Configuration Utility

Helper method for safely accessing configuration properties.

```scala { .api }
def propertyToOption(prop: String): Option[String]
```
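Properties.getProperty returns null for a missing key; propertyToOption presumably wraps that result in an Option. A minimal standalone sketch of the same idea (the two-argument form is hypothetical; the real method reads the sink's own property field):

```scala
import java.util.Properties

// Hypothetical standalone variant: Option(null) is None, so a missing
// key becomes None instead of a null String
def propertyToOption(props: Properties, prop: String): Option[String] =
  Option(props.getProperty(prop))

val props = new Properties()
props.setProperty("host", "ganglia.example.com")

propertyToOption(props, "host")  // Some(ganglia.example.com)
propertyToOption(props, "port")  // None
```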

### Configuration Constants

Predefined configuration keys and default values for Ganglia sink setup.

```scala { .api }
val GANGLIA_KEY_PERIOD: String
val GANGLIA_DEFAULT_PERIOD: Int
val GANGLIA_KEY_UNIT: String
val GANGLIA_DEFAULT_UNIT: TimeUnit
val GANGLIA_KEY_MODE: String
val GANGLIA_DEFAULT_MODE: UDPAddressingMode
val GANGLIA_KEY_TTL: String
val GANGLIA_DEFAULT_TTL: Int
val GANGLIA_KEY_HOST: String
val GANGLIA_KEY_PORT: String
val GANGLIA_KEY_DMAX: String
val GANGLIA_DEFAULT_DMAX: Int
```

### Runtime Configuration Properties

Configuration values extracted from the properties during initialization.

```scala { .api }
val host: String
val port: Int
val ttl: Int
val dmax: Int
val mode: UDPAddressingMode
val pollPeriod: Int
val pollUnit: TimeUnit
```

### Internal Components

Core Ganglia integration components managed by the sink.

```scala { .api }
val ganglia: GMetric
val reporter: GangliaReporter
```
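These presumably correspond to a gmetric4j UDP channel and a Dropwizard Ganglia reporter. A sketch of how such components are typically wired together using the public metrics-ganglia API (the hostname and settings are placeholders; this is not the sink's actual source):

```scala
import java.util.concurrent.TimeUnit
import com.codahale.metrics.MetricRegistry
import com.codahale.metrics.ganglia.GangliaReporter
import info.ganglia.gmetric4j.gmetric.GMetric
import info.ganglia.gmetric4j.gmetric.GMetric.UDPAddressingMode

val registry = new MetricRegistry()

// UDP channel to the Ganglia gmond daemon
val ganglia = new GMetric("ganglia.example.com", 8649, UDPAddressingMode.MULTICAST, 1)

// Reporter that periodically publishes every metric in the registry over that channel
val reporter = GangliaReporter.forRegistry(registry)
  .convertRatesTo(TimeUnit.SECONDS)
  .convertDurationsTo(TimeUnit.MILLISECONDS)
  .build(ganglia)

reporter.start(10, TimeUnit.SECONDS)
```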

## Types

### From Spark Core

```scala { .api }
private[spark] trait Sink {
  def start(): Unit
  def stop(): Unit
  def report(): Unit
}

class SecurityManager(sparkConf: SparkConf)

class SparkConf {
  def this()
}
```

### From Dropwizard Metrics

```scala { .api }
class MetricRegistry

class GangliaReporter {
  def start(period: Long, unit: TimeUnit): Unit
  def stop(): Unit
  def report(): Unit
}
```

### From gmetric4j

```scala { .api }
class GMetric(host: String, port: Int, mode: UDPAddressingMode, ttl: Int)

enum UDPAddressingMode {
  MULTICAST, UNICAST
}
```

### From Java Standard Library

```scala { .api }
class Properties {
  def getProperty(key: String): String
  def setProperty(key: String, value: String): Object
}

enum TimeUnit {
  NANOSECONDS, MICROSECONDS, MILLISECONDS, SECONDS, MINUTES, HOURS, DAYS
}
```

## Error Handling

The GangliaSink throws exceptions during initialization for invalid configuration:

- **Exception**: thrown if the 'host' property is not provided
- **Exception**: thrown if the 'port' property is not provided
- **Validation**: the polling period is checked with `MetricsSystem.checkMinimalPollingPeriod` to enforce a minimum acceptable reporting interval
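A sketch of the fail-fast pattern described above (a hypothetical helper; the real checks live in the sink's constructor):

```scala
import java.util.Properties

// Hypothetical helper mirroring the described behavior: throw during
// initialization when a required property is absent
def requireProperty(props: Properties, key: String): String =
  Option(props.getProperty(key)).getOrElse(
    throw new Exception(s"Ganglia sink requires '$key' property.")
  )
```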

## Integration Notes

- This module is distributed separately from core Spark because its Ganglia dependencies carry LGPL licensing requirements
- Include this module only when LGPL licensing terms are acceptable for your deployment
- The sink integrates with Spark's existing metrics system and automatically reports the configured metrics
- Both multicast and unicast modes are supported, for different network topologies
- The TTL setting matters in multicast mode: it must be at least the number of network hops needed to reach all listeners