Spark Ganglia LGPL

Apache Spark metrics sink for the Ganglia monitoring system. This module provides a metrics sink that reports Spark application performance metrics to a Ganglia cluster, enabling monitoring and visualization of Spark applications through existing Ganglia infrastructure.

Package Information

  • Package Name: spark-ganglia-lgpl_2.11
  • Package Type: Maven
  • Language: Scala
  • Group ID: org.apache.spark
  • Artifact ID: spark-ganglia-lgpl_2.11
  • Version: 2.4.8
  • Installation: Add the dependency to your project's pom.xml or include the jar on the Spark classpath

Maven Dependency

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-ganglia-lgpl_2.11</artifactId>
  <version>2.4.8</version>
</dependency>
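
If your build uses sbt instead of Maven, the equivalent dependency declaration (matching the version above) would be:

libraryDependencies += "org.apache.spark" % "spark-ganglia-lgpl_2.11" % "2.4.8"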

Core Imports

import org.apache.spark.metrics.sink.GangliaSink

Required for configuration:

import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.{SecurityManager, SparkConf}

Basic Usage

import org.apache.spark.metrics.sink.GangliaSink
import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.{SecurityManager, SparkConf}

// Set up configuration properties
val properties = new Properties()
properties.setProperty("host", "ganglia.example.com")
properties.setProperty("port", "8649")
properties.setProperty("period", "10")
properties.setProperty("unit", "SECONDS")
properties.setProperty("mode", "MULTICAST")
properties.setProperty("ttl", "1")

// Create metrics registry and security manager
val registry = new MetricRegistry()
val sparkConf = new SparkConf()
val securityMgr = new SecurityManager(sparkConf)

// Create and start the Ganglia sink
val gangliaSink = new GangliaSink(properties, registry, securityMgr)
gangliaSink.start()

// The sink will now automatically report metrics to Ganglia
// Stop when done
gangliaSink.stop()

Configuration

The GangliaSink requires specific configuration properties to connect to a Ganglia cluster:

Required Properties

  • host: Ganglia server hostname or IP address
  • port: Ganglia server port number (typically 8649)

Optional Properties

  • period: Reporting interval (default: 10)
  • unit: Time unit for reporting interval (default: SECONDS)
  • mode: UDP addressing mode - MULTICAST or UNICAST (default: MULTICAST)
  • ttl: Time-to-live for multicast messages (default: 1)
  • dmax: Ganglia dmax value, the number of seconds a metric is retained after its last update; 0 means retain indefinitely (default: 0)
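
In a typical deployment these properties are not set in application code; Spark's metrics system instantiates the sink from conf/metrics.properties. A minimal sketch of such a configuration, using the same placeholder host as the Basic Usage example:

# conf/metrics.properties: enable the Ganglia sink for all Spark instances
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=ganglia.example.com
*.sink.ganglia.port=8649
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
*.sink.ganglia.mode=multicast
*.sink.ganglia.ttl=1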

Capabilities

Metrics Sink

Core metrics sink functionality for reporting Spark metrics to the Ganglia monitoring system.

class GangliaSink(
  val property: Properties,
  val registry: MetricRegistry,
  securityMgr: SecurityManager
) extends Sink

Lifecycle Management

Control the reporting lifecycle of the Ganglia sink.

def start(): Unit
def stop(): Unit
def report(): Unit
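
A minimal sketch of the lifecycle, assuming the gangliaSink instance from the Basic Usage example above:

// start() begins periodic reporting at the configured period
gangliaSink.start()
try {
  // ... application workload runs; metrics are reported in the background ...
  // report() forces one immediate report, e.g. just before shutdown
  gangliaSink.report()
} finally {
  // stop() shuts down the underlying reporter
  gangliaSink.stop()
}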

Configuration Utility

Helper method for accessing configuration properties safely.

def propertyToOption(prop: String): Option[String]
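
As an illustration, required properties can be resolved through this helper so that a missing value fails fast; a sketch of that pattern (the exception message is illustrative):

// Resolve a required property or fail fast during initialization
val host: String = propertyToOption("host")
  .getOrElse(throw new Exception("Ganglia sink requires 'host' property."))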

Configuration Constants

Predefined configuration keys and default values for Ganglia sink setup.

val GANGLIA_KEY_PERIOD: String
val GANGLIA_DEFAULT_PERIOD: Int
val GANGLIA_KEY_UNIT: String
val GANGLIA_DEFAULT_UNIT: TimeUnit
val GANGLIA_KEY_MODE: String
val GANGLIA_DEFAULT_MODE: UDPAddressingMode
val GANGLIA_KEY_TTL: String
val GANGLIA_DEFAULT_TTL: Int
val GANGLIA_KEY_HOST: String
val GANGLIA_KEY_PORT: String
val GANGLIA_KEY_DMAX: String
val GANGLIA_DEFAULT_DMAX: Int
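
These constants are typically combined with propertyToOption so that optional settings fall back to their defaults; a sketch of how pollPeriod and pollUnit could be derived:

import java.util.concurrent.TimeUnit

// Optional settings fall back to the predefined defaults
val pollPeriod: Int = propertyToOption(GANGLIA_KEY_PERIOD)
  .map(_.toInt).getOrElse(GANGLIA_DEFAULT_PERIOD)
val pollUnit: TimeUnit = propertyToOption(GANGLIA_KEY_UNIT)
  .map(u => TimeUnit.valueOf(u.toUpperCase)).getOrElse(GANGLIA_DEFAULT_UNIT)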

Runtime Configuration Properties

Configuration values extracted from properties during initialization.

val host: String
val port: Int
val ttl: Int
val dmax: Int
val mode: UDPAddressingMode
val pollPeriod: Int
val pollUnit: TimeUnit

Internal Components

Core Ganglia integration components managed by the sink.

val ganglia: GMetric
val reporter: GangliaReporter
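
A sketch of how these components are presumably wired, following the standard gmetric4j and Dropwizard builder pattern (the exact builder calls are assumptions, not taken from this package's source):

import java.util.concurrent.TimeUnit
import com.codahale.metrics.ganglia.GangliaReporter
import info.ganglia.gmetric4j.gmetric.GMetric

// Connect to the Ganglia cluster and attach a reporter to the registry
val ganglia = new GMetric(host, port, mode, ttl)
val reporter = GangliaReporter.forRegistry(registry)
  .convertRatesTo(TimeUnit.SECONDS)
  .convertDurationsTo(TimeUnit.MILLISECONDS)
  .build(ganglia)
reporter.start(pollPeriod, pollUnit)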

Types

From Spark Core

private[spark] trait Sink {
  def start(): Unit
  def stop(): Unit
  def report(): Unit
}

class SecurityManager(sparkConf: SparkConf)

class SparkConf {
  def this()
}

From Dropwizard Metrics

class MetricRegistry

class GangliaReporter {
  def start(period: Long, unit: TimeUnit): Unit
  def stop(): Unit
  def report(): Unit
}

From gmetric4j

class GMetric(host: String, port: Int, mode: UDPAddressingMode, ttl: Int)

enum UDPAddressingMode {
  MULTICAST, UNICAST
}

From Java Standard Library

class Properties {
  def getProperty(key: String): String
  def setProperty(key: String, value: String): Object
}

enum TimeUnit {
  NANOSECONDS, MICROSECONDS, MILLISECONDS, SECONDS, MINUTES, HOURS, DAYS
}

Error Handling

The GangliaSink throws exceptions during initialization for invalid configuration:

  • Exception: Thrown if the 'host' property is not provided
  • Exception: Thrown if the 'port' property is not provided
  • Validation: Polling period is validated using MetricsSystem.checkMinimalPollingPeriod to ensure minimum acceptable reporting intervals
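
A hedged sketch of guarding against these initialization failures, reusing the values from the Basic Usage example:

import scala.util.{Failure, Success, Try}

// Construction fails if 'host' or 'port' is missing or the period is too small
Try(new GangliaSink(properties, registry, securityMgr)) match {
  case Success(sink) => sink.start()
  case Failure(e)    => System.err.println(s"Invalid Ganglia sink configuration: ${e.getMessage}")
}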

Integration Notes

  • This module is distributed separately from core Spark due to LGPL licensing requirements from Ganglia dependencies
  • Include this module only when LGPL licensing terms are acceptable for your deployment
  • The sink integrates with Spark's existing metrics system and automatically reports configured metrics
  • Supports both multicast and unicast modes for different network configurations
  • The TTL setting matters in multicast mode: it must be at least the number of network hops required for packets to reach all listeners