
tessl/maven-org-apache-spark--spark-ganglia-lgpl-2-11

Ganglia integration module for Apache Spark metrics system

Spark Ganglia LGPL

An Apache Spark metrics sink for integration with the Ganglia monitoring system. This module provides a metrics sink that reports Spark application performance metrics to Ganglia clusters, enabling comprehensive monitoring and visualization of Spark applications through existing Ganglia infrastructure.

Package Information

  • Package Name: spark-ganglia-lgpl_2.11
  • Package Type: Maven
  • Language: Scala
  • Group ID: org.apache.spark
  • Artifact ID: spark-ganglia-lgpl_2.11
  • Version: 2.4.8
  • Installation: Add the dependency to your project's pom.xml, or include the artifact on the Spark classpath

Maven Dependency

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-ganglia-lgpl_2.11</artifactId>
  <version>2.4.8</version>
</dependency>
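If your build uses sbt rather than Maven, the equivalent dependency declaration (assuming the same 2.4.8 version) is:

```scala
libraryDependencies += "org.apache.spark" % "spark-ganglia-lgpl_2.11" % "2.4.8"
```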

Core Imports

import org.apache.spark.metrics.sink.GangliaSink

Required for configuration:

import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.{SecurityManager, SparkConf}

Basic Usage

import org.apache.spark.metrics.sink.GangliaSink
import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.{SecurityManager, SparkConf}

// Set up configuration properties
val properties = new Properties()
properties.setProperty("host", "ganglia.example.com")
properties.setProperty("port", "8649")
properties.setProperty("period", "10")
properties.setProperty("unit", "SECONDS")
properties.setProperty("mode", "MULTICAST")
properties.setProperty("ttl", "1")

// Create metrics registry and security manager
val registry = new MetricRegistry()
val sparkConf = new SparkConf()
val securityMgr = new SecurityManager(sparkConf)

// Create and start the Ganglia sink
val gangliaSink = new GangliaSink(properties, registry, securityMgr)
gangliaSink.start()

// The sink will now automatically report metrics to Ganglia
// Stop when done
gangliaSink.stop()

Configuration

The GangliaSink requires specific configuration properties to connect to a Ganglia cluster:

Required Properties

  • host: Ganglia server hostname or IP address
  • port: Ganglia server port number (typically 8649)

Optional Properties

  • period: Reporting interval (default: 10)
  • unit: Time unit for reporting interval (default: SECONDS)
  • mode: UDP addressing mode - MULTICAST or UNICAST (default: MULTICAST)
  • ttl: Time-to-live for multicast messages (default: 1)
  • dmax: Dmax parameter for Ganglia (default: 0)
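In a typical deployment the sink is not constructed by hand; it is enabled through Spark's metrics configuration file (conf/metrics.properties). A minimal sketch using the properties above, assuming a Ganglia gmond reachable at ganglia.example.com:8649:

```properties
# Enable GangliaSink for all Spark instances (driver, executors, master, worker)
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=ganglia.example.com
*.sink.ganglia.port=8649
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
*.sink.ganglia.mode=multicast
# For a unicast gmond, set mode=unicast instead
*.sink.ganglia.ttl=1
```

Spark's metrics system reads this file at startup and instantiates the sink with these properties, so no application code changes are required.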

Capabilities

Metrics Sink

Core metrics sink functionality for reporting Spark metrics to Ganglia monitoring system.

class GangliaSink(
  val property: Properties,
  val registry: MetricRegistry,
  securityMgr: SecurityManager
) extends Sink

Lifecycle Management

Control the reporting lifecycle of the Ganglia sink.

def start(): Unit
def stop(): Unit
def report(): Unit

Configuration Utility

Helper method for accessing configuration properties safely.

def propertyToOption(prop: String): Option[String]
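The helper wraps java.util.Properties.getProperty, which returns null for missing keys, in an Option. A minimal, self-contained sketch of the same pattern (the object and parameter names below are illustrative, not the sink's actual internals):

```scala
import java.util.Properties

object PropertyLookup {
  // Wrap a nullable Properties lookup in an Option for safe access
  def propertyToOption(properties: Properties, prop: String): Option[String] =
    Option(properties.getProperty(prop))

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("host", "ganglia.example.com")

    println(propertyToOption(props, "host")) // Some(ganglia.example.com)
    println(propertyToOption(props, "port")) // None
  }
}
```

This lets callers use Option combinators (getOrElse, map) instead of null checks when reading optional sink settings.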

Configuration Constants

Predefined configuration keys and default values for Ganglia sink setup.

val GANGLIA_KEY_PERIOD: String
val GANGLIA_DEFAULT_PERIOD: Int
val GANGLIA_KEY_UNIT: String
val GANGLIA_DEFAULT_UNIT: TimeUnit
val GANGLIA_KEY_MODE: String
val GANGLIA_DEFAULT_MODE: UDPAddressingMode
val GANGLIA_KEY_TTL: String
val GANGLIA_DEFAULT_TTL: Int
val GANGLIA_KEY_HOST: String
val GANGLIA_KEY_PORT: String
val GANGLIA_KEY_DMAX: String
val GANGLIA_DEFAULT_DMAX: Int

Runtime Configuration Properties

Configuration values extracted from properties during initialization.

val host: String
val port: Int
val ttl: Int
val dmax: Int
val mode: UDPAddressingMode
val pollPeriod: Int
val pollUnit: TimeUnit

Internal Components

Core Ganglia integration components managed by the sink.

val ganglia: GMetric
val reporter: GangliaReporter

Types

From Spark Core

private[spark] trait Sink {
  def start(): Unit
  def stop(): Unit
  def report(): Unit
}

class SecurityManager(sparkConf: SparkConf)

class SparkConf {
  def this()
}

From Dropwizard Metrics

class MetricRegistry

class GangliaReporter {
  def start(period: Long, unit: TimeUnit): Unit
  def stop(): Unit
  def report(): Unit
}

From gmetric4j

class GMetric(host: String, port: Int, mode: UDPAddressingMode, ttl: Int)

enum UDPAddressingMode {
  MULTICAST, UNICAST
}

From Java Standard Library

class Properties {
  def getProperty(key: String): String
  def setProperty(key: String, value: String): Object
}

enum TimeUnit {
  NANOSECONDS, MICROSECONDS, MILLISECONDS, SECONDS, MINUTES, HOURS, DAYS
}

Error Handling

The GangliaSink throws exceptions during initialization for invalid configuration:

  • Exception: Thrown if the 'host' property is not provided
  • Exception: Thrown if the 'port' property is not provided
  • Validation: Polling period is validated using MetricsSystem.checkMinimalPollingPeriod to ensure minimum acceptable reporting intervals
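The fail-fast validation pattern can be illustrated with plain Properties handling (a sketch of the behavior, not the sink's actual source; the helper name is hypothetical):

```scala
import java.util.Properties

object SinkConfigValidation {
  // Mirror the sink's fail-fast checks: required keys must be present
  def requireProperty(props: Properties, key: String): String =
    Option(props.getProperty(key)).getOrElse(
      throw new Exception(s"Ganglia sink requires '$key' property.")
    )

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("port", "8649")

    try {
      requireProperty(props, "host") // missing key -> throws
    } catch {
      case e: Exception => println(e.getMessage)
    }
  }
}
```

Because validation happens during initialization, a misconfigured sink fails at application startup rather than silently dropping metrics at runtime.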

Integration Notes

  • This module is distributed separately from core Spark due to LGPL licensing requirements from Ganglia dependencies
  • Include this module only when LGPL licensing terms are acceptable for your deployment
  • The sink integrates with Spark's existing metrics system and automatically reports configured metrics
  • Supports both multicast and unicast modes for different network configurations
  • TTL setting is important for multicast mode - must be at least the number of network hops to reach listeners