tessl/maven-org-bdgenomics-adam--adam-cli-spark2-2-10

Command line interface for ADAM, a library and command line tool that enables the use of Apache Spark to parallelize genomic data analysis across cluster/cloud computing environments

ADAM CLI

ADAM CLI is a command-line interface for genomic data analysis on Apache Spark. It provides distributed processing for genomic file formats including SAM/BAM/CRAM, BED/GFF3/GTF, VCF, and FASTA/FASTQ, and can store data as Parquet, a columnar format that compresses well and lets queries read only the columns they need.

Package Information

  • Package Name: adam-cli-spark2_2.10
  • Package Type: maven
  • Language: Scala (with Java support)
  • Group ID: org.bdgenomics.adam
  • Version: 0.23.0
  • Installation:
    <dependency>
      <groupId>org.bdgenomics.adam</groupId>
      <artifactId>adam-cli-spark2_2.10</artifactId>
      <version>0.23.0</version>
    </dependency>
    Or download a precompiled distribution from the GitHub releases page

Core Usage

ADAM CLI is run through the adam-submit script, which wraps spark-submit. Spark arguments, if any, come first and are separated from the ADAM command by a double dash:

# Basic command structure
adam-submit [<spark-args> --] <command> [<command-args>]

# Example: Transform BAM to ADAM format
adam-submit transformAlignments input.bam output.adam

# Example with Spark arguments
# (the master URL is quoted so the shell does not treat the brackets as a glob pattern)
adam-submit --master 'local[4]' --driver-memory 8g -- transformAlignments input.bam output.adam
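
The double-dash convention above can be illustrated with a small sketch. This is not the adam-submit script itself, which is a shell wrapper; `SubmitArgs` and `split` are hypothetical names, and the sketch assumes that everything before the first `--` belongs to Spark and everything after it to ADAM.

```scala
object SubmitArgs {
  // Splits a command line at the first "--".
  // Returns (sparkArgs, adamArgs); without a "--", every argument is an ADAM argument.
  def split(args: Seq[String]): (Seq[String], Seq[String]) = {
    val idx = args.indexOf("--")
    if (idx < 0) (Seq.empty, args)
    else (args.take(idx), args.drop(idx + 1))
  }

  def main(argv: Array[String]): Unit = {
    val (spark, adam) = split(
      Seq("--master", "local[4]", "--", "transformAlignments", "input.bam", "output.adam"))
    println(s"spark: ${spark.mkString(" ")}")
    println(s"adam:  ${adam.mkString(" ")}")
  }
}
```

Because only the first `--` is treated as the separator, a later `--` would be passed through to the ADAM command unchanged.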

Architecture

ADAM CLI is organized around several key architectural components:

  • Command System: Modular command structure with 15 specialized tools organized into 3 functional groups
  • Spark Integration: Built-in Apache Spark integration for distributed processing across clusters
  • Format Support: Comprehensive support for genomic file formats with intelligent format detection
  • Parquet Optimization: Columnar storage format for improved query performance and compression
  • Streaming Processing: Ability to process large datasets that exceed single-node memory capacity

Main Entry Point

object ADAMMain {
  def main(args: Array[String]): Unit
  val defaultCommandGroups: List[CommandGroup]
}

class ADAMMain @Inject() (commandGroups: List[CommandGroup]) extends Logging {
  def apply(args: Array[String]): Unit
}

case class CommandGroup(name: String, commands: List[BDGCommandCompanion])
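
To show how a list of command groups can drive command lookup and a usage listing, here is a minimal sketch. `Companion`, `Group`, and `Simple` are simplified stand-ins written for this example, not the real ADAM types, and the group names are assumptions taken from the section headings of this page.

```scala
object DispatchSketch {
  // Simplified stand-ins for BDGCommandCompanion and CommandGroup (hypothetical, for illustration).
  trait Companion {
    def commandName: String
    def commandDescription: String
  }
  case class Simple(commandName: String, commandDescription: String) extends Companion
  case class Group(name: String, commands: List[Companion])

  // Group names here are illustrative assumptions.
  val groups: List[Group] = List(
    Group("Genomic Data Processing", List(
      Simple("countKmers", "Counts the k-mers/q-mers from a read dataset."),
      Simple("transformAlignments", "Convert SAM/BAM to ADAM format"))),
    Group("Data Inspection", List(
      Simple("flagstat", "Print statistics on reads in an ADAM file")))
  )

  // Finds a command by name across all groups.
  def find(name: String): Option[Companion] =
    groups.flatMap(_.commands).find(_.commandName == name)

  // Renders a usage listing with one section per group.
  def usage: String =
    groups.map { g =>
      val lines = g.commands.map(c => "  " + c.commandName.padTo(22, ' ') + c.commandDescription)
      (g.name +: lines).mkString("\n")
    }.mkString("\n\n")

  def main(args: Array[String]): Unit =
    args.headOption.flatMap(find) match {
      case Some(cmd) => println(s"Would run: ${cmd.commandName}")
      case None      => println(usage)
    }
}
```

Keeping the groups as plain data means the same list serves both purposes: resolving the first argument to a command and printing help when nothing matches.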

Capabilities

Genomic Data Processing

Core genomic data analysis operations including k-mer counting, coverage analysis, alignment transformations, and multi-format data processing.

// K-mer analysis
object CountReadKmers extends BDGCommandCompanion {
  val commandName = "countKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}
object CountContigKmers extends BDGCommandCompanion {
  val commandName = "countContigKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}

// Coverage analysis  
object Reads2Coverage extends BDGCommandCompanion {
  val commandName = "reads2coverage"
  val commandDescription = "Calculate the coverage from a given ADAM file"
}

// Data transformations
object TransformAlignments extends BDGCommandCompanion {
  val commandName = "transformAlignments"
  val commandDescription = "Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations"
}
object TransformFeatures extends BDGCommandCompanion {
  val commandName = "transformFeatures"
  val commandDescription = "Convert a file with sequence features into corresponding ADAM format and vice versa"
}
object TransformGenotypes extends BDGCommandCompanion {
  val commandName = "transformGenotypes"
  val commandDescription = "Convert a file with genotypes into corresponding ADAM format and vice versa"
}
object TransformVariants extends BDGCommandCompanion {
  val commandName = "transformVariants"
  val commandDescription = "Convert a file with variants into corresponding ADAM format and vice versa"
}
object TransformFragments extends BDGCommandCompanion {
  val commandName = "transformFragments"
  val commandDescription = "Convert alignment records into fragment records"
}

// Utilities
object MergeShards extends BDGCommandCompanion {
  val commandName = "mergeShards"
  val commandDescription = "Merges the shards of a file"
}
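
As an illustration of what k-mer counting computes, the sketch below counts k-mers over a handful of reads with plain Scala collections. It is not ADAM's implementation, which runs on Spark; `KmerSketch` and `countKmers` are hypothetical names, and reads are assumed to be simple strings.

```scala
object KmerSketch {
  // Counts every length-k substring across all reads.
  // Reads shorter than k are dropped first, because sliding(k) would
  // otherwise yield a single window shorter than k.
  def countKmers(reads: Seq[String], k: Int): Map[String, Long] = {
    require(k > 0, "k must be positive")
    reads
      .filter(_.length >= k)
      .flatMap(_.sliding(k))
      .groupBy(identity)
      .map { case (kmer, hits) => kmer -> hits.size.toLong }
  }

  def main(args: Array[String]): Unit =
    countKmers(Seq("ACGTAC", "ACGT"), 3).toSeq.sortBy(_._1).foreach {
      case (kmer, n) => println(s"$kmer\t$n")
    }
}
```

The same shape (emit k-mers, group by k-mer, count) maps naturally onto a distributed flatMap followed by a keyed reduction.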

Format Conversion

Comprehensive format conversion utilities for transforming between various genomic file formats and ADAM's optimized Parquet format.

// FASTA conversions
object Fasta2ADAM extends BDGCommandCompanion {
  val commandName = "fasta2adam"
  val commandDescription = "Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences"
}
object ADAM2Fasta extends BDGCommandCompanion {
  val commandName = "adam2fasta"
  val commandDescription = "Convert ADAM nucleotide contig fragments to FASTA files"
}

// FASTQ conversions
object ADAM2Fastq extends BDGCommandCompanion {
  val commandName = "adam2fastq"
  val commandDescription = "Convert BAM to FASTQ files"
}
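
The text side of a FASTA conversion can be sketched as follows. This covers only reading and writing FASTA records; it does not touch Parquet or ADAM's record schemas. `FastaSketch`, `Contig`, `parse`, and `render` are hypothetical names introduced for this example.

```scala
object FastaSketch {
  case class Contig(name: String, sequence: String)

  // Parses FASTA text: a '>' line starts a record, and the lines that
  // follow are concatenated into that record's sequence.
  def parse(text: String): List[Contig] = {
    val lines = text.split("\n").map(_.trim).filter(_.nonEmpty)
    lines.foldLeft(List.empty[Contig]) { (acc, line) =>
      if (line.startsWith(">")) Contig(line.drop(1), "") :: acc
      else acc match {
        case head :: tail => head.copy(sequence = head.sequence + line) :: tail
        case Nil          => acc // sequence data before any header is ignored in this sketch
      }
    }.reverse
  }

  // Writes records back out, wrapping sequence lines at the given width.
  def render(contigs: List[Contig], width: Int = 60): String =
    contigs.map { c =>
      (s">${c.name}" +: c.sequence.grouped(width).toList).mkString("\n")
    }.mkString("\n")

  def main(args: Array[String]): Unit = {
    val contigs = parse(">chr1 test\nACGT\nAC\n>chr2\nGG")
    println(render(contigs, width = 4))
  }
}
```

Note that line wrapping is a presentation detail: parsing discards it, so a round trip preserves names and sequences but not necessarily the original line breaks.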

Data Inspection and Analysis

Tools for viewing, analyzing, and generating statistics from genomic datasets, providing samtools-like functionality with distributed processing capabilities.

// Data viewing and filtering
object View extends BDGCommandCompanion {
  val commandName = "view"
  val commandDescription = "View certain reads from an alignment-record file."
}
object PrintADAM extends BDGCommandCompanion {
  val commandName = "print" 
  val commandDescription = "Print an ADAM formatted file"
}

// Statistics and analysis
object FlagStat extends BDGCommandCompanion {
  val commandName = "flagstat"
  val commandDescription = "Print statistics on reads in an ADAM file (similar to samtools flagstat)"
}
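
To make the flagstat-style statistics concrete, the sketch below tallies a few categories from SAM flag values. The bit values come from the SAM specification; the `FlagStatSketch` object, the `Stats` fields, and the choice of categories are assumptions for illustration and do not reproduce ADAM's full report.

```scala
object FlagStatSketch {
  // Flag bits defined by the SAM specification.
  val Paired = 0x1
  val ProperPair = 0x2
  val Unmapped = 0x4
  val Secondary = 0x100
  val Duplicate = 0x400

  case class Stats(total: Long, mapped: Long, paired: Long,
                   properlyPaired: Long, secondary: Long, duplicates: Long)

  private def has(flag: Int, bit: Int): Boolean = (flag & bit) != 0

  // Tallies each category by testing the corresponding bit on every flag.
  def summarize(flags: Seq[Int]): Stats = Stats(
    total = flags.size.toLong,
    mapped = flags.count(f => !has(f, Unmapped)).toLong,
    paired = flags.count(has(_, Paired)).toLong,
    properlyPaired = flags.count(has(_, ProperPair)).toLong,
    secondary = flags.count(has(_, Secondary)).toLong,
    duplicates = flags.count(has(_, Duplicate)).toLong
  )

  def main(args: Array[String]): Unit =
    println(summarize(Seq(99, 147, 4, 1123, 355)))
}
```

Each category is an independent bit test, so a single read can count toward several totals at once, for example a duplicate that is also properly paired.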

Common Types and Patterns

Command Pattern

All ADAM CLI commands follow a consistent architectural pattern:

// Command companion object
trait BDGCommandCompanion {
  val commandName: String
  val commandDescription: String
  def apply(cmdLine: Array[String]): BDGCommand
}

// Command arguments base class
class Args4jBase extends Logging with Serializable {
  @Args4jOption(required = false, name = "-print_metrics", usage = "Print metrics to the log on completion")
  var printMetrics = false
}

// Common argument mixins
trait ParquetArgs {
  @Args4jOption(required = false, name = "-parquet_compression", usage = "Parquet compression codec")
  var compressionCodec: String = "GZIP"
  
  @Args4jOption(required = false, name = "-parquet_block_size", usage = "Parquet block size (default: 128mb)")
  var blockSize: Int = 128 * 1024 * 1024
  
  @Args4jOption(required = false, name = "-parquet_page_size", usage = "Parquet page size (default: 1mb)")
  var pageSize: Int = 1024 * 1024
}

trait ParquetSaveArgs extends ParquetArgs {
  @Args4jOption(required = false, name = "-disable_dictionary", usage = "Disable dictionary encoding")
  var disableDictionaryEncoding = false
}

trait ADAMSaveAnyArgs {
  @Args4jOption(required = false, name = "-single", usage = "Save as single file")
  var asSingleFile = false
  
  @Args4jOption(required = false, name = "-defer", usage = "Defer merging single file")
  var deferMerging = false
  
  @Args4jOption(required = false, name = "-disable_fast_concat", usage = "Disable fast concatenation")
  var disableFastConcat = false
}

// Command execution
abstract class BDGSparkCommand[T <: Args4jBase] extends BDGCommand {
  val companion: BDGCommandCompanion
  def run(sc: SparkContext): Unit
}

Validation Stringency

// Validation levels for input parsing
type ValidationStringency = htsjdk.samtools.ValidationStringency
// Values: STRICT, LENIENT, SILENT
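
The three levels differ in how malformed input is handled: STRICT fails, LENIENT warns and continues, SILENT continues without warning. The sketch below models that behaviour with a local stand-in type so that it has no dependencies; `StringencySketch`, `parseAll`, and the integer "records" are hypothetical and only illustrate the control flow.

```scala
object StringencySketch {
  // Local stand-in for htsjdk.samtools.ValidationStringency (illustrative only).
  sealed trait Stringency
  case object Strict extends Stringency
  case object Lenient extends Stringency
  case object Silent extends Stringency

  // Parses records as integers, handling malformed ones according to the stringency.
  // Returns the parsed values together with any warnings that were emitted.
  def parseAll(records: Seq[String], stringency: Stringency): (Seq[Int], Seq[String]) = {
    val warnings = scala.collection.mutable.ListBuffer.empty[String]
    val parsed = records.flatMap { r =>
      scala.util.Try(r.trim.toInt).toOption.orElse {
        stringency match {
          case Strict  => throw new IllegalArgumentException(s"Malformed record: $r")
          case Lenient => warnings += s"Skipping malformed record: $r"; None
          case Silent  => None
        }
      }
    }
    (parsed, warnings.toList)
  }

  def main(args: Array[String]): Unit = {
    println(parseAll(Seq("1", "x", "3"), Lenient))
    println(parseAll(Seq("1", "x", "3"), Silent))
  }
}
```

STRICT is the safest choice when input quality is unknown, since the other two levels can silently reduce the number of records that reach downstream steps.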

Common Arguments

Most commands support these common arguments:

  • Input/Output Paths: File system paths for source and destination data
  • Partitioning: Control over data partitioning for performance optimization
  • Validation: Stringency levels for input data validation
  • Storage: Spark storage levels for intermediate data caching
  • Format Options: Parquet-specific configuration options

Version Information

class About {
  def artifactId(): String
  def buildTimestamp(): String  
  def commit(): String
  def hadoopVersion(): String
  def scalaVersion(): String
  def sparkVersion(): String
  def version(): String
  def isSnapshot(): Boolean
}

Error Handling

ADAM CLI commands report success or failure through the process exit code and write error details to the console:

  • Exit Code 0: Successful execution
  • Exit Code 1: General errors (invalid arguments, file not found, etc.)
  • Spark Exceptions: Distributed processing errors with full stack traces
  • Validation Errors: Input data validation failures with detailed reports
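
The exit-code convention listed above can be sketched as a small wrapper. This is an illustration of the 0/1 mapping only, not ADAM's actual entry point; `ExitCodeSketch` and `exitCodeFor` are hypothetical names.

```scala
import scala.util.{Failure, Success, Try}

object ExitCodeSketch {
  // Runs a command body and maps the outcome to an exit code:
  // 0 on success, 1 on any non-fatal exception (the message goes to stderr).
  def exitCodeFor(body: => Unit): Int =
    Try(body) match {
      case Success(_) => 0
      case Failure(e) =>
        Console.err.println(s"Error: ${e.getMessage}")
        1
    }

  def main(args: Array[String]): Unit = {
    // A real entry point would pass the result to sys.exit; here it is only printed.
    println(exitCodeFor(println("command ran")))
    println(exitCodeFor(throw new IllegalArgumentException("file not found: input.bam")))
  }
}
```

A calling shell script can then branch on the exit status instead of parsing log output.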