tessl/maven-org-bdgenomics-adam--adam-cli-spark2_2-10

Command line interface for ADAM, a library and command line tool that enables the use of Apache Spark to parallelize genomic data analysis across cluster/cloud computing environments

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.bdgenomics.adam/adam-cli-spark2_2.10@0.23.x

To install, run

npx @tessl/cli install tessl/maven-org-bdgenomics-adam--adam-cli-spark2_2-10@0.23.0


ADAM CLI

ADAM CLI is a command-line interface for genomic data analysis using Apache Spark. It provides distributed processing capabilities for various genomic file formats including SAM/BAM/CRAM, BED/GFF3/GTF, VCF, and FASTA/FASTQ, with optimized Parquet columnar storage for improved performance and scalability.

Package Information

  • Package Name: adam-cli-spark2_2.10
  • Package Type: maven
  • Language: Scala (with Java support)
  • Group ID: org.bdgenomics.adam
  • Version: 0.23.0
  • Installation:
    <dependency>
      <groupId>org.bdgenomics.adam</groupId>
      <artifactId>adam-cli-spark2_2.10</artifactId>
      <version>0.23.0</version>
    </dependency>
    Or download a precompiled distribution from the project's GitHub releases page.

Core Usage

ADAM CLI is executed through the adam-submit script, which wraps Spark submission:

# Basic command structure
adam-submit [<spark-args> --] <command> [<command-args>]

# Example: Transform BAM to ADAM format
adam-submit transformAlignments input.bam output.adam

# Example with Spark arguments (local[4] is quoted so the shell does not treat the brackets as a glob)
adam-submit --master 'local[4]' --driver-memory 8g -- transformAlignments input.bam output.adam
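The `[<spark-args> --] <command> [<command-args>]` convention can be illustrated with a short sketch. This is not the adam-submit script itself (which is a shell wrapper); it is a minimal Scala illustration that assumes arguments are split at the first `--`, and that everything is treated as command arguments when no separator is present. The `SplitArgs` object and its `split` method are hypothetical names introduced for this example.

```scala
object SplitArgs {
  /** Splits a raw argument list at the first "--" separator.
    * Returns (sparkArgs, commandArgs). Without a separator, every
    * argument is treated as a command argument.
    */
  def split(args: Seq[String]): (Seq[String], Seq[String]) = {
    val idx = args.indexOf("--")
    if (idx < 0) (Seq.empty, args)
    else (args.take(idx), args.drop(idx + 1))
  }

  def main(argv: Array[String]): Unit = {
    val (spark, cmd) = split(Seq(
      "--master", "local[4]", "--driver-memory", "8g", "--",
      "transformAlignments", "input.bam", "output.adam"))
    println(s"spark args:   ${spark.mkString(" ")}")
    println(s"command args: ${cmd.mkString(" ")}")
  }
}
```

Under this assumption, the options before `--` go to Spark and the command name plus its own arguments come after it.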

Architecture

ADAM CLI is organized around several key architectural components:

  • Command System: Modular command structure with 15 specialized tools organized into 3 functional groups
  • Spark Integration: Built-in Apache Spark integration for distributed processing across clusters
  • Format Support: Comprehensive support for genomic file formats with intelligent format detection
  • Parquet Optimization: Columnar storage format for improved query performance and compression
  • Streaming Processing: Ability to process large datasets that exceed single-node memory capacity

Main Entry Point

object ADAMMain {
  def main(args: Array[String]): Unit
  val defaultCommandGroups: List[CommandGroup]
}

class ADAMMain @Inject() (commandGroups: List[CommandGroup]) extends Logging {
  def apply(args: Array[String]): Unit
}

case class CommandGroup(name: String, commands: List[BDGCommandCompanion])
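To show how a list of command groups could drive dispatch, here is a minimal sketch using simplified stand-in types. `DemoCommand`, `DemoCompanion`, `DemoGroup`, `DemoFlagStat` and `Dispatcher` are hypothetical names; they mirror the shape of the signatures above but are not the real ADAM classes, and the actual lookup logic in `ADAMMain` may differ.

```scala
// Simplified stand-ins for the signatures above (assumptions, not the real API).
trait DemoCommand { def run(): Unit }

trait DemoCompanion {
  val commandName: String
  val commandDescription: String
  def apply(cmdLine: Array[String]): DemoCommand
}

case class DemoGroup(name: String, commands: List[DemoCompanion])

// A toy command that only echoes its arguments.
object DemoFlagStat extends DemoCompanion {
  val commandName = "flagstat"
  val commandDescription = "Print statistics on reads in an ADAM file"
  def apply(cmdLine: Array[String]): DemoCommand = new DemoCommand {
    def run(): Unit = println(s"flagstat ${cmdLine.mkString(" ")}")
  }
}

object Dispatcher {
  /** Finds the companion whose commandName matches, searching groups in order. */
  def find(groups: List[DemoGroup], name: String): Option[DemoCompanion] =
    groups.flatMap(_.commands).find(_.commandName == name)

  /** Renders a usage listing, one section per group. */
  def usage(groups: List[DemoGroup]): String =
    groups.map { g =>
      val lines = g.commands.map(c => f"  ${c.commandName}%-22s ${c.commandDescription}")
      (g.name +: lines).mkString("\n")
    }.mkString("\n\n")

  /** Runs the named command; prints usage and returns false when it is unknown. */
  def dispatch(groups: List[DemoGroup], args: Array[String]): Boolean =
    args.headOption.flatMap(name => find(groups, name)) match {
      case Some(companion) =>
        companion(args.drop(1)).run()
        true
      case None =>
        println(usage(groups))
        false
    }
}
```

With `val groups = List(DemoGroup("DEMO ACTIONS", List(DemoFlagStat)))`, calling `Dispatcher.dispatch(groups, Array("flagstat", "sample.adam"))` runs the toy command, while an unknown name prints the usage listing.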

Capabilities

Genomic Data Processing

Core genomic data analysis operations including k-mer counting, coverage analysis, alignment transformations, and multi-format data processing.

// K-mer analysis
object CountReadKmers extends BDGCommandCompanion {
  val commandName = "countKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}
object CountContigKmers extends BDGCommandCompanion {
  val commandName = "countContigKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}

// Coverage analysis  
object Reads2Coverage extends BDGCommandCompanion {
  val commandName = "reads2coverage"
  val commandDescription = "Calculate the coverage from a given ADAM file"
}

// Data transformations
object TransformAlignments extends BDGCommandCompanion {
  val commandName = "transformAlignments"
  val commandDescription = "Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations"
}
object TransformFeatures extends BDGCommandCompanion {
  val commandName = "transformFeatures"
  val commandDescription = "Convert a file with sequence features into corresponding ADAM format and vice versa"
}
object TransformGenotypes extends BDGCommandCompanion {
  val commandName = "transformGenotypes"
  val commandDescription = "Convert a file with genotypes into corresponding ADAM format and vice versa"
}
object TransformVariants extends BDGCommandCompanion {
  val commandName = "transformVariants"
  val commandDescription = "Convert a file with variants into corresponding ADAM format and vice versa"
}
object TransformFragments extends BDGCommandCompanion {
  val commandName = "transformFragments"
  val commandDescription = "Convert alignment records into fragment records"
}

// Utilities
object MergeShards extends BDGCommandCompanion {
  val commandName = "mergeShards"
  val commandDescription = "Merges the shards of a file"
}

See: Genomic Data Processing (genomic-processing.md)

Format Conversion

Comprehensive format conversion utilities for transforming between various genomic file formats and ADAM's optimized Parquet format.

// FASTA conversions
object Fasta2ADAM extends BDGCommandCompanion {
  val commandName = "fasta2adam"
  val commandDescription = "Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences"
}
object ADAM2Fasta extends BDGCommandCompanion {
  val commandName = "adam2fasta"
  val commandDescription = "Convert ADAM nucleotide contig fragments to FASTA files"
}

// FASTQ conversions
object ADAM2Fastq extends BDGCommandCompanion {
  val commandName = "adam2fastq"
  val commandDescription = "Convert BAM to FASTQ files"
}

See: Format Conversion (format-conversion.md)

Data Inspection and Analysis

Tools for viewing, analyzing, and generating statistics from genomic datasets, providing samtools-like functionality with distributed processing capabilities.

// Data viewing and filtering
object View extends BDGCommandCompanion {
  val commandName = "view"
  val commandDescription = "View certain reads from an alignment-record file."
}
object PrintADAM extends BDGCommandCompanion {
  val commandName = "print" 
  val commandDescription = "Print an ADAM formatted file"
}

// Statistics and analysis
object FlagStat extends BDGCommandCompanion {
  val commandName = "flagstat"
  val commandDescription = "Print statistics on reads in an ADAM file (similar to samtools flagstat)"
}

See: Data Inspection (data-inspection.md)

Common Types and Patterns

Command Pattern

All ADAM CLI commands follow a consistent architectural pattern:

// Command companion object
trait BDGCommandCompanion {
  val commandName: String
  val commandDescription: String
  def apply(cmdLine: Array[String]): BDGCommand
}

// Command arguments base class
class Args4jBase extends Logging with Serializable {
  @Args4jOption(required = false, name = "-print_metrics", usage = "Print metrics to the log on completion")
  var printMetrics = false
}

// Common argument mixins
trait ParquetArgs {
  @Args4jOption(required = false, name = "-parquet_compression_codec", usage = "Parquet compression codec")
  var compressionCodec: String = "GZIP"
  
  @Args4jOption(required = false, name = "-parquet_block_size", usage = "Parquet block size (default: 128mb)")
  var blockSize: Int = 128 * 1024 * 1024
  
  @Args4jOption(required = false, name = "-parquet_page_size", usage = "Parquet page size (default: 1mb)")
  var pageSize: Int = 1024 * 1024
}

trait ParquetSaveArgs extends ParquetArgs {
  @Args4jOption(required = false, name = "-parquet_disable_dictionary", usage = "Disable dictionary encoding")
  var disableDictionaryEncoding = false
}

trait ADAMSaveAnyArgs {
  @Args4jOption(required = false, name = "-single", usage = "Save as single file")
  var asSingleFile = false
  
  @Args4jOption(required = false, name = "-defer_merging", usage = "Defer merging single file")
  var deferMerging = false
  
  @Args4jOption(required = false, name = "-disable_fast_concat", usage = "Disable fast concatenation")
  var disableFastConcat = false
}

// Command execution
abstract class BDGSparkCommand[T <: Args4jBase] extends BDGCommand {
  val companion: BDGCommandCompanion
  protected val args: T
  def run(sc: SparkContext): Unit
}
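The argument mixins above compose through ordinary Scala trait stacking. The following sketch shows the idea with stand-in traits that reuse the field names and defaults listed above but omit the args4j annotations, so it compiles with the standard library alone. `DemoParquetArgs`, `DemoSaveArgs`, `DemoTransformArgs` and `DemoArgs` are hypothetical names, and the `inputPath`/`outputPath` fields are assumptions added for illustration.

```scala
// Stand-in mixins mirroring the field names above (annotations omitted).
trait DemoParquetArgs {
  var compressionCodec: String = "GZIP"
  var blockSize: Int = 128 * 1024 * 1024 // 134217728 bytes
  var pageSize: Int = 1024 * 1024        // 1048576 bytes
}

trait DemoSaveArgs {
  var asSingleFile: Boolean = false
  var deferMerging: Boolean = false
}

// Hypothetical argument class for an imaginary command: it inherits the
// defaults of both mixins and adds its own fields.
class DemoTransformArgs extends DemoParquetArgs with DemoSaveArgs {
  var inputPath: String = null
  var outputPath: String = null
}

object DemoArgs {
  /** Describes the effective settings, as a command might log them before running. */
  def describe(a: DemoTransformArgs): String = {
    val blockMb = a.blockSize / (1024 * 1024)
    s"${a.inputPath} -> ${a.outputPath} codec=${a.compressionCodec} " +
      s"block=${blockMb}MB single=${a.asSingleFile}"
  }

  def main(argv: Array[String]): Unit = {
    val args = new DemoTransformArgs
    args.inputPath = "input.bam"
    args.outputPath = "output.adam"
    args.asSingleFile = true
    println(describe(args))
  }
}
```

Each mixin owns one concern (Parquet tuning, output layout), so a command's argument class gets its defaults by listing the traits it needs.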

Validation Stringency

// Validation levels for input parsing
type ValidationStringency = htsjdk.samtools.ValidationStringency
// Values: STRICT, LENIENT, SILENT
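In htsjdk, STRICT throws on invalid input, LENIENT emits a warning and continues, and SILENT continues without a warning. The sketch below illustrates that behavior with a local stand-in for the enum so it has no dependencies; `Stringency`, `Validation` and `handle` are hypothetical names, and how each ADAM command applies the level is not shown here.

```scala
// Local stand-in for htsjdk.samtools.ValidationStringency (an assumption
// made to keep the sketch dependency-free).
sealed trait Stringency
case object Strict extends Stringency
case object Lenient extends Stringency
case object Silent extends Stringency

object Validation {
  /** Applies a stringency level to a list of validation problems.
    * Returns the warnings that would be logged; throws under Strict
    * when at least one problem is present.
    */
  def handle(problems: Seq[String], level: Stringency): Seq[String] = level match {
    case Strict if problems.nonEmpty =>
      throw new IllegalArgumentException(problems.mkString("; "))
    case Lenient =>
      problems.map(p => s"WARN: $p")
    case _ =>
      Seq.empty // Silent, or Strict with nothing to report
  }
}
```

For example, `Validation.handle(Seq("bad CIGAR"), Lenient)` yields a single warning, whereas the same call with `Strict` throws.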

Common Arguments

Most commands support these common arguments:

  • Input/Output Paths: File system paths for source and destination data
  • Partitioning: Control over data partitioning for performance optimization
  • Validation: Stringency levels for input data validation
  • Storage: Spark storage levels for intermediate data caching
  • Format Options: Parquet-specific configuration options

Version Information

class About {
  def artifactId(): String
  def buildTimestamp(): String  
  def commit(): String
  def hadoopVersion(): String
  def scalaVersion(): String
  def sparkVersion(): String
  def version(): String
  def isSnapshot(): Boolean
}

Error Handling

ADAM CLI commands signal success or failure through the process exit code and report errors as follows:

  • Exit Code 0: Successful execution
  • Exit Code 1: General errors (invalid arguments, file not found, etc.)
  • Spark Exceptions: Distributed processing errors with full stack traces
  • Validation Errors: Input data validation failures with detailed reports
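The exit-code convention listed above can be sketched as a wrapper that maps a command's outcome to 0 or 1. This is an illustration only, not ADAM's actual error handling; `ExitCodes` and `exitCodeFor` are hypothetical names, and the sketch assumes every non-fatal exception maps to exit code 1.

```scala
import scala.util.{Failure, Success, Try}

object ExitCodes {
  /** Runs a command body and maps its outcome to an exit code:
    * 0 on success, 1 on any non-fatal exception (reported on stderr).
    */
  def exitCodeFor(body: => Unit): Int = Try(body) match {
    case Success(_) => 0
    case Failure(e) =>
      Console.err.println(s"error: ${e.getMessage}")
      1
  }

  def main(args: Array[String]): Unit = {
    val code = exitCodeFor {
      if (args.isEmpty) throw new IllegalArgumentException("no command given")
      println(s"running ${args.mkString(" ")}")
    }
    sys.exit(code)
  }
}
```

A calling script can then branch on the exit status in the usual way, treating any nonzero value as a failure.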