Command line interface for ADAM, a library and command line tool that enables the use of Apache Spark to parallelize genomic data analysis across cluster/cloud computing environments
npx @tessl/cli install tessl/maven-org-bdgenomics-adam--adam-cli-spark2_2-10@0.23.0

ADAM CLI is a command-line interface for genomic data analysis using Apache Spark. It provides distributed processing capabilities for various genomic file formats including SAM/BAM/CRAM, BED/GFF3/GTF, VCF, and FASTA/FASTQ, with optimized Parquet columnar storage for improved performance and scalability.
<dependency>
  <groupId>org.bdgenomics.adam</groupId>
  <artifactId>adam-cli-spark2_2.10</artifactId>
  <version>0.23.0</version>
</dependency>

ADAM CLI is executed through the adam-submit script, which wraps Spark submission:
# Basic command structure
adam-submit [<spark-args> --] <command> [<command-args>]
# Example: Transform BAM to ADAM format
adam-submit transformAlignments input.bam output.adam
# Example with Spark arguments
adam-submit --master local[4] --driver-memory 8g -- transformAlignments input.bam output.adam

ADAM CLI is organized around several key architectural components:
object ADAMMain {
  def main(args: Array[String]): Unit
  val defaultCommandGroups: List[CommandGroup]
}

class ADAMMain @Inject() (commandGroups: List[CommandGroup]) extends Logging {
  def apply(args: Array[String]): Unit
}

case class CommandGroup(name: String, commands: List[BDGCommandCompanion])

Core genomic data analysis operations including k-mer counting, coverage analysis, alignment transformations, and multi-format data processing.
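Each of these operations is launched through the `adam-submit [<spark-args> --] <command> [<command-args>]` structure shown earlier. As a minimal sketch, the hypothetical helper `adam_cmd` below (not part of ADAM) only prints the command line it would run, inserting the `--` separator when Spark arguments are present. Passing the Spark arguments as one quoted string is an assumption made for brevity.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of ADAM): prints an adam-submit
# command line instead of executing it.
#   $1     Spark arguments as a single string (may be empty)
#   $2...  ADAM command name followed by its arguments
adam_cmd() {
  spark_args=$1
  shift
  if [ -n "$spark_args" ]; then
    # Spark arguments come first and are terminated by "--"
    printf '%s\n' "adam-submit $spark_args -- $*"
  else
    printf '%s\n' "adam-submit $*"
  fi
}

# Without Spark arguments
adam_cmd "" transformAlignments input.bam output.adam
# With Spark arguments
adam_cmd "--master local[4] --driver-memory 8g" transformAlignments input.bam output.adam
```

Printing rather than executing keeps the sketch runnable on a machine without ADAM or Spark installed; replacing `printf` with an actual invocation is left to the caller.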
// K-mer analysis
object CountReadKmers extends BDGCommandCompanion {
  val commandName = "countKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}

object CountContigKmers extends BDGCommandCompanion {
  val commandName = "countContigKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}

// Coverage analysis
object Reads2Coverage extends BDGCommandCompanion {
  val commandName = "reads2coverage"
  val commandDescription = "Calculate the coverage from a given ADAM file"
}

// Data transformations
object TransformAlignments extends BDGCommandCompanion {
  val commandName = "transformAlignments"
  val commandDescription = "Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations"
}

object TransformFeatures extends BDGCommandCompanion {
  val commandName = "transformFeatures"
  val commandDescription = "Convert a file with sequence features into corresponding ADAM format and vice versa"
}

object TransformGenotypes extends BDGCommandCompanion {
  val commandName = "transformGenotypes"
  val commandDescription = "Convert a file with genotypes into corresponding ADAM format and vice versa"
}

object TransformVariants extends BDGCommandCompanion {
  val commandName = "transformVariants"
  val commandDescription = "Convert a file with variants into corresponding ADAM format and vice versa"
}

object TransformFragments extends BDGCommandCompanion {
  val commandName = "transformFragments"
  val commandDescription = "Convert alignment records into fragment records"
}

// Utilities
object MergeShards extends BDGCommandCompanion {
  val commandName = "mergeShards"
  val commandDescription = "Merges the shards of a file"
}

Comprehensive format conversion utilities for transforming between various genomic file formats and ADAM's optimized Parquet format.
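As a rough illustration of how these conversions line up with input formats, the hypothetical helper `adam_command_for` below (not part of ADAM) suggests a command name from a file extension. The command names come from this document; the extension-to-command mapping is an assumption inferred from the command descriptions, and VCF input could equally be routed to `transformGenotypes`.

```shell
#!/bin/sh
# Hypothetical helper (not part of ADAM): suggest an ADAM CLI command
# for an input file based on its extension.
# The mapping is an assumption drawn from the command descriptions.
adam_command_for() {
  case "$1" in
    *.sam|*.bam)        echo "transformAlignments" ;;
    *.fa|*.fasta)       echo "fasta2adam" ;;
    *.bed|*.gff3|*.gtf) echo "transformFeatures" ;;
    *.vcf)              echo "transformVariants" ;;
    *)
      # Unrecognized extension: report on stderr and signal failure
      echo "no suggestion for: $1" >&2
      return 1
      ;;
  esac
}

adam_command_for sample.bam
adam_command_for genes.gtf
```

A lookup keyed on the extension is only a convenience; the input format accepted by each command is determined by ADAM itself, not by this table.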
// FASTA conversions
object Fasta2ADAM extends BDGCommandCompanion {
  val commandName = "fasta2adam"
  val commandDescription = "Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences"
}

object ADAM2Fasta extends BDGCommandCompanion {
  val commandName = "adam2fasta"
  val commandDescription = "Convert ADAM nucleotide contig fragments to FASTA files"
}

// FASTQ conversions
object ADAM2Fastq extends BDGCommandCompanion {
  val commandName = "adam2fastq"
  val commandDescription = "Convert BAM to FASTQ files"
}

Tools for viewing, analyzing, and generating statistics from genomic datasets, providing samtools-like functionality with distributed processing capabilities.
// Data viewing and filtering
object View extends BDGCommandCompanion {
  val commandName = "view"
  val commandDescription = "View certain reads from an alignment-record file."
}

object PrintADAM extends BDGCommandCompanion {
  val commandName = "print"
  val commandDescription = "Print an ADAM formatted file"
}

// Statistics and analysis
object FlagStat extends BDGCommandCompanion {
  val commandName = "flagstat"
  val commandDescription = "Print statistics on reads in an ADAM file (similar to samtools flagstat)"
}

All ADAM CLI commands follow a consistent architectural pattern:
// Command companion object
trait BDGCommandCompanion {
  val commandName: String
  val commandDescription: String
  def apply(cmdLine: Array[String]): BDGCommand
}

// Command arguments base class
class Args4jBase extends Logging with Serializable {
  @Args4jOption(required = false, name = "-print_metrics", usage = "Print metrics to the log on completion")
  var printMetrics = false
}

// Common argument mixins
trait ParquetArgs {
  @Args4jOption(required = false, name = "-parquet_compression", usage = "Parquet compression codec")
  var compressionCodec: String = "GZIP"
  @Args4jOption(required = false, name = "-parquet_block_size", usage = "Parquet block size (default: 128mb)")
  var blockSize: Int = 128 * 1024 * 1024
  @Args4jOption(required = false, name = "-parquet_page_size", usage = "Parquet page size (default: 1mb)")
  var pageSize: Int = 1024 * 1024
}

trait ParquetSaveArgs extends ParquetArgs {
  @Args4jOption(required = false, name = "-disable_dictionary", usage = "Disable dictionary encoding")
  var disableDictionaryEncoding = false
}

trait ADAMSaveAnyArgs {
  @Args4jOption(required = false, name = "-single", usage = "Save as single file")
  var asSingleFile = false
  @Args4jOption(required = false, name = "-defer", usage = "Defer merging single file")
  var deferMerging = false
  @Args4jOption(required = false, name = "-disable_fast_concat", usage = "Disable fast concatenation")
  var disableFastConcat = false
}

// Command execution
abstract class BDGSparkCommand[T <: Args4jBase] extends BDGCommand[T] {
  val companion: BDGCommandCompanion
  def run(sc: SparkContext): Unit
}

// Validation levels for input parsing
type ValidationStringency = htsjdk.samtools.ValidationStringency
// Values: STRICT, LENIENT, SILENT

Most commands support these common arguments:
class About {
  def artifactId(): String
  def buildTimestamp(): String
  def commit(): String
  def hadoopVersion(): String
  def scalaVersion(): String
  def sparkVersion(): String
  def version(): String
  def isSnapshot(): Boolean
}

ADAM CLI commands use standard exit codes and provide comprehensive error messages:
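The specific exit codes are not enumerated here. A minimal sketch, assuming only the usual shell convention that 0 means success and any non-zero status means failure: the hypothetical wrapper `run_checked` (not part of ADAM) runs a command, reports the outcome, and passes the status through. A stand-in command is used so the sketch runs without ADAM installed.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of ADAM): run a command, report whether
# it succeeded, and return its exit status unchanged.
# Assumes only the shell convention: 0 = success, non-zero = failure.
run_checked() {
  status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "ok: $1"
  else
    echo "failed ($status): $1" >&2
  fi
  return "$status"
}

# Intended use with ADAM would look like:
#   run_checked adam-submit transformAlignments input.bam output.adam
# A stand-in command is used here so the sketch runs anywhere.
run_checked true
```

Returning the original status lets a calling script or scheduler decide how to react, for example by retrying or aborting a pipeline.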