Command line interface for ADAM, a library and command line tool that uses Apache Spark to parallelize genomic data analysis across cluster and cloud computing environments.

This document covers the ADAM CLI's format conversion capabilities for transforming between various genomic file formats and ADAM's optimized Parquet storage format.

Convert FASTA sequence files to ADAM's Parquet-based nucleotide contig format for improved performance and integration with Spark-based analysis pipelines.
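The conversion stores each contig as fragments no longer than a maximum fragment length (`maximumLength` in the arguments for this command, default 10,000). To get a feel for how that setting affects output size, the following shell sketch estimates the number of fragments per FASTA record. It assumes plain ceiling division and an uncompressed FASTA file; `fragment_counts` is a hypothetical helper name, and ADAM's own splitting logic may differ in detail.

```shell
# Hypothetical helper (not part of ADAM): for each record in a FASTA file,
# print the record name, its sequence length, and the estimated number of
# fragments for a given maximum fragment length (ceiling division).
# Usage: fragment_counts FILE [MAX_LENGTH]   (MAX_LENGTH defaults to 10000)
fragment_counts() {
  awk -v max="${2:-10000}" '
    # Emit one tab-separated line for the record collected so far.
    function report() {
      if (name != "") {
        printf "%s\t%d\t%d\n", name, len, int((len + max - 1) / max)
      }
    }
    # Header line: finish the previous record and start a new one.
    /^>/ { report(); name = substr($1, 2); len = 0; next }
    # Sequence line: accumulate its length.
    { len += length($0) }
    END { report() }
  ' "$1"
}
```

For example, `fragment_counts reference.fasta 50000` lists one line per contig, which can help when choosing a fragment length and partition count.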
object Fasta2ADAM extends BDGCommandCompanion {
val commandName = "fasta2adam"
val commandDescription = "Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences."
def apply(cmdLine: Array[String]): Fasta2ADAM
}
class Fasta2ADAMArgs extends Args4jBase with ParquetSaveArgs {
var fastaFile: String // Input FASTA file path
var outputPath: String // Output ADAM file path
var verbose: Boolean // Enhanced debugging information
var reads: String // Contig ID mapping for read compatibility
var maximumLength: Long // Maximum fragment length (default: 10,000)
var partitions: Int // Number of output partitions
}

Key Features:
Usage Examples:
# Basic conversion
adam-submit fasta2adam reference.fasta reference.adam
# With verbose output and custom fragment length
adam-submit fasta2adam \
--verbose \
--fragment_length 50000 \
--repartition 100 \
reference.fasta reference.adam
# Map contig IDs to match read dataset
adam-submit fasta2adam \
--reads alignments.adam \
--verbose \
reference.fasta reference.adam

Convert ADAM nucleotide contig data back to standard FASTA format for compatibility with external tools.
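FASTA output is hard-wrapped at a fixed line width (the `lineWidth` argument of this command, listed with a default of 70). The shell sketch below only illustrates what that wrapping looks like for a single record, using the standard `fold` utility. `wrap_fasta_record` is a hypothetical helper name, and this is not how ADAM itself writes the file.

```shell
# Hypothetical helper (not part of ADAM): print one FASTA record with the
# sequence hard-wrapped at WIDTH characters per line.
# Usage: wrap_fasta_record NAME SEQUENCE [WIDTH]   (WIDTH defaults to 70)
wrap_fasta_record() {
  # Header line with the record name.
  printf '>%s\n' "$1"
  # Sequence, split into lines of at most WIDTH characters.
  printf '%s\n' "$2" | fold -w "${3:-70}"
}
```

Calling `wrap_fasta_record chr1 ACGTACGTAC 4` prints the header followed by sequence lines of at most four bases each.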
object ADAM2Fasta extends BDGCommandCompanion {
val commandName = "adam2fasta"
val commandDescription = "Convert ADAM nucleotide contig fragments to FASTA files"
def apply(cmdLine: Array[String]): ADAM2Fasta
}
class ADAM2FastaArgs extends Args4jBase {
var inputPath: String // Input ADAM contig file
var outputPath: String // Output FASTA file path
var lineWidth: Int // FASTA line width (default: 70)
var coalesce: Int // Number of output partitions
var disableDictionary: Boolean // Skip sequence dictionary output
}

Usage Examples:
# Basic conversion
adam-submit adam2fasta contigs.adam output.fasta
# Custom line width and single output file
adam-submit adam2fasta \
--lineWidth 80 \
--coalesce 1 \
contigs.adam reference.fasta

Convert ADAM alignment or fragment data to FASTQ format for compatibility with external alignment tools and quality control applications.
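This command can write paired reads to two FASTQ files (a primary and a secondary output). A simple sanity check after such a conversion is that both files hold the same number of records. The sketch below assumes uncompressed FASTQ with exactly four lines per record; `fastq_record_count` and `check_paired_fastq` are hypothetical helper names, not ADAM commands.

```shell
# Hypothetical helper (not part of ADAM): number of records in an
# uncompressed FASTQ file, assuming exactly four lines per record.
fastq_record_count() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Hypothetical helper: succeed only if both FASTQ files contain the same
# number of records.
# Usage: check_paired_fastq R1.fastq R2.fastq
check_paired_fastq() {
  [ "$(fastq_record_count "$1")" -eq "$(fastq_record_count "$2")" ]
}
```

A typical use would be `check_paired_fastq output_R1.fastq output_R2.fastq || echo "pair counts differ"`.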
object ADAM2Fastq extends BDGCommandCompanion {
val commandName = "adam2fastq"
val commandDescription = "Convert ADAM read data to FASTQ files"
def apply(cmdLine: Array[String]): ADAM2Fastq
}
class ADAM2FastqArgs extends Args4jBase {
var inputPath: String // Input ADAM file
var outputPath: String // Primary FASTQ output
var outputPath2: String // Secondary FASTQ for paired reads
var validationStringency: ValidationStringency // Input validation level
var repartition: Int // Output partitioning
var persistLevel: String // Spark persistence level
var disableProjection: Boolean // Disable column projection
var outputOriginalBaseQualities: Boolean // Use original quality scores
}

Key Features:
Usage Examples:
# Single-end reads
adam-submit adam2fastq reads.adam output.fastq
# Paired-end reads with separate output files
adam-submit adam2fastq \
reads.adam \
output_R1.fastq \
output_R2.fastq
# Use original base qualities with lenient validation
adam-submit adam2fastq \
--outputOriginalBaseQualities \
--validationStringency LENIENT \
reads.adam output.fastq
# High-memory processing with custom persistence
adam-submit adam2fastq \
--persistLevel MEMORY_AND_DISK_SER \
--repartition 200 \
large_dataset.adam output.fastq

Convert various genomic formats (SAM/BAM/CRAM) to ADAM fragment format, which maintains paired-end relationships and insert size information.
object TransformFragments extends BDGCommandCompanion {
val commandName = "transformFragments"
val commandDescription = "Convert SAM/BAM/CRAM to ADAM fragments"
def apply(cmdLine: Array[String]): TransformFragments
}
class TransformFragmentsArgs extends Args4jBase with ADAMSaveAnyArgs with ParquetArgs {
var inputPath: String // Input alignment file
var outputPath: String // Output fragment file
var coalesce: Int // Output partition count
var forceShuffle: Boolean // Force data shuffling
var storageLevel: String // Spark storage level
}

Fragment Benefits:
Usage Example:
# Convert BAM to fragments with performance optimization
adam-submit transformFragments \
--coalesce 50 \
--storageLevel MEMORY_AND_DISK \
paired_reads.bam fragments.adam

| Input Format | Output Format | Command | Key Features |
|---|---|---|---|
| FASTA | ADAM Contigs | fasta2adam | Sequence indexing, fragmentation |
| ADAM Contigs | FASTA | adam2fasta | Dictionary generation, line formatting |
| ADAM Reads/Alignments | FASTQ | adam2fastq | Paired-end separation, quality options |
| SAM/BAM/CRAM | ADAM Fragments | transformFragments | Insert size preservation, pairing |
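
As a small convenience built on the table above, the following shell sketch suggests a conversion command from an input file name. The extension-to-command mapping is an assumption made for illustration, and `suggest_adam_command` is a hypothetical helper. ADAM inputs are deliberately left unmapped because the table lists two possible targets for them (FASTA and FASTQ).

```shell
# Hypothetical helper (not part of ADAM): suggest an ADAM CLI conversion
# command based on the input file extension, following the table above.
# The extensions chosen here are assumptions for illustration only.
# Usage: suggest_adam_command INPUT_FILE
suggest_adam_command() {
  case "$1" in
    *.fa|*.fasta)       echo "fasta2adam" ;;
    *.sam|*.bam|*.cram) echo "transformFragments" ;;
    *)
      # Anything else (including ADAM inputs) needs an explicit choice.
      echo "no suggestion for input: $1" >&2
      return 1
      ;;
  esac
}
```

For instance, `adam-submit "$(suggest_adam_command paired_reads.bam)" paired_reads.bam fragments.adam` expands to the `transformFragments` invocation shown earlier, without the optional flags.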
# For large datasets, use disk-based persistence
--persistLevel MEMORY_AND_DISK_SER
# Control memory usage with partitioning
--repartition 100 # Increase for large files
--coalesce 10 # Decrease for small files
# Force data shuffling for balanced partitions
--forceShuffle
# Disable column projection for full schema access
--disableProjection
// Validation stringency levels
ValidationStringency.STRICT // Fail on any malformed data
ValidationStringency.LENIENT // Warn on malformed data
ValidationStringency.SILENT // Ignore malformed data

FASTA conversions automatically generate sequence dictionaries compatible with:
FASTQ conversions support both:
All conversions maintain compatibility with standard genomics file format specifications: