This document covers the ADAM CLI commands that convert between common genomic file formats and ADAM's Parquet-based storage format.
Convert FASTA sequence files to ADAM's Parquet-based nucleotide contig format for improved performance and integration with Spark-based analysis pipelines.
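As a rough illustration of how the maximum fragment length affects the size of the output, the sketch below estimates how many fragments a single contig produces. It assumes contigs are cut greedily into pieces of at most the maximum length (10,000 by default, per the argument listing in this section); `fragment_count` is a hypothetical helper written for this example and is not part of ADAM.

```sh
# Hypothetical helper: estimate the number of fragments for one contig.
# Assumes greedy splitting, so the count is ceil(length / max_length).
fragment_count() {
  length=$1
  max_length=${2:-10000}   # default taken from the argument listing
  echo $(( (length + max_length - 1) / max_length ))
}

fragment_count 25000          # prints 3
fragment_count 25000 50000    # prints 1
```

Raising the fragment length reduces the number of records written, at the cost of larger individual records.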
object Fasta2ADAM extends BDGCommandCompanion {
  val commandName = "fasta2adam"
  val commandDescription = "Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences."
  def apply(cmdLine: Array[String]): Fasta2ADAM
}

class Fasta2ADAMArgs extends Args4jBase with ParquetSaveArgs {
  var fastaFile: String     // Input FASTA file path
  var outputPath: String    // Output ADAM file path
  var verbose: Boolean      // Enhanced debugging information
  var reads: String         // Contig ID mapping for read compatibility
  var maximumLength: Long   // Maximum fragment length (default: 10,000)
  var partitions: Int       // Number of output partitions
}

Key Features:
Usage Examples:
# Basic conversion
adam-submit fasta2adam reference.fasta reference.adam

# With verbose output and custom fragment length
adam-submit fasta2adam \
  --verbose \
  --fragment_length 50000 \
  --repartition 100 \
  reference.fasta reference.adam

# Map contig IDs to match read dataset
adam-submit fasta2adam \
  --reads alignments.adam \
  --verbose \
  reference.fasta reference.adam

Convert ADAM nucleotide contig data back to standard FASTA format for compatibility with external tools.
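FASTA output is hard-wrapped at a fixed line width. The sketch below mimics that wrapping with the standard `fold` utility so the effect of the width setting is easy to see. `wrap_sequence` is a hypothetical helper for illustration only, and its default of 70 simply mirrors the `lineWidth` entry in the argument listing of this section.

```sh
# Hypothetical helper: print one FASTA record with the sequence wrapped at a width.
wrap_sequence() {
  name=$1
  sequence=$2
  width=${3:-70}            # default mirrors the lineWidth argument listing
  printf '>%s\n' "$name"
  printf '%s\n' "$sequence" | fold -w "$width"
}

# A 12-base sequence wrapped at 5 columns gives lines of 5, 5 and 2 bases.
wrap_sequence chr_demo ACGTACGTACGT 5
```

Some downstream tools expect a consistent line width within a file, so it is usually safest to pick one width and keep it.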
object ADAM2Fasta extends BDGCommandCompanion {
  val commandName = "adam2fasta"
  val commandDescription = "Convert ADAM nucleotide contig fragments to FASTA files"
  def apply(cmdLine: Array[String]): ADAM2Fasta
}

class ADAM2FastaArgs extends Args4jBase {
  var inputPath: String            // Input ADAM contig file
  var outputPath: String           // Output FASTA file path
  var lineWidth: Int               // FASTA line width (default: 70)
  var coalesce: Int                // Number of output partitions
  var disableDictionary: Boolean   // Skip sequence dictionary output
}

Usage Examples:
# Basic conversion
adam-submit adam2fasta contigs.adam output.fasta

# Custom line width and single output file
adam-submit adam2fasta \
  --lineWidth 80 \
  --coalesce 1 \
  contigs.adam reference.fasta

Convert ADAM alignment or fragment data to FASTQ format for compatibility with external alignment tools and quality control applications.
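When a second output path is supplied, paired reads are written to separate files. The sketch below shows the same idea on an interleaved FASTQ stream using `awk`. It assumes 4-line records whose names end in `/1` or `/2`, which is one common naming convention and not necessarily what ADAM relies on; `split_pairs` is a hypothetical helper.

```sh
# Hypothetical helper: split interleaved FASTQ on stdin into two files.
# Assumes 4-line records whose read names end in /1 or /2.
split_pairs() {
  out1=$1
  out2=$2
  : > "$out1"    # create or truncate both outputs up front
  : > "$out2"
  awk -v out1="$out1" -v out2="$out2" '
    NR % 4 == 1 { target = ($1 ~ /\/2$/) ? out2 : out1 }  # pick the file from the name line
    { print > target }                                     # keep all 4 lines of a record together
  '
}

tmpdir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGA\n+\nIIII\n' |
  split_pairs "$tmpdir/R1.fastq" "$tmpdir/R2.fastq"
cat "$tmpdir/R2.fastq"    # shows only the /2 record
```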
object ADAM2Fastq extends BDGCommandCompanion {
  val commandName = "adam2fastq"
  val commandDescription = "Convert ADAM read data to FASTQ files"
  def apply(cmdLine: Array[String]): ADAM2Fastq
}

class ADAM2FastqArgs extends Args4jBase {
  var inputPath: String                            // Input ADAM file
  var outputPath: String                           // Primary FASTQ output
  var outputPath2: String                          // Secondary FASTQ for paired reads
  var validationStringency: ValidationStringency   // Input validation level
  var repartition: Int                             // Output partitioning
  var persistLevel: String                         // Spark persistence level
  var disableProjection: Boolean                   // Disable column projection
  var outputOriginalBaseQualities: Boolean         // Use original quality scores
}

Key Features:
Usage Examples:
# Single-end reads
adam-submit adam2fastq reads.adam output.fastq

# Paired-end reads with separate output files
adam-submit adam2fastq \
  reads.adam \
  output_R1.fastq \
  output_R2.fastq

# Use original base qualities with lenient validation
adam-submit adam2fastq \
  --outputOriginalBaseQualities \
  --validationStringency LENIENT \
  reads.adam output.fastq

# High-memory processing with custom persistence
adam-submit adam2fastq \
  --persistLevel MEMORY_AND_DISK_SER \
  --repartition 200 \
  large_dataset.adam output.fastq

Convert various genomic formats (SAM/BAM/CRAM) to ADAM fragment format, which maintains paired-end relationships and insert size information.
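Conceptually, a fragment keeps together the reads that came from the same template. The sketch below illustrates that grouping on a toy tab-separated stream of `name<TAB>sequence` lines. It assumes reads from one fragment share a name, and `group_fragments` is a hypothetical helper, not a description of how ADAM builds fragments internally.

```sh
# Hypothetical helper: collapse "name<TAB>sequence" lines into one line per fragment.
# Assumes reads from the same fragment carry the same name.
group_fragments() {
  LC_ALL=C sort | awk -F '\t' '
    $1 != name { if (NR > 1) print name "\t" reads; name = $1; reads = $2; next }
    { reads = reads "," $2 }                      # same name: append to the current fragment
    END { if (NR > 0) print name "\t" reads }     # flush the last fragment
  '
}

# Two reads named frag1 end up on one output line; frag2 stays on its own line.
printf 'frag2\tGGCC\nfrag1\tACGT\nfrag1\tTTGA\n' | group_fragments
```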
object TransformFragments extends BDGCommandCompanion {
  val commandName = "transformFragments"
  val commandDescription = "Convert SAM/BAM/CRAM to ADAM fragments"
  def apply(cmdLine: Array[String]): TransformFragments
}

class TransformFragmentsArgs extends Args4jBase with ADAMSaveAnyArgs with ParquetArgs {
  var inputPath: String       // Input alignment file
  var outputPath: String      // Output fragment file
  var coalesce: Int           // Output partition count
  var forceShuffle: Boolean   // Force data shuffling
  var storageLevel: String    // Spark storage level
}

Fragment Benefits:
Usage Example:
# Convert BAM to fragments with performance optimization
adam-submit transformFragments \
  --coalesce 50 \
  --storageLevel MEMORY_AND_DISK \
  paired_reads.bam fragments.adam

| Input Format | Output Format | Command | Key Features |
|---|---|---|---|
| FASTA | ADAM Contigs | fasta2adam | Sequence indexing, fragmentation |
| ADAM Contigs | FASTA | adam2fasta | Dictionary generation, line formatting |
| ADAM Reads/Alignments | FASTQ | adam2fastq | Paired-end separation, quality options |
| SAM/BAM/CRAM | ADAM Fragments | transformFragments | Insert size preservation, pairing |
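
The table above can be encoded as a small dispatch helper for scripts that handle mixed inputs. This is only a sketch: `conversion_command` is a hypothetical function, and detecting formats from file extensions (including treating `.adam` as the marker for ADAM output) is an assumption made for the example.

```sh
# Hypothetical helper: pick the conversion command for an input/output pair.
# Formats are guessed from file extensions, which is an assumption of this sketch.
conversion_command() {
  input=$1
  output=$2
  case "$input:$output" in
    *.fasta:*.adam|*.fa:*.adam)              echo fasta2adam ;;
    *.adam:*.fasta|*.adam:*.fa)              echo adam2fasta ;;
    *.adam:*.fastq|*.adam:*.fq)              echo adam2fastq ;;
    *.sam:*.adam|*.bam:*.adam|*.cram:*.adam) echo transformFragments ;;
    *) echo "no conversion listed for $input -> $output" >&2; return 1 ;;
  esac
}

conversion_command paired_reads.bam fragments.adam   # prints transformFragments
```

The printed name can then be passed to `adam-submit` together with the same two paths.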

# For large datasets, use disk-based persistence
--persistLevel MEMORY_AND_DISK_SER

# Control memory usage with partitioning
--repartition 100   # Increase for large files
--coalesce 10       # Decrease for small files

# Force data shuffling for balanced partitions
--forceShuffle

# Disable column projection for full schema access
--disableProjection

// Validation stringency levels
ValidationStringency.STRICT    // Fail on any malformed data
ValidationStringency.LENIENT   // Warn on malformed data
ValidationStringency.SILENT    // Ignore malformed data

FASTA conversions automatically generate sequence dictionaries compatible with:
FASTQ conversions support both single-end and paired-end reads.
All conversions maintain compatibility with standard genomics file format specifications.
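
The three validation stringency levels listed earlier differ only in how a malformed record is handled. The sketch below mirrors that behaviour for a toy check, a FASTQ record whose sequence and quality strings differ in length. `check_record` is a hypothetical helper, and the check itself is an assumption chosen for illustration rather than ADAM's actual validation logic.

```sh
# Hypothetical helper: apply a stringency level to one sequence/quality pair.
check_record() {
  stringency=$1
  sequence=$2
  quality=$3
  if [ "${#sequence}" -eq "${#quality}" ]; then
    return 0                                                   # well-formed record
  fi
  case "$stringency" in
    STRICT)  echo "error: length mismatch" >&2; return 1 ;;    # fail
    LENIENT) echo "warning: length mismatch" >&2; return 0 ;;  # warn and continue
    SILENT)  return 0 ;;                                       # ignore
    *)       echo "unknown stringency: $stringency" >&2; return 2 ;;
  esac
}

check_record LENIENT ACGT III   # warns on stderr, exit status 0
```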