This document covers ADAM CLI's data inspection commands for viewing, filtering, and analyzing genomic datasets. They provide samtools-like functionality, with the processing distributed over Apache Spark.
The View command provides samtools view-like functionality for filtering and examining genomic alignment data with support for flag-based filtering and format conversion.
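The -f, -F, -g, and -G options each treat their argument as a bitmask over the SAM flag field. Below is a minimal sketch of the four predicates in plain Scala. It assumes samtools-style semantics and treats a mask of 0 as "option not given". `FlagFilter` and `keep` are hypothetical names, not part of ADAM's API.

```scala
// Hypothetical sketch of samtools-style flag filtering on a raw SAM flag value.
// Not ADAM's implementation; a mask of 0 is treated as "option not given".
object FlagFilter {
  // -f N: keep reads with every bit of N set
  def matchAll(flag: Int, mask: Int): Boolean = (flag & mask) == mask

  // -F N: keep reads with none of the bits of N set
  def mismatchAll(flag: Int, mask: Int): Boolean = (flag & mask) == 0

  // -g N: keep reads with at least one bit of N set
  def matchSome(flag: Int, mask: Int): Boolean = mask == 0 || (flag & mask) != 0

  // -G N: keep reads missing at least one bit of N (drop reads with all of N set)
  def mismatchSome(flag: Int, mask: Int): Boolean = mask == 0 || (flag & mask) != mask

  // A read is kept only if it passes all four tests
  def keep(
      flag: Int,
      matchAllBits: Int = 0,
      mismatchAllBits: Int = 0,
      matchSomeBits: Int = 0,
      mismatchSomeBits: Int = 0): Boolean =
    matchAll(flag, matchAllBits) &&
      mismatchAll(flag, mismatchAllBits) &&
      matchSome(flag, matchSomeBits) &&
      mismatchSome(flag, mismatchSomeBits)
}
```

Consider flag 99 (paired, proper pair, mate on reverse strand, first in pair). Under this sketch it passes `-f 3 -F 1028`. The same read with the duplicate bit set (1123) fails, because 1123 & 1028 is non-zero.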
object View extends BDGCommandCompanion {
val commandName = "view"
val commandDescription = "View certain reads from an alignment-record file."
def apply(cmdLine: Array[String]): View
}
class ViewArgs extends Args4jBase with ParquetArgs with ADAMSaveAnyArgs {
var inputPath: String // Input alignment file
var outputPath: String // Output file (optional)
var outputPathArg: String // Output file given with -o instead of positionally
// Flag-based filtering (samtools-compatible)
var matchAllBits: Int // Keep reads that have all of these bits set (-f)
var mismatchAllBits: Int // Drop reads that have any of these bits set (-F)
var matchSomeBits: Int // Keep reads that have at least one of these bits set (-g)
var mismatchSomeBits: Int // Drop reads that have all of these bits set (-G)
// Output options
var printCount: Boolean // Print only the count of matching reads (-c)
}
Flag Filtering Examples:
# View only mapped reads (exclude unmapped, flag 4)
adam-submit view -F 4 alignments.adam
# View only proper pairs (flag 2) that are mapped (exclude flag 4)
adam-submit view -f 2 -F 4 alignments.adam mapped_pairs.adam
# Count unmapped reads
adam-submit view -f 4 -c alignments.adam
# View first read in pair (flag 64), exclude secondary alignments (flag 256)
adam-submit view -f 64 -F 256 alignments.adam first_reads.adam
Common SAM Flags:
1: Read is paired
2: Read is in a proper pair
4: Read is unmapped
8: Mate is unmapped
16: Read is on the reverse strand
32: Mate is on the reverse strand
64: First read in pair
128: Second read in pair
256: Secondary alignment
512: Read fails quality checks
1024: PCR/optical duplicate
2048: Supplementary alignment
The PrintADAM command displays the contents of ADAM files in human-readable form for data inspection and debugging.
object PrintADAM extends BDGCommandCompanion {
val commandName = "printAdam"
val commandDescription = "Print the contents of an ADAM file"
def apply(cmdLine: Array[String]): PrintADAM
}
class PrintADAMArgs extends Args4jBase with ParquetArgs {
var inputPath: String // Input ADAM file to print
var outputPath: String // Optional output file
var pretty: Boolean // Pretty-print JSON output
var records: Int // Number of records to print
}
Usage Examples:
# Print first 10 records to console
adam-submit printAdam --records 10 data.adam
# Pretty-print all records to file
adam-submit printAdam --pretty data.adam output.txt
# Inspect data structure
adam-submit printAdam --records 1 --pretty alignments.adam
The FlagStat command generates alignment statistics similar to samtools flagstat, providing basic quality-control metrics for sequencing data.
object FlagStat extends BDGCommandCompanion {
val commandName = "flagstat"
val commandDescription = "Print statistics about reads in an alignment file"
def apply(cmdLine: Array[String]): FlagStat
}
class FlagStatArgs extends Args4jBase {
var inputPath: String // Input alignment file
var outputPath: String // Optional output file for statistics
var stringency: String // Validation stringency
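}
Statistics Generated:
Counts of total, secondary, supplementary, duplicate, mapped, paired, read1, read2, properly paired, both-mapped, and singleton reads. Each count is reported as QC-passed + QC-failed, as in the sample output below.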
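Each report line pairs a QC-passed count with a QC-failed count, where QC-failed means flag 512 is set. Below is a minimal sketch of a few of those counters in plain Scala. It runs over an in-memory `Seq` of raw flags rather than a Spark dataset, and roughly follows samtools' definitions. `FlagStatSketch`, `PassFail`, `Counts`, `tally`, and `pct` are hypothetical names, not ADAM's FlagStat code, and only four of the report lines are covered.

```scala
// Hypothetical sketch of flagstat-style counting; not ADAM's FlagStat implementation.
object FlagStatSketch {
  // SAM flag bits used below (values from the SAM specification)
  val Paired = 1
  val ProperPair = 2
  val Unmapped = 4
  val QcFail = 512
  val Duplicate = 1024

  // One report line: counts for QC-passed and QC-failed reads
  final case class PassFail(passed: Long, failed: Long) {
    def +(other: PassFail): PassFail = PassFail(passed + other.passed, failed + other.failed)
  }

  final case class Counts(total: PassFail, mapped: PassFail, duplicates: PassFail, properlyPaired: PassFail)

  private val zero = PassFail(0L, 0L)

  // Put a matching read into the passed or failed column
  private def bucket(hit: Boolean, qcFailed: Boolean): PassFail =
    if (!hit) zero else if (qcFailed) PassFail(0L, 1L) else PassFail(1L, 0L)

  def tally(flags: Seq[Int]): Counts =
    flags.foldLeft(Counts(zero, zero, zero, zero)) { (acc, flag) =>
      val failed = (flag & QcFail) != 0
      val mapped = (flag & Unmapped) == 0
      // "properly paired" here: paired, proper-pair bit set, and the read itself mapped
      val proper = (flag & Paired) != 0 && (flag & ProperPair) != 0 && mapped
      Counts(
        acc.total + bucket(true, failed),
        acc.mapped + bucket(mapped, failed),
        acc.duplicates + bucket((flag & Duplicate) != 0, failed),
        acc.properlyPaired + bucket(proper, failed))
    }

  // Percentage printed in parentheses, e.g. mapped passed / total passed
  def pct(part: Long, whole: Long): Double =
    if (whole == 0L) 0.0 else 100.0 * part / whole
}
```

With the sample output's figures, `pct(69543, 71723)` is about 96.96, which rounds to the 97.0% shown.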
Usage Examples:
# Basic flagstat to console
adam-submit flagstat alignments.adam
# Save statistics to file
adam-submit flagstat alignments.adam stats.txt
# Use lenient validation for problematic files
adam-submit flagstat --stringency LENIENT alignments.adam
Sample Output:
71723 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
69543 + 0 mapped (97.0% : N/A)
71723 + 0 paired in sequencing
35861 + 0 read1
35862 + 0 read2
67432 + 0 properly paired (94.0% : N/A)
69543 + 0 with itself and mate mapped
0 + 0 singletons (0.0% : N/A)
All inspection tools support configurable validation stringency for handling problematic data:
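// Validation levels (htsjdk.samtools.ValidationStringency)
import htsjdk.samtools.ValidationStringency
ValidationStringency.STRICT // Throw an exception on the first validation error
ValidationStringency.LENIENT // Log a warning and keep going where possible
ValidationStringency.SILENT // Keep going without logging anything
Usage in Commands: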
# Strict validation (default)
adam-submit view --stringency STRICT alignments.adam
# Lenient validation for legacy data
adam-submit flagstat --stringency LENIENT old_alignments.adam
# Silent validation for known problematic files
adam-submit printAdam --stringency SILENT problematic.adam
For very large datasets, consider these optimization strategies:
# Count matching reads without writing or printing them
adam-submit view -c alignments.adam
# Print only the first 1000 records for a quick look
adam-submit printAdam --records 1000 large_file.adam
# Give Spark enough driver and executor memory (Spark arguments go before --)
adam-submit --driver-memory 8g --executor-memory 4g -- \
flagstat huge_alignment.adam
# Enable Spark adaptive query execution and partition coalescing
adam-submit --conf spark.sql.adaptive.enabled=true \
--conf spark.sql.adaptive.coalescePartitions.enabled=true -- \
view -f 2 large_alignments.adam filtered.adam
The View command is commonly used to prepare data subsets:
# Extract properly paired, mapped, non-duplicate reads for variant calling
#   -f 3    keep reads that are paired (1) and in a proper pair (2)
#   -F 1028 drop unmapped (4) and duplicate (1024) reads
adam-submit view -f 3 -F 1028 input.adam high_quality.adam
# Extract unmapped reads for assembly
adam-submit view -f 4 input.adam unmapped.adam
# Extract reads from specific chromosome
adam-submit view \
--regionPredicate "referenceName=chr22" \
input.adam chr22.adam
Combine tools for comprehensive QC:
# 1. Get overall statistics
adam-submit flagstat input.adam > qc_stats.txt
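# 2. Extract reads that failed quality checks (flag 512) for inspection
adam-submit view -f 512 input.adam failed_qc.adam
# 3. Count duplicates (divide by the flagstat total to get the duplicate rate)
adam-submit view -f 1024 -c input.adam
# Spot-check that the first record reads cleanly under strict validation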
adam-submit printAdam --records 1 --stringency STRICT data.adam
# Generate detailed statistics
adam-submit flagstat --stringency STRICT data.adam stats.txt
# Filter and validate simultaneously
adam-submit view -F 4 --stringency LENIENT input.adam validated.adam
The View command supports multiple output formats through the ADAMSaveAnyArgs mixin:
# Save as BAM for external tools
adam-submit view -f 2 input.adam -o output.bam
# Save as JSON for analysis scripts
adam-submit view --records 100 input.adam -o sample.json
# Save as text for manual inspection
adam-submit view --records 10 input.adam -o sample.txt
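View accepts its output location either through -o (`outputPathArg`) or as the second positional argument (`outputPath`). With neither, the earlier examples print to the console. Below is a hypothetical sketch of that decision. It assumes unset fields are null, that -o wins when both are given, and that -c always prints a count. These precedence rules are assumptions, and `ViewOutput`/`resolve` are illustrative names, not ADAM's API.

```scala
// Hypothetical sketch; the precedence rules below are assumptions, not ADAM's documented behavior.
object ViewOutput {
  sealed trait Action
  case object PrintCount extends Action
  case object PrintRecords extends Action
  final case class Save(path: String) extends Action

  def resolve(outputPath: String, outputPathArg: String, printCount: Boolean): Action = {
    // Assume unset fields are null; prefer -o over the positional argument
    val target = Option(outputPathArg).orElse(Option(outputPath))
    if (printCount) PrintCount
    else target.map(Save(_)).getOrElse(PrintRecords)
  }
}
```

Under these assumptions, `view -f 2 input.adam -o output.bam` resolves to `Save("output.bam")`, while `view -F 4 alignments.adam` falls through to printing records.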