# ADAM CLI

ADAM CLI is a command-line interface for genomic data analysis using Apache Spark. It provides distributed processing capabilities for various genomic file formats, including SAM/BAM/CRAM, BED/GFF3/GTF, VCF, and FASTA/FASTQ, with optimized Parquet columnar storage for improved performance and scalability.

## Package Information
- **Package Name**: adam-cli-spark2_2.10
- **Package Type**: maven
- **Language**: Scala (with Java support)
- **Group ID**: org.bdgenomics.adam
- **Version**: 0.23.0
- **Installation**:

```xml
<dependency>
  <groupId>org.bdgenomics.adam</groupId>
  <artifactId>adam-cli-spark2_2.10</artifactId>
  <version>0.23.0</version>
</dependency>
```

Or download a precompiled distribution from the GitHub releases page.
## Core Usage

ADAM CLI is executed through the `adam-submit` script, which wraps Spark's `spark-submit`:

```bash
# Basic command structure
adam-submit [<spark-args> --] <command> [<command-args>]

# Example: transform a BAM file to ADAM format
adam-submit transformAlignments input.bam output.adam

# Example with Spark arguments
adam-submit --master local[4] --driver-memory 8g -- transformAlignments input.bam output.adam
```
## Architecture

ADAM CLI is organized around several key architectural components:

- **Command System**: Modular command structure with 15 specialized tools organized into 3 functional groups
- **Spark Integration**: Built-in Apache Spark integration for distributed processing across clusters
- **Format Support**: Comprehensive support for genomic file formats with intelligent format detection
- **Parquet Optimization**: Columnar storage format for improved query performance and compression
- **Streaming Processing**: Ability to process large datasets that exceed single-node memory capacity
## Main Entry Point

```scala { .api }
object ADAMMain {
  def main(args: Array[String]): Unit
  val defaultCommandGroups: List[CommandGroup]
}

class ADAMMain @Inject() (commandGroups: List[CommandGroup]) extends Logging {
  def apply(args: Array[String]): Unit
}

case class CommandGroup(name: String, commands: List[BDGCommandCompanion])
```
## Capabilities

### Genomic Data Processing

Core genomic data analysis operations, including k-mer counting, coverage analysis, alignment transformations, and multi-format data processing.

```scala { .api }
// K-mer analysis
object CountReadKmers extends BDGCommandCompanion {
  val commandName = "countKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}

object CountContigKmers extends BDGCommandCompanion {
  val commandName = "countContigKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}

// Coverage analysis
object Reads2Coverage extends BDGCommandCompanion {
  val commandName = "reads2coverage"
  val commandDescription = "Calculate the coverage from a given ADAM file"
}

// Data transformations
object TransformAlignments extends BDGCommandCompanion {
  val commandName = "transformAlignments"
  val commandDescription = "Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations"
}

object TransformFeatures extends BDGCommandCompanion {
  val commandName = "transformFeatures"
  val commandDescription = "Convert a file with sequence features into corresponding ADAM format and vice versa"
}

object TransformGenotypes extends BDGCommandCompanion {
  val commandName = "transformGenotypes"
  val commandDescription = "Convert a file with genotypes into corresponding ADAM format and vice versa"
}

object TransformVariants extends BDGCommandCompanion {
  val commandName = "transformVariants"
  val commandDescription = "Convert a file with variants into corresponding ADAM format and vice versa"
}

object TransformFragments extends BDGCommandCompanion {
  val commandName = "transformFragments"
  val commandDescription = "Convert alignment records into fragment records"
}

// Utilities
object MergeShards extends BDGCommandCompanion {
  val commandName = "mergeShards"
  val commandDescription = "Merges the shards of a file"
}
```
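These processing commands are invoked through `adam-submit`. A minimal sketch of typical invocations follows; the file names are placeholders and the positional-argument order (input, output, k-mer length) is an assumption to verify against `adam-submit <command> --help`:

```shell
# Sketch: invoking the processing commands above. Paths are placeholders and
# the positional-argument order is an assumption, not verbatim documentation.
INPUT=reads.adam
KMER_LENGTH=21

# Guard so the sketch is a no-op on machines without ADAM on the PATH.
if command -v adam-submit >/dev/null 2>&1; then
  # Count k-mers of length 21 across the read dataset
  adam-submit countKmers "$INPUT" kmer_counts.txt "$KMER_LENGTH"
  # Compute coverage from the same alignments
  adam-submit reads2coverage "$INPUT" coverage.adam
fi
echo "planned countKmers on $INPUT with k=$KMER_LENGTH"
```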
[Genomic Data Processing](./genomic-processing.md)
### Format Conversion

Comprehensive format conversion utilities for transforming between various genomic file formats and ADAM's optimized Parquet format.

```scala { .api }
// FASTA conversions
object Fasta2ADAM extends BDGCommandCompanion {
  val commandName = "fasta2adam"
  val commandDescription = "Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences"
}

object ADAM2Fasta extends BDGCommandCompanion {
  val commandName = "adam2fasta"
  val commandDescription = "Convert ADAM nucleotide contig fragments to FASTA files"
}

// FASTQ conversions
object ADAM2Fastq extends BDGCommandCompanion {
  val commandName = "adam2fastq"
  val commandDescription = "Convert BAM to FASTQ files"
}
```
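A hedged sketch of conversion round-trips with these commands; the file names are placeholders, and each command also accepts optional flags not shown here (check its `--help`):

```shell
# Sketch: converting between FASTA/FASTQ and ADAM's Parquet representation.
# All file names are placeholders.
REFERENCE=genome.fasta

# Guard so the sketch is a no-op on machines without ADAM on the PATH.
if command -v adam-submit >/dev/null 2>&1; then
  adam-submit fasta2adam "$REFERENCE" genome.contigs.adam    # FASTA -> Parquet
  adam-submit adam2fasta genome.contigs.adam roundtrip.fasta # Parquet -> FASTA
  adam-submit adam2fastq aligned.reads.adam reads.fastq      # alignments -> FASTQ
fi
echo "reference: $REFERENCE"
```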
[Format Conversion](./format-conversion.md)
### Data Inspection and Analysis

Tools for viewing, analyzing, and generating statistics from genomic datasets, providing samtools-like functionality with distributed processing capabilities.

```scala { .api }
// Data viewing and filtering
object View extends BDGCommandCompanion {
  val commandName = "view"
  val commandDescription = "View certain reads from an alignment-record file."
}

object PrintADAM extends BDGCommandCompanion {
  val commandName = "print"
  val commandDescription = "Print an ADAM formatted file"
}

// Statistics and analysis
object FlagStat extends BDGCommandCompanion {
  val commandName = "flagstat"
  val commandDescription = "Print statistics on reads in an ADAM file (similar to samtools flagstat)"
}
```
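A sketch of a samtools-style inspection session. The bare-path invocations are assumptions; `view` in particular supports filter flags not shown here:

```shell
# Sketch: inspecting an alignment dataset. The path is a placeholder.
ALIGNMENTS=aligned.reads.adam

# Guard so the sketch is a no-op on machines without ADAM on the PATH.
if command -v adam-submit >/dev/null 2>&1; then
  adam-submit flagstat "$ALIGNMENTS"  # summary statistics, samtools-style
  adam-submit view "$ALIGNMENTS"      # view reads (filter flags omitted)
  adam-submit print "$ALIGNMENTS"     # dump records from the ADAM file
fi
echo "inspecting $ALIGNMENTS"
```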
[Data Inspection](./data-inspection.md)
## Common Types and Patterns

### Command Pattern

All ADAM CLI commands follow a consistent architectural pattern:

```scala { .api }
// Command companion object
trait BDGCommandCompanion {
  val commandName: String
  val commandDescription: String
  def apply(cmdLine: Array[String]): BDGCommand
}

// Base class for command arguments
class Args4jBase extends Logging with Serializable {
  @Args4jOption(required = false, name = "-print_metrics", usage = "Print metrics to the log on completion")
  var printMetrics = false
}

// Common argument mixins
trait ParquetArgs {
  @Args4jOption(required = false, name = "-parquet_compression", usage = "Parquet compression codec")
  var compressionCodec: String = "GZIP"

  @Args4jOption(required = false, name = "-parquet_block_size", usage = "Parquet block size (default: 128mb)")
  var blockSize: Int = 128 * 1024 * 1024

  @Args4jOption(required = false, name = "-parquet_page_size", usage = "Parquet page size (default: 1mb)")
  var pageSize: Int = 1024 * 1024
}

trait ParquetSaveArgs extends ParquetArgs {
  @Args4jOption(required = false, name = "-disable_dictionary", usage = "Disable dictionary encoding")
  var disableDictionaryEncoding = false
}

trait ADAMSaveAnyArgs {
  @Args4jOption(required = false, name = "-single", usage = "Save as single file")
  var asSingleFile = false

  @Args4jOption(required = false, name = "-defer", usage = "Defer merging single file")
  var deferMerging = false

  @Args4jOption(required = false, name = "-disable_fast_concat", usage = "Disable fast concatenation")
  var disableFastConcat = false
}

// Command execution
abstract class BDGSparkCommand[T <: Args4jBase] extends BDGCommand[T] {
  val companion: BDGCommandCompanion
  def run(sc: SparkContext): Unit
}
```
### Validation Stringency

```scala { .api }
// Validation levels for input parsing
type ValidationStringency = htsjdk.samtools.ValidationStringency
// Values: STRICT, LENIENT, SILENT
```
### Common Arguments

Most commands support these common arguments:

- **Input/Output Paths**: File system paths for source and destination data
- **Partitioning**: Control over data partitioning for performance optimization
- **Validation**: Stringency levels for input data validation
- **Storage**: Spark storage levels for intermediate data caching
- **Format Options**: Parquet-specific configuration options
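Flags from the argument traits above can be combined on a single run. A sketch follows; the flag names come from the traits in this document, but pairing them with `transformAlignments` on these particular files is an assumption:

```shell
# Sketch: combining common Parquet and save arguments on one transform.
BLOCK_SIZE=$((256 * 1024 * 1024))  # 256 MB, overriding the 128 MB default

# Guard so the sketch is a no-op on machines without ADAM on the PATH.
if command -v adam-submit >/dev/null 2>&1; then
  adam-submit transformAlignments \
    -single \
    -parquet_compression SNAPPY \
    -parquet_block_size "$BLOCK_SIZE" \
    input.bam output.adam
fi
echo "requested Parquet block size: $BLOCK_SIZE"
```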
## Version Information

```scala { .api }
class About {
  def artifactId(): String
  def buildTimestamp(): String
  def commit(): String
  def hadoopVersion(): String
  def scalaVersion(): String
  def sparkVersion(): String
  def version(): String
  def isSnapshot(): Boolean
}
```
## Error Handling

ADAM CLI commands use standard exit codes and provide comprehensive error messages:

- **Exit Code 0**: Successful execution
- **Exit Code 1**: General errors (invalid arguments, file not found, etc.)
- **Spark Exceptions**: Distributed processing errors with full stack traces
- **Validation Errors**: Input data validation failures with detailed reports
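In wrapper scripts, these exit codes can be checked through the shell's `$?`; a minimal sketch (the invocation itself is illustrative):

```shell
# Sketch: propagating adam-submit's exit code from a wrapper script.
if command -v adam-submit >/dev/null 2>&1; then
  adam-submit transformAlignments input.bam output.adam
  status=$?
else
  status=0  # ADAM not installed; treat this sketch as a no-op
fi

if [ "$status" -ne 0 ]; then
  echo "adam-submit failed with exit code $status" >&2
  exit "$status"
fi
echo "exit status: $status"
```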