# ADAM CLI

ADAM CLI is a command-line interface for genomic data analysis using Apache Spark. It provides distributed processing capabilities for various genomic file formats, including SAM/BAM/CRAM, BED/GFF3/GTF, VCF, and FASTA/FASTQ, with optimized Parquet columnar storage for improved performance and scalability.

## Package Information
- **Package Name**: adam-cli-spark2_2.10
- **Package Type**: maven
- **Language**: Scala (with Java support)
- **Group ID**: org.bdgenomics.adam
- **Version**: 0.23.0
- **Installation**:

```xml
<dependency>
  <groupId>org.bdgenomics.adam</groupId>
  <artifactId>adam-cli-spark2_2.10</artifactId>
  <version>0.23.0</version>
</dependency>
```

Or download a precompiled distribution from the GitHub releases page.
## Core Usage

ADAM CLI is executed through the `adam-submit` script, which wraps Spark's `spark-submit`:

```bash
# Basic command structure
adam-submit [<spark-args> --] <command> [<command-args>]

# Example: transform a BAM file to ADAM format
adam-submit transformAlignments input.bam output.adam

# Example with Spark arguments
adam-submit --master local[4] --driver-memory 8g -- transformAlignments input.bam output.adam
```
## Architecture

ADAM CLI is organized around several key architectural components:

- **Command System**: Modular command structure with 15 specialized tools organized into 3 functional groups
- **Spark Integration**: Built-in Apache Spark integration for distributed processing across clusters
- **Format Support**: Comprehensive support for genomic file formats with intelligent format detection
- **Parquet Optimization**: Columnar storage format for improved query performance and compression
- **Streaming Processing**: Ability to process large datasets that exceed single-node memory capacity
## Main Entry Point

```scala { .api }
object ADAMMain {
  def main(args: Array[String]): Unit
  val defaultCommandGroups: List[CommandGroup]
}

class ADAMMain @Inject() (commandGroups: List[CommandGroup]) extends Logging {
  def apply(args: Array[String]): Unit
}

case class CommandGroup(name: String, commands: List[BDGCommandCompanion])
```
## Capabilities

### Genomic Data Processing

Core genomic data analysis operations, including k-mer counting, coverage analysis, alignment transformations, and multi-format data processing.

```scala { .api }
// K-mer analysis
object CountReadKmers extends BDGCommandCompanion {
  val commandName = "countKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}

object CountContigKmers extends BDGCommandCompanion {
  val commandName = "countContigKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}

// Coverage analysis
object Reads2Coverage extends BDGCommandCompanion {
  val commandName = "reads2coverage"
  val commandDescription = "Calculate the coverage from a given ADAM file"
}

// Data transformations
object TransformAlignments extends BDGCommandCompanion {
  val commandName = "transformAlignments"
  val commandDescription = "Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations"
}

object TransformFeatures extends BDGCommandCompanion {
  val commandName = "transformFeatures"
  val commandDescription = "Convert a file with sequence features into corresponding ADAM format and vice versa"
}

object TransformGenotypes extends BDGCommandCompanion {
  val commandName = "transformGenotypes"
  val commandDescription = "Convert a file with genotypes into corresponding ADAM format and vice versa"
}

object TransformVariants extends BDGCommandCompanion {
  val commandName = "transformVariants"
  val commandDescription = "Convert a file with variants into corresponding ADAM format and vice versa"
}

object TransformFragments extends BDGCommandCompanion {
  val commandName = "transformFragments"
  val commandDescription = "Convert alignment records into fragment records"
}

// Utilities
object MergeShards extends BDGCommandCompanion {
  val commandName = "mergeShards"
  val commandDescription = "Merges the shards of a file"
}
```
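These processing commands are invoked through `adam-submit`. A minimal sketch of typical invocations follows; the file names are placeholders and the positional-argument order (input, output, k-mer length) is an assumption to verify against `adam-submit <command> --help`:

```shell
# Sketch: invoking the processing commands above. Paths are placeholders and
# the positional-argument order is an assumption, not verbatim documentation.
INPUT=reads.adam
KMER_LENGTH=21

# Guard so the sketch is a no-op on machines without ADAM on the PATH.
if command -v adam-submit >/dev/null 2>&1; then
  # Count k-mers of length 21 across the read dataset
  adam-submit countKmers "$INPUT" kmer_counts.txt "$KMER_LENGTH"
  # Compute coverage from the same alignments
  adam-submit reads2coverage "$INPUT" coverage.adam
fi
echo "planned countKmers on $INPUT with k=$KMER_LENGTH"
```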
[Genomic Data Processing](./genomic-processing.md)
### Format Conversion

Comprehensive format conversion utilities for transforming between various genomic file formats and ADAM's optimized Parquet format.

```scala { .api }
// FASTA conversions
object Fasta2ADAM extends BDGCommandCompanion {
  val commandName = "fasta2adam"
  val commandDescription = "Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences"
}

object ADAM2Fasta extends BDGCommandCompanion {
  val commandName = "adam2fasta"
  val commandDescription = "Convert ADAM nucleotide contig fragments to FASTA files"
}

// FASTQ conversions
object ADAM2Fastq extends BDGCommandCompanion {
  val commandName = "adam2fastq"
  val commandDescription = "Convert BAM to FASTQ files"
}
```
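A hedged sketch of conversion round-trips with these commands; the file names are placeholders, and each command also accepts optional flags not shown here (check its `--help`):

```shell
# Sketch: converting between FASTA/FASTQ and ADAM's Parquet representation.
# All file names are placeholders.
REFERENCE=genome.fasta

# Guard so the sketch is a no-op on machines without ADAM on the PATH.
if command -v adam-submit >/dev/null 2>&1; then
  adam-submit fasta2adam "$REFERENCE" genome.contigs.adam    # FASTA -> Parquet
  adam-submit adam2fasta genome.contigs.adam roundtrip.fasta # Parquet -> FASTA
  adam-submit adam2fastq aligned.reads.adam reads.fastq      # alignments -> FASTQ
fi
echo "reference: $REFERENCE"
```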
[Format Conversion](./format-conversion.md)
### Data Inspection and Analysis

Tools for viewing, analyzing, and generating statistics from genomic datasets, providing samtools-like functionality with distributed processing capabilities.

```scala { .api }
// Data viewing and filtering
object View extends BDGCommandCompanion {
  val commandName = "view"
  val commandDescription = "View certain reads from an alignment-record file."
}

object PrintADAM extends BDGCommandCompanion {
  val commandName = "print"
  val commandDescription = "Print an ADAM formatted file"
}

// Statistics and analysis
object FlagStat extends BDGCommandCompanion {
  val commandName = "flagstat"
  val commandDescription = "Print statistics on reads in an ADAM file (similar to samtools flagstat)"
}
```
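A sketch of a samtools-style inspection session. The bare-path invocations are assumptions; `view` in particular supports filter flags not shown here:

```shell
# Sketch: inspecting an alignment dataset. The path is a placeholder.
ALIGNMENTS=aligned.reads.adam

# Guard so the sketch is a no-op on machines without ADAM on the PATH.
if command -v adam-submit >/dev/null 2>&1; then
  adam-submit flagstat "$ALIGNMENTS"  # summary statistics, samtools-style
  adam-submit view "$ALIGNMENTS"      # view reads (filter flags omitted)
  adam-submit print "$ALIGNMENTS"     # dump records from the ADAM file
fi
echo "inspecting $ALIGNMENTS"
```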
[Data Inspection](./data-inspection.md)
## Common Types and Patterns

### Command Pattern

All ADAM CLI commands follow a consistent architectural pattern:

```scala { .api }
// Command companion object
trait BDGCommandCompanion {
  val commandName: String
  val commandDescription: String
  def apply(cmdLine: Array[String]): BDGCommand
}

// Base class for command arguments
class Args4jBase extends Logging with Serializable {
  @Args4jOption(required = false, name = "-print_metrics", usage = "Print metrics to the log on completion")
  var printMetrics = false
}

// Common argument mixins
trait ParquetArgs {
  @Args4jOption(required = false, name = "-parquet_compression", usage = "Parquet compression codec")
  var compressionCodec: String = "GZIP"

  @Args4jOption(required = false, name = "-parquet_block_size", usage = "Parquet block size (default: 128mb)")
  var blockSize: Int = 128 * 1024 * 1024

  @Args4jOption(required = false, name = "-parquet_page_size", usage = "Parquet page size (default: 1mb)")
  var pageSize: Int = 1024 * 1024
}

trait ParquetSaveArgs extends ParquetArgs {
  @Args4jOption(required = false, name = "-disable_dictionary", usage = "Disable dictionary encoding")
  var disableDictionaryEncoding = false
}

trait ADAMSaveAnyArgs {
  @Args4jOption(required = false, name = "-single", usage = "Save as single file")
  var asSingleFile = false

  @Args4jOption(required = false, name = "-defer", usage = "Defer merging single file")
  var deferMerging = false

  @Args4jOption(required = false, name = "-disable_fast_concat", usage = "Disable fast concatenation")
  var disableFastConcat = false
}

// Command execution
abstract class BDGSparkCommand[T <: Args4jBase] extends BDGCommand[T] {
  val companion: BDGCommandCompanion
  def run(sc: SparkContext): Unit
}
```
### Validation Stringency

```scala { .api }
// Validation levels for input parsing
type ValidationStringency = htsjdk.samtools.ValidationStringency
// Values: STRICT, LENIENT, SILENT
```
### Common Arguments

Most commands support these common arguments:

- **Input/Output Paths**: File system paths for source and destination data
- **Partitioning**: Control over data partitioning for performance optimization
- **Validation**: Stringency levels for input data validation
- **Storage**: Spark storage levels for intermediate data caching
- **Format Options**: Parquet-specific configuration options
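Flags from the argument traits above can be combined on a single run. A sketch follows; the flag names come from the traits in this document, but pairing them with `transformAlignments` on these particular files is an assumption:

```shell
# Sketch: combining common Parquet and save arguments on one transform.
BLOCK_SIZE=$((256 * 1024 * 1024))  # 256 MB, overriding the 128 MB default

# Guard so the sketch is a no-op on machines without ADAM on the PATH.
if command -v adam-submit >/dev/null 2>&1; then
  adam-submit transformAlignments \
    -single \
    -parquet_compression SNAPPY \
    -parquet_block_size "$BLOCK_SIZE" \
    input.bam output.adam
fi
echo "requested Parquet block size: $BLOCK_SIZE"
```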
## Version Information

```scala { .api }
class About {
  def artifactId(): String
  def buildTimestamp(): String
  def commit(): String
  def hadoopVersion(): String
  def scalaVersion(): String
  def sparkVersion(): String
  def version(): String
  def isSnapshot(): Boolean
}
```
## Error Handling

ADAM CLI commands use standard exit codes and provide comprehensive error messages:

- **Exit Code 0**: Successful execution
- **Exit Code 1**: General errors (invalid arguments, file not found, etc.)
- **Spark Exceptions**: Distributed processing errors with full stack traces
- **Validation Errors**: Input data validation failures with detailed reports
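In wrapper scripts, these exit codes can be checked through the shell's `$?`; a minimal sketch (the invocation itself is illustrative):

```shell
# Sketch: propagating adam-submit's exit code from a wrapper script.
if command -v adam-submit >/dev/null 2>&1; then
  adam-submit transformAlignments input.bam output.adam
  status=$?
else
  status=0  # ADAM not installed; treat this sketch as a no-op
fi

if [ "$status" -ne 0 ]; then
  echo "adam-submit failed with exit code $status" >&2
  exit "$status"
fi
echo "exit status: $status"
```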