# ADAM CLI


ADAM CLI is a command-line interface for genomic data analysis using Apache Spark. It provides distributed processing capabilities for various genomic file formats including SAM/BAM/CRAM, BED/GFF3/GTF, VCF, and FASTA/FASTQ, with optimized Parquet columnar storage for improved performance and scalability.


## Package Information


- **Package Name**: adam-cli-spark2_2.10
- **Package Type**: maven
- **Language**: Scala (with Java support)
- **Group ID**: org.bdgenomics.adam
- **Version**: 0.23.0
- **Installation**: add the following Maven dependency

```xml
<dependency>
  <groupId>org.bdgenomics.adam</groupId>
  <artifactId>adam-cli-spark2_2.10</artifactId>
  <version>0.23.0</version>
</dependency>
```

Alternatively, download a precompiled distribution from GitHub releases.


## Core Usage


ADAM CLI commands are run through the `adam-submit` script, a wrapper around Spark's `spark-submit`. Spark arguments, if any, come first and are separated from the ADAM command by `--`:

```bash
# Basic command structure
adam-submit [<spark-args> --] <command> [<command-args>]

# Example: transform BAM to ADAM format
adam-submit transformAlignments input.bam output.adam

# Example with Spark arguments (local[4] is quoted so the shell does not treat it as a glob)
adam-submit --master 'local[4]' --driver-memory 8g -- transformAlignments input.bam output.adam
```


## Architecture


ADAM CLI is organized around several key architectural components:

- **Command System**: 15 commands organized into 3 functional groups, each selected by name on the command line
- **Spark Integration**: Commands run as Apache Spark jobs, so the same command can run locally or distributed across a cluster
- **Format Support**: Reads and writes the genomic file formats listed above, choosing the format from the file extension
- **Parquet Optimization**: Stores data in Apache Parquet, a columnar format that improves query performance and compression
- **Scalable Processing**: Handles datasets that exceed a single node's memory by partitioning them across Spark executors
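Extension-based format detection can be pictured with the following sketch. It is illustrative only: the object name, the format labels, and the exact extension table are assumptions rather than ADAM's source, and in this sketch any unrecognized path falls back to ADAM Parquet.

```scala
// Illustrative sketch of extension-based format detection; the names and the
// extension table are assumptions, not ADAM's actual implementation.
object FormatDetectionSketch {
  sealed trait Format
  case object Alignments extends Format // SAM/BAM/CRAM
  case object Reads extends Format      // FASTQ
  case object Sequences extends Format  // FASTA
  case object Features extends Format   // BED/GFF3/GTF
  case object Variants extends Format   // VCF
  case object Parquet extends Format    // ADAM's native storage (fallback)

  // Compound suffixes such as ".vcf.gz" are listed explicitly.
  private val byExtension: Seq[(String, Format)] = Seq(
    ".sam" -> Alignments, ".bam" -> Alignments, ".cram" -> Alignments,
    ".fastq" -> Reads, ".fq" -> Reads,
    ".fasta" -> Sequences, ".fa" -> Sequences,
    ".bed" -> Features, ".gff3" -> Features, ".gtf" -> Features,
    ".vcf" -> Variants, ".vcf.gz" -> Variants
  )

  // Case-insensitive suffix match; anything else is treated as Parquet.
  def detect(path: String): Format = {
    val lower = path.toLowerCase
    byExtension.collectFirst { case (ext, fmt) if lower.endsWith(ext) => fmt }.getOrElse(Parquet)
  }
}
```

Falling back to Parquet keeps ADAM's own columnar output as the default whenever a path does not look like a text or binary interchange format.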


## Main Entry Point


```scala { .api }
object ADAMMain {
  def main(args: Array[String]): Unit
  val defaultCommandGroups: List[CommandGroup]
}

class ADAMMain @Inject() (commandGroups: List[CommandGroup]) extends Logging {
  def apply(args: Array[String]): Unit
}

case class CommandGroup(name: String, commands: List[BDGCommandCompanion])
```
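The `CommandGroup` structure and the `adam-submit <command> [<command-args>]` syntax suggest name-based dispatch: the first argument names a command, and the remaining arguments go to that command. The self-contained sketch below illustrates the idea. `Command`, `CommandCompanion`, `Group`, `Dispatcher`, and `Echo` are simplified, hypothetical stand-ins, not the real ADAM or bdg-utils types, and the usage-listing layout is an assumption.

```scala
// Hypothetical, simplified stand-ins for BDGCommand / BDGCommandCompanion / CommandGroup.
trait Command { def run(): Unit }

trait CommandCompanion {
  val commandName: String
  val commandDescription: String
  def apply(args: Array[String]): Command
}

final case class Group(name: String, commands: List[CommandCompanion])

object Dispatcher {
  // Find the companion whose name matches, searching groups in order.
  def find(groups: List[Group], name: String): Option[CommandCompanion] =
    groups.iterator.flatMap(_.commands).find(_.commandName == name)

  // Group heading followed by one indented line per command.
  def usage(groups: List[Group]): String =
    groups.map { g =>
      (g.name :: g.commands.map(c => f"  ${c.commandName}%-20s ${c.commandDescription}")).mkString("\n")
    }.mkString("\n\n")

  // The first argument selects the command; the rest are passed to its companion.
  def dispatch(groups: List[Group], args: Array[String]): Either[String, Command] =
    args.headOption match {
      case None => Left(usage(groups))
      case Some(name) =>
        find(groups, name).map(_.apply(args.tail)).toRight(s"Unknown command: $name\n\n${usage(groups)}")
    }
}

// Example companion: builds a command that prints its arguments.
object Echo extends CommandCompanion {
  val commandName = "echo"
  val commandDescription = "Print the remaining arguments"
  def apply(args: Array[String]): Command = new Command {
    def run(): Unit = println(args.mkString(" "))
  }
}
```

Keeping the companion (name, description, factory) separate from the command instance lets the dispatcher print a full command listing without constructing any command.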


## Capabilities


### Genomic Data Processing


Core genomic data analysis operations including k-mer counting, coverage analysis, alignment transformations, and multi-format data processing.


```scala { .api }
// K-mer analysis
object CountReadKmers extends BDGCommandCompanion {
  val commandName = "countKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}
object CountContigKmers extends BDGCommandCompanion {
  val commandName = "countContigKmers"
  val commandDescription = "Counts the k-mers/q-mers from a read dataset."
}

// Coverage analysis
object Reads2Coverage extends BDGCommandCompanion {
  val commandName = "reads2coverage"
  val commandDescription = "Calculate the coverage from a given ADAM file"
}

// Data transformations
object TransformAlignments extends BDGCommandCompanion {
  val commandName = "transformAlignments"
  val commandDescription = "Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations"
}
object TransformFeatures extends BDGCommandCompanion {
  val commandName = "transformFeatures"
  val commandDescription = "Convert a file with sequence features into corresponding ADAM format and vice versa"
}
object TransformGenotypes extends BDGCommandCompanion {
  val commandName = "transformGenotypes"
  val commandDescription = "Convert a file with genotypes into corresponding ADAM format and vice versa"
}
object TransformVariants extends BDGCommandCompanion {
  val commandName = "transformVariants"
  val commandDescription = "Convert a file with variants into corresponding ADAM format and vice versa"
}
object TransformFragments extends BDGCommandCompanion {
  val commandName = "transformFragments"
  val commandDescription = "Convert alignment records into fragment records"
}

// Utilities
object MergeShards extends BDGCommandCompanion {
  val commandName = "mergeShards"
  val commandDescription = "Merges the shards of a file"
}
```
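`countKmers` counts k-mers across a read dataset. The core counting step can be sketched on a local collection as follows. This is an in-memory illustration under simplifying assumptions: reads are plain strings, k-mers are counted exactly as they appear (no reverse-complement canonicalization), and quality-weighted q-mer counting is not modeled. On Spark, the same logic maps naturally onto `flatMap` followed by `reduceByKey`.

```scala
object KmerCountSketch {
  // Count every length-k substring of every read.
  def countKmers(reads: Seq[String], k: Int): Map[String, Long] = {
    require(k > 0, "k must be positive")
    reads.iterator
      .flatMap(_.sliding(k))  // a read shorter than k yields one short window...
      .filter(_.length == k)  // ...which is dropped here
      .foldLeft(Map.empty[String, Long]) { (counts, kmer) =>
        counts.updated(kmer, counts.getOrElse(kmer, 0L) + 1L)
      }
  }
}
```

For example, with `k = 2` the reads `ACGT` and `CGTA` yield `AC` once, `CG` twice, `GT` twice, and `TA` once.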


[Genomic Data Processing](./genomic-processing.md)
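`reads2coverage` derives coverage from aligned reads. A common way to compute per-position depth is a sweep over read start and end events; the sketch below does this for an in-memory list and merges adjacent positions of equal depth into runs. The `Alignment` and `CoverageRun` types are hypothetical, and coordinates are assumed to be 0-based and half-open (end exclusive).

```scala
// Hypothetical record types; coordinates are 0-based, end-exclusive.
final case class Alignment(contig: String, start: Long, end: Long)
final case class CoverageRun(contig: String, start: Long, end: Long, depth: Int)

object CoverageSketch {
  def coverage(reads: Seq[Alignment]): Seq[CoverageRun] =
    reads.groupBy(_.contig).toSeq.sortBy(_._1).flatMap { case (contig, rs) =>
      // Net depth change at each position: +1 per read start, -1 per read end.
      // Positions whose changes cancel out are dropped so equal-depth runs merge.
      val events: Vector[(Long, Int)] = rs
        .flatMap(r => Seq(r.start -> 1, r.end -> -1))
        .groupBy(_._1)
        .map { case (pos, deltas) => pos -> deltas.map(_._2).sum }
        .filter(_._2 != 0)
        .toVector
        .sortBy(_._1)
      val positions = events.map(_._1)
      // Running depth after applying each event in position order.
      val depths = events.map(_._2).scanLeft(0)(_ + _).tail
      // Consecutive event positions bound a run of constant depth; zero-depth gaps are skipped.
      positions.zip(positions.drop(1)).zip(depths).collect {
        case ((start, end), depth) if depth > 0 => CoverageRun(contig, start, end, depth)
      }
    }
}
```

Two overlapping reads at `[0, 10)` and `[5, 15)` produce depth 1 on `[0, 5)`, depth 2 on `[5, 10)`, and depth 1 on `[10, 15)`.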


### Format Conversion


Comprehensive format conversion utilities for transforming between various genomic file formats and ADAM's optimized Parquet format.


```scala { .api }
// FASTA conversions
object Fasta2ADAM extends BDGCommandCompanion {
  val commandName = "fasta2adam"
  val commandDescription = "Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences"
}
object ADAM2Fasta extends BDGCommandCompanion {
  val commandName = "adam2fasta"
  val commandDescription = "Convert ADAM nucleotide contig fragments to FASTA files"
}

// FASTQ conversions
object ADAM2Fastq extends BDGCommandCompanion {
  val commandName = "adam2fastq"
  val commandDescription = "Convert BAM to FASTQ files"
}
```
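`fasta2adam` and `adam2fasta` convert between FASTA text and ADAM's contig records. The text-handling half of that round trip is sketched below: parsing FASTA into records and writing them back with fixed-width sequence lines. The `Contig` type, the 60-column default, and the header split (first whitespace-delimited token as the name, the rest as a description) are assumptions; the sketch does not write Parquet or model how ADAM stores sequences as contig fragments.

```scala
// Hypothetical in-memory record; not ADAM's contig fragment schema.
final case class Contig(name: String, description: Option[String], sequence: String)

object FastaSketch {
  def parse(text: String): Vector[Contig] = {
    val records = Vector.newBuilder[Contig]
    var header: Option[String] = None
    val sequence = new StringBuilder

    // Emit the record accumulated so far, if any.
    def flush(): Unit = header.foreach { h =>
      val parts = h.split("\\s+", 2)
      val description = if (parts.length > 1) Some(parts(1)) else None
      records += Contig(parts(0), description, sequence.toString)
    }

    for (line <- text.linesIterator.map(_.trim) if line.nonEmpty) {
      if (line.startsWith(">")) {
        flush()
        header = Some(line.drop(1))
        sequence.clear()
      } else {
        sequence.append(line) // multi-line sequences are concatenated
      }
    }
    flush()
    records.result()
  }

  // Write each contig as a header line followed by sequence lines of at most lineWidth bases.
  def write(contigs: Seq[Contig], lineWidth: Int = 60): String = {
    require(lineWidth > 0, "lineWidth must be positive")
    contigs.map { c =>
      val headerLine = ">" + c.name + c.description.map(" " + _).getOrElse("")
      (headerLine +: c.sequence.grouped(lineWidth).toSeq).mkString("", "\n", "\n")
    }.mkString
  }
}
```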


[Format Conversion](./format-conversion.md)
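`adam2fastq` writes reads back out as FASTQ. Each FASTQ record is four lines: `@` plus the read name, the bases, a `+` separator, and one quality character per base. The sketch below formats records and routes paired reads to two outputs; the `FastqRead` type and the first-/second-of-pair split are assumptions about how such a conversion might be organized, not a description of ADAM's implementation.

```scala
// Hypothetical read type; qualities are assumed to be already encoded as FASTQ characters.
final case class FastqRead(name: String, bases: String, qualities: String, firstOfPair: Boolean)

object FastqSketch {
  // One four-line FASTQ record.
  def toRecord(r: FastqRead): String = {
    require(r.bases.length == r.qualities.length, s"bases/qualities length mismatch for ${r.name}")
    s"@${r.name}\n${r.bases}\n+\n${r.qualities}\n"
  }

  // Route first-of-pair reads to one output and second-of-pair reads to the other.
  def splitPairs(reads: Seq[FastqRead]): (String, String) = {
    val (first, second) = reads.partition(_.firstOfPair)
    (first.map(toRecord).mkString, second.map(toRecord).mkString)
  }
}
```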


### Data Inspection and Analysis


Tools for viewing, analyzing, and generating statistics from genomic datasets, providing samtools-like functionality with distributed processing capabilities.


```scala { .api }
// Data viewing and filtering
object View extends BDGCommandCompanion {
  val commandName = "view"
  val commandDescription = "View certain reads from an alignment-record file."
}
object PrintADAM extends BDGCommandCompanion {
  val commandName = "print"
  val commandDescription = "Print an ADAM formatted file"
}

// Statistics and analysis
object FlagStat extends BDGCommandCompanion {
  val commandName = "flagstat"
  val commandDescription = "Print statistics on reads in an ADAM file (similar to samtools flagstat)"
}
```
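`flagstat` is described as similar to `samtools flagstat`, which tallies reads by flag properties and reports QC-passed and QC-failed reads separately. The sketch below shows that tallying over raw SAM FLAG integers. The bit values come from the SAM specification; the `Counts` fields are a hypothetical subset of a full report, and ADAM itself works on alignment records rather than raw flag integers.

```scala
object FlagStatSketch {
  // SAM FLAG bits (from the SAM specification).
  val Paired = 0x1
  val ProperPair = 0x2
  val Unmapped = 0x4
  val FirstOfPair = 0x40
  val SecondOfPair = 0x80
  val QcFail = 0x200
  val Duplicate = 0x400

  final case class Counts(total: Long, mapped: Long, paired: Long, properlyPaired: Long,
                          read1: Long, read2: Long, duplicates: Long)

  private def isSet(flag: Int, bit: Int): Boolean = (flag & bit) != 0

  private def tally(flags: Seq[Int]): Counts = {
    def count(p: Int => Boolean): Long = flags.count(p).toLong
    Counts(
      total = flags.size.toLong,
      mapped = count(f => !isSet(f, Unmapped)),
      paired = count(isSet(_, Paired)),
      properlyPaired = count(f => isSet(f, Paired) && isSet(f, ProperPair)),
      read1 = count(isSet(_, FirstOfPair)),
      read2 = count(isSet(_, SecondOfPair)),
      duplicates = count(isSet(_, Duplicate))
    )
  }

  // Returns (QC-passed counts, QC-failed counts).
  def flagstat(flags: Seq[Int]): (Counts, Counts) = {
    val (failed, passed) = flags.partition(isSet(_, QcFail))
    (tally(passed), tally(failed))
  }
}
```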


[Data Inspection](./data-inspection.md)
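Selecting "certain reads" by alignment properties often comes down to flag masks. The sketch below uses the include/exclude semantics of `samtools view -f`/`-F`: keep a read only if all required bits are set and none of the excluded bits are. This document does not list the options `view` accepts, so the object and parameter names here are hypothetical.

```scala
object ViewFilterSketch {
  // Keep a flag when every required bit is set and no excluded bit is set
  // (the semantics of samtools view -f / -F).
  def keep(flag: Int, requiredBits: Int, excludedBits: Int): Boolean =
    (flag & requiredBits) == requiredBits && (flag & excludedBits) == 0

  def filter(flags: Seq[Int], requiredBits: Int = 0, excludedBits: Int = 0): Seq[Int] =
    flags.filter(keep(_, requiredBits, excludedBits))
}
```

For example, `excludedBits = 0x4` (the unmapped bit) keeps only mapped reads.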


## Common Types and Patterns


### Command Pattern


All ADAM CLI commands follow a consistent architectural pattern:


```scala { .api }
// Command companion object
trait BDGCommandCompanion {
  val commandName: String
  val commandDescription: String
  def apply(cmdLine: Array[String]): BDGCommand
}

// Command arguments base class
class Args4jBase extends Logging with Serializable {
  @Args4jOption(required = false, name = "-print_metrics", usage = "Print metrics to the log on completion")
  var printMetrics = false
}

// Common argument mixins
trait ParquetArgs {
  @Args4jOption(required = false, name = "-parquet_compression", usage = "Parquet compression codec")
  var compressionCodec: String = "GZIP"

  @Args4jOption(required = false, name = "-parquet_block_size", usage = "Parquet block size (default: 128mb)")
  var blockSize: Int = 128 * 1024 * 1024

  @Args4jOption(required = false, name = "-parquet_page_size", usage = "Parquet page size (default: 1mb)")
  var pageSize: Int = 1024 * 1024
}

trait ParquetSaveArgs extends ParquetArgs {
  @Args4jOption(required = false, name = "-disable_dictionary", usage = "Disable dictionary encoding")
  var disableDictionaryEncoding = false
}

trait ADAMSaveAnyArgs {
  @Args4jOption(required = false, name = "-single", usage = "Save as single file")
  var asSingleFile = false

  @Args4jOption(required = false, name = "-defer", usage = "Defer merging single file")
  var deferMerging = false

  @Args4jOption(required = false, name = "-disable_fast_concat", usage = "Disable fast concatenation")
  var disableFastConcat = false
}

// Command execution (BDGCommand takes no type parameter, matching the companion's apply above)
abstract class BDGSparkCommand[T <: Args4jBase] extends BDGCommand {
  val companion: BDGCommandCompanion
  def run(sc: SparkContext): Unit
}
```
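To see how these pieces fit together, here is a self-contained sketch of one command: an arguments class built from a base class plus a mixin (mirroring `Args4jBase` with `ParquetArgs`), a companion that names the command and builds it, and a `run` method. ADAM delegates argument parsing to args4j annotations; the hand-written parser, the `CopyCommand` name, and the `BaseArgs`/`ParquetSizeArgs` stand-ins are hypothetical, and Spark is left out so the sketch runs on its own. The three flag names are the ones listed above.

```scala
// Hypothetical stand-ins for Args4jBase and ParquetArgs (no args4j, no Spark).
class BaseArgs { var printMetrics: Boolean = false }

trait ParquetSizeArgs {
  var blockSize: Int = 128 * 1024 * 1024
  var pageSize: Int = 1024 * 1024
}

// Arguments for one command: shared options via inheritance and mixins, plus positionals.
final class CopyArgs extends BaseArgs with ParquetSizeArgs {
  var input: String = ""
  var output: String = ""
}

object CopyArgsParser {
  // Minimal parser: exactly two positional arguments plus the optional flags above.
  def parse(argv: List[String]): CopyArgs = {
    val args = new CopyArgs
    def loop(rest: List[String], positional: List[String]): Unit = rest match {
      case "-print_metrics" :: tail => args.printMetrics = true; loop(tail, positional)
      case "-parquet_block_size" :: n :: tail => args.blockSize = n.toInt; loop(tail, positional)
      case "-parquet_page_size" :: n :: tail => args.pageSize = n.toInt; loop(tail, positional)
      case flag :: _ if flag.startsWith("-") => throw new IllegalArgumentException(s"Unknown option: $flag")
      case p :: tail => loop(tail, positional :+ p)
      case Nil =>
        require(positional.size == 2, "expected INPUT and OUTPUT")
        args.input = positional(0)
        args.output = positional(1)
    }
    loop(argv, Nil)
    args
  }
}

final class CopyCommand(val args: CopyArgs) {
  // A real command would receive a SparkContext and process the data here.
  def run(): String =
    s"copy ${args.input} -> ${args.output} (block=${args.blockSize}, page=${args.pageSize})"
}

// Companion: the name and description used for dispatch, plus a factory.
object CopyCommand {
  val commandName = "copy"
  val commandDescription = "Hypothetical example command"
  def apply(cmdLine: Array[String]): CopyCommand = new CopyCommand(CopyArgsParser.parse(cmdLine.toList))
}
```

Putting shared options in mixins means every command that extends them accepts the same flags with the same defaults, without repeating the declarations.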

217

218

### Validation Stringency

219

220

```scala { .api }
// Validation levels for input parsing
type ValidationStringency = htsjdk.samtools.ValidationStringency
// Values: STRICT, LENIENT, SILENT
```
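In htsjdk, `STRICT` fails on the first validation error, `LENIENT` logs a warning and continues, and `SILENT` continues without logging. The sketch below models those three behaviors with a local enum so that it runs without htsjdk. The `Stringency` type and the `validate` helper are hypothetical, and dropping the invalid record under `LENIENT` and `SILENT` is a choice made in this sketch; what happens to a bad record after a lenient warning depends on the command.

```scala
// Local stand-in for htsjdk.samtools.ValidationStringency.
sealed trait Stringency
object Stringency {
  case object STRICT extends Stringency
  case object LENIENT extends Stringency
  case object SILENT extends Stringency
}

object StringencySketch {
  // Returns Some(record) when it is valid. Otherwise: STRICT throws,
  // LENIENT warns and drops the record, SILENT drops it without a warning.
  def validate[A](record: A, stringency: Stringency, warn: String => Unit)
                 (isValid: A => Boolean, message: => String): Option[A] =
    if (isValid(record)) Some(record)
    else stringency match {
      case Stringency.STRICT  => throw new IllegalArgumentException(message)
      case Stringency.LENIENT => warn(message); None
      case Stringency.SILENT  => None
    }
}
```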


### Common Arguments


Most commands support these common arguments:

- **Input/Output Paths**: Source and destination paths on any Hadoop-compatible file system, such as the local file system or HDFS
- **Partitioning**: Control over how data is partitioned across the cluster, to tune parallelism and performance
- **Validation**: Stringency levels (`STRICT`, `LENIENT`, `SILENT`) for input data validation
- **Storage**: Spark storage levels for caching intermediate data
- **Format Options**: Parquet settings such as compression codec, block size, and page size


## Version Information


```scala { .api }
class About {
  def artifactId(): String
  def buildTimestamp(): String
  def commit(): String
  def hadoopVersion(): String
  def scalaVersion(): String
  def sparkVersion(): String
  def version(): String
  def isSnapshot(): Boolean
}
```


## Error Handling


ADAM CLI commands signal success or failure through exit codes and report errors as follows:

- **Exit Code 0**: Successful execution
- **Exit Code 1**: General errors (invalid arguments, file not found, etc.)
- **Spark Exceptions**: Errors raised during distributed processing, reported with full stack traces
- **Validation Errors**: Input data validation failures, reported in detail or tolerated depending on the selected validation stringency