# ADAM APIs

ADAM APIs provides Java- and Python-friendly API wrappers for the ADAM (A Distributed Alignment Mapper) genomics analysis library. This module enables scalable genomic data processing using Apache Spark's distributed computing capabilities, offering convenient wrapper classes and converters that make ADAM's core functionality accessible to Java and Python developers.

## Package Information

- **Package Name**: adam-apis_2.10
- **Package Type**: Maven
- **Language**: Scala (with Java API wrappers)
- **Installation**: Add to Maven dependencies:

```xml
<dependency>
  <groupId>org.bdgenomics.adam</groupId>
  <artifactId>adam-apis_2.10</artifactId>
  <version>0.23.0</version>
</dependency>
```
## Core Imports

```java
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.rdd.ADAMContext;
import org.apache.spark.api.java.JavaSparkContext;
import htsjdk.samtools.ValidationStringency;
```

For genomic RDD types:

```java
import org.bdgenomics.adam.rdd.read.AlignmentRecordRDD;
import org.bdgenomics.adam.rdd.contig.NucleotideContigFragmentRDD;
import org.bdgenomics.adam.rdd.fragment.FragmentRDD;
import org.bdgenomics.adam.rdd.feature.FeatureRDD;
import org.bdgenomics.adam.rdd.feature.CoverageRDD;
import org.bdgenomics.adam.rdd.variant.GenotypeRDD;
import org.bdgenomics.adam.rdd.variant.VariantRDD;
import org.bdgenomics.adam.rdd.variant.VariantContextRDD;
import org.bdgenomics.adam.util.ReferenceFile;
```

For RDD/Dataset conversions:

```java
import org.bdgenomics.adam.api.java.*;
```

For Python API support:

```java
import org.bdgenomics.adam.api.python.DataFrameConversionWrapper;
```

## Basic Usage

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.rdd.ADAMContext;
import org.bdgenomics.adam.rdd.read.AlignmentRecordRDD;

// Create Spark context
SparkConf conf = new SparkConf().setAppName("ADAM API Example");
JavaSparkContext jsc = new JavaSparkContext(conf);

// Create ADAM context
ADAMContext ac = new ADAMContext(jsc.sc());
JavaADAMContext jac = new JavaADAMContext(ac);

// Load genomic data
AlignmentRecordRDD alignments = jac.loadAlignments("sample.bam");
System.out.println("Loaded " + alignments.jrdd().count() + " alignment records");

// Load other genomic data types
jac.loadVariants("variants.vcf");
jac.loadFeatures("annotations.bed");
jac.loadContigFragments("reference.fa");
```

## Architecture

ADAM APIs is built around several key components:

- **JavaADAMContext**: Main entry point providing Java-friendly methods for loading various genomic file formats
- **RDD Converter Classes**: Function2-based converters for transforming between different genomic RDD types
- **Dataset Converter Classes**: Equivalent converters for Spark SQL Dataset operations
- **Python API Support**: DataFrame conversion wrappers for Python interoperability
- **Type Safety**: Full preservation of genomic data types and metadata through conversions

## Capabilities

### Genomic Data Loading

Core functionality for loading genomic data from various file formats into ADAM's specialized RDD types. Supports automatic format detection and validation.

```java { .api }
// Main context class
class JavaADAMContext {
    JavaADAMContext(ADAMContext ac);
    JavaSparkContext getSparkContext();

    // Load alignment data (BAM/CRAM/SAM/FASTA/FASTQ)
    AlignmentRecordRDD loadAlignments(String pathName);
    AlignmentRecordRDD loadAlignments(String pathName, ValidationStringency stringency);

    // Load reference sequences
    NucleotideContigFragmentRDD loadContigFragments(String pathName);
    ReferenceFile loadReferenceFile(String pathName);
    ReferenceFile loadReferenceFile(String pathName, Long maximumLength);

    // Load fragments (paired-end sequencing data)
    FragmentRDD loadFragments(String pathName);
    FragmentRDD loadFragments(String pathName, ValidationStringency stringency);

    // Load genomic features (annotations)
    FeatureRDD loadFeatures(String pathName);
    FeatureRDD loadFeatures(String pathName, ValidationStringency stringency);

    // Load coverage data
    CoverageRDD loadCoverage(String pathName);
    CoverageRDD loadCoverage(String pathName, ValidationStringency stringency);

    // Load variant data
    GenotypeRDD loadGenotypes(String pathName);
    GenotypeRDD loadGenotypes(String pathName, ValidationStringency stringency);
    VariantRDD loadVariants(String pathName);
    VariantRDD loadVariants(String pathName, ValidationStringency stringency);
}
```

[Genomic Data Loading](./genomic-data-loading.md)
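
As a short sketch of the stringency overloads (assuming a `jac` context set up as in Basic Usage, and a hypothetical `sample.bam` input):

```java
import htsjdk.samtools.ValidationStringency;
import org.bdgenomics.adam.rdd.read.AlignmentRecordRDD;

// LENIENT logs malformed records instead of failing the whole load,
// which is often the right choice for BAMs from older pipelines.
AlignmentRecordRDD reads = jac.loadAlignments("sample.bam", ValidationStringency.LENIENT);

// The underlying JavaRDD is available for standard Spark operations
long mapped = reads.jrdd()
    .filter(r -> r.getReadMapped())
    .count();
```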

### RDD Type Conversions

A comprehensive set of converter classes for transforming between genomic RDD types. Each converter implements Spark's Function2 interface, so it can be used directly in Spark transformations.

```java { .api }
// Base conversion interface
interface SameTypeConversion<T, U extends GenomicRDD<T, U>> extends Function2<U, RDD<T>, U> {
    U call(U v1, RDD<T> v2);
}

// Example converter classes
class ContigsToAlignmentRecordsConverter implements Function2<NucleotideContigFragmentRDD, RDD<AlignmentRecord>, AlignmentRecordRDD>;
class AlignmentRecordsToVariantsConverter implements Function2<AlignmentRecordRDD, RDD<Variant>, VariantRDD>;
class VariantsToGenotypesConverter implements Function2<VariantRDD, RDD<Genotype>, GenotypeRDD>;
```

[RDD Conversions](./rdd-conversions.md)
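
A minimal sketch of invoking one of these converters directly. The inputs are assumptions for illustration: `variants` is a loaded `VariantRDD`, and `genotypeRdd` is an `RDD<Genotype>` produced by some upstream transformation of its records:

```java
import org.bdgenomics.adam.api.java.VariantsToGenotypesConverter;
import org.bdgenomics.adam.rdd.variant.GenotypeRDD;

// Rewrap the transformed records as a GenotypeRDD; the converter carries
// the source RDD's sequence metadata over to the result.
VariantsToGenotypesConverter converter = new VariantsToGenotypesConverter();
GenotypeRDD genotypes = converter.call(variants, genotypeRdd);
```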

### Dataset Type Conversions

Spark SQL Dataset-based converters that mirror the RDD converters, trading the RDD API for Dataset operations to gain better performance and SQL integration.

```java { .api }
// Base dataset conversion traits
interface ToAlignmentRecordDatasetConversion<T extends Product, U extends GenomicDataset<?, T, U>>
    extends GenomicDatasetConversion<T, U, AlignmentRecord, AlignmentRecordRDD>;

// Example dataset converter classes
class ContigsToAlignmentRecordsDatasetConverter implements ToAlignmentRecordDatasetConversion<NucleotideContigFragment, NucleotideContigFragmentRDD>;
class VariantsToGenotypesDatasetConverter implements ToGenotypeDatasetConversion<Variant, VariantRDD>;
```

[Dataset Conversions](./dataset-conversions.md)
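
A sketch of Dataset-based conversion, assuming the dataset converters expose the same two-argument `call` shape as their RDD counterparts, with a `Dataset` in place of the `RDD`. Here `variants` (a loaded `VariantRDD`) and `genotypeDs` (a `Dataset` of ADAM's Spark SQL genotype product type, derived via Spark SQL) are hypothetical inputs:

```java
import org.bdgenomics.adam.api.java.VariantsToGenotypesDatasetConverter;
import org.bdgenomics.adam.rdd.variant.GenotypeRDD;

// Rewrap a Dataset of genotypes produced from a VariantRDD, preserving
// the source's metadata, without dropping down to the RDD API.
VariantsToGenotypesDatasetConverter converter = new VariantsToGenotypesDatasetConverter();
GenotypeRDD genotypes = converter.call(variants, genotypeDs);
```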

### Python API Support

Wrapper functionality enabling Python integration through DataFrame conversion utilities.

```java { .api }
class DataFrameConversionWrapper implements JFunction<DataFrame, DataFrame> {
    DataFrameConversionWrapper(DataFrame newDf);
    DataFrame call(DataFrame v1);
}
```

[Python Integration](./python-integration.md)
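
Reading the API shape above, the wrapper appears to hold a precomputed replacement DataFrame and return it from `call`, so it can be handed across the Py4J boundary as a `JFunction`. A sketch, with `oldDf` and `newDf` as hypothetical DataFrames:

```java
import org.bdgenomics.adam.api.python.DataFrameConversionWrapper;

// `newDf` holds a result computed on the JVM side; Python invokes call()
// through Py4J to retrieve it.
DataFrameConversionWrapper wrapper = new DataFrameConversionWrapper(newDf);
DataFrame result = wrapper.call(oldDf); // yields the wrapped newDf
```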

## Supported Genomic Data Types

- **AlignmentRecord**: Read alignments from sequencing data
- **NucleotideContigFragment**: Reference genome sequences
- **Fragment**: Paired-end sequencing fragments
- **Feature**: Genomic annotations and intervals
- **Coverage**: Coverage depth information
- **Genotype**: Sample genotype calls
- **Variant**: Genetic variations
- **VariantContext**: Rich variant information with samples and additional metadata

## Supported File Formats

- **Alignment formats**: BAM, CRAM, SAM, FASTA, FASTQ, interleaved FASTQ (.ifq)
- **Feature formats**: BED6/12, GFF3, GTF/GFF2, NarrowPeak, IntervalList
- **Variant formats**: VCF (including .vcf.gz, .vcf.bgzf, .vcf.bgz)
- **Reference formats**: FASTA, 2bit
- **Universal fallback**: Parquet + Avro for all data types

All formats support standard Hadoop compression codecs (.gz, .bz2) where applicable.

## Types

### Core RDD Types

```java { .api }
// Genomic RDD wrapper types with metadata preservation
interface GenomicRDD<T, U extends GenomicRDD<T, U>> {
    JavaRDD<T> jrdd();
    // Additional metadata methods...
}

class AlignmentRecordRDD implements GenomicRDD<AlignmentRecord, AlignmentRecordRDD> {}
class NucleotideContigFragmentRDD implements GenomicRDD<NucleotideContigFragment, NucleotideContigFragmentRDD> {}
class FragmentRDD implements GenomicRDD<Fragment, FragmentRDD> {}
class FeatureRDD implements GenomicRDD<Feature, FeatureRDD> {}
class CoverageRDD implements GenomicRDD<Coverage, CoverageRDD> {}
class GenotypeRDD implements GenomicRDD<Genotype, GenotypeRDD> {}
class VariantRDD implements GenomicRDD<Variant, VariantRDD> {}
class VariantContextRDD implements GenomicRDD<VariantContext, VariantContextRDD> {}
```

### Validation Stringency

```java { .api }
// HTSJDK validation strictness control
enum ValidationStringency {
    STRICT,  // Fail on any format violation
    LENIENT, // Warn on format issues but continue processing
    SILENT   // Ignore format violations silently
}
```

### Utility Types

```java { .api }
// Broadcastable reference sequences
class ReferenceFile {
    // Methods for efficient reference lookups across the cluster
}

// Spark integration types
class JavaSparkContext {
    // Standard Spark Java API context
}

class DataFrame {
    // Spark SQL DataFrame for Python integration
}
```