# ADAM APIs

Java and Python API bindings for the ADAM genomics analysis library. This package provides language-friendly interfaces for accessing ADAM's distributed genomic data processing capabilities from Java and Python applications, enabling integration with existing bioinformatics workflows and data science pipelines.

## Package Information

- **Package Name**: adam-apis_2.11
- **Package Type**: maven
- **Language**: Scala (with Java and Python bindings)
- **Installation**: `<dependency><groupId>org.bdgenomics.adam</groupId><artifactId>adam-apis_2.11</artifactId><version>0.23.0</version></dependency>`
- **License**: Apache-2.0

## Core Imports
13
14
Java:
15
```java
16
import org.bdgenomics.adam.api.java.JavaADAMContext;
17
import htsjdk.samtools.ValidationStringency;
18
```
19
20
Scala:
21
```scala
22
import org.bdgenomics.adam.api.java.JavaADAMContext
23
import org.bdgenomics.adam.api.java.GenomicDatasetConverters._
24
import org.bdgenomics.adam.api.java.GenomicRDDConverters._
25
import org.bdgenomics.adam.api.python.DataFrameConversionWrapper
26
```
27
28
## Basic Usage

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.rdd.ADAMContext;
import org.bdgenomics.adam.rdd.read.AlignmentRecordRDD;
import org.bdgenomics.adam.rdd.variant.VariantRDD;

// Initialize the ADAM context from an existing SparkSession (`spark`)
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
JavaADAMContext jac = new JavaADAMContext(new ADAMContext(jsc.sc()));

// Load genomic alignment data
AlignmentRecordRDD alignments = jac.loadAlignments("input.bam");

// Load variant data
VariantRDD variants = jac.loadVariants("variants.vcf");

// Access the underlying Spark context
JavaSparkContext sparkContext = jac.getSparkContext();
```

## Architecture

The adam-apis package is built around several key components that provide different levels of API access:

- **JavaADAMContext**: Primary Java entry point providing high-level data loading functions for various genomic file formats
- **Dataset Converters**: Type-safe conversion utilities for transforming between different genomic dataset types in Spark DataFrames
- **RDD Converters**: Low-level conversion utilities for transforming between different genomic RDD types
- **Python Integration**: DataFrame wrapper classes enabling Python access to ADAM's conversion capabilities through PySpark

The package supports comprehensive genomic data types including alignments, variants, genotypes, features, coverage, fragments, and reference sequences, with automatic format detection based on file extensions and support for compressed formats.

## Capabilities

### Java API

Primary Java interface for loading and working with genomic data files. Provides high-level methods for reading common genomic formats including BAM/SAM/CRAM, VCF, FASTA, FASTQ, BED, GFF, and more.

```java { .api }
class JavaADAMContext {
  JavaSparkContext getSparkContext();
  AlignmentRecordRDD loadAlignments(String pathName);
  AlignmentRecordRDD loadAlignments(String pathName, ValidationStringency stringency);
  VariantRDD loadVariants(String pathName);
  VariantRDD loadVariants(String pathName, ValidationStringency stringency);
  GenotypeRDD loadGenotypes(String pathName);
  FeatureRDD loadFeatures(String pathName);
  CoverageRDD loadCoverage(String pathName);
  FragmentRDD loadFragments(String pathName);
  NucleotideContigFragmentRDD loadContigFragments(String pathName);
  ReferenceFile loadReferenceFile(String pathName);
}
```

[Java API](./java-api.md)

### Dataset Conversions

Type-safe conversion system for transforming between different genomic dataset types using Spark DataFrames. Enables seamless interoperability between different genomic data formats within Spark workflows.

```scala { .api }
trait ToContigDatasetConversion[T, U]
trait ToCoverageDatasetConversion[T, U]
trait ToFeatureDatasetConversion[T, U]
trait ToFragmentDatasetConversion[T, U]
trait ToAlignmentRecordDatasetConversion[T, U]
trait ToGenotypeDatasetConversion[T, U]
trait ToVariantDatasetConversion[T, U]
```

[Dataset Conversions](./dataset-conversions.md)

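The traits above all share one shape: a typed function from a dataset of one genomic record type to a dataset of another. A minimal plain-Java sketch of that shape, using `List` in place of a Spark Dataset and simple `contig:start-end:count` strings in place of ADAM's Avro records (both are illustrative stand-ins, not ADAM's actual types):

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java analogue of a ToFeatureDatasetConversion: a typed function
// from a collection of coverage-like records to a collection of
// feature-like records. List<T> stands in for a Spark Dataset so the
// sketch runs without Spark; the record encoding is illustrative.
class CoverageToFeatures implements Function<List<String>, List<String>> {
    @Override
    public List<String> apply(List<String> coverageRecords) {
        // Drop the trailing ":<count>" field, keeping only contig:start-end.
        return coverageRecords.stream()
                .map(r -> r.substring(0, r.lastIndexOf(':')))
                .collect(Collectors.toList());
    }
}
```

In the real package, the per-type traits serve the same role: they fix both type parameters so a conversion can be selected statically rather than checked at runtime.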
### RDD Conversions

Low-level conversion utilities for transforming between different genomic RDD types. Provides fine-grained control over data transformations and supports all genomic data type combinations.

```scala { .api }
trait SameTypeConversion[T, U] {
  def call(v1: RDD[T], v2: RDD[U]): RDD[U]
}
```

[RDD Conversions](./rdd-conversions.md)

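Note that `call` receives both the source RDD and the already-converted target RDD and returns the target; in the same-type case this reduces to passing the second argument through. A plain-Java sketch of that contract, with `List` standing in for `RDD` so it runs without Spark:

```java
import java.util.List;

// Plain-Java analogue of SameTypeConversion[T, U]: given the original
// collection and the converted collection, return the converted one.
interface SameTypeConversion<T, U> {
    List<U> call(List<T> v1, List<U> v2);
}

// When source and target types coincide, the conversion is a pass-through
// of the already-converted second argument.
class PassThrough<T> implements SameTypeConversion<T, T> {
    @Override
    public List<T> call(List<T> v1, List<T> v2) {
        return v2;
    }
}
```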
### Python Integration

DataFrame wrapper functionality enabling Python access to ADAM's data conversion capabilities through PySpark integration.

```scala { .api }
class DataFrameConversionWrapper(newDf: DataFrame) extends JFunction[DataFrame, DataFrame] {
  def call(v1: DataFrame): DataFrame
}
```

[Python Integration](./python-integration.md)

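Judging from its constructor and signature, `DataFrameConversionWrapper` holds a replacement DataFrame and returns it from `call` regardless of the input, so a pre-built DataFrame can be handed across the Py4J boundary as a function object. A dependency-free Java sketch of that wrapper shape (with `String` standing in for `DataFrame`; the pass-back behavior is a presumption, not confirmed by the source):

```java
import java.util.function.Function;

// Sketch of the wrapper pattern: hold a pre-built value and return it
// from the function interface, ignoring the argument.
class ConversionWrapper<T> implements Function<T, T> {
    private final T newValue;

    ConversionWrapper(T newValue) {
        this.newValue = newValue;
    }

    @Override
    public T apply(T ignored) {
        return newValue;
    }
}
```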
## Supported File Formats

- **Alignment formats**: BAM, SAM, CRAM, FASTA, FASTQ, interleaved FASTQ
- **Variant formats**: VCF (including compressed .vcf.gz, .vcf.bgzf, .vcf.bgz)
- **Feature formats**: BED (BED6/12), GFF3, GTF/GFF2, NarrowPeak, IntervalList
- **Reference formats**: FASTA, 2bit compressed format
- **Compression**: All formats support standard Hadoop compression codecs (.gz, .bz2)

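The `load*` methods choose a format from the file extension (the "automatic format detection" described above), looking past any compression suffix. A simplified, hypothetical sketch of that dispatch; the class and method names below are illustrative, not ADAM's internal code:

```java
// Hypothetical sketch of extension-based format detection. ADAM's actual
// dispatch lives inside its load* methods and handles many more formats.
class FormatDetector {
    static String detect(String pathName) {
        String p = pathName.toLowerCase();
        // Strip a trailing compression suffix first, since compressed
        // inputs are detected by their inner extension (e.g. .vcf.gz).
        for (String codec : new String[] {".gz", ".bz2", ".bgz", ".bgzf"}) {
            if (p.endsWith(codec)) {
                p = p.substring(0, p.length() - codec.length());
                break;
            }
        }
        if (p.endsWith(".bam") || p.endsWith(".sam") || p.endsWith(".cram")) return "alignment";
        if (p.endsWith(".vcf")) return "variant";
        if (p.endsWith(".bed") || p.endsWith(".gff3") || p.endsWith(".gtf")) return "feature";
        if (p.endsWith(".fa") || p.endsWith(".fasta") || p.endsWith(".2bit")) return "reference";
        return "unknown";
    }
}
```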
## Genomic Data Types

```scala { .api }
// Core genomic data RDD types provided by ADAM
type AlignmentRecordRDD
type NucleotideContigFragmentRDD
type FragmentRDD
type FeatureRDD
type CoverageRDD
type GenotypeRDD
type VariantRDD
type VariantContextRDD
type ReferenceFile
```
```