# ADAM APIs

Java and Python API bindings for the ADAM genomics analysis library. This package provides language-friendly interfaces for accessing ADAM's distributed genomic data processing capabilities from Java and Python applications, enabling integration with existing bioinformatics workflows and data science pipelines.

## Package Information

- **Package Name**: adam-apis_2.11
- **Package Type**: maven
- **Language**: Scala (with Java and Python bindings)
- **Installation**: `<dependency><groupId>org.bdgenomics.adam</groupId><artifactId>adam-apis_2.11</artifactId><version>0.23.0</version></dependency>`
- **License**: Apache-2.0

## Core Imports

Java:

```java
import org.bdgenomics.adam.api.java.JavaADAMContext;
import htsjdk.samtools.ValidationStringency;
```

Scala:

```scala
import org.bdgenomics.adam.api.java.JavaADAMContext
import org.bdgenomics.adam.api.java.GenomicDatasetConverters._
import org.bdgenomics.adam.api.java.GenomicRDDConverters._
import org.bdgenomics.adam.api.python.DataFrameConversionWrapper
```

## Basic Usage

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.rdd.ADAMContext;
import org.bdgenomics.adam.rdd.read.AlignmentRecordRDD;
import org.bdgenomics.adam.rdd.variant.VariantRDD;

// Initialize the ADAM context from an existing SparkSession (`spark`)
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
JavaADAMContext jac = new JavaADAMContext(new ADAMContext(jsc.sc()));

// Load genomic alignment data
AlignmentRecordRDD alignments = jac.loadAlignments("input.bam");

// Load variant data
VariantRDD variants = jac.loadVariants("variants.vcf");

// Access the underlying Spark context
JavaSparkContext sparkContext = jac.getSparkContext();
```

## Architecture

The adam-apis package is built around several key components that provide different levels of API access:

- **JavaADAMContext**: Primary Java entry point providing high-level data loading functions for various genomic file formats
- **Dataset Converters**: Type-safe conversion utilities for transforming between different genomic dataset types in Spark DataFrames
- **RDD Converters**: Low-level conversion utilities for transforming between different genomic RDD types
- **Python Integration**: DataFrame wrapper classes enabling Python access to ADAM's conversion capabilities through PySpark

The package supports a comprehensive set of genomic data types (alignments, variants, genotypes, features, coverage, fragments, and reference sequences), with automatic format detection based on file extensions and support for compressed formats.

## Capabilities

### Java API

Primary Java interface for loading and working with genomic data files. Provides high-level methods for reading common genomic formats, including BAM/SAM/CRAM, VCF, FASTA, FASTQ, BED, GFF, and more.

```java { .api }
class JavaADAMContext {
  JavaSparkContext getSparkContext();
  AlignmentRecordRDD loadAlignments(String pathName);
  AlignmentRecordRDD loadAlignments(String pathName, ValidationStringency stringency);
  VariantRDD loadVariants(String pathName);
  VariantRDD loadVariants(String pathName, ValidationStringency stringency);
  GenotypeRDD loadGenotypes(String pathName);
  FeatureRDD loadFeatures(String pathName);
  CoverageRDD loadCoverage(String pathName);
  FragmentRDD loadFragments(String pathName);
  NucleotideContigFragmentRDD loadContigFragments(String pathName);
  ReferenceFile loadReferenceFile(String pathName);
}
```

[Java API](./java-api.md)
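The load methods above pick a reader based on the input path's extension. As a rough, stdlib-only illustration of that kind of extension-based dispatch (the real detection lives inside ADAM; `detectAlignmentFormat` is a hypothetical helper, not part of the API):

```java
import java.util.Locale;

public class FormatDetectionSketch {
    // Hypothetical sketch of extension-based dispatch, loosely modeled on
    // how loadAlignments routes BAM/SAM/CRAM/FASTA/FASTQ inputs.
    static String detectAlignmentFormat(String pathName) {
        String p = pathName.toLowerCase(Locale.ROOT);
        if (p.endsWith(".bam")) return "BAM";
        if (p.endsWith(".sam")) return "SAM";
        if (p.endsWith(".cram")) return "CRAM";
        if (p.endsWith(".fa") || p.endsWith(".fasta")) return "FASTA";
        if (p.endsWith(".fq") || p.endsWith(".fastq")) return "FASTQ";
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(detectAlignmentFormat("input.bam"));   // BAM
        System.out.println(detectAlignmentFormat("reads.fastq")); // FASTQ
    }
}
```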

### Dataset Conversions

Type-safe conversion system for transforming between different genomic dataset types using Spark DataFrames. Enables seamless interoperability between different genomic data formats within Spark workflows.

```scala { .api }
trait ToContigDatasetConversion[T, U]
trait ToCoverageDatasetConversion[T, U]
trait ToFeatureDatasetConversion[T, U]
trait ToFragmentDatasetConversion[T, U]
trait ToAlignmentRecordDatasetConversion[T, U]
trait ToGenotypeDatasetConversion[T, U]
trait ToVariantDatasetConversion[T, U]
```

[Dataset Conversions](./dataset-conversions.md)
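The conversion traits all share one shape: take a source collection of one genomic type and produce a collection of the destination type. A minimal, stdlib-only Java sketch of that shape (the names and the toy `List` stand-in for a Spark Dataset are illustrative, not ADAM's):

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

public class DatasetConversionSketch {
    // Toy conversion contract: given the source records and the destination
    // collection, produce the destination records. ADAM's converters do this
    // over Spark Datasets of Avro-backed genomic records.
    interface Conversion<T, U> extends BiFunction<List<T>, List<U>, List<U>> {}

    public static void main(String[] args) {
        // Illustrative "coverage depth to feature label" conversion.
        Conversion<Integer, String> toFeature =
            (source, target) -> source.stream()
                                      .map(depth -> "coverage=" + depth)
                                      .collect(Collectors.toList());
        List<String> features = toFeature.apply(List.of(3, 7), List.of());
        System.out.println(features); // [coverage=3, coverage=7]
    }
}
```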

### RDD Conversions

Low-level conversion utilities for transforming between different genomic RDD types. Provides fine-grained control over data transformations and supports all genomic data type combinations.

```scala { .api }
trait SameTypeConversion[T, U] {
  def call(v1: RDD[T], v2: RDD[U]): RDD[U]
}
```

[RDD Conversions](./rdd-conversions.md)
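The `call(v1, v2)` signature mirrors a two-argument Spark function: it receives the source RDD and the destination RDD and returns the destination type. For a same-type conversion, one plausible reading of the contract is that the second argument passes through unchanged. A stdlib-only sketch of that reading, with `List` standing in for `RDD`:

```java
import java.util.List;

public class SameTypeConversionSketch {
    // Toy mirror of SameTypeConversion's shape: two inputs, and the result
    // has the type of the second one.
    interface SameTypeConversion<T, U> {
        List<U> call(List<T> v1, List<U> v2);
    }

    public static void main(String[] args) {
        // For same-type pairs, pass the already-converted collection through.
        SameTypeConversion<String, String> passThrough = (v1, v2) -> v2;
        List<String> out = passThrough.call(List.of("chr1"), List.of("chr1", "chr2"));
        System.out.println(out); // [chr1, chr2]
    }
}
```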

### Python Integration

DataFrame wrapper functionality enabling Python access to ADAM's data conversion capabilities through PySpark integration.

```scala { .api }
class DataFrameConversionWrapper(newDf: DataFrame) extends JFunction[DataFrame, DataFrame] {
  def call(v1: DataFrame): DataFrame
}
```

[Python Integration](./python-integration.md)
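From its signature, `DataFrameConversionWrapper` captures a replacement DataFrame at construction time and exposes it through a JVM `Function` interface, which is how a DataFrame converted on the Python side can be handed back to ADAM's JVM machinery. A stdlib-only sketch of that capture-and-return pattern (with `String` standing in for `DataFrame`; the behavior of ignoring `v1` is an assumption drawn from the signature):

```java
import java.util.function.Function;

public class ConversionWrapperSketch {
    // Toy mirror of DataFrameConversionWrapper: store the replacement value
    // up front; apply() returns it regardless of the input.
    static final class Wrapper implements Function<String, String> {
        private final String newDf;
        Wrapper(String newDf) { this.newDf = newDf; }
        @Override public String apply(String v1) { return newDf; }
    }

    public static void main(String[] args) {
        Wrapper w = new Wrapper("convertedFrame");
        System.out.println(w.apply("originalFrame")); // convertedFrame
    }
}
```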

## Supported File Formats

- **Alignment formats**: BAM, SAM, CRAM, FASTA, FASTQ, interleaved FASTQ
- **Variant formats**: VCF (including compressed .vcf.gz, .vcf.bgzf, .vcf.bgz)
- **Feature formats**: BED (BED6/12), GFF3, GTF/GFF2, NarrowPeak, IntervalList
- **Reference formats**: FASTA, 2bit compressed format
- **Compression**: All formats support standard Hadoop compression codecs (.gz, .bz2)
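Compressed inputs layer a codec suffix on top of the base format extension (e.g. `variants.vcf.gz`). A hypothetical sketch of stripping the codec suffix before inspecting the underlying extension (not ADAM's actual code):

```java
public class CodecSuffixSketch {
    // Strip a trailing compression-codec suffix, if present, so the
    // underlying format extension can be inspected.
    static String stripCodecSuffix(String pathName) {
        for (String codec : new String[] {".gz", ".bz2"}) {
            if (pathName.endsWith(codec)) {
                return pathName.substring(0, pathName.length() - codec.length());
            }
        }
        return pathName;
    }

    public static void main(String[] args) {
        System.out.println(stripCodecSuffix("variants.vcf.gz"));  // variants.vcf
        System.out.println(stripCodecSuffix("features.bed.bz2")); // features.bed
    }
}
```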

## Genomic Data Types

```scala { .api }
// Core genomic data RDD types provided by ADAM
type AlignmentRecordRDD
type NucleotideContigFragmentRDD
type FragmentRDD
type FeatureRDD
type CoverageRDD
type GenotypeRDD
type VariantRDD
type VariantContextRDD
type ReferenceFile
```