tessl/maven-org-bdgenomics-adam--adam-apis_2-11

To install, run

npx @tessl/cli install tessl/maven-org-bdgenomics-adam--adam-apis_2-11@0.23.0

ADAM APIs

Java and Python API bindings for the ADAM genomics analysis library. This package provides language-friendly interfaces for accessing ADAM's distributed genomic data processing capabilities from Java and Python applications, enabling integration with existing bioinformatics workflows and data science pipelines.

Package Information

  • Package Name: adam-apis_2.11
  • Package Type: maven
  • Language: Scala (with Java and Python bindings)
  • License: Apache-2.0
  • Installation (Maven):

    <dependency>
      <groupId>org.bdgenomics.adam</groupId>
      <artifactId>adam-apis_2.11</artifactId>
      <version>0.23.0</version>
    </dependency>

Core Imports

Java:

import org.bdgenomics.adam.api.java.JavaADAMContext;
import htsjdk.samtools.ValidationStringency;

Scala:

import org.bdgenomics.adam.api.java.JavaADAMContext
import org.bdgenomics.adam.api.java.GenomicDatasetConverters._
import org.bdgenomics.adam.api.java.GenomicRDDConverters._
import org.bdgenomics.adam.api.python.DataFrameConversionWrapper

Basic Usage

import org.apache.spark.api.java.JavaSparkContext;
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.rdd.ADAMContext;
import org.bdgenomics.adam.rdd.read.AlignmentRecordRDD;
import org.bdgenomics.adam.rdd.variant.VariantRDD;

// Initialize the ADAM context from an existing SparkSession (`spark`)
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
JavaADAMContext jac = new JavaADAMContext(new ADAMContext(jsc.sc()));

// Load genomic alignment data
AlignmentRecordRDD alignments = jac.loadAlignments("input.bam");

// Load variant data
VariantRDD variants = jac.loadVariants("variants.vcf");

// Access the underlying Spark context
JavaSparkContext sparkContext = jac.getSparkContext();

Architecture

The adam-apis package is built around several key components that provide different levels of API access:

  • JavaADAMContext: Primary Java entry point providing high-level data loading functions for various genomic file formats
  • Dataset Converters: Type-safe conversion utilities for transforming between different genomic dataset types in Spark DataFrames
  • RDD Converters: Low-level conversion utilities for transforming between different genomic RDD types
  • Python Integration: DataFrame wrapper classes enabling Python access to ADAM's conversion capabilities through PySpark

The package covers ADAM's core genomic data types, including alignments, variants, genotypes, features, coverage, fragments, and reference sequences, with automatic format detection based on file extensions and support for compressed inputs.
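
A minimal sketch of this layering, assuming the core ADAMContext lives at org.bdgenomics.adam.rdd as in the Basic Usage example above:

import org.apache.spark.SparkContext
import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.api.java.JavaADAMContext

// The Java entry point is a thin facade over the core Scala ADAMContext;
// both drive the same SparkContext, so one application can mix the APIs.
def buildJavaContext(sc: SparkContext): JavaADAMContext =
  new JavaADAMContext(new ADAMContext(sc))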

Capabilities

Java API

Primary Java interface for loading and working with genomic data files. Provides high-level methods for reading common genomic formats including BAM/SAM/CRAM, VCF, FASTA, FASTQ, BED, GFF, and more.

class JavaADAMContext {
    JavaSparkContext getSparkContext();
    AlignmentRecordRDD loadAlignments(String pathName);
    AlignmentRecordRDD loadAlignments(String pathName, ValidationStringency stringency);
    VariantRDD loadVariants(String pathName);
    VariantRDD loadVariants(String pathName, ValidationStringency stringency);
    GenotypeRDD loadGenotypes(String pathName);
    FeatureRDD loadFeatures(String pathName);
    CoverageRDD loadCoverage(String pathName);
    FragmentRDD loadFragments(String pathName);
    NucleotideContigFragmentRDD loadContigFragments(String pathName);
    ReferenceFile loadReferenceFile(String pathName);
}
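
For example, validation stringency can be relaxed when reading files with malformed records (a brief sketch; `jac` is a JavaADAMContext built as in Basic Usage):

import htsjdk.samtools.ValidationStringency
import org.bdgenomics.adam.api.java.JavaADAMContext
import org.bdgenomics.adam.rdd.read.AlignmentRecordRDD

// LENIENT logs validation failures instead of throwing; SILENT suppresses
// them entirely. STRICT fails fast on the first malformed record.
def loadLeniently(jac: JavaADAMContext, path: String): AlignmentRecordRDD =
  jac.loadAlignments(path, ValidationStringency.LENIENT)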

Dataset Conversions

Type-safe conversion system for transforming between different genomic dataset types using Spark DataFrames. Enables seamless interoperability between different genomic data formats within Spark workflows.

trait ToContigDatasetConversion[T, U]
trait ToCoverageDatasetConversion[T, U]  
trait ToFeatureDatasetConversion[T, U]
trait ToFragmentDatasetConversion[T, U]
trait ToAlignmentRecordDatasetConversion[T, U]
trait ToGenotypeDatasetConversion[T, U]
trait ToVariantDatasetConversion[T, U]
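
The members of these traits are not reproduced here, so the sketch below only restates a plausible shape locally, as a hypothetical Dataset-level analogue of the RDD-level SameTypeConversion in the next section, rather than implementing the real traits:

import org.apache.spark.sql.Dataset

// Hypothetical shape only; the actual To*DatasetConversion members may differ.
// Illustrates the idea: a typed conversion that receives the source and the
// transformed Dataset and returns the latter with the target genomic type.
trait DatasetConversionSketch[T, U] {
  def call(v1: Dataset[T], v2: Dataset[U]): Dataset[U]
}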

RDD Conversions

Low-level conversion utilities for transforming between different genomic RDD types. Provides fine-grained control over data transformations and supports all genomic data type combinations.

trait SameTypeConversion[T, U] {
    def call(v1: RDD[T], v2: RDD[U]): RDD[U]
}
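
With the call signature given above, a concrete conversion is straightforward; the identity conversion below returns the already-transformed RDD unchanged (illustrative only; the import path for SameTypeConversion is assumed from the Core Imports section):

import org.apache.spark.rdd.RDD
import org.bdgenomics.adam.api.java.SameTypeConversion

// Identity conversion: ignore the source RDD (v1) and return the transformed
// RDD (v2), keeping the genomic record type the same.
class PassThroughConversion[T] extends SameTypeConversion[T, T] {
  override def call(v1: RDD[T], v2: RDD[T]): RDD[T] = v2
}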

Python Integration

DataFrame wrapper functionality enabling Python access to ADAM's data conversion capabilities through PySpark integration.

class DataFrameConversionWrapper(newDf: DataFrame) extends JFunction[DataFrame, DataFrame] {
    def call(v1: DataFrame): DataFrame
}
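
Usage reduces to capturing a replacement DataFrame and passing the wrapper wherever a java Function[DataFrame, DataFrame] is expected, e.g. across the Py4J boundary (a small sketch; it assumes call returns the captured DataFrame and ignores its argument):

import org.apache.spark.sql.DataFrame
import org.bdgenomics.adam.api.python.DataFrameConversionWrapper

// Swap in a pre-converted DataFrame: the wrapper ignores its input and
// hands back the DataFrame captured at construction time.
def replaceUnderlying(original: DataFrame, converted: DataFrame): DataFrame = {
  val wrapper = new DataFrameConversionWrapper(converted)
  wrapper.call(original) // assumed to return `converted`
}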

Supported File Formats

  • Alignment formats: BAM, SAM, CRAM, FASTA, FASTQ, interleaved FASTQ
  • Variant formats: VCF (including compressed .vcf.gz, .vcf.bgzf, .vcf.bgz)
  • Feature formats: BED (BED6/12), GFF3, GTF/GFF2, NarrowPeak, IntervalList
  • Reference formats: FASTA, 2bit compressed format
  • Compression: Text-based formats support standard Hadoop compression codecs (.gz, .bz2)
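
Because detection is extension-driven, the same load calls handle plain and compressed files alike (a sketch; `jac` is the JavaADAMContext from Basic Usage and the file names are illustrative):

import org.bdgenomics.adam.api.java.JavaADAMContext

// The loader infers both the format and the compression codec from the
// file extension; no format flag is needed.
def loadByExtension(jac: JavaADAMContext): Unit = {
  val reads    = jac.loadAlignments("sample.cram")    // CRAM alignments
  val variants = jac.loadVariants("calls.vcf.bgz")    // block-gzipped VCF
  val peaks    = jac.loadFeatures("peaks.narrowPeak") // NarrowPeak features
}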

Genomic Data Types

// Core genomic data types provided by ADAM (RDD wrappers plus ReferenceFile)
type AlignmentRecordRDD
type NucleotideContigFragmentRDD  
type FragmentRDD
type FeatureRDD
type CoverageRDD
type GenotypeRDD
type VariantRDD
type VariantContextRDD
type ReferenceFile
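
Each JavaADAMContext loader returns the corresponding type above; a few representative pairings are sketched below (package locations for the RDD types are assumed to follow the org.bdgenomics.adam.rdd.* layout used in Basic Usage):

import org.bdgenomics.adam.api.java.JavaADAMContext
import org.bdgenomics.adam.rdd.read.AlignmentRecordRDD
import org.bdgenomics.adam.rdd.variant.GenotypeRDD
import org.bdgenomics.adam.rdd.feature.FeatureRDD

// Loader-to-type pairing for a few representative formats.
def typedLoads(jac: JavaADAMContext): Unit = {
  val reads: AlignmentRecordRDD = jac.loadAlignments("reads.bam")
  val genotypes: GenotypeRDD    = jac.loadGenotypes("calls.vcf")
  val features: FeatureRDD      = jac.loadFeatures("genes.gff3")
}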