Java/Python API wrappers for ADAM genomics analysis library enabling scalable genomic data processing with Apache Spark
The ADAM APIs module provides Dataset-based converters that parallel the RDD converters but operate on Spark SQL Datasets, so conversions can be combined with SQL-style queries and Catalyst query optimization. Each converter pairs the metadata of a source genomic dataset with a typed Dataset of target records, which keeps the transformation type-safe.
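The converter classes listed in this section follow a `<Source>To<Target>DatasetConverter` naming pattern over seven record types. As a rough aid for finding the right class, the sketch below enumerates those names in plain Java. `ConverterNames` is a hypothetical helper, not part of ADAM, and it only builds strings; it does not check that a class exists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ADAM: derives converter class names
// from the plural labels used in the class names documented in this section.
final class ConverterNames {
    // Plural labels as they appear in the converter class names.
    static final String[] LABELS = {
        "Contigs", "Coverage", "Features", "Fragments",
        "AlignmentRecords", "Genotypes", "Variants"
    };

    // Builds "<Source>To<Target>DatasetConverter" for every ordered pair
    // of distinct labels (a type is never converted to itself).
    static List<String> all() {
        List<String> names = new ArrayList<>();
        for (String source : LABELS) {
            for (String target : LABELS) {
                if (!source.equals(target)) {
                    names.add(source + "To" + target + "DatasetConverter");
                }
            }
        }
        return names;
    }
}
```

Seven labels and six possible targets per source give 42 names, for example `AlignmentRecordsToCoverageDatasetConverter`.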
Foundation traits that define the interface for Dataset-based genomic data conversions.
/**
* Base trait for conversions to contig fragment datasets
* @param <T> Source record type
* @param <U> Source genomic dataset type
*/
interface ToContigDatasetConversion<T extends Product, U extends GenomicDataset<?, T, U>>
extends GenomicDatasetConversion<T, U, NucleotideContigFragment, NucleotideContigFragmentRDD> {
TypeTag<NucleotideContigFragment> xTag = typeTag[NucleotideContigFragment];
}
/**
* Base trait for conversions to coverage datasets
*/
interface ToCoverageDatasetConversion<T extends Product, U extends GenomicDataset<?, T, U>>
extends GenomicDatasetConversion<T, U, Coverage, CoverageRDD> {
TypeTag<Coverage> xTag = typeTag[Coverage];
}
/**
* Base trait for conversions to feature datasets
*/
interface ToFeatureDatasetConversion<T extends Product, U extends GenomicDataset<?, T, U>>
extends GenomicDatasetConversion<T, U, Feature, FeatureRDD> {
TypeTag<Feature> xTag = typeTag[Feature];
}
/**
* Base trait for conversions to fragment datasets
*/
interface ToFragmentDatasetConversion<T extends Product, U extends GenomicDataset<?, T, U>>
extends GenomicDatasetConversion<T, U, Fragment, FragmentRDD> {
TypeTag<Fragment> xTag = typeTag[Fragment];
}
/**
* Base trait for conversions to alignment record datasets
*/
interface ToAlignmentRecordDatasetConversion<T extends Product, U extends GenomicDataset<?, T, U>>
extends GenomicDatasetConversion<T, U, AlignmentRecord, AlignmentRecordRDD> {
TypeTag<AlignmentRecord> xTag = typeTag[AlignmentRecord];
}
/**
* Base trait for conversions to genotype datasets
*/
interface ToGenotypeDatasetConversion<T extends Product, U extends GenomicDataset<?, T, U>>
extends GenomicDatasetConversion<T, U, Genotype, GenotypeRDD> {
TypeTag<Genotype> xTag = typeTag[Genotype];
}
/**
* Base trait for conversions to variant datasets
*/
interface ToVariantDatasetConversion<T extends Product, U extends GenomicDataset<?, T, U>>
extends GenomicDatasetConversion<T, U, Variant, VariantRDD> {
TypeTag<Variant> xTag = typeTag[Variant];
}
Note: VariantContext dataset conversions are not currently supported in the Dataset converter API. Use RDD converters for VariantContext transformations.
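Each trait above pins the target record and dataset types while leaving the source types generic. The following is a minimal, library-free sketch of that pattern. Every type in it (`SketchDataset`, `SketchConversion`, `ToSketchCoverageConversion`, and so on) is a hypothetical stand-in rather than an ADAM or Spark class, and a `Class` constant stands in for the `xTag` type tag.

```java
import java.util.List;

// Every type here is a hypothetical stand-in; none of them is an ADAM class.

// Minimal stand-in for a genomic dataset wrapper holding typed records.
class SketchDataset<T> {
    final List<T> records;
    SketchDataset(List<T> records) { this.records = records; }
}

// Stand-ins for two record types and their dataset wrappers.
final class SketchFeature {
    final String name;
    SketchFeature(String name) { this.name = name; }
}

final class SketchCoverage {
    final double count;
    SketchCoverage(double count) { this.count = count; }
}

final class SketchFeatureDataset extends SketchDataset<SketchFeature> {
    SketchFeatureDataset(List<SketchFeature> records) { super(records); }
}

final class SketchCoverageDataset extends SketchDataset<SketchCoverage> {
    SketchCoverageDataset(List<SketchCoverage> records) { super(records); }
}

// General conversion: source wrapper U (records of T) plus target records X give wrapper Y.
interface SketchConversion<T, U extends SketchDataset<T>, X, Y extends SketchDataset<X>> {
    Y call(U source, List<X> targetRecords);
}

// Analogue of a "to coverage" base trait: pins X and Y, leaves T and U open.
interface ToSketchCoverageConversion<T, U extends SketchDataset<T>>
        extends SketchConversion<T, U, SketchCoverage, SketchCoverageDataset> {
    // Plays the role of xTag: a runtime handle on the target record type.
    Class<SketchCoverage> TARGET_TYPE = SketchCoverage.class;
}

// A concrete converter only has to name its source types.
final class FeaturesToCoverageSketch
        implements ToSketchCoverageConversion<SketchFeature, SketchFeatureDataset> {
    @Override
    public SketchCoverageDataset call(SketchFeatureDataset source,
                                      List<SketchCoverage> targetRecords) {
        return new SketchCoverageDataset(targetRecords);
    }
}
```

Because the target side is fixed in the intermediate interface, passing a list of any other record type to `FeaturesToCoverageSketch.call` is rejected at compile time.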
Convert nucleotide contig fragments using Dataset operations for better SQL integration.
/**
* Convert NucleotideContigFragmentRDD with Coverage Dataset to CoverageRDD
*/
class ContigsToCoverageDatasetConverter extends ToCoverageDatasetConversion<NucleotideContigFragment, NucleotideContigFragmentRDD> {
/**
* Perform the dataset-based conversion
* @param v1 Source NucleotideContigFragmentRDD with metadata
* @param v2 Target Dataset[Coverage] with structured data
* @return CoverageRDD with combined metadata and data
*/
CoverageRDD call(NucleotideContigFragmentRDD v1, Dataset<Coverage> v2);
}
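The `call` contract documented above takes dataset-level metadata from `v1` and the records from `v2`. Below is a minimal sketch of that contract in plain Java. `SketchGenomicData`, `SketchContig`, `SketchCoverageRecord`, and the `sequenceNames` field are hypothetical stand-ins rather than ADAM or Spark types, and which metadata the real converters carry over is an assumption here.

```java
import java.util.List;

// Hypothetical stand-ins; these are not ADAM classes.

// A dataset wrapper: metadata (here just reference sequence names) plus records.
final class SketchGenomicData<T> {
    final List<String> sequenceNames; // stand-in for dataset-level metadata
    final List<T> records;

    SketchGenomicData(List<String> sequenceNames, List<T> records) {
        this.sequenceNames = sequenceNames;
        this.records = records;
    }
}

// Stand-in source record type.
final class SketchContig {
    final String name;
    SketchContig(String name) { this.name = name; }
}

// Stand-in target record type.
final class SketchCoverageRecord {
    final String contigName;
    final long start;
    final double count;

    SketchCoverageRecord(String contigName, long start, double count) {
        this.contigName = contigName;
        this.start = start;
        this.count = count;
    }
}

final class ContigsToCoverageSketch {
    // Mirrors call(v1, v2): metadata comes from v1, records come from v2.
    SketchGenomicData<SketchCoverageRecord> call(SketchGenomicData<SketchContig> v1,
                                                 List<SketchCoverageRecord> v2) {
        return new SketchGenomicData<>(v1.sequenceNames, v2);
    }
}
```

In this model the source records in `v1` are ignored entirely; only its metadata survives into the result.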
/**
* Convert NucleotideContigFragmentRDD with Feature Dataset to FeatureRDD
*/
class ContigsToFeaturesDatasetConverter extends ToFeatureDatasetConversion<NucleotideContigFragment, NucleotideContigFragmentRDD> {
FeatureRDD call(NucleotideContigFragmentRDD v1, Dataset<Feature> v2);
}
/**
* Convert NucleotideContigFragmentRDD with Fragment Dataset to FragmentRDD
*/
class ContigsToFragmentsDatasetConverter extends ToFragmentDatasetConversion<NucleotideContigFragment, NucleotideContigFragmentRDD> {
FragmentRDD call(NucleotideContigFragmentRDD v1, Dataset<Fragment> v2);
}
/**
* Convert NucleotideContigFragmentRDD with AlignmentRecord Dataset to AlignmentRecordRDD
*/
class ContigsToAlignmentRecordsDatasetConverter extends ToAlignmentRecordDatasetConversion<NucleotideContigFragment, NucleotideContigFragmentRDD> {
AlignmentRecordRDD call(NucleotideContigFragmentRDD v1, Dataset<AlignmentRecord> v2);
}
/**
* Convert NucleotideContigFragmentRDD with Genotype Dataset to GenotypeRDD
*/
class ContigsToGenotypesDatasetConverter extends ToGenotypeDatasetConversion<NucleotideContigFragment, NucleotideContigFragmentRDD> {
GenotypeRDD call(NucleotideContigFragmentRDD v1, Dataset<Genotype> v2);
}
/**
* Convert NucleotideContigFragmentRDD with Variant Dataset to VariantRDD
*/
class ContigsToVariantsDatasetConverter extends ToVariantDatasetConversion<NucleotideContigFragment, NucleotideContigFragmentRDD> {
VariantRDD call(NucleotideContigFragmentRDD v1, Dataset<Variant> v2);
}
Convert coverage data using Dataset operations for optimized query execution.
/**
* Convert CoverageRDD with NucleotideContigFragment Dataset to NucleotideContigFragmentRDD
*/
class CoverageToContigsDatasetConverter extends ToContigDatasetConversion<Coverage, CoverageRDD> {
NucleotideContigFragmentRDD call(CoverageRDD v1, Dataset<NucleotideContigFragment> v2);
}
/**
* Convert CoverageRDD with Feature Dataset to FeatureRDD
*/
class CoverageToFeaturesDatasetConverter extends ToFeatureDatasetConversion<Coverage, CoverageRDD> {
FeatureRDD call(CoverageRDD v1, Dataset<Feature> v2);
}
/**
* Convert CoverageRDD with Fragment Dataset to FragmentRDD
*/
class CoverageToFragmentsDatasetConverter extends ToFragmentDatasetConversion<Coverage, CoverageRDD> {
FragmentRDD call(CoverageRDD v1, Dataset<Fragment> v2);
}
/**
* Convert CoverageRDD with AlignmentRecord Dataset to AlignmentRecordRDD
*/
class CoverageToAlignmentRecordsDatasetConverter extends ToAlignmentRecordDatasetConversion<Coverage, CoverageRDD> {
AlignmentRecordRDD call(CoverageRDD v1, Dataset<AlignmentRecord> v2);
}
/**
* Convert CoverageRDD with Genotype Dataset to GenotypeRDD
*/
class CoverageToGenotypesDatasetConverter extends ToGenotypeDatasetConversion<Coverage, CoverageRDD> {
GenotypeRDD call(CoverageRDD v1, Dataset<Genotype> v2);
}
/**
* Convert CoverageRDD with Variant Dataset to VariantRDD
*/
class CoverageToVariantsDatasetConverter extends ToVariantDatasetConversion<Coverage, CoverageRDD> {
VariantRDD call(CoverageRDD v1, Dataset<Variant> v2);
}
Convert genomic feature data using Dataset operations for SQL compatibility.
/**
* Convert FeatureRDD with NucleotideContigFragment Dataset to NucleotideContigFragmentRDD
*/
class FeaturesToContigsDatasetConverter extends ToContigDatasetConversion<Feature, FeatureRDD> {
NucleotideContigFragmentRDD call(FeatureRDD v1, Dataset<NucleotideContigFragment> v2);
}
/**
* Convert FeatureRDD with Coverage Dataset to CoverageRDD
*/
class FeaturesToCoverageDatasetConverter extends ToCoverageDatasetConversion<Feature, FeatureRDD> {
CoverageRDD call(FeatureRDD v1, Dataset<Coverage> v2);
}
/**
* Convert FeatureRDD with Fragment Dataset to FragmentRDD
*/
class FeaturesToFragmentsDatasetConverter extends ToFragmentDatasetConversion<Feature, FeatureRDD> {
FragmentRDD call(FeatureRDD v1, Dataset<Fragment> v2);
}
/**
* Convert FeatureRDD with AlignmentRecord Dataset to AlignmentRecordRDD
*/
class FeaturesToAlignmentRecordsDatasetConverter extends ToAlignmentRecordDatasetConversion<Feature, FeatureRDD> {
AlignmentRecordRDD call(FeatureRDD v1, Dataset<AlignmentRecord> v2);
}
/**
* Convert FeatureRDD with Genotype Dataset to GenotypeRDD
*/
class FeaturesToGenotypesDatasetConverter extends ToGenotypeDatasetConversion<Feature, FeatureRDD> {
GenotypeRDD call(FeatureRDD v1, Dataset<Genotype> v2);
}
/**
* Convert FeatureRDD with Variant Dataset to VariantRDD
*/
class FeaturesToVariantsDatasetConverter extends ToVariantDatasetConversion<Feature, FeatureRDD> {
VariantRDD call(FeatureRDD v1, Dataset<Variant> v2);
}
Convert sequencing fragment data using Dataset operations for enhanced performance.
/**
* Convert FragmentRDD with NucleotideContigFragment Dataset to NucleotideContigFragmentRDD
*/
class FragmentsToContigsDatasetConverter extends ToContigDatasetConversion<Fragment, FragmentRDD> {
NucleotideContigFragmentRDD call(FragmentRDD v1, Dataset<NucleotideContigFragment> v2);
}
/**
* Convert FragmentRDD with Coverage Dataset to CoverageRDD
*/
class FragmentsToCoverageDatasetConverter extends ToCoverageDatasetConversion<Fragment, FragmentRDD> {
CoverageRDD call(FragmentRDD v1, Dataset<Coverage> v2);
}
/**
* Convert FragmentRDD with Feature Dataset to FeatureRDD
*/
class FragmentsToFeaturesDatasetConverter extends ToFeatureDatasetConversion<Fragment, FragmentRDD> {
FeatureRDD call(FragmentRDD v1, Dataset<Feature> v2);
}
/**
* Convert FragmentRDD with AlignmentRecord Dataset to AlignmentRecordRDD
*/
class FragmentsToAlignmentRecordsDatasetConverter extends ToAlignmentRecordDatasetConversion<Fragment, FragmentRDD> {
AlignmentRecordRDD call(FragmentRDD v1, Dataset<AlignmentRecord> v2);
}
/**
* Convert FragmentRDD with Genotype Dataset to GenotypeRDD
*/
class FragmentsToGenotypesDatasetConverter extends ToGenotypeDatasetConversion<Fragment, FragmentRDD> {
GenotypeRDD call(FragmentRDD v1, Dataset<Genotype> v2);
}
/**
* Convert FragmentRDD with Variant Dataset to VariantRDD
*/
class FragmentsToVariantsDatasetConverter extends ToVariantDatasetConversion<Fragment, FragmentRDD> {
VariantRDD call(FragmentRDD v1, Dataset<Variant> v2);
}
Convert alignment record data using Dataset operations with Catalyst optimization.
/**
* Convert AlignmentRecordRDD with NucleotideContigFragment Dataset to NucleotideContigFragmentRDD
*/
class AlignmentRecordsToContigsDatasetConverter extends ToContigDatasetConversion<AlignmentRecord, AlignmentRecordRDD> {
NucleotideContigFragmentRDD call(AlignmentRecordRDD v1, Dataset<NucleotideContigFragment> v2);
}
/**
* Convert AlignmentRecordRDD with Coverage Dataset to CoverageRDD
*/
class AlignmentRecordsToCoverageDatasetConverter extends ToCoverageDatasetConversion<AlignmentRecord, AlignmentRecordRDD> {
CoverageRDD call(AlignmentRecordRDD v1, Dataset<Coverage> v2);
}
/**
* Convert AlignmentRecordRDD with Feature Dataset to FeatureRDD
*/
class AlignmentRecordsToFeaturesDatasetConverter extends ToFeatureDatasetConversion<AlignmentRecord, AlignmentRecordRDD> {
FeatureRDD call(AlignmentRecordRDD v1, Dataset<Feature> v2);
}
/**
* Convert AlignmentRecordRDD with Fragment Dataset to FragmentRDD
*/
class AlignmentRecordsToFragmentsDatasetConverter extends ToFragmentDatasetConversion<AlignmentRecord, AlignmentRecordRDD> {
FragmentRDD call(AlignmentRecordRDD v1, Dataset<Fragment> v2);
}
/**
* Convert AlignmentRecordRDD with Genotype Dataset to GenotypeRDD
*/
class AlignmentRecordsToGenotypesDatasetConverter extends ToGenotypeDatasetConversion<AlignmentRecord, AlignmentRecordRDD> {
GenotypeRDD call(AlignmentRecordRDD v1, Dataset<Genotype> v2);
}
/**
* Convert AlignmentRecordRDD with Variant Dataset to VariantRDD
*/
class AlignmentRecordsToVariantsDatasetConverter extends ToVariantDatasetConversion<AlignmentRecord, AlignmentRecordRDD> {
VariantRDD call(AlignmentRecordRDD v1, Dataset<Variant> v2);
}
Convert genotype data using Dataset operations for optimized variant analysis.
/**
* Convert GenotypeRDD with NucleotideContigFragment Dataset to NucleotideContigFragmentRDD
*/
class GenotypesToContigsDatasetConverter extends ToContigDatasetConversion<Genotype, GenotypeRDD> {
NucleotideContigFragmentRDD call(GenotypeRDD v1, Dataset<NucleotideContigFragment> v2);
}
/**
* Convert GenotypeRDD with Coverage Dataset to CoverageRDD
*/
class GenotypesToCoverageDatasetConverter extends ToCoverageDatasetConversion<Genotype, GenotypeRDD> {
CoverageRDD call(GenotypeRDD v1, Dataset<Coverage> v2);
}
/**
* Convert GenotypeRDD with Feature Dataset to FeatureRDD
*/
class GenotypesToFeaturesDatasetConverter extends ToFeatureDatasetConversion<Genotype, GenotypeRDD> {
FeatureRDD call(GenotypeRDD v1, Dataset<Feature> v2);
}
/**
* Convert GenotypeRDD with Fragment Dataset to FragmentRDD
*/
class GenotypesToFragmentsDatasetConverter extends ToFragmentDatasetConversion<Genotype, GenotypeRDD> {
FragmentRDD call(GenotypeRDD v1, Dataset<Fragment> v2);
}
/**
* Convert GenotypeRDD with AlignmentRecord Dataset to AlignmentRecordRDD
*/
class GenotypesToAlignmentRecordsDatasetConverter extends ToAlignmentRecordDatasetConversion<Genotype, GenotypeRDD> {
AlignmentRecordRDD call(GenotypeRDD v1, Dataset<AlignmentRecord> v2);
}
/**
* Convert GenotypeRDD with Variant Dataset to VariantRDD
*/
class GenotypesToVariantsDatasetConverter extends ToVariantDatasetConversion<Genotype, GenotypeRDD> {
VariantRDD call(GenotypeRDD v1, Dataset<Variant> v2);
}
Convert variant data using Dataset operations for enhanced genomic analysis workflows.
/**
* Convert VariantRDD with NucleotideContigFragment Dataset to NucleotideContigFragmentRDD
*/
class VariantsToContigsDatasetConverter extends ToContigDatasetConversion<Variant, VariantRDD> {
NucleotideContigFragmentRDD call(VariantRDD v1, Dataset<NucleotideContigFragment> v2);
}
/**
* Convert VariantRDD with Coverage Dataset to CoverageRDD
*/
class VariantsToCoverageDatasetConverter extends ToCoverageDatasetConversion<Variant, VariantRDD> {
CoverageRDD call(VariantRDD v1, Dataset<Coverage> v2);
}
/**
* Convert VariantRDD with Feature Dataset to FeatureRDD
*/
class VariantsToFeaturesDatasetConverter extends ToFeatureDatasetConversion<Variant, VariantRDD> {
FeatureRDD call(VariantRDD v1, Dataset<Feature> v2);
}
/**
* Convert VariantRDD with Fragment Dataset to FragmentRDD
*/
class VariantsToFragmentsDatasetConverter extends ToFragmentDatasetConversion<Variant, VariantRDD> {
FragmentRDD call(VariantRDD v1, Dataset<Fragment> v2);
}
/**
* Convert VariantRDD with AlignmentRecord Dataset to AlignmentRecordRDD
*/
class VariantsToAlignmentRecordsDatasetConverter extends ToAlignmentRecordDatasetConversion<Variant, VariantRDD> {
AlignmentRecordRDD call(VariantRDD v1, Dataset<AlignmentRecord> v2);
}
/**
* Convert VariantRDD with Genotype Dataset to GenotypeRDD
*/
class VariantsToGenotypesDatasetConverter extends ToGenotypeDatasetConversion<Variant, VariantRDD> {
GenotypeRDD call(VariantRDD v1, Dataset<Genotype> v2);
}
Basic Dataset conversion with SQL integration:
import org.bdgenomics.adam.api.java.*;
import org.apache.spark.sql.Dataset;
// jac is assumed to be an existing JavaADAMContext
// Load genomic data
AlignmentRecordRDD alignments = jac.loadAlignments("sample.bam");
VariantRDD variants = jac.loadVariants("variants.vcf");
// Convert to Dataset for SQL operations
Dataset<Variant> variantDS = variants.dataset();
// Apply SQL transformations
Dataset<Variant> filteredVariants = variantDS
    .filter("start > 1000000")
    .filter("qual > 30.0");
// Convert back using Dataset converter
AlignmentRecordsToVariantsDatasetConverter converter =
    new AlignmentRecordsToVariantsDatasetConverter();
VariantRDD convertedVariants = converter.call(alignments, filteredVariants);
Performance-optimized Dataset operations:
// Load large genomic datasets
GenotypeRDD genotypes = jac.loadGenotypes("large_cohort.vcf");
FeatureRDD features = jac.loadFeatures("annotations.gtf");
// Convert to Datasets for Catalyst optimization
Dataset<Genotype> genotypeDS = genotypes.dataset();
Dataset<Feature> featureDS = features.dataset();
// Perform complex SQL-based analysis
Dataset<Feature> annotatedFeatures = featureDS
    .join(genotypeDS, "contigName")
    .where("genotype.variant.qual > 50")
    .select("feature.*");
// Convert back with preserved metadata
GenotypesToFeaturesDatasetConverter converter =
    new GenotypesToFeaturesDatasetConverter();
FeatureRDD result = converter.call(genotypes, annotatedFeatures);
Combining RDD and Dataset operations:
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Encoders;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.count;
// Start with RDD operations for complex logic
AlignmentRecordRDD alignments = jac.loadAlignments("sample.bam");
// jrdd() exposes the records as a JavaRDD, so the result is a JavaRDD as well
JavaRDD<AlignmentRecord> filteredRDD = alignments.jrdd()
    .filter(read -> read.getMapq() > 30 && read.getReadMapped());
// Convert to Dataset for SQL operations; createDataset takes the underlying RDD
Dataset<AlignmentRecord> alignmentDS = spark.createDataset(
    filteredRDD.rdd(), Encoders.bean(AlignmentRecord.class));
Dataset<Coverage> coverageDS = alignmentDS
    .groupBy("contigName", "start")
    .agg(count("*").as("count"))
    .select(col("contigName"), col("start"), col("count").as("score"));
// Convert back to genomic RDD with metadata
AlignmentRecordsToCoverageDatasetConverter converter =
    new AlignmentRecordsToCoverageDatasetConverter();
CoverageRDD coverage = converter.call(alignments, coverageDS);
Install with Tessl CLI
npx tessl i tessl/maven-org-bdgenomics-adam--adam-apis-2-10