RDD Conversions

Low-level conversion utilities for transforming between genomic RDD types. The GenomicRDDConverters module defines one converter class for every ordered pair of the eight genomic data types (contigs, coverage, features, fragments, alignment records, genotypes, variants, and variant contexts), 64 classes in total. Each data type has one same-type converter and seven cross-type converters.

Capabilities

Base Conversion Trait

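Foundation trait for same-type conversions. Every converter, same-type or cross-type, is a two-argument function: it takes a source GenomicRDD and an RDD of records, and returns a GenomicRDD wrapping those records.

/**
 * Base trait for same-type conversions: the source and the result share the
 * GenomicRDD type U, which holds records of type T.
 * Function2 is Spark's Java functional interface
 * (org.apache.spark.api.java.function.Function2).
 */
trait SameTypeConversion[T, U <: GenomicRDD[T, U]] extends Function2[U, RDD[T], U] {
    /**
     * Wrap an RDD of records in a GenomicRDD, reusing the metadata of v1.
     * @param v1 Source GenomicRDD providing schema and metadata
     * @param v2 RDD[T] holding the records for the result
     * @return GenomicRDD of type U containing the records of v2 with the metadata of v1
     */
    def call(v1: U, v2: RDD[T]): U
}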
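To make the v1/v2 contract concrete, here is a minimal sketch in plain Scala. It assumes nothing about ADAM internals: `MiniGenomic`, `replaceRecords`, and `sameTypeCall` are hypothetical stand-ins, and `Seq` takes the place of Spark's `RDD` so the sketch runs without a cluster. It only illustrates the documented behaviour that the result holds the records of `v2` with the metadata of `v1`.

```scala
// Stand-in types only: none of these names are ADAM classes.
object SameTypeSketch {
  // Hypothetical miniature of a GenomicRDD: records plus metadata.
  final case class MiniGenomic[T](records: Seq[T], sequenceNames: Seq[String]) {
    // Swap in new records while leaving the metadata untouched.
    def replaceRecords(newRecords: Seq[T]): MiniGenomic[T] =
      copy(records = newRecords)
  }

  // Mirrors the shape of SameTypeConversion.call(v1, v2):
  // v1 supplies the metadata, v2 supplies the records of the result.
  def sameTypeCall[T](v1: MiniGenomic[T], v2: Seq[T]): MiniGenomic[T] =
    v1.replaceRecords(v2)

  val source: MiniGenomic[String] =
    MiniGenomic(Seq("read1", "read2", "read3"), Seq("chr1", "chr2"))

  // The caller transforms the raw records first (here: a simple filter) ...
  val filtered: Seq[String] = source.records.filter(_ != "read2")

  // ... and the conversion reattaches the source metadata.
  val result: MiniGenomic[String] = sameTypeCall(source, filtered)

  def main(args: Array[String]): Unit = {
    println(result.records)
    println(result.sequenceNames)
  }
}
```

In this sketch the conversion itself does no per-record work; any filtering or mapping happens on the raw records before the call.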

Contig RDD Conversions

Convert between nucleotide contig fragments and other genomic data types.

/**
 * Same-type conversion for nucleotide contig fragments.
 */
class ContigsToContigsConverter
    extends SameTypeConversion[NucleotideContigFragment, NucleotideContigFragmentRDD]

/**
 * Convert nucleotide contigs to coverage data.
 */
class ContigsToCoverageConverter
    extends Function2[NucleotideContigFragmentRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert nucleotide contigs to genomic features.
 */
class ContigsToFeaturesConverter
    extends Function2[NucleotideContigFragmentRDD, RDD[Feature], FeatureRDD]

/**
 * Convert nucleotide contigs to sequencing fragments.
 */
class ContigsToFragmentsConverter
    extends Function2[NucleotideContigFragmentRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert nucleotide contigs to alignment records.
 */
class ContigsToAlignmentRecordsConverter
    extends Function2[NucleotideContigFragmentRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert nucleotide contigs to genotype data.
 */
class ContigsToGenotypesConverter
    extends Function2[NucleotideContigFragmentRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert nucleotide contigs to variant data.
 */
class ContigsToVariantsConverter
    extends Function2[NucleotideContigFragmentRDD, RDD[Variant], VariantRDD]

/**
 * Convert nucleotide contigs to variant context data.
 */
class ContigsToVariantContextsConverter
    extends Function2[NucleotideContigFragmentRDD, RDD[VariantContext], VariantContextRDD]
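A cross-type converter has the same two-argument shape as the base trait, except that the record type changes between input and output. The sketch below illustrates that shape in plain Scala under stated assumptions: `Read`, `Region`, `ReadSet`, `RegionSet`, and `readsToRegions` are hypothetical stand-ins rather than ADAM types, and `Seq` replaces Spark's `RDD`.

```scala
// Stand-in types only: none of these names are ADAM classes.
object CrossTypeSketch {
  final case class Read(name: String, contig: String, start: Long, end: Long)
  final case class Region(contig: String, start: Long, end: Long)

  // Hypothetical wrappers pairing records with shared metadata.
  final case class ReadSet(records: Seq[Read], sequenceNames: Seq[String])
  final case class RegionSet(records: Seq[Region], sequenceNames: Seq[String])

  // Cross-type shape: (source dataset, target records) => target dataset.
  // The metadata comes from v1, the records from v2.
  val readsToRegions: (ReadSet, Seq[Region]) => RegionSet =
    (v1, v2) => RegionSet(v2, v1.sequenceNames)

  val reads: ReadSet = ReadSet(
    Seq(Read("r1", "chr1", 100L, 150L), Read("r2", "chr2", 20L, 70L)),
    Seq("chr1", "chr2"))

  // The caller performs the record-level mapping ...
  val regions: Seq[Region] =
    reads.records.map(r => Region(r.contig, r.start, r.end))

  // ... and the converter attaches the source metadata to the new records.
  val converted: RegionSet = readsToRegions(reads, regions)

  def main(args: Array[String]): Unit =
    converted.records.foreach(println)
}
```

Separating the record mapping from the metadata hand-off keeps each converter small: it only needs to know which metadata carries over to the target type.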

Coverage RDD Conversions

Convert between coverage data and other genomic data types.

/**
 * Convert coverage data to nucleotide contigs.
 */
class CoverageToContigsConverter
    extends Function2[CoverageRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Same-type conversion for coverage data.
 */
class CoverageToCoverageConverter
    extends SameTypeConversion[Coverage, CoverageRDD]

/**
 * Convert coverage data to genomic features.
 */
class CoverageToFeaturesConverter
    extends Function2[CoverageRDD, RDD[Feature], FeatureRDD]

/**
 * Convert coverage data to sequencing fragments.
 */
class CoverageToFragmentsConverter
    extends Function2[CoverageRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert coverage data to alignment records.
 */
class CoverageToAlignmentRecordsConverter
    extends Function2[CoverageRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert coverage data to genotype data.
 */
class CoverageToGenotypesConverter
    extends Function2[CoverageRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert coverage data to variant data.
 */
class CoverageToVariantsConverter
    extends Function2[CoverageRDD, RDD[Variant], VariantRDD]

/**
 * Convert coverage data to variant context data.
 */
class CoverageToVariantContextConverter
    extends Function2[CoverageRDD, RDD[VariantContext], VariantContextRDD]

Feature RDD Conversions

Convert between genomic features and other genomic data types.

/**
 * Convert genomic features to nucleotide contigs.
 */
class FeaturesToContigsConverter
    extends Function2[FeatureRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert genomic features to coverage data.
 */
class FeaturesToCoverageConverter
    extends Function2[FeatureRDD, RDD[Coverage], CoverageRDD]

/**
 * Same-type conversion for genomic features.
 */
class FeaturesToFeatureConverter
    extends SameTypeConversion[Feature, FeatureRDD]

/**
 * Convert genomic features to sequencing fragments.
 */
class FeaturesToFragmentsConverter
    extends Function2[FeatureRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert genomic features to alignment records.
 */
class FeaturesToAlignmentRecordsConverter
    extends Function2[FeatureRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert genomic features to genotype data.
 */
class FeaturesToGenotypesConverter
    extends Function2[FeatureRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert genomic features to variant data.
 */
class FeaturesToVariantsConverter
    extends Function2[FeatureRDD, RDD[Variant], VariantRDD]

/**
 * Convert genomic features to variant context data.
 */
class FeaturesToVariantContextConverter
    extends Function2[FeatureRDD, RDD[VariantContext], VariantContextRDD]

Fragment RDD Conversions

Convert between sequencing fragments and other genomic data types.

/**
 * Convert sequencing fragments to nucleotide contigs.
 */
class FragmentsToContigsConverter
    extends Function2[FragmentRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert sequencing fragments to coverage data.
 */
class FragmentsToCoverageConverter
    extends Function2[FragmentRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert sequencing fragments to genomic features.
 */
class FragmentsToFeaturesConverter
    extends Function2[FragmentRDD, RDD[Feature], FeatureRDD]

/**
 * Same-type conversion for sequencing fragments.
 */
class FragmentsToFragmentConverter
    extends SameTypeConversion[Fragment, FragmentRDD]

/**
 * Convert sequencing fragments to alignment records.
 */
class FragmentsToAlignmentRecordsConverter
    extends Function2[FragmentRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert sequencing fragments to genotype data.
 */
class FragmentsToGenotypesConverter
    extends Function2[FragmentRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert sequencing fragments to variant data.
 */
class FragmentsToVariantsConverter
    extends Function2[FragmentRDD, RDD[Variant], VariantRDD]

/**
 * Convert sequencing fragments to variant context data.
 */
class FragmentsToVariantContextConverter
    extends Function2[FragmentRDD, RDD[VariantContext], VariantContextRDD]

Alignment Record RDD Conversions

Convert between alignment records and other genomic data types.

/**
 * Convert alignment records to nucleotide contigs.
 */
class AlignmentRecordsToContigsConverter
    extends Function2[AlignmentRecordRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert alignment records to coverage data.
 */
class AlignmentRecordsToCoverageConverter
    extends Function2[AlignmentRecordRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert alignment records to genomic features.
 */
class AlignmentRecordsToFeaturesConverter
    extends Function2[AlignmentRecordRDD, RDD[Feature], FeatureRDD]

/**
 * Convert alignment records to sequencing fragments.
 */
class AlignmentRecordsToFragmentsConverter
    extends Function2[AlignmentRecordRDD, RDD[Fragment], FragmentRDD]

/**
 * Same-type conversion for alignment records.
 */
class AlignmentRecordsToAlignmentRecordsConverter
    extends SameTypeConversion[AlignmentRecord, AlignmentRecordRDD]

/**
 * Convert alignment records to genotype data.
 */
class AlignmentRecordsToGenotypesConverter
    extends Function2[AlignmentRecordRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert alignment records to variant data.
 */
class AlignmentRecordsToVariantsConverter
    extends Function2[AlignmentRecordRDD, RDD[Variant], VariantRDD]

/**
 * Convert alignment records to variant context data.
 */
class AlignmentRecordsToVariantContextConverter
    extends Function2[AlignmentRecordRDD, RDD[VariantContext], VariantContextRDD]

Genotype RDD Conversions

Convert between genotype data and other genomic data types.

/**
 * Convert genotype data to nucleotide contigs.
 */
class GenotypesToContigsConverter
    extends Function2[GenotypeRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert genotype data to coverage data.
 */
class GenotypesToCoverageConverter
    extends Function2[GenotypeRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert genotype data to genomic features.
 */
class GenotypesToFeaturesConverter
    extends Function2[GenotypeRDD, RDD[Feature], FeatureRDD]

/**
 * Convert genotype data to sequencing fragments.
 */
class GenotypesToFragmentsConverter
    extends Function2[GenotypeRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert genotype data to alignment records.
 */
class GenotypesToAlignmentRecordsConverter
    extends Function2[GenotypeRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Same-type conversion for genotype data.
 */
class GenotypesToGenotypesConverter
    extends SameTypeConversion[Genotype, GenotypeRDD]

/**
 * Convert genotype data to variant data.
 */
class GenotypesToVariantsConverter
    extends Function2[GenotypeRDD, RDD[Variant], VariantRDD]

/**
 * Convert genotype data to variant context data.
 */
class GenotypesToVariantContextConverter
    extends Function2[GenotypeRDD, RDD[VariantContext], VariantContextRDD]

Variant RDD Conversions

Convert between variant data and other genomic data types.

/**
 * Convert variant data to nucleotide contigs.
 */
class VariantsToContigsConverter
    extends Function2[VariantRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert variant data to coverage data.
 */
class VariantsToCoverageConverter
    extends Function2[VariantRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert variant data to genomic features.
 */
class VariantsToFeaturesConverter
    extends Function2[VariantRDD, RDD[Feature], FeatureRDD]

/**
 * Convert variant data to sequencing fragments.
 */
class VariantsToFragmentsConverter
    extends Function2[VariantRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert variant data to alignment records.
 */
class VariantsToAlignmentRecordsConverter
    extends Function2[VariantRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert variant data to genotype data.
 */
class VariantsToGenotypesConverter
    extends Function2[VariantRDD, RDD[Genotype], GenotypeRDD]

/**
 * Same-type conversion for variant data.
 */
class VariantsToVariantsConverter
    extends SameTypeConversion[Variant, VariantRDD]

/**
 * Convert variant data to variant context data.
 */
class VariantsToVariantContextConverter
    extends Function2[VariantRDD, RDD[VariantContext], VariantContextRDD]

Variant Context RDD Conversions

Convert between variant context data and other genomic data types.

/**
 * Convert variant context data to nucleotide contigs.
 */
class VariantContextsToContigsConverter
    extends Function2[VariantContextRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert variant context data to coverage data.
 */
class VariantContextsToCoverageConverter
    extends Function2[VariantContextRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert variant context data to genomic features.
 */
class VariantContextsToFeaturesConverter
    extends Function2[VariantContextRDD, RDD[Feature], FeatureRDD]

/**
 * Convert variant context data to sequencing fragments.
 */
class VariantContextsToFragmentsConverter
    extends Function2[VariantContextRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert variant context data to alignment records.
 */
class VariantContextsToAlignmentRecordsConverter
    extends Function2[VariantContextRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert variant context data to genotype data.
 */
class VariantContextsToGenotypesConverter
    extends Function2[VariantContextRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert variant context data to variant data.
 */
class VariantContextsToVariantsConverter
    extends Function2[VariantContextRDD, RDD[Variant], VariantRDD]

/**
 * Same-type conversion for variant context data.
 */
class VariantContextsToVariantContextConverter
    extends SameTypeConversion[VariantContext, VariantContextRDD]
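The class names listed above follow a `<Source>To<Target>Converter` pattern, but a few spellings are irregular: `FeaturesToFeatureConverter` and `FragmentsToFragmentConverter` use a singular target, and only the contig converter ends in `VariantContextsConverter` rather than `VariantContextConverter`. When a converter is looked up by name (for example through reflection), a small helper can encode those exceptions. The sketch below is hypothetical: `ConverterNames`, `kinds`, and `className` are illustrative names, and the spellings are taken from the listings in this document, so they should be checked against the ADAM version in use.

```scala
// Hypothetical helper; the spellings mirror the listings in this document.
object ConverterNames {
  // Name fragments for the eight data types, as used in the class names.
  val kinds: Seq[String] = Seq(
    "Contigs", "Coverage", "Features", "Fragments",
    "AlignmentRecords", "Genotypes", "Variants", "VariantContexts")

  // Build the converter class name for a (source, target) pair.
  def className(source: String, target: String): String = {
    require(kinds.contains(source), s"unknown source kind: $source")
    require(kinds.contains(target), s"unknown target kind: $target")
    val suffix = (source, target) match {
      case ("Features", "Features")       => "Feature"         // singular
      case ("Fragments", "Fragments")     => "Fragment"        // singular
      case ("Contigs", "VariantContexts") => "VariantContexts" // plural kept
      case (_, "VariantContexts")         => "VariantContext"  // singular elsewhere
      case (_, other)                     => other
    }
    s"${source}To${suffix}Converter"
  }

  // Every ordered pair of kinds, same-type pairs included.
  val all: Seq[String] = for (s <- kinds; t <- kinds) yield className(s, t)

  def main(args: Array[String]): Unit =
    all.foreach(println)
}
```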

Usage Examples

import org.apache.spark.rdd.RDD
import org.bdgenomics.adam.api.java._
import org.bdgenomics.adam.models.Coverage
import org.bdgenomics.adam.rdd.feature.{ CoverageRDD, FeatureRDD }
import org.bdgenomics.adam.rdd.read.AlignmentRecordRDD
import org.bdgenomics.adam.rdd.variant.VariantRDD
import org.bdgenomics.formats.avro.{ AlignmentRecord, Feature, Variant }

// Assumed to be in scope: jac, a JavaADAMContext, plus two user-supplied
// record mappings, toFeature: AlignmentRecord => Feature and
// toCoverage: Variant => Coverage. Each converter takes the source dataset
// (for its metadata) and an RDD that already holds the target records.

// Convert alignment records to features
val alignments: AlignmentRecordRDD = jac.loadAlignments("input.bam")
val featureRecords: RDD[Feature] = alignments.rdd.map(toFeature)

val converter = new AlignmentRecordsToFeaturesConverter()
val features: FeatureRDD = converter.call(alignments, featureRecords)

// Convert variants to coverage
val variants: VariantRDD = jac.loadVariants("variants.vcf")
val coverageRecords: RDD[Coverage] = variants.rdd.map(toCoverage)

val coverageConverter = new VariantsToCoverageConverter()
val coverage: CoverageRDD = coverageConverter.call(variants, coverageRecords)

// Same-type conversion: rewrap a de-duplicated RDD with the original metadata
val bedFeatures: FeatureRDD = jac.loadFeatures("input.bed")
val distinctRecords: RDD[Feature] = bedFeatures.rdd.distinct()

val sameTypeConverter = new FeaturesToFeatureConverter()
val distinctFeatures: FeatureRDD = sameTypeConverter.call(bedFeatures, distinctRecords)

RDD vs Dataset Conversions

RDD Conversions provide:

  • Low-level control: Direct access to RDD operations and partitioning
  • Memory efficiency: Fine-grained control over data serialization and caching
  • Custom partitioning: Support for genomic-aware partitioning strategies
  • Legacy compatibility: Integration with older Spark RDD-based workflows

Dataset Conversions provide:

  • Type safety: Compile-time type checking through Spark's typed Dataset API
  • SQL integration: Ability to use Spark SQL operations on genomic data
  • Performance optimization: Automatic query optimization through Catalyst
  • Schema evolution: Better handling of schema changes and compatibility

Performance Considerations

  • Same-type conversions: Reattach the source's metadata to an RDD of the same record type, for example after filtering or repartitioning
  • Cross-type conversions: May involve complex data transformations and schema mapping
  • Metadata preservation: All conversions maintain genomic metadata (sequence dictionaries, record groups)
  • Partitioning: The converted dataset is backed by the RDD passed as the second argument, so it keeps that RDD's partition layout