# RDD Conversions

Low-level conversion utilities for transforming between genomic RDD types. The GenomicRDDConverters module provides RDD-based converters for every combination of the genomic data types listed below, covering both same-type and cross-type conversions and giving fine-grained control over each transformation.

## Capabilities

### Base Conversion Trait

Foundation trait for all RDD conversion operations.

```scala { .api }
/**
 * Base trait for same-type RDD conversions between genomic data types.
 */
trait SameTypeConversion[T, U <: GenomicRDD[T, U]] extends Function2[U, RDD[T], U] {
  /**
   * Convert source RDD to target RDD type.
   * @param v1 Source GenomicRDD providing schema and metadata
   * @param v2 Target RDD[T] containing raw data to convert
   * @return Converted GenomicRDD of target type with preserved metadata
   */
  def call(v1: U, v2: RDD[T]): U
}
```

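To make the `call` contract concrete, here is a minimal sketch that models it with plain Scala collections rather than Spark. Everything in it is hypothetical: `ToyFeature`, `ToyFeatureSet`, and `ToyFeaturesToFeatures` are illustrative stand-ins, `Seq` plays the role of an RDD, and a set of contig names plays the role of genomic metadata. It only illustrates the documented shape (metadata comes from `v1`, records come from `v2`), not how the library implements it.

```scala
// Toy model of the converter contract; none of these types are ADAM's.
final case class ToyFeature(contig: String, start: Long, end: Long)

// A wrapper pairing records with metadata (here, just the known contig names).
final case class ToyFeatureSet(records: Seq[ToyFeature], contigs: Set[String])

object ToyFeaturesToFeatures {
  // Same shape as the documented call(v1, v2):
  // keep the metadata from v1 and swap in the records from v2.
  def call(v1: ToyFeatureSet, v2: Seq[ToyFeature]): ToyFeatureSet =
    v1.copy(records = v2)
}

object ToySameTypeDemo {
  def main(args: Array[String]): Unit = {
    val source = ToyFeatureSet(
      Seq(ToyFeature("chr1", 0L, 100L), ToyFeature("chr2", 50L, 75L)),
      Set("chr1", "chr2"))

    // Transform the raw records, then rewrap them with the source's metadata.
    val chr1Only = source.records.filter(_.contig == "chr1")
    val result = ToyFeaturesToFeatures.call(source, chr1Only)

    assert(result.records == Seq(ToyFeature("chr1", 0L, 100L)))
    assert(result.contigs == source.contigs) // metadata is carried over unchanged
  }
}
```

The design point is that the converter itself does no record-level work: the caller transforms the records first, and the converter reattaches the metadata.
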
### Contig RDD Conversions

Convert between nucleotide contig fragments and other genomic data types.

```scala { .api }
/**
 * Same-type conversion for nucleotide contig fragments.
 */
class ContigsToContigsConverter
  extends SameTypeConversion[NucleotideContigFragment, NucleotideContigFragment]

/**
 * Convert nucleotide contigs to coverage data.
 */
class ContigsToCoverageConverter
  extends SameTypeConversion[NucleotideContigFragment, Coverage]

/**
 * Convert nucleotide contigs to genomic features.
 */
class ContigsToFeaturesConverter
  extends SameTypeConversion[NucleotideContigFragment, Feature]

/**
 * Convert nucleotide contigs to sequencing fragments.
 */
class ContigsToFragmentsConverter
  extends SameTypeConversion[NucleotideContigFragment, Fragment]

/**
 * Convert nucleotide contigs to alignment records.
 */
class ContigsToAlignmentRecordsConverter
  extends SameTypeConversion[NucleotideContigFragment, AlignmentRecord]

/**
 * Convert nucleotide contigs to genotype data.
 */
class ContigsToGenotypesConverter
  extends SameTypeConversion[NucleotideContigFragment, Genotype]

/**
 * Convert nucleotide contigs to variant data.
 */
class ContigsToVariantsConverter
  extends SameTypeConversion[NucleotideContigFragment, Variant]

/**
 * Convert nucleotide contigs to variant context data.
 */
class ContigsToVariantContextsConverter
  extends SameTypeConversion[NucleotideContigFragment, VariantContext]
```

### Coverage RDD Conversions

Convert between coverage data and other genomic data types.

```scala { .api }
/**
 * Convert coverage data to nucleotide contigs.
 */
class CoverageToContigsConverter
  extends SameTypeConversion[Coverage, NucleotideContigFragment]

/**
 * Same-type conversion for coverage data.
 */
class CoverageToCoverageConverter
  extends SameTypeConversion[Coverage, Coverage]

/**
 * Convert coverage data to genomic features.
 */
class CoverageToFeaturesConverter
  extends SameTypeConversion[Coverage, Feature]

/**
 * Convert coverage data to sequencing fragments.
 */
class CoverageToFragmentsConverter
  extends SameTypeConversion[Coverage, Fragment]

/**
 * Convert coverage data to alignment records.
 */
class CoverageToAlignmentRecordsConverter
  extends SameTypeConversion[Coverage, AlignmentRecord]

/**
 * Convert coverage data to genotype data.
 */
class CoverageToGenotypesConverter
  extends SameTypeConversion[Coverage, Genotype]

/**
 * Convert coverage data to variant data.
 */
class CoverageToVariantsConverter
  extends SameTypeConversion[Coverage, Variant]

/**
 * Convert coverage data to variant context data.
 */
class CoverageToVariantContextConverter
  extends SameTypeConversion[Coverage, VariantContext]
```

### Feature RDD Conversions

Convert between genomic features and other genomic data types.

```scala { .api }
/**
 * Convert genomic features to nucleotide contigs.
 */
class FeaturesToContigsConverter
  extends SameTypeConversion[Feature, NucleotideContigFragment]

/**
 * Convert genomic features to coverage data.
 */
class FeaturesToCoverageConverter
  extends SameTypeConversion[Feature, Coverage]

/**
 * Same-type conversion for genomic features.
 */
class FeaturesToFeatureConverter
  extends SameTypeConversion[Feature, Feature]

/**
 * Convert genomic features to sequencing fragments.
 */
class FeaturesToFragmentsConverter
  extends SameTypeConversion[Feature, Fragment]

/**
 * Convert genomic features to alignment records.
 */
class FeaturesToAlignmentRecordsConverter
  extends SameTypeConversion[Feature, AlignmentRecord]

/**
 * Convert genomic features to genotype data.
 */
class FeaturesToGenotypesConverter
  extends SameTypeConversion[Feature, Genotype]

/**
 * Convert genomic features to variant data.
 */
class FeaturesToVariantsConverter
  extends SameTypeConversion[Feature, Variant]

/**
 * Convert genomic features to variant context data.
 */
class FeaturesToVariantContextConverter
  extends SameTypeConversion[Feature, VariantContext]
```

### Fragment RDD Conversions

Convert between sequencing fragments and other genomic data types.

```scala { .api }
/**
 * Convert sequencing fragments to nucleotide contigs.
 */
class FragmentsToContigsConverter
  extends SameTypeConversion[Fragment, NucleotideContigFragment]

/**
 * Convert sequencing fragments to coverage data.
 */
class FragmentsToCoverageConverter
  extends SameTypeConversion[Fragment, Coverage]

/**
 * Convert sequencing fragments to genomic features.
 */
class FragmentsToFeaturesConverter
  extends SameTypeConversion[Fragment, Feature]

/**
 * Same-type conversion for sequencing fragments.
 */
class FragmentsToFragmentConverter
  extends SameTypeConversion[Fragment, Fragment]

/**
 * Convert sequencing fragments to alignment records.
 */
class FragmentsToAlignmentRecordsConverter
  extends SameTypeConversion[Fragment, AlignmentRecord]

/**
 * Convert sequencing fragments to genotype data.
 */
class FragmentsToGenotypesConverter
  extends SameTypeConversion[Fragment, Genotype]

/**
 * Convert sequencing fragments to variant data.
 */
class FragmentsToVariantsConverter
  extends SameTypeConversion[Fragment, Variant]

/**
 * Convert sequencing fragments to variant context data.
 */
class FragmentsToVariantContextConverter
  extends SameTypeConversion[Fragment, VariantContext]
```

### Alignment Record RDD Conversions

Convert between alignment records and other genomic data types.

```scala { .api }
/**
 * Convert alignment records to nucleotide contigs.
 */
class AlignmentRecordsToContigsConverter
  extends SameTypeConversion[AlignmentRecord, NucleotideContigFragment]

/**
 * Convert alignment records to coverage data.
 */
class AlignmentRecordsToCoverageConverter
  extends SameTypeConversion[AlignmentRecord, Coverage]

/**
 * Convert alignment records to genomic features.
 */
class AlignmentRecordsToFeaturesConverter
  extends SameTypeConversion[AlignmentRecord, Feature]

/**
 * Convert alignment records to sequencing fragments.
 */
class AlignmentRecordsToFragmentsConverter
  extends SameTypeConversion[AlignmentRecord, Fragment]

/**
 * Same-type conversion for alignment records.
 */
class AlignmentRecordsToAlignmentRecordsConverter
  extends SameTypeConversion[AlignmentRecord, AlignmentRecord]

/**
 * Convert alignment records to genotype data.
 */
class AlignmentRecordsToGenotypesConverter
  extends SameTypeConversion[AlignmentRecord, Genotype]

/**
 * Convert alignment records to variant data.
 */
class AlignmentRecordsToVariantsConverter
  extends SameTypeConversion[AlignmentRecord, Variant]

/**
 * Convert alignment records to variant context data.
 */
class AlignmentRecordsToVariantContextConverter
  extends SameTypeConversion[AlignmentRecord, VariantContext]
```

### Genotype RDD Conversions

Convert between genotype data and other genomic data types.

```scala { .api }
/**
 * Convert genotype data to nucleotide contigs.
 */
class GenotypesToContigsConverter
  extends SameTypeConversion[Genotype, NucleotideContigFragment]

/**
 * Convert genotype data to coverage data.
 */
class GenotypesToCoverageConverter
  extends SameTypeConversion[Genotype, Coverage]

/**
 * Convert genotype data to genomic features.
 */
class GenotypesToFeaturesConverter
  extends SameTypeConversion[Genotype, Feature]

/**
 * Convert genotype data to sequencing fragments.
 */
class GenotypesToFragmentsConverter
  extends SameTypeConversion[Genotype, Fragment]

/**
 * Convert genotype data to alignment records.
 */
class GenotypesToAlignmentRecordsConverter
  extends SameTypeConversion[Genotype, AlignmentRecord]

/**
 * Same-type conversion for genotype data.
 */
class GenotypesToGenotypesConverter
  extends SameTypeConversion[Genotype, Genotype]

/**
 * Convert genotype data to variant data.
 */
class GenotypesToVariantsConverter
  extends SameTypeConversion[Genotype, Variant]

/**
 * Convert genotype data to variant context data.
 */
class GenotypesToVariantContextConverter
  extends SameTypeConversion[Genotype, VariantContext]
```

### Variant RDD Conversions

Convert between variant data and other genomic data types.

```scala { .api }
/**
 * Convert variant data to nucleotide contigs.
 */
class VariantsToContigsConverter
  extends SameTypeConversion[Variant, NucleotideContigFragment]

/**
 * Convert variant data to coverage data.
 */
class VariantsToCoverageConverter
  extends SameTypeConversion[Variant, Coverage]

/**
 * Convert variant data to genomic features.
 */
class VariantsToFeaturesConverter
  extends SameTypeConversion[Variant, Feature]

/**
 * Convert variant data to sequencing fragments.
 */
class VariantsToFragmentsConverter
  extends SameTypeConversion[Variant, Fragment]

/**
 * Convert variant data to alignment records.
 */
class VariantsToAlignmentRecordsConverter
  extends SameTypeConversion[Variant, AlignmentRecord]

/**
 * Convert variant data to genotype data.
 */
class VariantsToGenotypesConverter
  extends SameTypeConversion[Variant, Genotype]

/**
 * Same-type conversion for variant data.
 */
class VariantsToVariantsConverter
  extends SameTypeConversion[Variant, Variant]

/**
 * Convert variant data to variant context data.
 */
class VariantsToVariantContextConverter
  extends SameTypeConversion[Variant, VariantContext]
```

### Variant Context RDD Conversions

Convert between variant context data and other genomic data types.

```scala { .api }
/**
 * Convert variant context data to nucleotide contigs.
 */
class VariantContextsToContigsConverter
  extends SameTypeConversion[VariantContext, NucleotideContigFragment]

/**
 * Convert variant context data to coverage data.
 */
class VariantContextsToCoverageConverter
  extends SameTypeConversion[VariantContext, Coverage]

/**
 * Convert variant context data to genomic features.
 */
class VariantContextsToFeaturesConverter
  extends SameTypeConversion[VariantContext, Feature]

/**
 * Convert variant context data to sequencing fragments.
 */
class VariantContextsToFragmentsConverter
  extends SameTypeConversion[VariantContext, Fragment]

/**
 * Convert variant context data to alignment records.
 */
class VariantContextsToAlignmentRecordsConverter
  extends SameTypeConversion[VariantContext, AlignmentRecord]

/**
 * Convert variant context data to genotype data.
 */
class VariantContextsToGenotypesConverter
  extends SameTypeConversion[VariantContext, Genotype]

/**
 * Convert variant context data to variant data.
 */
class VariantContextsToVariantsConverter
  extends SameTypeConversion[VariantContext, Variant]

/**
 * Same-type conversion for variant context data.
 */
class VariantContextsToVariantContextConverter
  extends SameTypeConversion[VariantContext, VariantContext]
```

## Usage Examples

```scala
import org.bdgenomics.adam.api.java.GenomicRDDConverters._
import org.apache.spark.rdd.RDD

// Assumes `jac` (a JavaADAMContext) and `spark` (a SparkSession) are in scope.
// Following the `call` contract above: v1 is the source GenomicRDD (supplies
// metadata), v2 is the RDD of target records, and the result wraps v2.

// Convert alignment records to features
val alignments: AlignmentRecordRDD = jac.loadAlignments("input.bam")
// Placeholder: in practice, the Feature records derived from `alignments`
val featureRecords: RDD[Feature] = spark.sparkContext.emptyRDD[Feature]

val converter = new AlignmentRecordsToFeaturesConverter()
val alignmentFeatures: FeatureRDD = converter.call(alignments, featureRecords)

// Convert variants to coverage
val variants: VariantRDD = jac.loadVariants("variants.vcf")
// Placeholder: in practice, the Coverage records derived from `variants`
val coverageRecords: RDD[Coverage] = spark.sparkContext.emptyRDD[Coverage]

val coverageConverter = new VariantsToCoverageConverter()
val variantCoverage: CoverageRDD = coverageConverter.call(variants, coverageRecords)

// Same-type conversion: rewrap feature records with the source's metadata
val features: FeatureRDD = jac.loadFeatures("input.bed")
val featureRDD: RDD[Feature] = features.rdd

val sameTypeConverter = new FeaturesToFeatureConverter()
val transformedFeatures: FeatureRDD = sameTypeConverter.call(features, featureRDD)
```

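The examples above hand the converter an RDD of target records as its second argument. One way to picture where those records come from is a two-step flow: map the source records to target records, then let the converter attach the source's metadata. The sketch below models that flow with plain Scala collections. All names (`SketchRead`, `SketchFeature`, `SketchReads`, `SketchFeatures`, `SketchReadsToFeatures`) are hypothetical stand-ins, not ADAM types, and the mapping rule (keep mapped reads, copy their coordinates) is an assumption chosen only for illustration.

```scala
// Toy stand-ins; `Seq` plays the role of an RDD.
final case class SketchRead(contig: String, start: Long, end: Long, mapped: Boolean)
final case class SketchFeature(contig: String, start: Long, end: Long)

// Wrappers pairing records with metadata (here, just the contig names).
final case class SketchReads(records: Seq[SketchRead], contigs: Set[String])
final case class SketchFeatures(records: Seq[SketchFeature], contigs: Set[String])

object SketchReadsToFeatures {
  // Step 2: wrap the target records, carrying metadata over from the source.
  def call(v1: SketchReads, v2: Seq[SketchFeature]): SketchFeatures =
    SketchFeatures(v2, v1.contigs)
}

object SketchCrossTypeDemo {
  def main(args: Array[String]): Unit = {
    val reads = SketchReads(
      Seq(
        SketchRead("chr1", 10L, 60L, mapped = true),
        SketchRead("chr1", 0L, 0L, mapped = false)),
      Set("chr1"))

    // Step 1: record-level mapping from source records to target records.
    val featureRecords = reads.records
      .filter(_.mapped)
      .map(r => SketchFeature(r.contig, r.start, r.end))

    val features = SketchReadsToFeatures.call(reads, featureRecords)
    assert(features.records == Seq(SketchFeature("chr1", 10L, 60L)))
    assert(features.contigs == reads.contigs)
  }
}
```
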
## RDD vs Dataset Conversions

**RDD Conversions** provide:
- **Low-level control**: Direct access to RDD operations and partitioning
- **Memory efficiency**: Fine-grained control over data serialization and caching
- **Custom partitioning**: Support for genomic-aware partitioning strategies
- **Legacy compatibility**: Integration with older Spark RDD-based workflows

**Dataset Conversions** provide:
- **Type safety**: Compile-time type checking through Spark's typed Dataset API
- **SQL integration**: Ability to use Spark SQL operations on genomic data
- **Performance optimization**: Automatic query optimization through Catalyst
- **Schema evolution**: Better handling of schema changes and compatibility

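As an illustration of what "genomic-aware partitioning" can mean, the sketch below assigns records to partitions by contig and fixed-width position bin, so that nearby positions on the same contig land together. It is a toy under stated assumptions: `ToyGenomicPartitioning`, `partitionFor`, and the bin-hashing scheme are hypothetical and are not taken from ADAM or Spark.

```scala
// Hypothetical helper; not part of ADAM or Spark.
object ToyGenomicPartitioning {
  // Map (contig, position) to a partition index so that records in the same
  // fixed-width bin of the same contig share a partition.
  def partitionFor(contig: String, position: Long, binSize: Long, numPartitions: Int): Int = {
    require(binSize > 0 && numPartitions > 0, "binSize and numPartitions must be positive")
    val bin = position / binSize
    // floorMod keeps the index non-negative even when the hash is negative.
    Math.floorMod((contig, bin).hashCode, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    val a = partitionFor("chr1", 1500L, 1000L, 8)
    val b = partitionFor("chr1", 1999L, 1000L, 8)
    assert(a == b)          // same contig, same 1 kb bin
    assert(a >= 0 && a < 8) // always a valid partition index
  }
}
```

In a Spark job, logic of this kind would typically sit inside a subclass of `org.apache.spark.Partitioner`, which exposes `numPartitions` and `getPartition(key: Any)`.
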
## Performance Considerations

- **Same-type conversions**: Primarily used for format transformation and optimization
- **Cross-type conversions**: May involve complex data transformations and schema mapping
- **Metadata preservation**: All conversions maintain genomic metadata (sequence dictionaries, record groups)
- **Partitioning**: RDD conversions preserve and optimize partition layouts for genomic data access patterns