# RDD Conversions

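Low-level utilities for converting between genomic RDD types. The GenomicRDDConverters module provides one converter for every ordered pair of the eight genomic data types (nucleotide contig fragments, coverage, features, fragments, alignment records, genotypes, variants, and variant contexts). That gives eight same-type converters and 56 cross-type converters. A converter does not transform individual records. It takes a source GenomicRDD, which supplies metadata such as the sequence dictionary, and an RDD of records that the caller has already produced in the target type. It then returns a GenomicRDD of the target type.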

## Capabilities

### Base Conversion Trait

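Base trait for same-type conversions, where the source and the result hold the same record type. It extends `Function2`, Spark's Java function interface, so converters are invoked through `call`.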

```scala { .api }
/**
 * Base trait for same-type RDD conversions between genomic data types.
 */
trait SameTypeConversion[T, U <: GenomicRDD[T, U]] extends Function2[U, RDD[T], U] {

  /**
   * Wrap an RDD of records in a GenomicRDD of the same record type.
   *
   * @param v1 Source GenomicRDD providing schema and metadata
   * @param v2 RDD[T] holding the records for the result; the records are not modified
   * @return A GenomicRDD of type U containing the records of v2 and the metadata of v1
   */
  def call(v1: U, v2: RDD[T]): U
}
```
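
To make the contract concrete, here is a minimal plain-Scala model of a same-type conversion. It assumes a genomic dataset can be reduced to its records plus a sequence dictionary. `SequenceDictionaryStub`, `ToyGenomicData`, `ToySameTypeConversion`, and `Interval` are hypothetical stand-ins, not ADAM classes, and a plain `Seq` takes the place of an RDD.

```scala
object SameTypeModel {
  // Hypothetical stand-in for a sequence dictionary: contig name -> length.
  final case class SequenceDictionaryStub(contigs: Map[String, Long])

  // Hypothetical stand-in for a GenomicRDD: records plus metadata.
  final case class ToyGenomicData[T](records: Seq[T], sequences: SequenceDictionaryStub)

  // Same shape as SameTypeConversion.call: v1 supplies the metadata,
  // v2 supplies the records, and the records are used unchanged.
  trait ToySameTypeConversion[T]
      extends ((ToyGenomicData[T], Seq[T]) => ToyGenomicData[T]) {
    def apply(v1: ToyGenomicData[T], v2: Seq[T]): ToyGenomicData[T] =
      v1.copy(records = v2)
  }

  // Hypothetical record type used only for this example.
  final case class Interval(contig: String, start: Long, end: Long)

  object IntervalsToIntervals extends ToySameTypeConversion[Interval]

  val source: ToyGenomicData[Interval] = ToyGenomicData(
    Seq(Interval("chr1", 0L, 50L), Interval("chr1", 100L, 400L)),
    SequenceDictionaryStub(Map("chr1" -> 1000L)))

  // Filter the records, then rewrap them with the source's metadata.
  val longOnly: ToyGenomicData[Interval] =
    IntervalsToIntervals(source, source.records.filter(i => i.end - i.start >= 100L))
}
```

The typical pattern is to run an ordinary collection or RDD operation that keeps the record type, such as the filter above, and then use the converter only to restore the metadata the operation would otherwise lose.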

### Contig RDD Conversions

Converters from nucleotide contig fragments to each genomic data type, including the same-type converter.

```scala { .api }
/**
 * Same-type conversion for nucleotide contig fragments.
 */
class ContigsToContigsConverter
  extends SameTypeConversion[NucleotideContigFragment, NucleotideContigFragmentRDD]

/**
 * Convert nucleotide contigs to coverage data.
 */
class ContigsToCoverageConverter
  extends Function2[NucleotideContigFragmentRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert nucleotide contigs to genomic features.
 */
class ContigsToFeaturesConverter
  extends Function2[NucleotideContigFragmentRDD, RDD[Feature], FeatureRDD]

/**
 * Convert nucleotide contigs to sequencing fragments.
 */
class ContigsToFragmentsConverter
  extends Function2[NucleotideContigFragmentRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert nucleotide contigs to alignment records.
 */
class ContigsToAlignmentRecordsConverter
  extends Function2[NucleotideContigFragmentRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert nucleotide contigs to genotype data.
 */
class ContigsToGenotypesConverter
  extends Function2[NucleotideContigFragmentRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert nucleotide contigs to variant data.
 */
class ContigsToVariantsConverter
  extends Function2[NucleotideContigFragmentRDD, RDD[Variant], VariantRDD]

/**
 * Convert nucleotide contigs to variant context data.
 */
class ContigsToVariantContextsConverter
  extends Function2[NucleotideContigFragmentRDD, RDD[VariantContext], VariantContextRDD]
```
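
A cross-type converter has the same two-argument shape, but the record type changes between input and output. The sketch below models a contig-to-coverage conversion in plain Scala, assuming the sequence dictionary is the only metadata both sides share. `ContigFragment`, `CoverageInterval`, `ContigData`, `CoverageData`, and `contigsToCoverage` are hypothetical stand-ins, not ADAM classes.

```scala
object CrossTypeModel {
  // Hypothetical stand-ins; none of these are ADAM classes.
  final case class SequenceDictionaryStub(contigs: Map[String, Long])
  final case class ContigFragment(contig: String, start: Long, sequence: String)
  final case class CoverageInterval(contig: String, start: Long, end: Long, count: Double)

  // Stand-ins for the source and target GenomicRDD wrappers.
  final case class ContigData(records: Seq[ContigFragment], sequences: SequenceDictionaryStub)
  final case class CoverageData(records: Seq[CoverageInterval], sequences: SequenceDictionaryStub)

  // Same shape as a cross-type converter: (source dataset, target records) => target dataset.
  // Only the sequence dictionary is copied; the records are used as given.
  val contigsToCoverage: (ContigData, Seq[CoverageInterval]) => CoverageData =
    (v1, v2) => CoverageData(v2, v1.sequences)

  val contigs: ContigData = ContigData(
    Seq(ContigFragment("chr1", 0L, "ACGT"), ContigFragment("chr1", 4L, "GG")),
    SequenceDictionaryStub(Map("chr1" -> 6L)))

  // The caller builds the target records first: each fragment becomes
  // a coverage interval over its own span with a count of 1.
  val coverageRecords: Seq[CoverageInterval] = contigs.records.map { f =>
    CoverageInterval(f.contig, f.start, f.start + f.sequence.length, 1.0)
  }

  val coverage: CoverageData = contigsToCoverage(contigs, coverageRecords)
}
```

Keeping record conversion out of the converter means one converter class serves every possible mapping between two types. The caller decides what a "contig as coverage" record means.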

### Coverage RDD Conversions

Converters from coverage data to each genomic data type, including the same-type converter.

```scala { .api }
/**
 * Convert coverage data to nucleotide contigs.
 */
class CoverageToContigsConverter
  extends Function2[CoverageRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Same-type conversion for coverage data.
 */
class CoverageToCoverageConverter
  extends SameTypeConversion[Coverage, CoverageRDD]

/**
 * Convert coverage data to genomic features.
 */
class CoverageToFeaturesConverter
  extends Function2[CoverageRDD, RDD[Feature], FeatureRDD]

/**
 * Convert coverage data to sequencing fragments.
 */
class CoverageToFragmentsConverter
  extends Function2[CoverageRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert coverage data to alignment records.
 */
class CoverageToAlignmentRecordsConverter
  extends Function2[CoverageRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert coverage data to genotype data.
 */
class CoverageToGenotypesConverter
  extends Function2[CoverageRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert coverage data to variant data.
 */
class CoverageToVariantsConverter
  extends Function2[CoverageRDD, RDD[Variant], VariantRDD]

/**
 * Convert coverage data to variant context data.
 */
class CoverageToVariantContextConverter
  extends Function2[CoverageRDD, RDD[VariantContext], VariantContextRDD]
```

### Feature RDD Conversions

Converters from genomic features to each genomic data type, including the same-type converter.

```scala { .api }
/**
 * Convert genomic features to nucleotide contigs.
 */
class FeaturesToContigsConverter
  extends Function2[FeatureRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert genomic features to coverage data.
 */
class FeaturesToCoverageConverter
  extends Function2[FeatureRDD, RDD[Coverage], CoverageRDD]

/**
 * Same-type conversion for genomic features.
 */
class FeaturesToFeatureConverter
  extends SameTypeConversion[Feature, FeatureRDD]

/**
 * Convert genomic features to sequencing fragments.
 */
class FeaturesToFragmentsConverter
  extends Function2[FeatureRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert genomic features to alignment records.
 */
class FeaturesToAlignmentRecordsConverter
  extends Function2[FeatureRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert genomic features to genotype data.
 */
class FeaturesToGenotypesConverter
  extends Function2[FeatureRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert genomic features to variant data.
 */
class FeaturesToVariantsConverter
  extends Function2[FeatureRDD, RDD[Variant], VariantRDD]

/**
 * Convert genomic features to variant context data.
 */
class FeaturesToVariantContextConverter
  extends Function2[FeatureRDD, RDD[VariantContext], VariantContextRDD]
```

### Fragment RDD Conversions

Converters from sequencing fragments to each genomic data type, including the same-type converter.

```scala { .api }
/**
 * Convert sequencing fragments to nucleotide contigs.
 */
class FragmentsToContigsConverter
  extends Function2[FragmentRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert sequencing fragments to coverage data.
 */
class FragmentsToCoverageConverter
  extends Function2[FragmentRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert sequencing fragments to genomic features.
 */
class FragmentsToFeaturesConverter
  extends Function2[FragmentRDD, RDD[Feature], FeatureRDD]

/**
 * Same-type conversion for sequencing fragments.
 */
class FragmentsToFragmentConverter
  extends SameTypeConversion[Fragment, FragmentRDD]

/**
 * Convert sequencing fragments to alignment records.
 */
class FragmentsToAlignmentRecordsConverter
  extends Function2[FragmentRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert sequencing fragments to genotype data.
 */
class FragmentsToGenotypesConverter
  extends Function2[FragmentRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert sequencing fragments to variant data.
 */
class FragmentsToVariantsConverter
  extends Function2[FragmentRDD, RDD[Variant], VariantRDD]

/**
 * Convert sequencing fragments to variant context data.
 */
class FragmentsToVariantContextConverter
  extends Function2[FragmentRDD, RDD[VariantContext], VariantContextRDD]
```

### Alignment Record RDD Conversions

Converters from alignment records to each genomic data type, including the same-type converter.

```scala { .api }
/**
 * Convert alignment records to nucleotide contigs.
 */
class AlignmentRecordsToContigsConverter
  extends Function2[AlignmentRecordRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert alignment records to coverage data.
 */
class AlignmentRecordsToCoverageConverter
  extends Function2[AlignmentRecordRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert alignment records to genomic features.
 */
class AlignmentRecordsToFeaturesConverter
  extends Function2[AlignmentRecordRDD, RDD[Feature], FeatureRDD]

/**
 * Convert alignment records to sequencing fragments.
 */
class AlignmentRecordsToFragmentsConverter
  extends Function2[AlignmentRecordRDD, RDD[Fragment], FragmentRDD]

/**
 * Same-type conversion for alignment records.
 */
class AlignmentRecordsToAlignmentRecordsConverter
  extends SameTypeConversion[AlignmentRecord, AlignmentRecordRDD]

/**
 * Convert alignment records to genotype data.
 */
class AlignmentRecordsToGenotypesConverter
  extends Function2[AlignmentRecordRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert alignment records to variant data.
 */
class AlignmentRecordsToVariantsConverter
  extends Function2[AlignmentRecordRDD, RDD[Variant], VariantRDD]

/**
 * Convert alignment records to variant context data.
 */
class AlignmentRecordsToVariantContextConverter
  extends Function2[AlignmentRecordRDD, RDD[VariantContext], VariantContextRDD]
```

### Genotype RDD Conversions

Converters from genotype data to each genomic data type, including the same-type converter.

```scala { .api }
/**
 * Convert genotype data to nucleotide contigs.
 */
class GenotypesToContigsConverter
  extends Function2[GenotypeRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert genotype data to coverage data.
 */
class GenotypesToCoverageConverter
  extends Function2[GenotypeRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert genotype data to genomic features.
 */
class GenotypesToFeaturesConverter
  extends Function2[GenotypeRDD, RDD[Feature], FeatureRDD]

/**
 * Convert genotype data to sequencing fragments.
 */
class GenotypesToFragmentsConverter
  extends Function2[GenotypeRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert genotype data to alignment records.
 */
class GenotypesToAlignmentRecordsConverter
  extends Function2[GenotypeRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Same-type conversion for genotype data.
 */
class GenotypesToGenotypesConverter
  extends SameTypeConversion[Genotype, GenotypeRDD]

/**
 * Convert genotype data to variant data.
 */
class GenotypesToVariantsConverter
  extends Function2[GenotypeRDD, RDD[Variant], VariantRDD]

/**
 * Convert genotype data to variant context data.
 */
class GenotypesToVariantContextConverter
  extends Function2[GenotypeRDD, RDD[VariantContext], VariantContextRDD]
```

### Variant RDD Conversions

Converters from variant data to each genomic data type, including the same-type converter.

```scala { .api }
/**
 * Convert variant data to nucleotide contigs.
 */
class VariantsToContigsConverter
  extends Function2[VariantRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert variant data to coverage data.
 */
class VariantsToCoverageConverter
  extends Function2[VariantRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert variant data to genomic features.
 */
class VariantsToFeaturesConverter
  extends Function2[VariantRDD, RDD[Feature], FeatureRDD]

/**
 * Convert variant data to sequencing fragments.
 */
class VariantsToFragmentsConverter
  extends Function2[VariantRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert variant data to alignment records.
 */
class VariantsToAlignmentRecordsConverter
  extends Function2[VariantRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert variant data to genotype data.
 */
class VariantsToGenotypesConverter
  extends Function2[VariantRDD, RDD[Genotype], GenotypeRDD]

/**
 * Same-type conversion for variant data.
 */
class VariantsToVariantsConverter
  extends SameTypeConversion[Variant, VariantRDD]

/**
 * Convert variant data to variant context data.
 */
class VariantsToVariantContextConverter
  extends Function2[VariantRDD, RDD[VariantContext], VariantContextRDD]
```

### Variant Context RDD Conversions

Converters from variant context data to each genomic data type, including the same-type converter.

```scala { .api }
/**
 * Convert variant context data to nucleotide contigs.
 */
class VariantContextsToContigsConverter
  extends Function2[VariantContextRDD, RDD[NucleotideContigFragment], NucleotideContigFragmentRDD]

/**
 * Convert variant context data to coverage data.
 */
class VariantContextsToCoverageConverter
  extends Function2[VariantContextRDD, RDD[Coverage], CoverageRDD]

/**
 * Convert variant context data to genomic features.
 */
class VariantContextsToFeaturesConverter
  extends Function2[VariantContextRDD, RDD[Feature], FeatureRDD]

/**
 * Convert variant context data to sequencing fragments.
 */
class VariantContextsToFragmentsConverter
  extends Function2[VariantContextRDD, RDD[Fragment], FragmentRDD]

/**
 * Convert variant context data to alignment records.
 */
class VariantContextsToAlignmentRecordsConverter
  extends Function2[VariantContextRDD, RDD[AlignmentRecord], AlignmentRecordRDD]

/**
 * Convert variant context data to genotype data.
 */
class VariantContextsToGenotypesConverter
  extends Function2[VariantContextRDD, RDD[Genotype], GenotypeRDD]

/**
 * Convert variant context data to variant data.
 */
class VariantContextsToVariantsConverter
  extends Function2[VariantContextRDD, RDD[Variant], VariantRDD]

/**
 * Same-type conversion for variant context data.
 */
class VariantContextsToVariantContextConverter
  extends SameTypeConversion[VariantContext, VariantContextRDD]
```

## Usage Examples

```scala
// Converter classes live in the org.bdgenomics.adam.api.java package.
import org.apache.spark.rdd.RDD
import org.bdgenomics.adam.api.java._
import org.bdgenomics.adam.models.Coverage
import org.bdgenomics.adam.rdd.feature.{ CoverageRDD, FeatureRDD }
import org.bdgenomics.adam.rdd.read.AlignmentRecordRDD
import org.bdgenomics.adam.rdd.variant.VariantRDD
import org.bdgenomics.formats.avro.Feature

// Assumes `jac` is a JavaADAMContext, and an ADAM 0.2x release in which records
// use `contigName` and `Coverage(contigName, start, end, count)` is a valid constructor.

// Convert alignment records to features: build the Feature records first,
// then let the converter attach the alignments' metadata to them.
val alignments: AlignmentRecordRDD = jac.loadAlignments("input.bam")
val featureRDD: RDD[Feature] = alignments.rdd
  .filter(r => r.getReadMapped)
  .map(r => Feature.newBuilder()
    .setContigName(r.getContigName)
    .setStart(r.getStart)
    .setEnd(r.getEnd)
    .build())
val readFeatures: FeatureRDD =
  new AlignmentRecordsToFeaturesConverter().call(alignments, featureRDD)

// Convert variants to coverage: one interval with count 1.0 per variant
val variants: VariantRDD = jac.loadVariants("variants.vcf")
val coverageRDD: RDD[Coverage] = variants.rdd
  .map(v => Coverage(v.getContigName, v.getStart, v.getEnd, 1.0))
val variantCoverage: CoverageRDD =
  new VariantsToCoverageConverter().call(variants, coverageRDD)

// Same-type conversion: filter the records and keep the original metadata
val features: FeatureRDD = jac.loadFeatures("input.bed")
val longFeatures: RDD[Feature] = features.rdd.filter(f => f.getEnd - f.getStart >= 100L)
val filteredFeatures: FeatureRDD =
  new FeaturesToFeatureConverter().call(features, longFeatures)
```
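
Each example above follows the same two steps: map the source records to the target type, then pass the source dataset and the new records to a converter. The plain-Scala sketch below folds those steps into one generic helper. `mapThenConvert`, `ToyGenomicData`, `Read`, and `Span` are hypothetical names for illustration, not part of ADAM, and a `Seq` stands in for an RDD.

```scala
object TransmuteModel {
  // Hypothetical stand-ins for a sequence dictionary and a genomic dataset.
  final case class SequenceDictionaryStub(contigs: Map[String, Long])
  final case class ToyGenomicData[T](records: Seq[T], sequences: SequenceDictionaryStub)

  // Hypothetical helper: map each source record to the target type, then
  // pass the source dataset and the new records to a converter function.
  def mapThenConvert[S, T, Out](
      source: ToyGenomicData[S],
      fn: S => T,
      convert: (ToyGenomicData[S], Seq[T]) => Out): Out =
    convert(source, source.records.map(fn))

  // Hypothetical record types for the example.
  final case class Read(contig: String, start: Long, end: Long)
  final case class Span(contig: String, start: Long, end: Long, label: String)

  val reads: ToyGenomicData[Read] = ToyGenomicData(
    Seq(Read("chr2", 10L, 60L), Read("chr2", 70L, 120L)),
    SequenceDictionaryStub(Map("chr2" -> 500L)))

  // The converter argument only copies metadata; the mapping does the record work.
  val spans: ToyGenomicData[Span] = mapThenConvert[Read, Span, ToyGenomicData[Span]](
    reads,
    r => Span(r.contig, r.start, r.end, "read"),
    (src, recs) => ToyGenomicData(recs, src.sequences))
}
```

Passing the converter as an ordinary function value keeps the helper independent of any particular pair of types, which mirrors why the module defines a separate converter class per pair.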

## RDD vs Dataset Conversions

**RDD Conversions** provide:

- **Low-level control**: Direct access to RDD operations and partitioning
- **Serialization and caching control**: Choice of serializer and storage level when data is cached
- **Custom partitioning**: Support for genomic-aware partitioning strategies
- **Legacy compatibility**: Integration with older Spark RDD-based workflows

**Dataset Conversions** provide:

- **Type safety**: Compile-time type checking through Spark's typed `Dataset` API
- **SQL integration**: Ability to use Spark SQL operations on genomic data
- **Performance optimization**: Automatic query optimization through Spark's Catalyst optimizer
- **Schema evolution**: Better handling of schema changes and compatibility

## Performance Considerations

- **Same-type conversions**: Used to rewrap records after an RDD operation that keeps the record type, such as a filter, so the result keeps the source's metadata
- **Cross-type conversions**: The converter only attaches metadata; the cost lies in the caller's transformation that produces the target records
- **Metadata preservation**: The sequence dictionary carries over from source to result; other metadata, such as record groups or samples, carries over only when the target type tracks it
- **Partitioning**: Converters do not repartition or shuffle data; the result uses the partitioning of the RDD passed as `v2`