or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

data-structures.mdindex.mdio.mdstatistics.mdutilities.md

statistics.mddocs/

0

# Statistics

1

2

Scikit-allel provides comprehensive statistical methods for population genetics analysis, including diversity metrics, population structure analysis, linkage disequilibrium, and tests for natural selection.

3

4

## Diversity Analysis

5

6

### Sequence Diversity

7

8

Calculate nucleotide diversity (π) and related metrics.

9

10

```python

11

import allel

12

13

# Basic sequence diversity

14

diversity = allel.sequence_diversity(positions, allele_counts, start=None, stop=None)

15

16

# Mean pairwise differences

17

mpd = allel.mean_pairwise_difference(allele_counts)

18

19

# Between-population differences

20

div = allel.mean_pairwise_difference_between(ac1, ac2)

21

```

22

{ .api }

23

24

**Key Functions:**

25

26

```python

27

def sequence_diversity(pos, ac, start=None, stop=None, is_accessible=None):

28

"""

29

Calculate sequence diversity (π) over a genomic region.

30

31

Args:

32

pos (array_like): Variant positions

33

ac (AlleleCountsArray): Allele counts per variant

34

start (int, optional): Start position for calculation

35

stop (int, optional): Stop position for calculation

36

is_accessible (array_like, optional): Boolean array of accessible sites

37

38

Returns:

39

float: Sequence diversity value

40

"""

41

42

def mean_pairwise_difference(ac, an=None):

43

"""

44

Calculate mean pairwise nucleotide differences.

45

46

Args:

47

ac (AlleleCountsArray): Allele counts per variant

48

an (array_like, optional): Total allele numbers per variant

49

50

Returns:

51

ndarray: Mean pairwise differences per variant

52

"""

53

54

def windowed_diversity(pos, ac, windows, is_accessible=None):

55

"""

56

Calculate diversity in genomic windows.

57

58

Args:

59

pos (array_like): Variant positions

60

ac (AlleleCountsArray): Allele counts

61

windows (array_like): Window coordinates as (start, stop) pairs

62

is_accessible (array_like, optional): Accessible sites mask

63

64

Returns:

65

ndarray: Diversity values per window

66

"""

67

```

68

{ .api }

69

70

### Sequence Divergence

71

72

Calculate divergence between populations or sequences.

73

74

```python

75

# Between-population divergence

76

divergence = allel.sequence_divergence(positions, ac_pop1, ac_pop2, start=1000, stop=2000)

77

78

# Windowed divergence analysis

79

div_windows = allel.windowed_divergence(positions, ac_pop1, ac_pop2, windows)

80

81

# Windowed differentiation

82

df_windows = allel.windowed_df(positions, ac_pop1, ac_pop2, windows)

83

```

84

{ .api }

85

86

```python

87

def sequence_divergence(pos, ac1, ac2, start=None, stop=None, is_accessible=None):

88

"""

89

Calculate sequence divergence between two populations.

90

91

Args:

92

pos (array_like): Variant positions

93

ac1 (AlleleCountsArray): Allele counts for population 1

94

ac2 (AlleleCountsArray): Allele counts for population 2

95

start (int, optional): Start position

96

stop (int, optional): Stop position

97

is_accessible (array_like, optional): Accessible sites mask

98

99

Returns:

100

float: Sequence divergence value

101

"""

102

103

def windowed_divergence(pos, ac1, ac2, windows, is_accessible=None):

104

"""

105

Calculate divergence between populations in genomic windows.

106

107

Args:

108

pos (array_like): Variant positions

109

ac1 (AlleleCountsArray): Allele counts for population 1

110

ac2 (AlleleCountsArray): Allele counts for population 2

111

windows (array_like): Window coordinates

112

is_accessible (array_like, optional): Accessible sites mask

113

114

Returns:

115

ndarray: Divergence values per window

116

"""

117

118

def windowed_df(pos, ac1, ac2, windows, is_accessible=None):

119

"""

120

Calculate windowed differentiation (Df) between populations.

121

122

Args:

123

pos (array_like): Variant positions

124

ac1 (AlleleCountsArray): Allele counts for population 1

125

ac2 (AlleleCountsArray): Allele counts for population 2

126

windows (array_like): Window coordinates

127

is_accessible (array_like, optional): Accessible sites mask

128

129

Returns:

130

ndarray: Df values per window

131

"""

132

```

133

{ .api }

134

135

### Watterson's Theta

136

137

Estimate population mutation rate using Watterson's method.

138

139

```python

140

# Single value

141

theta_w = allel.watterson_theta(allele_counts)

142

143

# Windowed analysis

144

theta_windows = allel.windowed_watterson_theta(positions, allele_counts, windows)

145

```

146

{ .api }

147

148

```python

149

def watterson_theta(ac, an=None):

150

"""

151

Calculate Watterson's estimator of theta (population mutation rate).

152

153

Args:

154

ac (AlleleCountsArray): Allele counts per variant

155

an (array_like, optional): Total allele numbers per variant

156

157

Returns:

158

float: Watterson's theta estimate

159

"""

160

161

def windowed_watterson_theta(pos, ac, windows, is_accessible=None):

162

"""

163

Calculate Watterson's theta in genomic windows.

164

165

Args:

166

pos (array_like): Variant positions

167

ac (AlleleCountsArray): Allele counts

168

windows (array_like): Window coordinates

169

is_accessible (array_like, optional): Accessible sites mask

170

171

Returns:

172

ndarray: Theta values per window

173

"""

174

```

175

{ .api }

176

177

### Tajima's D

178

179

Test for departure from neutral evolution.

180

181

```python

182

# Single genomic region

183

tajima_d = allel.tajima_d(allele_counts, positions, start=1000, stop=2000)

184

185

# Windowed analysis

186

d_windows = allel.windowed_tajima_d(positions, allele_counts, windows)

187

188

# Moving window

189

d_moving = allel.moving_tajima_d(positions, allele_counts, size=1000, step=500)

190

```

191

{ .api }

192

193

```python

194

def tajima_d(ac, pos=None, start=None, stop=None, is_accessible=None):

195

"""

196

Calculate Tajima's D test statistic.

197

198

Args:

199

ac (AlleleCountsArray): Allele counts per variant

200

pos (array_like, optional): Variant positions

201

start (int, optional): Start position

202

stop (int, optional): Stop position

203

is_accessible (array_like, optional): Accessible sites mask

204

205

Returns:

206

float: Tajima's D value

207

"""

208

209

def windowed_tajima_d(pos, ac, windows, is_accessible=None):

210

"""

211

Calculate Tajima's D in genomic windows.

212

213

Args:

214

pos (array_like): Variant positions

215

ac (AlleleCountsArray): Allele counts

216

windows (array_like): Window coordinates

217

is_accessible (array_like, optional): Accessible sites mask

218

219

Returns:

220

ndarray: Tajima's D values per window

221

"""

222

223

def moving_tajima_d(pos, ac, size, start=None, stop=None, step=None, is_accessible=None):

224

"""

225

Calculate Tajima's D in sliding windows.

226

227

Args:

228

pos (array_like): Variant positions

229

ac (AlleleCountsArray): Allele counts

230

size (int): Window size in base pairs

231

start (int, optional): Start position

232

stop (int, optional): Stop position

233

step (int, optional): Step size between windows

234

is_accessible (array_like, optional): Accessible sites mask

235

236

Returns:

237

tuple: (window_centers, tajima_d_values)

238

"""

239

```

240

{ .api }

241

242

## Population Structure (FST)

243

244

### FST Estimators

245

246

Calculate fixation indices between populations.

247

248

```python

249

# Weir & Cockerham FST

250

fst_wc = allel.weir_cockerham_fst(genotypes, subpops)

251

252

# Hudson's FST

253

fst_hudson = allel.hudson_fst(allele_counts, subpops)

254

255

# Patterson's FST (for two populations)

256

fst_patterson = allel.patterson_fst(ac_pop1, ac_pop2)

257

```

258

{ .api }

259

260

**Key Functions:**

261

262

```python

263

def weir_cockerham_fst(g, subpops, max_allele=None):

264

"""

265

Calculate Weir & Cockerham FST between subpopulations.

266

267

Args:

268

g (GenotypeArray): Genotype data

269

subpops (list): List of arrays containing sample indices for each subpopulation

270

max_allele (int, optional): Maximum allele number to consider

271

272

Returns:

273

tuple: (a, b, c) variance components for FST calculation

274

"""

275

276

def hudson_fst(ac, subpops):

277

"""

278

Calculate Hudson's FST estimator.

279

280

Args:

281

ac (AlleleCountsArray): Allele counts per variant

282

subpops (list): Subpopulation definitions

283

284

Returns:

285

tuple: (num, den) numerator and denominator for FST calculation

286

"""

287

288

def average_weir_cockerham_fst(g, subpops, max_allele=None):

289

"""

290

Calculate average FST across all variants.

291

292

Args:

293

g (GenotypeArray): Genotype data

294

subpops (list): Subpopulation sample indices

295

max_allele (int, optional): Maximum allele number

296

297

Returns:

298

float: Average FST value

299

"""

300

301

def windowed_weir_cockerham_fst(pos, g, subpops, windows, max_allele=None):

302

"""

303

Calculate FST in genomic windows.

304

305

Args:

306

pos (array_like): Variant positions

307

g (GenotypeArray): Genotype data

308

subpops (list): Subpopulation definitions

309

windows (array_like): Window coordinates

310

max_allele (int, optional): Maximum allele number

311

312

Returns:

313

ndarray: FST values per window

314

"""

315

```

316

{ .api }

317

318

### Additional FST Methods

319

320

Advanced FST calculations for specific analysis needs.

321

322

```python

323

# Blockwise FST calculations (for bootstrapping)

324

fst_blocks_wc = allel.blockwise_weir_cockerham_fst(genotypes, subpops, blocks)

325

fst_blocks_hudson = allel.blockwise_hudson_fst(ac, subpops, blocks)

326

fst_blocks_patterson = allel.blockwise_patterson_fst(ac1, ac2, blocks)

327

328

# Moving window FST

329

fst_moving_hudson = allel.moving_hudson_fst(positions, ac, subpops, size=10000)

330

fst_moving_patterson = allel.moving_patterson_fst(positions, ac1, ac2, size=10000)

331

fst_moving_wc = allel.moving_weir_cockerham_fst(positions, genotypes, subpops, size=10000)

332

333

# Average FST calculations

334

avg_hudson = allel.average_hudson_fst(ac, subpops)

335

avg_patterson = allel.average_patterson_fst(ac1, ac2)

336

```

337

{ .api }

338

339

```python

340

def blockwise_weir_cockerham_fst(g, subpops, blocks, max_allele=None):

341

"""

342

Calculate Weir & Cockerham FST for genomic blocks.

343

344

Args:

345

g (GenotypeArray): Genotype data

346

subpops (list): Subpopulation sample indices

347

blocks (array_like): Block definitions

348

max_allele (int, optional): Maximum allele number

349

350

Returns:

351

ndarray: FST values per block

352

"""

353

354

def blockwise_hudson_fst(ac, subpops, blocks):

355

"""

356

Calculate Hudson's FST for genomic blocks.

357

358

Args:

359

ac (AlleleCountsArray): Allele counts

360

subpops (list): Subpopulation definitions

361

blocks (array_like): Block definitions

362

363

Returns:

364

ndarray: FST values per block

365

"""

366

367

def blockwise_patterson_fst(ac1, ac2, blocks):

368

"""

369

Calculate Patterson's FST for genomic blocks.

370

371

Args:

372

ac1 (AlleleCountsArray): Allele counts for population 1

373

ac2 (AlleleCountsArray): Allele counts for population 2

374

blocks (array_like): Block definitions

375

376

Returns:

377

ndarray: FST values per block

378

"""

379

380

def moving_hudson_fst(pos, ac, subpops, size, start=None, stop=None, step=None):

381

"""

382

Calculate Hudson's FST in sliding windows.

383

384

Args:

385

pos (array_like): Variant positions

386

ac (AlleleCountsArray): Allele counts

387

subpops (list): Subpopulation definitions

388

size (int): Window size in base pairs

389

start (int, optional): Start position

390

stop (int, optional): Stop position

391

step (int, optional): Step size between windows

392

393

Returns:

394

tuple: (window_centers, fst_values)

395

"""

396

397

def moving_patterson_fst(pos, ac1, ac2, size, start=None, stop=None, step=None):

398

"""

399

Calculate Patterson's FST in sliding windows.

400

401

Args:

402

pos (array_like): Variant positions

403

ac1 (AlleleCountsArray): Population 1 allele counts

404

ac2 (AlleleCountsArray): Population 2 allele counts

405

size (int): Window size in base pairs

406

start (int, optional): Start position

407

stop (int, optional): Stop position

408

step (int, optional): Step size between windows

409

410

Returns:

411

tuple: (window_centers, fst_values)

412

"""

413

414

def moving_weir_cockerham_fst(pos, g, subpops, size, start=None, stop=None, step=None, max_allele=None):

415

"""

416

Calculate Weir & Cockerham FST in sliding windows.

417

418

Args:

419

pos (array_like): Variant positions

420

g (GenotypeArray): Genotype data

421

subpops (list): Subpopulation sample indices

422

size (int): Window size in base pairs

423

start (int, optional): Start position

424

stop (int, optional): Stop position

425

step (int, optional): Step size between windows

426

max_allele (int, optional): Maximum allele number

427

428

Returns:

429

tuple: (window_centers, fst_values)

430

"""

431

432

def average_hudson_fst(ac, subpops):

433

"""

434

Calculate average Hudson's FST across all variants.

435

436

Args:

437

ac (AlleleCountsArray): Allele counts

438

subpops (list): Subpopulation definitions

439

440

Returns:

441

float: Average FST value

442

"""

443

444

def average_patterson_fst(ac1, ac2):

445

"""

446

Calculate average Patterson's FST across all variants.

447

448

Args:

449

ac1 (AlleleCountsArray): Population 1 allele counts

450

ac2 (AlleleCountsArray): Population 2 allele counts

451

452

Returns:

453

float: Average FST value

454

"""

455

```

456

{ .api }

457

458

## Linkage Disequilibrium

459

460

### LD Metrics

461

462

Measure linkage disequilibrium between variants.

463

464

```python

465

# Rogers & Huff r correlation

466

r = allel.rogers_huff_r(genotype_matrix)

467

468

# Between two sets of variants

469

r_between = allel.rogers_huff_r_between(gn_a, gn_b)

470

471

# Windowed r-squared

472

r2_windows = allel.windowed_r_squared(positions, genotype_matrix, size=10000)

473

474

# Visualize LD patterns

475

allel.plot_pairwise_ld(gn_subset, positions_subset)

476

```

477

{ .api }

478

479

```python

480

def rogers_huff_r(gn):

481

"""

482

Calculate Rogers & Huff r correlation coefficient matrix.

483

484

Args:

485

gn (array_like): Genotype matrix (n_variants, n_samples)

486

487

Returns:

488

ndarray: Correlation matrix between all pairs of variants

489

"""

490

491

def rogers_huff_r_between(gn1, gn2):

492

"""

493

Calculate Rogers & Huff r correlation between two sets of variants.

494

495

Args:

496

gn1 (array_like): First set of genotype data (n_variants1, n_samples)

497

gn2 (array_like): Second set of genotype data (n_variants2, n_samples)

498

499

Returns:

500

ndarray: Correlation matrix between variants in the two sets

501

"""

502

503

def windowed_r_squared(pos, gn, size, start=None, stop=None, step=None):

504

"""

505

Calculate r² in sliding windows.

506

507

Args:

508

pos (array_like): Variant positions

509

gn (array_like): Genotype matrix

510

size (int): Window size in base pairs

511

start (int, optional): Start position

512

stop (int, optional): Stop position

513

step (int, optional): Step size between windows

514

515

Returns:

516

tuple: (window_centers, mean_r2_values)

517

"""

518

519

def locate_unlinked(gn, size, step, threshold):

520

"""

521

Identify unlinked variants based on LD threshold.

522

523

Args:

524

gn (array_like): Genotype matrix

525

size (int): Window size for LD calculation

526

step (int): Step size between windows

527

threshold (float): LD threshold for considering variants unlinked

528

529

Returns:

530

ndarray: Indices of unlinked variants

531

"""

532

533

def plot_pairwise_ld(gn, pos, start=None, stop=None, ax=None):

534

"""

535

Plot pairwise linkage disequilibrium patterns.

536

537

Args:

538

gn (array_like): Genotype matrix

539

pos (array_like): Variant positions

540

start (int, optional): Start position for plotting

541

stop (int, optional): Stop position for plotting

542

ax (matplotlib axis, optional): Axis to plot on

543

544

Returns:

545

matplotlib figure: LD heatmap plot

546

"""

547

```

548

{ .api }

549

550

## Principal Component Analysis

551

552

### PCA Methods

553

554

Dimensionality reduction for population structure analysis.

555

556

```python

557

# Standard PCA

558

coords, model = allel.pca(genotype_matrix, n_components=10)

559

560

# Randomized PCA for large datasets

561

coords, model = allel.randomized_pca(genotype_matrix, n_components=10)

562

563

# Use different scalers

564

coords, model = allel.pca(genotype_matrix, scaler='patterson')

565

```

566

{ .api }

567

568

```python

569

def pca(gn, n_components=10, scaler=None, copy=True, random_state=None):

570

"""

571

Perform principal component analysis on genotype data.

572

573

Args:

574

gn (array_like): Genotype matrix (n_variants, n_samples)

575

n_components (int): Number of principal components to compute

576

scaler (str or object, optional): Scaling method ('standard', 'patterson', or scaler object)

577

copy (bool): Whether to copy input data

578

random_state (int, optional): Random seed for reproducibility

579

580

Returns:

581

tuple: (coordinates, fitted_model) where coordinates are sample projections

582

"""

583

584

def randomized_pca(gn, n_components=10, scaler=None, copy=True, random_state=None):

585

"""

586

Randomized PCA for large datasets.

587

588

Args:

589

gn (array_like): Genotype matrix

590

n_components (int): Number of components

591

scaler (str or object, optional): Scaling method

592

copy (bool): Whether to copy data

593

random_state (int, optional): Random seed

594

595

Returns:

596

tuple: (coordinates, fitted_model)

597

"""

598

```

599

{ .api }

600

601

## Data Preprocessing

602

603

### Scaling Methods

604

605

Standardize genotype data for analysis.

606

607

```python

608

# Standard scaling (mean=0, std=1)

609

scaler = allel.StandardScaler()

610

gn_scaled = scaler.fit_transform(genotype_matrix)

611

612

# Center scaling (mean=0)

613

scaler = allel.CenterScaler()

614

gn_centered = scaler.fit_transform(genotype_matrix)

615

616

# Patterson scaling (for PCA)

617

scaler = allel.PattersonScaler()

618

gn_patterson = scaler.fit_transform(genotype_matrix)

619

620

# Get scaler by name

621

scaler = allel.get_scaler('patterson')

622

```

623

{ .api }

624

625

```python

626

class StandardScaler:

627

"""

628

Standardize genotype data to have zero mean and unit variance.

629

"""

630

631

def fit(self, X):

632

"""Compute mean and standard deviation for later scaling."""

633

634

def transform(self, X):

635

"""Scale data using computed statistics."""

636

637

def fit_transform(self, X):

638

"""Fit scaler and transform data in one step."""

639

640

class CenterScaler:

641

"""

642

Center genotype data to have zero mean.

643

"""

644

645

def fit(self, X):

646

"""Compute mean for later centering."""

647

648

def transform(self, X):

649

"""Center data using computed mean."""

650

651

def fit_transform(self, X):

652

"""Fit scaler and center data in one step."""

653

654

class PattersonScaler:

655

"""

656

Scale genotype data using Patterson method for PCA.

657

658

Scales variants by their expected variance under Hardy-Weinberg

659

equilibrium, which is appropriate for principal component analysis.

660

"""

661

662

def fit(self, X):

663

"""Compute scaling factors."""

664

665

def transform(self, X):

666

"""Apply Patterson scaling."""

667

668

def fit_transform(self, X):

669

"""Fit and transform in one step."""

670

671

def get_scaler(name):

672

"""

673

Get a scaler by name.

674

675

Args:

676

name (str): Scaler name ('standard', 'center', or 'patterson')

677

678

Returns:

679

scaler: Scaler object

680

"""

681

```

682

{ .api }

683

684

## Selection Tests

685

686

### Extended Haplotype Homozygosity

687

688

Test for recent positive selection using haplotype structure.

689

690

```python

691

# Integrated Haplotype Score (iHS)

692

ihs_scores = allel.ihs(haplotypes, map_positions, min_ehh=0.05)

693

694

# Cross-population Extended Haplotype Homozygosity (XP-EHH)

695

xpehh_scores = allel.xpehh(hap_pop1, hap_pop2, map_positions)

696

697

# Standardize scores

698

ihs_std = allel.standardize(ihs_scores)

699

```

700

{ .api }

701

702

```python

703

def ihs(h, map_pos, min_ehh=0.05, include_edges=False, gap_scale=20000, max_gap=200000, discard_integration_warnings=False):

704

"""

705

Calculate integrated haplotype score (iHS) for each variant.

706

707

Args:

708

h (HaplotypeArray): Haplotype data

709

map_pos (array_like): Genetic map positions in centiMorgans

710

min_ehh (float): Minimum EHH value for integration cutoff

711

include_edges (bool): Whether to include variants at edges

712

gap_scale (float): Scale parameter for gap penalty

713

max_gap (float): Maximum gap size to allow

714

discard_integration_warnings (bool): Whether to suppress integration warnings

715

716

Returns:

717

ndarray: iHS scores per variant

718

"""

719

720

def xpehh(h1, h2, map_pos, min_ehh=0.05, include_edges=False, gap_scale=20000, max_gap=200000, discard_integration_warnings=False):

721

"""

722

Calculate cross-population EHH (XP-EHH) between two populations.

723

724

Args:

725

h1 (HaplotypeArray): Haplotypes from population 1

726

h2 (HaplotypeArray): Haplotypes from population 2

727

map_pos (array_like): Genetic map positions

728

min_ehh (float): Minimum EHH cutoff

729

include_edges (bool): Include edge variants

730

gap_scale (float): Gap penalty scale

731

max_gap (float): Maximum allowed gap

732

discard_integration_warnings (bool): Suppress warnings

733

734

Returns:

735

ndarray: XP-EHH scores per variant

736

"""

737

738

def standardize_by_allele_count(score, aac, flip=None):

739

"""

740

Standardize test statistic by allele count.

741

742

Args:

743

score (array_like): Raw test statistic values

744

aac (array_like): Alternate allele counts

745

flip (array_like, optional): Boolean array indicating which scores to flip

746

747

Returns:

748

ndarray: Standardized scores

749

"""

750

```

751

{ .api }

752

753

### Number of Segregating Sites by Length

754

755

Alternative selection test less sensitive to recombination rate variation.

756

757

```python

758

# NSL scores

759

nsl_scores = allel.nsl(haplotypes, map_positions)

760

761

# Cross-population NSL

762

xpnsl_scores = allel.xpnsl(hap_pop1, hap_pop2, map_positions)

763

```

764

{ .api }

765

766

```python

767

def nsl(h, map_pos=None, max_extent=None):

768

"""

769

Calculate number of segregating sites by length (NSL).

770

771

Args:

772

h (HaplotypeArray): Haplotype data

773

map_pos (array_like, optional): Map positions for physical distance

774

max_extent (float, optional): Maximum extent for calculation

775

776

Returns:

777

ndarray: NSL scores per variant

778

"""

779

780

def xpnsl(h1, h2, map_pos=None, max_extent=None):

781

"""

782

Calculate cross-population NSL.

783

784

Args:

785

h1 (HaplotypeArray): Population 1 haplotypes

786

h2 (HaplotypeArray): Population 2 haplotypes

787

map_pos (array_like, optional): Map positions

788

max_extent (float, optional): Maximum extent

789

790

Returns:

791

ndarray: XP-NSL scores per variant

792

"""

793

```

794

{ .api }

795

796

### Additional Selection Tests

797

798

More methods for detecting natural selection.

799

800

```python

801

# EHH decay analysis

802

ehh_values = allel.ehh_decay(haplotypes, map_positions, focal_variant=100)

803

804

# Voight painting (visualization of haplotype decay)

805

painting = allel.voight_painting(haplotypes, map_positions, focal_variant=100)

806

allel.plot_voight_painting(painting)

807

808

# Standardization methods

809

scores_std = allel.standardize(scores)

810

scores_std_ac = allel.standardize_by_allele_count(scores, allele_counts)

811

812

# Moving Tajima's D difference

813

delta_td = allel.moving_delta_tajima_d(positions, ac_pop1, ac_pop2, size=10000)

814

815

# Population Branch Statistic (PBS)

816

pbs_scores = allel.pbs(fst_AB, fst_AC, fst_BC)

817

```

818

{ .api }

819

820

```python

821

def ehh_decay(h, map_pos, focal=None, max_gap=200000, gap_scale=20000, truncate_left=None, truncate_right=None):

822

"""

823

Calculate Extended Haplotype Homozygosity (EHH) decay.

824

825

Args:

826

h (HaplotypeArray): Haplotype data

827

map_pos (array_like): Genetic map positions

828

focal (int, optional): Index of focal variant

829

max_gap (float): Maximum gap between variants

830

gap_scale (float): Gap scaling factor

831

truncate_left (float, optional): Left truncation point

832

truncate_right (float, optional): Right truncation point

833

834

Returns:

835

tuple: (distances, ehh_values)

836

"""

837

838

def voight_painting(h, map_pos, focal=None, max_gap=200000, gap_scale=20000):

839

"""

840

Generate Voight painting data for haplotype visualization.

841

842

Args:

843

h (HaplotypeArray): Haplotype data

844

map_pos (array_like): Genetic map positions

845

focal (int, optional): Index of focal variant

846

max_gap (float): Maximum gap between variants

847

gap_scale (float): Gap scaling factor

848

849

Returns:

850

dict: Painting data for visualization

851

"""

852

853

def plot_voight_painting(painting, ax=None):

854

"""

855

Plot Voight painting visualization.

856

857

Args:

858

painting (dict): Painting data from voight_painting()

859

ax (matplotlib axis, optional): Axis to plot on

860

861

Returns:

862

matplotlib figure: Voight painting plot

863

"""

864

865

def fig_voight_painting(painting, height=None):

866

"""

867

Create figure for Voight painting.

868

869

Args:

870

painting (dict): Painting data

871

height (float, optional): Figure height

872

873

Returns:

874

matplotlib figure: Figure object

875

"""

876

877

def plot_haplotype_frequencies(h, start=None, stop=None, ax=None):

878

"""

879

Plot haplotype frequencies.

880

881

Args:

882

h (HaplotypeArray): Haplotype data

883

start (int, optional): Start position index

884

stop (int, optional): Stop position index

885

ax (matplotlib axis, optional): Axis to plot on

886

887

Returns:

888

matplotlib figure: Haplotype frequency plot

889

"""

890

891

def plot_moving_haplotype_frequencies(h, size, start=None, stop=None, step=None, ax=None):

892

"""

893

Plot moving window haplotype frequencies.

894

895

Args:

896

h (HaplotypeArray): Haplotype data

897

size (int): Window size

898

start (int, optional): Start position

899

stop (int, optional): Stop position

900

step (int, optional): Step size

901

ax (matplotlib axis, optional): Axis to plot on

902

903

Returns:

904

matplotlib figure: Moving haplotype frequency plot

905

"""

906

907

def standardize(score):

908

"""

909

Standardize test statistic to z-scores.

910

911

Args:

912

score (array_like): Raw test statistic values

913

914

Returns:

915

ndarray: Standardized scores

916

"""

917

918

def moving_delta_tajima_d(pos, ac1, ac2, size, start=None, stop=None, step=None, is_accessible=None):

919

"""

920

Calculate difference in Tajima's D between populations in sliding windows.

921

922

Args:

923

pos (array_like): Variant positions

924

ac1 (AlleleCountsArray): Allele counts for population 1

925

ac2 (AlleleCountsArray): Allele counts for population 2

926

size (int): Window size in base pairs

927

start (int, optional): Start position

928

stop (int, optional): Stop position

929

step (int, optional): Step size

930

is_accessible (array_like, optional): Accessible sites mask

931

932

Returns:

933

tuple: (window_centers, delta_tajima_d_values)

934

"""

935

936

def pbs(fst_AB, fst_AC, fst_BC):

937

"""

938

Calculate Population Branch Statistic (PBS) for three populations.

939

940

Args:

941

fst_AB (array_like): FST between populations A and B

942

fst_AC (array_like): FST between populations A and C

943

fst_BC (array_like): FST between populations B and C

944

945

Returns:

946

ndarray: PBS values for population A

947

"""

948

949

def moving_haplotype_diversity(h, size, start=None, stop=None, step=None):

950

"""

951

Calculate haplotype diversity in sliding windows.

952

953

Args:

954

h (HaplotypeArray): Haplotype data

955

size (int): Window size (number of variants)

956

start (int, optional): Start position index

957

stop (int, optional): Stop position index

958

step (int, optional): Step size

959

960

Returns:

961

tuple: (window_centers, diversity_values)

962

"""

963

964

def moving_garud_h(h, size, start=None, stop=None, step=None):

965

"""

966

Calculate Garud's H statistics in sliding windows.

967

968

Args:

969

h (HaplotypeArray): Haplotype data

970

size (int): Window size (number of variants)

971

start (int, optional): Start position index

972

stop (int, optional): Stop position index

973

step (int, optional): Step size

974

975

Returns:

976

tuple: (window_centers, h1_values, h12_values, h123_values, h2_h1_values)

977

"""

978

```

979

{ .api }

980

981

### Haplotype Diversity

982

983

Measure diversity of haplotype patterns.

984

985

```python

986

# Basic haplotype diversity

987

h_div = allel.haplotype_diversity(haplotypes, start=1000, stop=2000)

988

989

# Moving window analysis

990

h_div_moving = allel.moving_haplotype_diversity(haplotypes, size=1000, step=500)

991

992

# Garud's H statistics

993

h_stats = allel.garud_h(haplotypes, start=1000, stop=2000)

994

```

995

{ .api }

996

997

```python

998

def haplotype_diversity(h, start=None, stop=None):

999

"""

1000

Calculate haplotype diversity in a genomic region.

1001

1002

Args:

1003

h (HaplotypeArray): Haplotype data

1004

start (int, optional): Start position index

1005

stop (int, optional): Stop position index

1006

1007

Returns:

1008

float: Haplotype diversity value

1009

"""

1010

1011

def garud_h(h, start=None, stop=None):

1012

"""

1013

Calculate Garud's H1, H12, H123, and H2/H1 statistics.

1014

1015

Args:

1016

h (HaplotypeArray): Haplotype data

1017

start (int, optional): Start position index

1018

stop (int, optional): Stop position index

1019

1020

Returns:

1021

tuple: (H1, H12, H123, H2_H1) statistics

1022

"""

1023

```

1024

{ .api }

1025

1026

## Site Frequency Spectrum

1027

1028

### SFS Analysis

1029

1030

Analyze allele frequency distributions.

1031

1032

```python

1033

# Basic SFS

1034

sfs = allel.sfs(derived_allele_counts, n=sample_size)

1035

1036

# Folded SFS (when ancestral state unknown)

1037

sfs_folded = allel.sfs_folded(derived_allele_counts, n=sample_size)

1038

1039

# Joint SFS between populations

1040

joint_sfs = allel.joint_sfs(dac_pop1, dac_pop2, n1=n_pop1, n2=n_pop2)

1041

```

1042

{ .api }

1043

1044

```python

1045

def sfs(dac, n=None):

1046

"""

1047

Calculate site frequency spectrum.

1048

1049

Args:

1050

dac (array_like): Derived allele counts per variant

1051

n (int, optional): Sample size (number of chromosomes)

1052

1053

Returns:

1054

ndarray: Site frequency spectrum

1055

"""

1056

1057

def sfs_folded(dac, n=None):

1058

"""

1059

Calculate folded site frequency spectrum.

1060

1061

Args:

1062

dac (array_like): Derived allele counts

1063

n (int, optional): Sample size

1064

1065

Returns:

1066

ndarray: Folded SFS

1067

"""

1068

1069

def joint_sfs(dac1, dac2, n1=None, n2=None):

1070

"""

1071

Calculate joint site frequency spectrum between two populations.

1072

1073

Args:

1074

dac1 (array_like): Derived allele counts for population 1

1075

dac2 (array_like): Derived allele counts for population 2

1076

n1 (int, optional): Sample size for population 1

1077

n2 (int, optional): Sample size for population 2

1078

1079

Returns:

1080

ndarray: Joint SFS as 2D array

1081

"""

1082

```

1083

{ .api }

1084

1085

### SFS Transformations and Scaling

1086

1087

Transform and scale site frequency spectra.

1088

1089

```python

1090

# Scale SFS by mutation rate

1091

sfs_scaled = allel.sfs_scaled(derived_counts, n=sample_size, theta=0.01)

1092

sfs_folded_scaled = allel.sfs_folded_scaled(derived_counts, n=sample_size, theta=0.01)

1093

1094

# Joint SFS scaling

1095

joint_sfs_scaled = allel.joint_sfs_scaled(dac1, dac2, n1=n1, n2=n2, theta=0.01)

1096

joint_sfs_folded_scaled = allel.joint_sfs_folded_scaled(dac1, dac2, n1=n1, n2=n2, theta=0.01)

1097

1098

# Fold existing SFS

1099

folded_sfs = allel.fold_sfs(unfolded_sfs)

1100

folded_joint_sfs = allel.fold_joint_sfs(unfolded_joint_sfs)

1101

1102

# Scale existing SFS

1103

scaled_sfs = allel.scale_sfs(sfs, theta=0.01)

1104

scaled_folded_sfs = allel.scale_sfs_folded(sfs, theta=0.01)

1105

scaled_joint_sfs = allel.scale_joint_sfs(joint_sfs, theta=0.01)

1106

scaled_joint_folded_sfs = allel.scale_joint_sfs_folded(joint_sfs, theta=0.01)

1107

```

1108

{ .api }

1109

1110

```python

1111

def sfs_scaled(dac, n=None, theta=1.0):

1112

"""

1113

Calculate scaled site frequency spectrum.

1114

1115

Args:

1116

dac (array_like): Derived allele counts

1117

n (int, optional): Sample size

1118

theta (float): Scaling factor (mutation rate)

1119

1120

Returns:

1121

ndarray: Scaled SFS

1122

"""

1123

1124

def sfs_folded_scaled(dac, n=None, theta=1.0):

1125

"""

1126

Calculate scaled folded site frequency spectrum.

1127

1128

Args:

1129

dac (array_like): Derived allele counts

1130

n (int, optional): Sample size

1131

theta (float): Scaling factor

1132

1133

Returns:

1134

ndarray: Scaled folded SFS

1135

"""

1136

1137

def joint_sfs_scaled(dac1, dac2, n1=None, n2=None, theta=1.0):

1138

"""

1139

Calculate scaled joint site frequency spectrum.

1140

1141

Args:

1142

dac1 (array_like): Population 1 derived allele counts

1143

dac2 (array_like): Population 2 derived allele counts

1144

n1 (int, optional): Population 1 sample size

1145

n2 (int, optional): Population 2 sample size

1146

theta (float): Scaling factor

1147

1148

Returns:

1149

ndarray: Scaled joint SFS

1150

"""

1151

1152

def joint_sfs_folded_scaled(dac1, dac2, n1=None, n2=None, theta=1.0):

1153

"""

1154

Calculate scaled folded joint site frequency spectrum.

1155

1156

Args:

1157

dac1 (array_like): Population 1 derived allele counts

1158

dac2 (array_like): Population 2 derived allele counts

1159

n1 (int, optional): Population 1 sample size

1160

n2 (int, optional): Population 2 sample size

1161

theta (float): Scaling factor

1162

1163

Returns:

1164

ndarray: Scaled folded joint SFS

1165

"""

1166

1167

def fold_sfs(sfs):

1168

"""

1169

Fold an existing site frequency spectrum.

1170

1171

Args:

1172

sfs (array_like): Unfolded SFS

1173

1174

Returns:

1175

ndarray: Folded SFS

1176

"""

1177

1178

def fold_joint_sfs(joint_sfs):

1179

"""

1180

Fold an existing joint site frequency spectrum.

1181

1182

Args:

1183

joint_sfs (array_like): Unfolded joint SFS

1184

1185

Returns:

1186

ndarray: Folded joint SFS

1187

"""

1188

1189

def scale_sfs(sfs, theta=1.0):

1190

"""

1191

Scale an existing site frequency spectrum.

1192

1193

Args:

1194

sfs (array_like): SFS to scale

1195

theta (float): Scaling factor

1196

1197

Returns:

1198

ndarray: Scaled SFS

1199

"""

1200

1201

def scale_sfs_folded(sfs, theta=1.0):

1202

"""

1203

Scale an existing folded site frequency spectrum.

1204

1205

Args:

1206

sfs (array_like): Folded SFS to scale

1207

theta (float): Scaling factor

1208

1209

Returns:

1210

ndarray: Scaled folded SFS

1211

"""

1212

1213

def scale_joint_sfs(joint_sfs, theta=1.0):

1214

"""

1215

Scale an existing joint site frequency spectrum.

1216

1217

Args:

1218

joint_sfs (array_like): Joint SFS to scale

1219

theta (float): Scaling factor

1220

1221

Returns:

1222

ndarray: Scaled joint SFS

1223

"""

1224

1225

def scale_joint_sfs_folded(joint_sfs, theta=1.0):

1226

"""

1227

Scale an existing folded joint site frequency spectrum.

1228

1229

Args:

1230

joint_sfs (array_like): Folded joint SFS to scale

1231

theta (float): Scaling factor

1232

1233

Returns:

1234

ndarray: Scaled folded joint SFS

1235

"""

1236

```

1237

{ .api }

1238

1239

### SFS Visualization

1240

1241

Plot site frequency spectra.

1242

1243

```python

1244

# Basic SFS plots

1245

allel.plot_sfs(sfs)

1246

allel.plot_sfs_folded(sfs_folded)

1247

allel.plot_sfs_scaled(sfs_scaled)

1248

allel.plot_sfs_folded_scaled(sfs_folded_scaled)

1249

1250

# Joint SFS plots

1251

allel.plot_joint_sfs(joint_sfs)

1252

allel.plot_joint_sfs_folded(joint_sfs_folded)

1253

allel.plot_joint_sfs_scaled(joint_sfs_scaled)

1254

allel.plot_joint_sfs_folded_scaled(joint_sfs_folded_scaled)

1255

```

1256

{ .api }

1257

1258

```python

1259

def plot_sfs(sfs, ax=None, n=None):

1260

"""

1261

Plot site frequency spectrum.

1262

1263

Args:

1264

sfs (array_like): SFS values

1265

ax (matplotlib axis, optional): Axis to plot on

1266

n (int, optional): Sample size for x-axis labels

1267

1268

Returns:

1269

matplotlib figure: SFS plot

1270

"""

1271

1272

def plot_sfs_folded(sfs, ax=None, n=None):

1273

"""

1274

Plot folded site frequency spectrum.

1275

1276

Args:

1277

sfs (array_like): Folded SFS values

1278

ax (matplotlib axis, optional): Axis to plot on

1279

n (int, optional): Sample size for x-axis labels

1280

1281

Returns:

1282

matplotlib figure: Folded SFS plot

1283

"""

1284

1285

def plot_sfs_scaled(sfs, ax=None, n=None):

1286

"""

1287

Plot scaled site frequency spectrum.

1288

1289

Args:

1290

sfs (array_like): Scaled SFS values

1291

ax (matplotlib axis, optional): Axis to plot on

1292

n (int, optional): Sample size

1293

1294

Returns:

1295

matplotlib figure: Scaled SFS plot

1296

"""

1297

1298

def plot_sfs_folded_scaled(sfs, ax=None, n=None):

1299

"""

1300

Plot scaled folded site frequency spectrum.

1301

1302

Args:

1303

sfs (array_like): Scaled folded SFS values

1304

ax (matplotlib axis, optional): Axis to plot on

1305

n (int, optional): Sample size

1306

1307

Returns:

1308

matplotlib figure: Scaled folded SFS plot

1309

"""

1310

1311

def plot_joint_sfs(joint_sfs, ax=None):

1312

"""

1313

Plot joint site frequency spectrum as heatmap.

1314

1315

Args:

1316

joint_sfs (array_like): Joint SFS matrix

1317

ax (matplotlib axis, optional): Axis to plot on

1318

1319

Returns:

1320

matplotlib figure: Joint SFS heatmap

1321

"""

1322

1323

def plot_joint_sfs_folded(joint_sfs, ax=None):

1324

"""

1325

Plot folded joint site frequency spectrum.

1326

1327

Args:

1328

joint_sfs (array_like): Folded joint SFS matrix

1329

ax (matplotlib axis, optional): Axis to plot on

1330

1331

Returns:

1332

matplotlib figure: Folded joint SFS heatmap

1333

"""

1334

1335

def plot_joint_sfs_scaled(joint_sfs, ax=None):

1336

"""

1337

Plot scaled joint site frequency spectrum.

1338

1339

Args:

1340

joint_sfs (array_like): Scaled joint SFS matrix

1341

ax (matplotlib axis, optional): Axis to plot on

1342

1343

Returns:

1344

matplotlib figure: Scaled joint SFS heatmap

1345

"""

1346

1347

def plot_joint_sfs_folded_scaled(joint_sfs, ax=None):

1348

"""

1349

Plot scaled folded joint site frequency spectrum.

1350

1351

Args:

1352

joint_sfs (array_like): Scaled folded joint SFS matrix

1353

ax (matplotlib axis, optional): Axis to plot on

1354

1355

Returns:

1356

matplotlib figure: Scaled folded joint SFS heatmap

1357

"""

1358

```

1359

{ .api }

1360

1361

## Hardy-Weinberg Equilibrium

1362

1363

### HWE Statistics

1364

1365

Test for deviations from Hardy-Weinberg equilibrium.

1366

1367

```python

1368

# Observed heterozygosity

1369

het_obs = allel.heterozygosity_observed(genotypes)

1370

1371

# Expected heterozygosity

1372

het_exp = allel.heterozygosity_expected(allele_frequencies)

1373

1374

# Inbreeding coefficient (FIS)

1375

fis = allel.inbreeding_coefficient(genotypes)

1376

```

1377

{ .api }

1378

1379

```python

1380

def heterozygosity_observed(g, alleles=None):

1381

"""

1382

Calculate observed heterozygosity per variant.

1383

1384

Args:

1385

g (GenotypeArray): Genotype data

1386

alleles (array_like, optional): Specific alleles to consider

1387

1388

Returns:

1389

ndarray: Observed heterozygosity per variant

1390

"""

1391

1392

def heterozygosity_expected(af, ploidy=2):

1393

"""

1394

Calculate expected heterozygosity under Hardy-Weinberg equilibrium.

1395

1396

Args:

1397

af (array_like): Allele frequencies

1398

ploidy (int): Ploidy level

1399

1400

Returns:

1401

ndarray: Expected heterozygosity per variant

1402

"""

1403

1404

def inbreeding_coefficient(g, alleles=None):

1405

"""

1406

Calculate inbreeding coefficient (FIS) per variant.

1407

1408

Args:

1409

g (GenotypeArray): Genotype data

1410

alleles (array_like, optional): Specific alleles to consider

1411

1412

Returns:

1413

ndarray: Inbreeding coefficient per variant

1414

"""

1415

```

1416

{ .api }

1417

1418

## Admixture Statistics

1419

1420

### Patterson's F-statistics

1421

1422

Test for population admixture and phylogenetic relationships.

1423

1424

```python

1425

# F2 statistic (genetic distance)

1426

f2_values = allel.patterson_f2(ac_pop1, ac_pop2)

1427

1428

# F3 statistic (admixture test)

1429

f3_values = allel.patterson_f3(ac_pop1, ac_pop2, ac_pop3)

1430

1431

# D statistic (4-population test)

1432

d_values = allel.patterson_d(ac_pop1, ac_pop2, ac_pop3, ac_pop4)

1433

1434

# Blockwise calculations for bootstrapping

1435

f3_blocks = allel.blockwise_patterson_f3(ac_pop1, ac_pop2, ac_pop3, blocks)

1436

d_blocks = allel.blockwise_patterson_d(ac_pop1, ac_pop2, ac_pop3, ac_pop4, blocks)

1437

1438

# Average statistics

1439

avg_f3 = allel.average_patterson_f3(ac_pop1, ac_pop2, ac_pop3)

1440

avg_d = allel.average_patterson_d(ac_pop1, ac_pop2, ac_pop3, ac_pop4)

1441

1442

# Moving window analysis

1443

f3_moving = allel.moving_patterson_f3(positions, ac_pop1, ac_pop2, ac_pop3, size=100000)

1444

d_moving = allel.moving_patterson_d(positions, ac_pop1, ac_pop2, ac_pop3, ac_pop4, size=100000)

1445

```

1446

{ .api }

1447

1448

**Key Functions:**

1449

1450

```python

1451

def patterson_f2(ac1, ac2):

1452

"""

1453

Calculate Patterson's F2 statistic (genetic distance).

1454

1455

Args:

1456

ac1 (AlleleCountsArray): Allele counts for population 1

1457

ac2 (AlleleCountsArray): Allele counts for population 2

1458

1459

Returns:

1460

ndarray: F2 values per variant

1461

"""

1462

1463

def patterson_f3(ac1, ac2, ac3):

1464

"""

1465

Calculate Patterson's F3 statistic (test for admixture).

1466

1467

Args:

1468

ac1 (AlleleCountsArray): Allele counts for population 1 (test population)

1469

ac2 (AlleleCountsArray): Allele counts for population 2 (source 1)

1470

ac3 (AlleleCountsArray): Allele counts for population 3 (source 2)

1471

1472

Returns:

1473

ndarray: F3 values per variant

1474

"""

1475

1476

def patterson_d(ac1, ac2, ac3, ac4):

1477

"""

1478

Calculate Patterson's D statistic (4-population test).

1479

1480

Args:

1481

ac1 (AlleleCountsArray): Population 1 allele counts

1482

ac2 (AlleleCountsArray): Population 2 allele counts

1483

ac3 (AlleleCountsArray): Population 3 allele counts

1484

ac4 (AlleleCountsArray): Population 4 allele counts (outgroup)

1485

1486

Returns:

1487

ndarray: D values per variant

1488

"""

1489

1490

def blockwise_patterson_f3(ac1, ac2, ac3, blocks):

1491

"""

1492

Calculate F3 statistic for genomic blocks.

1493

1494

Args:

1495

ac1 (AlleleCountsArray): Population 1 allele counts

1496

ac2 (AlleleCountsArray): Population 2 allele counts

1497

ac3 (AlleleCountsArray): Population 3 allele counts

1498

blocks (array_like): Block definitions

1499

1500

Returns:

1501

ndarray: F3 values per block

1502

"""

1503

1504

def blockwise_patterson_d(ac1, ac2, ac3, ac4, blocks):

1505

"""

1506

Calculate D statistic for genomic blocks.

1507

1508

Args:

1509

ac1 (AlleleCountsArray): Population 1 allele counts

1510

ac2 (AlleleCountsArray): Population 2 allele counts

1511

ac3 (AlleleCountsArray): Population 3 allele counts

1512

ac4 (AlleleCountsArray): Population 4 allele counts

1513

blocks (array_like): Block definitions

1514

1515

Returns:

1516

ndarray: D values per block

1517

"""

1518

1519

def average_patterson_f3(ac1, ac2, ac3):

1520

"""

1521

Calculate average F3 statistic across all variants.

1522

1523

Args:

1524

ac1 (AlleleCountsArray): Population 1 allele counts

1525

ac2 (AlleleCountsArray): Population 2 allele counts

1526

ac3 (AlleleCountsArray): Population 3 allele counts

1527

1528

Returns:

1529

float: Average F3 value

1530

"""

1531

1532

def average_patterson_d(ac1, ac2, ac3, ac4):

1533

"""

1534

Calculate average D statistic across all variants.

1535

1536

Args:

1537

ac1 (AlleleCountsArray): Population 1 allele counts

1538

ac2 (AlleleCountsArray): Population 2 allele counts

1539

ac3 (AlleleCountsArray): Population 3 allele counts

1540

ac4 (AlleleCountsArray): Population 4 allele counts

1541

1542

Returns:

1543

float: Average D value

1544

"""

1545

1546

def moving_patterson_f3(pos, ac1, ac2, ac3, size, start=None, stop=None, step=None):

1547

"""

1548

Calculate F3 statistic in sliding windows.

1549

1550

Args:

1551

pos (array_like): Variant positions

1552

ac1 (AlleleCountsArray): Population 1 allele counts

1553

ac2 (AlleleCountsArray): Population 2 allele counts

1554

ac3 (AlleleCountsArray): Population 3 allele counts

1555

size (int): Window size in base pairs

1556

start (int, optional): Start position

1557

stop (int, optional): Stop position

1558

step (int, optional): Step size

1559

1560

Returns:

1561

tuple: (window_centers, f3_values)

1562

"""

1563

1564

def moving_patterson_d(pos, ac1, ac2, ac3, ac4, size, start=None, stop=None, step=None):

1565

"""

1566

Calculate D statistic in sliding windows.

1567

1568

Args:

1569

pos (array_like): Variant positions

1570

ac1 (AlleleCountsArray): Population 1 allele counts

1571

ac2 (AlleleCountsArray): Population 2 allele counts

1572

ac3 (AlleleCountsArray): Population 3 allele counts

1573

ac4 (AlleleCountsArray): Population 4 allele counts

1574

size (int): Window size in base pairs

1575

start (int, optional): Start position

1576

stop (int, optional): Stop position

1577

step (int, optional): Step size

1578

1579

Returns:

1580

tuple: (window_centers, d_values)

1581

"""

1582

```

1583

{ .api }

1584

1585

## Windowed Analysis

1586

1587

### Window Functions

1588

1589

Apply statistical methods over genomic windows.

1590

1591

```python

1592

# Create equally-sized windows

1593

windows = allel.equally_accessible_windows(positions, size=10000)

1594

1595

# Apply any statistic to windows

1596

window_stats = allel.windowed_statistic(positions, values, np.mean, windows)

1597

1598

# Moving window statistics

1599

moving_stats = allel.moving_statistic(values, np.mean, size=100, step=50)

1600

1601

# Count variants in windows

1602

counts = allel.windowed_count(positions, windows)

1603

1604

# Per-base analysis

1605

per_base_values = allel.per_base(positions, values, start=1000, stop=2000)

1606

1607

# Moving statistics with different functions

1608

moving_means = allel.moving_mean(values, size=100, step=50)

1609

moving_stds = allel.moving_std(values, size=100, step=50)

1610

moving_mids = allel.moving_midpoint(positions, size=100, step=50)

1611

1612

# Window coordinate helpers

1613

index_wins = allel.index_windows(positions, window_size=100)

1614

pos_wins = allel.position_windows(positions, window_size=10000, step=5000)

1615

win_locs = allel.window_locations(windows)

1616

```

1617

{ .api }

1618

1619

### Additional Window Functions

1620

1621

More specialized windowing utilities.

1622

1623

```python

1624

def windowed_count(pos, windows, is_accessible=None):

1625

"""

1626

Count variants in genomic windows.

1627

1628

Args:

1629

pos (array_like): Variant positions

1630

windows (array_like): Window coordinates as (start, stop) pairs

1631

is_accessible (array_like, optional): Accessible sites mask

1632

1633

Returns:

1634

ndarray: Variant counts per window

1635

"""

1636

1637

def per_base(pos, values, start=None, stop=None, is_accessible=None):

1638

"""

1639

Calculate per-base values over a genomic region.

1640

1641

Args:

1642

pos (array_like): Variant positions

1643

values (array_like): Values per variant

1644

start (int, optional): Start position

1645

stop (int, optional): Stop position

1646

is_accessible (array_like, optional): Accessible sites mask

1647

1648

Returns:

1649

float: Per-base value

1650

"""

1651

1652

def moving_mean(values, size, step=1):

1653

"""

1654

Calculate moving mean over a sequence of values.

1655

1656

Args:

1657

values (array_like): Input values

1658

size (int): Window size

1659

step (int): Step size between windows

1660

1661

Returns:

1662

ndarray: Moving mean values

1663

"""

1664

1665

def moving_std(values, size, step=1):

1666

"""

1667

Calculate moving standard deviation.

1668

1669

Args:

1670

values (array_like): Input values

1671

size (int): Window size

1672

step (int): Step size between windows

1673

1674

Returns:

1675

ndarray: Moving standard deviation values

1676

"""

1677

1678

def moving_midpoint(positions, size, step=1):

1679

"""

1680

Calculate midpoints for moving windows.

1681

1682

Args:

1683

positions (array_like): Position coordinates

1684

size (int): Window size

1685

step (int): Step size between windows

1686

1687

Returns:

1688

ndarray: Window midpoint positions

1689

"""

1690

1691

def index_windows(positions, window_size):

1692

"""

1693

Create windows based on variant indices.

1694

1695

Args:

1696

positions (array_like): Variant positions

1697

window_size (int): Number of variants per window

1698

1699

Returns:

1700

ndarray: Window definitions as index ranges

1701

"""

1702

1703

def position_windows(positions, window_size, step=None):

1704

"""

1705

Create windows based on genomic positions.

1706

1707

Args:

1708

positions (array_like): Variant positions

1709

window_size (int): Window size in base pairs

1710

step (int, optional): Step size between windows

1711

1712

Returns:

1713

ndarray: Window coordinates as (start, stop) pairs

1714

"""

1715

1716

def window_locations(windows):

1717

"""

1718

Calculate locations (centers) of genomic windows.

1719

1720

Args:

1721

windows (array_like): Window coordinates as (start, stop) pairs

1722

1723

Returns:

1724

ndarray: Window center positions

1725

"""

1726

```

1727

{ .api }

1728

1729

## Mendelian Analysis

1730

1731

### Inheritance Pattern Analysis

1732

1733

Analyze transmission patterns in family data to identify Mendelian errors and phase genotypes.

1734

1735

```python

1736

# Identify Mendelian errors in trio data

1737

errors = allel.mendel_errors(offspring_genotypes, parent_genotypes)

1738

1739

# Phase genotypes using transmission patterns

1740

phased_offspring = allel.phase_progeny_by_transmission(offspring_genotypes, transmission)

1741

phased_parents = allel.phase_parents_by_transmission(parent_genotypes, transmission)

1742

1743

# Complete phasing workflow

1744

phased_offspring, phased_parents = allel.phase_by_transmission(offspring_genotypes, parent_genotypes)

1745

```

1746

{ .api }

1747

1748

**Key Functions:**

1749

1750

```python

1751

def mendel_errors(offspring_genotypes, parent_genotypes):

1752

"""

1753

Identify Mendelian inheritance errors in family data.

1754

1755

Args:

1756

offspring_genotypes (GenotypeArray): Genotypes of offspring

1757

parent_genotypes (GenotypeArray): Genotypes of parents

1758

1759

Returns:

1760

ndarray: Array indicating Mendelian error status for each variant/offspring pair

1761

"""

1762

1763

def phase_progeny_by_transmission(offspring_genotypes, transmission):

1764

"""

1765

Phase offspring genotypes using transmission patterns.

1766

1767

Args:

1768

offspring_genotypes (GenotypeArray): Offspring genotype data

1769

transmission (array_like): Transmission pattern information

1770

1771

Returns:

1772

GenotypeArray: Phased offspring genotypes

1773

"""

1774

1775

def phase_parents_by_transmission(parent_genotypes, transmission):

1776

"""

1777

Phase parent genotypes using transmission patterns.

1778

1779

Args:

1780

parent_genotypes (GenotypeArray): Parent genotype data

1781

transmission (array_like): Transmission pattern information

1782

1783

Returns:

1784

GenotypeArray: Phased parent genotypes

1785

"""

1786

1787

def phase_by_transmission(offspring_genotypes, parent_genotypes):

1788

"""

1789

Phase both offspring and parent genotypes using transmission patterns.

1790

1791

Args:

1792

offspring_genotypes (GenotypeArray): Offspring genotypes

1793

parent_genotypes (GenotypeArray): Parent genotypes

1794

1795

Returns:

1796

tuple: (phased_offspring, phased_parents) GenotypeArray objects

1797

"""

1798

```

1799

{ .api }

1800

1801

**Inheritance Constants:**

1802

1803

```python

1804

# Mendelian inheritance pattern constants

1805

INHERIT_MISSING = -1 # Missing inheritance pattern

1806

INHERIT_NONPARENTAL = 0 # Non-parental allele

1807

INHERIT_NONSEG_ALT = 1 # Non-segregating alternative

1808

INHERIT_NONSEG_REF = 2 # Non-segregating reference

1809

INHERIT_PARENT1 = 3 # Inherited from parent 1

1810

INHERIT_PARENT2 = 4 # Inherited from parent 2

1811

INHERIT_PARENT_MISSING = 5 # Parent has missing data

1812

INHERIT_UNDETERMINED = 6 # Cannot determine inheritance

1813

```

1814

{ .api }

1815

1816

## Runs of Homozygosity

1817

1818

### ROH Detection

1819

1820

Detect runs of homozygosity using Hidden Markov Models.

1821

1822

```python

1823

# Multi-state HMM for ROH detection

1824

roh_results = allel.roh_mhmm(genotypes, positions, phet_roh=0.001, phet_nonroh=0.01, transition=1e-6)

1825

1826

# Poisson HMM approach

1827

roh_poisson = allel.roh_poissonhmm(genotypes, positions, phet_roh=0.001, phet_nonroh=0.01,

1828

transition_hethet=1e-5, transition_homhom=1e-8)

1829

```

1830

{ .api }

1831

1832

```python

1833

def roh_mhmm(g, pos, phet_roh, phet_nonroh, transition, map_pos=None):

1834

"""

1835

Detect runs of homozygosity using multi-state Hidden Markov Model.

1836

1837

Args:

1838

g (GenotypeArray): Genotype data

1839

pos (array_like): Variant positions

1840

phet_roh (float): Probability of heterozygosity in ROH segments

1841

phet_nonroh (float): Probability of heterozygosity in non-ROH segments

1842

transition (float): Transition probability between states

1843

map_pos (array_like, optional): Genetic map positions

1844

1845

Returns:

1846

ndarray: ROH state predictions

1847

"""

1848

1849

def roh_poissonhmm(g, pos, phet_roh, phet_nonroh, transition_hethet, transition_homhom, map_pos=None):

1850

"""

1851

Detect runs of homozygosity using Poisson Hidden Markov Model.

1852

1853

Args:

1854

g (GenotypeArray): Genotype data

1855

pos (array_like): Variant positions

1856

phet_roh (float): Het probability in ROH

1857

phet_nonroh (float): Het probability in non-ROH

1858

transition_hethet (float): Transition probability between het states

1859

transition_homhom (float): Transition probability between hom states

1860

map_pos (array_like, optional): Genetic map positions

1861

1862

Returns:

1863

ndarray: ROH state predictions

1864

"""

1865

```

1866

{ .api }

1867

1868

## Windowed Analysis

1869

1870

### Window Functions

1871

1872

Apply statistical methods over genomic windows.

1873

1874

```python

1875

# Create equally-sized windows

1876

windows = allel.equally_accessible_windows(positions, size=10000)

1877

1878

# Apply any statistic to windows

1879

window_stats = allel.windowed_statistic(positions, values, np.mean, windows)

1880

1881

# Moving window statistics

1882

moving_stats = allel.moving_statistic(values, np.mean, size=100, step=50)

1883

```

1884

{ .api }

1885

1886

```python

1887

def windowed_statistic(pos, values, statistic, windows, fill=None):

1888

"""

1889

Apply a statistic function to values within genomic windows.

1890

1891

Args:

1892

pos (array_like): Positions corresponding to values

1893

values (array_like): Values to calculate statistics on

1894

statistic (callable): Function to apply (e.g., np.mean, np.sum)

1895

windows (array_like): Window coordinates as (start, stop) pairs

1896

fill (float, optional): Fill value for empty windows

1897

1898

Returns:

1899

ndarray: Statistic values per window

1900

"""

1901

1902

def equally_accessible_windows(pos, size, start=None, stop=None):

1903

"""

1904

Create equally-sized genomic windows.

1905

1906

Args:

1907

pos (array_like): Variant positions to determine window bounds

1908

size (int): Window size in base pairs

1909

start (int, optional): Start coordinate

1910

stop (int, optional): Stop coordinate

1911

1912

Returns:

1913

ndarray: Window coordinates as (start, stop) pairs

1914

"""

1915

1916

def moving_statistic(values, statistic, size, step=1):

1917

"""

1918

Apply statistic over a moving window.

1919

1920

Args:

1921

values (array_like): Input values

1922

statistic (callable): Function to apply

1923

size (int): Window size

1924

step (int): Step size between windows

1925

1926

Returns:

1927

ndarray: Statistic values per window

1928

"""

1929

```

1930

{ .api }

1931

1932

## Miscellaneous Functions

1933

1934

### Utility Functions

1935

1936

Additional statistical utilities and visualization tools.

1937

1938

```python

1939

# Variant locator plot for exploring data

1940

allel.plot_variant_locator(genotypes, positions, start=1000, stop=2000)

1941

1942

# Tabulate state transitions (for HMM analysis)

1943

transitions = allel.tabulate_state_transitions(states)

1944

1945

# Tabulate state blocks

1946

blocks = allel.tabulate_state_blocks(states)

1947

```

1948

{ .api }

1949

1950

```python

1951

def plot_variant_locator(gn, pos, start=None, stop=None, ax=None):

1952

"""

1953

Plot variant locator for exploring genotype data.

1954

1955

Args:

1956

gn (GenotypeArray): Genotype data

1957

pos (array_like): Variant positions

1958

start (int, optional): Start position for plotting

1959

stop (int, optional): Stop position for plotting

1960

ax (matplotlib axis, optional): Axis to plot on

1961

1962

Returns:

1963

matplotlib figure: Variant locator plot

1964

"""

1965

1966

def tabulate_state_transitions(states, pos=None):

1967

"""

1968

Tabulate transitions between states in a sequence.

1969

1970

Args:

1971

states (array_like): State sequence

1972

pos (array_like, optional): Positions corresponding to states

1973

1974

Returns:

1975

DataFrame: Transition table with counts and positions

1976

"""

1977

1978

def tabulate_state_blocks(states, pos=None):

1979

"""

1980

Tabulate contiguous blocks of the same state.

1981

1982

Args:

1983

states (array_like): State sequence

1984

pos (array_like, optional): Positions corresponding to states

1985

1986

Returns:

1987

DataFrame: Block table with start/stop positions and lengths

1988

"""

1989

```

1990

{ .api }