Tessl Tile for pypi/scikit-allel@1.3.0

or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

data-structures.md index.md io.md statistics.md utilities.md

statistics.mddocs/

0
# Statistics
1

2
Scikit-allel provides comprehensive statistical methods for population genetics analysis, including diversity metrics, population structure analysis, linkage disequilibrium, and tests for natural selection.
3

4
## Diversity Analysis
5

6
### Sequence Diversity
7

8
Calculate nucleotide diversity (π) and related metrics.
9

10
```python
11
import allel
12

13
# Basic sequence diversity
14
diversity = allel.sequence_diversity(positions, allele_counts, start=None, stop=None)
15

16
# Mean pairwise differences
17
mpd = allel.mean_pairwise_difference(allele_counts)
18

19
# Between-population differences  
20
div = allel.mean_pairwise_difference_between(ac1, ac2)
21
```
22
{ .api }
23

24
**Key Functions:**
25

26
```python
27
def sequence_diversity(pos, ac, start=None, stop=None, is_accessible=None):
28
    """
29
    Calculate sequence diversity (π) over a genomic region.
30
    
31
    Args:
32
        pos (array_like): Variant positions
33
        ac (AlleleCountsArray): Allele counts per variant
34
        start (int, optional): Start position for calculation
35
        stop (int, optional): Stop position for calculation
36
        is_accessible (array_like, optional): Boolean array of accessible sites
37
        
38
    Returns:
39
        float: Sequence diversity value
40
    """
41

42
def mean_pairwise_difference(ac, an=None):
43
    """
44
    Calculate mean pairwise nucleotide differences.
45
    
46
    Args:
47
        ac (AlleleCountsArray): Allele counts per variant
48
        an (array_like, optional): Total allele numbers per variant
49
        
50
    Returns:
51
        ndarray: Mean pairwise differences per variant
52
    """
53

54
def windowed_diversity(pos, ac, windows, is_accessible=None):
55
    """
56
    Calculate diversity in genomic windows.
57
    
58
    Args:
59
        pos (array_like): Variant positions
60
        ac (AlleleCountsArray): Allele counts
61
        windows (array_like): Window coordinates as (start, stop) pairs
62
        is_accessible (array_like, optional): Accessible sites mask
63
        
64
    Returns:
65
        ndarray: Diversity values per window
66
    """
67
```
68
{ .api }
69

70
### Sequence Divergence
71

72
Calculate divergence between populations or sequences.
73

74
```python
75
# Between-population divergence
76
divergence = allel.sequence_divergence(positions, ac_pop1, ac_pop2, start=1000, stop=2000)
77

78
# Windowed divergence analysis
79
div_windows = allel.windowed_divergence(positions, ac_pop1, ac_pop2, windows)
80

81
# Windowed differentiation
82
df_windows = allel.windowed_df(positions, ac_pop1, ac_pop2, windows)
83
```
84
{ .api }
85

86
```python
87
def sequence_divergence(pos, ac1, ac2, start=None, stop=None, is_accessible=None):
88
    """
89
    Calculate sequence divergence between two populations.
90
    
91
    Args:
92
        pos (array_like): Variant positions
93
        ac1 (AlleleCountsArray): Allele counts for population 1
94
        ac2 (AlleleCountsArray): Allele counts for population 2
95
        start (int, optional): Start position
96
        stop (int, optional): Stop position
97
        is_accessible (array_like, optional): Accessible sites mask
98
        
99
    Returns:
100
        float: Sequence divergence value
101
    """
102

103
def windowed_divergence(pos, ac1, ac2, windows, is_accessible=None):
104
    """
105
    Calculate divergence between populations in genomic windows.
106
    
107
    Args:
108
        pos (array_like): Variant positions
109
        ac1 (AlleleCountsArray): Allele counts for population 1
110
        ac2 (AlleleCountsArray): Allele counts for population 2
111
        windows (array_like): Window coordinates
112
        is_accessible (array_like, optional): Accessible sites mask
113
        
114
    Returns:
115
        ndarray: Divergence values per window
116
    """
117

118
def windowed_df(pos, ac1, ac2, windows, is_accessible=None):
119
    """
120
    Calculate windowed differentiation (Df) between populations.
121
    
122
    Args:
123
        pos (array_like): Variant positions
124
        ac1 (AlleleCountsArray): Allele counts for population 1
125
        ac2 (AlleleCountsArray): Allele counts for population 2
126
        windows (array_like): Window coordinates
127
        is_accessible (array_like, optional): Accessible sites mask
128
        
129
    Returns:
130
        ndarray: Df values per window
131
    """
132
```
133
{ .api }
134

135
### Watterson's Theta
136

137
Estimate population mutation rate using Watterson's method.
138

139
```python
140
# Single value
141
theta_w = allel.watterson_theta(allele_counts)
142

143
# Windowed analysis
144
theta_windows = allel.windowed_watterson_theta(positions, allele_counts, windows)
145
```
146
{ .api }
147

148
```python
149
def watterson_theta(ac, an=None):
150
    """
151
    Calculate Watterson's estimator of theta (population mutation rate).
152
    
153
    Args:
154
        ac (AlleleCountsArray): Allele counts per variant
155
        an (array_like, optional): Total allele numbers per variant
156
        
157
    Returns:
158
        float: Watterson's theta estimate
159
    """
160

161
def windowed_watterson_theta(pos, ac, windows, is_accessible=None):
162
    """
163
    Calculate Watterson's theta in genomic windows.
164
    
165
    Args:
166
        pos (array_like): Variant positions
167
        ac (AlleleCountsArray): Allele counts
168
        windows (array_like): Window coordinates
169
        is_accessible (array_like, optional): Accessible sites mask
170
        
171
    Returns:
172
        ndarray: Theta values per window
173
    """
174
```
175
{ .api }
176

177
### Tajima's D
178

179
Test for departure from neutral evolution.
180

181
```python
182
# Single genomic region
183
tajima_d = allel.tajima_d(allele_counts, positions, start=1000, stop=2000)
184

185
# Windowed analysis
186
d_windows = allel.windowed_tajima_d(positions, allele_counts, windows)
187

188
# Moving window  
189
d_moving = allel.moving_tajima_d(positions, allele_counts, size=1000, step=500)
190
```
191
{ .api }
192

193
```python
194
def tajima_d(ac, pos=None, start=None, stop=None, is_accessible=None):
195
    """
196
    Calculate Tajima's D test statistic.
197
    
198
    Args:
199
        ac (AlleleCountsArray): Allele counts per variant
200
        pos (array_like, optional): Variant positions
201
        start (int, optional): Start position
202
        stop (int, optional): Stop position
203
        is_accessible (array_like, optional): Accessible sites mask
204
        
205
    Returns:
206
        float: Tajima's D value
207
    """
208

209
def windowed_tajima_d(pos, ac, windows, is_accessible=None):
210
    """
211
    Calculate Tajima's D in genomic windows.
212
    
213
    Args:
214
        pos (array_like): Variant positions
215
        ac (AlleleCountsArray): Allele counts
216
        windows (array_like): Window coordinates
217
        is_accessible (array_like, optional): Accessible sites mask
218
        
219
    Returns:
220
        ndarray: Tajima's D values per window
221
    """
222

223
def moving_tajima_d(pos, ac, size, start=None, stop=None, step=None, is_accessible=None):
224
    """
225
    Calculate Tajima's D in sliding windows.
226
    
227
    Args:
228
        pos (array_like): Variant positions
229
        ac (AlleleCountsArray): Allele counts
230
        size (int): Window size in base pairs
231
        start (int, optional): Start position
232
        stop (int, optional): Stop position
233
        step (int, optional): Step size between windows
234
        is_accessible (array_like, optional): Accessible sites mask
235
        
236
    Returns:
237
        tuple: (window_centers, tajima_d_values)
238
    """
239
```
240
{ .api }
241

242
## Population Structure (FST)
243

244
### FST Estimators
245

246
Calculate fixation indices between populations.
247

248
```python
249
# Weir & Cockerham FST
250
fst_wc = allel.weir_cockerham_fst(genotypes, subpops)
251

252
# Hudson's FST
253
fst_hudson = allel.hudson_fst(allele_counts, subpops)
254

255
# Patterson's FST (for two populations)
256
fst_patterson = allel.patterson_fst(ac_pop1, ac_pop2)
257
```
258
{ .api }
259

260
**Key Functions:**
261

262
```python
263
def weir_cockerham_fst(g, subpops, max_allele=None):
264
    """
265
    Calculate Weir & Cockerham FST between subpopulations.
266
    
267
    Args:
268
        g (GenotypeArray): Genotype data
269
        subpops (list): List of arrays containing sample indices for each subpopulation
270
        max_allele (int, optional): Maximum allele number to consider
271
        
272
    Returns:
273
        tuple: (a, b, c) variance components for FST calculation
274
    """
275

276
def hudson_fst(ac, subpops):
277
    """
278
    Calculate Hudson's FST estimator.
279
    
280
    Args:
281
        ac (AlleleCountsArray): Allele counts per variant
282
        subpops (list): Subpopulation definitions
283
        
284
    Returns:
285
        tuple: (num, den) numerator and denominator for FST calculation
286
    """
287

288
def average_weir_cockerham_fst(g, subpops, max_allele=None):
289
    """
290
    Calculate average FST across all variants.
291
    
292
    Args:
293
        g (GenotypeArray): Genotype data
294
        subpops (list): Subpopulation sample indices
295
        max_allele (int, optional): Maximum allele number
296
        
297
    Returns:
298
        float: Average FST value
299
    """
300

301
def windowed_weir_cockerham_fst(pos, g, subpops, windows, max_allele=None):
302
    """
303
    Calculate FST in genomic windows.
304
    
305
    Args:
306
        pos (array_like): Variant positions
307
        g (GenotypeArray): Genotype data
308
        subpops (list): Subpopulation definitions
309
        windows (array_like): Window coordinates
310
        max_allele (int, optional): Maximum allele number
311
        
312
    Returns:
313
        ndarray: FST values per window
314
    """
315
```
316
{ .api }
317

318
### Additional FST Methods
319

320
Advanced FST calculations for specific analysis needs.
321

322
```python
323
# Blockwise FST calculations (for bootstrapping)
324
fst_blocks_wc = allel.blockwise_weir_cockerham_fst(genotypes, subpops, blocks)
325
fst_blocks_hudson = allel.blockwise_hudson_fst(ac, subpops, blocks)  
326
fst_blocks_patterson = allel.blockwise_patterson_fst(ac1, ac2, blocks)
327

328
# Moving window FST
329
fst_moving_hudson = allel.moving_hudson_fst(positions, ac, subpops, size=10000)
330
fst_moving_patterson = allel.moving_patterson_fst(positions, ac1, ac2, size=10000)
331
fst_moving_wc = allel.moving_weir_cockerham_fst(positions, genotypes, subpops, size=10000)
332

333
# Average FST calculations
334
avg_hudson = allel.average_hudson_fst(ac, subpops)
335
avg_patterson = allel.average_patterson_fst(ac1, ac2)
336
```
337
{ .api }
338

339
```python
340
def blockwise_weir_cockerham_fst(g, subpops, blocks, max_allele=None):
341
    """
342
    Calculate Weir & Cockerham FST for genomic blocks.
343
    
344
    Args:
345
        g (GenotypeArray): Genotype data
346
        subpops (list): Subpopulation sample indices
347
        blocks (array_like): Block definitions
348
        max_allele (int, optional): Maximum allele number
349
        
350
    Returns:
351
        ndarray: FST values per block
352
    """
353

354
def blockwise_hudson_fst(ac, subpops, blocks):
355
    """
356
    Calculate Hudson's FST for genomic blocks.
357
    
358
    Args:
359
        ac (AlleleCountsArray): Allele counts
360
        subpops (list): Subpopulation definitions
361
        blocks (array_like): Block definitions
362
        
363
    Returns:
364
        ndarray: FST values per block
365
    """
366

367
def blockwise_patterson_fst(ac1, ac2, blocks):
368
    """
369
    Calculate Patterson's FST for genomic blocks.
370
    
371
    Args:
372
        ac1 (AlleleCountsArray): Allele counts for population 1
373
        ac2 (AlleleCountsArray): Allele counts for population 2
374
        blocks (array_like): Block definitions
375
        
376
    Returns:
377
        ndarray: FST values per block
378
    """
379

380
def moving_hudson_fst(pos, ac, subpops, size, start=None, stop=None, step=None):
381
    """
382
    Calculate Hudson's FST in sliding windows.
383
    
384
    Args:
385
        pos (array_like): Variant positions
386
        ac (AlleleCountsArray): Allele counts
387
        subpops (list): Subpopulation definitions
388
        size (int): Window size in base pairs
389
        start (int, optional): Start position
390
        stop (int, optional): Stop position
391
        step (int, optional): Step size between windows
392
        
393
    Returns:
394
        tuple: (window_centers, fst_values)
395
    """
396

397
def moving_patterson_fst(pos, ac1, ac2, size, start=None, stop=None, step=None):
398
    """
399
    Calculate Patterson's FST in sliding windows.
400
    
401
    Args:
402
        pos (array_like): Variant positions
403
        ac1 (AlleleCountsArray): Population 1 allele counts
404
        ac2 (AlleleCountsArray): Population 2 allele counts
405
        size (int): Window size in base pairs
406
        start (int, optional): Start position
407
        stop (int, optional): Stop position
408
        step (int, optional): Step size between windows
409
        
410
    Returns:
411
        tuple: (window_centers, fst_values)
412
    """
413

414
def moving_weir_cockerham_fst(pos, g, subpops, size, start=None, stop=None, step=None, max_allele=None):
415
    """
416
    Calculate Weir & Cockerham FST in sliding windows.
417
    
418
    Args:
419
        pos (array_like): Variant positions
420
        g (GenotypeArray): Genotype data
421
        subpops (list): Subpopulation sample indices
422
        size (int): Window size in base pairs
423
        start (int, optional): Start position
424
        stop (int, optional): Stop position
425
        step (int, optional): Step size between windows
426
        max_allele (int, optional): Maximum allele number
427
        
428
    Returns:
429
        tuple: (window_centers, fst_values)
430
    """
431

432
def average_hudson_fst(ac, subpops):
433
    """
434
    Calculate average Hudson's FST across all variants.
435
    
436
    Args:
437
        ac (AlleleCountsArray): Allele counts
438
        subpops (list): Subpopulation definitions
439
        
440
    Returns:
441
        float: Average FST value
442
    """
443

444
def average_patterson_fst(ac1, ac2):
445
    """
446
    Calculate average Patterson's FST across all variants.
447
    
448
    Args:
449
        ac1 (AlleleCountsArray): Population 1 allele counts
450
        ac2 (AlleleCountsArray): Population 2 allele counts
451
        
452
    Returns:
453
        float: Average FST value
454
    """
455
```
456
{ .api }
457

458
## Linkage Disequilibrium
459

460
### LD Metrics
461

462
Measure linkage disequilibrium between variants.
463

464
```python
465
# Rogers & Huff r correlation
466
r = allel.rogers_huff_r(genotype_matrix)
467

468
# Between two sets of variants
469
r_between = allel.rogers_huff_r_between(gn_a, gn_b)
470

471
# Windowed r-squared
472
r2_windows = allel.windowed_r_squared(positions, genotype_matrix, size=10000)
473

474
# Visualize LD patterns
475
allel.plot_pairwise_ld(gn_subset, positions_subset)
476
```
477
{ .api }
478

479
```python
480
def rogers_huff_r(gn):
481
    """
482
    Calculate Rogers & Huff r correlation coefficient matrix.
483
    
484
    Args:
485
        gn (array_like): Genotype matrix (n_variants, n_samples)
486
        
487
    Returns:
488
        ndarray: Correlation matrix between all pairs of variants
489
    """
490

491
def rogers_huff_r_between(gn1, gn2):
492
    """
493
    Calculate Rogers & Huff r correlation between two sets of variants.
494
    
495
    Args:
496
        gn1 (array_like): First set of genotype data (n_variants1, n_samples)
497
        gn2 (array_like): Second set of genotype data (n_variants2, n_samples)
498
        
499
    Returns:
500
        ndarray: Correlation matrix between variants in the two sets
501
    """
502

503
def windowed_r_squared(pos, gn, size, start=None, stop=None, step=None):
504
    """
505
    Calculate r² in sliding windows.
506
    
507
    Args:
508
        pos (array_like): Variant positions
509
        gn (array_like): Genotype matrix
510
        size (int): Window size in base pairs
511
        start (int, optional): Start position
512
        stop (int, optional): Stop position
513
        step (int, optional): Step size between windows
514
        
515
    Returns:
516
        tuple: (window_centers, mean_r2_values)
517
    """
518

519
def locate_unlinked(gn, size, step, threshold):
520
    """
521
    Identify unlinked variants based on LD threshold.
522
    
523
    Args:
524
        gn (array_like): Genotype matrix
525
        size (int): Window size for LD calculation
526
        step (int): Step size between windows
527
        threshold (float): LD threshold for considering variants unlinked
528
        
529
    Returns:
530
        ndarray: Indices of unlinked variants
531
    """
532

533
def plot_pairwise_ld(gn, pos, start=None, stop=None, ax=None):
534
    """
535
    Plot pairwise linkage disequilibrium patterns.
536
    
537
    Args:
538
        gn (array_like): Genotype matrix
539
        pos (array_like): Variant positions
540
        start (int, optional): Start position for plotting
541
        stop (int, optional): Stop position for plotting
542
        ax (matplotlib axis, optional): Axis to plot on
543
        
544
    Returns:
545
        matplotlib figure: LD heatmap plot
546
    """
547
```
548
{ .api }
549

550
## Principal Component Analysis
551

552
### PCA Methods
553

554
Dimensionality reduction for population structure analysis.
555

556
```python
557
# Standard PCA
558
coords, model = allel.pca(genotype_matrix, n_components=10)
559

560
# Randomized PCA for large datasets
561
coords, model = allel.randomized_pca(genotype_matrix, n_components=10)
562

563
# Use different scalers
564
coords, model = allel.pca(genotype_matrix, scaler='patterson')
565
```
566
{ .api }
567

568
```python
569
def pca(gn, n_components=10, scaler=None, copy=True, random_state=None):
570
    """
571
    Perform principal component analysis on genotype data.
572
    
573
    Args:
574
        gn (array_like): Genotype matrix (n_variants, n_samples)
575
        n_components (int): Number of principal components to compute
576
        scaler (str or object, optional): Scaling method ('standard', 'patterson', or scaler object)
577
        copy (bool): Whether to copy input data
578
        random_state (int, optional): Random seed for reproducibility
579
        
580
    Returns:
581
        tuple: (coordinates, fitted_model) where coordinates are sample projections
582
    """
583

584
def randomized_pca(gn, n_components=10, scaler=None, copy=True, random_state=None):
585
    """
586
    Randomized PCA for large datasets.
587
    
588
    Args:
589
        gn (array_like): Genotype matrix
590
        n_components (int): Number of components
591
        scaler (str or object, optional): Scaling method
592
        copy (bool): Whether to copy data
593
        random_state (int, optional): Random seed
594
        
595
    Returns:
596
        tuple: (coordinates, fitted_model)
597
    """
598
```
599
{ .api }
600

601
## Data Preprocessing
602

603
### Scaling Methods
604

605
Standardize genotype data for analysis.
606

607
```python
608
# Standard scaling (mean=0, std=1)
609
scaler = allel.StandardScaler()
610
gn_scaled = scaler.fit_transform(genotype_matrix)
611

612
# Center scaling (mean=0)
613
scaler = allel.CenterScaler()
614
gn_centered = scaler.fit_transform(genotype_matrix)
615

616
# Patterson scaling (for PCA)
617
scaler = allel.PattersonScaler()
618
gn_patterson = scaler.fit_transform(genotype_matrix)
619

620
# Get scaler by name
621
scaler = allel.get_scaler('patterson')
622
```
623
{ .api }
624

625
```python
626
class StandardScaler:
627
    """
628
    Standardize genotype data to have zero mean and unit variance.
629
    """
630
    
631
    def fit(self, X):
632
        """Compute mean and standard deviation for later scaling."""
633
        
634
    def transform(self, X):
635
        """Scale data using computed statistics."""
636
        
637
    def fit_transform(self, X):
638
        """Fit scaler and transform data in one step."""
639

640
class CenterScaler:
641
    """
642
    Center genotype data to have zero mean.
643
    """
644
    
645
    def fit(self, X):
646
        """Compute mean for later centering."""
647
        
648
    def transform(self, X):
649
        """Center data using computed mean."""
650
        
651
    def fit_transform(self, X):
652
        """Fit scaler and center data in one step."""
653

654
class PattersonScaler:
655
    """
656
    Scale genotype data using Patterson method for PCA.
657
    
658
    Scales variants by their expected variance under Hardy-Weinberg
659
    equilibrium, which is appropriate for principal component analysis.
660
    """
661
    
662
    def fit(self, X):
663
        """Compute scaling factors."""
664
        
665
    def transform(self, X):
666
        """Apply Patterson scaling."""
667
        
668
    def fit_transform(self, X):
669
        """Fit and transform in one step."""
670

671
def get_scaler(name):
672
    """
673
    Get a scaler by name.
674
    
675
    Args:
676
        name (str): Scaler name ('standard', 'center', or 'patterson')
677
        
678
    Returns:
679
        scaler: Scaler object
680
    """
681
```
682
{ .api }
683

684
## Selection Tests
685

686
### Extended Haplotype Homozygosity
687

688
Test for recent positive selection using haplotype structure.
689

690
```python
691
# Integrated Haplotype Score (iHS)
692
ihs_scores = allel.ihs(haplotypes, map_positions, min_ehh=0.05)
693

694
# Cross-population Extended Haplotype Homozygosity (XP-EHH)
695
xpehh_scores = allel.xpehh(hap_pop1, hap_pop2, map_positions)
696

697
# Standardize scores
698
ihs_std = allel.standardize(ihs_scores)
699
```
700
{ .api }
701

702
```python
703
def ihs(h, map_pos, min_ehh=0.05, include_edges=False, gap_scale=20000, max_gap=200000, discard_integration_warnings=False):
704
    """
705
    Calculate integrated haplotype score (iHS) for each variant.
706
    
707
    Args:
708
        h (HaplotypeArray): Haplotype data
709
        map_pos (array_like): Genetic map positions in centiMorgans
710
        min_ehh (float): Minimum EHH value for integration cutoff
711
        include_edges (bool): Whether to include variants at edges
712
        gap_scale (float): Scale parameter for gap penalty
713
        max_gap (float): Maximum gap size to allow
714
        discard_integration_warnings (bool): Whether to suppress integration warnings
715
        
716
    Returns:
717
        ndarray: iHS scores per variant
718
    """
719

720
def xpehh(h1, h2, map_pos, min_ehh=0.05, include_edges=False, gap_scale=20000, max_gap=200000, discard_integration_warnings=False):
721
    """
722
    Calculate cross-population EHH (XP-EHH) between two populations.
723
    
724
    Args:
725
        h1 (HaplotypeArray): Haplotypes from population 1
726
        h2 (HaplotypeArray): Haplotypes from population 2
727
        map_pos (array_like): Genetic map positions
728
        min_ehh (float): Minimum EHH cutoff
729
        include_edges (bool): Include edge variants
730
        gap_scale (float): Gap penalty scale
731
        max_gap (float): Maximum allowed gap
732
        discard_integration_warnings (bool): Suppress warnings
733
        
734
    Returns:
735
        ndarray: XP-EHH scores per variant
736
    """
737

738
def standardize_by_allele_count(score, aac, flip=None):
739
    """
740
    Standardize test statistic by allele count.
741
    
742
    Args:
743
        score (array_like): Raw test statistic values
744
        aac (array_like): Alternate allele counts
745
        flip (array_like, optional): Boolean array indicating which scores to flip
746
        
747
    Returns:
748
        ndarray: Standardized scores
749
    """
750
```
751
{ .api }
752

753
### Number of Segregating Sites by Length
754

755
Alternative selection test less sensitive to recombination rate variation.
756

757
```python
758
# NSL scores
759
nsl_scores = allel.nsl(haplotypes, map_positions)
760

761
# Cross-population NSL
762
xpnsl_scores = allel.xpnsl(hap_pop1, hap_pop2, map_positions)
763
```
764
{ .api }
765

766
```python
767
def nsl(h, map_pos=None, max_extent=None):
768
    """
769
    Calculate number of segregating sites by length (NSL).
770
    
771
    Args:
772
        h (HaplotypeArray): Haplotype data
773
        map_pos (array_like, optional): Map positions for physical distance
774
        max_extent (float, optional): Maximum extent for calculation
775
        
776
    Returns:
777
        ndarray: NSL scores per variant
778
    """
779

780
def xpnsl(h1, h2, map_pos=None, max_extent=None):
781
    """
782
    Calculate cross-population NSL.
783
    
784
    Args:
785
        h1 (HaplotypeArray): Population 1 haplotypes
786
        h2 (HaplotypeArray): Population 2 haplotypes
787
        map_pos (array_like, optional): Map positions
788
        max_extent (float, optional): Maximum extent
789
        
790
    Returns:
791
        ndarray: XP-NSL scores per variant
792
    """
793
```
794
{ .api }
795

796
### Additional Selection Tests
797

798
More methods for detecting natural selection.
799

800
```python
801
# EHH decay analysis
802
ehh_values = allel.ehh_decay(haplotypes, map_positions, focal_variant=100)
803

804
# Voight painting (visualization of haplotype decay)
805
painting = allel.voight_painting(haplotypes, map_positions, focal_variant=100)
806
allel.plot_voight_painting(painting)
807

808
# Standardization methods
809
scores_std = allel.standardize(scores)
810
scores_std_ac = allel.standardize_by_allele_count(scores, allele_counts)
811

812
# Moving Tajima's D difference
813
delta_td = allel.moving_delta_tajima_d(positions, ac_pop1, ac_pop2, size=10000)
814

815
# Population Branch Statistic (PBS)
816
pbs_scores = allel.pbs(fst_AB, fst_AC, fst_BC)
817
```
818
{ .api }
819

820
```python
821
def ehh_decay(h, map_pos, focal=None, max_gap=200000, gap_scale=20000, truncate_left=None, truncate_right=None):
822
    """
823
    Calculate Extended Haplotype Homozygosity (EHH) decay.
824
    
825
    Args:
826
        h (HaplotypeArray): Haplotype data
827
        map_pos (array_like): Genetic map positions
828
        focal (int, optional): Index of focal variant
829
        max_gap (float): Maximum gap between variants
830
        gap_scale (float): Gap scaling factor
831
        truncate_left (float, optional): Left truncation point
832
        truncate_right (float, optional): Right truncation point
833
        
834
    Returns:
835
        tuple: (distances, ehh_values)
836
    """
837

838
def voight_painting(h, map_pos, focal=None, max_gap=200000, gap_scale=20000):
839
    """
840
    Generate Voight painting data for haplotype visualization.
841
    
842
    Args:
843
        h (HaplotypeArray): Haplotype data
844
        map_pos (array_like): Genetic map positions
845
        focal (int, optional): Index of focal variant
846
        max_gap (float): Maximum gap between variants
847
        gap_scale (float): Gap scaling factor
848
        
849
    Returns:
850
        dict: Painting data for visualization
851
    """
852

853
def plot_voight_painting(painting, ax=None):
854
    """
855
    Plot Voight painting visualization.
856
    
857
    Args:
858
        painting (dict): Painting data from voight_painting()
859
        ax (matplotlib axis, optional): Axis to plot on
860
        
861
    Returns:
862
        matplotlib figure: Voight painting plot
863
    """
864

865
def fig_voight_painting(painting, height=None):
866
    """
867
    Create figure for Voight painting.
868
    
869
    Args:
870
        painting (dict): Painting data
871
        height (float, optional): Figure height
872
        
873
    Returns:
874
        matplotlib figure: Figure object
875
    """
876

877
def plot_haplotype_frequencies(h, start=None, stop=None, ax=None):
878
    """
879
    Plot haplotype frequencies.
880
    
881
    Args:
882
        h (HaplotypeArray): Haplotype data
883
        start (int, optional): Start position index
884
        stop (int, optional): Stop position index
885
        ax (matplotlib axis, optional): Axis to plot on
886
        
887
    Returns:
888
        matplotlib figure: Haplotype frequency plot
889
    """
890

891
def plot_moving_haplotype_frequencies(h, size, start=None, stop=None, step=None, ax=None):
892
    """
893
    Plot moving window haplotype frequencies.
894
    
895
    Args:
896
        h (HaplotypeArray): Haplotype data
897
        size (int): Window size
898
        start (int, optional): Start position
899
        stop (int, optional): Stop position
900
        step (int, optional): Step size
901
        ax (matplotlib axis, optional): Axis to plot on
902
        
903
    Returns:
904
        matplotlib figure: Moving haplotype frequency plot
905
    """
906

907
def standardize(score):
908
    """
909
    Standardize test statistic to z-scores.
910
    
911
    Args:
912
        score (array_like): Raw test statistic values
913
        
914
    Returns:
915
        ndarray: Standardized scores
916
    """
917

918
def moving_delta_tajima_d(pos, ac1, ac2, size, start=None, stop=None, step=None, is_accessible=None):
919
    """
920
    Calculate difference in Tajima's D between populations in sliding windows.
921
    
922
    Args:
923
        pos (array_like): Variant positions
924
        ac1 (AlleleCountsArray): Allele counts for population 1
925
        ac2 (AlleleCountsArray): Allele counts for population 2
926
        size (int): Window size in base pairs
927
        start (int, optional): Start position
928
        stop (int, optional): Stop position
929
        step (int, optional): Step size
930
        is_accessible (array_like, optional): Accessible sites mask
931
        
932
    Returns:
933
        tuple: (window_centers, delta_tajima_d_values)
934
    """
935

936
def pbs(fst_AB, fst_AC, fst_BC):
937
    """
938
    Calculate Population Branch Statistic (PBS) for three populations.
939
    
940
    Args:
941
        fst_AB (array_like): FST between populations A and B
942
        fst_AC (array_like): FST between populations A and C
943
        fst_BC (array_like): FST between populations B and C
944
        
945
    Returns:
946
        ndarray: PBS values for population A
947
    """
948

949
def moving_haplotype_diversity(h, size, start=None, stop=None, step=None):
950
    """
951
    Calculate haplotype diversity in sliding windows.
952
    
953
    Args:
954
        h (HaplotypeArray): Haplotype data
955
        size (int): Window size (number of variants)
956
        start (int, optional): Start position index
957
        stop (int, optional): Stop position index
958
        step (int, optional): Step size
959
        
960
    Returns:
961
        tuple: (window_centers, diversity_values)
962
    """
963

964
def moving_garud_h(h, size, start=None, stop=None, step=None):
965
    """
966
    Calculate Garud's H statistics in sliding windows.
967
    
968
    Args:
969
        h (HaplotypeArray): Haplotype data
970
        size (int): Window size (number of variants)
971
        start (int, optional): Start position index
972
        stop (int, optional): Stop position index
973
        step (int, optional): Step size
974
        
975
    Returns:
976
        tuple: (window_centers, h1_values, h12_values, h123_values, h2_h1_values)
977
    """
978
```
979
{ .api }
980

981
### Haplotype Diversity
982

983
Measure diversity of haplotype patterns.
984

985
```python
986
# Basic haplotype diversity
987
h_div = allel.haplotype_diversity(haplotypes, start=1000, stop=2000)
988

989
# Moving window analysis
990
h_div_moving = allel.moving_haplotype_diversity(haplotypes, size=1000, step=500)
991

992
# Garud's H statistics
993
h_stats = allel.garud_h(haplotypes, start=1000, stop=2000)
994
```
995
{ .api }
996

997
```python
998
def haplotype_diversity(h, start=None, stop=None):
999
    """
1000
    Calculate haplotype diversity in a genomic region.
1001
    
1002
    Args:
1003
        h (HaplotypeArray): Haplotype data
1004
        start (int, optional): Start position index
1005
        stop (int, optional): Stop position index
1006
        
1007
    Returns:
1008
        float: Haplotype diversity value
1009
    """
1010

1011
def garud_h(h, start=None, stop=None):
1012
    """
1013
    Calculate Garud's H1, H12, H123, and H2/H1 statistics.
1014
    
1015
    Args:
1016
        h (HaplotypeArray): Haplotype data
1017
        start (int, optional): Start position index
1018
        stop (int, optional): Stop position index
1019
        
1020
    Returns:
1021
        tuple: (H1, H12, H123, H2_H1) statistics
1022
    """
1023
```
1024
{ .api }
1025

1026
## Site Frequency Spectrum
1027

1028
### SFS Analysis
1029

1030
Analyze allele frequency distributions.
1031

1032
```python
1033
# Basic SFS
1034
sfs = allel.sfs(derived_allele_counts, n=sample_size)
1035

1036
# Folded SFS (when ancestral state unknown)
1037
sfs_folded = allel.sfs_folded(derived_allele_counts, n=sample_size)
1038

1039
# Joint SFS between populations
1040
joint_sfs = allel.joint_sfs(dac_pop1, dac_pop2, n1=n_pop1, n2=n_pop2)
1041
```
1042
{ .api }
1043

1044
```python
1045
def sfs(dac, n=None):
1046
    """
1047
    Calculate site frequency spectrum.
1048
    
1049
    Args:
1050
        dac (array_like): Derived allele counts per variant
1051
        n (int, optional): Sample size (number of chromosomes)
1052
        
1053
    Returns:
1054
        ndarray: Site frequency spectrum
1055
    """
1056

1057
def sfs_folded(dac, n=None):
1058
    """
1059
    Calculate folded site frequency spectrum.
1060
    
1061
    Args:
1062
        dac (array_like): Derived allele counts
1063
        n (int, optional): Sample size
1064
        
1065
    Returns:
1066
        ndarray: Folded SFS
1067
    """
1068

1069
def joint_sfs(dac1, dac2, n1=None, n2=None):
1070
    """
1071
    Calculate joint site frequency spectrum between two populations.
1072
    
1073
    Args:
1074
        dac1 (array_like): Derived allele counts for population 1
1075
        dac2 (array_like): Derived allele counts for population 2
1076
        n1 (int, optional): Sample size for population 1
1077
        n2 (int, optional): Sample size for population 2
1078
        
1079
    Returns:
1080
        ndarray: Joint SFS as 2D array
1081
    """
1082
```
1083
{ .api }
1084

1085
### SFS Transformations and Scaling
1086

1087
Transform and scale site frequency spectra.
1088

1089
```python
1090
# Scale SFS by mutation rate
1091
sfs_scaled = allel.sfs_scaled(derived_counts, n=sample_size, theta=0.01)
1092
sfs_folded_scaled = allel.sfs_folded_scaled(derived_counts, n=sample_size, theta=0.01)
1093

1094
# Joint SFS scaling
1095
joint_sfs_scaled = allel.joint_sfs_scaled(dac1, dac2, n1=n1, n2=n2, theta=0.01)
1096
joint_sfs_folded_scaled = allel.joint_sfs_folded_scaled(dac1, dac2, n1=n1, n2=n2, theta=0.01)
1097

1098
# Fold existing SFS
1099
folded_sfs = allel.fold_sfs(unfolded_sfs)
1100
folded_joint_sfs = allel.fold_joint_sfs(unfolded_joint_sfs)
1101

1102
# Scale existing SFS
1103
scaled_sfs = allel.scale_sfs(sfs, theta=0.01)
1104
scaled_folded_sfs = allel.scale_sfs_folded(sfs, theta=0.01)
1105
scaled_joint_sfs = allel.scale_joint_sfs(joint_sfs, theta=0.01)
1106
scaled_joint_folded_sfs = allel.scale_joint_sfs_folded(joint_sfs, theta=0.01)
1107
```
1108
{ .api }
1109

1110
```python
1111
def sfs_scaled(dac, n=None, theta=1.0):
1112
    """
1113
    Calculate scaled site frequency spectrum.
1114
    
1115
    Args:
1116
        dac (array_like): Derived allele counts
1117
        n (int, optional): Sample size
1118
        theta (float): Scaling factor (mutation rate)
1119
        
1120
    Returns:
1121
        ndarray: Scaled SFS
1122
    """
1123

1124
def sfs_folded_scaled(dac, n=None, theta=1.0):
1125
    """
1126
    Calculate scaled folded site frequency spectrum.
1127
    
1128
    Args:
1129
        dac (array_like): Derived allele counts
1130
        n (int, optional): Sample size
1131
        theta (float): Scaling factor
1132
        
1133
    Returns:
1134
        ndarray: Scaled folded SFS
1135
    """
1136

1137
def joint_sfs_scaled(dac1, dac2, n1=None, n2=None, theta=1.0):
1138
    """
1139
    Calculate scaled joint site frequency spectrum.
1140
    
1141
    Args:
1142
        dac1 (array_like): Population 1 derived allele counts
1143
        dac2 (array_like): Population 2 derived allele counts
1144
        n1 (int, optional): Population 1 sample size
1145
        n2 (int, optional): Population 2 sample size
1146
        theta (float): Scaling factor
1147
        
1148
    Returns:
1149
        ndarray: Scaled joint SFS
1150
    """
1151

1152
def joint_sfs_folded_scaled(dac1, dac2, n1=None, n2=None, theta=1.0):
1153
    """
1154
    Calculate scaled folded joint site frequency spectrum.
1155
    
1156
    Args:
1157
        dac1 (array_like): Population 1 derived allele counts
1158
        dac2 (array_like): Population 2 derived allele counts
1159
        n1 (int, optional): Population 1 sample size
1160
        n2 (int, optional): Population 2 sample size
1161
        theta (float): Scaling factor
1162
        
1163
    Returns:
1164
        ndarray: Scaled folded joint SFS
1165
    """
1166

1167
def fold_sfs(sfs):
1168
    """
1169
    Fold an existing site frequency spectrum.
1170
    
1171
    Args:
1172
        sfs (array_like): Unfolded SFS
1173
        
1174
    Returns:
1175
        ndarray: Folded SFS
1176
    """
1177

1178
def fold_joint_sfs(joint_sfs):
1179
    """
1180
    Fold an existing joint site frequency spectrum.
1181
    
1182
    Args:
1183
        joint_sfs (array_like): Unfolded joint SFS
1184
        
1185
    Returns:
1186
        ndarray: Folded joint SFS
1187
    """
1188

1189
def scale_sfs(sfs, theta=1.0):
1190
    """
1191
    Scale an existing site frequency spectrum.
1192
    
1193
    Args:
1194
        sfs (array_like): SFS to scale
1195
        theta (float): Scaling factor
1196
        
1197
    Returns:
1198
        ndarray: Scaled SFS
1199
    """
1200

1201
def scale_sfs_folded(sfs, theta=1.0):
1202
    """
1203
    Scale an existing folded site frequency spectrum.
1204
    
1205
    Args:
1206
        sfs (array_like): Folded SFS to scale
1207
        theta (float): Scaling factor
1208
        
1209
    Returns:
1210
        ndarray: Scaled folded SFS
1211
    """
1212

1213
def scale_joint_sfs(joint_sfs, theta=1.0):
1214
    """
1215
    Scale an existing joint site frequency spectrum.
1216
    
1217
    Args:
1218
        joint_sfs (array_like): Joint SFS to scale
1219
        theta (float): Scaling factor
1220
        
1221
    Returns:
1222
        ndarray: Scaled joint SFS
1223
    """
1224

1225
def scale_joint_sfs_folded(joint_sfs, theta=1.0):
1226
    """
1227
    Scale an existing folded joint site frequency spectrum.
1228
    
1229
    Args:
1230
        joint_sfs (array_like): Folded joint SFS to scale
1231
        theta (float): Scaling factor
1232
        
1233
    Returns:
1234
        ndarray: Scaled folded joint SFS
1235
    """
1236
```
1237
{ .api }
1238

1239
### SFS Visualization
1240

1241
Plot site frequency spectra.
1242

1243
```python
1244
# Basic SFS plots
1245
allel.plot_sfs(sfs)
1246
allel.plot_sfs_folded(sfs_folded)
1247
allel.plot_sfs_scaled(sfs_scaled)
1248
allel.plot_sfs_folded_scaled(sfs_folded_scaled)
1249

1250
# Joint SFS plots
1251
allel.plot_joint_sfs(joint_sfs)
1252
allel.plot_joint_sfs_folded(joint_sfs_folded)
1253
allel.plot_joint_sfs_scaled(joint_sfs_scaled)
1254
allel.plot_joint_sfs_folded_scaled(joint_sfs_folded_scaled)
1255
```
1256
{ .api }
1257

1258
```python
1259
def plot_sfs(sfs, ax=None, n=None):
1260
    """
1261
    Plot site frequency spectrum.
1262
    
1263
    Args:
1264
        sfs (array_like): SFS values
1265
        ax (matplotlib axis, optional): Axis to plot on
1266
        n (int, optional): Sample size for x-axis labels
1267
        
1268
    Returns:
1269
        matplotlib figure: SFS plot
1270
    """
1271

1272
def plot_sfs_folded(sfs, ax=None, n=None):
1273
    """
1274
    Plot folded site frequency spectrum.
1275
    
1276
    Args:
1277
        sfs (array_like): Folded SFS values
1278
        ax (matplotlib axis, optional): Axis to plot on
1279
        n (int, optional): Sample size for x-axis labels
1280
        
1281
    Returns:
1282
        matplotlib figure: Folded SFS plot
1283
    """
1284

1285
def plot_sfs_scaled(sfs, ax=None, n=None):
1286
    """
1287
    Plot scaled site frequency spectrum.
1288
    
1289
    Args:
1290
        sfs (array_like): Scaled SFS values
1291
        ax (matplotlib axis, optional): Axis to plot on
1292
        n (int, optional): Sample size
1293
        
1294
    Returns:
1295
        matplotlib figure: Scaled SFS plot
1296
    """
1297

1298
def plot_sfs_folded_scaled(sfs, ax=None, n=None):
1299
    """
1300
    Plot scaled folded site frequency spectrum.
1301
    
1302
    Args:
1303
        sfs (array_like): Scaled folded SFS values
1304
        ax (matplotlib axis, optional): Axis to plot on
1305
        n (int, optional): Sample size
1306
        
1307
    Returns:
1308
        matplotlib figure: Scaled folded SFS plot
1309
    """
1310

1311
def plot_joint_sfs(joint_sfs, ax=None):
1312
    """
1313
    Plot joint site frequency spectrum as heatmap.
1314
    
1315
    Args:
1316
        joint_sfs (array_like): Joint SFS matrix
1317
        ax (matplotlib axis, optional): Axis to plot on
1318
        
1319
    Returns:
1320
        matplotlib figure: Joint SFS heatmap
1321
    """
1322

1323
def plot_joint_sfs_folded(joint_sfs, ax=None):
1324
    """
1325
    Plot folded joint site frequency spectrum.
1326
    
1327
    Args:
1328
        joint_sfs (array_like): Folded joint SFS matrix
1329
        ax (matplotlib axis, optional): Axis to plot on
1330
        
1331
    Returns:
1332
        matplotlib figure: Folded joint SFS heatmap
1333
    """
1334

1335
def plot_joint_sfs_scaled(joint_sfs, ax=None):
1336
    """
1337
    Plot scaled joint site frequency spectrum.
1338
    
1339
    Args:
1340
        joint_sfs (array_like): Scaled joint SFS matrix
1341
        ax (matplotlib axis, optional): Axis to plot on
1342
        
1343
    Returns:
1344
        matplotlib figure: Scaled joint SFS heatmap
1345
    """
1346

1347
def plot_joint_sfs_folded_scaled(joint_sfs, ax=None):
1348
    """
1349
    Plot scaled folded joint site frequency spectrum.
1350
    
1351
    Args:
1352
        joint_sfs (array_like): Scaled folded joint SFS matrix
1353
        ax (matplotlib axis, optional): Axis to plot on
1354
        
1355
    Returns:
1356
        matplotlib figure: Scaled folded joint SFS heatmap
1357
    """
1358
```
1359
{ .api }
1360

1361
## Hardy-Weinberg Equilibrium
1362

1363
### HWE Statistics
1364

1365
Test for deviations from Hardy-Weinberg equilibrium.
1366

1367
```python
1368
# Observed heterozygosity
1369
het_obs = allel.heterozygosity_observed(genotypes)
1370

1371
# Expected heterozygosity
1372
het_exp = allel.heterozygosity_expected(allele_frequencies)
1373

1374
# Inbreeding coefficient (FIS)
1375
fis = allel.inbreeding_coefficient(genotypes)
1376
```
1377
{ .api }
1378

1379
```python
1380
def heterozygosity_observed(g, alleles=None):
1381
    """
1382
    Calculate observed heterozygosity per variant.
1383
    
1384
    Args:
1385
        g (GenotypeArray): Genotype data
1386
        alleles (array_like, optional): Specific alleles to consider
1387
        
1388
    Returns:
1389
        ndarray: Observed heterozygosity per variant
1390
    """
1391

1392
def heterozygosity_expected(af, ploidy=2):
1393
    """
1394
    Calculate expected heterozygosity under Hardy-Weinberg equilibrium.
1395
    
1396
    Args:
1397
        af (array_like): Allele frequencies
1398
        ploidy (int): Ploidy level
1399
        
1400
    Returns:
1401
        ndarray: Expected heterozygosity per variant
1402
    """
1403

1404
def inbreeding_coefficient(g, alleles=None):
1405
    """
1406
    Calculate inbreeding coefficient (FIS) per variant.
1407
    
1408
    Args:
1409
        g (GenotypeArray): Genotype data
1410
        alleles (array_like, optional): Specific alleles to consider
1411
        
1412
    Returns:
1413
        ndarray: Inbreeding coefficient per variant
1414
    """
1415
```
1416
{ .api }
1417

1418
## Admixture Statistics
1419

1420
### Patterson's F-statistics
1421

1422
Test for population admixture and phylogenetic relationships.
1423

1424
```python
1425
# F2 statistic (genetic distance)
1426
f2_values = allel.patterson_f2(ac_pop1, ac_pop2)
1427

1428
# F3 statistic (admixture test)
1429
f3_values = allel.patterson_f3(ac_pop1, ac_pop2, ac_pop3)
1430

1431
# D statistic (4-population test)
1432
d_values = allel.patterson_d(ac_pop1, ac_pop2, ac_pop3, ac_pop4)
1433

1434
# Blockwise calculations for bootstrapping
1435
f3_blocks = allel.blockwise_patterson_f3(ac_pop1, ac_pop2, ac_pop3, blocks)
1436
d_blocks = allel.blockwise_patterson_d(ac_pop1, ac_pop2, ac_pop3, ac_pop4, blocks)
1437

1438
# Average statistics
1439
avg_f3 = allel.average_patterson_f3(ac_pop1, ac_pop2, ac_pop3)
1440
avg_d = allel.average_patterson_d(ac_pop1, ac_pop2, ac_pop3, ac_pop4)
1441

1442
# Moving window analysis
1443
f3_moving = allel.moving_patterson_f3(positions, ac_pop1, ac_pop2, ac_pop3, size=100000)
1444
d_moving = allel.moving_patterson_d(positions, ac_pop1, ac_pop2, ac_pop3, ac_pop4, size=100000)
1445
```
1446
{ .api }
1447

1448
**Key Functions:**
1449

1450
```python
1451
def patterson_f2(ac1, ac2):
1452
    """
1453
    Calculate Patterson's F2 statistic (genetic distance).
1454
    
1455
    Args:
1456
        ac1 (AlleleCountsArray): Allele counts for population 1
1457
        ac2 (AlleleCountsArray): Allele counts for population 2
1458
        
1459
    Returns:
1460
        ndarray: F2 values per variant
1461
    """
1462

1463
def patterson_f3(ac1, ac2, ac3):
1464
    """
1465
    Calculate Patterson's F3 statistic (test for admixture).
1466
    
1467
    Args:
1468
        ac1 (AlleleCountsArray): Allele counts for population 1 (test population)
1469
        ac2 (AlleleCountsArray): Allele counts for population 2 (source 1)
1470
        ac3 (AlleleCountsArray): Allele counts for population 3 (source 2)
1471
        
1472
    Returns:
1473
        ndarray: F3 values per variant
1474
    """
1475

1476
def patterson_d(ac1, ac2, ac3, ac4):
1477
    """
1478
    Calculate Patterson's D statistic (4-population test).
1479
    
1480
    Args:
1481
        ac1 (AlleleCountsArray): Population 1 allele counts
1482
        ac2 (AlleleCountsArray): Population 2 allele counts
1483
        ac3 (AlleleCountsArray): Population 3 allele counts
1484
        ac4 (AlleleCountsArray): Population 4 allele counts (outgroup)
1485
        
1486
    Returns:
1487
        ndarray: D values per variant
1488
    """
1489

1490
def blockwise_patterson_f3(ac1, ac2, ac3, blocks):
1491
    """
1492
    Calculate F3 statistic for genomic blocks.
1493
    
1494
    Args:
1495
        ac1 (AlleleCountsArray): Population 1 allele counts
1496
        ac2 (AlleleCountsArray): Population 2 allele counts
1497
        ac3 (AlleleCountsArray): Population 3 allele counts
1498
        blocks (array_like): Block definitions
1499
        
1500
    Returns:
1501
        ndarray: F3 values per block
1502
    """
1503

1504
def blockwise_patterson_d(ac1, ac2, ac3, ac4, blocks):
1505
    """
1506
    Calculate D statistic for genomic blocks.
1507
    
1508
    Args:
1509
        ac1 (AlleleCountsArray): Population 1 allele counts
1510
        ac2 (AlleleCountsArray): Population 2 allele counts
1511
        ac3 (AlleleCountsArray): Population 3 allele counts
1512
        ac4 (AlleleCountsArray): Population 4 allele counts
1513
        blocks (array_like): Block definitions
1514
        
1515
    Returns:
1516
        ndarray: D values per block
1517
    """
1518

1519
def average_patterson_f3(ac1, ac2, ac3):
1520
    """
1521
    Calculate average F3 statistic across all variants.
1522
    
1523
    Args:
1524
        ac1 (AlleleCountsArray): Population 1 allele counts
1525
        ac2 (AlleleCountsArray): Population 2 allele counts
1526
        ac3 (AlleleCountsArray): Population 3 allele counts
1527
        
1528
    Returns:
1529
        float: Average F3 value
1530
    """
1531

1532
def average_patterson_d(ac1, ac2, ac3, ac4):
1533
    """
1534
    Calculate average D statistic across all variants.
1535
    
1536
    Args:
1537
        ac1 (AlleleCountsArray): Population 1 allele counts
1538
        ac2 (AlleleCountsArray): Population 2 allele counts
1539
        ac3 (AlleleCountsArray): Population 3 allele counts
1540
        ac4 (AlleleCountsArray): Population 4 allele counts
1541
        
1542
    Returns:
1543
        float: Average D value
1544
    """
1545

1546
def moving_patterson_f3(pos, ac1, ac2, ac3, size, start=None, stop=None, step=None):
1547
    """
1548
    Calculate F3 statistic in sliding windows.
1549
    
1550
    Args:
1551
        pos (array_like): Variant positions
1552
        ac1 (AlleleCountsArray): Population 1 allele counts
1553
        ac2 (AlleleCountsArray): Population 2 allele counts
1554
        ac3 (AlleleCountsArray): Population 3 allele counts
1555
        size (int): Window size in base pairs
1556
        start (int, optional): Start position
1557
        stop (int, optional): Stop position
1558
        step (int, optional): Step size
1559
        
1560
    Returns:
1561
        tuple: (window_centers, f3_values)
1562
    """
1563

1564
def moving_patterson_d(pos, ac1, ac2, ac3, ac4, size, start=None, stop=None, step=None):
1565
    """
1566
    Calculate D statistic in sliding windows.
1567
    
1568
    Args:
1569
        pos (array_like): Variant positions
1570
        ac1 (AlleleCountsArray): Population 1 allele counts
1571
        ac2 (AlleleCountsArray): Population 2 allele counts
1572
        ac3 (AlleleCountsArray): Population 3 allele counts
1573
        ac4 (AlleleCountsArray): Population 4 allele counts
1574
        size (int): Window size in base pairs
1575
        start (int, optional): Start position
1576
        stop (int, optional): Stop position
1577
        step (int, optional): Step size
1578
        
1579
    Returns:
1580
        tuple: (window_centers, d_values)
1581
    """
1582
```
1583
{ .api }
1584

1585
## Windowed Analysis
1586

1587
### Window Functions
1588

1589
Apply statistical methods over genomic windows.
1590

1591
```python
1592
# Create equally-sized windows
1593
windows = allel.equally_accessible_windows(positions, size=10000)
1594

1595
# Apply any statistic to windows
1596
window_stats = allel.windowed_statistic(positions, values, np.mean, windows)
1597

1598
# Moving window statistics
1599
moving_stats = allel.moving_statistic(values, np.mean, size=100, step=50)
1600

1601
# Count variants in windows  
1602
counts = allel.windowed_count(positions, windows)
1603

1604
# Per-base analysis
1605
per_base_values = allel.per_base(positions, values, start=1000, stop=2000)
1606

1607
# Moving statistics with different functions
1608
moving_means = allel.moving_mean(values, size=100, step=50)
1609
moving_stds = allel.moving_std(values, size=100, step=50)
1610
moving_mids = allel.moving_midpoint(positions, size=100, step=50)
1611

1612
# Window coordinate helpers
1613
index_wins = allel.index_windows(positions, window_size=100)
1614
pos_wins = allel.position_windows(positions, window_size=10000, step=5000)
1615
win_locs = allel.window_locations(windows)
1616
```
1617
{ .api }
1618

1619
### Additional Window Functions
1620

1621
More specialized windowing utilities.
1622

1623
```python
1624
def windowed_count(pos, windows, is_accessible=None):
1625
    """
1626
    Count variants in genomic windows.
1627
    
1628
    Args:
1629
        pos (array_like): Variant positions
1630
        windows (array_like): Window coordinates as (start, stop) pairs
1631
        is_accessible (array_like, optional): Accessible sites mask
1632
        
1633
    Returns:
1634
        ndarray: Variant counts per window
1635
    """
1636

1637
def per_base(pos, values, start=None, stop=None, is_accessible=None):
1638
    """
1639
    Calculate per-base values over a genomic region.
1640
    
1641
    Args:
1642
        pos (array_like): Variant positions
1643
        values (array_like): Values per variant
1644
        start (int, optional): Start position
1645
        stop (int, optional): Stop position
1646
        is_accessible (array_like, optional): Accessible sites mask
1647
        
1648
    Returns:
1649
        float: Per-base value
1650
    """
1651

1652
def moving_mean(values, size, step=1):
1653
    """
1654
    Calculate moving mean over a sequence of values.
1655
    
1656
    Args:
1657
        values (array_like): Input values
1658
        size (int): Window size
1659
        step (int): Step size between windows
1660
        
1661
    Returns:
1662
        ndarray: Moving mean values
1663
    """
1664

1665
def moving_std(values, size, step=1):
1666
    """
1667
    Calculate moving standard deviation.
1668
    
1669
    Args:
1670
        values (array_like): Input values
1671
        size (int): Window size
1672
        step (int): Step size between windows
1673
        
1674
    Returns:
1675
        ndarray: Moving standard deviation values
1676
    """
1677

1678
def moving_midpoint(positions, size, step=1):
1679
    """
1680
    Calculate midpoints for moving windows.
1681
    
1682
    Args:
1683
        positions (array_like): Position coordinates
1684
        size (int): Window size
1685
        step (int): Step size between windows
1686
        
1687
    Returns:
1688
        ndarray: Window midpoint positions
1689
    """
1690

1691
def index_windows(positions, window_size):
1692
    """
1693
    Create windows based on variant indices.
1694
    
1695
    Args:
1696
        positions (array_like): Variant positions
1697
        window_size (int): Number of variants per window
1698
        
1699
    Returns:
1700
        ndarray: Window definitions as index ranges
1701
    """
1702

1703
def position_windows(positions, window_size, step=None):
1704
    """
1705
    Create windows based on genomic positions.
1706
    
1707
    Args:
1708
        positions (array_like): Variant positions
1709
        window_size (int): Window size in base pairs
1710
        step (int, optional): Step size between windows
1711
        
1712
    Returns:
1713
        ndarray: Window coordinates as (start, stop) pairs
1714
    """
1715

1716
def window_locations(windows):
1717
    """
1718
    Calculate locations (centers) of genomic windows.
1719
    
1720
    Args:
1721
        windows (array_like): Window coordinates as (start, stop) pairs
1722
        
1723
    Returns:
1724
        ndarray: Window center positions
1725
    """
1726
```
1727
{ .api }
1728

1729
## Mendelian Analysis
1730

1731
### Inheritance Pattern Analysis
1732

1733
Analyze transmission patterns in family data to identify Mendelian errors and phase genotypes.
1734

1735
```python
1736
# Identify Mendelian errors in trio data
1737
errors = allel.mendel_errors(offspring_genotypes, parent_genotypes)
1738

1739
# Phase genotypes using transmission patterns
1740
phased_offspring = allel.phase_progeny_by_transmission(offspring_genotypes, transmission)
1741
phased_parents = allel.phase_parents_by_transmission(parent_genotypes, transmission)
1742

1743
# Complete phasing workflow
1744
phased_offspring, phased_parents = allel.phase_by_transmission(offspring_genotypes, parent_genotypes)
1745
```
1746
{ .api }
1747

1748
**Key Functions:**
1749

1750
```python
1751
def mendel_errors(offspring_genotypes, parent_genotypes):
1752
    """
1753
    Identify Mendelian inheritance errors in family data.
1754
    
1755
    Args:
1756
        offspring_genotypes (GenotypeArray): Genotypes of offspring
1757
        parent_genotypes (GenotypeArray): Genotypes of parents
1758
        
1759
    Returns:
1760
        ndarray: Array indicating Mendelian error status for each variant/offspring pair
1761
    """
1762

1763
def phase_progeny_by_transmission(offspring_genotypes, transmission):
1764
    """
1765
    Phase offspring genotypes using transmission patterns.
1766
    
1767
    Args:
1768
        offspring_genotypes (GenotypeArray): Offspring genotype data
1769
        transmission (array_like): Transmission pattern information
1770
        
1771
    Returns:
1772
        GenotypeArray: Phased offspring genotypes
1773
    """
1774

1775
def phase_parents_by_transmission(parent_genotypes, transmission):
1776
    """
1777
    Phase parent genotypes using transmission patterns.
1778
    
1779
    Args:
1780
        parent_genotypes (GenotypeArray): Parent genotype data
1781
        transmission (array_like): Transmission pattern information
1782
        
1783
    Returns:
1784
        GenotypeArray: Phased parent genotypes
1785
    """
1786

1787
def phase_by_transmission(offspring_genotypes, parent_genotypes):
1788
    """
1789
    Phase both offspring and parent genotypes using transmission patterns.
1790
    
1791
    Args:
1792
        offspring_genotypes (GenotypeArray): Offspring genotypes
1793
        parent_genotypes (GenotypeArray): Parent genotypes
1794
        
1795
    Returns:
1796
        tuple: (phased_offspring, phased_parents) GenotypeArray objects
1797
    """
1798
```
1799
{ .api }
1800

1801
**Inheritance Constants:**
1802

1803
```python
1804
# Mendelian inheritance pattern constants
1805
INHERIT_MISSING = -1        # Missing inheritance pattern
1806
INHERIT_NONPARENTAL = 0     # Non-parental allele
1807
INHERIT_NONSEG_ALT = 1      # Non-segregating alternative
1808
INHERIT_NONSEG_REF = 2      # Non-segregating reference
1809
INHERIT_PARENT1 = 3         # Inherited from parent 1
1810
INHERIT_PARENT2 = 4         # Inherited from parent 2
1811
INHERIT_PARENT_MISSING = 5  # Parent has missing data
1812
INHERIT_UNDETERMINED = 6    # Cannot determine inheritance
1813
```
1814
{ .api }
1815

1816
## Runs of Homozygosity
1817

1818
### ROH Detection
1819

1820
Detect runs of homozygosity using Hidden Markov Models.
1821

1822
```python
1823
# Multi-state HMM for ROH detection
1824
roh_results = allel.roh_mhmm(genotypes, positions, phet_roh=0.001, phet_nonroh=0.01, transition=1e-6)
1825

1826
# Poisson HMM approach
1827
roh_poisson = allel.roh_poissonhmm(genotypes, positions, phet_roh=0.001, phet_nonroh=0.01, 
1828
                                  transition_hethet=1e-5, transition_homhom=1e-8)
1829
```
1830
{ .api }
1831

1832
```python
1833
def roh_mhmm(g, pos, phet_roh, phet_nonroh, transition, map_pos=None):
1834
    """
1835
    Detect runs of homozygosity using multi-state Hidden Markov Model.
1836
    
1837
    Args:
1838
        g (GenotypeArray): Genotype data
1839
        pos (array_like): Variant positions
1840
        phet_roh (float): Probability of heterozygosity in ROH segments
1841
        phet_nonroh (float): Probability of heterozygosity in non-ROH segments
1842
        transition (float): Transition probability between states
1843
        map_pos (array_like, optional): Genetic map positions
1844
        
1845
    Returns:
1846
        ndarray: ROH state predictions
1847
    """
1848

1849
def roh_poissonhmm(g, pos, phet_roh, phet_nonroh, transition_hethet, transition_homhom, map_pos=None):
1850
    """
1851
    Detect runs of homozygosity using Poisson Hidden Markov Model.
1852
    
1853
    Args:
1854
        g (GenotypeArray): Genotype data
1855
        pos (array_like): Variant positions
1856
        phet_roh (float): Het probability in ROH
1857
        phet_nonroh (float): Het probability in non-ROH
1858
        transition_hethet (float): Transition probability between het states
1859
        transition_homhom (float): Transition probability between hom states
1860
        map_pos (array_like, optional): Genetic map positions
1861
        
1862
    Returns:
1863
        ndarray: ROH state predictions
1864
    """
1865
```
1866
{ .api }
1867

1868
## Windowed Analysis
1869

1870
### Window Functions
1871

1872
Apply statistical methods over genomic windows.
1873

1874
```python
1875
# Create equally-sized windows
1876
windows = allel.equally_accessible_windows(positions, size=10000)
1877

1878
# Apply any statistic to windows
1879
window_stats = allel.windowed_statistic(positions, values, np.mean, windows)
1880

1881
# Moving window statistics
1882
moving_stats = allel.moving_statistic(values, np.mean, size=100, step=50)
1883
```
1884
{ .api }
1885

1886
```python
1887
def windowed_statistic(pos, values, statistic, windows, fill=None):
1888
    """
1889
    Apply a statistic function to values within genomic windows.
1890
    
1891
    Args:
1892
        pos (array_like): Positions corresponding to values
1893
        values (array_like): Values to calculate statistics on
1894
        statistic (callable): Function to apply (e.g., np.mean, np.sum)
1895
        windows (array_like): Window coordinates as (start, stop) pairs
1896
        fill (float, optional): Fill value for empty windows
1897
        
1898
    Returns:
1899
        ndarray: Statistic values per window
1900
    """
1901

1902
def equally_accessible_windows(pos, size, start=None, stop=None):
1903
    """
1904
    Create equally-sized genomic windows.
1905
    
1906
    Args:
1907
        pos (array_like): Variant positions to determine window bounds
1908
        size (int): Window size in base pairs
1909
        start (int, optional): Start coordinate
1910
        stop (int, optional): Stop coordinate
1911
        
1912
    Returns:
1913
        ndarray: Window coordinates as (start, stop) pairs
1914
    """
1915

1916
def moving_statistic(values, statistic, size, step=1):
1917
    """
1918
    Apply statistic over a moving window.
1919
    
1920
    Args:
1921
        values (array_like): Input values
1922
        statistic (callable): Function to apply
1923
        size (int): Window size
1924
        step (int): Step size between windows
1925
        
1926
    Returns:
1927
        ndarray: Statistic values per window
1928
    """
1929
```
1930
{ .api }
1931

1932
## Miscellaneous Functions
1933

1934
### Utility Functions
1935

1936
Additional statistical utilities and visualization tools.
1937

1938
```python
1939
# Variant locator plot for exploring data
1940
allel.plot_variant_locator(genotypes, positions, start=1000, stop=2000)
1941

1942
# Tabulate state transitions (for HMM analysis)
1943
transitions = allel.tabulate_state_transitions(states)
1944

1945
# Tabulate state blocks
1946
blocks = allel.tabulate_state_blocks(states)
1947
```
1948
{ .api }
1949

1950
```python
1951
def plot_variant_locator(gn, pos, start=None, stop=None, ax=None):
1952
    """
1953
    Plot variant locator for exploring genotype data.
1954
    
1955
    Args:
1956
        gn (GenotypeArray): Genotype data
1957
        pos (array_like): Variant positions
1958
        start (int, optional): Start position for plotting
1959
        stop (int, optional): Stop position for plotting
1960
        ax (matplotlib axis, optional): Axis to plot on
1961
        
1962
    Returns:
1963
        matplotlib figure: Variant locator plot
1964
    """
1965

1966
def tabulate_state_transitions(states, pos=None):
1967
    """
1968
    Tabulate transitions between states in a sequence.
1969
    
1970
    Args:
1971
        states (array_like): State sequence
1972
        pos (array_like, optional): Positions corresponding to states
1973
        
1974
    Returns:
1975
        DataFrame: Transition table with counts and positions
1976
    """
1977

1978
def tabulate_state_blocks(states, pos=None):
1979
    """
1980
    Tabulate contiguous blocks of the same state.
1981
    
1982
    Args:
1983
        states (array_like): State sequence
1984
        pos (array_like, optional): Positions corresponding to states
1985
        
1986
    Returns:
1987
        DataFrame: Block table with start/stop positions and lengths
1988
    """
1989
```
1990
{ .api }

Version

Tile

Files

statistics.md.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}docs/

statistics.mddocs/