# Tree Analysis & Comparison

DendroPy provides tools for phylogenetic tree analysis: tree comparison metrics, phylogenetic distance calculations, summarization of tree collections, and topological (shape and balance) analysis.

## Capabilities

### Tree Comparison Metrics

Functions for comparing phylogenetic trees using distance metrics and topological measures.

```python { .api }
# Import tree comparison functions
from dendropy.calculate.treecompare import (
    symmetric_difference,
    unweighted_robinson_foulds_distance,
    weighted_robinson_foulds_distance,
    euclidean_distance
)

# Tree comparison functions
def symmetric_difference(tree1, tree2, is_bipartitions_updated=False):
    """
    Calculate symmetric difference between two trees.

    Parameters:
    - tree1, tree2: Tree objects to compare
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    int: Number of bipartitions in one tree but not the other
    """

def unweighted_robinson_foulds_distance(tree1, tree2, is_bipartitions_updated=False):
    """
    Calculate unweighted Robinson-Foulds distance between trees.

    Parameters:
    - tree1, tree2: Tree objects to compare
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    int: Robinson-Foulds distance (0 = identical topologies)
    """

def weighted_robinson_foulds_distance(tree1, tree2, edge_weight_attr="length", is_bipartitions_updated=False):
    """
    Calculate weighted Robinson-Foulds distance using branch lengths.

    Parameters:
    - tree1, tree2: Tree objects to compare
    - edge_weight_attr: Attribute name for edge weights (default: "length")
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    float: Weighted Robinson-Foulds distance
    """

def euclidean_distance(tree1, tree2, edge_weight_attr="length", is_bipartitions_updated=False, value_type=float):
    """
    Calculate Euclidean distance between trees based on branch lengths.

    Parameters:
    - tree1, tree2: Tree objects to compare
    - edge_weight_attr: Attribute name for edge weights
    - is_bipartitions_updated: Whether bipartitions are already calculated
    - value_type: Type for calculations (float, Decimal, etc.)

    Returns:
    float: Euclidean distance between tree vectors
    """

def false_positives_and_negatives(reference_tree, comparison_tree, is_bipartitions_updated=False):
    """
    Calculate false positives and false negatives when comparing trees.

    Parameters:
    - reference_tree: Reference (true) tree
    - comparison_tree: Tree being evaluated
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    tuple: (false_positives, false_negatives)
    """

def find_missing_bipartitions(reference_tree, comparison_tree, is_bipartitions_updated=False):
    """
    Find bipartitions present in reference but missing in comparison tree.

    Parameters:
    - reference_tree: Reference tree with expected bipartitions
    - comparison_tree: Tree to check for missing bipartitions
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    set: Set of Bipartition objects missing from comparison tree
    """

def mason_gamer_kellogg_score(tree1, tree2, is_bipartitions_updated=False):
    """
    Calculate the Mason-Gamer and Kellogg score for a pair of trees.

    Parameters:
    - tree1, tree2: Tree objects to compare
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    Mason-Gamer-Kellogg score for the two trees
    """
```
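The unweighted metrics above reduce to set operations on the groupings each tree implies. The following is a minimal, stdlib-only sketch of that idea, not DendroPy's implementation. It assumes rooted trees written as nested tuples of leaf names and compares clades rather than unrooted bipartitions; the helper names `clades`, `symmetric_difference_sketch`, and `false_positives_and_negatives_sketch` are hypothetical.

```python
def clades(tree):
    """Collect non-trivial clades of a rooted tree given as nested tuples of leaf names."""
    found = set()

    def leaf_set(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        members = frozenset().union(*(leaf_set(child) for child in node))
        found.add(members)
        return members

    found.discard(leaf_set(tree))  # drop the root clade, which every tree shares
    return found


def symmetric_difference_sketch(tree1, tree2):
    """Number of clades found in exactly one of the two trees."""
    return len(clades(tree1) ^ clades(tree2))


def false_positives_and_negatives_sketch(reference, comparison):
    """Clades only in the comparison tree, and clades only in the reference tree."""
    ref, other = clades(reference), clades(comparison)
    return len(other - ref), len(ref - other)


# Two hypothetical five-taxon trees that disagree on one grouping.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(symmetric_difference_sketch(t1, t2))
print(false_positives_and_negatives_sketch(t1, t2))
```

For unrooted trees, each split would additionally need to be normalized so that a grouping and its complement count as the same bipartition.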

### Phylogenetic Distance Matrices

Classes for calculating and storing various types of phylogenetic distances.

```python { .api }
class PhylogeneticDistanceMatrix:
    """
    Matrix of phylogenetic distances between taxa.

    Parameters:
    - tree: Tree object for distance calculations
    - taxon_namespace: TaxonNamespace for matrix indexing
    """

    def __init__(self, tree=None, taxon_namespace=None): ...

    def patristic_distances(self, tree):
        """
        Calculate patristic (tree path) distances between all taxa.

        Parameters:
        - tree: Tree for distance calculation

        Returns:
        None (updates internal distance matrix)
        """

    def path_distance(self, taxon1, taxon2):
        """
        Get path distance between two specific taxa.

        Parameters:
        - taxon1, taxon2: Taxon objects

        Returns:
        float: Path distance between taxa
        """

    def max_dist(self):
        """Get maximum distance in matrix."""

    def mean_dist(self):
        """Get mean distance in matrix."""

    def distances(self):
        """Iterator over all pairwise distances."""

    def taxon_pairs(self):
        """Iterator over all taxon pairs."""

class PatristicDistanceMatrix(PhylogeneticDistanceMatrix):
    """Specialized matrix for patristic distances."""

    def __init__(self, tree): ...

class NodeDistanceMatrix:
    """
    Matrix of distances between tree nodes.

    Parameters:
    - tree: Tree object for node distance calculations
    """

    def __init__(self, tree=None): ...

    def distances(self, node1, node2):
        """Get distance between two nodes."""
```
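A patristic distance is the sum of branch lengths along the path connecting two taxa. The sketch below illustrates the calculation with the standard library only; it is not how DendroPy stores or computes its matrices. The `parent` mapping, the node names, and the function names are hypothetical, and branch lengths are assumed to be non-negative.

```python
from itertools import combinations


def distances_to_ancestors(node, parent):
    """Map each ancestor of node (and node itself) to its path length from node."""
    dist = {node: 0.0}
    total = 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        dist[node] = total
    return dist


def patristic_distance_sketch(a, b, parent):
    """Path length between a and b through their most recent common ancestor."""
    up_a = distances_to_ancestors(a, parent)
    up_b = distances_to_ancestors(b, parent)
    # With non-negative branch lengths, the common ancestor giving the
    # smallest summed distance is the most recent one.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


# Hypothetical tree ((A:1,B:2)X:0.5,C:3)R stored as child -> (parent, branch length).
parent = {"A": ("X", 1.0), "B": ("X", 2.0), "X": ("R", 0.5), "C": ("R", 3.0)}

matrix = {
    (a, b): patristic_distance_sketch(a, b, parent)
    for a, b in combinations(["A", "B", "C"], 2)
}
print(matrix)
print(max(matrix.values()), sum(matrix.values()) / len(matrix))
```

The last line mirrors what `max_dist` and `mean_dist` are described as returning: the largest and the average pairwise distance in the matrix.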

### Tree Summarization

Tools for summarizing collections of trees and extracting consensus information.

```python { .api }
class TreeSummarizer:
    """
    Summarizes collections of trees into consensus trees and statistics.

    Parameters:
    - taxon_namespace: TaxonNamespace for summary trees
    """

    def __init__(self, taxon_namespace=None): ...

    def summarize(self, trees, min_freq=0.5):
        """
        Create consensus tree from tree collection.

        Parameters:
        - trees: Iterable of Tree objects
        - min_freq: Minimum frequency for bipartition inclusion

        Returns:
        Tree: Consensus tree with support values
        """

    def map_support_as_node_ages(self, tree, trees):
        """Map bipartition support values as node ages on tree."""

    def map_support_as_node_labels(self, tree, trees, label_format="%.2f"):
        """Map bipartition support values as node labels."""

class TopologyCounter:
    """
    Counts and analyzes tree topologies in collections.

    Parameters:
    - taxon_namespace: TaxonNamespace for topology comparison
    """

    def __init__(self, taxon_namespace=None): ...

    def count(self, trees, topology_counter=None):
        """
        Count unique topologies in tree collection.

        Parameters:
        - trees: Iterable of Tree objects
        - topology_counter: Optional existing counter to update

        Returns:
        dict: Mapping from topology to count
        """

    def topology_frequencies(self, trees):
        """Get frequencies of different topologies."""

    def unique_topologies(self, trees):
        """Get set of unique topologies."""
```
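Consensus building and topology counting both start from tallying how often each grouping appears across the collection. The sketch below shows that tally with the standard library, assuming rooted trees as nested tuples of leaf names; all function names are hypothetical, and it keeps clades whose frequency is strictly greater than `min_freq`, which is an assumption about the threshold rather than a statement of DendroPy's behavior.

```python
from collections import Counter


def clades(tree):
    """Collect non-trivial clades of a rooted tree given as nested tuples of leaf names."""
    found = set()

    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaf_set(child) for child in node))
        found.add(members)
        return members

    found.discard(leaf_set(tree))  # the root clade carries no information
    return found


def clade_support(trees):
    """Fraction of trees in which each clade occurs."""
    trees = list(trees)
    counts = Counter(clade for tree in trees for clade in clades(tree))
    return {clade: n / len(trees) for clade, n in counts.items()}


def majority_clades(trees, min_freq=0.5):
    """Clades supported by more than min_freq of the trees."""
    return {c: f for c, f in clade_support(trees).items() if f > min_freq}


def topology_counts(trees):
    """Count how many trees share each topology, identified by its clade set."""
    return Counter(frozenset(clades(tree)) for tree in trees)


# Hypothetical sample of four trees; the second topology appears twice.
t1 = (("A", "B"), ("C", "D"))
t2 = ((("A", "B"), "C"), "D")
t3 = ((("A", "C"), "B"), "D")
trees = [t1, t2, t2, t3]

print(majority_clades(trees))
print(topology_counts(trees).most_common(1))
```

With a strict threshold of 0.5, any two retained clades co-occur in at least one tree, so they can always be assembled into a single consensus tree.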

### Bipartition Analysis

Classes and functions for working with bipartitions (splits) in phylogenetic trees.

```python { .api }
class Bipartition:
    """
    Represents a bipartition (split) in a phylogenetic tree.

    Parameters:
    - taxon_namespace: TaxonNamespace defining taxa
    - bitmask: Bitmask representing the split
    """

    def __init__(self, taxon_namespace=None, **kwargs): ...

    def split_as_newick_string(self, taxon_namespace=None):
        """Return bipartition as Newick string."""

    def leafset_as_newick_string(self, taxon_namespace=None):
        """Return leaf set as Newick string."""

    def is_compatible_with(self, other_bipartition):
        """Check if bipartition is compatible with another."""

    def is_nested_within(self, other_bipartition):
        """Check if bipartition is nested within another."""

def encode_bipartitions(tree):
    """
    Encode all bipartitions in tree.

    Parameters:
    - tree: Tree object to encode

    Returns:
    None (adds bipartition encoding to tree nodes)
    """

def update_bipartitions(tree, suppress_unifurcations=True, collapse_unrooted_basal_bifurcation=True):
    """Update bipartition encoding on tree."""
```
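Representing a split as a bitmask makes compatibility and nesting checks cheap bitwise operations. The sketch below illustrates the idea with plain integers; it does not use or reproduce the `Bipartition` class, and the names `encode`, `is_compatible`, and `is_nested_within` here are hypothetical free functions. Two splits are treated as compatible when at least one of the four intersections between their sides is empty.

```python
def encode(leaves, taxa):
    """Bitmask with bit i set when taxa[i] belongs to leaves."""
    return sum(1 << i for i, name in enumerate(taxa) if name in leaves)


def is_compatible(split_a, split_b, all_taxa_mask):
    """True when the two splits can occur together on one tree."""
    not_a = ~split_a & all_taxa_mask
    not_b = ~split_b & all_taxa_mask
    return any(
        (x & y) == 0
        for x in (split_a, not_a)
        for y in (split_b, not_b)
    )


def is_nested_within(split_a, split_b):
    """True when every taxon on the set side of split_a is also in split_b."""
    return (split_a & split_b) == split_a


taxa = ["A", "B", "C", "D"]
all_taxa = encode(set(taxa), taxa)   # 0b1111
ab = encode({"A", "B"}, taxa)        # 0b0011
ac = encode({"A", "C"}, taxa)        # 0b0101
abc = encode({"A", "B", "C"}, taxa)  # 0b0111

print(is_compatible(ab, abc, all_taxa), is_compatible(ab, ac, all_taxa))
print(is_nested_within(ab, abc))
```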

### Tree Shape and Balance Analysis

Functions for analyzing tree shape, balance, and other topological properties.

```python { .api }
def colless_tree_imbalance(tree, normalize="max"):
    """
    Calculate Colless index of tree imbalance.

    Parameters:
    - tree: Tree object to analyze
    - normalize: Normalization method ("max", "yule", or None)

    Returns:
    float: Colless imbalance index
    """

def sackin_index(tree, normalize=True):
    """
    Calculate Sackin index of tree balance.

    Parameters:
    - tree: Tree object to analyze
    - normalize: Whether to normalize by number of leaves

    Returns:
    float: Sackin balance index
    """

def b1_index(tree):
    """
    Calculate B1 balance index.

    Parameters:
    - tree: Tree object to analyze

    Returns:
    float: B1 balance index
    """

def treeness(tree):
    """
    Calculate treeness (proportion of internal edge length).

    Parameters:
    - tree: Tree object with branch lengths

    Returns:
    float: Treeness value (0-1)
    """

def resolution(tree):
    """
    Calculate tree resolution (proportion of internal nodes that are bifurcating).

    Parameters:
    - tree: Tree object to analyze

    Returns:
    float: Resolution value (0-1)
    """
```
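The Colless index sums, over internal nodes, the absolute difference in leaf counts between the two child subtrees; the Sackin index sums the depth of every leaf. The sketch below computes both for a strictly bifurcating rooted tree written as nested tuples, using only the standard library. The function names are hypothetical, and the optional normalization divides by the maximum possible Colless value, (n-1)(n-2)/2, which is one common convention and requires at least three leaves.

```python
def shape_stats(tree, depth=0):
    """Return (leaf_count, colless_sum, sackin_sum) for a bifurcating nested-tuple tree."""
    if isinstance(tree, str):  # a leaf contributes its depth to the Sackin sum
        return 1, 0, depth
    left, right = tree
    n_l, c_l, s_l = shape_stats(left, depth + 1)
    n_r, c_r, s_r = shape_stats(right, depth + 1)
    return n_l + n_r, c_l + c_r + abs(n_l - n_r), s_l + s_r


def colless(tree, normalize=False):
    """Colless imbalance; optionally scaled by its maximum for the same leaf count."""
    n, total, _ = shape_stats(tree)
    if normalize:
        return total / ((n - 1) * (n - 2) / 2)
    return total


def sackin(tree):
    """Sum of leaf depths, counted in edges from the root."""
    return shape_stats(tree)[2]


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")

print(colless(balanced), sackin(balanced))
print(colless(caterpillar), colless(caterpillar, normalize=True), sackin(caterpillar))
```

A fully balanced tree scores zero on Colless, while a caterpillar (ladder) tree reaches the maximum, so its normalized value is 1.0.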

### Population Genetics Statistics

Statistical calculations relevant to population genetics and phylogeography.

```python { .api }
class PopulationPairSummaryStatistics:
    """
    Summary statistics for pairs of populations.

    Parameters:
    - pop1_nodes: Nodes representing population 1
    - pop2_nodes: Nodes representing population 2
    """

    def __init__(self, pop1_nodes, pop2_nodes): ...

    def fst(self):
        """Calculate Fst between populations."""

    def average_number_of_pairwise_differences(self):
        """Calculate average pairwise differences."""

    def average_number_of_pairwise_differences_between(self):
        """Calculate average differences between populations."""

    def average_number_of_pairwise_differences_within(self):
        """Calculate average differences within populations."""

    def num_segregating_sites(self):
        """Count segregating sites."""

    def wattersons_theta(self):
        """Calculate Watterson's theta."""

    def tajimas_d(self):
        """Calculate Tajima's D."""
```
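Several of these statistics have short closed-form definitions. The sketch below computes three of them from aligned sequences given as plain strings, which is an assumption made for illustration; the class above is documented as taking tree nodes, and these free functions are hypothetical stand-ins rather than its methods. Watterson's theta here is the number of segregating sites divided by the harmonic number a_n = 1 + 1/2 + ... + 1/(n-1).

```python
from itertools import combinations


def num_segregating_sites(seqs):
    """Count alignment columns that contain more than one state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def average_pairwise_differences(seqs):
    """Mean number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs)


def wattersons_theta(seqs):
    """Segregating sites scaled by the harmonic number for the sample size."""
    a_n = sum(1.0 / i for i in range(1, len(seqs)))
    return num_segregating_sites(seqs) / a_n


# Hypothetical alignment of four sequences, four sites each.
seqs = ["AAAA", "AAAT", "AATT", "ATTT"]

print(num_segregating_sites(seqs))
print(average_pairwise_differences(seqs))
print(wattersons_theta(seqs))
```

Tajima's D compares the last two quantities: it is their difference divided by an estimate of its standard deviation, so values near zero indicate that the two estimators of theta agree.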

### Statistical Functions

General statistical and probability functions used in phylogenetic analysis.

```python { .api }
def mean_and_sample_variance(values):
    """
    Calculate mean and sample variance.

    Parameters:
    - values: Iterable of numeric values

    Returns:
    tuple: (mean, sample_variance)
    """

def mean_and_population_variance(values):
    """
    Calculate mean and population variance.

    Parameters:
    - values: Iterable of numeric values

    Returns:
    tuple: (mean, population_variance)
    """

def mode(values):
    """
    Find mode (most frequent value).

    Parameters:
    - values: Iterable of values

    Returns:
    Value that appears most frequently
    """

def median(values):
    """
    Calculate median value.

    Parameters:
    - values: Iterable of numeric values

    Returns:
    float: Median value
    """

def quantile(sorted_values, q):
    """
    Calculate quantile of sorted data.

    Parameters:
    - sorted_values: Sorted list of values
    - q: Quantile to calculate (0.0-1.0)

    Returns:
    float: Quantile value
    """

def empirical_hpd(samples, level=0.95):
    """
    Calculate empirical highest posterior density interval.

    Parameters:
    - samples: List of sample values
    - level: Credibility level (default 0.95)

    Returns:
    tuple: (lower_bound, upper_bound)
    """

class FishersExactTest:
    """Fisher's exact test for contingency tables."""

    def __init__(self, a, b, c, d): ...
    def left_tail_p(self): ...
    def right_tail_p(self): ...
    def two_tail_p(self): ...
```
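Two of these functions are easy to misread: the sample and population variances differ only in the divisor (n - 1 versus n), and an empirical HPD interval is the narrowest window of sorted samples that still covers the requested fraction. The stdlib-only sketch below illustrates both under those definitions. The `_sketch` function names are hypothetical, and the rule used to pick the window size (rounding `level * n` up) is an assumption, not necessarily the rule DendroPy applies.

```python
import math


def mean_and_variances_sketch(values):
    """Return (mean, sample_variance, population_variance)."""
    values = list(values)
    n = len(values)
    mean = sum(values) / n
    squared = sum((v - mean) ** 2 for v in values)
    return mean, squared / (n - 1), squared / n


def empirical_hpd_sketch(samples, level=0.95):
    """Narrowest interval containing at least level * n of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(level * n))  # samples that must fall inside
    # Slide a window of that many samples and keep the narrowest one.
    best = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[best], ordered[best + window - 1]


print(mean_and_variances_sketch([2, 4, 4, 4, 5, 5, 7, 9]))
print(empirical_hpd_sketch([10.0, 1.0, 2.0, 2.1, 2.2, 2.3, 5.0, 9.0], level=0.5))
```

Unlike an equal-tailed quantile interval, the HPD interval follows the densest region of the samples, so for skewed distributions it is shifted toward the mode.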