# Phylogenetic Analysis

Advanced phylogenetic tree analysis capabilities including species tree operations, monophyly testing, evolutionary analysis, and specialized phylogenetic methods. These features extend core tree functionality with domain-specific phylogenetic tools.

## Capabilities

### Phylogenetic Tree Classes

Enhanced tree classes with phylogenetic-specific features and methods.

```python { .api }
class PhyloTree(Tree):
    """
    Phylogenetic tree with species-aware operations.
    Inherits all Tree functionality plus phylogenetic methods.
    """

    def __init__(self, newick=None, alignment=None, alg_format="fasta",
                 sp_naming_function=_parse_species, format=0):
        """
        Initialize phylogenetic tree.

        Parameters:
        - newick (str): Newick format string or file
        - alignment (str): Sequence alignment file or string
        - alg_format (str): Alignment format ("fasta", "phylip", "iphylip")
        - sp_naming_function (function): Function to extract species from node names
          (the default uses the first three characters of the node name)
        - format (int): Newick format specification
        """

PhyloNode = PhyloTree  # Alias: same class, same functionality
```

### Species Naming and Annotation

Configure how species names are extracted from node names and manage species-level operations.

```python { .api }
def set_species_naming_function(self, fn):
    """
    Set the function used to extract a species name from a node name.

    Parameters:
    - fn (function): Function that takes a node name and returns a species name
      Example: lambda x: x.split('_')[0]
    """

species: str  # Species name of the node, as returned by the species naming function

def get_species(self):
    """
    Get the set of species under this node.

    Returns:
    set: Species names present in the (sub)tree
    """

def annotate_gtdb_taxa(self, taxid_attr="name"):
    """
    Annotate tree with GTDB (Genome Taxonomy Database) taxonomic information.

    Parameters:
    - taxid_attr (str): Node attribute containing taxonomic IDs
    """
```
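To make species naming concrete, here is a minimal, library-free sketch of what a naming function does. It assumes leaf names follow a `species_gene` pattern; the helpers `first_three_letters`, `prefix_before_underscore` and `species_in` are hypothetical and not part of ete3. The first helper mimics the default naming rule as we understand it (first three characters of the name).

```python
from typing import Callable, Iterable, Set


def first_three_letters(name: str) -> str:
    # Mimics the default rule: the species is the first three characters of the name
    return name[:3]


def prefix_before_underscore(name: str) -> str:
    # Same as lambda x: x.split('_')[0]; assumes "species_gene" naming
    return name.split("_")[0]


def species_in(leaf_names: Iterable[str], naming_fn: Callable[[str], str]) -> Set[str]:
    # Analogue of get_species(): the set of species covered by the given leaves
    return {naming_fn(name) for name in leaf_names}


leaves = ["human_gene1", "chimp_gene1", "chimp_gene2"]
print(species_in(leaves, prefix_before_underscore))
print(species_in(leaves, first_three_letters))
```

Paralogs such as `chimp_gene1` and `chimp_gene2` collapse to one species, so the species set can be smaller than the leaf set. Note that the three-letter rule turns `human` into `hum`, so gene trees and species trees must use the same naming convention.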

### Monophyly Testing

Test and analyze monophyletic groups in phylogenetic trees.

```python { .api }
def check_monophyly(self, values, target_attr, ignore_missing=False):
    """
    Check whether the specified values form a monophyletic group.

    Parameters:
    - values (list): List of values to test for monophyly
    - target_attr (str): Node attribute to check ("species", "name", etc.)
    - ignore_missing (bool): Do not raise ValueError when some of the values
      are not present in the tree

    Returns:
    tuple: (is_monophyletic: bool, clade_type: str, fail_leaves: set)
        clade_type is "monophyletic", "paraphyletic", or "polyphyletic";
        fail_leaves holds the leaves that break monophyly
    """

def get_monophyletic(self, values, target_attr):
    """
    Find the nodes whose leaves carry exactly the specified values.

    Parameters:
    - values (list): Values that should form a monophyletic group
    - target_attr (str): Node attribute to match against

    Returns:
    generator: Yields each matching TreeNode; yields nothing if the values
        are not monophyletic anywhere in the tree
    """
```
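The three-way classification returned by `check_monophyly` can be illustrated without ete3. The sketch below works on nested tuples of leaf names: it finds the smallest clade containing all target leaves, collects the "foreign" leaves inside it, and then labels the group paraphyletic if those foreign leaves form a clade free of target leaves, polyphyletic otherwise. This decision rule reflects our reading of the library's behaviour, but the code is a hypothetical sketch (`all_clades`, `smallest_clade`, `sp` are illustrative helpers), not ete3's implementation.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def all_clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set of every node in post-order; the root's leaf set comes last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    clades: List[FrozenSet[str]] = []
    own: Set[str] = set()
    for child in tree:
        child_clades = all_clades(child)
        clades.extend(child_clades)
        own |= child_clades[-1]  # the child's own leaf set is its last entry
    clades.append(frozenset(own))
    return clades


def smallest_clade(clades: List[FrozenSet[str]], leaves: FrozenSet[str]) -> FrozenSet[str]:
    # Clades containing a fixed leaf set are nested, so the smallest is their common ancestor
    return min((c for c in clades if leaves <= c), key=len)


def check_monophyly(tree: Tree, values: Iterable[str], species_of: Callable[[str], str],
                    ignore_missing: bool = False) -> Tuple[bool, str, Set[str]]:
    clades = all_clades(tree)
    root = clades[-1]
    wanted = set(values)
    missing = wanted - {species_of(leaf) for leaf in root}
    if missing and not ignore_missing:
        raise ValueError(f"values not present in the tree: {sorted(missing)}")
    targets = frozenset(leaf for leaf in root if species_of(leaf) in wanted)
    if not targets:
        raise ValueError("none of the values are present in the tree")
    common = smallest_clade(clades, targets)
    foreign = {leaf for leaf in common if species_of(leaf) not in wanted}
    if not foreign:
        return True, "monophyletic", foreign
    # If the foreign leaves' own common ancestor holds no target leaf, report
    # paraphyly; otherwise the target leaves are scattered (polyphyly).
    foreign_clade = smallest_clade(clades, frozenset(foreign))
    if any(species_of(leaf) in wanted for leaf in foreign_clade):
        return False, "polyphyletic", foreign
    return False, "paraphyletic", foreign


def sp(name: str) -> str:
    return name.split("_")[0]


tree = ("human_gene1", ("chimp_gene1", "bonobo_gene1"))
print(check_monophyly(tree, ["chimp", "bonobo"], sp))  # (True, 'monophyletic', set())
print(check_monophyly(tree, ["human", "chimp"], sp))   # (False, 'paraphyletic', {'bonobo_gene1'})
```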

### Distance and Divergence Analysis

Calculate evolutionary distances and analyze tree metrics.

```python { .api }
def get_age(self, species2age):
    """
    Get the relative age of a node from species age information.

    Parameters:
    - species2age (dict): Mapping from species names to ages

    Returns:
    float: Largest age among the species under this node
    """

def get_closest_leaf(self, topology_only=False):
    """
    Find the closest leaf and its distance from this node.

    Parameters:
    - topology_only (bool): Use only topology, ignore branch lengths

    Returns:
    tuple: (closest_leaf_node, distance)
    """

def get_farthest_leaf(self, topology_only=False):
    """
    Find the most distant leaf and its distance from this node.

    Parameters:
    - topology_only (bool): Use only topology, ignore branch lengths

    Returns:
    tuple: (farthest_leaf_node, distance)
    """

def get_farthest_node(self, topology_only=False):
    """
    Find the most distant node (leaf or internal) and its distance from this node.

    Parameters:
    - topology_only (bool): Use only topology, ignore branch lengths

    Returns:
    tuple: (farthest_node, distance)
    """

def get_midpoint_outgroup(self):
    """
    Find the optimal outgroup for midpoint rooting.

    Returns:
    TreeNode: Node that serves as midpoint outgroup
    """
```
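Midpoint rooting rests on a simple idea: find the two leaves that are farthest apart and place the root halfway along the path between them. The sketch below implements that idea on a plain undirected adjacency map rather than on ete3 objects; the representation and the names `build_graph`, `distances_from` and `midpoint` are hypothetical, and it assumes non-negative branch lengths and at least two leaves. It is not ete3's implementation of `get_midpoint_outgroup`. The example encodes the rooted tree `((A:1,B:2):1,C:4);` as an edge list with internal nodes named `root` and `X`.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    # Undirected adjacency map: node -> {neighbour: branch length}
    graph: Graph = defaultdict(dict)
    for a, b, length in edges:
        graph[a][b] = length
        graph[b][a] = length
    return graph


def distances_from(graph: Graph, start: str):
    # In a tree every node is reached by exactly one path, so a plain DFS gives exact distances
    dist = {start: 0.0}
    parent: Dict[str, Optional[str]] = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in graph[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return dist, parent


def midpoint(graph: Graph, any_leaf: str):
    # 1) The node farthest from any start node is one end of the longest path
    dist, _ = distances_from(graph, any_leaf)
    a = max(dist, key=dist.get)
    # 2) The node farthest from that end is the other end; their distance is the diameter
    dist, parent = distances_from(graph, a)
    b = max(dist, key=dist.get)
    half = dist[b] / 2
    # 3) Walk from b back towards a until reaching the branch that spans the halfway point
    node = b
    while dist[parent[node]] > half:
        node = parent[node]
    upper = parent[node]
    # Returns: both ends, the diameter, the branch holding the midpoint, offset from `upper`
    return a, b, dist[b], (upper, node), half - dist[upper]


edges = [("root", "X", 1.0), ("X", "A", 1.0), ("X", "B", 2.0), ("root", "C", 4.0)]
graph = build_graph(edges)
print(midpoint(graph, "A"))  # ('C', 'B', 7.0, ('C', 'root'), 3.5)
```

Here the longest path runs between B and C (length 7), so the midpoint lies 3.5 units from C on the branch joining C to the old root, and C would become the outgroup.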

### Sequence Integration

Link phylogenetic trees with molecular sequence data.

```python { .api }
def link_to_alignment(self, alignment, alg_format="fasta", **kwargs):
    """
    Associate a sequence alignment with tree nodes.

    Parameters:
    - alignment (str): Alignment file path or sequence string
    - alg_format (str): Format ("fasta", "phylip", "iphylip", "paml")
    - kwargs: Additional format-specific parameters
    """

sequence: str  # Associated sequence data (when linked to alignment)
```
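Conceptually, linking an alignment means matching each leaf name to a sequence record. The sketch below shows that matching with the standard library only, assuming a FASTA string whose record names equal the leaf names; `parse_fasta` and `link_sequences` are hypothetical helpers, not ete3 functions.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    # Minimal FASTA reader: ">name" header lines, each followed by one or more sequence lines
    records: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header is the record name
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def link_sequences(leaf_names: Iterable[str],
                   sequences: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    # Match leaves to records by exact name; report leaves left without a sequence
    names = list(leaf_names)
    linked = {n: sequences[n] for n in names if n in sequences}
    missing = [n for n in names if n not in sequences]
    return linked, missing


alignment = """>seq1
ACG
T-A
>seq2
ACGTTA
>seq4
AC--TA
"""
linked, missing = link_sequences(["seq1", "seq2", "seq3"], parse_fasta(alignment))
print(linked)   # {'seq1': 'ACGT-A', 'seq2': 'ACGTTA'}
print(missing)  # ['seq3']
```

Reporting unmatched leaves explicitly makes naming mismatches between the Newick string and the alignment easy to spot.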

### NCBI Taxonomy Comparison

Compare phylogenetic trees with NCBI taxonomic relationships.

```python { .api }
def ncbi_compare(self, autodetect_duplications=True):
    """
    Compare tree topology with NCBI taxonomy.

    Parameters:
    - autodetect_duplications (bool): Automatically detect gene duplications

    Returns:
    dict: Comparison results including conflicts and agreements
    """
```
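One common way to express conflicts between a tree and a taxonomy is to list the taxa whose member leaves do not form a clade. The sketch below does this with plain data structures; the lineage dictionary, the nested-tuple tree, and the helpers `all_clades` and `broken_taxa` are hypothetical, and the sketch makes no claim about the exact structure `ncbi_compare` returns.

```python
from typing import Dict, FrozenSet, List, Set, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def all_clades(tree: Tree) -> List[FrozenSet[str]]:
    # Leaf set of every node in post-order (root last)
    if isinstance(tree, str):
        return [frozenset([tree])]
    clades: List[FrozenSet[str]] = []
    own: Set[str] = set()
    for child in tree:
        child_clades = all_clades(child)
        clades.extend(child_clades)
        own |= child_clades[-1]
    clades.append(frozenset(own))
    return clades


def broken_taxa(tree: Tree, lineages: Dict[str, List[str]]) -> Set[str]:
    # A taxon is "broken" when no single clade contains exactly its member leaves
    clades = set(all_clades(tree))
    members: Dict[str, Set[str]] = {}
    for leaf, lineage in lineages.items():
        for taxon in lineage:
            members.setdefault(taxon, set()).add(leaf)
    return {taxon for taxon, leaves in members.items() if frozenset(leaves) not in clades}


lineages = {
    "human": ["Hominidae", "Homo"],
    "chimp": ["Hominidae", "Pan"],
    "bonobo": ["Hominidae", "Pan"],
}
print(broken_taxa(("human", ("chimp", "bonobo")), lineages))  # set()
print(broken_taxa((("human", "chimp"), "bonobo"), lineages))  # {'Pan'}
```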

### Tree Reconciliation

Reconcile gene trees with species trees to infer evolutionary events.

```python { .api }
def reconcile(self, species_tree):
    """
    Reconcile the gene tree with a species tree.

    Parameters:
    - species_tree (PhyloTree): Reference species tree

    Returns:
    tuple: (reconciled_tree, events), where reconciled_tree is a PhyloTree
        with duplication/speciation events annotated and events is a list
        of the inferred evolutionary events
    """

# Property set on nodes by reconciliation
evoltype: str  # Event type: "S" (speciation), "D" (duplication)
```
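A classic way to decide between speciation and duplication is LCA mapping: each gene-tree node maps to the smallest species-tree clade containing all of its species, and a node is a duplication when it maps to the same species clade as one of its children. The sketch below applies that rule to nested tuples. It is a generic illustration, not ete3's `reconcile` code: it only labels nodes, without building the reconciled tree or inferring losses, and `all_clades` and `label_events` are hypothetical helpers.

```python
from typing import Callable, FrozenSet, List, Set, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def all_clades(tree: Tree) -> List[FrozenSet[str]]:
    # Leaf set of every node in post-order (root last)
    if isinstance(tree, str):
        return [frozenset([tree])]
    clades: List[FrozenSet[str]] = []
    own: Set[str] = set()
    for child in tree:
        child_clades = all_clades(child)
        clades.extend(child_clades)
        own |= child_clades[-1]
    clades.append(frozenset(own))
    return clades


def label_events(gene_tree: Tree, species_tree: Tree,
                 species_of: Callable[[str], str]) -> List[Tuple[FrozenSet[str], str]]:
    species_clades = all_clades(species_tree)

    def lca(species: FrozenSet[str]) -> FrozenSet[str]:
        # Smallest species-tree clade containing every given species
        # (raises ValueError if a species is absent from the species tree)
        return min((c for c in species_clades if species <= c), key=len)

    events: List[Tuple[FrozenSet[str], str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([species_of(node)])
        child_species = [walk(child) for child in node]
        species = frozenset().union(*child_species)
        mapped = lca(species)
        # Duplication when a child maps to the same species clade as its parent
        etype = "D" if any(lca(cs) == mapped for cs in child_species) else "S"
        events.append((species, etype))
        return species

    walk(gene_tree)
    return events


gene_tree = ("human_gene1", ("chimp_gene1", "chimp_gene2"))
species_tree = ("human", "chimp")
for species, etype in label_events(gene_tree, species_tree, lambda n: n.split("_")[0]):
    print(sorted(species), etype)
# ['chimp'] D
# ['chimp', 'human'] S
```

With this rule a node can be labelled a duplication purely because the gene tree disagrees with the species tree topology; full reconciliation additionally infers the gene losses implied by such nodes.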

### Phylogenetic Tree Statistics

Calculate various phylogenetic tree statistics and metrics.

```python { .api }
def get_cached_content(self, store_attr=None):
    """
    Cache the leaf content of every node for efficient repeated access.

    Parameters:
    - store_attr (str): If given, cache this leaf attribute instead of
      the leaf nodes themselves

    Returns:
    dict: Maps each node to the set of leaves under it (or to the set of
        their store_attr values)
    """

def robinson_foulds(self, t2, attr_t1="name", attr_t2="name",
                    unrooted_trees=False, expand_polytomies=False,
                    polytomy_size_limit=5, skip_large_polytomies=False):
    """
    Calculate the Robinson-Foulds distance between trees.

    Parameters:
    - t2 (Tree): Tree to compare against
    - attr_t1 (str): Attribute for leaf matching in self
    - attr_t2 (str): Attribute for leaf matching in t2
    - unrooted_trees (bool): Treat both trees as unrooted
    - expand_polytomies (bool): Resolve polytomies before comparison
    - polytomy_size_limit (int): Max size for polytomy expansion
    - skip_large_polytomies (bool): Skip polytomies larger than polytomy_size_limit

    Returns:
    tuple: (rf, max_rf, common_leaves, parts_t1, parts_t2,
        discard_t1, discard_t2)
    """
```
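Both methods rest on the same idea: record, for every node, the set of leaves below it. The sketch below computes that mapping for a nested-tuple tree and uses it for a rooted Robinson-Foulds distance, counting the non-trivial clusters (leaf sets of internal nodes other than the root) found in one tree but not the other. It is a simplified rooted variant without polytomy handling; ete3's `robinson_foulds` supports more options and may normalise differently. `cached_content`, `clusters` and `rooted_rf` are hypothetical names, and leaf names are assumed unique.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def cached_content(tree: Tree) -> Dict[Tree, FrozenSet[str]]:
    # Map every node (leaf name or tuple) to the set of leaf names below it
    content: Dict[Tree, FrozenSet[str]] = {}

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        content[node] = leaves
        return leaves

    walk(tree)
    return content


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    content = cached_content(tree)
    n_leaves = len(content[tree])
    # Non-trivial clusters: more than one leaf, but not every leaf
    return {c for c in content.values() if 1 < len(c) < n_leaves}


def rooted_rf(t1: Tree, t2: Tree) -> Tuple[int, int]:
    if cached_content(t1)[t1] != cached_content(t2)[t2]:
        raise ValueError("trees must have the same leaf set")
    c1, c2 = clusters(t1), clusters(t2)
    # (distance, maximum possible distance for these two cluster sets)
    return len(c1 ^ c2), len(c1) + len(c2)


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(rooted_rf(t1, t1))  # (0, 4)
print(rooted_rf(t1, t2))  # (4, 4)
```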

## Evolution-Specific Tree Classes

### EvolTree for Evolutionary Analysis

Specialized tree class for evolutionary model analysis and molecular evolution studies.

```python { .api }
class EvolTree(PhyloTree):
    """
    Tree specialized for evolutionary analysis and molecular evolution models.
    """

    def get_evol_model(self, model_name):
        """
        Get an evolutionary model associated with the tree.

        Parameters:
        - model_name (str): Name of evolutionary model

        Returns:
        EvolModel: Evolutionary model object
        """

    def link_to_evol_model(self, path, model):
        """
        Link the tree to precomputed evolutionary model results.

        Parameters:
        - path (str): Path to the file containing the model results
        - model (str or Model): Model name (e.g. "fb", "M2") or Model object
        """

    def run_model(self, model_name_or_fname):
        """
        Run evolutionary model analysis.

        Parameters:
        - model_name_or_fname (str): Model name or file path

        Returns:
        dict: Model analysis results
        """

EvolNode = EvolTree  # Alias: same class, same functionality
```

## Utility Functions

### Species Tree Analysis

```python { .api }
def get_subtrees(tree, full_copy=False, features=None, newick_only=False):
    """
    Calculate all possible species trees within a gene tree.

    Parameters:
    - tree (PhyloTree): Input gene tree
    - full_copy (bool): Create full copies of subtrees
    - features (list): Features to preserve in subtrees
    - newick_only (bool): Return only Newick strings

    Returns:
    tuple: (num_trees, num_duplications, tree_iterator)
    """

def is_dup(node):
    """
    Check whether a node represents a duplication event.

    Parameters:
    - node (TreeNode): Node to test

    Returns:
    bool: True if the node is a duplication
    """
```
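To illustrate what "all possible species trees within a gene tree" means: at each duplication node only one of the paralogous copies is kept, while speciation nodes keep all their children. The sketch below enumerates the resulting trees on nested tuples, using the species-overlap rule (a node is a duplication when its children share a species) as a stand-in for `is_dup`, which as far as we know reads the node's `evoltype` annotation instead. The helper names are hypothetical and the output is not guaranteed to match `get_subtrees` in order or form.

```python
from itertools import product
from typing import Callable, Iterator, Set, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def species_under(node: Tree, species_of: Callable[[str], str]) -> Set[str]:
    if isinstance(node, str):
        return {species_of(node)}
    return set().union(*(species_under(child, species_of) for child in node))


def overlap_is_dup(node: Tree, species_of: Callable[[str], str]) -> bool:
    # Species-overlap rule: children that share any species imply a duplication
    if isinstance(node, str):
        return False
    seen: Set[str] = set()
    for child in node:
        child_species = species_under(child, species_of)
        if seen & child_species:
            return True
        seen |= child_species
    return False


def speciation_trees(node: Tree, species_of: Callable[[str], str]) -> Iterator[Tree]:
    if isinstance(node, str):
        yield node
    elif overlap_is_dup(node, species_of):
        # Duplication: each paralogous child is an alternative on its own
        for child in node:
            yield from speciation_trees(child, species_of)
    else:
        # Speciation: combine one alternative from every child
        options = [list(speciation_trees(child, species_of)) for child in node]
        yield from product(*options)


def sp(name: str) -> str:
    return name.split("_")[0]


gene_tree = ("human_g1", ("chimp_g1", "chimp_g2"))
for t in speciation_trees(gene_tree, sp):
    print(t)
# ('human_g1', 'chimp_g1')
# ('human_g1', 'chimp_g2')
```

Counting instead of enumerating follows the same recursion: in this sketch a duplication contributes the sum of its children's counts and a speciation the product.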

## Usage Examples

### Basic Phylogenetic Operations

```python
from ete3 import PhyloTree

# Create phylogenetic tree with species naming
tree = PhyloTree("(human_gene1:0.1,(chimp_gene1:0.05,bonobo_gene1:0.05):0.02);")
tree.set_species_naming_function(lambda x: x.split('_')[0])

# Check species representation
species = tree.get_species()
print(f"Species in tree: {species}")

# Test monophyly: human and chimp are not monophyletic in this tree,
# because their smallest common clade also contains bonobo
is_mono, clade_type, fail_leaves = tree.check_monophyly(['human', 'chimp'], 'species')
print(f"Human-Chimp monophyly: {is_mono} ({clade_type})")
```

### Sequence Integration

```python
from ete3 import PhyloTree

# Create tree and link to alignment
# (record names in alignment.fasta must match the leaf names seq1, seq2, seq3)
tree = PhyloTree("(seq1:0.1,seq2:0.2,seq3:0.15);")
tree.link_to_alignment("alignment.fasta")

# Access sequence data
for leaf in tree.get_leaves():
    print(f"{leaf.name}: {leaf.sequence}")
```

### Tree Reconciliation

```python
from ete3 import PhyloTree

# Gene tree and species tree
gene_tree = PhyloTree("(human_gene1:0.1,(chimp_gene1:0.05,chimp_gene2:0.05):0.02);")
species_tree = PhyloTree("(human:0.1,chimp:0.1);")

# Set species naming: gene leaves use a "species_gene" pattern, while species-tree
# leaves are already species names (the default would keep only three characters)
gene_tree.set_species_naming_function(lambda x: x.split('_')[0])
species_tree.set_species_naming_function(lambda x: x)

# Reconcile trees: returns the reconciled tree plus a list of inferred events
reconciled, events = gene_tree.reconcile(species_tree)

# Check event types
for node in reconciled.traverse():
    if hasattr(node, 'evoltype'):
        print(f"Node {node.name}: {node.evoltype}")
```

### NCBI Taxonomy Integration

```python
from ete3 import PhyloTree, NCBITaxa

ncbi = NCBITaxa()

# Create tree from NCBI taxonomy
tree = ncbi.get_topology([9606, 9598, 9597])  # Human, chimp, bonobo

# Compare with gene tree
gene_tree = PhyloTree("(human:0.1,(chimp:0.05,bonobo:0.05):0.02);")
comparison = gene_tree.ncbi_compare()
print(f"Topology conflicts: {comparison}")
```