# External Format Support

ETE3 reads and writes several phylogenetic data formats, including PhyloXML, NeXML, and common sequence formats, so trees and alignments can be exchanged with other bioinformatics tools.

## Capabilities

### PhyloXML Format Support

Read and write support for PhyloXML, an XML standard for exchanging phylogenetic trees and their annotations.

```python { .api }
class Phyloxml:
    """
    PhyloXML format parser and writer for phylogenetic data exchange.
    """

    def __init__(self):
        """Initialize PhyloXML handler."""

    def build_from_file(self, fname):
        """
        Parse PhyloXML file and build tree structure.

        Parameters:
        - fname (str): Path to PhyloXML file

        Returns:
        PhyloxmlTree: Parsed phylogenetic tree with PhyloXML annotations
        """

    def export(self, outfile=None):
        """
        Export tree to PhyloXML format.

        Parameters:
        - outfile (str): Output file path, if None returns string

        Returns:
        str: PhyloXML formatted string (if outfile is None)
        """

class PhyloxmlTree(PhyloTree):
    """
    Phylogenetic tree with PhyloXML-specific features and annotations.
    """

    def __init__(self, phyloxml_file=None):
        """
        Initialize PhyloXML tree.

        Parameters:
        - phyloxml_file (str): Path to PhyloXML file to load
        """

    # PhyloXML-specific properties
    phyloxml: dict  # PhyloXML annotations and metadata
    confidence: float  # Confidence value for branches
    taxonomy: dict  # Taxonomic information
    sequence: dict  # Sequence data and annotations
    events: dict  # Evolutionary events (duplication, speciation)
    properties: dict  # Custom properties from PhyloXML
```
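
To make the annotations listed above more concrete, the following sketch reads a small PhyloXML document with Python's standard `xml.etree.ElementTree` module rather than with ETE3. The `summarize_clades` helper and the inline `EXAMPLE_XML` document are hypothetical names introduced only for illustration. The sketch shows where clade names, branch lengths, and confidence values sit in the XML, not how ETE3 parses them internally.

```python
import xml.etree.ElementTree as ET

# PhyloXML elements live in this XML namespace.
PHYLOXML_URI = "http://www.phyloxml.org"
PHYLOXML_NS = {"phy": PHYLOXML_URI}

# Small hand-written PhyloXML document, used only for illustration.
EXAMPLE_XML = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <clade>
        <name>A</name>
        <branch_length>1.0</branch_length>
      </clade>
      <clade>
        <branch_length>0.5</branch_length>
        <confidence type="bootstrap">89</confidence>
        <clade><name>B</name><branch_length>1.0</branch_length></clade>
        <clade><name>C</name><branch_length>1.0</branch_length></clade>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>
"""


def summarize_clades(xml_text):
    """Return one dict per <clade>, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    summary = []
    for clade in root.iter(f"{{{PHYLOXML_URI}}}clade"):
        # findtext only looks at direct children, so nested clades do not leak in.
        name = clade.findtext("phy:name", namespaces=PHYLOXML_NS)
        length = clade.findtext("phy:branch_length", namespaces=PHYLOXML_NS)
        conf = clade.findtext("phy:confidence", namespaces=PHYLOXML_NS)
        summary.append({
            "name": name,
            "branch_length": float(length) if length is not None else None,
            "confidence": float(conf) if conf is not None else None,
        })
    return summary


if __name__ == "__main__":
    for entry in summarize_clades(EXAMPLE_XML):
        print(entry)
```

Unnamed internal clades come back with `name` set to `None`, which mirrors the fact that PhyloXML does not require internal nodes to be labeled.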
62
63
### NeXML Format Support
64
65
Support for NeXML format, the NeXus XML standard for phylogenetic data.
66
67
```python { .api }
class Nexml:
    """
    NeXML format parser and writer for phylogenetic data exchange.
    """

    def __init__(self):
        """Initialize NeXML handler."""

    def build_from_file(self, fname):
        """
        Parse NeXML file and build tree structure.

        Parameters:
        - fname (str): Path to NeXML file

        Returns:
        NexmlTree: Parsed phylogenetic tree with NeXML annotations
        """

    def export(self, outfile=None):
        """
        Export tree to NeXML format.

        Parameters:
        - outfile (str): Output file path, if None returns string

        Returns:
        str: NeXML formatted string (if outfile is None)
        """

class NexmlTree(PhyloTree):
    """
    Phylogenetic tree with NeXML-specific features and annotations.
    """

    def __init__(self, nexml_file=None):
        """
        Initialize NeXML tree.

        Parameters:
        - nexml_file (str): Path to NeXML file to load
        """

    # NeXML-specific properties
    nexml: dict  # NeXML annotations and metadata
    otus: dict  # Operational Taxonomic Units information
    characters: dict  # Character data and matrices
    meta: dict  # Metadata annotations
```
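
As a rough illustration of how the OTU and tree information described above is laid out in a NeXML file, the sketch below extracts OTU labels and edges with the standard library only. `read_nexml_edges` and `EXAMPLE_NEXML` are hypothetical names, and the document is a minimal hand-written example rather than output from ETE3.

```python
import xml.etree.ElementTree as ET

# NeXML elements live in this XML namespace.
NEXML_NS = {"nex": "http://www.nexml.org/2009"}

# Minimal hand-written NeXML document, used only for illustration.
EXAMPLE_NEXML = """<nex:nexml version="0.9"
    xmlns:nex="http://www.nexml.org/2009"
    xmlns="http://www.nexml.org/2009"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <otus id="taxa1">
    <otu id="t1" label="A"/>
    <otu id="t2" label="B"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1" xsi:type="nex:FloatTree">
      <node id="n0" root="true"/>
      <node id="n1" otu="t1"/>
      <node id="n2" otu="t2"/>
      <edge id="e1" source="n0" target="n1" length="1.0"/>
      <edge id="e2" source="n0" target="n2" length="2.5"/>
    </tree>
  </trees>
</nex:nexml>
"""


def read_nexml_edges(xml_text):
    """Return (otu_labels, edges) from a NeXML string (hypothetical helper).

    edges is a list of (source_node_id, target_label_or_id, length) tuples.
    """
    root = ET.fromstring(xml_text)
    otu_labels = {
        otu.get("id"): otu.get("label")
        for otu in root.iterfind("nex:otus/nex:otu", NEXML_NS)
    }
    edges = []
    for tree in root.iterfind("nex:trees/nex:tree", NEXML_NS):
        # Map each tree node to the label of the OTU it references, if any.
        node_labels = {
            node.get("id"): otu_labels.get(node.get("otu"))
            for node in tree.iterfind("nex:node", NEXML_NS)
        }
        for edge in tree.iterfind("nex:edge", NEXML_NS):
            target = edge.get("target")
            edges.append((
                edge.get("source"),
                node_labels.get(target) or target,
                float(edge.get("length")),
            ))
    return otu_labels, edges


if __name__ == "__main__":
    labels, edges = read_nexml_edges(EXAMPLE_NEXML)
    print(labels)
    print(edges)
```

Unlike PhyloXML, which nests clades, NeXML stores a tree as flat lists of nodes and edges, so the topology has to be rebuilt from the `source` and `target` attributes.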

### Newick Format Variations

Readers and writers for the Newick subformats and extensions that ETE3 recognizes.

```python { .api }
def read_newick(newick_string, root_node=None, format=0, quoted_node_names=False):
    """
    Parse Newick format string with extensive format support.

    Parameters:
    - newick_string (str): Newick formatted tree string
    - root_node (TreeNode): Existing node to use as root
    - format (int): Newick subformat (0-9)
        0: flexible with support values
        1: flexible with internal node names
        2: all branches + leaf names + internal supports
        3: all branches + all names
        4: leaf branches + leaf names
        5: internal and leaf branches + leaf names
        6: internal branches + leaf names
        7: leaf branches + all names
        8: all names
        9: leaf names
    - quoted_node_names (bool): Handle quoted node names with special characters

    Returns:
    TreeNode: Parsed tree structure
    """

def write_newick(tree, features=None, format=0, format_root_node=True,
                 is_leaf_fn=None, quoted_node_names=False):
    """
    Export tree to Newick format with customizable options.

    Parameters:
    - tree (TreeNode): Tree to export
    - features (list): Node features to include in output
    - format (int): Newick output format (0-9)
    - format_root_node (bool): Include root node in output
    - is_leaf_fn (function): Custom function to determine leaf nodes
    - quoted_node_names (bool): Quote node names with special characters

    Returns:
    str: Newick formatted string
    """
```
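
The practical difference between subformats 0 and 1 in the table above is how the label on an internal node is read: as a support value or as a node name. The sketch below is a deliberately small recursive-descent parser written for illustration only. `parse_newick` and its `internal_labels_are_support` flag are hypothetical and do not reproduce ETE3's parser, which also handles quoting, comments, and extended annotations.

```python
def parse_newick(text, internal_labels_are_support=True):
    """Parse a Newick string into nested dicts (illustrative sketch only)."""
    text = text.strip()
    pos = 0

    def peek():
        # Current character, or "" once the input is exhausted.
        return text[pos] if pos < len(text) else ""

    def read_until(stop_chars):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stop_chars:
            pos += 1
        return text[start:pos].strip()

    def parse_node():
        nonlocal pos
        node = {"name": "", "dist": None, "support": None, "children": []}
        if peek() == "(":
            pos += 1  # consume "("
            while True:
                node["children"].append(parse_node())
                if peek() == ",":
                    pos += 1
                elif peek() == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"malformed Newick near position {pos}")
        label = read_until(",():;")
        if peek() == ":":
            pos += 1  # consume ":" and read the branch length
            node["dist"] = float(read_until(",();"))
        if node["children"] and label and internal_labels_are_support:
            node["support"] = float(label)  # subformat-0 style reading
        else:
            node["name"] = label            # subformat-1 style reading
        return node

    root = parse_node()
    if text[pos:].strip() != ";":
        raise ValueError("Newick string must end with ';'")
    return root


if __name__ == "__main__":
    newick = "(A:1,(B:1,C:1)90:0.5);"
    as_support = parse_newick(newick)["children"][1]
    as_name = parse_newick(newick, internal_labels_are_support=False)["children"][1]
    print(as_support["support"], repr(as_support["name"]))
    print(as_name["support"], repr(as_name["name"]))
```

The same string therefore yields an internal node with support 90 under one reading and a node named "90" under the other, which is why the subformat has to be chosen explicitly when internal labels are present.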

### Sequence Format Integration

Readers for the sequence formats (FASTA, PHYLIP, Nexus) commonly used alongside phylogenetic trees.

```python { .api }
# FASTA format with phylogenetic extensions
def read_fasta_with_tree(fasta_file, tree_file=None):
    """
    Read FASTA sequences and optionally associate with tree.

    Parameters:
    - fasta_file (str): Path to FASTA file
    - tree_file (str): Optional tree file for sequence-tree association

    Returns:
    tuple: (SeqGroup, PhyloTree) if tree_file provided, else SeqGroup
    """

# PHYLIP format variations
def read_phylip(source, interleaved=False, relaxed=False, tree_names=True):
    """
    Read PHYLIP format with tree compatibility options.

    Parameters:
    - source (str): PHYLIP file path or string
    - interleaved (bool): Interleaved PHYLIP format
    - relaxed (bool): Relaxed naming (>10 characters)
    - tree_names (bool): Ensure names compatible with tree formats

    Returns:
    SeqGroup: Parsed sequences
    """

# Nexus format support
def read_nexus(nexus_file):
    """
    Read Nexus format files containing trees and/or data.

    Parameters:
    - nexus_file (str): Path to Nexus file

    Returns:
    dict: Dictionary containing trees, data, and metadata
    """
```
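
The `relaxed` option above matters because strict PHYLIP reserves exactly the first 10 characters of each line for the name, while relaxed PHYLIP separates a name of any length from the sequence with whitespace. The following standard-library sketch contrasts the two conventions for sequential (non-interleaved) input. `parse_phylip` and the two example strings are hypothetical and are not ETE3's implementation.

```python
# Strict PHYLIP: the name occupies exactly the first 10 columns.
STRICT_EXAMPLE = "2 8\nHomo_sapieACGTACGT\nPan_troglo ACGT ACGT\n"

# Relaxed PHYLIP: the name ends at the first run of whitespace.
RELAXED_EXAMPLE = "2 8\nHomo_sapiens_longname ACGTACGT\nPan_troglodytes  ACGTACGT\n"


def parse_phylip(text, relaxed=False):
    """Parse sequential PHYLIP text into {name: sequence} (illustrative sketch)."""
    lines = [line for line in text.splitlines() if line.strip()]
    ntax, nchar = (int(field) for field in lines[0].split())
    records = {}
    for line in lines[1:1 + ntax]:
        if relaxed:
            name, seq = line.split(None, 1)
        else:
            name, seq = line[:10].strip(), line[10:]
        seq = "".join(seq.split())  # drop spaces used to group residues
        if len(seq) != nchar:
            raise ValueError(
                f"{name}: expected {nchar} characters, found {len(seq)}"
            )
        records[name] = seq
    if len(records) != ntax:
        raise ValueError(f"expected {ntax} sequences, found {len(records)}")
    return records


if __name__ == "__main__":
    print(parse_phylip(STRICT_EXAMPLE))
    print(parse_phylip(RELAXED_EXAMPLE, relaxed=True))
```

Strict names are truncated to 10 characters, so they may no longer match the leaf names in a tree built from the untruncated identifiers. Name compatibility is the concern the `tree_names` parameter above refers to.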

### Format Detection and Auto-parsing

Helpers that detect a file's format and dispatch to the matching parser, for workflows that mix formats.

```python { .api }
def detect_format(filename):
    """
    Automatically detect file format based on content and extension.

    Parameters:
    - filename (str): Path to file

    Returns:
    str: Detected format ("newick", "phyloxml", "nexml", "nexus", "fasta", "phylip")
    """

def auto_parse(filename, **kwargs):
    """
    Automatically parse file using detected format.

    Parameters:
    - filename (str): File to parse
    - kwargs: Format-specific parsing options

    Returns:
    Tree or SeqGroup: Parsed data structure
    """
```
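
Content-based detection of the kind described above usually comes down to inspecting the first few characters of the input. The sketch below shows one plausible set of heuristics using only the standard library. `sniff_format` is a hypothetical helper that works on text rather than a filename, and its rules are assumptions for illustration, not the rules `detect_format` applies.

```python
def sniff_format(text):
    """Guess a phylogenetic file format from its content (illustrative sketch)."""
    head = text.lstrip()
    if not head:
        return "unknown"
    lowered = head[:2000].lower()

    # XML-based formats: tell them apart by their root element name.
    if head.startswith("<"):
        if "<phyloxml" in lowered:
            return "phyloxml"
        if "nexml" in lowered:
            return "nexml"
        return "unknown"

    # NEXUS files start with the "#NEXUS" token.
    if lowered.startswith("#nexus"):
        return "nexus"

    # FASTA records start with a ">" header line.
    if head.startswith(">"):
        return "fasta"

    # PHYLIP starts with two integers: number of taxa and alignment length.
    first_fields = head.splitlines()[0].split()
    if len(first_fields) == 2 and all(field.isdigit() for field in first_fields):
        return "phylip"

    # Newick trees with at least one clade open with "(" and end with ";".
    if head.startswith("(") and head.rstrip().endswith(";"):
        return "newick"

    return "unknown"


if __name__ == "__main__":
    samples = [
        "(A:1,(B:1,C:1):0.5);",
        ">seq1\nACGT\n",
        "2 4\nseq1      ACGT\nseq2      ACGT\n",
        "#NEXUS\nBEGIN TREES;\nEND;\n",
        '<phyloxml xmlns="http://www.phyloxml.org"></phyloxml>',
    ]
    for sample in samples:
        print(sniff_format(sample))
```

Heuristics like these can misclassify edge cases, for example a single-leaf Newick string such as `A;`, so a real workflow should still let the caller override the detected format.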

## Web Integration

### WebTreeApplication

Browser-based tree visualization and sharing through a local web server.

```python { .api }
class WebTreeApplication:
    """
    Web-based tree visualization application for interactive tree exploration.
    """

    def __init__(self, tree, name=None, host="localhost", port=8080):
        """
        Initialize web tree application.

        Parameters:
        - tree (Tree): Tree to visualize
        - name (str): Application name
        - host (str): Server host address
        - port (int): Server port number
        """

    def launch(self, open_browser=True):
        """
        Launch web server for tree visualization.

        Parameters:
        - open_browser (bool): Automatically open browser

        Returns:
        str: URL of launched application
        """

    def add_tree(self, tree, name=None):
        """
        Add additional tree to web application.

        Parameters:
        - tree (Tree): Tree to add
        - name (str): Tree identifier
        """

    def set_tree_style(self, tree_style):
        """
        Set default tree style for web display.

        Parameters:
        - tree_style (TreeStyle): Style configuration
        """
```

## Database Integration

### PhylomeDB Integration

Access to the PhylomeDB phylogenomic database.

```python { .api }
class PhylomeDB3Connector:
    """
    Interface to PhylomeDB3 phylogenomic database.
    """

    def __init__(self, host="phylomedb.org"):
        """
        Initialize PhylomeDB connector.

        Parameters:
        - host (str): PhylomeDB server hostname
        """

    def get_best_tree(self, seed_taxid, target_taxid=None):
        """
        Retrieve best phylogenetic tree for given taxonomic IDs.

        Parameters:
        - seed_taxid (int): Seed species taxonomic ID
        - target_taxid (int): Target species taxonomic ID (optional)

        Returns:
        PhyloTree: Best available phylogenetic tree
        """

    def search_trees(self, seed_taxid, target_species=None):
        """
        Search for trees containing specified taxa.

        Parameters:
        - seed_taxid (int): Seed taxonomic ID
        - target_species (list): Target species list

        Returns:
        list: Available trees matching criteria
        """

    def get_tree_ages(self, phylome_id):
        """
        Get age estimates for trees in phylome.

        Parameters:
        - phylome_id (int): PhylomeDB phylome identifier

        Returns:
        dict: Age estimates for phylome trees
        """
```

## Usage Examples

### PhyloXML Operations

```python
from ete3 import Phyloxml, PhyloxmlTree

# Parse PhyloXML file
phyloxml_parser = Phyloxml()
tree = phyloxml_parser.build_from_file("example.phyloxml")

# Access PhyloXML-specific data
for node in tree.traverse():
    if hasattr(node, 'confidence'):
        print(f"Node confidence: {node.confidence}")
    if hasattr(node, 'taxonomy'):
        print(f"Taxonomy: {node.taxonomy}")

# Export with annotations (a string is returned only when outfile is None,
# so nothing is captured when writing to a file)
phyloxml_parser.export("output.phyloxml")
```

### NeXML Operations

```python
from ete3 import Nexml, NexmlTree

# Parse NeXML file
nexml_parser = Nexml()
tree = nexml_parser.build_from_file("data.nexml")

# Access NeXML metadata
if hasattr(tree, 'meta'):
    print(f"Metadata: {tree.meta}")

# Work with character data
if hasattr(tree, 'characters'):
    print(f"Character matrix: {tree.characters}")

# Export to NeXML
nexml_parser.export("output.nexml")
```

### Format Conversion

```python
from ete3 import Tree, Phyloxml, Nexml

# Load tree in Newick format
tree = Tree("(A:1,(B:1,C:1):0.5);")

# Convert to PhyloXML
phyloxml = Phyloxml()
tree.phyloxml = {"description": "Example tree"}
phyloxml_output = phyloxml.export()

# Convert to NeXML
nexml = Nexml()
tree.nexml = {"title": "Example phylogeny"}
nexml_output = nexml.export()

# Save in different formats
with open("tree.phyloxml", "w") as f:
    f.write(phyloxml_output)

with open("tree.nexml", "w") as f:
    f.write(nexml_output)
```

### Auto-parsing and Format Detection

```python
from ete3 import Tree, SeqGroup, detect_format, auto_parse

# Detect format automatically
file_format = detect_format("unknown_format_file.txt")
print(f"Detected format: {file_format}")

# Parse automatically based on format
data = auto_parse("phylogenetic_data.xml")

if isinstance(data, Tree):
    # len() on an ETE3 tree counts its leaves, not all nodes
    print(f"Parsed tree with {len(data)} leaves")
elif isinstance(data, SeqGroup):
    print(f"Parsed {len(data)} sequences")
```

### Web Application

```python
from ete3 import Tree, WebTreeApplication, TreeStyle

# Create tree and style
tree = Tree("(human:1,(chimp:0.5,bonobo:0.5):0.5);")
ts = TreeStyle()
ts.show_leaf_name = True
ts.show_branch_length = True

# Launch web application
webapp = WebTreeApplication(tree, name="Primate Tree")
webapp.set_tree_style(ts)
url = webapp.launch()

print(f"Tree visualization available at: {url}")
```

### PhylomeDB Integration

```python
from ete3 import PhylomeDB3Connector

# Connect to PhylomeDB
phylomedb = PhylomeDB3Connector()

# Search for trees
human_trees = phylomedb.search_trees(seed_taxid=9606)  # Human
print(f"Found {len(human_trees)} trees with human sequences")

# Get best tree
best_tree = phylomedb.get_best_tree(seed_taxid=9606, target_taxid=9598)  # Human-chimp
print(f"Best tree: {best_tree.get_ascii()}")

# Get age estimates
ages = phylomedb.get_tree_ages(phylome_id=1)
print(f"Age estimates: {ages}")
```

### Comprehensive Format Workflow

```python
from ete3 import (
    Tree, SeqGroup, PhyloTree, Phyloxml, NCBITaxa,
    detect_format, auto_parse,
)

# Multi-format phylogenetic workflow
def process_phylogenetic_data(tree_file, sequence_file=None):
    # Auto-detect and parse tree
    tree_format = detect_format(tree_file)
    tree = auto_parse(tree_file)

    # Load sequences if provided
    if sequence_file:
        seq_format = detect_format(sequence_file)
        sequences = auto_parse(sequence_file)

        # Link sequences to tree
        if isinstance(tree, PhyloTree):
            tree.link_to_alignment(sequences)

    # Add taxonomic information; annotate_tree modifies the nodes in place,
    # so its return value is not assigned back to `tree`
    ncbi = NCBITaxa()
    ncbi.annotate_tree(tree)

    # Export in multiple formats
    results = {
        'newick': tree.write(format=1),
        'phyloxml': None,
        'ascii': tree.get_ascii()
    }

    # Export to PhyloXML with annotations
    if hasattr(tree, 'taxonomy'):
        phyloxml = Phyloxml()
        results['phyloxml'] = phyloxml.export()

    return tree, results

# Process data
tree, outputs = process_phylogenetic_data("input.nw", "sequences.fasta")

# Save all formats
for format_name, content in outputs.items():
    if content:
        with open(f"output.{format_name}", "w") as f:
            f.write(content)
```