# Data Input/Output

Comprehensive I/O framework covering the major phylogenetic file formats, with configurable reading and writing options. DendroPy handles NEXUS, Newick, NeXML, FASTA, and PHYLIP. The format is selected explicitly through the `schema` argument, and each format accepts its own customization keywords.

## Capabilities

### Universal I/O Methods

All DendroPy data classes (Tree, TreeList, CharacterMatrix, DataSet) support unified I/O methods for reading and writing data.

```python { .api }
# Factory method for reading from external sources
@classmethod
def get(cls, **kwargs):
    """
    Factory method to create object by reading from external source.

    Parameters:
    - file: File object or file-like object
    - path: File path string
    - url: URL string
    - data: Raw data string
    - schema: Format specification ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - preserve_underscores: Keep underscores in taxon names (default: False)
    - suppress_internal_node_taxa: Keep internal node labels as plain labels instead of creating taxa from them (default: True)
    - rooting: How to handle rooting ('force-rooted', 'force-unrooted', 'default-rooted', 'default-unrooted')
    - taxon_namespace: TaxonNamespace to use for taxa
    - collection_offset: Skip first N items when reading multiple items
    - tree_offset: Skip first N trees (for tree sources)
    - ignore_unrecognized_keyword_arguments: Suppress warnings for unknown kwargs

    Returns:
    New object of appropriate type with data loaded
    """

def read(self, **kwargs):
    """
    Read data from external source into existing object.

    Same parameters as get() method, but loads into existing object
    rather than creating new one.
    """

def write(self, **kwargs):
    """
    Write object data to external destination.

    Parameters:
    - file: File object or file-like object for output
    - path: File path string for output
    - schema: Output format ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - suppress_leaf_taxon_labels: Don't write leaf taxon names (default: False)
    - suppress_internal_taxon_labels: Don't write internal taxon names (default: False)
    - suppress_rooting: Don't write rooting information (default: False)
    - suppress_edge_lengths: Don't write branch lengths (default: False)
    - unquoted_underscores: Don't quote underscores in names (default: False)
    - preserve_spaces: Keep spaces in taxon names (default: False)
    - store_tree_weights: Include tree weights in output (default: False)
    - suppress_annotations: Don't write annotations (default: True)
    - annotations_as_nhx: Write annotations in NHX format (default: False)
    - suppress_item_comments: Don't write item comments
    - ignore_unrecognized_keyword_arguments: Suppress warnings for unknown kwargs
    """

def write_to_stream(self, dest, schema, **kwargs):
    """Write data to stream in specified format."""
```
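
The `preserve_underscores` option reflects a Newick convention: in an unquoted label an underscore stands for a space, while a label wrapped in single quotes is taken literally. The sketch below models that convention with a hypothetical helper, `normalize_label`. It is not DendroPy's parser, only a minimal illustration of what the flag toggles.

```python
def normalize_label(raw_label, preserve_underscores=False):
    """Return the taxon name implied by a raw Newick label (illustrative only)."""
    if len(raw_label) >= 2 and raw_label[0] == "'" and raw_label[-1] == "'":
        # Quoted label: drop the quotes, unescape doubled quotes, keep underscores.
        return raw_label[1:-1].replace("''", "'")
    if preserve_underscores:
        # Caller asked for the label exactly as written.
        return raw_label
    # Unquoted label: underscores are read as spaces.
    return raw_label.replace("_", " ")


print(normalize_label("Homo_sapiens"))
print(normalize_label("Homo_sapiens", preserve_underscores=True))
print(normalize_label("'Homo_sapiens'"))
```

Under this convention a file whose labels must keep their underscores can either quote them or be read with `preserve_underscores=True`.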

### Format-Specific Reading

DendroPy supports reading from multiple phylogenetic file formats with format-specific options.

```python { .api }
from dendropy import (DataSet, DnaCharacterMatrix, ProteinCharacterMatrix,
                      Tree, TreeList)

# Newick format options (for trees)
Tree.get(path="tree.nwk", schema="newick",
         rooting="default-unrooted",
         preserve_underscores=True)

# NEXUS format options (for trees and character data)
TreeList.get(path="trees.nex", schema="nexus",
             preserve_underscores=False,
             suppress_internal_node_taxa=True)

# NeXML format options
DataSet.get(path="data.xml", schema="nexml")

# FASTA format options (for character matrices)
DnaCharacterMatrix.get(path="seqs.fasta", schema="fasta",
                       data_type="dna")

# PHYLIP format options
ProteinCharacterMatrix.get(path="alignment.phy", schema="phylip",
                           multispace_delimiter=True,
                           interleaved=False)
```
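
The four `rooting` values differ in whether a `[&R]` or `[&U]` token in the tree statement can override them: the `force-*` values ignore the token, and the `default-*` values apply only when no token is present. The following sketch spells that out with a hypothetical function, `resolve_rooting`, written for illustration rather than taken from DendroPy.

```python
import re

# Rooting tokens as they appear in Newick/NEXUS tree statements.
ROOTING_TOKEN = re.compile(r"\[&([RU])\]", re.IGNORECASE)


def resolve_rooting(tree_statement, rooting):
    """Return True if the tree should be treated as rooted (illustrative only)."""
    match = ROOTING_TOKEN.search(tree_statement)
    token = match.group(1).upper() if match else None
    if rooting == "force-rooted":
        return True
    if rooting == "force-unrooted":
        return False
    if rooting == "default-rooted":
        # Rooted unless the statement explicitly says unrooted.
        return token != "U"
    if rooting == "default-unrooted":
        # Unrooted unless the statement explicitly says rooted.
        return token == "R"
    raise ValueError(f"unrecognized rooting value: {rooting!r}")


print(resolve_rooting("[&R] ((A,B),C);", "default-unrooted"))
print(resolve_rooting("((A,B),C);", "default-unrooted"))
```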

### Format-Specific Writing

Write data in various phylogenetic formats with extensive customization options.

```python { .api }
# tree, trees, dataset, and char_matrix are objects loaded or built earlier

# Newick output with options
tree.write(path="output.nwk", schema="newick",
           suppress_edge_lengths=False,
           suppress_leaf_taxon_labels=False,
           unquoted_underscores=True)

# NEXUS output with metadata
trees.write(path="output.nex", schema="nexus",
            suppress_rooting=False,
            store_tree_weights=True,
            suppress_annotations=False)

# NeXML structured output
dataset.write(path="output.xml", schema="nexml")

# FASTA sequence output
char_matrix.write(path="output.fasta", schema="fasta",
                  wrap_width=70)

# PHYLIP alignment output
char_matrix.write(path="output.phy", schema="phylip",
                  force_unique_taxon_labels=True,
                  spaces_to_underscores=True)
```
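
To make the effect of a wrap width concrete, here is a small stand-alone sketch of FASTA line wrapping. The function `format_fasta` is hypothetical and only mirrors the idea behind the `wrap_width=70` argument used above. It says nothing about how DendroPy's own writer is implemented.

```python
def format_fasta(records, wrap_width=70):
    """Render (label, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for label, sequence in records:
        lines.append(f">{label}")
        # Emit the sequence in consecutive chunks of at most wrap_width characters.
        for start in range(0, len(sequence), wrap_width):
            lines.append(sequence[start:start + wrap_width])
    return "\n".join(lines) + "\n"


# A 160-character sequence is split into lines of 70, 70, and 20 characters.
text = format_fasta([("taxon_A", "ACGT" * 40)], wrap_width=70)
print(text)
```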

### I/O Factory Functions

Factory functions for creating format-specific readers, writers, and tree yielders.

```python { .api }
def get_reader(schema, **kwargs):
    """
    Get reader instance for specified format.

    Parameters:
    - schema: Format name ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - **kwargs: Format-specific options

    Returns:
    Reader object for specified format
    """

def get_writer(schema, **kwargs):
    """
    Get writer instance for specified format.

    Parameters:
    - schema: Format name ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - **kwargs: Format-specific options

    Returns:
    Writer object for specified format
    """

def get_tree_yielder(files, schema, **kwargs):
    """
    Get iterator for reading trees from multiple files.

    Parameters:
    - files: List of file paths or file objects
    - schema: Format specification
    - **kwargs: Format-specific options

    Returns:
    Iterator yielding Tree objects
    """
```
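
A factory keyed on the schema name is commonly built as a registry that maps each name to a class. The sketch below shows that pattern in isolation. The classes `NewickStatementReader` and `FastaLabelReader`, the `READERS` table, and `make_reader` are all hypothetical stand-ins and are not part of DendroPy.

```python
import io


class NewickStatementReader:
    """Hypothetical reader: splits text into ';'-terminated statements."""

    def read(self, stream):
        return [s.strip() + ";" for s in stream.read().split(";") if s.strip()]


class FastaLabelReader:
    """Hypothetical reader: collects the labels of FASTA records."""

    def read(self, stream):
        return [line[1:].strip() for line in stream if line.startswith(">")]


# Registry mapping schema names to reader classes.
READERS = {"newick": NewickStatementReader, "fasta": FastaLabelReader}


def make_reader(schema, **kwargs):
    """Instantiate the reader registered for `schema`, forwarding any options."""
    try:
        reader_class = READERS[schema]
    except KeyError:
        raise ValueError(f"unsupported schema: {schema!r}") from None
    return reader_class(**kwargs)


print(make_reader("newick").read(io.StringIO("(A,B);(C,D);")))
print(make_reader("fasta").read(io.StringIO(">s1\nACGT\n>s2\nACGA\n")))
```

Keeping the lookup in one table means a new format only needs a new entry, and an unknown name fails in a single, predictable place.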

### Streaming Tree I/O

For large tree collections, DendroPy provides memory-efficient streaming iterators.

```python { .api }
from dendropy import Tree

# path, analyze_tree, and process_large_tree are placeholders supplied by the caller

# Stream trees from single file
for tree in Tree.yield_from_files([path], schema="nexus"):
    # Process one tree at a time without loading all into memory
    print(f"Tree has {len(tree.leaf_nodes())} leaves")

# Stream trees from multiple files
tree_files = ["trees1.nex", "trees2.nex", "trees3.nex"]
for tree in Tree.yield_from_files(tree_files, schema="nexus"):
    # Process trees from all files sequentially
    analyze_tree(tree)

# Tree yielder with filtering
def large_tree_filter(tree):
    return len(tree.leaf_nodes()) > 100

for tree in Tree.yield_from_files([path], schema="newick",
                                  tree_filter=large_tree_filter):
    # Only process trees with >100 leaves
    process_large_tree(tree)
```
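
The memory benefit of streaming comes from yielding one item at a time instead of building a list. As a rough illustration of that idea, the generator below walks several text streams and yields one Newick statement per step. `iter_newick_statements` is a hypothetical helper that assumes no semicolons occur inside quoted labels or comments. It does not build tree objects and is not DendroPy's yielder.

```python
import io


def iter_newick_statements(streams):
    """Lazily yield ';'-terminated Newick statements from each stream in turn."""
    for stream in streams:
        buffer = []
        for line in stream:
            parts = line.split(";")
            # Every part except the last was terminated by a ';' on this line.
            for part in parts[:-1]:
                buffer.append(part)
                statement = "".join(buffer).replace("\n", "").strip()
                buffer = []
                if statement:
                    yield statement + ";"
            # The remainder may continue on the next line.
            buffer.append(parts[-1])
        # Trailing text without a closing ';' is ignored in this sketch.


streams = [io.StringIO("(A,B);\n(C,\nD);\n"), io.StringIO("((E,F),G);")]
for statement in iter_newick_statements(streams):
    print(statement)
```

Only the statement currently being assembled is held in memory, so the cost per step does not grow with the number of trees in the files.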

### Character Matrix I/O

Specialized I/O methods for different types of character data with format-specific options.

```python { .api }
# BINARY_STATE_ALPHABET is assumed to be importable from the top-level package
from dendropy import (BINARY_STATE_ALPHABET, ContinuousCharacterMatrix,
                      DnaCharacterMatrix, ProteinCharacterMatrix,
                      StandardCharacterMatrix)

# DNA sequence matrices
dna_matrix = DnaCharacterMatrix.get(
    path="alignment.fasta",
    schema="fasta",
    data_type="dna"
)

# Protein sequence matrices
protein_matrix = ProteinCharacterMatrix.get(
    path="proteins.fasta",
    schema="fasta",
    data_type="protein"
)

# Standard morphological matrices
morpho_matrix = StandardCharacterMatrix.get(
    path="morphology.nex",
    schema="nexus",
    default_state_alphabet=BINARY_STATE_ALPHABET
)

# Continuous character matrices
continuous_matrix = ContinuousCharacterMatrix.get(
    path="measurements.nex",
    schema="nexus"
)

# Writing character matrices with format options
dna_matrix.write(
    path="output.phy",
    schema="phylip",
    strict=True,  # Strict PHYLIP format
    spaces_to_underscores=True,
    force_unique_taxon_labels=True
)
```
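
Strict PHYLIP reserves exactly ten characters for each taxon name, which is why long labels can collide once truncated and why options such as `force_unique_taxon_labels` exist. The sketch below writes a sequential strict-PHYLIP layout by hand to show the constraint. `format_phylip_strict` is a hypothetical function. It always converts spaces to underscores and raises on a collision instead of renaming, both of which are choices made for this example only.

```python
def format_phylip_strict(sequences):
    """Render {label: sequence} as sequential strict PHYLIP text (illustrative)."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    names = {}
    for label in sequences:
        # Strict PHYLIP keeps only the first ten characters of a name.
        short = label.replace(" ", "_")[:10]
        if short in names:
            raise ValueError(f"{label!r} and {names[short]!r} both become {short!r}")
        names[short] = label
    # Header line: number of taxa, then number of characters.
    lines = [f" {len(sequences)} {lengths.pop()}"]
    for short, label in names.items():
        # The name is padded to ten columns; the sequence starts at column 11.
        lines.append(short.ljust(10) + sequences[label])
    return "\n".join(lines) + "\n"


print(format_phylip_strict({"Homo sapiens": "ACGT", "Pan": "ACGA"}))
```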

### Multi-Format Dataset I/O

DataSet objects can read and write files containing multiple data types.

```python { .api }
from dendropy import DataSet

# Read mixed data (trees + character matrices)
dataset = DataSet.get(path="combined.nex", schema="nexus")

# Access different data types
for tree_list in dataset.tree_lists:
    print(f"Tree list has {len(tree_list)} trees")

for char_matrix in dataset.char_matrices:
    print(f"Character matrix: {type(char_matrix).__name__}")
    print(f"  {len(char_matrix)} taxa, {char_matrix.max_sequence_size} characters")

# Write entire dataset
dataset.write(path="complete_dataset.xml", schema="nexml")
```
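
A NEXUS file can hold several data types because it is organized into named blocks, each opened by `BEGIN <name>;` and closed by `END;`. The short sketch below lists the blocks in a NEXUS string, which is a quick way to see what a combined file contains before loading it. `list_nexus_blocks` is a hypothetical helper that ignores bracketed comments, so treat it as an approximation rather than a parser.

```python
import re

# A block starts with "BEGIN <name>;" at the start of a line, in any letter case.
BLOCK_START = re.compile(r"^\s*begin\s+(\w+)\s*;", re.IGNORECASE | re.MULTILINE)


def list_nexus_blocks(text):
    """Return the upper-cased names of the blocks found in NEXUS text, in order."""
    return [name.upper() for name in BLOCK_START.findall(text)]


example = """#NEXUS
BEGIN TAXA;
    DIMENSIONS NTAX=3;
    TAXLABELS A B C;
END;
BEGIN CHARACTERS;
    DIMENSIONS NCHAR=4;
    FORMAT DATATYPE=DNA;
    MATRIX
        A ACGT
        B ACGA
        C ACGG
    ;
END;
BEGIN TREES;
    TREE t1 = ((A,B),C);
END;
"""

print(list_nexus_blocks(example))
```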

### Format Support Details

```python { .api }
# Supported input/output schemas
SUPPORTED_SCHEMAS = [
    "newick",           # Newick tree format
    "nexus",            # NEXUS format (trees + character data)
    "nexml",            # NeXML format (XML-based)
    "fasta",            # FASTA sequence format
    "phylip",           # PHYLIP format variants
    "phylip-relaxed",   # Relaxed PHYLIP format
    "fasta-relaxed",    # FASTA with relaxed parsing
]

# Character data types
CHARACTER_DATA_TYPES = [
    "dna",              # DNA sequences
    "rna",              # RNA sequences
    "protein",          # Protein sequences
    "nucleotide",       # General nucleotide
    "standard",         # Standard morphological
    "continuous",       # Continuous characters
    "restriction",      # Restriction sites
    "infinite-sites",   # Infinite sites
]
```

## Reader and Writer Classes

```python { .api }
# Format-specific reader classes
class NewickReader:
    """Reader for Newick format trees."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class NexusReader:
    """Reader for NEXUS format files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class NexmlReader:
    """Reader for NeXML format files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class FastaReader:
    """Reader for FASTA sequence files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class PhylipReader:
    """Reader for PHYLIP format files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

# Format-specific writer classes
class NewickWriter:
    """Writer for Newick format trees."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class NexusWriter:
    """Writer for NEXUS format files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class NexmlWriter:
    """Writer for NeXML format files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class FastaWriter:
    """Writer for FASTA sequence files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class PhylipWriter:
    """Writer for PHYLIP format files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...
```

### Error Handling

```python { .api }
# I/O related exceptions
class DataParseError(Exception):
    """Raised when data cannot be parsed in expected format."""

class UnsupportedSchemaError(Exception):
    """Raised when unsupported file format is specified."""

class UnspecifiedSchemaError(Exception):
    """Raised when no schema (file format) is specified."""

class UnspecifiedSourceError(Exception):
    """Raised when no data source is provided."""
```