Tessl Tile for pypi/dendropy@5.0.0

# Data Input/Output

DendroPy's I/O framework reads and writes the major phylogenetic file formats (NEXUS, Newick, NeXML, FASTA, and PHYLIP), with configurable options for each format. The format is selected with the `schema` keyword argument.

## Capabilities

### Universal I/O Methods

All DendroPy data classes (Tree, TreeList, CharacterMatrix, DataSet) share the same I/O methods for reading and writing data.

```python { .api }
# Factory method for reading from external sources
@classmethod
def get(cls, **kwargs):
    """
    Factory method to create object by reading from external source.

    Parameters:
    - file: File object or file-like object
    - path: File path string
    - url: URL string
    - data: Raw data string
    - schema: Format specification ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - preserve_underscores: Keep underscores in taxon names (default: False)
    - suppress_internal_node_taxa: Ignore internal node labels as taxa (default: False)
    - rooting: How to handle rooting ('force-rooted', 'force-unrooted', 'default-rooted', 'default-unrooted')
    - taxon_namespace: TaxonNamespace to use for taxa
    - collection_offset: Skip first N items when reading multiple items
    - tree_offset: Skip first N trees (for tree sources)
    - ignore_unrecognized_keyword_arguments: Suppress warnings for unknown kwargs

    Returns:
    New object of appropriate type with data loaded
    """

def read(self, **kwargs):
    """
    Read data from external source into existing object.

    Same parameters as get() method, but loads into existing object
    rather than creating new one.
    """

def write(self, **kwargs):
    """
    Write object data to external destination.

    Parameters:
    - file: File object or file-like object for output
    - path: File path string for output
    - schema: Output format ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - suppress_leaf_taxon_labels: Don't write leaf taxon names (default: False)
    - suppress_internal_taxon_labels: Don't write internal taxon names (default: False)
    - suppress_rooting: Don't write rooting information (default: False)
    - suppress_edge_lengths: Don't write branch lengths (default: False)
    - unquoted_underscores: Don't quote underscores in names (default: False)
    - preserve_spaces: Keep spaces in taxon names (default: False)
    - store_tree_weights: Include tree weights in output (default: False)
    - suppress_annotations: Don't write annotations (default: True)
    - annotations_as_nhx: Write annotations in NHX format (default: False)
    - suppress_item_comments: Don't write item comments
    - ignore_unrecognized_keyword_arguments: Suppress warnings for unknown kwargs
    """

def write_to_stream(self, dest, schema, **kwargs):
    """Write data to stream in specified format."""
```
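
Since `get()` takes its source through one of four keywords (`file`, `path`, `url`, `data`), a small dispatcher can pick the keyword from the value. The sketch below is illustrative only: `source_kwarg` and `load` are hypothetical helpers, the classification rules are assumptions, and `cls` stands for any class exposing the `get()` factory described above.

```python
import os


def source_kwarg(source):
    """Map a source value onto one of the keywords that get() accepts.

    Hypothetical helper: these rules are assumptions for illustration,
    not DendroPy behaviour.
    """
    if hasattr(source, "read"):
        return {"file": source}              # open file or file-like object
    if isinstance(source, os.PathLike):
        return {"path": os.fspath(source)}   # pathlib.Path and similar
    if source.startswith(("http://", "https://")):
        return {"url": source}
    if os.path.exists(source):
        return {"path": source}              # string naming an existing file
    return {"data": source}                  # raw text, e.g. a Newick string


def load(cls, source, schema, **options):
    """Call cls.get() with one source keyword, the schema, and any options."""
    return cls.get(schema=schema, **source_kwarg(source), **options)
```

With DendroPy imported, a call such as `load(dendropy.Tree, "((A,B),C);", "newick")` would then be forwarded as `Tree.get(data=..., schema="newick")`.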

### Format-Specific Reading

DendroPy supports reading from multiple phylogenetic file formats with format-specific options.

```python { .api }
from dendropy import (DataSet, DnaCharacterMatrix, ProteinCharacterMatrix,
                      Tree, TreeList)

# Newick format options (for trees)
Tree.get(path="tree.nwk", schema="newick",
         rooting="default-unrooted",
         preserve_underscores=True)

# NEXUS format options (for trees and character data)
TreeList.get(path="trees.nex", schema="nexus",
             preserve_underscores=False,
             suppress_internal_node_taxa=True)

# NeXML format options
DataSet.get(path="data.xml", schema="nexml")

# FASTA format (for character matrices); the matrix class sets the data type
DnaCharacterMatrix.get(path="seqs.fasta", schema="fasta")

# PHYLIP format options
ProteinCharacterMatrix.get(path="alignment.phy", schema="phylip",
                           multispace_delimiter=True,
                           interleaved=False)
```
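
Because every call above names its `schema` explicitly, scripts that handle many files often derive the schema from the file extension. The following is a minimal sketch of that idea; `SCHEMA_BY_SUFFIX` and `schema_for` are hypothetical names, and the extension mapping is an assumption that should be adapted to local naming habits (a `.tre` file, for instance, may hold either Newick or NEXUS).

```python
from pathlib import Path

# Assumed extension-to-schema mapping; not part of DendroPy.
SCHEMA_BY_SUFFIX = {
    ".nwk": "newick",
    ".newick": "newick",
    ".nex": "nexus",
    ".nxs": "nexus",
    ".xml": "nexml",
    ".nexml": "nexml",
    ".fasta": "fasta",
    ".fa": "fasta",
    ".phy": "phylip",
}


def schema_for(path):
    """Return the schema name for a path, based only on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return SCHEMA_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no schema known for {suffix!r} files") from None
```

A reader call then becomes `Tree.get(path=p, schema=schema_for(p))`, and unknown extensions fail early with a clear message instead of a parse error.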

### Format-Specific Writing

Write data in various phylogenetic formats with extensive customization options.

```python { .api }
# Newick output with options
tree.write(path="output.nwk", schema="newick",
           suppress_edge_lengths=False,
           suppress_leaf_taxon_labels=False,
           unquoted_underscores=True)

# NEXUS output with metadata
trees.write(path="output.nex", schema="nexus",
            suppress_rooting=False,
            store_tree_weights=True,
            suppress_annotations=False)

# NeXML structured output
dataset.write(path="output.xml", schema="nexml")

# FASTA sequence output
char_matrix.write(path="output.fasta", schema="fasta",
                  wrap_width=70)

# PHYLIP alignment output
char_matrix.write(path="output.phy", schema="phylip",
                  force_unique_taxon_labels=True,
                  spaces_to_underscores=True)
```
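
When the same combination of writer options recurs, it can help to name it once. The sketch below groups a few of the `write()` options listed earlier into presets and renders a tree to a string. `NEWICK_PRESETS` and `to_newick` are hypothetical names, and the sketch assumes the tree object offers an `as_string(schema=..., **options)` method that accepts the same options as `write()`.

```python
# Hypothetical presets built from the write() options documented above.
NEWICK_PRESETS = {
    "topology": {"suppress_edge_lengths": True, "suppress_annotations": True},
    "full": {"suppress_edge_lengths": False, "suppress_annotations": False},
}


def to_newick(tree, preset="topology", **overrides):
    """Render a tree as a Newick string using a named option preset.

    Keyword overrides take precedence over the preset's values.
    """
    options = {**NEWICK_PRESETS[preset], **overrides}
    return tree.as_string(schema="newick", **options)
```

For example, `to_newick(tree, "full", suppress_rooting=True)` keeps branch lengths and annotations but omits the rooting token.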

### I/O Factory Functions

Factory functions for creating format-specific readers, writers, and tree yielders.

```python { .api }
def get_reader(schema, **kwargs):
    """
    Get reader instance for specified format.

    Parameters:
    - schema: Format name ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - **kwargs: Format-specific options

    Returns:
    Reader object for specified format
    """

def get_writer(schema, **kwargs):
    """
    Get writer instance for specified format.

    Parameters:
    - schema: Format name ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - **kwargs: Format-specific options

    Returns:
    Writer object for specified format
    """

def get_tree_yielder(files, schema, **kwargs):
    """
    Get iterator for reading trees from multiple files.

    Parameters:
    - files: List of file paths or file objects
    - schema: Format specification
    - **kwargs: Format-specific options

    Returns:
    Iterator yielding Tree objects
    """
```
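
To make the factory idea concrete, here is a generic registry sketch of how a schema name can be mapped to a reader class. It is not DendroPy's implementation: `register_reader`, `ToyNewickReader`, and `get_toy_reader` are hypothetical names, and the case-insensitive lookup is an assumption of this sketch.

```python
_READERS = {}  # schema name -> reader class


def register_reader(schema):
    """Class decorator that files a reader class under a schema name."""
    def decorator(cls):
        _READERS[schema.lower()] = cls
        return cls
    return decorator


@register_reader("newick")
class ToyNewickReader:
    """Stand-in reader that only records its format-specific options."""

    def __init__(self, **options):
        self.options = options


def get_toy_reader(schema, **kwargs):
    """Look up the reader class for a schema and build it with the options."""
    try:
        reader_cls = _READERS[schema.lower()]
    except KeyError:
        raise ValueError(f"unsupported schema: {schema!r}") from None
    return reader_cls(**kwargs)
```

The point of the pattern is that callers name a format and pass options, while the choice of class stays in one place.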

### Streaming Tree I/O

For large tree collections, DendroPy provides memory-efficient streaming iterators.

```python { .api }
from dendropy import Tree

# Stream trees from a single file; `path` is the location of the tree file
for tree in Tree.yield_from_files([path], schema="nexus"):
    # Process one tree at a time without loading all into memory
    print(f"Tree has {len(tree.leaf_nodes())} leaves")

# Stream trees from multiple files
tree_files = ["trees1.nex", "trees2.nex", "trees3.nex"]
for tree in Tree.yield_from_files(tree_files, schema="nexus"):
    # Process trees from all files sequentially
    analyze_tree(tree)  # analyze_tree is a user-supplied function

# Filtering while streaming: test each tree inside the loop.
# (Assumption: yield_from_files does not take a filter callback, so the
# predicate is applied here rather than passed as a keyword argument.)
def large_tree_filter(tree):
    return len(tree.leaf_nodes()) > 100

for tree in Tree.yield_from_files([path], schema="newick"):
    if large_tree_filter(tree):
        # Only process trees with >100 leaves
        process_large_tree(tree)  # user-supplied function
```
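
The benefit of streaming is that a summary can be computed while holding only one tree at a time. A minimal sketch, assuming each yielded object has the `leaf_nodes()` method used above; `leaf_count_summary` is a hypothetical helper name.

```python
def leaf_count_summary(trees):
    """Consume an iterable of trees once, keeping only running totals."""
    count = 0
    total_leaves = 0
    max_leaves = 0
    for tree in trees:
        n = len(tree.leaf_nodes())
        count += 1
        total_leaves += n
        max_leaves = max(max_leaves, n)
    mean_leaves = total_leaves / count if count else 0.0
    return {"trees": count, "mean_leaves": mean_leaves, "max_leaves": max_leaves}
```

It can be fed directly from the iterator, for example `leaf_count_summary(Tree.yield_from_files(tree_files, schema="nexus"))`, so no list of trees is ever built.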

### Character Matrix I/O

Specialized I/O methods for different types of character data with format-specific options.

```python { .api }
from dendropy import (BINARY_STATE_ALPHABET, ContinuousCharacterMatrix,
                      DnaCharacterMatrix, ProteinCharacterMatrix,
                      StandardCharacterMatrix)

# DNA sequence matrices (the matrix class sets the data type)
dna_matrix = DnaCharacterMatrix.get(
    path="alignment.fasta",
    schema="fasta"
)

# Protein sequence matrices
protein_matrix = ProteinCharacterMatrix.get(
    path="proteins.fasta",
    schema="fasta"
)

# Standard morphological matrices
morpho_matrix = StandardCharacterMatrix.get(
    path="morphology.nex",
    schema="nexus",
    default_state_alphabet=BINARY_STATE_ALPHABET
)

# Continuous character matrices
continuous_matrix = ContinuousCharacterMatrix.get(
    path="measurements.nex",
    schema="nexus"
)

# Writing character matrices with format options
dna_matrix.write(
    path="output.phy",
    schema="phylip",
    strict=True,  # Strict PHYLIP format
    spaces_to_underscores=True,
    force_unique_taxon_labels=True
)
```
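
When the data type is only known at run time (from a config file, say), the matrix class can be chosen by name. The sketch below is one way to do that under stated assumptions: `MATRIX_CLASS_NAMES` and `load_matrix` are hypothetical, the mapping covers only the four classes shown above, and `library` is expected to be the imported `dendropy` module.

```python
# Data type string -> name of the matrix class used in the examples above.
MATRIX_CLASS_NAMES = {
    "dna": "DnaCharacterMatrix",
    "protein": "ProteinCharacterMatrix",
    "standard": "StandardCharacterMatrix",
    "continuous": "ContinuousCharacterMatrix",
}


def load_matrix(library, path, schema, data_type, **options):
    """Pick the matrix class for data_type and read the file with it."""
    try:
        class_name = MATRIX_CLASS_NAMES[data_type]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None
    matrix_cls = getattr(library, class_name)
    return matrix_cls.get(path=path, schema=schema, **options)
```

A call would look like `load_matrix(dendropy, "alignment.fasta", "fasta", "dna")`.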

### Multi-Format Dataset I/O

DataSet objects can read and write files containing multiple data types.

```python { .api }
from dendropy import DataSet

# Read mixed data (trees + character matrices)
dataset = DataSet.get(path="combined.nex", schema="nexus")

# Access different data types
for tree_list in dataset.tree_lists:
    print(f"Tree list has {len(tree_list)} trees")

for char_matrix in dataset.char_matrices:
    print(f"Character matrix: {type(char_matrix).__name__}")
    print(f"  {len(char_matrix)} taxa, {char_matrix.max_sequence_size} characters")

# Write entire dataset
dataset.write(path="complete_dataset.xml", schema="nexml")
```
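
The same attributes can be folded into a reusable summary, which is handy for logging what a mixed file contained. A minimal sketch: `describe` is a hypothetical helper that relies only on the `tree_lists` and `char_matrices` attributes and the `len()` behaviour shown above.

```python
def describe(dataset):
    """Return one summary line per tree list and character matrix."""
    lines = []
    for i, tree_list in enumerate(dataset.tree_lists):
        lines.append(f"tree list {i}: {len(tree_list)} trees")
    for i, matrix in enumerate(dataset.char_matrices):
        lines.append(f"matrix {i}: {type(matrix).__name__}, {len(matrix)} taxa")
    return lines
```

Returning strings instead of printing keeps the helper usable from both scripts and tests.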

### Format Support Details

```python { .api }
# Supported input/output schemas
SUPPORTED_SCHEMAS = [
    "newick",           # Newick tree format
    "nexus",            # NEXUS format (trees + character data)
    "nexml",            # NeXML format (XML-based)
    "fasta",            # FASTA sequence format
    "phylip",           # PHYLIP format variants
    "phylip-relaxed",   # Relaxed PHYLIP format
    "fasta-relaxed",    # FASTA with relaxed parsing
]

# Character data types
CHARACTER_DATA_TYPES = [
    "dna",              # DNA sequences
    "rna",              # RNA sequences
    "protein",          # Protein sequences
    "nucleotide",       # General nucleotide
    "standard",         # Standard morphological
    "continuous",       # Continuous characters
    "restriction",      # Restriction sites
    "infinite-sites",   # Infinite sites
]
```

## Reader and Writer Classes

```python { .api }
# Format-specific reader classes
class NewickReader:
    """Reader for Newick format trees."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class NexusReader:
    """Reader for NEXUS format files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class NexmlReader:
    """Reader for NeXML format files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class FastaReader:
    """Reader for FASTA sequence files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class PhylipReader:
    """Reader for PHYLIP format files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

# Format-specific writer classes
class NewickWriter:
    """Writer for Newick format trees."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class NexusWriter:
    """Writer for NEXUS format files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class NexmlWriter:
    """Writer for NeXML format files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class FastaWriter:
    """Writer for FASTA sequence files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class PhylipWriter:
    """Writer for PHYLIP format files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...
```

### Error Handling

```python { .api }
# I/O related exceptions
class DataParseError(Exception):
    """Raised when data cannot be parsed in expected format."""

class UnsupportedSchemaError(Exception):
    """Raised when unsupported file format is specified."""

class UnspecifiedSchemaError(Exception):
    """Raised when file format is not specified and cannot be auto-detected."""

class UnspecifiedSourceError(Exception):
    """Raised when no data source is provided."""
```
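
One common use of these exceptions is a fallback chain: try one schema, and if parsing fails, try the next. The sketch below keeps the exception classes as a parameter because this document does not show where they are imported from; `load_first` is a hypothetical helper, not a DendroPy function.

```python
def load_first(loaders, expected_errors):
    """Run labelled loader callables in order and return the first success.

    loaders: iterable of (label, zero-argument callable) pairs.
    expected_errors: exception class or tuple of classes to treat as
        "try the next loader"; anything else propagates unchanged.
    Returns (result, failures), where failures lists (label, exception)
    for each loader that raised an expected error. result is None if
    every loader failed.
    """
    failures = []
    for label, loader in loaders:
        try:
            return loader(), failures
        except expected_errors as exc:
            failures.append((label, exc))
    return None, failures
```

With DendroPy, the loaders could be built as `[(s, lambda s=s: Tree.get(path=p, schema=s)) for s in ("nexus", "newick")]` and `expected_errors` set to the parse-related classes listed above.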
