# Data Input/Output

Comprehensive I/O framework supporting the major phylogenetic file formats with configurable reading and writing options. DendroPy handles NEXUS, Newick, NeXML, FASTA, and PHYLIP. You name the format explicitly with the `schema` argument, and each format accepts its own set of customization options.

## Capabilities

### Universal I/O Methods

All DendroPy data classes (Tree, TreeList, CharacterMatrix, DataSet) support unified I/O methods for reading and writing data.

```python { .api }
# Factory method for reading from external sources
@classmethod
def get(cls, **kwargs):
    """
    Factory method to create object by reading from external source.

    Parameters:
    - file: File object or file-like object
    - path: File path string
    - url: URL string
    - data: Raw data string
      (supply exactly one of file, path, url, or data)
    - schema: Format specification ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - preserve_underscores: Keep underscores in taxon names (default: False)
    - suppress_internal_node_taxa: Do not turn internal node labels into taxa (default: True)
    - rooting: How to handle rooting ('force-rooted', 'force-unrooted', 'default-rooted', 'default-unrooted')
    - taxon_namespace: TaxonNamespace to use for taxa
    - collection_offset: Skip first N items when reading multiple items
    - tree_offset: Skip first N trees (for tree sources)
    - ignore_unrecognized_keyword_arguments: Ignore unknown kwargs instead of raising an error (default: False)

    Returns:
    New object of appropriate type with data loaded
    """

def read(self, **kwargs):
    """
    Read data from external source into existing object.

    Same parameters as get() method, but loads into existing object
    rather than creating new one.
    """

def write(self, **kwargs):
    """
    Write object data to external destination.

    Parameters:
    - file: File object or file-like object for output
    - path: File path string for output
    - schema: Output format ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - ignore_unrecognized_keyword_arguments: Ignore unknown kwargs instead of raising an error (default: False)

    Tree-writing options (Newick/NEXUS):
    - suppress_leaf_taxon_labels: Don't write leaf taxon names (default: False)
    - suppress_internal_taxon_labels: Don't write internal taxon names (default: False)
    - suppress_rooting: Don't write rooting information (default: False)
    - suppress_edge_lengths: Don't write branch lengths (default: False)
    - unquoted_underscores: Don't quote underscores in names (default: False)
    - preserve_spaces: Keep spaces in taxon names (default: False)
    - store_tree_weights: Include tree weights in output (default: False)
    - suppress_annotations: Don't write annotations (default: True)
    - annotations_as_nhx: Write annotations in NHX format (default: False)
    - suppress_item_comments: Don't write item comments (default: True)
    """

def write_to_stream(self, dest, schema, **kwargs):
    """Write data to stream in specified format."""
```
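
The `get()` docstring lists four alternative data sources. The stdlib-only sketch below shows one way such an entry point could reduce them to a single text stream. The `open_source()` helper and its error class are hypothetical stand-ins, not DendroPy's implementation. The rule that exactly one source must be given is taken from the parameter list above.

```python
import io
import urllib.request
from typing import TextIO


class UnspecifiedSourceError(Exception):
    """Hypothetical stand-in: no data source was given."""


def open_source(file=None, path=None, url=None, data=None) -> TextIO:
    """Return a readable text stream for exactly one of the given sources.

    Hypothetical helper; the keyword names mirror those of get()/read().
    """
    candidates = {"file": file, "path": path, "url": url, "data": data}
    given = {k: v for k, v in candidates.items() if v is not None}
    if not given:
        raise UnspecifiedSourceError("one of file, path, url, or data is required")
    if len(given) > 1:
        raise TypeError(f"only one source allowed, got {sorted(given)}")
    kind, value = next(iter(given.items()))
    if kind == "file":
        return value                              # already file-like
    if kind == "path":
        return open(value, encoding="utf-8")      # caller closes it
    if kind == "url":
        with urllib.request.urlopen(value) as resp:
            return io.StringIO(resp.read().decode("utf-8"))
    return io.StringIO(value)                     # raw data string


# Usage: a raw Newick string becomes an ordinary stream
stream = open_source(data="((A,B),C);")
print(stream.read())
```

Normalizing every source to a stream first means a format parser only has to handle one input type.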

### Format-Specific Reading

DendroPy supports reading from multiple phylogenetic file formats with format-specific options.

```python { .api }
from dendropy import (Tree, TreeList, DataSet,
                      DnaCharacterMatrix, ProteinCharacterMatrix)

# Newick format options (for trees)
tree = Tree.get(path="tree.nwk", schema="newick",
                rooting="default-unrooted",
                preserve_underscores=True)

# NEXUS format options (for trees and character data)
trees = TreeList.get(path="trees.nex", schema="nexus",
                     preserve_underscores=False,
                     suppress_internal_node_taxa=True)  # True is already the default

# NeXML format options
dataset = DataSet.get(path="data.xml", schema="nexml")

# FASTA format options (for character matrices)
dna = DnaCharacterMatrix.get(path="seqs.fasta", schema="fasta",
                             data_type="dna")

# PHYLIP format options
protein = ProteinCharacterMatrix.get(path="alignment.phy", schema="phylip",
                                     multispace_delimiter=True,
                                     interleaved=False)
```
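
The `preserve_underscores` option exists because the Newick convention treats an underscore in an unquoted label as a blank, while single-quoted labels are taken literally. The stdlib-only sketch below illustrates that rule. The `newick_leaf_labels()` function is hypothetical and is not DendroPy's Newick parser. It skips edge lengths and internal-node labels. It does not handle `[comments]` or doubled quotes inside quoted labels.

```python
def newick_leaf_labels(newick, preserve_underscores=False):
    """Return leaf labels from a simple Newick string (illustrative sketch).

    Unquoted labels have '_' turned into ' ' unless preserve_underscores is
    True; single-quoted labels are kept literally.
    """
    labels = []
    prev = None                      # last structural char: '(', ',', ')' or None
    i, n = 0, len(newick)
    while i < n:
        ch = newick[i]
        if ch in "(),;":
            prev = ch
            i += 1
        elif ch == ":":
            # Skip an edge length such as ':0.25'
            i += 1
            while i < n and newick[i] not in "(),;":
                i += 1
        elif ch.isspace():
            i += 1
        else:
            if ch == "'":
                end = newick.index("'", i + 1)
                label = newick[i + 1:end]          # quoted: taken literally
                i = end + 1
            else:
                start = i
                while i < n and newick[i] not in "(),:;'" and not newick[i].isspace():
                    i += 1
                label = newick[start:i]
                if not preserve_underscores:
                    label = label.replace("_", " ")
            # A label right after ')' names an internal node, not a leaf
            if prev != ")":
                labels.append(label)
    return labels


s = "((Homo_sapiens:1,Pan:2)Hominini,'Gorilla_g');"
print(newick_leaf_labels(s))                             # ['Homo sapiens', 'Pan', 'Gorilla_g']
print(newick_leaf_labels(s, preserve_underscores=True))  # ['Homo_sapiens', 'Pan', 'Gorilla_g']
```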

### Format-Specific Writing

Write data in various phylogenetic formats with extensive customization options.

```python { .api }
# Assumes tree (Tree), trees (TreeList), dataset (DataSet), and
# char_matrix (a CharacterMatrix) already exist

# Newick output with options
tree.write(path="output.nwk", schema="newick",
           suppress_edge_lengths=False,
           suppress_leaf_taxon_labels=False,
           unquoted_underscores=True)

# NEXUS output with metadata
trees.write(path="output.nex", schema="nexus",
            suppress_rooting=False,
            store_tree_weights=True,
            suppress_annotations=False)

# NeXML structured output
dataset.write(path="output.xml", schema="nexml")

# FASTA sequence output (sequence lines wrapped at 70 characters)
char_matrix.write(path="output.fasta", schema="fasta",
                  wrap_width=70)

# PHYLIP alignment output
char_matrix.write(path="output.phy", schema="phylip",
                  force_unique_taxon_labels=True,
                  spaces_to_underscores=True)
```
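
To show what a `wrap_width` setting controls, here is a stdlib-only sketch of FASTA output with fixed-width sequence lines. The `to_fasta()` helper is hypothetical and is not DendroPy's FASTA writer. Using `wrap_width=None` to disable wrapping is this sketch's own convention.

```python
def to_fasta(seqs, wrap_width=70):
    """Format {label: sequence} as FASTA text (illustrative sketch).

    Sequence lines are cut every wrap_width characters; wrap_width=None
    keeps each sequence on a single line.
    """
    lines = []
    for label, seq in seqs.items():
        lines.append(f">{label}")
        if wrap_width:
            lines.extend(seq[i:i + wrap_width]
                         for i in range(0, len(seq), wrap_width))
        else:
            lines.append(seq)
    return "\n".join(lines) + "\n"


print(to_fasta({"t1": "ACGTACGTAC", "t2": "ACGTACGTAA"}, wrap_width=4), end="")
```

Wrapping only changes line layout. A FASTA reader should join the sequence lines back together, so `wrap_width` does not affect the data itself.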

### I/O Factory Functions

Factory functions for creating format-specific readers, writers, and tree yielders.

```python { .api }
# Factory functions in the dendropy.dataio module

def get_reader(schema, **kwargs):
    """
    Get reader instance for specified format.

    Parameters:
    - schema: Format name ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - **kwargs: Format-specific options

    Returns:
    Reader object for specified format
    """

def get_writer(schema, **kwargs):
    """
    Get writer instance for specified format.

    Parameters:
    - schema: Format name ('newick', 'nexus', 'nexml', 'fasta', 'phylip')
    - **kwargs: Format-specific options

    Returns:
    Writer object for specified format
    """

def get_tree_yielder(files, schema, **kwargs):
    """
    Get iterator for reading trees from multiple files.

    Parameters:
    - files: List of file paths or file objects
    - schema: Format specification
    - **kwargs: Format-specific options

    Returns:
    Iterator yielding Tree objects
    """
```
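
Factory functions like these usually dispatch on the schema name through a registry. The sketch below shows that pattern with hypothetical `register_reader()`/`get_reader()` helpers and a toy FASTA reader. It illustrates the dispatch idea only and is not the code inside `dendropy.dataio`. The `UnsupportedSchemaError` here is a local stand-in.

```python
import io


class UnsupportedSchemaError(Exception):
    """Hypothetical stand-in for an unknown-format error."""


_READERS = {}


def register_reader(schema):
    """Class decorator that records a reader class under a schema name."""
    def decorator(cls):
        _READERS[schema] = cls
        return cls
    return decorator


def get_reader(schema, **kwargs):
    """Instantiate the reader registered for `schema` with format options."""
    try:
        reader_cls = _READERS[schema.lower()]
    except KeyError:
        raise UnsupportedSchemaError(
            f"unsupported schema {schema!r}; known: {sorted(_READERS)}"
        ) from None
    return reader_cls(**kwargs)


@register_reader("fasta")
class ToyFastaReader:
    def __init__(self, upper=True):
        self.upper = upper          # format-specific option passed via kwargs

    def read(self, stream):
        """Return {label: sequence} parsed from FASTA text."""
        seqs, label = {}, None
        for line in stream:
            line = line.strip()
            if line.startswith(">"):
                label = line[1:].strip()
                seqs[label] = ""
            elif line and label is not None:
                seqs[label] += line.upper() if self.upper else line
        return seqs


reader = get_reader("FASTA")
print(reader.read(io.StringIO(">t1\nacgt\nac\n>t2\nACGTAC\n")))
```

With this design, adding a format is a single decorated class, and callers never need to import the concrete reader.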

### Streaming Tree I/O

For large tree collections, DendroPy provides memory-efficient streaming iterators.

```python { .api }
from dendropy import Tree

# Stream trees from single file
path = "trees.nex"
for tree in Tree.yield_from_files([path], schema="nexus"):
    # Process one tree at a time without loading all into memory
    print(f"Tree has {len(tree.leaf_nodes())} leaves")

# Stream trees from multiple files
tree_files = ["trees1.nex", "trees2.nex", "trees3.nex"]
for tree in Tree.yield_from_files(tree_files, schema="nexus"):
    # Trees from all files are yielded sequentially
    analyze_tree(tree)  # placeholder for your own analysis function

# Filtering while streaming: test each tree inside the loop
def large_tree_filter(tree):
    return len(tree.leaf_nodes()) > 100

for tree in Tree.yield_from_files(["trees.nwk"], schema="newick"):
    if not large_tree_filter(tree):
        continue
    # Only trees with >100 leaves reach this point
    process_large_tree(tree)  # placeholder for your own processing function
```
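
Streaming of this kind saves memory by returning each tree as soon as it has been read, instead of building the whole collection first. The stdlib-only sketch below illustrates that idea with a hypothetical `yield_newick_strings()` generator. It yields raw Newick strings rather than `Tree` objects and is not DendroPy's implementation. Semicolons inside single-quoted labels or `[comments]` are not treated as tree terminators.

```python
import io
from typing import Iterable, Iterator, TextIO


def yield_newick_strings(streams: Iterable[TextIO],
                         chunk_size: int = 4096) -> Iterator[str]:
    """Yield one ';'-terminated Newick string at a time, stream by stream.

    Only the text of the tree currently being read is kept in memory.
    Trailing text without a terminating ';' is ignored.
    """
    for stream in streams:
        buf = []
        in_quote = in_comment = False
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            for ch in chunk:
                buf.append(ch)
                if in_quote:
                    if ch == "'":
                        in_quote = False
                elif in_comment:
                    if ch == "]":
                        in_comment = False
                elif ch == "'":
                    in_quote = True
                elif ch == "[":
                    in_comment = True
                elif ch == ";":
                    yield "".join(buf).strip()
                    buf = []


# Usage: two in-memory "files", keeping only trees with more than 2 leaves.
# Leaf count = commas + 1, valid when labels and comments contain no commas.
files = [io.StringIO("(A,B);\n(A,(B,C));"), io.StringIO("((A,B),(C,D));")]
big = [t for t in yield_newick_strings(files) if t.count(",") + 1 > 2]
print(big)
```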

### Character Matrix I/O

Specialized I/O methods for different types of character data with format-specific options.

```python { .api }
from dendropy import (DnaCharacterMatrix, ProteinCharacterMatrix,
                      StandardCharacterMatrix, ContinuousCharacterMatrix)
from dendropy.datamodel.charstatemodel import BINARY_STATE_ALPHABET

# DNA sequence matrices
dna_matrix = DnaCharacterMatrix.get(
    path="alignment.fasta",
    schema="fasta",
    data_type="dna",
)

# Protein sequence matrices
protein_matrix = ProteinCharacterMatrix.get(
    path="proteins.fasta",
    schema="fasta",
    data_type="protein",
)

# Standard morphological matrices
morpho_matrix = StandardCharacterMatrix.get(
    path="morphology.nex",
    schema="nexus",
    default_state_alphabet=BINARY_STATE_ALPHABET,
)

# Continuous character matrices
continuous_matrix = ContinuousCharacterMatrix.get(
    path="measurements.nex",
    schema="nexus",
)

# Writing character matrices with format options
dna_matrix.write(
    path="output.phy",
    schema="phylip",
    strict=True,  # Strict PHYLIP: taxon labels limited to 10 characters
    spaces_to_underscores=True,
    force_unique_taxon_labels=True,
)
```
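
Strict PHYLIP limits taxon names to 10 characters, which is why `spaces_to_underscores` and `force_unique_taxon_labels` matter when writing it. The sketch below is a hypothetical `to_strict_phylip()` helper, not DendroPy's PHYLIP writer. Its de-duplication rule, which replaces the end of a clashing name with a counter, is an assumption chosen for illustration.

```python
def to_strict_phylip(seqs):
    """Format {label: sequence} as sequential strict PHYLIP (sketch only).

    Spaces become underscores, names are cut or padded to exactly 10
    characters, and clashing names get a numeric suffix.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same, non-zero count")
    lines = [f"{len(seqs)} {lengths.pop()}"]
    used = set()
    for label, seq in seqs.items():
        base = label.replace(" ", "_")
        name = base[:10]
        counter = 1
        while name in used:
            suffix = str(counter)
            name = base[:10 - len(suffix)] + suffix   # keep 10-char limit
            counter += 1
        used.add(name)
        lines.append(f"{name:<10}{seq}")          # sequence starts at column 11
    return "\n".join(lines) + "\n"


print(to_strict_phylip({
    "Homo sapiens neanderthalensis": "ACGT",
    "Homo sapiens sapiens": "ACGA",
}), end="")
```

Both labels truncate to `Homo_sapie`, so the second one is rewritten as `Homo_sapi1`. Without some uniqueness rule, truncation could silently merge distinct taxa.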

### Multi-Format Dataset I/O

DataSet objects can read and write files containing multiple data types.

```python { .api }
from dendropy import DataSet

# Read mixed data (trees + character matrices)
dataset = DataSet.get(path="combined.nex", schema="nexus")

# Access different data types
for tree_list in dataset.tree_lists:
    print(f"Tree list has {len(tree_list)} trees")

for char_matrix in dataset.char_matrices:
    print(f"Character matrix: {type(char_matrix).__name__}")
    print(f"  {len(char_matrix)} taxa, {char_matrix.max_sequence_size} characters")

# Write entire dataset
dataset.write(path="complete_dataset.xml", schema="nexml")
```
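
A file such as `combined.nex` is a single NEXUS document with separate TAXA, CHARACTERS, and TREES blocks. The hypothetical `to_nexus()` helper below writes such a document from plain dictionaries, which can be handy for building small test inputs. It is a stdlib sketch, not DendroPy's NEXUS writer. It assumes aligned DNA sequences and Newick strings that already use the same taxon labels.

```python
import re


def nexus_token(label):
    """Return label as a NEXUS token, single-quoting it when needed."""
    if re.fullmatch(r"[A-Za-z0-9_.]+", label):
        return label
    return "'" + label.replace("'", "''") + "'"   # NEXUS doubles embedded quotes


def to_nexus(seqs, trees):
    """Write {taxon: DNA sequence} and {tree name: Newick} as one NEXUS file."""
    taxa = list(seqs)
    nchar = len(seqs[taxa[0]])
    lines = [
        "#NEXUS",
        "",
        "BEGIN TAXA;",
        f"    DIMENSIONS NTAX={len(taxa)};",
        "    TAXLABELS " + " ".join(nexus_token(t) for t in taxa) + ";",
        "END;",
        "",
        "BEGIN CHARACTERS;",
        f"    DIMENSIONS NCHAR={nchar};",
        "    FORMAT DATATYPE=DNA GAP=- MISSING=?;",
        "    MATRIX",
    ]
    lines += [f"        {nexus_token(t)}  {seqs[t]}" for t in taxa]
    lines += ["    ;", "END;", "", "BEGIN TREES;"]
    lines += [f"    TREE {nexus_token(name)} = {nwk}" for name, nwk in trees.items()]
    lines += ["END;"]
    return "\n".join(lines) + "\n"


print(to_nexus({"A": "ACGT", "B": "ACGA", "C": "ACGG"}, {"t1": "((A,B),C);"}))
```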

### Format Support Details

```python { .api }
# Schema names discussed in this document (illustrative list, not a DendroPy constant)
SUPPORTED_SCHEMAS = [
    "newick",          # Newick tree format
    "nexus",           # NEXUS format (trees + character data)
    "nexml",           # NeXML format (XML-based)
    "fasta",           # FASTA sequence format
    "phylip",          # PHYLIP format; strict vs. relaxed is set with strict=...
    "phylip-relaxed",  # Relaxed PHYLIP (may not be a separate schema name; see strict=)
    "fasta-relaxed",   # FASTA with relaxed parsing (schema name unverified)
]

# Character data types
CHARACTER_DATA_TYPES = [
    "dna",             # DNA sequences
    "rna",             # RNA sequences
    "protein",         # Protein sequences
    "nucleotide",      # General nucleotide
    "standard",        # Standard morphological
    "continuous",      # Continuous characters
    "restriction",     # Restriction sites
    "infinite-sites",  # Infinite sites
]
```

## Reader and Writer Classes

```python { .api }
# Simplified stubs: the concrete classes live in submodules of dendropy.dataio
# and are usually reached through get()/read()/write() or get_reader()/get_writer()

# Format-specific reader classes
class NewickReader:
    """Reader for Newick format trees."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class NexusReader:
    """Reader for NEXUS format files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class NexmlReader:
    """Reader for NeXML format files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class FastaReader:
    """Reader for FASTA sequence files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

class PhylipReader:
    """Reader for PHYLIP format files."""
    def __init__(self, **kwargs): ...
    def read(self, stream): ...

# Format-specific writer classes
class NewickWriter:
    """Writer for Newick format trees."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class NexusWriter:
    """Writer for NEXUS format files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class NexmlWriter:
    """Writer for NeXML format files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class FastaWriter:
    """Writer for FASTA sequence files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...

class PhylipWriter:
    """Writer for PHYLIP format files."""
    def __init__(self, **kwargs): ...
    def write(self, obj, stream): ...
```

### Error Handling

```python { .api }
# I/O related exceptions (defined in dendropy.utility.error)
class DataParseError(Exception):
    """Raised when data cannot be parsed in expected format."""

class UnsupportedSchemaError(Exception):
    """Raised when unsupported file format is specified."""

class UnspecifiedSchemaError(Exception):
    """Raised when no file format (schema) is specified."""

class UnspecifiedSourceError(Exception):
    """Raised when no data source is provided."""
```
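
The stdlib sketch below shows how a parse error that records its location can be raised and then caught by calling code. The `DataParseError` defined here is a local stand-in that only shares its name with DendroPy's class. Its `line_num`/`col_num` attributes and the `check_newick_parens()` helper belong to this sketch alone. It also ignores parentheses that appear inside quoted labels or comments.

```python
class DataParseError(Exception):
    """Hypothetical stand-in: a parse failure that records its location."""

    def __init__(self, message, line_num=None, col_num=None):
        self.line_num, self.col_num = line_num, col_num
        where = f" (line {line_num}, column {col_num})" if line_num else ""
        super().__init__(message + where)


def check_newick_parens(text):
    """Raise DataParseError at the first unbalanced parenthesis (sketch only)."""
    open_stack = []                                  # positions of unmatched '('
    for line_num, line in enumerate(text.splitlines(), start=1):
        for col_num, ch in enumerate(line, start=1):
            if ch == "(":
                open_stack.append((line_num, col_num))
            elif ch == ")":
                if not open_stack:
                    raise DataParseError("unexpected ')'", line_num, col_num)
                open_stack.pop()
    if open_stack:
        line_num, col_num = open_stack[-1]
        raise DataParseError("unclosed '('", line_num, col_num)


try:
    check_newick_parens("((A,B),\n(C,D);")
except DataParseError as exc:
    print(f"Could not parse tree: {exc}")
```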