# DendroPy

DendroPy is a Python library for phylogenetic computing. It provides functionality for reading, writing, simulating, processing, and manipulating phylogenetic trees and character matrices. It reads and writes phylogenetic data in multiple formats (NEXUS, Newick, NeXML, PHYLIP, FASTA), and can function as a stand-alone library for phylogenetics, as a component of multi-library phyloinformatic pipelines, or as scripting "glue" that assembles and drives such pipelines.

## Package Information

- **Package Name**: DendroPy
- **Language**: Python
- **Installation**: `pip install dendropy` or `conda install -c conda-forge dendropy`
- **Documentation**: https://jeetsukumaran.github.io/DendroPy/

## Core Imports

```python
import dendropy
```

Most common usage patterns:

```python
# Core data model classes
from dendropy import Tree, Node, Taxon, TaxonNamespace
from dendropy import DataSet, TreeList

# Character matrices
from dendropy import DnaCharacterMatrix, ProteinCharacterMatrix
from dendropy import StandardCharacterMatrix, ContinuousCharacterMatrix

# Predefined alphabets
from dendropy import DNA_STATE_ALPHABET, PROTEIN_STATE_ALPHABET

# Tree analysis and comparison (from submodules)
from dendropy.calculate.treecompare import unweighted_robinson_foulds_distance
from dendropy.calculate.phylogeneticdistance import PhylogeneticDistanceMatrix
```

## Basic Usage

```python
import dendropy
from dendropy.calculate import treecompare

# Read a tree from a file
tree = dendropy.Tree.get(path="example.nwk", schema="newick")
print(f"Tree has {len(tree.leaf_nodes())} taxa")

# Read multiple trees
trees = dendropy.TreeList.get(path="trees.nex", schema="nexus")
print(f"Read {len(trees)} trees")

# Create a simple tree manually: (A,(B,(C,D)))
tns = dendropy.TaxonNamespace(["A", "B", "C", "D"])
tree = dendropy.Tree(taxon_namespace=tns)
root = tree.seed_node
n1 = root.new_child(taxon=tns[0])  # Leaf A
n2 = root.new_child()
n2.new_child(taxon=tns[1])  # Leaf B
n3 = n2.new_child()
n3.new_child(taxon=tns[2])  # Leaf C
n3.new_child(taxon=tns[3])  # Leaf D

# Work with character data
dna = dendropy.DnaCharacterMatrix.get(path="sequences.fasta", schema="fasta")
print(f"Matrix has {len(dna)} sequences of length {dna.max_sequence_size}")

# Basic tree analysis: the compared trees must share one TaxonNamespace
tree1 = dendropy.Tree.get(data="((A,B),(C,D));", schema="newick", taxon_namespace=tns)
tree2 = dendropy.Tree.get(data="((A,C),(B,D));", schema="newick", taxon_namespace=tns)
rf_distance = treecompare.unweighted_robinson_foulds_distance(tree1, tree2)
```

## Architecture

DendroPy is built around a hierarchical object model that mirrors phylogenetic data structures:

- **TaxonNamespace**: Central registry of operational taxonomic units
- **Tree**: Phylogenetic tree with nodes and edges, supporting rich metadata
- **Node**: Tree vertices with parent-child relationships and optional taxon associations
- **DataSet**: Container that can hold multiple trees and character matrices with shared taxon namespaces
- **CharacterMatrix**: Alignment data for various sequence types (DNA, protein, morphological)
- **I/O Framework**: Pluggable readers and writers for all major phylogenetic file formats

Because trees and character matrices reference taxa through a shared namespace, tree and character data can be combined directly, taxon bookkeeping is handled automatically, and data can be converted between phylogenetic file formats.
81
82
## Capabilities
83
84
### Core Data Models
85
86
Fundamental phylogenetic data structures including trees, nodes, taxa, character matrices, and datasets. These classes form the foundation of DendroPy's phylogenetic computing capabilities.
87
88
```python { .api }
89
class Tree:
90
def __init__(self, taxon_namespace=None): ...
91
def get(cls, **kwargs): ...
92
def read(self, **kwargs): ...
93
def write(self, **kwargs): ...
94
def nodes(self): ...
95
def leaf_nodes(self): ...
96
def internal_nodes(self): ...
97
def find_node(self, filter_fn): ...
98
def mrca(self, **kwargs): ...
99
100
class Node:
101
def __init__(self, **kwargs): ...
102
def add_child(self, node): ...
103
def new_child(self, **kwargs): ...
104
def remove_child(self, node): ...
105
def leaf_nodes(self): ...
106
def preorder_iter(self): ...
107
108
class Taxon:
109
def __init__(self, label=None): ...
110
111
class TaxonNamespace:
112
def __init__(self, taxa=None): ...
113
def new_taxon(self, label): ...
114
def get_taxon(self, label): ...
115
```
116
117
[Core Data Models](./core-data-models.md)
118
119
### Data Input/Output
120
121
Comprehensive I/O framework supporting all major phylogenetic file formats with configurable reading and writing options. Handles NEXUS, Newick, NeXML, FASTA, PHYLIP, and more.
122
123
```python { .api }
124
# Tree I/O
125
@classmethod
126
def get(cls, **kwargs): ...
127
def read(self, **kwargs): ...
128
def write(self, **kwargs): ...
129
def write_to_stream(self, dest, schema, **kwargs): ...
130
131
# Factory functions for readers/writers
132
def get_reader(schema, **kwargs): ...
133
def get_writer(schema, **kwargs): ...
134
def get_tree_yielder(files, schema, **kwargs): ...
135
```
136
137
[Data I/O](./data-io.md)

### Tree Analysis & Comparison

Phylogenetic tree analysis including distance calculations, tree comparison metrics, summarization algorithms, and topological analysis. Supports Robinson-Foulds distances, consensus trees, and statistical comparisons.

```python { .api }
# dendropy.calculate.treecompare
def symmetric_difference(tree1, tree2): ...
def unweighted_robinson_foulds_distance(tree1, tree2): ...
def weighted_robinson_foulds_distance(tree1, tree2): ...
def euclidean_distance(tree1, tree2): ...

# dendropy.calculate.phylogeneticdistance
class PhylogeneticDistanceMatrix:
    def __init__(self, tree): ...
    def patristic_distances(self): ...

# dendropy.calculate.treesum
class TreeSummarizer:
    def __init__(self): ...
    def summarize(self, trees): ...
```

[Tree Analysis](./tree-analysis.md)

### Character Data & Evolution

Character matrices for molecular and morphological data, state alphabets, and evolutionary models. Supports DNA, RNA, protein, restriction sites, standard morphological, and continuous character data.

```python { .api }
class DnaCharacterMatrix:
    def __init__(self, **kwargs): ...
    @classmethod
    def get(cls, **kwargs): ...
    def read(self, **kwargs): ...
    def concatenate(self, other_matrices): ...

class ProteinCharacterMatrix:
    def __init__(self, **kwargs): ...

# Predefined alphabets
DNA_STATE_ALPHABET: StateAlphabet
PROTEIN_STATE_ALPHABET: StateAlphabet
BINARY_STATE_ALPHABET: StateAlphabet

def new_standard_state_alphabet(symbols): ...
```

[Character Data](./character-data.md)

### Simulation & Modeling

Tree simulation using birth-death and coalescent processes, character evolution simulation, and phylogenetic modeling. Includes population genetics simulation and parametric evolutionary models.

```python { .api }
# dendropy.simulate.treesim
def birth_death_tree(birth_rate, death_rate, **kwargs): ...
def pure_kingman_tree(taxon_namespace, pop_size): ...
def uniform_pure_birth_tree(taxon_namespace, birth_rate): ...

# dendropy.model.discrete (sequence length comes first, then the tree)
def simulate_discrete_char_dataset(seq_len, tree_model, seq_model, **kwargs): ...
def hky85_chars(seq_len, tree_model, **kwargs): ...

class DiscreteCharacterEvolver:
    def __init__(self, **kwargs): ...
    def evolve_states(self, tree): ...
```

[Simulation](./simulation.md)

### Visualization & Interoperability

Tree visualization, plotting, and integration with external phylogenetic software and databases. Supports ASCII tree plots, LaTeX TikZ output, and interfaces to PAUP*, RAxML, R, and other tools.

```python { .api }
class AsciiTreePlot:
    def __init__(self, **kwargs): ...
    def compose(self, tree): ...

class TikzTreePlot:
    def __init__(self, **kwargs): ...
    def compose(self, tree): ...

# External software interfaces
class PaupRunner:
    def __init__(self): ...
    def run(self, commands): ...

class RaxmlRunner:
    def __init__(self): ...
    def estimate_tree(self, char_matrix): ...
```

[Visualization & Interoperability](./visualization-interop.md)

## Types

```python { .api }
# Core type aliases and constants
from typing import List, Dict, Optional, Union, Callable, Iterator

TreeList = List[Tree]
NodeList = List[Node]
TaxonList = List[Taxon]

# State alphabet constants
DNA_STATE_ALPHABET: StateAlphabet
RNA_STATE_ALPHABET: StateAlphabet
NUCLEOTIDE_STATE_ALPHABET: StateAlphabet
PROTEIN_STATE_ALPHABET: StateAlphabet
BINARY_STATE_ALPHABET: StateAlphabet
RESTRICTION_SITES_STATE_ALPHABET: StateAlphabet
INFINITE_SITES_STATE_ALPHABET: StateAlphabet
```