# Tree Analysis & Comparison

This section covers phylogenetic tree analysis: distance calculations, tree comparison metrics, summarization algorithms, and topological analysis. DendroPy provides tools for comparing trees, calculating phylogenetic distances, and summarizing tree collections.

## Capabilities

### Tree Comparison Metrics

Functions for comparing phylogenetic trees using various distance metrics and topological measures.

```python { .api }
# Import tree comparison functions
from dendropy.calculate.treecompare import (
    symmetric_difference,
    unweighted_robinson_foulds_distance,
    weighted_robinson_foulds_distance,
    euclidean_distance
)

# Tree comparison functions
def symmetric_difference(tree1, tree2, is_bipartitions_updated=False):
    """
    Calculate symmetric difference between two trees.

    Parameters:
    - tree1, tree2: Tree objects to compare
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    int: Number of bipartitions in one tree but not the other
    """

def unweighted_robinson_foulds_distance(tree1, tree2, is_bipartitions_updated=False):
    """
    Calculate unweighted Robinson-Foulds distance between trees.

    Parameters:
    - tree1, tree2: Tree objects to compare
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    int: Robinson-Foulds distance (0 = identical topologies)
    """

def weighted_robinson_foulds_distance(tree1, tree2, edge_weight_attr="length", is_bipartitions_updated=False):
    """
    Calculate weighted Robinson-Foulds distance using branch lengths.

    Parameters:
    - tree1, tree2: Tree objects to compare
    - edge_weight_attr: Attribute name for edge weights (default: "length")
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    float: Weighted Robinson-Foulds distance
    """

def euclidean_distance(tree1, tree2, edge_weight_attr="length", is_bipartitions_updated=False, value_type=float):
    """
    Calculate Euclidean distance between trees based on branch lengths.

    Parameters:
    - tree1, tree2: Tree objects to compare
    - edge_weight_attr: Attribute name for edge weights
    - is_bipartitions_updated: Whether bipartitions are already calculated
    - value_type: Type for calculations (float, Decimal, etc.)

    Returns:
    float: Euclidean distance between tree vectors
    """

def false_positives_and_negatives(reference_tree, comparison_tree, is_bipartitions_updated=False):
    """
    Calculate false positives and false negatives when comparing trees.

    Parameters:
    - reference_tree: Reference (true) tree
    - comparison_tree: Tree being evaluated
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    tuple: (false_positives, false_negatives)
    """

def find_missing_bipartitions(reference_tree, comparison_tree, is_bipartitions_updated=False):
    """
    Find bipartitions present in reference but missing in comparison tree.

    Parameters:
    - reference_tree: Reference tree with expected bipartitions
    - comparison_tree: Tree to check for missing bipartitions
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    set: Set of Bipartition objects missing from comparison tree
    """

def mason_gamer_kellogg_score(tree1, tree2, is_bipartitions_updated=False):
    """
    Calculate Mason-Gamer-Kellogg score between two trees.

    Parameters:
    - tree1, tree2: Tree objects to compare
    - is_bipartitions_updated: Whether bipartitions are already calculated

    Returns:
    float: MGK score measuring conflict between the two trees
    """
```
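
To make the bipartition-based metrics above concrete, here is a minimal standalone sketch of what a symmetric difference and its false-positive/false-negative breakdown count. It does not call DendroPy: the trees are hypothetical and are written by hand as lists of splits, and the helper names `canonical_split` and `compare_splits` are illustrative assumptions, not library functions.

```python
def canonical_split(side, taxa):
    """Return the side of a split that excludes the alphabetically first taxon."""
    side = frozenset(side)
    anchor = min(taxa)
    return frozenset(taxa) - side if anchor in side else side


def compare_splits(reference, comparison, taxa):
    """Return (symmetric_difference, false_positives, false_negatives)."""
    ref = {canonical_split(s, taxa) for s in reference}
    other = {canonical_split(s, taxa) for s in comparison}
    false_positives = len(other - ref)   # splits found only in the comparison tree
    false_negatives = len(ref - other)   # splits found only in the reference tree
    return false_positives + false_negatives, false_positives, false_negatives


taxa = {"A", "B", "C", "D", "E"}
# Hypothetical unrooted trees, reduced to their internal splits:
# ((A,B),C,(D,E)) has splits AB|CDE and DE|ABC
reference_splits = [{"A", "B"}, {"D", "E"}]
# ((A,C),B,(D,E)) has splits AC|BDE and DE|ABC
comparison_splits = [{"A", "C"}, {"D", "E"}]

print(compare_splits(reference_splits, comparison_splits, taxa))  # (2, 1, 1)
```

Canonicalizing each split to one side means `AB|CDE` and `CDE|AB` compare as equal, which is why two descriptions of the same unrooted tree give a distance of 0.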

### Phylogenetic Distance Matrices

Classes for calculating and storing various types of phylogenetic distances.

```python { .api }
class PhylogeneticDistanceMatrix:
    """
    Matrix of phylogenetic distances between taxa.

    Parameters:
    - tree: Tree object for distance calculations
    - taxon_namespace: TaxonNamespace for matrix indexing
    """

    def __init__(self, tree=None, taxon_namespace=None): ...

    def patristic_distances(self, tree):
        """
        Calculate patristic (tree path) distances between all taxa.

        Parameters:
        - tree: Tree for distance calculation

        Returns:
        None (updates internal distance matrix)
        """

    def path_distance(self, taxon1, taxon2):
        """
        Get path distance between two specific taxa.

        Parameters:
        - taxon1, taxon2: Taxon objects

        Returns:
        float: Path distance between taxa
        """

    def max_dist(self):
        """Get maximum distance in matrix."""

    def mean_dist(self):
        """Get mean distance in matrix."""

    def distances(self):
        """Iterator over all pairwise distances."""

    def taxon_pairs(self):
        """Iterator over all taxon pairs."""

class PatristicDistanceMatrix(PhylogeneticDistanceMatrix):
    """Specialized matrix for patristic distances."""

    def __init__(self, tree): ...

class NodeDistanceMatrix:
    """
    Matrix of distances between tree nodes.

    Parameters:
    - tree: Tree object for node distance calculations
    """

    def __init__(self, tree=None): ...

    def distances(self, node1, node2):
        """Get distance between two nodes."""
```
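
A patristic distance is the sum of branch lengths along the path between two tips. The sketch below computes it for a small hypothetical tree stored as a plain dictionary, to illustrate what the matrix classes above hold. It is independent of DendroPy; `PARENTS`, `distances_to_ancestors`, and `patristic_distance` are names made up for this example, and positive branch lengths are assumed.

```python
from itertools import combinations

# Hypothetical rooted tree stored as child -> (parent, branch length).
PARENTS = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}


def distances_to_ancestors(node, parents):
    """Map the node and each of its ancestors to the path length from the node."""
    total, result = 0.0, {node: 0.0}
    while node in parents:
        node, length = parents[node]   # step up to the parent
        total += length
        result[node] = total
    return result


def patristic_distance(a, b, parents):
    """Sum of branch lengths on the path between a and b."""
    up_a = distances_to_ancestors(a, parents)
    up_b = distances_to_ancestors(b, parents)
    # The closest shared ancestor (MRCA) gives the shortest combined path.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


leaves = ["A", "B", "C"]
matrix = {pair: patristic_distance(*pair, PARENTS) for pair in combinations(leaves, 2)}
print(matrix)                 # {('A', 'B'): 3.0, ('A', 'C'): 4.5, ('B', 'C'): 5.5}
print(max(matrix.values()))   # 5.5
```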

### Tree Summarization

Tools for summarizing collections of trees and extracting consensus information.

```python { .api }
class TreeSummarizer:
    """
    Summarizes collections of trees into consensus trees and statistics.

    Parameters:
    - taxon_namespace: TaxonNamespace for summary trees
    """

    def __init__(self, taxon_namespace=None): ...

    def summarize(self, trees, min_freq=0.5):
        """
        Create consensus tree from tree collection.

        Parameters:
        - trees: Iterable of Tree objects
        - min_freq: Minimum frequency for bipartition inclusion

        Returns:
        Tree: Consensus tree with support values
        """

    def map_support_as_node_ages(self, tree, trees):
        """Map bipartition support values as node ages on tree."""

    def map_support_as_node_labels(self, tree, trees, label_format="%.2f"):
        """Map bipartition support values as node labels."""

class TopologyCounter:
    """
    Counts and analyzes tree topologies in collections.

    Parameters:
    - taxon_namespace: TaxonNamespace for topology comparison
    """

    def __init__(self, taxon_namespace=None): ...

    def count(self, trees, topology_counter=None):
        """
        Count unique topologies in tree collection.

        Parameters:
        - trees: Iterable of Tree objects
        - topology_counter: Optional existing counter to update

        Returns:
        dict: Mapping from topology to count
        """

    def topology_frequencies(self, trees):
        """Get frequencies of different topologies."""

    def unique_topologies(self, trees):
        """Get set of unique topologies."""
```
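
The idea behind a `min_freq` threshold can be sketched without the library: count how often each split occurs across a tree sample and keep the splits that reach the threshold. The code below is an illustrative assumption of that logic, not DendroPy's implementation; the trees are hypothetical and already reduced to sets of splits, and `split_frequencies` and `consensus_splits` are invented helper names.

```python
from collections import Counter


def split_frequencies(trees):
    """Return the fraction of trees in which each split appears."""
    counts = Counter()
    for splits in trees:
        counts.update(set(splits))   # count each split at most once per tree
    return {split: n / len(trees) for split, n in counts.items()}


def consensus_splits(trees, min_freq=0.5):
    """Keep splits whose frequency is at least min_freq."""
    return {s for s, f in split_frequencies(trees).items() if f >= min_freq}


AB, AC, DE = frozenset("AB"), frozenset("AC"), frozenset("DE")
# Four hypothetical trees on taxa A-E, each reduced to its internal splits.
trees = [[AB, DE], [AB, DE], [AB, DE], [AC, DE]]

support = split_frequencies(trees)
print(support[AB], support[DE], support[AC])   # 0.75 1.0 0.25
print(consensus_splits(trees) == {AB, DE})     # True
```

The per-split frequencies are the support values that a consensus tree would carry on its internal edges.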

### Bipartition Analysis

Classes and functions for working with bipartitions (splits) in phylogenetic trees.

```python { .api }
class Bipartition:
    """
    Represents a bipartition (split) in a phylogenetic tree.

    Parameters:
    - taxon_namespace: TaxonNamespace defining taxa
    - bitmask: Bitmask representing the split
    """

    def __init__(self, taxon_namespace=None, **kwargs): ...

    def split_as_newick_string(self, taxon_namespace=None):
        """Return bipartition as Newick string."""

    def leafset_as_newick_string(self, taxon_namespace=None):
        """Return leaf set as Newick string."""

    def is_compatible_with(self, other_bipartition):
        """Check if bipartition is compatible with another."""

    def is_nested_within(self, other_bipartition):
        """Check if bipartition is nested within another."""

def encode_bipartitions(tree):
    """
    Encode all bipartitions in tree.

    Parameters:
    - tree: Tree object to encode

    Returns:
    None (adds bipartition encoding to tree nodes)
    """

def update_bipartitions(tree, suppress_unifurcations=True, collapse_unrooted_basal_bifurcation=True):
    """Update bipartition encoding on tree."""
```
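
Since a bipartition is described above as a bitmask, a short sketch may help show how a compatibility check can work on such masks. This is a generic illustration rather than DendroPy's code: the taxon order in `TAXA` and the helpers `split_mask` and `is_compatible` are assumptions made for the example. Two splits can coexist in one tree exactly when at least one of the four pairwise intersections of their sides is empty.

```python
TAXA = ["A", "B", "C", "D", "E"]   # hypothetical taxon order; A is bit 0
ALL_TAXA = (1 << len(TAXA)) - 1    # 0b11111, one bit per taxon


def split_mask(names):
    """Encode one side of a split as a bitmask over TAXA."""
    mask = 0
    for name in names:
        mask |= 1 << TAXA.index(name)
    return mask


def is_compatible(a, b, all_taxa=ALL_TAXA):
    """True if one of the four side-by-side intersections is empty."""
    not_a, not_b = all_taxa & ~a, all_taxa & ~b
    return 0 in (a & b, a & not_b, not_a & b, not_a & not_b)


ab, abc, ac = split_mask("AB"), split_mask("ABC"), split_mask("AC")
print(bin(ab))                  # 0b11
print(is_compatible(ab, abc))   # True: AB|CDE nests inside ABC|DE
print(is_compatible(ab, ac))    # False: AB|CDE conflicts with AC|BDE
```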

### Tree Shape and Balance Analysis

Functions for analyzing tree shape, balance, and other topological properties.

```python { .api }
def colless_tree_imbalance(tree, normalize="max"):
    """
    Calculate Colless index of tree imbalance.

    Parameters:
    - tree: Tree object to analyze
    - normalize: Normalization method ("max", "yule", or None)

    Returns:
    float: Colless imbalance index
    """

def sackin_index(tree, normalize=True):
    """
    Calculate Sackin index of tree balance.

    Parameters:
    - tree: Tree object to analyze
    - normalize: Whether to normalize by number of leaves

    Returns:
    float: Sackin balance index
    """

def b1_index(tree):
    """
    Calculate B1 balance index.

    Parameters:
    - tree: Tree object to analyze

    Returns:
    float: B1 balance index
    """

def treeness(tree):
    """
    Calculate treeness (proportion of internal edge length).

    Parameters:
    - tree: Tree object with branch lengths

    Returns:
    float: Treeness value (0-1)
    """

def resolution(tree):
    """
    Calculate tree resolution (proportion of internal nodes that are bifurcating).

    Parameters:
    - tree: Tree object to analyze

    Returns:
    float: Resolution value (0-1)
    """
```
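
As an illustration of what an imbalance index measures, the sketch below computes the Colless index on a binary tree written as nested 2-tuples: at every internal node it adds the absolute difference between the leaf counts of the two children. The tuple representation and the function names are assumptions for this example only, and the "max" normalization shown divides by the value of a fully pectinate (caterpillar) tree with the same number of leaves.

```python
def leaves_and_colless(tree):
    """Return (leaf count, Colless sum) for a binary tree of nested 2-tuples."""
    if not isinstance(tree, tuple):
        return 1, 0   # a leaf contributes no imbalance
    left, right = tree
    n_left, c_left = leaves_and_colless(left)
    n_right, c_right = leaves_and_colless(right)
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def colless_index(tree, normalize=True):
    """Colless sum, optionally divided by its maximum for this leaf count."""
    n, total = leaves_and_colless(tree)
    if not normalize:
        return total
    max_total = (n - 1) * (n - 2) / 2   # reached by a fully pectinate tree
    return total / max_total if max_total else 0.0


balanced = (("A", "B"), ("C", "D"))
pectinate = ((("A", "B"), "C"), "D")
print(colless_index(balanced, normalize=False))    # 0
print(colless_index(pectinate, normalize=False))   # 3
print(colless_index(pectinate))                    # 1.0
```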

### Population Genetics Statistics

Statistical calculations relevant to population genetics and phylogeography.

```python { .api }
class PopulationPairSummaryStatistics:
    """
    Summary statistics for pairs of populations.

    Parameters:
    - pop1_nodes: Nodes representing population 1
    - pop2_nodes: Nodes representing population 2
    """

    def __init__(self, pop1_nodes, pop2_nodes): ...

    def fst(self):
        """Calculate Fst between populations."""

    def average_number_of_pairwise_differences(self):
        """Calculate average pairwise differences."""

    def average_number_of_pairwise_differences_between(self):
        """Calculate average differences between populations."""

    def average_number_of_pairwise_differences_within(self):
        """Calculate average differences within populations."""

    def num_segregating_sites(self):
        """Count segregating sites."""

    def wattersons_theta(self):
        """Calculate Watterson's theta."""

    def tajimas_d(self):
        """Calculate Tajima's D."""
```
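
Three of the statistics listed above have short textbook definitions that can be sketched on a toy alignment. The code below is a standalone illustration, not DendroPy's implementation: the sequences are invented, and the free functions merely reuse the method names for readability. Watterson's theta here is the number of segregating sites divided by the harmonic number of n-1, where n is the sample size.

```python
from itertools import combinations


def num_segregating_sites(seqs):
    """Count alignment columns that contain more than one state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def average_number_of_pairwise_differences(seqs):
    """Mean number of differing sites over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def wattersons_theta(seqs):
    """Segregating sites divided by a1 = 1 + 1/2 + ... + 1/(n-1)."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return num_segregating_sites(seqs) / a1


# Hypothetical aligned sample of four sequences.
seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(num_segregating_sites(seqs))                    # 2
print(average_number_of_pairwise_differences(seqs))   # 1.0
print(round(wattersons_theta(seqs), 4))               # 1.0909
```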

### Statistical Functions

General statistical and probability functions used in phylogenetic analysis.

```python { .api }
def mean_and_sample_variance(values):
    """
    Calculate mean and sample variance.

    Parameters:
    - values: Iterable of numeric values

    Returns:
    tuple: (mean, sample_variance)
    """

def mean_and_population_variance(values):
    """
    Calculate mean and population variance.

    Parameters:
    - values: Iterable of numeric values

    Returns:
    tuple: (mean, population_variance)
    """

def mode(values):
    """
    Find mode (most frequent value).

    Parameters:
    - values: Iterable of values

    Returns:
    Value that appears most frequently
    """

def median(values):
    """
    Calculate median value.

    Parameters:
    - values: Iterable of numeric values

    Returns:
    float: Median value
    """

def quantile(sorted_values, q):
    """
    Calculate quantile of sorted data.

    Parameters:
    - sorted_values: Sorted list of values
    - q: Quantile to calculate (0.0-1.0)

    Returns:
    float: Quantile value
    """

def empirical_hpd(samples, level=0.95):
    """
    Calculate empirical highest posterior density interval.

    Parameters:
    - samples: List of sample values
    - level: Credibility level (default 0.95)

    Returns:
    tuple: (lower_bound, upper_bound)
    """

class FishersExactTest:
    """Fisher's exact test for contingency tables."""

    def __init__(self, a, b, c, d): ...
    def left_tail_p(self): ...
    def right_tail_p(self): ...
    def two_tail_p(self): ...
```
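
One common way to estimate an empirical HPD interval for a unimodal sample is to take the shortest window of sorted values that contains the requested fraction of the samples. The sketch below shows that approach under that assumption. Whether the library function uses exactly this rule is not stated here, so the helper is deliberately given a different, hypothetical name (`shortest_interval`).

```python
import math


def shortest_interval(samples, level=0.95):
    """Return (lower, upper) of the narrowest window holding `level` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(level * n))   # number of samples inside the interval
    # Slide the window over the sorted values and keep the narrowest one.
    start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[start], ordered[start + window - 1]


# Hypothetical posterior sample with one outlier.
samples = [1.3, 9.0, 1.0, 1.2, 1.1]
print(shortest_interval(samples, level=0.8))   # (1.0, 1.3)
```

Unlike an equal-tailed quantile interval, this window is free to sit entirely on one side of the sample, so a single extreme value does not stretch it.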