# NCBI Taxonomy Integration

ETE3 integrates with the NCBI Taxonomy database for taxonomic annotation, lineage retrieval, species tree construction, and taxonomic analysis. Taxonomy data is cached in a local database and can be queried directly or turned into trees.

## Capabilities

### NCBITaxa Class

Main interface for accessing and working with NCBI Taxonomy data.

```python { .api }
class NCBITaxa:
    """
    Interface to NCBI Taxonomy database with local caching and tree integration.
    """

    def __init__(self, dbfile=None, taxdump_file=None, update=True):
        """
        Initialize NCBI Taxonomy database interface.

        Parameters:
        - dbfile (str): Path to local taxonomy database file
                       If None, uses default location (~/.etetoolkit/taxa.sqlite)
        - taxdump_file (str): Path to custom taxdump file for database initialization
        - update (bool): Whether to automatically update database if outdated
        """
```

### Database Management

Manage local taxonomy database and updates.

```python { .api }
def update_taxonomy_database(self):
    """
    Update local NCBI taxonomy database with latest data.
    Downloads and processes current NCBI taxonomy dump files.
    """

def get_topology(self, taxids, intermediate_nodes=False, rank_limit=None, annotate=True):
    """
    Build taxonomic tree from list of taxonomic IDs.

    Parameters:
    - taxids (list): List of NCBI taxonomic IDs
    - intermediate_nodes (bool): Include intermediate taxonomic nodes
    - rank_limit (str): Limit tree to specific taxonomic rank
    - annotate (bool): Annotate nodes with taxonomic information

    Returns:
    Tree: Taxonomic tree with specified taxa
    """
```

### Taxonomic ID Translation

Convert between taxonomic names and NCBI taxonomic IDs.

```python { .api }
def get_name_translator(self, names):
    """
    Translate organism names to NCBI taxonomic IDs.

    Parameters:
    - names (list): List of organism names to translate

    Returns:
    dict: Mapping from each matched name to a list of taxonomic IDs
    """

def get_taxid_translator(self, taxids):
    """
    Translate NCBI taxonomic IDs to organism names.

    Parameters:
    - taxids (list): List of taxonomic IDs to translate

    Returns:
    dict: Mapping from taxonomic IDs to names
    """

def translate_to_names(self, taxids):
    """
    Convert taxonomic IDs to scientific names.

    Parameters:
    - taxids (list): List of taxonomic IDs

    Returns:
    list: List of corresponding scientific names
    """

def get_fuzzy_name_translation(self, name, sim=0.9):
    """
    Fuzzy matching of a single organism name to a taxonomic ID.

    Parameters:
    - name (str): Organism name (may contain typos/variations)
    - sim (float): Similarity threshold (0.0-1.0)

    Returns:
    tuple: (taxid, matched name, match score) for the best match
    """
```
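
Because a name can map to more than one taxid, and names that are not found are simply absent from the result, it can help to wrap the lookup. The helper below is a minimal sketch, not part of ETE3: `resolve_taxids` is a hypothetical name, it assumes `get_name_translator` returns a list of taxids per matched name, and it arbitrarily keeps the first hit.

```python
def resolve_taxids(ncbi, names):
    """Return ({name: taxid}, [unresolved names]) for a list of organism names.

    `ncbi` is any object exposing get_name_translator(names), such as an
    NCBITaxa instance. This helper is illustrative and not part of ETE3.
    """
    translated = ncbi.get_name_translator(names)
    resolved, missing = {}, []
    for name in names:
        hits = translated.get(name) or []
        if hits:
            # Assumption: the first taxid is good enough; ambiguous names
            # may need manual review instead.
            resolved[name] = hits[0]
        else:
            missing.append(name)
    return resolved, missing
```

With a real database this would be called as `resolve_taxids(NCBITaxa(), ['Homo sapiens', 'Pan troglodytes'])`; checking `missing` before building a tree avoids a `KeyError` later on.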

### Taxonomic Hierarchy and Lineages

Retrieve taxonomic classifications and hierarchical relationships.

```python { .api }
def get_lineage(self, taxid):
    """
    Get complete taxonomic lineage for a taxonomic ID.

    Parameters:
    - taxid (int): NCBI taxonomic ID

    Returns:
    list: List of taxonomic IDs from root to target taxon
    """

def get_rank(self, taxids):
    """
    Get taxonomic ranks for taxonomic IDs.

    Parameters:
    - taxids (list): List of taxonomic IDs

    Returns:
    dict: Mapping from taxonomic IDs to their ranks
    """

def get_common_names(self, taxids):
    """
    Get common names for taxonomic IDs.

    Parameters:
    - taxids (list): List of taxonomic IDs

    Returns:
    dict: Mapping from taxonomic IDs to common names
    """

def get_descendant_taxa(self, parent, collapse_subspecies=False, rank_limit=None):
    """
    Get all descendant taxa for a parent taxonomic ID.

    Parameters:
    - parent (int): Parent taxonomic ID
    - collapse_subspecies (bool): Exclude subspecies level taxa
    - rank_limit (str): Only include taxa at or above specified rank

    Returns:
    list: List of descendant taxonomic IDs
    """
```
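
These calls are often combined to turn a lineage into a readable classification. The sketch below is one way to do that, under stated assumptions: `lineage_by_rank` is a hypothetical helper (not an ETE3 function), and it assumes unranked nodes are labelled `"no rank"`, which is why that label is skipped by default.

```python
def lineage_by_rank(ncbi, taxid, skip_ranks=("no rank",)):
    """Map each named rank in a taxon's lineage to its scientific name.

    `ncbi` is any object exposing get_lineage, get_rank and
    get_taxid_translator, such as an NCBITaxa instance.
    """
    lineage = ncbi.get_lineage(taxid)            # taxids, root first
    ranks = ncbi.get_rank(lineage)               # taxid -> rank
    names = ncbi.get_taxid_translator(lineage)   # taxid -> name
    return {
        ranks[t]: names[t]
        for t in lineage
        if t in ranks and t in names and ranks[t] not in skip_ranks
    }
```

Because the dictionary is built in lineage order, iterating over it walks from the broadest rank down to the target taxon.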

### Tree Annotation

Annotate phylogenetic trees with taxonomic information.

```python { .api }
def annotate_tree(self, tree, taxid_attr="species", tax2name=None, tax2track=None):
    """
    Annotate tree nodes with taxonomic information.
    The tree is modified in place.

    Parameters:
    - tree (Tree): Tree to annotate
    - taxid_attr (str): Node attribute containing taxonomic information
    - tax2name (dict): Custom mapping from taxids to names
    - tax2track (dict): Additional attributes to track

    Returns:
    tuple: Lookup dictionaries built during annotation (tax2name, tax2track, tax2rank)
    """
```

## Taxonomic Analysis Functions

### Species Tree Construction

```python { .api }
def get_broken_branches(self, tree, species_attr="species"):
    """
    Identify branches that break species monophyly.

    Parameters:
    - tree (Tree): Input phylogenetic tree
    - species_attr (str): Node attribute containing species information

    Returns:
    list: List of branches breaking monophyly
    """

def annotate_tree_with_taxa(self, tree, taxid_attr="name", tax2name=None):
    """
    Add taxonomic annotations to all tree nodes.

    Parameters:
    - tree (Tree): Tree to annotate
    - taxid_attr (str): Attribute containing taxonomic identifiers
    - tax2name (dict): Custom taxonomic ID to name mapping

    Returns:
    Tree: Tree with taxonomic annotations added
    """
```

## Usage Examples

### Basic Taxonomy Operations

```python
from ete3 import NCBITaxa

# Initialize NCBI taxonomy
ncbi = NCBITaxa()

# Translate names to taxonomic IDs (each name maps to a list of taxids)
name2taxid = ncbi.get_name_translator(['Homo sapiens', 'Pan troglodytes', 'Gorilla gorilla'])
print(f"Human taxid: {name2taxid['Homo sapiens'][0]}")

# Translate taxonomic IDs to names
taxid2name = ncbi.get_taxid_translator([9606, 9598, 9593])
print(f"Taxid 9606: {taxid2name[9606]}")

# Get taxonomic lineage
lineage = ncbi.get_lineage(9606)  # Human
print(f"Human lineage: {lineage}")

# Get ranks for lineage
ranks = ncbi.get_rank(lineage)
for taxid in lineage:
    print(f"{taxid}: {ranks[taxid]}")
```

### Building Taxonomic Trees

```python
from ete3 import NCBITaxa

ncbi = NCBITaxa()

# Create taxonomic tree from species list
species_names = ['Homo sapiens', 'Pan troglodytes', 'Gorilla gorilla', 'Macaca mulatta']
name2taxid = ncbi.get_name_translator(species_names)
# Each name maps to a list of taxids; take the first match
taxids = [name2taxid[name][0] for name in species_names]

# Build taxonomic tree
tree = ncbi.get_topology(taxids)
print(tree.get_ascii())

# Include intermediate nodes for complete taxonomy
full_tree = ncbi.get_topology(taxids, intermediate_nodes=True)
print(full_tree.get_ascii())
```

### Tree Annotation

```python
from ete3 import PhyloTree, NCBITaxa

# Create phylogenetic tree
tree = PhyloTree("(9606:1,(9598:0.5,9593:0.5):0.5);")  # Using taxids as names

# Initialize NCBI taxonomy
ncbi = NCBITaxa()

# Annotate tree with taxonomic information.
# The tree is annotated in place; the return value holds lookup
# dictionaries, not a tree, so keep working with `tree`.
ncbi.annotate_tree(tree, taxid_attr="name")

# Access taxonomic information
for node in tree.traverse():
    if hasattr(node, 'sci_name'):
        print(f"Node {node.name}: {node.sci_name} ({node.rank})")
```

### Fuzzy Name Matching

```python
from ete3 import NCBITaxa

ncbi = NCBITaxa()

# Handle names with potential typos or variations.
# Names are matched one at a time; each call returns
# (taxid, matched_name, score), with taxid None when nothing matches.
fuzzy_names = ['Homo sapian', 'chimpanzee', 'gorill']

for name in fuzzy_names:
    taxid, matched_name, score = ncbi.get_fuzzy_name_translation(name, sim=0.8)
    if taxid is None:
        print(f"'{name}' -> no match")
    else:
        print(f"'{name}' -> {taxid} ({matched_name}, score {score:.2f})")
```

### Advanced Taxonomic Analysis

```python
from ete3 import NCBITaxa, PhyloTree

ncbi = NCBITaxa()

# Get all primates (the translator returns a list of taxids per name)
primate_taxid = ncbi.get_name_translator(['Primates'])['Primates'][0]
primate_descendants = ncbi.get_descendant_taxa(primate_taxid, rank_limit='species')

# Create comprehensive primate tree
primate_tree = ncbi.get_topology(primate_descendants[:50])  # Limit for example

# Analyze taxonomic ranks
ranks = ncbi.get_rank(primate_descendants[:20])
rank_counts = {}
for taxid, rank in ranks.items():
    rank_counts[rank] = rank_counts.get(rank, 0) + 1

print(f"Taxonomic rank distribution: {rank_counts}")
```

### Database Updates and Management

```python
from ete3 import NCBITaxa

# Update local taxonomy database (run periodically)
ncbi = NCBITaxa()
# ncbi.update_taxonomy_database()  # Uncomment to actually update

# Use custom database file
ncbi_custom = NCBITaxa(dbfile="/path/to/custom/taxa.sqlite")

# Check database version/status
# Access internal database methods if needed for maintenance
```
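
One simple way to decide when to run an update is to look at the age of the local database file. The sketch below uses only the standard library; `database_age_days` is a hypothetical helper, the path is the default location mentioned earlier (`~/.etetoolkit/taxa.sqlite`), and the 90-day threshold is an arbitrary assumption rather than an ETE3 recommendation.

```python
import os
import time

# Default database location described in the NCBITaxa docs above
DEFAULT_DB = os.path.join(os.path.expanduser("~"), ".etetoolkit", "taxa.sqlite")


def database_age_days(dbfile=DEFAULT_DB, now=None):
    """Return the age of the database file in days, or None if it is missing."""
    if not os.path.exists(dbfile):
        return None
    now = time.time() if now is None else now
    return (now - os.path.getmtime(dbfile)) / 86400.0


age = database_age_days()
if age is None:
    print("No local taxonomy database found")
elif age > 90:  # hypothetical refresh threshold
    print(f"Database is {age:.0f} days old; consider update_taxonomy_database()")
else:
    print(f"Database is {age:.0f} days old")
```

The file modification time only approximates when the taxonomy dump was last processed, so treat the result as a hint rather than a version check.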

### Integration with Phylogenetic Analysis

```python
from ete3 import PhyloTree, NCBITaxa

# Gene tree with species information
gene_tree = PhyloTree("(human_gene1:0.1,(chimp_gene1:0.05,gorilla_gene1:0.05):0.02);")

# Set up species naming
gene_tree.set_species_naming_function(lambda x: x.split('_')[0])

# Get NCBI taxonomy for comparison
ncbi = NCBITaxa()
species_names = ['human', 'chimp', 'gorilla']
name_mapping = {'human': 'Homo sapiens', 'chimp': 'Pan troglodytes', 'gorilla': 'Gorilla gorilla'}
full_names = [name_mapping[sp] for sp in species_names]
# Each name maps to a list of taxids; take the first match
name2taxid = ncbi.get_name_translator(full_names)
taxids = [name2taxid[name][0] for name in full_names]

# Create species tree from NCBI
species_tree = ncbi.get_topology(taxids)

# Compare gene tree topology with species tree
# (This would involve reconciliation analysis)
print("Gene tree topology:")
print(gene_tree.get_ascii())
print("Species tree topology:")
print(species_tree.get_ascii())
```