# GenBank and TBL Format Handling

Support for the NCBI GenBank and TBL annotation formats: bidirectional conversion between TBL files and the gfftk annotation dictionary, validation, GenBank flat file output, and integration with NCBI table2asn for generating GenBank records. These functions are the core of gfftk's handling of NCBI-compliant annotation files.

## Capabilities

### TBL Format Parsing

Parse an NCBI TBL annotation file into the gfftk annotation dictionary format, including genes with multiple transcript isoforms and complex gene structures.

```python { .api }
def tbl2dict(inputfile, fasta, annotation=False, table=1, debug=False):
    """
    Convert NCBI TBL format to annotation dictionary.

    Parses NCBI TBL files which contain gene models in tab-delimited format
    used by GenBank submission. Handles multiple transcript isoforms per gene,
    partial features, and all annotation qualifiers.

    Parameters:
    - inputfile (str|io.BytesIO): Path to TBL file or file-like object
    - fasta (str): Path to corresponding genome FASTA file
    - annotation (dict|bool): Existing annotation dictionary to update, or False
    - table (int): Genetic code table (1=standard, 11=bacterial)
    - debug (bool): Enable debug output

    Returns:
    dict: Annotation dictionary with gene models
    """
```
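
To make the input format concrete, the following is a minimal sketch of reading the NCBI five-column feature table layout: a `>Feature` header per sequence, interval lines with a feature key in the third column, continuation interval lines, and qualifier lines indented by three tabs. It is not gfftk's implementation. `parse_tbl_features` and the dictionary keys it returns are hypothetical names chosen for illustration, and the sketch ignores strand handling and most edge cases.

```python
import io


def parse_tbl_features(handle):
    """Yield one dict per feature from NCBI TBL text.

    Hypothetical helper for illustration only; not part of gfftk.
    """
    seqid = None
    feature = None
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith(">Feature"):
            # New sequence block: flush the feature in progress.
            if feature is not None:
                yield feature
                feature = None
            seqid = line.split()[1]
        elif line.startswith("\t"):
            # Qualifier line: leading tabs, then key and optional value.
            key, _, value = line.lstrip("\t").partition("\t")
            if feature is not None:
                feature["qualifiers"].setdefault(key, []).append(value)
        else:
            cols = line.split("\t")
            start, end = cols[0], cols[1]
            if len(cols) >= 3 and cols[2]:
                # A third column names the feature and starts a new one.
                if feature is not None:
                    yield feature
                feature = {
                    "seqid": seqid,
                    "type": cols[2],
                    "intervals": [],
                    "partial": False,
                    "qualifiers": {},
                }
            if feature is None:
                continue  # interval line with no preceding feature key
            # "<" on the start or ">" on the end marks a partial feature.
            if start.startswith("<") or end.startswith(">"):
                feature["partial"] = True
            feature["intervals"].append(
                (int(start.lstrip("<>")), int(end.lstrip("<>")))
            )
    if feature is not None:
        yield feature


example = (
    ">Feature contig_1\n"
    "<1\t300\tgene\n"
    "\t\t\tlocus_tag\tABC_0001\n"
    "<1\t120\tCDS\n"
    "200\t300\n"
    "\t\t\tproduct\thypothetical protein\n"
)
features = list(parse_tbl_features(io.StringIO(example)))
for feat in features:
    print(feat["type"], feat["intervals"], feat["qualifiers"])
```

The real `tbl2dict` additionally needs the genome FASTA, which this sketch does not use.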

### TBL Format Writing

Convert an annotation dictionary to NCBI TBL format, with the formatting and validation needed for GenBank submission.

```python { .api }
def dict2tbl(annots, seqs, outfile, table=1, debug=False):
    """
    Convert annotation dictionary to NCBI TBL format.

    Writes annotations in NCBI TBL format suitable for GenBank submission
    via table2asn. Handles complex gene structures, multiple isoforms,
    and all annotation qualifiers with proper formatting.

    Parameters:
    - annots (dict): Annotation dictionary
    - seqs (dict): Sequence dictionary from FASTA
    - outfile (str): Output TBL file path
    - table (int): Genetic code table (1=standard, 11=bacterial)
    - debug (bool): Enable debug output

    Returns:
    None
    """
```
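
As an illustration of what a written TBL feature looks like, here is a small sketch that formats one feature block. It assumes exons are given as 1-based `(start, end)` tuples with `start <= end`, and it follows the TBL convention that minus-strand intervals are written with the larger coordinate first, in transcription order. `format_tbl_feature` is a hypothetical helper, not a gfftk function, and it omits partial markers and validation.

```python
def format_tbl_feature(feature_type, exons, strand, qualifiers):
    """Return the TBL text for a single feature.

    Hypothetical helper for illustration only; not part of gfftk.
    """
    # List intervals in transcription order (descending on the minus strand).
    ordered = sorted(exons, reverse=(strand == "-"))
    lines = []
    for n, (start, end) in enumerate(ordered):
        if strand == "-":
            # Minus-strand intervals are written as "larger<TAB>smaller".
            start, end = end, start
        row = f"{start}\t{end}"
        if n == 0:
            # Only the first interval line carries the feature key.
            row += f"\t{feature_type}"
        lines.append(row)
    for key, value in qualifiers.items():
        # Qualifier lines are indented by three tabs.
        lines.append(f"\t\t\t{key}\t{value}")
    return "\n".join(lines)


block = format_tbl_feature(
    "CDS",
    [(100, 200), (300, 400)],
    "-",
    {"product": "hypothetical protein"},
)
print(block)
```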

### GenBank Format Generation

Generate GenBank flat files directly from an annotation dictionary, with optional organism metadata and formatting options.

```python { .api }
def dict2gbff(annots, seqs, outfile, organism=None, circular=False, lowercase=False):
    """
    Convert annotation dictionary to GenBank format.

    Generates GenBank flat file format (.gbff) with complete annotation
    information, sequence data, and proper GenBank formatting. Includes
    organism metadata and circular DNA support.

    Parameters:
    - annots (dict): Annotation dictionary
    - seqs (dict): Sequence dictionary from FASTA
    - outfile (str): Output GenBank file path
    - organism (str|None): Organism name for ORGANISM field
    - circular (bool): Mark sequences as circular DNA
    - lowercase (bool): Output sequence in lowercase

    Returns:
    None
    """
```
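
One piece of GenBank formatting that is easy to show in isolation is the feature location string, which uses `join(...)` for multi-interval features and `complement(...)` for the minus strand. The sketch below builds such a string from 1-based `(start, end)` tuples. `genbank_location` is a hypothetical helper written for this example, not something `dict2gbff` is documented to expose, and it does not handle partial features.

```python
def genbank_location(exons, strand):
    """Build a GenBank feature location string.

    Hypothetical helper for illustration only; not part of gfftk.
    """
    # GenBank lists intervals in ascending order inside join().
    spans = [f"{start}..{end}" for start, end in sorted(exons)]
    location = spans[0] if len(spans) == 1 else f"join({','.join(spans)})"
    # Minus-strand features wrap the whole location in complement().
    return f"complement({location})" if strand == "-" else location


print(genbank_location([(5, 50)], "+"))
print(genbank_location([(300, 400), (100, 200)], "-"))
```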

### NCBI table2asn Integration

Call NCBI's table2asn tool to generate GenBank submission files from TBL and FASTA inputs.

```python { .api }
def table2asn(sbt, tbl, fasta, out, organism, strain, table=1):
    """
    Run NCBI table2asn to generate GenBank files.

    Executes NCBI table2asn tool to convert TBL annotation files and
    FASTA sequences into GenBank submission format. Requires table2asn
    to be installed and available in PATH.

    Parameters:
    - sbt (str): Path to submission template (.sbt) file
    - tbl (str): Path to TBL annotation file
    - fasta (str): Path to genome FASTA file
    - out (str): Output directory path
    - organism (str): Organism name
    - strain (str): Strain identifier
    - table (int): Genetic code table (1=standard, 11=bacterial)

    Returns:
    None
    """
```
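
For readers who want to see roughly what such a wrapper involves, here is a hedged sketch of assembling and running a table2asn command with `subprocess`. The flags shown (`-t`, `-i`, `-f`, `-o`, `-j`, `-V`) reflect my understanding of NCBI's command-line documentation and should be checked against `table2asn -help` for the installed version. The exact command gfftk builds is not shown in this document, so `build_table2asn_cmd`, `run_table2asn`, and the `out_sqn` output-file argument are all assumptions.

```python
import shlex
import shutil
import subprocess


def build_table2asn_cmd(sbt, tbl, fasta, out_sqn, organism, strain, table=1):
    """Assemble a table2asn argument list.

    Hypothetical helper; flag usage is an assumption to verify locally.
    """
    # Source modifiers are passed as bracketed key=value pairs.
    source = f"[organism={organism}] [strain={strain}] [gcode={table}]"
    return [
        "table2asn",
        "-t", sbt,       # submission template
        "-i", fasta,     # nucleotide FASTA
        "-f", tbl,       # five-column feature table
        "-o", out_sqn,   # output ASN.1 file
        "-j", source,    # source qualifiers
        "-V", "vb",      # validate and write a GenBank flat file
    ]


def run_table2asn(cmd):
    """Run the command, failing early if table2asn is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("table2asn not found in PATH")
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


cmd = build_table2asn_cmd(
    "template.sbt",
    "submission.tbl",
    "genome.fasta",
    "output_dir/genome.sqn",
    "Escherichia coli",
    "K-12",
    table=11,
)
print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the command easy to log and inspect before table2asn is invoked.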

### Submission Template Generation

Generate the NCBI submission template file that table2asn requires.

```python { .api }
def sbt_writer(out):
    """
    Generate NCBI submission template (.sbt) file.

    Creates a basic submission template file required by table2asn
    for GenBank submission processing. Template contains minimal
    required metadata fields.

    Parameters:
    - out (str): Output path for .sbt file

    Returns:
    None
    """
```

### Coordinate Manipulation

Utilities for working with genomic coordinates in TBL format annotations.

```python { .api }
def fetch_coords(v, i=0, feature="gene"):
    """
    Extract genomic coordinates from annotation data.

    Parses coordinate information from various annotation formats
    and returns standardized coordinate tuples. Handles partial
    features and strand information.

    Parameters:
    - v (list): Coordinate data structure
    - i (int): Index for transcript/feature selection
    - feature (str): Feature type ("gene", "mRNA", "CDS")

    Returns:
    tuple: (start, end) coordinates
    """

def duplicate_coords(cds):
    """
    Identify duplicate CDS coordinates.

    Scans CDS coordinate lists to identify duplicate exons
    or coordinate ranges that may indicate annotation errors
    or alternative splicing variants.

    Parameters:
    - cds (list): List of CDS coordinate tuples

    Returns:
    list: Indices of duplicate coordinate sets
    """

def drop_alt_coords(info, idxs):
    """
    Remove alternative coordinate sets from annotation.

    Removes specified coordinate sets from annotation data
    structure, typically used to clean up alternative
    splicing variants or duplicate annotations.

    Parameters:
    - info (dict): Annotation information dictionary
    - idxs (list): Indices of coordinate sets to remove

    Returns:
    dict: Updated annotation dictionary
    """
```
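
The idea behind finding and dropping duplicate coordinate sets can be sketched as follows. This is one plausible reading of the behaviour described above, not gfftk's code: it assumes each transcript's CDS is a list of `(start, end)` tuples and that per-transcript fields are stored as parallel lists. `find_duplicate_indices`, `drop_indices`, and the `ids`/`CDS` keys are hypothetical.

```python
def find_duplicate_indices(coord_sets):
    """Return indices of coordinate sets that repeat an earlier set.

    Hypothetical helper for illustration only; not part of gfftk.
    """
    seen = set()
    duplicates = []
    for idx, coords in enumerate(coord_sets):
        # Sort so that exon order does not affect the comparison.
        key = tuple(sorted(tuple(c) for c in coords))
        if key in seen:
            duplicates.append(idx)
        else:
            seen.add(key)
    return duplicates


def drop_indices(record, idxs, per_transcript_keys):
    """Return a copy of record with the given positions removed."""
    drop = set(idxs)
    cleaned = dict(record)
    for key in per_transcript_keys:
        cleaned[key] = [v for i, v in enumerate(record[key]) if i not in drop]
    return cleaned


record = {
    "ids": ["t1", "t2", "t3"],
    "CDS": [
        [(1, 10), (20, 30)],
        [(20, 30), (1, 10)],  # same exons as t1, different order
        [(1, 10)],
    ],
}
dups = find_duplicate_indices(record["CDS"])
cleaned = drop_indices(record, dups, ("ids", "CDS"))
print(dups, cleaned["ids"])
```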

### UTR Processing

Identify untranslated regions (UTRs) from CDS and mRNA coordinates.

```python { .api }
def findUTRs(cds, mrna, strand):
    """
    Identify UTR regions from CDS and mRNA coordinates.

    Calculates 5' and 3' UTR regions by comparing CDS coordinates
    with mRNA boundaries. Handles strand orientation and returns
    coordinate tuples for UTR regions.

    Parameters:
    - cds (list): List of CDS coordinate tuples
    - mrna (list): List of mRNA coordinate tuples
    - strand (str): Strand orientation ("+"/"-")

    Returns:
    tuple: (five_utr_coords, three_utr_coords) as coordinate lists
    """
```
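
The comparison described in the docstring can be sketched like this: any part of an mRNA exon that lies outside the overall CDS span is UTR, and strand decides which side is 5' and which is 3'. The function below is a simplified illustration under the assumption of 1-based, inclusive `(start, end)` tuples with `start <= end`. `sketch_find_utrs` is a hypothetical name, and the real `findUTRs` may order or represent its output differently.

```python
def sketch_find_utrs(cds, mrna, strand):
    """Return (five_prime_utr, three_prime_utr) as lists of (start, end).

    Hypothetical helper for illustration only; not part of gfftk.
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []
    for start, end in sorted(mrna):
        if start < cds_start:
            # Exon portion upstream of the CDS in genome coordinates.
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Exon portion downstream of the CDS in genome coordinates.
            right.append((max(start, cds_end + 1), end))
    # On the minus strand the genomic right side is the 5' end.
    return (left, right) if strand == "+" else (right, left)


mrna = [(1, 100), (200, 300)]
cds = [(50, 100), (200, 250)]
five_plus, three_plus = sketch_find_utrs(cds, mrna, "+")
five_minus, three_minus = sketch_find_utrs(cds, mrna, "-")
print(five_plus, three_plus)
print(five_minus, three_minus)
```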

### GO Term Processing

Format Gene Ontology (GO) terms for GenBank submissions.

```python { .api }
def reformatGO(term, goDict={}):
    """
    Reformat GO terms for GenBank submission.

    Converts GO terms to proper format for GenBank annotation
    files, handling term descriptions and maintaining consistency
    with NCBI requirements.

    Parameters:
    - term (str): GO term identifier (e.g., "GO:0008150")
    - goDict (dict): GO term dictionary for lookups

    Returns:
    str: Reformatted GO term description
    """
```

## Usage Examples

### Converting TBL to Annotation Dictionary

```python
from gfftk.genbank import tbl2dict
from gfftk.fasta import fasta2dict

# Load sequences and parse TBL file
sequences = fasta2dict("genome.fasta")
annotations = tbl2dict("annotation.tbl", "genome.fasta")

# Access parsed data
for gene_id, gene_data in annotations.items():
    print(f"Gene: {gene_id}")
    print(f"Products: {gene_data['product']}")
    print(f"Location: {gene_data['location']}")
```

### Generating GenBank Files

```python
from gfftk.genbank import dict2gbff, dict2tbl
from gfftk.fasta import fasta2dict
from gfftk.gff import gff2dict

# Parse GFF3 annotation
sequences = fasta2dict("genome.fasta")
annotations = gff2dict("annotation.gff3", "genome.fasta")

# Generate GenBank format
dict2gbff(
    annotations,
    sequences,
    "output.gbff",
    organism="Escherichia coli",
    circular=True,
)

# Generate TBL format for NCBI submission
dict2tbl(annotations, sequences, "annotation.tbl")
```

### NCBI Submission Workflow

```python
from gfftk.genbank import sbt_writer, table2asn, dict2tbl
from gfftk.fasta import fasta2dict
from gfftk.gff import gff2dict

# Prepare annotation data
sequences = fasta2dict("genome.fasta")
annotations = gff2dict("annotation.gff3", "genome.fasta")

# Generate TBL file
dict2tbl(annotations, sequences, "submission.tbl")

# Create submission template
sbt_writer("template.sbt")

# Run table2asn (requires table2asn installation)
table2asn(
    "template.sbt",
    "submission.tbl",
    "genome.fasta",
    "output_dir",
    "Escherichia coli",
    "K-12",
    table=11,  # Bacterial genetic code
)
```