Tessl Tile for pypi/gfftk@25.6.0

# GenBank and TBL Format Handling

Comprehensive support for NCBI GenBank and TBL annotation formats including bidirectional conversion, validation, and integration with NCBI table2asn for GenBank record generation. These functions provide the core functionality for working with NCBI-compliant annotation files.

## Capabilities

### TBL Format Parsing

Parse NCBI TBL annotation files into the gfftk annotation dictionary format with support for multiple transcript isoforms and complex gene structures.

```python { .api }
def tbl2dict(inputfile, fasta, annotation=False, table=1, debug=False):
    """
    Convert NCBI TBL format to annotation dictionary.

    Parses NCBI TBL files which contain gene models in tab-delimited format
    used by GenBank submission. Handles multiple transcript isoforms per gene,
    partial features, and all annotation qualifiers.

    Parameters:
    - inputfile (str|io.BytesIO): Path to TBL file or file-like object
    - fasta (str): Path to corresponding genome FASTA file
    - annotation (dict|bool): Existing annotation dictionary to update, or False
    - table (int): Genetic code table (1=standard, 11=bacterial)
    - debug (bool): Enable debug output

    Returns:
    dict: Annotation dictionary with gene models
    """
```
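To make the tab-delimited layout concrete, here is a minimal standalone sketch of reading TBL text. It is not gfftk's parser: `parse_tbl` and the keys of the dictionaries it returns are hypothetical names chosen for illustration, and it assumes the usual feature-table conventions (a `>Feature` header per sequence, `start<TAB>stop<TAB>feature` rows, continuation rows with only coordinates, qualifier rows indented by three tabs, and `<` / `>` marking partial ends).

```python
def parse_tbl(text):
    """Parse feature-table text into a flat list of feature dicts (illustrative only)."""
    features = []
    contig = None
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">Feature"):
            fields = line.split()
            contig = fields[1] if len(fields) > 1 else None
            current = None
            continue
        cols = line.split("\t")
        if cols[0] == "":
            # Qualifier row: three empty columns, then key and optional value.
            if current is not None and len(cols) >= 4:
                value = cols[4] if len(cols) > 4 else ""
                current["qualifiers"].setdefault(cols[3], []).append(value)
            continue
        if len(cols) < 2:
            continue  # malformed row, skipped in this sketch
        start_raw, stop_raw = cols[0], cols[1]
        interval = (int(start_raw.lstrip("<>")), int(stop_raw.lstrip("<>")))
        if len(cols) >= 3 and cols[2]:
            # A feature key in column three starts a new feature.
            current = {
                "contig": contig,
                "type": cols[2],
                "intervals": [interval],
                "partial_start": start_raw.startswith("<"),
                "partial_stop": False,
                "qualifiers": {},
            }
            features.append(current)
        elif current is not None:
            # Coordinates without a feature key continue the previous feature.
            current["intervals"].append(interval)
        if current is not None and stop_raw.startswith(">"):
            current["partial_stop"] = True
    return features
```

Unlike this flat list, `tbl2dict` returns a dictionary of gene models, so the sketch only illustrates the file layout, not the resulting data structure.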

### TBL Format Writing

Convert annotation dictionary to NCBI TBL format with proper formatting and validation for GenBank submission compatibility.

```python { .api }
def dict2tbl(annots, seqs, outfile, table=1, debug=False):
    """
    Convert annotation dictionary to NCBI TBL format.

    Writes annotations in NCBI TBL format suitable for GenBank submission
    via table2asn. Handles complex gene structures, multiple isoforms,
    and all annotation qualifiers with proper formatting.

    Parameters:
    - annots (dict): Annotation dictionary
    - seqs (dict): Sequence dictionary from FASTA
    - outfile (str): Output TBL file path
    - table (int): Genetic code table (1=standard, 11=bacterial)
    - debug (bool): Enable debug output

    Returns:
    None
    """
```
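The writing direction can be sketched in the same spirit. The helper below is hypothetical and is not how `dict2tbl` is implemented; it only shows how one feature could be laid out as TBL rows, assuming intervals are given as 1-based `(start, end)` pairs with `start <= end` and that minus-strand features are written in transcription order with start greater than stop.

```python
def format_tbl_feature(feature_type, intervals, strand, qualifiers,
                       partial_start=False, partial_stop=False):
    """Return the TBL rows for a single feature (illustrative only)."""
    ordered = sorted(intervals)
    if strand == "-":
        # Minus strand: last exon first, and each row runs from high to low.
        ordered = [(end, start) for start, end in reversed(ordered)]
    rows = []
    last = len(ordered) - 1
    for idx, (start, stop) in enumerate(ordered):
        start_txt = f"<{start}" if partial_start and idx == 0 else str(start)
        stop_txt = f">{stop}" if partial_stop and idx == last else str(stop)
        if idx == 0:
            rows.append(f"{start_txt}\t{stop_txt}\t{feature_type}")
        else:
            rows.append(f"{start_txt}\t{stop_txt}")
    for key, value in qualifiers.items():
        # Qualifier rows are indented by three tabs.
        rows.append(f"\t\t\t{key}\t{value}")
    return rows


rows = format_tbl_feature(
    "CDS", [(1, 100), (200, 300)], "-", {"product": "hypothetical protein"}
)
print("\n".join(rows))
```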

### GenBank Format Generation

Generate GenBank format files directly from annotation dictionary with organism metadata and formatting options.

```python { .api }
def dict2gbff(annots, seqs, outfile, organism=None, circular=False, lowercase=False):
    """
    Convert annotation dictionary to GenBank format.

    Generates GenBank flat file format (.gbff) with complete annotation
    information, sequence data, and proper GenBank formatting. Includes
    organism metadata and circular DNA support.

    Parameters:
    - annots (dict): Annotation dictionary
    - seqs (dict): Sequence dictionary from FASTA
    - outfile (str): Output GenBank file path
    - organism (str|None): Organism name for ORGANISM field
    - circular (bool): Mark sequences as circular DNA
    - lowercase (bool): Output sequence in lowercase

    Returns:
    None
    """
```
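GenBank flat files describe feature positions with location strings such as `join(1..100,200..300)` or `complement(1..>100)`. How `dict2gbff` builds these internally is not documented here; the following is a standalone sketch with a hypothetical helper name, assuming 1-based inclusive `(start, end)` intervals with `start <= end`. Single-base intervals are written as `n..n` for simplicity.

```python
def genbank_location(intervals, strand="+",
                     five_prime_partial=False, three_prime_partial=False):
    """Build a GenBank-style feature location string (illustrative only)."""
    ordered = sorted(intervals)
    # "<" always sits on the leftmost coordinate and ">" on the rightmost one,
    # so on the minus strand the 5' end is the right-hand side.
    left_open = five_prime_partial if strand == "+" else three_prime_partial
    right_open = three_prime_partial if strand == "+" else five_prime_partial
    last = len(ordered) - 1
    parts = []
    for idx, (start, end) in enumerate(ordered):
        start_txt = f"<{start}" if left_open and idx == 0 else str(start)
        end_txt = f">{end}" if right_open and idx == last else str(end)
        parts.append(f"{start_txt}..{end_txt}")
    location = parts[0] if len(parts) == 1 else "join(" + ",".join(parts) + ")"
    if strand == "-":
        location = f"complement({location})"
    return location


print(genbank_location([(1, 100), (200, 300)]))
print(genbank_location([(1, 100)], strand="-", five_prime_partial=True))
```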

### NCBI table2asn Integration

Interface with NCBI's table2asn tool for generating GenBank submission files from TBL and FASTA inputs.

```python { .api }
def table2asn(sbt, tbl, fasta, out, organism, strain, table=1):
    """
    Run NCBI table2asn to generate GenBank files.

    Executes NCBI table2asn tool to convert TBL annotation files and
    FASTA sequences into GenBank submission format. Requires table2asn
    to be installed and available in PATH.

    Parameters:
    - sbt (str): Path to submission template (.sbt) file
    - tbl (str): Path to TBL annotation file
    - fasta (str): Path to genome FASTA file
    - out (str): Output directory path
    - organism (str): Organism name
    - strain (str): Strain identifier
    - table (int): Genetic code table (1=standard, 11=bacterial)

    Returns:
    None
    """
```
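Because the function depends on an external executable being on `PATH`, it can be useful to check for it before starting a longer pipeline. The sketch below uses only the standard library; `tool_available` and `require_tool` are hypothetical helper names, not part of gfftk.

```python
import shutil


def tool_available(name="table2asn"):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


def require_tool(name="table2asn"):
    """Raise a clear error early instead of failing midway through a workflow."""
    if not tool_available(name):
        raise RuntimeError(f"{name} was not found on PATH; install it before continuing")
```

Calling `require_tool()` before `table2asn(...)` turns a missing installation into an immediate, readable error.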

### Submission Template Generation

Generate NCBI submission template files required for table2asn processing.

```python { .api }
def sbt_writer(out):
    """
    Generate NCBI submission template (.sbt) file.

    Creates a basic submission template file required by table2asn
    for GenBank submission processing. Template contains minimal
    required metadata fields.

    Parameters:
    - out (str): Output path for .sbt file

    Returns:
    None
    """
```

### Coordinate Manipulation

Utilities for working with genomic coordinates in TBL format annotations.

```python { .api }
def fetch_coords(v, i=0, feature="gene"):
    """
    Extract genomic coordinates from annotation data.

    Parses coordinate information from various annotation formats
    and returns standardized coordinate tuples. Handles partial
    features and strand information.

    Parameters:
    - v (list): Coordinate data structure
    - i (int): Index for transcript/feature selection
    - feature (str): Feature type ("gene", "mRNA", "CDS")

    Returns:
    tuple: (start, end) coordinates
    """

def duplicate_coords(cds):
    """
    Identify duplicate CDS coordinates.

    Scans CDS coordinate lists to identify duplicate exons
    or coordinate ranges that may indicate annotation errors
    or alternative splicing variants.

    Parameters:
    - cds (list): List of CDS coordinate tuples

    Returns:
    list: Indices of duplicate coordinate sets
    """

def drop_alt_coords(info, idxs):
    """
    Remove alternative coordinate sets from annotation.

    Removes specified coordinate sets from annotation data
    structure, typically used to clean up alternative
    splicing variants or duplicate annotations.

    Parameters:
    - info (dict): Annotation information dictionary
    - idxs (list): Indices of coordinate sets to remove

    Returns:
    dict: Updated annotation dictionary
    """
```
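The idea of finding duplicate coordinate sets and then dropping them by index can be illustrated without gfftk. The sketch below is one plausible reading, not the library's implementation: it assumes the input is a list with one coordinate list per transcript, treats two sets as duplicates when they contain the same intervals in any order, and uses hypothetical helper names.

```python
def find_duplicate_sets(coord_sets):
    """Return indices of coordinate sets that repeat an earlier set."""
    seen = set()
    duplicates = []
    for idx, coords in enumerate(coord_sets):
        # Normalize to a sorted tuple so interval order does not matter.
        key = tuple(sorted(tuple(pair) for pair in coords))
        if key in seen:
            duplicates.append(idx)
        else:
            seen.add(key)
    return duplicates


def drop_indices(items, idxs):
    """Return a new list without the entries at the given indices."""
    skip = set(idxs)
    return [item for i, item in enumerate(items) if i not in skip]


cds_sets = [
    [(1, 100), (200, 300)],
    [(200, 300), (1, 100)],   # same intervals as the first set
    [(1, 100), (200, 350)],
]
dupes = find_duplicate_sets(cds_sets)
unique_sets = drop_indices(cds_sets, dupes)
```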

### UTR Processing

Specialized functions for UTR (Untranslated Region) identification and processing.

```python { .api }
def findUTRs(cds, mrna, strand):
    """
    Identify UTR regions from CDS and mRNA coordinates.

    Calculates 5' and 3' UTR regions by comparing CDS coordinates
    with mRNA boundaries. Handles strand orientation and returns
    coordinate tuples for UTR regions.

    Parameters:
    - cds (list): List of CDS coordinate tuples
    - mrna (list): List of mRNA coordinate tuples
    - strand (str): Strand orientation ("+"/"-")

    Returns:
    tuple: (five_utr_coords, three_utr_coords) as coordinate lists
    """
```
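The comparison of CDS coordinates with mRNA boundaries can be worked through in a few lines. This is a standalone sketch rather than gfftk's `findUTRs`; `split_utrs` is a hypothetical name, and it assumes 1-based inclusive `(start, end)` tuples with `start <= end` on both strands.

```python
def split_utrs(cds, mrna, strand):
    """Return (five_prime_utr, three_prime_utr) interval lists (illustrative only)."""
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []
    for start, end in sorted(mrna):
        if start < cds_start:
            # Exon portion lying before the first coding base.
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Exon portion lying after the last coding base.
            right.append((max(start, cds_end + 1), end))
    # On the minus strand the genomic right-hand side is the 5' end.
    if strand == "+":
        return left, right
    return right, left


five, three = split_utrs(
    cds=[(150, 200), (300, 350)],
    mrna=[(100, 200), (300, 400)],
    strand="+",
)
```

Here the first exon contributes bases 100 to 149 to the 5' UTR and the last exon contributes bases 351 to 400 to the 3' UTR; with `strand="-"` the two lists swap.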

### GO Term Processing

Handle Gene Ontology term formatting for GenBank submissions.

```python { .api }
def reformatGO(term, goDict={}):
    """
    Reformat GO terms for GenBank submission.

    Converts GO terms to proper format for GenBank annotation
    files, handling term descriptions and maintaining consistency
    with NCBI requirements.

    Parameters:
    - term (str): GO term identifier (e.g., "GO:0008150")
    - goDict (dict): GO term dictionary for lookups

    Returns:
    str: Reformatted GO term description
    """
```
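Before reformatting, it can help to confirm that an identifier has the expected shape of `GO:` followed by seven digits. The helper below is a hypothetical pre-check, not part of gfftk, and it says nothing about the output format `reformatGO` produces.

```python
import re

# GO identifiers are written as "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"^GO:(\d{7})$")


def go_accession(term):
    """Return the seven-digit accession of a GO identifier, or raise ValueError."""
    match = GO_ID_PATTERN.match(term.strip())
    if match is None:
        raise ValueError(f"not a valid GO identifier: {term!r}")
    return match.group(1)
```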

## Usage Examples

### Converting TBL to Annotation Dictionary

```python
from gfftk.genbank import tbl2dict
from gfftk.fasta import fasta2dict

# Load sequences and parse TBL file
sequences = fasta2dict("genome.fasta")
annotations = tbl2dict("annotation.tbl", "genome.fasta")

# Access parsed data
for gene_id, gene_data in annotations.items():
    print(f"Gene: {gene_id}")
    print(f"Products: {gene_data['product']}")
    print(f"Location: {gene_data['location']}")
```

### Generating GenBank Files

```python
from gfftk.genbank import dict2gbff, dict2tbl
from gfftk.fasta import fasta2dict
from gfftk.gff import gff2dict

# Parse GFF3 annotation
sequences = fasta2dict("genome.fasta")
annotations = gff2dict("annotation.gff3", "genome.fasta")

# Generate GenBank format
dict2gbff(
    annotations,
    sequences,
    "output.gbff",
    organism="Escherichia coli",
    circular=True
)

# Generate TBL format for NCBI submission
dict2tbl(annotations, sequences, "annotation.tbl")
```

### NCBI Submission Workflow

```python
from gfftk.genbank import sbt_writer, table2asn, dict2tbl
from gfftk.fasta import fasta2dict
from gfftk.gff import gff2dict

# Prepare annotation data
sequences = fasta2dict("genome.fasta")
annotations = gff2dict("annotation.gff3", "genome.fasta")

# Generate TBL file
dict2tbl(annotations, sequences, "submission.tbl")

# Create submission template
sbt_writer("template.sbt")

# Run table2asn (requires table2asn installation)
table2asn(
    "template.sbt",
    "submission.tbl",
    "genome.fasta",
    "output_dir",
    "Escherichia coli",
    "K-12",
    table=11  # Bacterial genetic code
)
```
