GFFtk is a Python toolkit for working with genome annotation files in GFF3, GTF, and TBL formats, with format conversion and analysis capabilities.
The GFFtk command-line interface gives direct access to all conversion, analysis, and manipulation functions. Each command corresponds to a Python function that can also be called programmatically.
Main command-line interface for format conversion operations.
def convert(args):
    """
    Command-line interface for format conversion operations.

    Provides CLI access to all format conversion functions including
    GFF3, GTF, TBL, GenBank, and protein/transcript extraction with
    flexible filtering and output options.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - input: Input file path
        - fasta: Genome FASTA file path
        - output: Output file path
        - format: Output format (gff3, gtf, tbl, genbank, proteins, etc.)
        - table: Genetic code table
        - grep: Filter patterns to include
        - grepv: Filter patterns to exclude
        - debug: Enable debug output

    Returns:
    None
    """

Command-line interface for consensus gene prediction.
def consensus(args):
    """
    Command-line interface for EvidenceModeler-like consensus prediction.

    Combines multiple gene prediction sources with protein and transcript
    evidence to generate high-quality consensus gene models using
    configurable weights and validation criteria.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - fasta: Genome FASTA file path
        - genes: List of gene prediction file paths
        - proteins: List of protein alignment file paths
        - transcripts: List of transcript alignment file paths
        - weights: Source weight configuration file
        - output: Output consensus GFF3 file
        - minscore: Minimum score threshold
        - repeats: Repeat annotation file for filtering
        - debug: Enable debug output

    Returns:
    None
    """

Command-line interface for comparing two genome annotations.
def compare(args):
    """
    Command-line interface for annotation comparison analysis.

    Compares two genome annotations to identify differences, calculate
    similarity metrics, and generate detailed comparison reports with
    feature-level analysis and statistics.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - old: Path to reference annotation file
        - new: Path to query annotation file
        - fasta: Genome FASTA file path
        - output: Output comparison report path
        - debug: Enable debug output

    Returns:
    None
    """

Command-line interface for calculating annotation statistics.
def stats(args):
    """
    Command-line interface for annotation statistics calculation.

    Calculates comprehensive statistics for genome annotations including
    gene counts, feature distributions, sequence lengths, and quality
    metrics with detailed reporting options.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - input: Input annotation file path
        - fasta: Genome FASTA file path (optional)
        - output: Output statistics file path
        - format: Output format (text, json, csv)

    Returns:
    None
    """

Command-line interface for sorting GFF3 files by genomic coordinates.
def sort(args):
    """
    Command-line interface for GFF3 file sorting.

    Sorts GFF3 files by genomic coordinates ensuring proper feature
    ordering and parent-child relationships are maintained.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - input: Input GFF3 file path
        - output: Output sorted GFF3 file path

    Returns:
    None
    """
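
To make "sorting by genomic coordinates" concrete, here is a minimal sketch of coordinate-based ordering of GFF3 feature lines. It is not GFFtk's implementation: the helper name sort_gff3_lines is hypothetical, the sample records are made up, and the sketch ignores the parent-child ordering that the real command is documented to maintain.

```python
def sort_gff3_lines(lines):
    """Order GFF3 feature lines by (seqid, start, end).

    Hypothetical helper for illustration only. Comment and directive
    lines (starting with '#') are kept at the top in their original order.
    """
    header = [ln for ln in lines if ln.startswith("#")]
    features = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def coordinate_key(line):
        # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
        cols = line.split("\t")
        return (cols[0], int(cols[3]), int(cols[4]))

    return header + sorted(features, key=coordinate_key)


# Made-up records, deliberately out of order
example = [
    "##gff-version 3",
    "chr2\tsrc\tgene\t500\t900\t.\t+\t.\tID=g3",
    "chr1\tsrc\tgene\t2000\t2500\t.\t-\t.\tID=g2",
    "chr1\tsrc\tgene\t100\t800\t.\t+\t.\tID=g1",
]

sorted_lines = sort_gff3_lines(example)
for line in sorted_lines:
    print(line)
```

Note that the sequence IDs are compared as plain strings here, so chr10 would sort before chr2; a real tool may order contigs differently.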
def sortGFF3(input, output):
    """
    Sort GFF3 file by genomic coordinates.

    Parameters:
    - input (str): Input GFF3 file path
    - output (str): Output sorted GFF3 file path

    Returns:
    None
    """

Command-line interface for cleaning and validating GFF3 files.
def sanitize(args):
    """
    Command-line interface for GFF3 file sanitization.

    Cleans and validates GFF3 files by fixing common format issues,
    removing invalid features, and ensuring compliance with GFF3
    specification requirements.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - input: Input GFF3 file path
        - output: Output sanitized GFF3 file path
        - strict: Enable strict validation mode

    Returns:
    None
    """

Command-line interface for systematic renaming of annotation features.
def rename(args):
    """
    Command-line interface for feature renaming operations.

    Systematically renames annotation features using configurable
    patterns and rules to ensure consistent naming conventions
    across annotation files.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - input: Input annotation file path
        - output: Output renamed annotation file path
        - pattern: Renaming pattern specification
        - prefix: Prefix for new feature names

    Returns:
    None
    """

# Convert GFF3 to GTF format
gfftk convert -i annotation.gff3 -f genome.fasta -o output.gtf
# Extract protein sequences
gfftk convert -i annotation.gff3 -f genome.fasta -o proteins.faa --output-format proteins
# Convert with filtering
gfftk convert -i annotation.gff3 -f genome.fasta -o filtered.gff3 --grep product:kinase

# Basic consensus prediction
gfftk consensus -f genome.fasta -g augustus.gff3 genemark.gff3 -p proteins.gff3 -o consensus.gff3
# With custom weights and repeat filtering
gfftk consensus -f genome.fasta -g augustus.gff3 genemark.gff3 \
    -p proteins.gff3 -t transcripts.gff3 \
    -w weights.txt --repeats repeats.bed -o consensus.gff3

# Compare two annotations
gfftk compare --old reference.gff3 --new updated.gff3 -f genome.fasta -o comparison.txt
# Calculate statistics
gfftk stats -i annotation.gff3 -f genome.fasta -o stats.txt
# Sort GFF3 file
gfftk sort -i unsorted.gff3 -o sorted.gff3

# Sanitize GFF3 file
gfftk sanitize -i messy.gff3 -o clean.gff3
# Rename features systematically
gfftk rename -i annotation.gff3 -o renamed.gff3 --prefix GENE

All CLI commands can be accessed programmatically by importing the corresponding functions:
import argparse
from gfftk.convert import convert
from gfftk.consensus import consensus
from gfftk.compare import compare
from gfftk.stats import stats
from gfftk.sort import sort
from gfftk.sanitize import sanitize
from gfftk.rename import rename
# Create argument namespace (equivalent to CLI args)
args = argparse.Namespace(
    input='annotation.gff3',
    fasta='genome.fasta',
    output='output.gtf',
    format='gtf',
    table=1,
    debug=False
)
# Call conversion function
convert(args)

| Command | Function | Description |
|---|---|---|
| gfftk convert | convert() | Format conversion and sequence extraction |
| gfftk consensus | consensus() | EvidenceModeler-like consensus prediction |
| gfftk compare | compare() | Annotation comparison and analysis |
| gfftk stats | stats() | Annotation statistics calculation |
| gfftk sort | sort() | GFF3 coordinate-based sorting |
| gfftk sanitize | sanitize() | GFF3 validation and cleaning |
| gfftk rename | rename() | Systematic feature renaming |
Each command provides comprehensive help via gfftk <command> --help with detailed parameter descriptions and usage examples.
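
Since each function is documented as taking an argparse.Namespace, calling a command from Python comes down to building a namespace with the attributes listed in that command's docstring. The sketch below shows one way to do that. The helper make_args and the DOCUMENTED_ARGS table are hypothetical; the attribute names are copied from the docstrings in this document, while the default values are assumptions and may not match what GFFtk actually uses, so check them against gfftk <command> --help.

```python
import argparse

# Attribute names come from the docstrings in this document.
# Default values are assumptions for illustration, not GFFtk's real defaults.
DOCUMENTED_ARGS = {
    "convert": dict(input=None, fasta=None, output=None, format=None,
                    table=1, grep=None, grepv=None, debug=False),
    "stats": dict(input=None, fasta=None, output=None, format="text"),
    "sort": dict(input=None, output=None),
}


def make_args(command, **overrides):
    """Build a Namespace for `command`, rejecting undocumented attribute names."""
    defaults = DOCUMENTED_ARGS[command]
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(
            f"undocumented argument(s) for {command}: {sorted(unknown)}"
        )
    # Overrides take precedence over the assumed defaults
    return argparse.Namespace(**{**defaults, **overrides})


args = make_args("sort", input="unsorted.gff3", output="sorted.gff3")
print(args)
```

With GFFtk installed, the resulting namespace could then be passed to the matching function, for example sort(args). Rejecting unknown names up front catches typos such as outptu before they surface as an AttributeError inside the command.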