tessl/pypi-gfftk

Comprehensive Python toolkit for working with genome annotation files in GFF3, GTF, and TBL formats with format conversion and analysis capabilities

Command Line Interface

GFFtk provides a command-line interface that exposes its conversion, analysis, and manipulation functions as subcommands. Each subcommand corresponds to a Python function that can also be called programmatically.

Capabilities

Format Conversion CLI

Main command-line interface for format conversion operations.

def convert(args):
    """
    Command-line interface for format conversion operations.

    Provides CLI access to all format conversion functions including
    GFF3, GTF, TBL, GenBank, and protein/transcript extraction with
    flexible filtering and output options.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - input: Input file path
        - fasta: Genome FASTA file path
        - output: Output file path
        - format: Output format (gff3, gtf, tbl, genbank, proteins, etc.)
        - table: Genetic code table
        - grep: Filter patterns to include
        - grepv: Filter patterns to exclude
        - debug: Enable debug output

    Returns:
    None
    """

Consensus Prediction CLI

Command-line interface for consensus gene prediction.

def consensus(args):
    """
    Command-line interface for EvidenceModeler-like consensus prediction.

    Combines multiple gene prediction sources with protein and transcript
    evidence to generate high-quality consensus gene models using
    configurable weights and validation criteria.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - fasta: Genome FASTA file path
        - genes: List of gene prediction file paths
        - proteins: List of protein alignment file paths
        - transcripts: List of transcript alignment file paths
        - weights: Source weight configuration file
        - output: Output consensus GFF3 file
        - minscore: Minimum score threshold
        - repeats: Repeat annotation file for filtering
        - debug: Enable debug output

    Returns:
    None
    """

Annotation Comparison CLI

Command-line interface for comparing two genome annotations.

def compare(args):
    """
    Command-line interface for annotation comparison analysis.

    Compares two genome annotations to identify differences, calculate
    similarity metrics, and generate detailed comparison reports with
    feature-level analysis and statistics.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - old: Path to reference annotation file
        - new: Path to query annotation file
        - fasta: Genome FASTA file path
        - output: Output comparison report path
        - debug: Enable debug output

    Returns:
    None
    """

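To make the idea of an annotation comparison concrete, here is a minimal sketch that classifies gene models as identical, modified, removed, or added. It is not GFFtk's implementation: the function name `compare_models`, the dictionary layout, and the choice to match models by shared ID (rather than by coordinate overlap) are all assumptions made for illustration.

```python
def compare_models(old, new):
    """Classify gene models by comparing two {gene_id: (seqid, start, end, strand)} dicts.

    Hypothetical helper: models are matched by ID only, which is a simplification.
    """
    shared = old.keys() & new.keys()
    return {
        "identical": sorted(k for k in shared if old[k] == new[k]),
        "modified": sorted(k for k in shared if old[k] != new[k]),
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
    }


# Toy data standing in for parsed reference and query annotations.
old = {
    "g1": ("chr1", 100, 900, "+"),
    "g2": ("chr1", 1500, 2000, "-"),
    "g3": ("chr2", 50, 400, "+"),
}
new = {
    "g1": ("chr1", 100, 900, "+"),
    "g2": ("chr1", 1400, 2000, "-"),  # start coordinate changed
    "g4": ("chr2", 800, 1200, "+"),
}

report = compare_models(old, new)
print(report)
```

Matching by ID keeps the sketch short, but it only works when both annotations share a naming scheme.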
Annotation Statistics CLI

Command-line interface for calculating annotation statistics.

def stats(args):
    """
    Command-line interface for annotation statistics calculation.

    Calculates comprehensive statistics for genome annotations including
    gene counts, feature distributions, sequence lengths, and quality
    metrics with detailed reporting options.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - input: Input annotation file path
        - fasta: Genome FASTA file path (optional)
        - output: Output statistics file path
        - format: Output format (text, json, csv)

    Returns:
    None
    """

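The kind of summary described above can be sketched in plain Python. This is an illustrative example only, not the code behind `stats`: the function name `feature_stats`, the returned keys, and the choice of metrics (feature counts and mean gene length) are assumptions.

```python
from collections import Counter


def feature_stats(lines):
    """Count features by type and compute mean gene length from GFF3 lines.

    Hypothetical helper; comment lines and lines without 9 columns are skipped.
    """
    counts = Counter()
    gene_lengths = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        counts[ftype] += 1
        if ftype == "gene":
            # GFF3 coordinates are 1-based and inclusive.
            gene_lengths.append(end - start + 1)
    mean_len = sum(gene_lengths) / len(gene_lengths) if gene_lengths else 0.0
    return {"feature_counts": dict(counts), "mean_gene_length": mean_len}


example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1",
    "chr1\tsrc\tgene\t201\t500\t.\t-\t.\tID=g2",
]
summary = feature_stats(example)
print(summary)
```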
GFF3 File Sorting CLI

Command-line interface for sorting GFF3 files by genomic coordinates.

def sort(args):
    """
    Command-line interface for GFF3 file sorting.

    Sorts GFF3 files by genomic coordinates ensuring proper feature
    ordering and parent-child relationships are maintained.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - input: Input GFF3 file path
        - output: Output sorted GFF3 file path

    Returns:
    None
    """

def sortGFF3(input, output):
    """
    Sort GFF3 file by genomic coordinates.

    Parameters:
    - input (str): Input GFF3 file path
    - output (str): Output sorted GFF3 file path

    Returns:
    None
    """

GFF3 Sanitization CLI

Command-line interface for cleaning and validating GFF3 files.

def sanitize(args):
    """
    Command-line interface for GFF3 file sanitization.

    Cleans and validates GFF3 files by fixing common format issues,
    removing invalid features, and ensuring compliance with GFF3
    specification requirements.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - input: Input GFF3 file path
        - output: Output sanitized GFF3 file path
        - strict: Enable strict validation mode

    Returns:
    None
    """

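As a rough illustration of the checks a sanitizer might run, the sketch below validates a single GFF3 feature line against a few rules from the GFF3 specification (nine tab-separated columns, integer 1-based coordinates with start <= end, a recognised strand symbol). The function name `check_gff3_line` and the exact set of checks are assumptions; the source does not say which issues `sanitize` detects or fixes.

```python
VALID_STRANDS = {"+", "-", ".", "?"}


def check_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line (empty if none).

    Hypothetical helper covering only a handful of basic format rules.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    problems = []
    try:
        start, end = int(cols[3]), int(cols[4])
    except ValueError:
        problems.append("start and end must be integers")
    else:
        if start < 1:
            problems.append("start must be >= 1")
        if start > end:
            problems.append("start must be <= end")
    if cols[6] not in VALID_STRANDS:
        problems.append(f"invalid strand {cols[6]!r}")
    return problems


good = "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1"
bad = "chr1\tsrc\tgene\t500\t100\t.\tx\t.\tID=g2"
print(check_gff3_line(good))
print(check_gff3_line(bad))
```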
Feature Renaming CLI

Command-line interface for systematic renaming of annotation features.

def rename(args):
    """
    Command-line interface for feature renaming operations.

    Systematically renames annotation features using configurable
    patterns and rules to ensure consistent naming conventions
    across annotation files.

    Parameters:
    - args (argparse.Namespace): Parsed command-line arguments containing:
        - input: Input annotation file path
        - output: Output renamed annotation file path
        - pattern: Renaming pattern specification
        - prefix: Prefix for new feature names

    Returns:
    None
    """

Usage Examples

Basic Format Conversion

# Convert GFF3 to GTF format
gfftk convert -i annotation.gff3 -f genome.fasta -o output.gtf

# Extract protein sequences
gfftk convert -i annotation.gff3 -f genome.fasta -o proteins.faa --output-format proteins

# Convert with filtering
gfftk convert -i annotation.gff3 -f genome.fasta -o filtered.gff3 --grep product:kinase
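The `--grep product:kinase` filter above pairs an annotation key with a pattern. The following sketch shows one way such `key:pattern` filters could be evaluated. It is not taken from GFFtk: the names `parse_filter` and `keep_feature`, the dict-based feature record, and the case-insensitive regular-expression match are all assumptions.

```python
import re


def parse_filter(spec):
    """Split a 'key:pattern' string into (key, compiled regex)."""
    key, sep, pattern = spec.partition(":")
    if not sep or not key or not pattern:
        raise ValueError(f"expected 'key:pattern', got {spec!r}")
    return key, re.compile(pattern, re.IGNORECASE)


def keep_feature(feature, grep=(), grepv=()):
    """Return True if a feature passes the include (grep) and exclude (grepv) filters."""

    def matches(spec):
        key, regex = parse_filter(spec)
        value = feature.get(key, "")
        values = value if isinstance(value, (list, tuple)) else [value]
        return any(regex.search(str(v)) for v in values)

    # With include filters present, at least one must match.
    if grep and not any(matches(s) for s in grep):
        return False
    # Any matching exclude filter drops the feature.
    if any(matches(s) for s in grepv):
        return False
    return True


features = [
    {"id": "gene1", "product": "serine/threonine kinase"},
    {"id": "gene2", "product": "hypothetical protein"},
]
kept = [f["id"] for f in features if keep_feature(f, grep=["product:kinase"])]
print(kept)
```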

Consensus Prediction

# Basic consensus prediction
gfftk consensus -f genome.fasta -g augustus.gff3 genemark.gff3 -p proteins.gff3 -o consensus.gff3

# With custom weights and repeat filtering
gfftk consensus -f genome.fasta -g augustus.gff3 genemark.gff3 \
    -p proteins.gff3 -t transcripts.gff3 \
    -w weights.txt --repeats repeats.bed -o consensus.gff3
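The source does not describe the layout of the weights file. Purely as an illustration, the sketch below assumes a hypothetical format with one whitespace-separated `source weight` pair per line and `#` comments, and a default weight of 1 for sources that are not listed. The names `parse_weights` and `weight_for` are likewise made up for this example.

```python
def parse_weights(lines):
    """Parse 'source weight' lines into a dict (hypothetical file format)."""
    weights = {}
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        source, value = line.split()
        weights[source] = int(value)
    return weights


def weight_for(weights, source, default=1):
    """Look up a source's weight, falling back to an assumed default."""
    return weights.get(source, default)


example = ["# source weight", "augustus 2", "genemark 1", ""]
weights = parse_weights(example)
print(weights)
print(weight_for(weights, "snap"))
```

Check `gfftk consensus --help` for the format the tool actually expects before writing a weights file.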

Annotation Analysis

# Compare two annotations
gfftk compare --old reference.gff3 --new updated.gff3 -f genome.fasta -o comparison.txt

# Calculate statistics
gfftk stats -i annotation.gff3 -f genome.fasta -o stats.txt

# Sort GFF3 file
gfftk sort -i unsorted.gff3 -o sorted.gff3
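To show what coordinate-based sorting involves, here is a minimal sketch that orders GFF3 feature lines in memory. It is not the implementation behind `gfftk sort`: the function name `sort_gff3_lines` and the sort key (sequence ID, then start, then longest feature first) are assumptions, and the key only approximates parent-before-child ordering.

```python
def sort_gff3_lines(lines):
    """Return directive/comment lines first, then features ordered by coordinates.

    Hypothetical helper: sequence IDs are compared as plain strings, and ties on
    start are broken by descending end so longer features precede shorter ones.
    """
    header = [line for line in lines if line.startswith("#")]
    features = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line):
        cols = line.rstrip("\n").split("\t")
        return (cols[0], int(cols[3]), -int(cols[4]))

    return header + sorted(features, key=key)


lines = [
    "##gff-version 3",
    "chr1\tsrc\texon\t100\t300\t.\t+\t.\tID=e1;Parent=m1",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\tsrc\tgene\t10\t50\t.\t-\t.\tID=g0",
]
sorted_lines = sort_gff3_lines(lines)
for line in sorted_lines:
    print(line)
```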

File Processing

# Sanitize GFF3 file
gfftk sanitize -i messy.gff3 -o clean.gff3

# Rename features systematically
gfftk rename -i annotation.gff3 -o renamed.gff3 --prefix GENE
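The effect of a prefix-based rename such as `--prefix GENE` can be sketched as a mapping from old identifiers to sequentially numbered new ones. The naming scheme below (prefix, underscore, zero-padded counter) and the helper name `make_renamer` are assumptions for illustration; the source does not specify the names GFFtk generates.

```python
def make_renamer(prefix, width=6, start=1):
    """Return a function that maps each distinct old ID to '<prefix>_<counter>'.

    Hypothetical helper: repeated IDs always receive the same new name.
    """
    mapping = {}

    def rename(old_id):
        if old_id not in mapping:
            mapping[old_id] = f"{prefix}_{start + len(mapping):0{width}d}"
        return mapping[old_id]

    return rename


rename = make_renamer("GENE")
new_ids = [rename(x) for x in ["augustus-g12", "genemark-7", "augustus-g12"]]
print(new_ids)
```

Keeping the mapping inside the closure means child features that refer to a parent by its old ID can be rewritten consistently in the same pass.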

Programmatic Access

All CLI commands can be accessed programmatically by importing the corresponding functions:

import argparse
from gfftk.convert import convert
from gfftk.consensus import consensus
from gfftk.compare import compare
from gfftk.stats import stats
from gfftk.sort import sort
from gfftk.sanitize import sanitize
from gfftk.rename import rename

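# Create the namespace that argparse would normally produce for `gfftk convert`.
# Set every attribute listed for convert() above; reading an attribute that is
# missing from the namespace raises AttributeError.
args = argparse.Namespace(
    input='annotation.gff3',
    fasta='genome.fasta',
    output='output.gtf',
    format='gtf',
    table=1,
    grep=[],      # no include filters
    grepv=[],     # no exclude filters
    debug=False
)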

# Call conversion function
convert(args)
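When calling these functions repeatedly, it can help to build the namespace from one set of defaults. The helper below is a sketch, not part of GFFtk: `make_convert_args` and `CONVERT_DEFAULTS` are hypothetical names, the attribute list follows the `convert()` parameters documented above, and the default values for `grep` and `grepv` are assumptions.

```python
import argparse

# Attribute names follow the documented convert() parameters; defaults are assumed.
CONVERT_DEFAULTS = {
    "input": None,
    "fasta": None,
    "output": None,
    "format": None,
    "table": 1,
    "grep": [],
    "grepv": [],
    "debug": False,
}


def make_convert_args(**overrides):
    """Build an argparse.Namespace with all documented attributes present."""
    unknown = set(overrides) - set(CONVERT_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected attribute(s): {sorted(unknown)}")
    # Copy list defaults so separate namespaces never share mutable state.
    values = {
        key: (list(value) if isinstance(value, list) else value)
        for key, value in CONVERT_DEFAULTS.items()
    }
    values.update(overrides)
    return argparse.Namespace(**values)


args = make_convert_args(
    input="annotation.gff3",
    fasta="genome.fasta",
    output="output.gtf",
    format="gtf",
)
print(args)
```

Rejecting unknown keys catches misspelled attribute names at construction time instead of deep inside the called function.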

Command Reference

| Command | Function | Description |
| --- | --- | --- |
| gfftk convert | convert() | Format conversion and sequence extraction |
| gfftk consensus | consensus() | EvidenceModeler-like consensus prediction |
| gfftk compare | compare() | Annotation comparison and analysis |
| gfftk stats | stats() | Annotation statistics calculation |
| gfftk sort | sort() | GFF3 coordinate-based sorting |
| gfftk sanitize | sanitize() | GFF3 validation and cleaning |
| gfftk rename | rename() | Systematic feature renaming |

Each command provides comprehensive help via gfftk <command> --help with detailed parameter descriptions and usage examples.
