
tessl/pypi-tatsu

TatSu takes a grammar in a variation of EBNF as input, and outputs a memoizing PEG/Packrat parser in Python.

Code Generation

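Generate static Python parser code and object model classes from EBNF grammars, so that parsers can be built ahead of time and shipped as ordinary Python modules inside an application. The generated modules import TatSu's runtime classes, so the tatsu package remains a runtime dependency.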

Capabilities

Python Parser Code Generation

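Generate the source code of a complete Python parser class from an EBNF grammar. The result can be saved as a module and distributed with an application; because it imports from the tatsu package, TatSu must be installed wherever the parser runs.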

def to_python_sourcecode(grammar, name=None, filename=None, config=None, **settings):
    """
    Generate Python parser source code from grammar.
    
    Parameters:
    - grammar (str): EBNF grammar definition string
    - name (str, optional): Parser class name (defaults to grammar filename base)
    - filename (str, optional): Source filename for error reporting and class naming
    - config (ParserConfig, optional): Parser configuration object
    - **settings: Additional generation settings (trace, left_recursion, etc.)
    
    Returns:
    str: Complete Python source code for a parser class
    
    Raises:
    GrammarError: If grammar contains syntax or semantic errors
    CodegenError: If code generation fails
    """

Usage example:

import tatsu

# Raw string, so that the regex escape \d reaches TatSu unchanged
grammar = r'''
    start = expr;
    expr = term ("+" term)*;
    term = factor ("*" factor)*;
    factor = "(" expr ")" | number;
    number = /\d+/;
'''

# Generate parser code
parser_code = tatsu.to_python_sourcecode(grammar, name="Calculator")

# Save to file
with open("calculator_parser.py", "w") as f:
    f.write(parser_code)

# The generated code can be imported and used:
# from calculator_parser import CalculatorParser
# parser = CalculatorParser()
# result = parser.parse("2 + 3 * 4")
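
The commented lines above assume the generated file sits on the import path. As a minimal sketch of an alternative, the generated source can be written to a chosen directory and loaded with the standard `importlib` machinery. The helper name `load_module_from_source` is hypothetical, and `demo_source` is a stand-in for the string that `tatsu.to_python_sourcecode` would return.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_source(source: str, module_name: str, directory: Path):
    """Write `source` to <directory>/<module_name>.py and import it as a module."""
    path = directory / f"{module_name}.py"
    path.write_text(source, encoding="utf-8")

    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before executing, as `import` does
    spec.loader.exec_module(module)
    return module


# Stand-in source (an assumption for this sketch). In practice this would be
# the string returned by tatsu.to_python_sourcecode(grammar, name="Calculator").
demo_source = "class CalculatorParser:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    module = load_module_from_source(demo_source, "calculator_parser", Path(tmp))
    parser_class = getattr(module, "CalculatorParser")
```

Loading by file location avoids modifying `sys.path`, which can be convenient in build scripts and tests.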

Object Model Generation

Generate Python dataclass or custom class definitions that correspond to grammar rules, enabling strongly-typed parse results.

def to_python_model(grammar, name=None, filename=None, base_type=None, config=None, **settings):
    """
    Generate Python object model classes from grammar.
    
    Parameters:
    - grammar (str): EBNF grammar definition string
    - name (str, optional): Model class prefix (defaults to grammar filename base)
    - filename (str, optional): Source filename for error reporting
    - base_type (type, optional): Base class for generated model classes (default: Node)
    - config (ParserConfig, optional): Parser configuration object
    - **settings: Additional generation settings
    
    Returns:
    str: Python source code for object model classes
    
    Raises:
    GrammarError: If grammar contains syntax or semantic errors
    CodegenError: If model generation fails
    """

Usage example:

import tatsu
from tatsu.objectmodel import Node

# Raw string, so that the regex escape \d reaches TatSu unchanged
grammar = r'''
    start = expr;
    expr::Expr = term ("+" term)*;
    term::Term = factor ("*" factor)*;
    factor::Factor = "(" expr ")" | number;
    number::Number = /\d+/;
'''

# Generate object model with custom base type
class MyBaseNode(Node):
    def __repr__(self):
        return f"{self.__class__.__name__}({super().__repr__()})"

model_code = tatsu.to_python_model(
    grammar, 
    name="Calculator",
    base_type=MyBaseNode
)

# Save generated model classes
with open("calculator_model.py", "w") as f:
    f.write(model_code)

# Use with semantic actions (import only the generated classes that are needed)
from calculator_model import Expr, Number

class CalculatorSemantics:
    def number(self, ast):
        return Number(value=int(ast))
    
    def expr(self, ast):
        return Expr(terms=ast)

model = tatsu.compile(grammar)
result = model.parse("2 + 3", semantics=CalculatorSemantics())

Code Generation Options

Advanced options for customizing the generated parser and model code:

# Parser generation settings
trace: bool = False                    # Include tracing support in generated parser
left_recursion: bool = True            # Enable left-recursion in generated parser
nameguard: bool | None = None          # Include nameguard logic in generated parser
whitespace: str | None = None          # Default whitespace handling

# Model generation settings
base_type: type | None = None          # Base class for generated model classes
types: dict[str, type] | None = None   # Custom type mappings for specific rules
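
As a rough sketch of how the parser settings listed above might be assembled and passed through `**settings`: the helper names `build_settings` and `generate_parser` are hypothetical, and dropping unset (`None`) values so that the library's own defaults apply is an assumption of this example, not documented behavior.

```python
def build_settings(**overrides):
    """Merge overrides into the defaults listed above, dropping unset (None) values."""
    defaults = {
        "trace": False,
        "left_recursion": True,
        "nameguard": None,
        "whitespace": None,
    }
    merged = {**defaults, **overrides}
    # Leave out anything still unset so the callee can apply its own default.
    return {key: value for key, value in merged.items() if value is not None}


def generate_parser(grammar, name, **overrides):
    """Hypothetical wrapper: generate parser source with the merged settings."""
    import tatsu  # imported lazily so build_settings can be used on its own

    return tatsu.to_python_sourcecode(grammar, name=name, **build_settings(**overrides))


settings = build_settings(trace=True, whitespace=" \t")
```

Keeping the merge step separate from the call into TatSu makes the effective settings easy to log or inspect before generation.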

Generated Code Structure

The generated parser code follows a consistent structure:

# Generated parser class structure
class GeneratedParser:
    """Generated parser class with all grammar rules as methods."""
    
    def __init__(self, **kwargs):
        """Initialize parser with optional configuration."""
    
    def parse(self, text, start=None, **kwargs):
        """Main parsing method."""
    
    def _rule_name_(self):
        """Generated method for each grammar rule."""
        
    # Error handling and utility methods
    def _error(self, item, pos):
        """Error reporting method."""
    
    def _call(self, rule):
        """Rule invocation method."""

The generated object model classes are dataclasses or Node subclasses:

@dataclass
class RuleName(BaseType):
    """Generated class for grammar rule 'rule_name'."""
    field1: Any
    field2: List[Any]
    # Fields correspond to named elements in the rule
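
To make the shape above concrete, here is a hand-written sketch of what model classes of this kind could look like and how they might be consumed. `BaseNode`, `Number`, `Add`, and `evaluate` are hypothetical names chosen for illustration; they are not output produced by TatSu.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class BaseNode:
    """Hypothetical stand-in for the model base class."""


@dataclass
class Number(BaseNode):
    # Would correspond to a rule such as:  number::Number = value:/\d+/ ;
    value: Any = None


@dataclass
class Add(BaseNode):
    # Would correspond to a rule such as:  add::Add = left:term "+" right:term ;
    left: Any = None
    right: Any = None


def evaluate(node):
    """Walk the typed tree, dispatching on the node class."""
    if isinstance(node, Number):
        return int(node.value)
    if isinstance(node, Add):
        return evaluate(node.left) + evaluate(node.right)
    raise TypeError(f"unexpected node type: {type(node).__name__}")


tree = Add(left=Number("2"), right=Add(left=Number("3"), right=Number("4")))
total = evaluate(tree)
```

Because each rule maps to its own class, downstream code can dispatch on type rather than inspecting dictionary keys.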

Integration Examples

Using generated code in applications:

# Example: Using generated parser in a web application
from flask import Flask, request, jsonify
from my_generated_parser import MyParser
from tatsu.exceptions import ParseException

app = Flask(__name__)
parser = MyParser()

@app.route('/parse', methods=['POST'])
def parse_input():
    try:
        text = request.json['input']
        result = parser.parse(text)
        return jsonify({'success': True, 'ast': result})
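    except ParseException as e:
        # Position attributes are read defensively, since not every
        # exception type is guaranteed to expose them.
        return jsonify({
            'success': False,
            'error': str(e),
            'line': getattr(e, 'line', None),
            'column': getattr(e, 'col', None),
        }), 400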
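
The handler above passes the parse result straight to `jsonify`. Whether that succeeds depends on what the parser returns, so a conversion step may be needed first. The following is a minimal sketch of one, assuming the result is built from mappings, sequences, and scalars; `to_jsonable` is a hypothetical helper, and the sample data is made up for illustration.

```python
import json
from collections.abc import Mapping


def to_jsonable(value):
    """Recursively convert a parse result into plain JSON-compatible types."""
    if isinstance(value, Mapping):
        return {str(key): to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        return [to_jsonable(item) for item in value]
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    # Fallback for anything else (for example custom node objects).
    return str(value)


# Made-up sample resembling a nested parse result.
sample = {"left": "2", "op": "+", "right": ("3", ["*", "4"])}
payload = json.dumps(to_jsonable(sample))
```

In the route, this would amount to returning `jsonify({'success': True, 'ast': to_jsonable(result)})`.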

Deployment

Generated parsers can be deployed as ordinary Python modules. They import TatSu's runtime classes, so the tatsu package must be installed in the target environment:

# Requirements for generated parser
# - Python 3.10+
# - The tatsu package (the generated module imports from it)

# The generated parser, together with the TatSu runtime, provides:
# - PEG parsing algorithm
# - Memoization (packrat parsing)
# - Left-recursion support
# - Error handling and reporting
# - AST construction
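
Before deploying, it can be worth checking which packages a generated module actually imports rather than relying on a written list. A small sketch using the standard `ast` module follows; `imported_packages` is a hypothetical helper, and `sample_source` is a made-up stand-in for the contents of a generated file.

```python
import ast


def imported_packages(source: str) -> set[str]:
    """Return the top-level package names that `source` imports absolutely."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


# Made-up stand-in; in practice, read the generated parser file instead.
sample_source = "import sys\nfrom pathlib import Path\nfrom some_pkg.sub import thing\n"
found = imported_packages(sample_source)
```

Comparing the result against the packages installed in the target environment gives an early warning about missing requirements.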

Advanced Code Generation

Custom Code Templates

For advanced use cases, TatSu's code generation can be customized using the underlying code generation infrastructure:

from tatsu.ngcodegen import codegen
from tatsu.ngcodegen.objectmodel import modelgen

# Direct access to code generators
def custom_codegen(model, **kwargs):
    """Access to lower-level code generation."""
    return codegen(model, **kwargs)

def custom_modelgen(model, **kwargs):
    """Access to lower-level model generation.""" 
    return modelgen(model, **kwargs)

Generated Code Optimization

Generated parsers include several optimizations:

  • Memoization: Packrat parsing with automatic memoization
  • Left-recursion: Advanced left-recursion handling algorithms
  • Error reporting: Parse failures are reported with position information
  • Minimized overhead: Optimized for production deployment
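
To illustrate what packrat memoization means in general, here is a small hand-written recursive-descent sketch for the calculator grammar used earlier. It caches each rule result by (rule name, input position). This is only an illustration of the technique under simplified assumptions; it is not TatSu's implementation, and the class `TinyPackrat` is hypothetical.

```python
import operator
import re


class TinyPackrat:
    """Illustrative packrat parser for:
    expr = term ("+" term)* ; term = factor ("*" factor)* ;
    factor = "(" expr ")" | number ; number = /\\d+/ ;
    """

    _NUMBER = re.compile(r"\d+")

    def __init__(self, text):
        self.text = text
        self.memo = {}   # (rule name, position) -> (value, next position) or None
        self.calls = 0   # number of rule bodies actually executed

    def apply(self, rule, pos):
        """Run `rule` at `pos`, reusing a cached result when one exists."""
        key = (rule.__name__, pos)
        if key not in self.memo:
            self.calls += 1
            self.memo[key] = rule(pos)
        return self.memo[key]

    def skip(self, pos):
        """Advance past whitespace."""
        while pos < len(self.text) and self.text[pos].isspace():
            pos += 1
        return pos

    def number(self, pos):
        match = self._NUMBER.match(self.text, self.skip(pos))
        return (int(match.group()), match.end()) if match else None

    def factor(self, pos):
        pos = self.skip(pos)
        # First alternative: "(" expr ")"
        if self.text.startswith("(", pos):
            inner = self.apply(self.expr, pos + 1)
            if inner is not None:
                value, end = inner
                end = self.skip(end)
                if self.text.startswith(")", end):
                    return value, end + 1
        # Second alternative: number
        return self.apply(self.number, pos)

    def _repeat(self, operand, op, combine, pos):
        """Parse: operand (op operand)* and fold the values with `combine`."""
        first = self.apply(operand, pos)
        if first is None:
            return None
        value, pos = first
        while True:
            after_op = self.skip(pos)
            if not self.text.startswith(op, after_op):
                return value, pos
            nxt = self.apply(operand, after_op + 1)
            if nxt is None:
                return value, pos
            value, pos = combine(value, nxt[0]), nxt[1]

    def term(self, pos):
        return self._repeat(self.factor, "*", operator.mul, pos)

    def expr(self, pos):
        return self._repeat(self.term, "+", operator.add, pos)


parser = TinyPackrat("2 + 3 * 4")
result = parser.apply(parser.expr, 0)   # (value, end position)
calls_first = parser.calls
parser.apply(parser.expr, 0)            # answered from the memo table
calls_second = parser.calls
```

The second call to `apply` does not execute any rule body, which is the property that keeps backtracking PEG parsers from redoing work at the same position.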

The generated code is suitable for:

  • High-performance parsing applications
  • Production web services
  • Embedded parsing in larger applications
  • Distribution as standalone parsing libraries
