
tessl/pypi-ply

Python implementation of lex and yacc parsing tools with LALR(1) algorithm and zero dependencies


Syntax Parsing

The ply.yacc module provides LALR(1) parsing: it converts a token stream into structured data using grammar rules written in the docstrings of p_-prefixed functions. It supports operator precedence declarations, error recovery, an optimized mode, and debugging output (the parser.out report and parse tracing).

Capabilities

Parser Creation

Creates a parser instance by analyzing grammar rules defined in the calling module. Uses the LALR(1) algorithm to build parsing tables and validate the grammar specification.

def yacc(*, debug=False, module=None, start=None, check_recursion=True, optimize=False, debugfile='parser.out', debuglog=None, errorlog=None):
    """
    Build a parser from grammar rules.

    Parameters:
    - debug: Enable debug mode (default: False)
    - module: Module containing grammar rules (default: calling module)
    - start: Start symbol for grammar (default: first rule)
    - check_recursion: Check for infinite recursion (default: True)
    - optimize: Enable parser optimization (default: False)
    - debugfile: Debug output filename (default: 'parser.out')
    - debuglog: Logger for debug output
    - errorlog: Logger for error messages

    Returns:
    LRParser instance
    """

def format_result(r):
    """
    Format result message for debug mode.

    Parameters:
    - r: Result value to format

    Returns:
    Formatted string representation
    """

def format_stack_entry(r):
    """
    Format stack entry for debug mode.

    Parameters:
    - r: Stack entry to format

    Returns:
    Formatted string representation
    """
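
As a minimal sketch of the call above, the following builds a lexer and a parser in one module and parses a short input. The token name NUMBER, the rule name total, and the input string are hypothetical choices for illustration; the sketch assumes the ply package is importable and relies on yacc() collecting the p_ functions from the calling module.

```python
import ply.lex as lex
import ply.yacc as yacc

# Lexer: a single token type (hypothetical, chosen for this sketch)
tokens = ('NUMBER',)
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)  # skip characters the lexer does not recognise

# Grammar: a non-empty sequence of numbers, reduced to their sum
def p_total_many(p):
    '''total : total NUMBER'''
    p[0] = p[1] + p[2]

def p_total_one(p):
    '''total : NUMBER'''
    p[0] = p[1]

def p_error(p):
    raise SyntaxError(f"unexpected token: {p!r}")

lexer = lex.lex()
parser = yacc.yacc(debug=False)  # returns an LRParser instance

result = parser.parse("1 2 3 4", lexer=lexer)
print(result)  # 10
```

Passing the lexer explicitly to parse() keeps the example independent of whichever lexer was created most recently.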

LALR(1) Parser

Main parser class implementing the LALR(1) parsing algorithm with support for error recovery and debugging.

class LRParser:
    def parse(self, input=None, lexer=None, debug=False, tracking=False):
        """
        Parse input using the built grammar.

        Parameters:
        - input: Input string to parse (optional if lexer provided)
        - lexer: Lexer instance for tokenization
        - debug: Enable parse debugging
        - tracking: Track line numbers and lexer positions for every grammar symbol, not only tokens

        Returns:
        Parse result (value of start symbol)
        """

    def errok(self):
        """
        Clear the parser error state.
        Used in error recovery to continue parsing.
        """

    def restart(self):
        """
        Restart parsing from the beginning.
        Clears all parser state and positions.
        """

    def set_defaulted_states(self):
        """
        Set defaulted states for optimized parsing.
        Used internally for parser optimization.
        """

    def disable_defaulted_states(self):
        """
        Disable defaulted states.
        Used internally for parser optimization control.
        """

Production Rule Representation

Represents a grammar production rule and provides access to symbol attributes within grammar rule functions, including line numbers and lexer positions. The p parameter in grammar rules is a YaccProduction instance.

class YaccProduction:
    """
    Represents a grammar production rule.
    Used in grammar rule functions to access symbols and their attributes.
    """
    
    def __getitem__(self, n):
        """
        Get symbol value by index.

        Parameters:
        - n: Symbol index (0 = left-hand side, 1+ = right-hand side)

        Returns:
        Symbol value
        """

    def __setitem__(self, n, v):
        """
        Set symbol value by index.

        Parameters:
        - n: Symbol index (0 = left-hand side, 1+ = right-hand side)
        - v: Value to set
        """

    def __len__(self):
        """
        Get number of symbols in production.

        Returns:
        Number of symbols (including left-hand side)
        """

    def lineno(self, n):
        """
        Get line number for symbol n in grammar rule.

        Parameters:
        - n: Symbol index (0 = left-hand side, 1+ = right-hand side)

        Returns:
        Line number, or 0 if none was recorded
        """

    def set_lineno(self, n, lineno):
        """
        Set line number for symbol n.

        Parameters:
        - n: Symbol index
        - lineno: Line number to set
        """

    def linespan(self, n):
        """
        Get line number span for symbol n.

        Parameters:
        - n: Symbol index

        Returns:
        Tuple of (start_line, end_line); both are 0 if no line was recorded
        """

    def lexpos(self, n):
        """
        Get lexer position for symbol n.

        Parameters:
        - n: Symbol index

        Returns:
        Character position, or 0 if none was recorded
        """

    def set_lexpos(self, n, lexpos):
        """
        Set lexer position for symbol n.

        Parameters:
        - n: Symbol index
        - lexpos: Character position to set
        """

    def lexspan(self, n):
        """
        Get lexer position span for symbol n.

        Parameters:
        - n: Symbol index

        Returns:
        Tuple of (start_pos, end_pos); both are 0 if no position was recorded
        """

    def error(self):
        """
        Signal a syntax error from inside a grammar rule.
        Raises SyntaxError, which the parser catches to enter error recovery.
        """

    # Public attributes
    slice: list      # List of symbols in the production
    stack: list      # Parser stack reference
    lexer: object    # Lexer instance reference
    parser: object   # Parser instance reference
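
The sketch below shows what a rule function sees through p: its length, the grammar symbol names held in p.slice, and the values returned by indexing. The grammar, the token names, and the observed dictionary are hypothetical and exist only to record what the rule received.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('ID', 'EQUALS', 'NUMBER')
t_EQUALS = r'='
t_ignore = ' \t'

def t_ID(t):
    r'[A-Za-z_][A-Za-z0-9_]*'
    return t

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

observed = {}  # hypothetical holder for what the rule function saw

def p_assignment(p):
    '''assignment : ID EQUALS NUMBER'''
    observed['length'] = len(p)                         # left-hand side plus 3 symbols
    observed['types'] = [sym.type for sym in p.slice]   # grammar symbol names
    observed['values'] = [p[1], p[2], p[3]]             # indexing yields values
    p[0] = (p[1], p[3])

def p_error(p):
    raise SyntaxError(f"unexpected token: {p!r}")

lexer = lex.lex()
parser = yacc.yacc(debug=False)
result = parser.parse("answer = 42", lexer=lexer)

print(observed['length'])  # 4
print(observed['types'])   # ['assignment', 'ID', 'EQUALS', 'NUMBER']
print(result)              # ('answer', 42)
```

Note that p[n] returns the value of a symbol, while p.slice[n] is the underlying symbol object that carries the type and position attributes.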

Internal Parser Symbol

Internal representation of parser symbols during parsing.

class YaccSymbol:
    """
    Internal parser symbol representation.
    Used internally by the parser during parsing operations.
    """

Parser Error Handling

Exception hierarchy for different types of parsing errors.

class YaccError(Exception):
    """Base exception for parser errors."""

class GrammarError(YaccError):
    """
    Exception for grammar specification errors.
    Raised when grammar rules are invalid or conflicting.
    """

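class LALRError(YaccError):
    """
    Exception for errors during LALR table construction.
    Raised when the table generator hits a conflict it cannot resolve.
    Ordinary shift/reduce and reduce/reduce conflicts are resolved by default rules and reported as warnings.
    """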
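
A short sketch of how these exceptions surface in practice: the grammar below is deliberately broken (it uses a PLUS symbol that is neither a token nor a rule), so building the parser is expected to fail with a YaccError. The grammar itself is hypothetical; NullLogger is passed as errorlog only to keep the diagnostic messages off stderr.

```python
import ply.yacc as yacc

tokens = ('NUMBER',)  # PLUS is deliberately missing from the token list

def p_expr(p):
    '''expr : NUMBER PLUS NUMBER'''
    p[0] = p[1] + p[3]

def p_error(p):
    pass

try:
    yacc.yacc(debug=False, errorlog=yacc.NullLogger())
    build_failed = False
except yacc.YaccError as exc:
    # Grammar problems found while building are reported through this exception
    build_failed = True
    print("could not build parser:", exc)
```

Catching the base class YaccError also covers its subclasses, which is usually enough for build-time checks.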

Logging Utilities

Logging classes for parser construction and operation debugging.

class PlyLogger:
    """
    Logging utility for PLY operations.
    Provides structured logging for parser construction and operation.
    """

class NullLogger:
    """
    Null logging implementation.
    Used when logging is disabled.
    """
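
As an illustration, PlyLogger wraps a writable file-like object, so build-time warnings can be captured in memory instead of going to stderr. The grammar below is a hypothetical one that declares a token it never uses, which is expected to produce a warning.

```python
import io
import ply.yacc as yacc

tokens = ('NUMBER', 'UNUSED')  # UNUSED never appears in any rule

def p_value(p):
    '''value : NUMBER'''
    p[0] = p[1]

def p_error(p):
    pass

buffer = io.StringIO()
parser = yacc.yacc(debug=False, errorlog=yacc.PlyLogger(buffer))

log_text = buffer.getvalue()
print(log_text)  # contains a WARNING line that mentions 'UNUSED'
```

A standard logging.Logger can be passed in the same position, since yacc() only calls the usual warning() and error() style methods on it.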

Grammar Rule Conventions

Basic Grammar Rules

Define grammar rules using functions with p_ prefix and BNF in docstrings:

def p_expression_binop(p):
    '''expression : expression PLUS term
                  | expression MINUS term'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]

def p_expression_term(p):
    '''expression : term'''
    p[0] = p[1]

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]
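
The rules above are fragments: they reference term and factor without defining the bottom of the grammar. A complete, runnable version might look like the sketch below. The TIMES rule, the factor rule, and the lexer definitions are additions made for illustration and are not part of the original listing.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS', 'MINUS', 'TIMES')
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

def p_expression_binop(p):
    '''expression : expression PLUS term
                  | expression MINUS term'''
    p[0] = p[1] + p[3] if p[2] == '+' else p[1] - p[3]

def p_expression_term(p):
    '''expression : term'''
    p[0] = p[1]

def p_term_times(p):
    '''term : term TIMES factor'''
    p[0] = p[1] * p[3]

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]

def p_factor_number(p):
    '''factor : NUMBER'''
    p[0] = p[1]

def p_error(p):
    raise SyntaxError(f"unexpected token: {p!r}")

lexer = lex.lex()
parser = yacc.yacc(debug=False)
value = parser.parse("10 - 2 - 3 * 2", lexer=lexer)
print(value)  # 2, i.e. (10 - 2) - (3 * 2)
```

Layering expression, term, and factor encodes precedence in the grammar itself, and the left-recursive rules make the operators left-associative.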

Symbol Access

Access symbols in grammar rules through the p parameter:

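symbol_table = {}  # Maps identifier names to their assigned values

def p_assignment(p):
    '''assignment : ID EQUALS expression'''
    # p[0] = result (value of the left-hand side)
    # p[1] = value of the ID token (the identifier name)
    # p[2] = value of the EQUALS token
    # p[3] = value of expression
    symbol_table[p[1]] = p[3]
    p[0] = p[3]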

Precedence Rules

Define operator precedence and associativity:

precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),  # Unary minus
)

def p_expression_uminus(p):
    '''expression : MINUS expression %prec UMINUS'''
    p[0] = -p[2]
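
To see the precedence table at work, the sketch below pairs it with a deliberately ambiguous grammar and checks three inputs. The binary-operator rule, the lexer, and the OPERATORS lookup table are assumptions added for illustration.

```python
import operator
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE')
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

# Lowest precedence first; UMINUS is a fictitious token used only with %prec
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),
)

OPERATORS = {'+': operator.add, '-': operator.sub,
             '*': operator.mul, '/': operator.truediv}

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    p[0] = OPERATORS[p[2]](p[1], p[3])

def p_expression_uminus(p):
    '''expression : MINUS expression %prec UMINUS'''
    p[0] = -p[2]

def p_expression_number(p):
    '''expression : NUMBER'''
    p[0] = p[1]

def p_error(p):
    raise SyntaxError(f"unexpected token: {p!r}")

lexer = lex.lex()
parser = yacc.yacc(debug=False)

inputs = ("2 + 3 * 4", "8 - 3 - 2", "- 2 + 3")
results = [parser.parse(text, lexer=lexer) for text in inputs]
print(results)  # [14, 3, 1]
```

The three results show, in order, that TIMES binds tighter than PLUS, that MINUS is left-associative, and that unary minus binds tighter than binary PLUS.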

Error Recovery

Handle syntax errors with error productions and recovery:

def p_error(p):
    if p:
        print(f"Syntax error at token {p.type} (line {p.lineno})")
    else:
        print("Syntax error at EOF")

def p_statement_error(p):
    '''statement : error SEMICOLON'''
    print("Syntax error in statement. Skipping to next semicolon.")
    p[0] = None

Parser Configuration

Global Configuration Variables

Module-level configuration constants:

import sys

yaccdebug = False          # Global debug mode flag
debug_file = 'parser.out'  # Default debug output filename
error_count = 3            # Number of symbols that must be shifted to leave error recovery mode
resultlimit = 40           # Size limit for results shown in debug mode
MAXINT = sys.maxsize       # Maximum integer value

Start Symbol

The parser automatically uses the first grammar rule as the start symbol, or you can specify it explicitly:

# Automatic start symbol (first rule)
def p_program(p):
    '''program : statement_list'''
    p[0] = p[1]

# Or specify explicitly in yacc() call
parser = yacc.yacc(start='program')
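
The following sketch shows a case where the explicit form matters: a helper rule is defined first, so without start= it would become the start symbol. The rule names item and items are hypothetical.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER',)
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

# Helper rule defined first: by default it would be taken as the start symbol
def p_item(p):
    '''item : NUMBER'''
    p[0] = p[1]

def p_items_many(p):
    '''items : items item'''
    p[0] = p[1] + [p[2]]

def p_items_one(p):
    '''items : item'''
    p[0] = [p[1]]

def p_error(p):
    raise SyntaxError(f"unexpected token: {p!r}")

lexer = lex.lex()
parser = yacc.yacc(start='items', debug=False)  # override the default start symbol

result = parser.parse("7 8 9", lexer=lexer)
print(result)  # [7, 8, 9]
```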

Error Recovery Mechanisms

The parser provides several error recovery strategies:

  1. Error productions: Grammar rules with error token for local recovery
  2. Global error handler: p_error() function for unhandled syntax errors
  3. Error state management: errok() method to clear error state
  4. Token synchronization: Skip tokens until synchronization point
  5. Parser restart: restart() method for complete recovery
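
The first three strategies can be combined as in the sketch below, which skips a malformed statement and keeps parsing. The statement grammar, the errors list, and the input are hypothetical; the malformed statement is "2 3 ;".

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'SEMICOLON')
t_SEMICOLON = r';'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

errors = []  # values of the tokens that triggered p_error

def p_statements_many(p):
    '''statements : statements statement'''
    p[0] = p[1] + [p[2]]

def p_statements_one(p):
    '''statements : statement'''
    p[0] = [p[1]]

def p_statement_number(p):
    '''statement : NUMBER SEMICOLON'''
    p[0] = p[1]

def p_statement_error(p):
    '''statement : error SEMICOLON'''
    p[0] = None      # placeholder for the statement that was skipped
    parser.errok()   # leave recovery mode so the next error is reported at once

def p_error(p):
    errors.append(p.value if p else None)

lexer = lex.lex()
parser = yacc.yacc(debug=False)

result = parser.parse("1 ; 2 3 ; 4 ;", lexer=lexer)
print(result)  # [1, None, 4]
print(errors)  # [3]
```

The parser reports the unexpected token 3 once, discards input up to the next semicolon through the error production, and then resumes with the following statement.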

Position Tracking

Track source position information through tokens and productions using the YaccProduction parameter:

def p_assignment(p):
    '''assignment : ID EQUALS expression'''
    # p is a YaccProduction instance - access position information
    id_line = p.lineno(1)      # Line number of ID
    id_pos = p.lexpos(1)       # Character position of ID
    span = p.linespan(1)       # Line span of ID
    
    # Set position for result
    p.set_lineno(0, id_line)
    p[0] = AST.Assignment(p[1], p[3], line=id_line)
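
The listing above depends on an AST module that is not shown. A self-contained sketch of the same idea follows, using plain tuples instead; the grammar, the newline rule, and the input are assumptions for illustration. It passes tracking=True so that the assignment nonterminal carries position information as well as the tokens.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('ID', 'EQUALS', 'NUMBER')
t_EQUALS = r'='
t_ignore = ' \t'

def t_ID(t):
    r'[A-Za-z_][A-Za-z0-9_]*'
    return t

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)  # the lexer only counts lines if told to

def t_error(t):
    t.lexer.skip(1)

def p_program_many(p):
    '''program : program assignment'''
    # Symbol 2 is a nonterminal: its positions are only recorded with tracking=True
    p[0] = p[1] + [(p[2], p.lineno(2), p.lexspan(2))]

def p_program_one(p):
    '''program : assignment'''
    p[0] = [(p[1], p.lineno(1), p.lexspan(1))]

def p_assignment(p):
    '''assignment : ID EQUALS NUMBER'''
    p[0] = p[1]

def p_error(p):
    raise SyntaxError(f"unexpected token: {p!r}")

lexer = lex.lex()
parser = yacc.yacc(debug=False)

records = parser.parse("x = 1\ny = 22", lexer=lexer, tracking=True)
print(records)  # [('x', 1, (0, 4)), ('y', 2, (6, 10))]
```

The end of each span is the starting position of the last token in the production, not the position just past it, which is why the second span ends at 10 rather than 12.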

Global Variables

When yacc() is called, it also rebinds a module-level name in ply.yacc:

  • parse: The parse method of the most recently built parser

This allows simplified usage when only one parser is in play: result = yacc.parse(input, lexer=lexer)

