A Python implementation of the lex and yacc parsing tools, using the LALR(1) algorithm, with zero dependencies
—
The ply.yacc module provides LALR(1) parsing capabilities, converting token streams into structured data using grammar rules defined in function docstrings. It supports precedence rules, error recovery, parser generation optimization, and comprehensive debugging.
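As a quick-start sketch, here is a minimal additive-expression parser. The token names (NUMBER, PLUS) and rule names are illustrative choices, not fixed by PLY, and the ply package must be installed:

```python
# Minimal sketch: an additive-expression parser built with PLY.
import ply.lex as lex
import ply.yacc as yacc

# --- Lexer specification ---
tokens = ('NUMBER', 'PLUS')

t_PLUS = r'\+'
t_ignore = ' '

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)  # Skip characters the lexer cannot match

# --- Grammar rules (BNF in docstrings, p_ prefix) ---
def p_expr_plus(p):
    '''expr : expr PLUS NUMBER'''
    p[0] = p[1] + p[3]

def p_expr_number(p):
    '''expr : NUMBER'''
    p[0] = p[1]

def p_error(p):
    pass  # Ignore syntax errors in this sketch

lexer = lex.lex()
parser = yacc.yacc()           # Builds LALR(1) tables from the rules above
result = parser.parse('1 + 2 + 3', lexer=lexer)
```

The left-recursive `expr : expr PLUS NUMBER` rule keeps the grammar unambiguous, so no precedence declarations are needed here.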
Creates a parser instance by analyzing grammar rules defined in the calling module. Uses the LALR(1) algorithm to build parsing tables and validate the grammar specification.
def yacc(*, debug=False, module=None, start=None, check_recursion=True, optimize=False, debugfile='parser.out', debuglog=None, errorlog=None):
"""
Build a parser from grammar rules.
Parameters:
- debug: Enable debug mode (default: False)
- module: Module containing grammar rules (default: calling module)
- start: Start symbol for grammar (default: first rule)
- check_recursion: Check for infinite recursion (default: True)
- optimize: Enable parser optimization (default: False)
- debugfile: Debug output filename (default: 'parser.out')
- debuglog: Logger for debug output
- errorlog: Logger for error messages
Returns:
LRParser instance
"""
def format_result(r):
"""
Format result message for debug mode.
Parameters:
- r: Result value to format
Returns:
Formatted string representation
"""
def format_stack_entry(r):
"""
Format stack entry for debug mode.
Parameters:
- r: Stack entry to format
Returns:
Formatted string representation
"""Main parser class implementing the LALR(1) parsing algorithm with support for error recovery and debugging.
class LRParser:
def parse(self, input=None, lexer=None, debug=False, tracking=False):
"""
Parse input using the built grammar.
Parameters:
- input: Input string to parse (optional if lexer provided)
- lexer: Lexer instance for tokenization
- debug: Enable parse debugging
- tracking: Enable position tracking for line/column info
Returns:
Parse result (value of start symbol)
"""
def errok(self):
"""
Clear the parser error state.
Used in error recovery to continue parsing.
"""
def restart(self):
"""
Restart parsing from the beginning.
Clears all parser state and positions.
"""
def set_defaulted_states(self):
"""
Set defaulted states for optimized parsing.
Used internally for parser optimization.
"""
def disable_defaulted_states(self):
"""
Disable defaulted states.
Used internally for parser optimization control.
"""Represents a grammar production rule and provides access to symbol attributes within grammar rule functions, including line numbers and lexer positions. The p parameter in grammar rules is a YaccProduction instance.
class YaccProduction:
"""
Represents a grammar production rule.
Used in grammar rule functions to access symbols and their attributes.
"""
def __getitem__(self, n):
"""
Get symbol value by index.
Parameters:
- n: Symbol index (0 = left-hand side, 1+ = right-hand side)
Returns:
Symbol value
"""
def __setitem__(self, n, v):
"""
Set symbol value by index.
Parameters:
- n: Symbol index (0 = left-hand side, 1+ = right-hand side)
- v: Value to set
"""
def __len__(self):
"""
Get number of symbols in production.
Returns:
Number of symbols (including left-hand side)
"""
def lineno(self, n):
"""
Get line number for symbol n in grammar rule.
Parameters:
- n: Symbol index (0 = left-hand side, 1+ = right-hand side)
Returns:
Line number or None
"""
def set_lineno(self, n, lineno):
"""
Set line number for symbol n.
Parameters:
- n: Symbol index
- lineno: Line number to set
"""
def linespan(self, n):
"""
Get line number span for symbol n.
Parameters:
- n: Symbol index
Returns:
Tuple of (start_line, end_line) or None
"""
def lexpos(self, n):
"""
Get lexer position for symbol n.
Parameters:
- n: Symbol index
Returns:
Character position or None
"""
def set_lexpos(self, n, lexpos):
"""
Set lexer position for symbol n.
Parameters:
- n: Symbol index
- lexpos: Character position to set
"""
def lexspan(self, n):
"""
Get lexer position span for symbol n.
Parameters:
- n: Symbol index
Returns:
Tuple of (start_pos, end_pos) or None
"""
def error(self):
"""
Signal a syntax error.
Triggers error recovery mechanisms.
"""
# Public attributes
slice: list # List of symbols in the production
stack: list # Parser stack reference
lexer: object # Lexer instance reference
parser: object # Parser instance reference

Internal representation of parser symbols during parsing.
class YaccSymbol:
"""
Internal parser symbol representation.
Used internally by the parser during parsing operations.
"""Exception hierarchy for different types of parsing errors.
class YaccError(Exception):
"""Base exception for parser errors."""
class GrammarError(YaccError):
"""
Exception for grammar specification errors.
Raised when grammar rules are invalid or conflicting.
"""
class LALRError(YaccError):
"""
Exception for LALR parsing algorithm errors.
Raised when the grammar is not LALR(1) parseable.
"""Logging classes for parser construction and operation debugging.
class PlyLogger:
"""
Logging utility for PLY operations.
Provides structured logging for parser construction and operation.
"""
class NullLogger:
"""
Null logging implementation.
Used when logging is disabled.
"""Define grammar rules using functions with p_ prefix and BNF in docstrings:
def p_expression_binop(p):
    '''expression : expression PLUS term
                  | expression MINUS term'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]

def p_expression_term(p):
    '''expression : term'''
    p[0] = p[1]

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]

Access symbols in grammar rules through the p parameter:
def p_assignment(p):
    '''assignment : ID EQUALS expression'''
    # p[0] = result (left-hand side)
    # p[1] = ID token
    # p[2] = EQUALS token
    # p[3] = expression value
    symbol_table[p[1]] = p[3]
    p[0] = p[3]

Define operator precedence and associativity:
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),  # Unary minus
)

def p_expression_uminus(p):
    '''expression : MINUS expression %prec UMINUS'''
    p[0] = -p[2]

Handle syntax errors with error productions and recovery:
def p_error(p):
    if p:
        print(f"Syntax error at token {p.type} (line {p.lineno})")
    else:
        print("Syntax error at EOF")

def p_statement_error(p):
    '''statement : error SEMICOLON'''
    print("Syntax error in statement. Skipping to next semicolon.")
    p[0] = None

Module-level configuration constants:
yaccdebug = False          # Global debug mode flag
debug_file = 'parser.out'  # Default debug output filename
error_count = 3            # Number of symbols that must be shifted to leave error recovery mode
resultlimit = 40           # Debug result display size limit
MAXINT = sys.maxsize       # Maximum integer value

The parser automatically uses the first grammar rule as the start symbol, or you can specify it explicitly:
# Automatic start symbol (first rule)
def p_program(p):
    '''program : statement_list'''
    p[0] = p[1]

# Or specify it explicitly in the yacc() call
parser = yacc.yacc(start='program')

The parser provides several error recovery strategies:
- error token for local recovery
- p_error() function for unhandled syntax errors
- errok() method to clear the error state
- restart() method for complete recovery

Track source position information through tokens and productions using the YaccProduction parameter:
def p_assignment(p):
    '''assignment : ID EQUALS expression'''
    # p is a YaccProduction instance - access position information
    id_line = p.lineno(1)   # Line number of ID
    id_pos = p.lexpos(1)    # Character position of ID
    span = p.linespan(1)    # Line span of ID
    # Set position for result
    p.set_lineno(0, id_line)
    p[0] = AST.Assignment(p[1], p[3], line=id_line)

When yacc() is called, it sets a global variable:
parse: Global parse function bound to the created parser

This allows for simplified usage: result = parse(input, lexer=lexer)
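A minimal sketch of this convenience binding (the grammar and token names are illustrative, and ply must be installed):

```python
# Sketch: using the module-level parse function bound by the last yacc() call.
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER',)
t_ignore = ' '

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

def p_value(p):
    '''value : NUMBER'''
    p[0] = p[1]

def p_error(p):
    pass

lexer = lex.lex()
yacc.yacc()  # Also rebinds the module-level yacc.parse to this parser

result = yacc.parse('42', lexer=lexer)
```

Because the global is rebound on every yacc() call, prefer holding the returned LRParser instance when an application builds more than one parser.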
Install with Tessl CLI
npx tessl i tessl/pypi-ply