tessl/pypi-pygments

A syntax highlighting package that supports over 500 programming languages and text formats with extensive output format options

docs/filter-system.md

Filter System

Token stream filters post-process highlighted code with transformations such as tag highlighting, keyword case conversion, and whitespace visualization. Filters modify token streams after lexical analysis but before formatting.

Capabilities

Filter Discovery

Get filter instances by name.

def get_filter_by_name(filtername: str, **options):
    """
    Get filter instance by name with options.
    
    Parameters:
    - filtername: Filter name (e.g., 'codetagify', 'keywordcase')
    - **options: Filter-specific options
    
    Returns:
    Filter instance configured with the given options
    
    Raises:
    ClassNotFound: If no filter with that name is found
    """
def get_all_filters() -> Iterator[str]:
    """
    Return generator of all available filter names.
    
    Yields:
    Filter name strings for both built-in and plugin filters
    """
def find_filter_class(filtername: str):
    """
    Find filter class by name without instantiation.
    
    Parameters:
    - filtername: Filter name
    
    Returns:
    Filter class or None if not found
    """

Usage example:

from pygments.filters import get_filter_by_name, get_all_filters

# Get specific filter
codetag_filter = get_filter_by_name('codetagify', codetags=['TODO', 'FIXME', 'XXX'])

# List all available filters
print("Available filters:")
for filter_name in get_all_filters():
    print(f"  {filter_name}")
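`find_filter_class` complements the two functions above when you need the class itself, for example to inspect it or instantiate it later. A short sketch:

```python
from pygments.filters import find_filter_class

# Look up a filter class without instantiating it
cls = find_filter_class('keywordcase')
print(cls.__name__)  # KeywordCaseFilter

# Unknown names return None instead of raising ClassNotFound
missing = find_filter_class('no-such-filter')
print(missing)  # None
```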

Filter Application

Apply filters to lexers:

from pygments.lexers import PythonLexer
from pygments.filters import get_filter_by_name

# Create lexer and add filters
lexer = PythonLexer()
lexer.add_filter(get_filter_by_name('codetagify'))
lexer.add_filter(get_filter_by_name('whitespace', spaces=True))

# Or use filter names directly
lexer.add_filter('keywordcase', case='upper')
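Both forms are equivalent: passing a name and options to `add_filter` instantiates the filter for you, and either way the instance lands on the lexer's `filters` list. A quick check:

```python
from pygments.lexers import PythonLexer
from pygments.filters import get_filter_by_name

lexer = PythonLexer()
# Instance form and name-plus-options form both append a filter instance
lexer.add_filter(get_filter_by_name('codetagify'))
lexer.add_filter('keywordcase', case='upper')

print(len(lexer.filters))  # 2
```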

Built-in Filters

Code Tag Filter

Highlights special code tags in comments and docstrings.

class CodeTagFilter:
    """
    Highlight code tags like TODO, FIXME in comments.
    
    Options:
    - codetags: List of strings to highlight (default: ['XXX', 'TODO', 'FIXME', 'BUG', 'NOTE'])
    """

Usage example:

from pygments.filters import get_filter_by_name

# Default tags: XXX, TODO, FIXME, BUG, NOTE
codetag_filter = get_filter_by_name('codetagify')

# Custom tags
custom_filter = get_filter_by_name('codetagify', 
                                  codetags=['TODO', 'HACK', 'REVIEW', 'OPTIMIZE'])
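The effect on the token stream can be observed directly: matched tags are re-emitted with the Comment.Special token type, which most styles render with distinct highlighting.

```python
from pygments.lexers import PythonLexer
from pygments.token import Comment

lexer = PythonLexer()
lexer.add_filter('codetagify')

# The matched tag is split out of the comment as Comment.Special
tokens = list(lexer.get_tokens('# TODO: fix this\n'))
special = [value for ttype, value in tokens if ttype is Comment.Special]
print(special)  # ['TODO']
```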

Keyword Case Filter

Changes the case of language keywords.

class KeywordCaseFilter:
    """
    Convert keywords to upper or lower case.
    
    Options:
    - case: 'upper', 'lower', or 'capitalize' (default: 'lower')
    """

Usage example:

# Make all keywords uppercase
upper_filter = get_filter_by_name('keywordcase', case='upper')

# Make all keywords lowercase  
lower_filter = get_filter_by_name('keywordcase', case='lower')
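Only tokens in the Keyword hierarchy are affected; names, strings, and comments pass through unchanged, as a quick tokenization shows:

```python
from pygments.lexers import PythonLexer
from pygments.token import Keyword

lexer = PythonLexer()
lexer.add_filter('keywordcase', case='upper')

# Only the keyword tokens ('def', 'pass') are upper-cased
tokens = list(lexer.get_tokens('def f(): pass\n'))
keywords = [value for ttype, value in tokens if ttype in Keyword]
print(keywords)  # ['DEF', 'PASS']
```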

Name Highlight Filter

Highlights specific names/identifiers.

class NameHighlightFilter:
    """
    Highlight specific names in the code.
    
    Options:
    - names: List of names to highlight
    - tokentype: Token type to assign to matched names (default: Name.Function)
    """

Usage example:

# Highlight specific function/variable names
name_filter = get_filter_by_name('highlight', 
                                names=['main', 'process_data', 'API_KEY'])

Visible Whitespace Filter

Makes whitespace characters visible.

class VisibleWhitespaceFilter:
    """
    Make whitespace visible by replacing with symbols.
    
    Options:
    - spaces: True to replace spaces with '·', or a custom replacement string (default: False)
    - tabs: True to replace tabs with '»', or a custom replacement string (default: False)
    - newlines: True to replace newlines with '¶', or a custom replacement string (default: False)
    - tabsize: Number of characters a replaced tab expands to (default: 8)
    - wstokentype: Emit replaced characters with the Whitespace token type (default: True)
    """

Usage example:

# Show spaces, tabs and newlines with the default symbols
ws_filter = get_filter_by_name('whitespace', 
                              spaces=True, tabs=True, newlines=True)

# Show only spaces and tabs
ws_filter = get_filter_by_name('whitespace', 
                              spaces='·', tabs='»')
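Because filters preserve the text apart from the substitutions, the replacement is visible when the filtered token values are joined back together:

```python
from pygments.lexers import PythonLexer

lexer = PythonLexer()
# True selects the default symbol for each class ('·' for spaces)
lexer.add_filter('whitespace', spaces=True)

text = ''.join(value for _, value in lexer.get_tokens('a = 1\n'))
print(text)  # a·=·1
```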

Gobble Filter

Removes a fixed number of leading characters from every line, e.g. to strip common indentation.

class GobbleFilter:
    """
    Remove `n` characters from the start of every line.
    
    Options:
    - n: Number of characters to remove from each line (default: 0)
    """

Usage example:

# Remove a specific number of leading characters from every line
gobble_filter = get_filter_by_name('gobble', n=4)
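For example, stripping one level of four-space indentation from an indented snippet:

```python
from pygments.lexers import PythonLexer

lexer = PythonLexer()
lexer.add_filter('gobble', n=4)

# Each line starts with four spaces that the filter removes
code = '    x = 1\n    y = 2\n'
text = ''.join(value for _, value in lexer.get_tokens(code))
print(text)  # the two lines with their indentation stripped
```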

Token Merge Filter

Merges consecutive tokens of the same type.

class TokenMergeFilter:
    """
    Merge consecutive tokens of the same type to reduce token count.
    """

Usage example:

merge_filter = get_filter_by_name('tokenmerge')
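Merging never changes the highlighted text, only the number of tokens the formatter has to process, which a side-by-side count illustrates:

```python
from pygments.lexers import PythonLexer

plain = PythonLexer()
merged = PythonLexer()
merged.add_filter('tokenmerge')

code = 'x  =  1\n'
n_plain = len(list(plain.get_tokens(code)))
n_merged = len(list(merged.get_tokens(code)))

# The text round-trips unchanged; only the token count can shrink
print(''.join(v for _, v in merged.get_tokens(code)) == code)  # True
print(n_merged <= n_plain)                                     # True
```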

Raise on Error Token Filter

Raises an exception when error tokens are encountered.

class RaiseOnErrorTokenFilter:
    """
    Raise an exception when an Error token is encountered.
    
    Options:
    - excclass: Exception class to raise (default: pygments.filters.ErrorToken)
    """

Usage example:

error_filter = get_filter_by_name('raiseonerror')
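This is useful for validating that a snippet lexes cleanly. A sketch, using `?` (which is not valid Python and lexes as an Error token) to trigger the default ErrorToken exception:

```python
from pygments.lexers import PythonLexer
from pygments.filters import ErrorToken

lexer = PythonLexer()
lexer.add_filter('raiseonerror')

try:
    # '?' has no rule in the Python lexer, so it becomes an Error token
    list(lexer.get_tokens('?\n'))
    raised = False
except ErrorToken:
    raised = True
print(raised)  # True
```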

Symbol Filter

Converts ASCII symbol notation, such as Isabelle's \<longrightarrow> or LaTeX's \longrightarrow, into the corresponding Unicode characters.

class SymbolFilter:
    """
    Convert ASCII symbol names into Unicode characters.
    
    Options:
    - lang: Source notation, 'isabelle' or 'latex' (default: 'isabelle')
    """

Usage example:

# Convert LaTeX symbol commands to their Unicode equivalents
symbol_filter = get_filter_by_name('symbols', lang='latex')

Filter Usage Examples

Basic Filter Application

from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

code = '''
def process_data():
    # TODO: Optimize this function
    # FIXME: Handle edge cases
    data = "hello world"  # Some data
    return data
'''

# Create lexer and add filters
lexer = PythonLexer()
lexer.add_filter('codetagify')  # Highlight TODO, FIXME
lexer.add_filter('whitespace', spaces='·')  # Show spaces

# Highlight with filters applied
result = highlight(code, lexer, HtmlFormatter())

Multiple Filters

# Chain multiple filters
lexer = PythonLexer()
lexer.add_filter('gobble')  # Remove common indentation
lexer.add_filter('codetagify', codetags=['TODO', 'HACK'])
lexer.add_filter('keywordcase', case='upper')
lexer.add_filter('tokenmerge')  # Optimize token stream

result = highlight(code, lexer, HtmlFormatter())

Custom Filter Chain

from pygments.filter import Filter

class CustomFilter(Filter):
    def filter(self, lexer, stream):
        for ttype, value in stream:
            # Custom token processing
            if 'secret' in value.lower():
                value = value.replace('secret', '***')
            yield ttype, value

# Use custom filter
lexer = PythonLexer()
lexer.add_filter(CustomFilter())
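Running a filter like this end-to-end shows the token values being rewritten; the `RedactFilter` name below is a hypothetical example, a self-contained variant of the idea above:

```python
from pygments.filter import Filter
from pygments.lexers import PythonLexer

class RedactFilter(Filter):
    """Replace the word 'secret' in token values with '***'."""
    def filter(self, lexer, stream):
        for ttype, value in stream:
            if 'secret' in value.lower():
                value = value.replace('secret', '***')
            yield ttype, value

lexer = PythonLexer()
lexer.add_filter(RedactFilter())
text = ''.join(value for _, value in lexer.get_tokens('secret = 1\n'))
print(text)  # *** = 1
```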

Filter Order

Filters are applied in the order they are added to the lexer. The order can affect the final result:

lexer = PythonLexer()

# Order matters:
lexer.add_filter('gobble')        # 1. Remove indentation first
lexer.add_filter('codetagify')    # 2. Then highlight code tags
lexer.add_filter('tokenmerge')    # 3. Finally merge tokens

Error Handling

  • ClassNotFound: No filter found with the specified name
  • OptionError: Invalid filter options provided
  • ErrorToken: Raised by RaiseOnErrorTokenFilter when an Error token is encountered

Install with Tessl CLI

npx tessl i tessl/pypi-pygments
