tessl/pypi-sphinx-autobuild

Rebuild Sphinx documentation on changes, with hot reloading in the browser.

File Filtering

The file filtering system uses glob patterns and regular expressions to ignore specific files and directories during file watching. This prevents unnecessary rebuilds when temporary or irrelevant files change.

Capabilities

IgnoreFilter Class

The main filtering class that determines whether files should be ignored during watching.

class IgnoreFilter:
    def __init__(self, regular, regex_based):
        """
        Initialize filter with glob patterns and regex patterns.
        
        Parameters:
        - regular: list[str] - Glob patterns for files/directories to ignore
        - regex_based: list[str] - Regular expression patterns to ignore
        
        Processing:
        - Normalizes all paths to POSIX format with resolved absolute paths
        - Compiles regex patterns for efficient matching
        - Removes duplicates while preserving order
        """
    
    def __repr__(self):
        """
        String representation of the filter.
        
        Returns:
        - str - Formatted string showing regular and regex patterns
        """
    
    def __call__(self, filename: str, /):
        """
        Determine if a file should be ignored.
        
        Parameters:
        - filename: str - File path to check (can be relative or absolute)
        
        Returns:
        - bool - True if file should be ignored, False otherwise
        
        Matching Logic:
        - Normalizes input path to absolute POSIX format
        - Tests against all glob patterns using fnmatch and prefix matching
        - Tests against all compiled regular expressions
        - Returns True on first match (short-circuit evaluation)
        """

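The matching logic described in the docstrings above can be approximated with the standard library alone. The sketch below is a hypothetical re-implementation, not the library's source: the class name `SketchIgnoreFilter` is made up for illustration, and it assumes that regexes are applied with `re.search` and that relative paths resolve against the current working directory.

```python
import fnmatch
import re
from pathlib import Path


class SketchIgnoreFilter:
    """Hypothetical stand-in that mirrors the documented behaviour."""

    def __init__(self, regular, regex_based):
        # Normalise glob patterns to absolute POSIX paths, dropping duplicates
        # while keeping first-seen order.
        normalised = [Path(p).resolve().as_posix() for p in regular]
        self.regular_patterns = list(dict.fromkeys(normalised))
        # Compile each distinct regex once, up front.
        self.regex_based_patterns = [re.compile(r) for r in dict.fromkeys(regex_based)]

    def __call__(self, filename, /):
        path = Path(filename).resolve().as_posix()
        for pattern in self.regular_patterns:
            # Glob match on the whole path, or prefix match for directory trees.
            if fnmatch.fnmatch(path, pattern) or path.startswith(pattern + "/"):
                return True
        # Assumption: search semantics, so unanchored regexes match anywhere.
        return any(rx.search(path) for rx in self.regex_based_patterns)


sketch = SketchIgnoreFilter(["*.tmp", "*.tmp", "_build"], [r"\.sw[po]$"])
print(sketch("notes.tmp"))          # matched by the glob
print(sketch("_build/index.html"))  # matched by the directory prefix
print(sketch("index.rst"))          # not matched
```

Keeping the filter stateless after construction means the same instance can be called for every changed path without any per-path bookkeeping.
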
Pattern Types

Glob Patterns (Regular)

Standard shell-style glob patterns for file and directory matching:

from sphinx_autobuild.filter import IgnoreFilter

# Basic glob patterns
ignore_filter = IgnoreFilter(
    regular=[
        "*.tmp",           # All .tmp files
        "*.log",           # All .log files  
        "__pycache__",     # __pycache__ directories
        "node_modules",    # node_modules directories
        ".git",            # .git directory
        "*.swp",           # Vim swap files
        "*~",              # Backup files
    ],
    regex_based=[]
)

Glob Pattern Features:

  • * - Matches any number of characters, including path separators (fnmatch does not treat / specially)
  • ? - Matches any single character
  • [chars] - Matches any character in brackets
  • ** - No special recursive meaning; behaves like * (which already crosses directory boundaries)
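
Since the filter is described as matching with `fnmatch`, the wildcard rules can be checked directly against Python's `fnmatch` module. The paths below are invented for illustration. Note that `fnmatch` itself does not treat `/` specially, so `*` can span directory levels.

```python
from fnmatch import fnmatchcase

# (path, pattern) pairs; fnmatchcase is used so results do not depend on
# the platform's case-folding rules.
checks = [
    ("/docs/a/b/notes.tmp", "/docs/*.tmp"),    # * spans "/" as well
    ("/docs/file1.rst", "/docs/file?.rst"),    # ? is exactly one character
    ("/docs/file12.rst", "/docs/file?.rst"),   # two characters, so no match
    ("/docs/a.rst", "/docs/[abc].rst"),        # "a" is in the bracket set
    ("/docs/d.rst", "/docs/[abc].rst"),        # "d" is not
]

for path, pattern in checks:
    print(f"{pattern!r} vs {path!r}: {fnmatchcase(path, pattern)}")
```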

Directory Matching

Glob patterns can match directories by name or path prefix:

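# Directory name matching
# Patterns are resolved to absolute paths, so a bare name is taken
# relative to the current working directory
regular_patterns = [
    ".git",            # <cwd>/.git and everything under it
    "__pycache__",     # <cwd>/__pycache__ and everything under it
    "node_modules",    # <cwd>/node_modules and everything under it
]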

# Path prefix matching (directories)
# Files under these directories are automatically ignored
regular_patterns = [
    "/absolute/path/to/ignore",  # Ignore entire directory tree
    "relative/dir",              # Ignore relative directory tree
]
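
Prefix matching on normalized path strings can be sketched as a plain `startswith` check. The helper name `is_under` is hypothetical and the paths are invented; the point of the example is the trailing `/`, which keeps a sibling directory with a similar name from being treated as part of the ignored tree.

```python
def is_under(path: str, directory: str) -> bool:
    """Hypothetical helper: True if path lies inside directory (POSIX-style strings)."""
    # Appending "/" stops "/project/build2" from matching "/project/build".
    return path.startswith(directory.rstrip("/") + "/")


print(is_under("/project/build/html/index.html", "/project/build"))  # True
print(is_under("/project/build2/index.html", "/project/build"))      # False
print(is_under("/project/build", "/project/build"))                  # False: the directory itself
```

The directory path itself is not covered by the prefix test; an exact glob match on the same pattern would catch that case.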

Regular Expression Patterns

Advanced pattern matching using Python regular expressions:

from sphinx_autobuild.filter import IgnoreFilter

# Regex patterns for complex matching
ignore_filter = IgnoreFilter(
    regular=[],
    regex_based=[
        r"\.tmp$",                    # Files ending with .tmp
        r"\.sw[po]$",                # Vim swap files (.swp, .swo)
        r".*\.backup$",              # Files ending with .backup
        r"^.*/__pycache__/.*$",      # Anything in __pycache__ directories
        r"^.*\.git/.*$",             # Anything in .git directories
        r"/build/temp/.*",           # Files in build/temp directories
        r".*\.(log|tmp|cache)$",     # Multiple extensions
        r"^.*(\.DS_Store|Thumbs\.db)$",  # System files (Thumbs.db has no leading dot)
    ]
)

Regex Features:

  • Full Python regex syntax supported
  • Case-sensitive matching (use (?i) for case-insensitive)
  • Anchors: ^ (start), $ (end)
  • Character classes: [a-z], \d, \w, etc.
  • Quantifiers: *, +, ?, {n,m}
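
The effect of anchors and the `(?i)` flag can be tried with the `re` module on its own. This sketch assumes the filter applies patterns with search semantics, which the unanchored `r"\.tmp$"` example above suggests but the text does not state outright; the path is invented.

```python
import re

path = "/home/user/project/docs/Notes.TMP"

case_sensitive = bool(re.search(r"\.tmp$", path))        # False: case matters by default
case_insensitive = bool(re.search(r"(?i)\.tmp$", path))  # True: (?i) ignores case
found_mid_path = bool(re.search(r"/docs/", path))        # True: unanchored search
match_at_start = bool(re.match(r"/docs/", path))         # False: match() only tries the start

print(case_sensitive, case_insensitive, found_mid_path, match_at_start)
```

If the patterns were applied with `re.match` instead, unanchored expressions would need a leading `.*` to behave the same way.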

Usage Examples

Basic Filtering Setup

from sphinx_autobuild.filter import IgnoreFilter

# Common development file filtering
ignore_filter = IgnoreFilter(
    regular=[
        ".git",
        "__pycache__", 
        "*.pyc",
        "*.tmp",
        ".DS_Store",
        "Thumbs.db",
        ".vscode",
        ".idea",
    ],
    regex_based=[
        r".*\.swp$",      # Vim swap files
        r".*~$",          # Backup files
        r".*\.log$",      # Log files
    ]
)

# Test the filter
print(ignore_filter("main.py"))          # False (not ignored)
print(ignore_filter("temp.tmp"))         # True (ignored by glob)
print(ignore_filter("file.swp"))         # True (ignored by regex)
print(ignore_filter(".git/config"))      # True (ignored by directory)

Advanced Pattern Combinations

from sphinx_autobuild.filter import IgnoreFilter

# Complex filtering for a documentation project
ignore_filter = IgnoreFilter(
    regular=[
        # Build directories
        "_build",
        ".doctrees", 
        ".buildinfo",
        
        # Version control
        ".git",
        ".svn",
        ".hg",
        
        # IDE files
        ".vscode",
        ".idea",
        "*.sublime-*",
        
        # Python
        "__pycache__",
        "*.pyc",
        "*.pyo",
        ".pytest_cache",
        ".mypy_cache",
        
        # Node.js
        "node_modules",
        ".npm",
        
        # Temporary files
        "*.tmp",
        "*.temp",
    ],
    regex_based=[
        # Editor backup files
        r".*~$",
        r".*\.sw[po]$",     # Vim
        r"#.*#$",           # Emacs
        
        # Log files with timestamps
        r".*\.log\.\d{4}-\d{2}-\d{2}$",
        
        # Build artifacts
        r".*/build/temp/.*",
        r".*/dist/.*\.egg-info/.*",
        
        # OS files
        r".*\.DS_Store$",
        r".*Thumbs\.db$",
        
        # Lock files
        r".*\.lock$",
        r"package-lock\.json$",
    ]
)

Integration with Command Line

The filter integrates with command-line arguments:

from sphinx_autobuild.filter import IgnoreFilter

# From command line:
#   --ignore "*.tmp" --ignore "*.log" --re-ignore ".*\.swp$" --re-ignore ".*~$"
ignore_patterns = ["*.tmp", "*.log"]      # From --ignore
regex_patterns = [r".*\.swp$", r".*~$"]   # From --re-ignore

ignore_filter = IgnoreFilter(ignore_patterns, regex_patterns)
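
One way repeated flags could be collected into the two lists is `argparse` with `action="append"`. The parser below is a hypothetical stand-in written for illustration; only the flag names `--ignore` and `--re-ignore` come from the text above, and the real command-line interface may define them differently.

```python
import argparse

# Hypothetical parser; not the tool's actual argument definitions.
parser = argparse.ArgumentParser()
parser.add_argument("--ignore", action="append", default=[])
parser.add_argument("--re-ignore", action="append", default=[])

args = parser.parse_args(
    ["--ignore", "*.tmp", "--ignore", "*.log", "--re-ignore", r".*\.swp$"]
)
print(args.ignore)      # ['*.tmp', '*.log']
print(args.re_ignore)   # ['.*\\.swp$']
```

The resulting lists can be passed positionally as the `regular` and `regex_based` arguments.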

Debug Mode

Set the SPHINX_AUTOBUILD_DEBUG environment variable to print each path as the filter checks it:

import os

from sphinx_autobuild.filter import IgnoreFilter

os.environ["SPHINX_AUTOBUILD_DEBUG"] = "1"

# Now the filter will print debug info
ignore_filter = IgnoreFilter(["*.tmp"], [r".*\.swp$"])
ignore_filter("test.tmp")  # Prints: SPHINX_AUTOBUILD_DEBUG: '/path/test.tmp' has changed; ignores are ...

Debug Output Format:

SPHINX_AUTOBUILD_DEBUG: '/absolute/path/to/file.ext' has changed; ignores are IgnoreFilter(regular=['*.tmp'], regex_based=[re.compile('.*\\.swp$')])

Path Normalization

All paths are normalized before filtering:

from pathlib import Path

# Input paths (various formats)
paths = [
    "docs/index.rst",                    # Relative path
    "/home/user/project/docs/api.rst",   # Absolute path  
    Path("docs/modules/core.rst"),       # Path object
    "./docs/getting-started.rst",       # Current directory relative
    "../shared/templates/base.html",    # Parent directory relative
]

# With /home/user/project as the current working directory, these normalize to:
# /home/user/project/docs/index.rst
# /home/user/project/docs/api.rst
# /home/user/project/docs/modules/core.rst
# /home/user/project/docs/getting-started.rst
# /home/user/shared/templates/base.html
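
The normalization step can be reproduced with `pathlib`. The helper name `normalise` is hypothetical, and the sketch assumes normalization means resolving to an absolute path and converting to forward slashes, as the description above suggests.

```python
from pathlib import Path


def normalise(path) -> str:
    """Hypothetical helper: absolute, resolved path written with forward slashes."""
    return Path(path).resolve().as_posix()


raw_paths = [
    "docs/index.rst",          # relative
    "./docs/index.rst",        # explicit current directory
    "docs/../docs/index.rst",  # redundant parent step
    Path("docs/index.rst"),    # Path object
]

for raw in raw_paths:
    print(normalise(raw))  # each prints <cwd>/docs/index.rst
```

Because every spelling collapses to the same string, a single pattern covers all of them.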

Performance Characteristics

Efficient Matching

  • Short-circuit Evaluation: Returns True on first match
  • Compiled Regexes: Regular expressions are pre-compiled during initialization
  • One-time Pattern Normalization: Patterns are normalized once during initialization, so only the input path is resolved on each call
  • Duplicate Removal: Patterns are deduplicated during initialization
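
Order-preserving deduplication, one-time compilation and short-circuit evaluation can be sketched with built-ins only. The names `unique`, `compiled` and `matches_any` are illustrative, and the use of `search` is an assumption rather than something the text confirms.

```python
import re

raw_regexes = [r".*\.swp$", r".*~$", r".*\.swp$"]  # note the repeated pattern

# dict.fromkeys keeps the first occurrence of each key in insertion order.
unique = list(dict.fromkeys(raw_regexes))
compiled = [re.compile(p) for p in unique]  # compile once, reuse on every call


def matches_any(path: str) -> bool:
    # any() stops at the first regex that matches (short-circuit).
    return any(rx.search(path) for rx in compiled)


print(unique)                     # ['.*\\.swp$', '.*~$']
print(matches_any("/tmp/a.swp"))  # True
print(matches_any("/tmp/a.rst"))  # False
```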

Pattern Ordering

Patterns are tested in this order:

  1. Regular patterns (glob-style) - typically faster
  2. Regex patterns - more flexible but potentially slower

For best performance, put the patterns most likely to match first in each list, since evaluation stops at the first match.

Memory Usage

  • Pattern Storage: Minimal memory overhead for pattern storage
  • Compiled Regexes: Small memory cost for compiled regex objects
  • No Path Caching: File paths are not cached (stateless operation)

Common Use Cases

Documentation Projects

from sphinx_autobuild.filter import IgnoreFilter

# Typical documentation project ignores
doc_filter = IgnoreFilter(
    regular=[
        "_build",         # Sphinx build directory
        ".doctrees",      # Sphinx doctree cache
        "*.tmp",          # Temporary files
        ".git",           # Version control
    ],
    regex_based=[
        r".*\.sw[po]$",  # Editor swap files
        r".*~$",         # Backup files  
    ]
)

Multi-language Projects

from sphinx_autobuild.filter import IgnoreFilter

# Mixed Python/JavaScript/Docs project
mixed_filter = IgnoreFilter(
    regular=[
        # Python
        "__pycache__", "*.pyc", ".pytest_cache",
        
        # JavaScript
        "node_modules", ".npm", "*.min.js",
        
        # Documentation  
        "_build", ".doctrees",
        
        # General
        ".git", ".vscode", "*.tmp",
    ],
    regex_based=[
        # Build artifacts
        r".*/dist/.*",
        r".*/build/.*\.js$",
        
        # Logs with dates
        r".*\.log\.\d{4}-\d{2}-\d{2}$",
    ]
)

Editor Integration

Different editors create different temporary files:

from sphinx_autobuild.filter import IgnoreFilter

# Editor-specific ignores
editor_filter = IgnoreFilter(
    regular=[
        # Vim
        "*.swp", "*.swo", "*~",
        
        # Emacs
        "#*#", ".#*",
        
        # VSCode
        ".vscode",
        
        # JetBrains
        ".idea",
        
        # Sublime Text
        "*.sublime-workspace", "*.sublime-project",
    ],
    regex_based=[
        # Temporary files with PIDs
        r".*\.tmp\.\d+$",
        
        # Lock files
        r".*\.lock$",
    ]
)
