# File Filtering

The file filtering system uses glob patterns and regular expressions to ignore specific files and directories during file watching. This prevents unnecessary rebuilds when temporary or irrelevant files change.

## Capabilities

### IgnoreFilter Class

The main filtering class. It determines whether a file should be ignored during watching.

```python { .api }
class IgnoreFilter:
    def __init__(self, regular, regex_based):
        """
        Initialize filter with glob patterns and regex patterns.

        Parameters:
        - regular: list[str] - Glob patterns for files/directories to ignore
        - regex_based: list[str] - Regular expression patterns to ignore

        Processing:
        - Normalizes all paths to POSIX format with resolved absolute paths
        - Compiles regex patterns for efficient matching
        - Removes duplicates while preserving order
        """

    def __repr__(self):
        """
        String representation of the filter.

        Returns:
        - str - Formatted string showing regular and regex patterns
        """

    def __call__(self, filename: str, /):
        """
        Determine if a file should be ignored.

        Parameters:
        - filename: str - File path to check (can be relative or absolute)

        Returns:
        - bool - True if file should be ignored, False otherwise

        Matching Logic:
        - Normalizes input path to absolute POSIX format
        - Tests against all glob patterns using fnmatch and prefix matching
        - Tests against all compiled regular expressions
        - Returns True on first match (short-circuit evaluation)
        """
```
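
The processing and matching steps listed in the docstrings can be sketched with the standard library alone. The class below is an illustrative stand-in, not the library's code: the name `SketchIgnoreFilter` is hypothetical, and using `re.search` (rather than `re.match`) is an assumption.

```python
import fnmatch
import re
from pathlib import Path


class SketchIgnoreFilter:
    """Hypothetical stand-in that mirrors the documented steps."""

    def __init__(self, regular, regex_based):
        # Normalize glob patterns to absolute POSIX paths.
        normalized = [Path(p).resolve().as_posix() for p in regular]
        # dict.fromkeys removes duplicates while preserving order.
        self.regular_patterns = list(dict.fromkeys(normalized))
        # Compile each distinct regex once, up front.
        self.regex_patterns = [re.compile(r) for r in dict.fromkeys(regex_based)]

    def __call__(self, filename, /):
        path = Path(filename).resolve().as_posix()
        for pattern in self.regular_patterns:
            # Glob match against the whole normalized path.
            if fnmatch.fnmatch(path, pattern):
                return True
            # Prefix match: the path lies under an ignored directory.
            if path.startswith(pattern + "/"):
                return True
        # Assumption: regexes may match anywhere in the path (re.search).
        return any(regex.search(path) for regex in self.regex_patterns)


sketch = SketchIgnoreFilter(["*.tmp", "cache_dir"], [r"\.sw[po]$"])
print(sketch("notes.tmp"))           # matched by the glob pattern
print(sketch("cache_dir/data.bin"))  # matched by the directory prefix
print(sketch("main.py"))             # no pattern applies
```

Each check returns as soon as one pattern matches, which is the short-circuit behavior the docstring describes.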

## Pattern Types

### Glob Patterns (Regular)

Standard shell-style glob patterns for file and directory matching:

```python
from sphinx_autobuild.filter import IgnoreFilter

# Basic glob patterns
ignore_filter = IgnoreFilter(
    regular=[
        "*.tmp",         # All .tmp files
        "*.log",         # All .log files
        "__pycache__",   # __pycache__ directories
        "node_modules",  # node_modules directories
        ".git",          # .git directory
        "*.swp",         # Vim swap files
        "*~",            # Backup files
    ],
    regex_based=[],
)
```

**Glob Pattern Features:**
- `*` - Matches any number of characters, including path separators (`fnmatch` does not treat `/` specially)
- `?` - Matches any single character
- `[chars]` - Matches any character in brackets
- `**` - No special recursive meaning (`fnmatch` treats it like `*`; use regex for finer control)
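
Because matching is done with `fnmatch`, the wildcard rules are those of Python's standard `fnmatch` module, which does not treat `/` as special. The short check below illustrates this on made-up `/project/...` paths; it exercises only the standard library, not `IgnoreFilter` itself.

```python
import fnmatch

# "*" is not stopped by "/" in fnmatch, so it spans directories.
print(fnmatch.fnmatch("/project/docs/draft.tmp", "/project/*.tmp"))   # True

# "?" matches exactly one character.
print(fnmatch.fnmatch("/project/a1.log", "/project/a?.log"))          # True

# "[chars]" matches one character from the set.
print(fnmatch.fnmatch("/project/file.swo", "/project/*.sw[op]"))      # True

# "**" has no recursive meaning; it behaves like "*", so the literal
# "/" after it must still be present in the path.
print(fnmatch.fnmatch("/project/a/b/c.tmp", "/project/**/c.tmp"))     # True
print(fnmatch.fnmatch("/project/c.tmp", "/project/**/c.tmp"))         # False
```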

### Directory Matching

Glob patterns can match directories by name or path prefix:

```python
# Directory name matching
regular_patterns = [
    ".git",          # Matches any .git directory
    "__pycache__",   # Matches any __pycache__ directory
    "node_modules",  # Matches any node_modules directory
]

# Path prefix matching (directories)
# Files under these directories are automatically ignored
regular_patterns = [
    "/absolute/path/to/ignore",  # Ignore entire directory tree
    "relative/dir",              # Ignore relative directory tree
]
```
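
Prefix matching can be pictured as a plain string comparison on normalized paths. The helper below is a hypothetical illustration (the name `is_under` and the example paths are not part of the library); it shows why a trailing `/` is appended before comparing.

```python
def is_under(directory: str, path: str) -> bool:
    """Return True if `path` lies inside `directory` (both normalized)."""
    # Appending "/" keeps "/project/_build" from matching the
    # unrelated sibling "/project/_build_old".
    return path.startswith(directory + "/")


print(is_under("/project/_build", "/project/_build/html/index.html"))  # True
print(is_under("/project/_build", "/project/_build_old/index.html"))   # False
print(is_under("/project/_build", "/project/docs/index.rst"))          # False
```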

### Regular Expression Patterns

Advanced pattern matching using Python regular expressions:

```python
from sphinx_autobuild.filter import IgnoreFilter

# Regex patterns for complex matching
ignore_filter = IgnoreFilter(
    regular=[],
    regex_based=[
        r"\.tmp$",                             # Files ending with .tmp
        r"\.sw[po]$",                          # Vim swap files (.swp, .swo)
        r".*\.backup$",                        # Files ending with .backup
        r"^.*/__pycache__/.*$",                # Anything in __pycache__ directories
        r"^.*/\.git/.*$",                      # Anything in .git directories
        r"/build/temp/.*",                     # Files in build/temp directories
        r".*\.(log|tmp|cache)$",               # Multiple extensions
        r"(^|/)(\.DS_Store|Thumbs\.db)$",      # System files
    ],
)
```

**Regex Features:**
- Full Python regex syntax supported
- Case-sensitive matching (use `(?i)` for case-insensitive)
- Anchors: `^` (start), `$` (end)
- Character classes: `[a-z]`, `\d`, `\w`, etc.
- Quantifiers: `*`, `+`, `?`, `{n,m}`
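
The features above can be tried out with the standard `re` module before handing patterns to the filter. This is a small sketch on made-up paths; it assumes patterns are applied with `search` semantics (a match anywhere in the normalized path), and `matches_any` is a hypothetical helper.

```python
import re

patterns = [
    re.compile(r"\.sw[po]$"),     # character class plus end anchor
    re.compile(r"(?i)\.log$"),    # inline flag: case-insensitive
    re.compile(r"/build/temp/"),  # unanchored: may match mid-path
]


def matches_any(path: str) -> bool:
    """Return True if any compiled pattern is found in `path`."""
    return any(p.search(path) for p in patterns)


print(matches_any("/project/docs/.index.rst.swp"))  # True
print(matches_any("/project/ERROR.LOG"))            # True
print(matches_any("/project/build/temp/x.o"))       # True
print(matches_any("/project/docs/index.rst"))       # False
```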

## Usage Examples

### Basic Filtering Setup

```python
from sphinx_autobuild.filter import IgnoreFilter

# Common development file filtering
ignore_filter = IgnoreFilter(
    regular=[
        ".git",
        "__pycache__",
        "*.pyc",
        "*.tmp",
        ".DS_Store",
        "Thumbs.db",
        ".vscode",
        ".idea",
    ],
    regex_based=[
        r".*\.swp$",  # Vim swap files
        r".*~$",      # Backup files
        r".*\.log$",  # Log files
    ],
)

# Test the filter
print(ignore_filter("main.py"))      # False (not ignored)
print(ignore_filter("temp.tmp"))     # True (ignored by glob)
print(ignore_filter("file.swp"))     # True (ignored by regex)
print(ignore_filter(".git/config"))  # True (ignored by directory)
```

### Advanced Pattern Combinations

```python
from sphinx_autobuild.filter import IgnoreFilter

# Complex filtering for documentation project
ignore_filter = IgnoreFilter(
    regular=[
        # Build directories
        "_build",
        ".doctrees",
        ".buildinfo",

        # Version control
        ".git",
        ".svn",
        ".hg",

        # IDE files
        ".vscode",
        ".idea",
        "*.sublime-*",

        # Python
        "__pycache__",
        "*.pyc",
        "*.pyo",
        ".pytest_cache",
        ".mypy_cache",

        # Node.js
        "node_modules",
        ".npm",

        # Temporary files
        "*.tmp",
        "*.temp",
    ],
    regex_based=[
        # Editor backup files
        r".*~$",
        r".*\.sw[po]$",  # Vim
        r"#.*#$",        # Emacs

        # Log files with timestamps
        r".*\.log\.\d{4}-\d{2}-\d{2}$",

        # Build artifacts
        r".*/build/temp/.*",
        r".*/dist/.*\.egg-info/.*",

        # OS files
        r".*\.DS_Store$",
        r".*Thumbs\.db$",

        # Lock files
        r".*\.lock$",
        r"package-lock\.json$",
    ],
)
```

### Integration with Command Line

The filter is built from the patterns passed as command-line arguments:

```python
from sphinx_autobuild.filter import IgnoreFilter

# From command line:
#   --ignore "*.tmp" --ignore "*.log" --re-ignore ".*\.swp$" --re-ignore ".*~$"
ignore_patterns = ["*.tmp", "*.log"]     # From --ignore
regex_patterns = [r".*\.swp$", r".*~$"]  # From --re-ignore

ignore_filter = IgnoreFilter(ignore_patterns, regex_patterns)
```
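
One way such lists can be collected from repeated flags is `argparse` with `action="append"`. The parser below is a hypothetical sketch, not the tool's actual argument parser; only the flag names `--ignore` and `--re-ignore` come from the example above.

```python
import argparse

# Hypothetical parser: each repeated flag appends to its list.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--ignore", action="append", default=[], metavar="GLOB")
parser.add_argument("--re-ignore", action="append", default=[], metavar="REGEX")

args = parser.parse_args(
    ["--ignore", "*.tmp", "--ignore", "*.log", "--re-ignore", r".*\.swp$"]
)

print(args.ignore)     # ['*.tmp', '*.log']
print(args.re_ignore)  # ['.*\\.swp$']
```

The two resulting lists map directly onto the `regular` and `regex_based` parameters of `IgnoreFilter`.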

## Debug Mode

Enable debug output to see filtering decisions:

```python
import os

from sphinx_autobuild.filter import IgnoreFilter

os.environ["SPHINX_AUTOBUILD_DEBUG"] = "1"

# Now the filter will print debug info
ignore_filter = IgnoreFilter(["*.tmp"], [r".*\.swp$"])
ignore_filter("test.tmp")  # Prints: SPHINX_AUTOBUILD_DEBUG: '/path/test.tmp' has changed; ignores are ...
```

**Debug Output Format:**
```
SPHINX_AUTOBUILD_DEBUG: '/absolute/path/to/file.ext' has changed; ignores are IgnoreFilter(regular=['*.tmp'], regex_based=[re.compile('.*\\.swp$')])
```

## Path Normalization

All paths are normalized before filtering:

```python
from pathlib import Path

# Input paths (various formats)
paths = [
    "docs/index.rst",                   # Relative path
    "/home/user/project/docs/api.rst",  # Absolute path
    Path("docs/modules/core.rst"),      # Path object
    "./docs/getting-started.rst",       # Current directory relative
    "../shared/templates/base.html",    # Parent directory relative
]

# All paths are normalized to absolute POSIX format.
# Assuming the current working directory is /home/user/project:
# /home/user/project/docs/index.rst
# /home/user/project/docs/api.rst
# /home/user/project/docs/modules/core.rst
# /home/user/project/docs/getting-started.rst
# /home/user/shared/templates/base.html
```
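
A normalization step of this kind can be approximated with `pathlib`. The `normalize` helper below is a hypothetical sketch of the idea (resolve to an absolute path, then render it with forward slashes), not the library's own function.

```python
from pathlib import Path


def normalize(filename) -> str:
    """Resolve `filename` against the working directory, POSIX style."""
    return Path(filename).resolve().as_posix()


# Different spellings of the same file collapse to one string.
print(normalize("docs/index.rst") == normalize("./docs/index.rst"))
print(normalize("docs/index.rst") == normalize(Path("docs/index.rst")))
print(normalize("docs/../docs/index.rst") == normalize("docs/index.rst"))
```

Since every spelling of a path collapses to the same string, a pattern only has to be written once to cover all of them.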

## Performance Characteristics

### Efficient Matching

- **Short-circuit Evaluation**: Returns True on first match
- **Compiled Regexes**: Regular expressions are pre-compiled during initialization
- **Pre-normalized Patterns**: Glob patterns are resolved to absolute POSIX paths once, during initialization
- **Duplicate Removal**: Patterns are deduplicated during initialization
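
Order-preserving deduplication followed by a single compilation pass can be written with `dict.fromkeys`. This is one common idiom, shown here as a sketch; whether the library uses exactly this approach is an assumption.

```python
import re

raw_patterns = [r".*\.swp$", r".*~$", r".*\.swp$"]  # note the duplicate

# dict keys keep insertion order, so the first occurrence wins.
unique_patterns = list(dict.fromkeys(raw_patterns))

# Compile each distinct pattern once; later checks reuse these objects.
compiled = [re.compile(p) for p in unique_patterns]

print(unique_patterns)  # ['.*\\.swp$', '.*~$']
print(len(compiled))    # 2
```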

### Pattern Ordering

Patterns are tested in this order:
1. **Regular patterns** (glob-style) - typically faster
2. **Regex patterns** - more flexible but potentially slower

For best performance, put the most common patterns first in each list, since matching stops at the first hit.

### Memory Usage

- **Pattern Storage**: Minimal memory overhead for pattern storage
- **Compiled Regexes**: Small memory cost for compiled regex objects
- **No Path Caching**: File paths are not cached (stateless operation)

## Common Use Cases

### Documentation Projects

```python
from sphinx_autobuild.filter import IgnoreFilter

# Typical documentation project ignores
doc_filter = IgnoreFilter(
    regular=[
        "_build",     # Sphinx build directory
        ".doctrees",  # Sphinx doctree cache
        "*.tmp",      # Temporary files
        ".git",       # Version control
    ],
    regex_based=[
        r".*\.sw[po]$",  # Editor swap files
        r".*~$",         # Backup files
    ],
)
```

### Multi-language Projects

```python
from sphinx_autobuild.filter import IgnoreFilter

# Mixed Python/JavaScript/Docs project
mixed_filter = IgnoreFilter(
    regular=[
        # Python
        "__pycache__", "*.pyc", ".pytest_cache",

        # JavaScript
        "node_modules", ".npm", "*.min.js",

        # Documentation
        "_build", ".doctrees",

        # General
        ".git", ".vscode", "*.tmp",
    ],
    regex_based=[
        # Build artifacts
        r".*/dist/.*",
        r".*/build/.*\.js$",

        # Logs with dates
        r".*\.log\.\d{4}-\d{2}-\d{2}$",
    ],
)
```

### Editor Integration

Different editors create different temporary files:

```python
from sphinx_autobuild.filter import IgnoreFilter

# Editor-specific ignores
editor_filter = IgnoreFilter(
    regular=[
        # Vim
        "*.swp", "*.swo", "*~",

        # Emacs
        "#*#", ".#*",

        # VSCode
        ".vscode",

        # JetBrains
        ".idea",

        # Sublime Text
        "*.sublime-workspace", "*.sublime-project",
    ],
    regex_based=[
        # Temporary files with PIDs
        r".*\.tmp\.\d+$",

        # Lock files
        r".*\.lock$",
    ],
)
```