
# File Filtering


The file filtering system uses glob patterns and regular expressions to ignore specific files and directories during file watching. This prevents unnecessary rebuilds when temporary or irrelevant files change.

## Capabilities

### IgnoreFilter Class

The main filtering class that determines whether files should be ignored during watching.

```python { .api }
class IgnoreFilter:
    def __init__(self, regular, regex_based):
        """
        Initialize filter with glob patterns and regex patterns.

        Parameters:
        - regular: list[str] - Glob patterns for files/directories to ignore
        - regex_based: list[str] - Regular expression patterns to ignore

        Processing:
        - Normalizes all paths to POSIX format with resolved absolute paths
        - Compiles regex patterns for efficient matching
        - Removes duplicates while preserving order
        """

    def __repr__(self):
        """
        String representation of the filter.

        Returns:
        - str - Formatted string showing regular and regex patterns
        """

    def __call__(self, filename: str, /):
        """
        Determine if a file should be ignored.

        Parameters:
        - filename: str - File path to check (can be relative or absolute)

        Returns:
        - bool - True if file should be ignored, False otherwise

        Matching Logic:
        - Normalizes input path to absolute POSIX format
        - Tests against all glob patterns using fnmatch and prefix matching
        - Tests against all compiled regular expressions
        - Returns True on first match (short-circuit evaluation)
        """
```
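
The matching logic described in these docstrings can be approximated with the standard library alone. The following is a minimal sketch, not the actual `sphinx_autobuild` source: the class name `SketchIgnoreFilter` and its attribute names are hypothetical, and applying regexes with `search()` rather than `match()` is an assumption.

```python
import fnmatch
import re
from pathlib import Path


class SketchIgnoreFilter:
    """Hypothetical stand-in that mirrors the documented behaviour."""

    def __init__(self, regular, regex_based):
        # Normalize glob patterns to absolute POSIX paths, dropping duplicates
        # while preserving order (dict keys keep insertion order).
        normalised = [Path(p).resolve().as_posix() for p in regular]
        self.regular_patterns = list(dict.fromkeys(normalised))
        # Compile each distinct regex once, up front.
        self.regex_patterns = [re.compile(p) for p in dict.fromkeys(regex_based)]

    def __call__(self, filename):
        path = Path(filename).resolve().as_posix()
        for pattern in self.regular_patterns:
            # Glob match, or the pattern names a directory that contains the path.
            if fnmatch.fnmatch(path, pattern) or path.startswith(pattern + "/"):
                return True
        # Assumption: regexes are applied with search(), not match().
        return any(rx.search(path) for rx in self.regex_patterns)


sketch = SketchIgnoreFilter(["*.tmp", "_build"], [r"\.sw[po]$"])
print(sketch("notes.tmp"))          # matched by the glob pattern
print(sketch("_build/index.html"))  # matched by the directory prefix
print(sketch("index.rst.swp"))      # matched by the regex
print(sketch("index.rst"))          # not matched by anything
```

Because both patterns and input paths are resolved against the current working directory in this sketch, a relative pattern such as `_build` only covers the `_build` directory directly under that directory.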

## Pattern Types

### Glob Patterns (Regular)

Standard shell-style glob patterns for file and directory matching:

```python
from sphinx_autobuild.filter import IgnoreFilter

# Basic glob patterns
ignore_filter = IgnoreFilter(
    regular=[
        "*.tmp",         # All .tmp files
        "*.log",         # All .log files
        "__pycache__",   # __pycache__ directories
        "node_modules",  # node_modules directories
        ".git",          # .git directory
        "*.swp",         # Vim swap files
        "*~",            # Backup files
    ],
    regex_based=[]
)
```

**Glob Pattern Features:**

- `*` - Matches any number of characters; because matching uses `fnmatch`, this includes path separators
- `?` - Matches any single character
- `[chars]` - Matches any single character listed in the brackets
- `**` - No dedicated recursive wildcard; `fnmatch` treats it the same as `*` (use regex when you need finer control over directory depth)
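
Since glob matching is delegated to Python's `fnmatch` module, these features can be checked against it directly. Note that `fnmatch` does not treat `/` specially. The snippet below is a small illustration with made-up paths; it calls `fnmatch.fnmatchcase` so the results do not depend on the platform's case rules.

```python
from fnmatch import fnmatchcase

# `*` is not stopped by "/" in fnmatch, so one pattern covers nested files.
print(fnmatchcase("/project/docs/api/cache.tmp", "/project/*.tmp"))  # True

# `?` matches exactly one character.
print(fnmatchcase("/project/a1.log", "/project/a?.log"))   # True
print(fnmatchcase("/project/a12.log", "/project/a?.log"))  # False

# `[chars]` matches one character from the set.
print(fnmatchcase("/project/file.swo", "/project/*.sw[po]"))  # True
```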

### Directory Matching

Glob patterns can match directories by name or path prefix:

```python
# Directory name matching
# (relative names are resolved against the current working directory)
regular_patterns = [
    ".git",          # Matches the .git directory
    "__pycache__",   # Matches the __pycache__ directory
    "node_modules",  # Matches the node_modules directory
]

# Path prefix matching (directories)
# Files under these directories are automatically ignored
regular_patterns = [
    "/absolute/path/to/ignore",  # Ignore entire directory tree
    "relative/dir",              # Ignore relative directory tree
]
```
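
Prefix matching can be pictured as a string comparison on normalized paths. This is a hedged sketch: the helper name `is_under` is hypothetical, and appending `/` before comparing is an assumption about how sibling directories that share a name prefix are kept apart.

```python
def is_under(path: str, directory: str) -> bool:
    """Return True if `path` lies inside `directory` (both absolute POSIX paths)."""
    # The trailing "/" stops "/proj/_build" from matching "/proj/_build_old/...".
    return path.startswith(directory.rstrip("/") + "/")


print(is_under("/proj/_build/html/index.html", "/proj/_build"))  # True
print(is_under("/proj/_build_old/index.html", "/proj/_build"))   # False
print(is_under("/proj/_build", "/proj/_build"))                  # False
```

The last call returns `False` because the prefix check only covers paths below the directory; in a full filter, the directory itself would be caught by the separate glob comparison.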

### Regular Expression Patterns

Advanced pattern matching using Python regular expressions:

```python
from sphinx_autobuild.filter import IgnoreFilter

# Regex patterns for complex matching
ignore_filter = IgnoreFilter(
    regular=[],
    regex_based=[
        r"\.tmp$",                           # Files ending with .tmp
        r"\.sw[po]$",                        # Vim swap files (.swp, .swo)
        r".*\.backup$",                      # Files ending with .backup
        r"^.*/__pycache__/.*$",              # Anything in __pycache__ directories
        r"^.*\.git/.*$",                     # Anything in .git directories
        r"/build/temp/.*",                   # Files in build/temp directories
        r".*\.(log|tmp|cache)$",             # Multiple extensions
        r"^(.*/)?(\.DS_Store|Thumbs\.db)$",  # System files
    ]
)
```

**Regex Features:**

- Full Python regex syntax supported
- Case-sensitive matching (use `(?i)` for case-insensitive)
- Anchors: `^` (start), `$` (end)
- Character classes: `[a-z]`, `\d`, `\w`, etc.
- Quantifiers: `*`, `+`, `?`, `{n,m}`
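
The case-sensitivity rule is easy to verify with the `re` module on its own. The snippet below is an illustration with made-up paths; it assumes patterns are applied with `search()`, so an unanchored pattern may match anywhere in the path.

```python
import re

# The inline flag (?i) makes this one pattern case-insensitive.
log_files = re.compile(r"(?i)\.log$")
# Without the flag, matching is case-sensitive.
tmp_files = re.compile(r"\.tmp$")

print(bool(log_files.search("/project/build/OUTPUT.LOG")))  # True
print(bool(tmp_files.search("/project/build/CACHE.TMP")))   # False
print(bool(tmp_files.search("/project/build/cache.tmp")))   # True
```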

## Usage Examples

### Basic Filtering Setup

```python
from sphinx_autobuild.filter import IgnoreFilter

# Common development file filtering
ignore_filter = IgnoreFilter(
    regular=[
        ".git",
        "__pycache__",
        "*.pyc",
        "*.tmp",
        ".DS_Store",
        "Thumbs.db",
        ".vscode",
        ".idea",
    ],
    regex_based=[
        r".*\.swp$",  # Vim swap files
        r".*~$",      # Backup files
        r".*\.log$",  # Log files
    ]
)

# Test the filter
print(ignore_filter("main.py"))      # False (not ignored)
print(ignore_filter("temp.tmp"))     # True (ignored by glob)
print(ignore_filter("file.swp"))     # True (ignored by regex)
print(ignore_filter(".git/config"))  # True (ignored by directory)
```

### Advanced Pattern Combinations

```python
from sphinx_autobuild.filter import IgnoreFilter

# Complex filtering for documentation project
ignore_filter = IgnoreFilter(
    regular=[
        # Build directories
        "_build",
        ".doctrees",
        ".buildinfo",

        # Version control
        ".git",
        ".svn",
        ".hg",

        # IDE files
        ".vscode",
        ".idea",
        "*.sublime-*",

        # Python
        "__pycache__",
        "*.pyc",
        "*.pyo",
        ".pytest_cache",
        ".mypy_cache",

        # Node.js
        "node_modules",
        ".npm",

        # Temporary files
        "*.tmp",
        "*.temp",
    ],
    regex_based=[
        # Editor backup files
        r".*~$",
        r".*\.sw[po]$",  # Vim
        r"#.*#$",        # Emacs

        # Log files with timestamps
        r".*\.log\.\d{4}-\d{2}-\d{2}$",

        # Build artifacts
        r".*/build/temp/.*",
        r".*/dist/.*\.egg-info/.*",

        # OS files
        r".*\.DS_Store$",
        r".*Thumbs\.db$",

        # Lock files
        r".*\.lock$",
        r"package-lock\.json$",
    ]
)
```

### Integration with Command Line

The `--ignore` and `--re-ignore` command-line options map onto the two constructor arguments:

```python
from sphinx_autobuild.filter import IgnoreFilter

# From command line:
#   --ignore "*.tmp" --ignore "*.log" --re-ignore ".*\.swp$" --re-ignore ".*~$"
ignore_patterns = ["*.tmp", "*.log"]     # From --ignore
regex_patterns = [r".*\.swp$", r".*~$"]  # From --re-ignore

ignore_filter = IgnoreFilter(ignore_patterns, regex_patterns)
```
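
One way such options could be collected into the two lists is with repeatable `argparse` flags. This is a sketch under assumptions: the parser below is hypothetical and written only for illustration, and the real command-line interface may define these options differently.

```python
import argparse

# Hypothetical parser for illustration; not the tool's actual option setup.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--ignore", action="append", default=[], metavar="GLOB")
parser.add_argument("--re-ignore", action="append", default=[], metavar="REGEX")

args = parser.parse_args(
    ["--ignore", "*.tmp", "--ignore", "*.log", "--re-ignore", r".*\.swp$"]
)
print(args.ignore)     # ['*.tmp', '*.log']
print(args.re_ignore)  # ['.*\\.swp$']
```

The resulting lists have the shape the constructor expects, so they could be passed on as `IgnoreFilter(args.ignore, args.re_ignore)`.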

## Debug Mode

Set the `SPHINX_AUTOBUILD_DEBUG` environment variable to see filtering decisions:

```python
import os

from sphinx_autobuild.filter import IgnoreFilter

os.environ["SPHINX_AUTOBUILD_DEBUG"] = "1"

# Now the filter will print debug info
ignore_filter = IgnoreFilter(["*.tmp"], [r".*\.swp$"])
ignore_filter("test.tmp")  # Prints: SPHINX_AUTOBUILD_DEBUG: '/path/test.tmp' has changed; ignores are ...
```

**Debug Output Format:**

```
SPHINX_AUTOBUILD_DEBUG: '/absolute/path/to/file.ext' has changed; ignores are IgnoreFilter(regular=['*.tmp'], regex_based=[re.compile('.*\\.swp$')])
```

## Path Normalization

Every path is normalized to an absolute POSIX path before it is compared against the patterns:

```python
from pathlib import Path

# Input paths (various formats)
paths = [
    "docs/index.rst",                   # Relative path
    "/home/user/project/docs/api.rst",  # Absolute path
    Path("docs/modules/core.rst"),      # Path object
    "./docs/getting-started.rst",       # Current directory relative
    "../shared/templates/base.html",    # Parent directory relative
]

# With /home/user/project as the working directory,
# all paths are normalized to absolute POSIX format:
# /home/user/project/docs/index.rst
# /home/user/project/docs/api.rst
# /home/user/project/docs/modules/core.rst
# /home/user/project/docs/getting-started.rst
# /home/user/shared/templates/base.html
```
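
The normalization step can be reproduced with `pathlib`. This is a minimal sketch: the helper name `normalise` is hypothetical, and the use of `Path.resolve()` followed by `as_posix()` is an assumption about how the normalization is carried out.

```python
from pathlib import Path


def normalise(path) -> str:
    """Resolve to an absolute path and render it with forward slashes."""
    return Path(path).resolve().as_posix()


cwd = Path.cwd().resolve()
# "./" and "../" segments are collapsed during resolution.
print(normalise("./docs/index.rst") == (cwd / "docs/index.rst").as_posix())
print(normalise("docs/../conf.py") == (cwd / "conf.py").as_posix())
# str and Path inputs normalize to the same result.
print(normalise(Path("docs/index.rst")) == normalise("docs/index.rst"))
```

Relative inputs are interpreted against the working directory at call time, so the same relative path can normalize differently if the process changes directory.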

## Performance Characteristics

### Efficient Matching

- **Short-circuit Evaluation**: Returns True on first match
- **Compiled Regexes**: Regular expressions are pre-compiled during initialization
- **Pre-normalized Patterns**: Glob patterns are normalized once during initialization rather than on every call
- **Duplicate Removal**: Patterns are deduplicated during initialization
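
Order-preserving deduplication and up-front compilation can both be done in a couple of lines. The snippet below is a sketch of one common approach using `dict.fromkeys`; whether the filter uses exactly this idiom is an assumption, and the variable names are illustrative.

```python
import re

raw_patterns = [r"\.swp$", r"~$", r"\.swp$"]  # note the repeated entry

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_patterns = list(dict.fromkeys(raw_patterns))
print(unique_patterns)  # ['\\.swp$', '~$']

# Compile once so that each later check reuses the compiled objects.
compiled = [re.compile(p) for p in unique_patterns]
# any() stops at the first pattern that matches (short-circuit evaluation).
print(any(rx.search("/project/notes.txt~") for rx in compiled))  # True
```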

### Pattern Ordering

Patterns are tested in this order:

1. **Regular patterns** (glob-style) - typically faster
2. **Regex patterns** - more flexible but potentially slower

Because evaluation stops at the first match, put the most frequently matched patterns first in each list.

### Memory Usage

- **Pattern Storage**: Minimal memory overhead for pattern storage
- **Compiled Regexes**: Small memory cost for compiled regex objects
- **No Path Caching**: File paths are not cached (stateless operation)

## Common Use Cases

### Documentation Projects

```python
from sphinx_autobuild.filter import IgnoreFilter

# Typical documentation project ignores
doc_filter = IgnoreFilter(
    regular=[
        "_build",     # Sphinx build directory
        ".doctrees",  # Sphinx doctree cache
        "*.tmp",      # Temporary files
        ".git",       # Version control
    ],
    regex_based=[
        r".*\.sw[po]$",  # Editor swap files
        r".*~$",         # Backup files
    ]
)
```

### Multi-language Projects

```python
from sphinx_autobuild.filter import IgnoreFilter

# Mixed Python/JavaScript/Docs project
mixed_filter = IgnoreFilter(
    regular=[
        # Python
        "__pycache__", "*.pyc", ".pytest_cache",

        # JavaScript
        "node_modules", ".npm", "*.min.js",

        # Documentation
        "_build", ".doctrees",

        # General
        ".git", ".vscode", "*.tmp",
    ],
    regex_based=[
        # Build artifacts
        r".*/dist/.*",
        r".*/build/.*\.js$",

        # Logs with dates
        r".*\.log\.\d{4}-\d{2}-\d{2}$",
    ]
)
```

### Editor Integration

Different editors create different temporary files:

```python
from sphinx_autobuild.filter import IgnoreFilter

# Editor-specific ignores
editor_filter = IgnoreFilter(
    regular=[
        # Vim
        "*.swp", "*.swo", "*~",

        # Emacs
        "#*#", ".#*",

        # VSCode
        ".vscode",

        # JetBrains
        ".idea",

        # Sublime Text
        "*.sublime-workspace", "*.sublime-project",
    ],
    regex_based=[
        # Temporary files with PIDs
        r".*\.tmp\.\d+$",

        # Lock files
        r".*\.lock$",
    ]
)
```