# Baseline Management

Baseline files let large codebases adopt pydoclint gradually: existing violations are recorded once, and later runs report only violations that are not in the baseline.

## Capabilities

### Baseline Generation

Functions for writing, parsing, and re-evaluating baseline files.

```python { .api }
def generateBaseline(
    violationsAllFiles: dict[str, list[Violation]] | dict[str, list[str]],
    path: Path,
) -> None:
    """
    Generate baseline file based on passed violations.

    Creates a baseline file containing all current violations, allowing
    future runs to only report new violations not present in the baseline.

    Parameters:
    - violationsAllFiles: Mapping of file paths to their violations
    - path: Path where baseline file should be written

    The baseline file format:
    - Each file section starts with the file path
    - Violations are indented with 4 spaces
    - File sections are separated by 20 dashes
    """

def parseBaseline(path: Path) -> dict[str, list[str]]:
    """
    Parse existing baseline file.

    Reads and parses a baseline file created by generateBaseline,
    returning the violations organized by file path.

    Parameters:
    - path: Path to baseline file to parse

    Returns:
    dict[str, list[str]]: Mapping of file paths to violation strings

    Raises:
    FileNotFoundError: If baseline file doesn't exist
    """

def reEvaluateBaseline(
    baseline: dict[str, list[str]],
    actualViolationsInAllFiles: dict[str, list[Violation]],
) -> tuple[bool, dict[str, list[str]], dict[str, list[Violation]]]:
    """
    Compare current violations against baseline and determine changes.

    Evaluates current violations against the baseline to identify:
    - Whether baseline regeneration is needed (violations were fixed)
    - Which baseline violations are still present
    - Which violations are new (not in baseline)

    Parameters:
    - baseline: Parsed baseline violations by file
    - actualViolationsInAllFiles: Current violations found in files

    Returns:
    tuple containing:
    - bool: Whether baseline regeneration is needed
    - dict[str, list[str]]: Unfixed baseline violations still present
    - dict[str, list[Violation]]: New violations not in baseline
    """
```
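
The three return values of `reEvaluateBaseline` can be easier to picture with a small stand-alone model. The sketch below is not pydoclint's implementation: `compare_to_baseline` is a hypothetical helper, it works on plain strings rather than `Violation` objects, and it assumes violations are matched by exact string equality, which the real function may not do.

```python
def compare_to_baseline(
    baseline: dict[str, list[str]],
    current: dict[str, list[str]],
) -> tuple[bool, dict[str, list[str]], dict[str, list[str]]]:
    """Illustrative model only (hypothetical helper, exact string matching assumed)."""
    needs_regen = False
    unfixed: dict[str, list[str]] = {}
    new: dict[str, list[str]] = {}

    for file in sorted(set(baseline) | set(current)):
        known = baseline.get(file, [])
        found = current.get(file, [])

        # Baseline entries that still show up in the current run.
        still_present = [v for v in known if v in found]
        if len(still_present) < len(known):
            needs_regen = True  # at least one baselined violation was fixed
        if still_present:
            unfixed[file] = still_present

        # Current violations that the baseline does not know about.
        fresh = [v for v in found if v not in known]
        if fresh:
            new[file] = fresh

    return needs_regen, unfixed, new


# Hypothetical sample data: one violation fixed, one kept, one newly introduced.
baseline = {"src/module.py": ["15: DOC101: first message", "23: DOC201: second message"]}
current = {"src/module.py": ["15: DOC101: first message", "40: DOC103: third message"]}

needs_regen, unfixed, new = compare_to_baseline(baseline, current)
print(needs_regen)  # True
print(unfixed)      # {'src/module.py': ['15: DOC101: first message']}
print(new)          # {'src/module.py': ['40: DOC103: third message']}
```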

### Baseline File Format Constants

Constants defining the baseline file format structure.

```python { .api }
SEPARATOR: str  # "--------------------\n" (20 dashes followed by a newline)
LEN_INDENT: int  # 4 (indentation length)
ONE_SPACE: str  # " " (single space)
INDENT: str  # "    " (4 spaces for violation indentation)
```
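
To show how these constants shape a file, here is a minimal round-trip sketch of the format described above. It redefines the constants locally and uses two hypothetical helpers, `render_baseline` and `parse_baseline_text`; it is an illustration of the layout, not the library's own reader or writer, and it assumes that files without violations are skipped.

```python
# Local copies of the documented values, so the sketch runs without pydoclint.
SEPARATOR = "-" * 20 + "\n"
INDENT = " " * 4


def render_baseline(violations: dict[str, list[str]]) -> str:
    """Render violations as text: path line, indented entries, separator."""
    parts: list[str] = []
    for file, items in violations.items():
        if not items:
            continue  # assumption: files without violations get no section
        parts.append(f"{file}\n")
        parts.extend(f"{INDENT}{item}\n" for item in items)
        parts.append(SEPARATOR)
    return "".join(parts)


def parse_baseline_text(text: str) -> dict[str, list[str]]:
    """Split the text on the separator and strip the indentation again."""
    parsed: dict[str, list[str]] = {}
    for section in text.split(SEPARATOR):
        lines = [line.strip() for line in section.splitlines() if line.strip()]
        if lines:
            parsed[lines[0]] = lines[1:]
    return parsed


# Hypothetical sample data.
sample = {
    "src/module.py": ["15: DOC101: first message", "23: DOC201: second message"],
    "src/utils.py": ["8: DOC102: third message"],
}

text = render_baseline(sample)
print(text)
print(parse_baseline_text(text) == sample)  # True
```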

## Usage Examples

### Basic Baseline Workflow

```bash
# Step 1: Generate initial baseline from current violations
pydoclint --generate-baseline=True --baseline=violations-baseline.txt src/

# Step 2: Run normally - only new violations reported
pydoclint --baseline=violations-baseline.txt src/

# Step 3: Auto-regenerate baseline when violations are fixed
pydoclint --baseline=violations-baseline.txt --auto-regenerate-baseline=True src/
```

### Programmatic Baseline Management

```python
from pathlib import Path
from pydoclint.baseline import generateBaseline, parseBaseline, reEvaluateBaseline
# Note: _checkPaths is a private helper (leading underscore), so it may change between releases
from pydoclint.main import _checkPaths

# Check files and generate initial baseline
violations = _checkPaths(
    paths=("src/",),
    style="numpy"
)

baseline_path = Path("current-violations.txt")
generateBaseline(violations, baseline_path)
print(f"Generated baseline with {sum(len(v) for v in violations.values())} violations")

# Later: check against baseline
current_violations = _checkPaths(
    paths=("src/",),
    style="numpy"
)

# Parse existing baseline
baseline = parseBaseline(baseline_path)

# Compare current violations against baseline
needs_regen, unfixed_baseline, new_violations = reEvaluateBaseline(
    baseline, current_violations
)

if needs_regen:
    print("Some violations were fixed - baseline needs regeneration")
    generateBaseline(unfixed_baseline, baseline_path)

print(f"New violations: {sum(len(v) for v in new_violations.values())}")
```

### Baseline File Format Example

```text
src/module.py
    15: DOC101: Docstring contains fewer arguments than in function signature.
    23: DOC201: does not have a return section in docstring
    45: DOC103: Docstring arguments are different from function arguments.
--------------------
src/utils.py
    8: DOC102: Docstring contains more arguments than in function signature.
    34: DOC105: Argument names match, but type hints in these args do not match: x
--------------------
```

### Configuration-Based Baseline

```toml
# pyproject.toml
[tool.pydoclint]
style = "google"
baseline = "pydoclint-violations.txt"
auto-regenerate-baseline = true
exclude = "tests/|migrations/"
```

```bash
# The baseline path is read from the configuration file
pydoclint src/  # Uses baseline from config

# Generate new baseline
pydoclint --generate-baseline=True src/
```

### Advanced Baseline Workflows

#### Gradual Migration Strategy

```bash
# Phase 1: Generate baseline for entire codebase
pydoclint --generate-baseline=True --baseline=phase1-baseline.txt .

# Phase 2: Fix critical violations, update baseline
pydoclint --baseline=phase1-baseline.txt . 2>&1 | grep "DOC1" > critical-violations.txt
# Fix DOC1xx violations manually
pydoclint --generate-baseline=True --baseline=phase2-baseline.txt .

# Phase 3: Continue incremental improvement
pydoclint --baseline=phase2-baseline.txt --auto-regenerate-baseline=True .
```
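
When deciding which violation codes to tackle in each phase, a per-code tally of the baseline can complement the `grep` step. The following sketch counts codes in baseline text; `count_codes` and the sample data are hypothetical, and it assumes every entry has the `line: CODE: message` shape shown in the format example.

```python
import re
from collections import Counter

# Matches indented entries such as "    15: DOC101: message" and captures the code.
ENTRY = re.compile(r"^\s+\d+: (DOC\d+):")


def count_codes(baseline_text: str) -> Counter:
    """Count how often each violation code appears in baseline text."""
    counts: Counter = Counter()
    for line in baseline_text.splitlines():
        match = ENTRY.match(line)
        if match:  # file path lines and separators do not match
            counts[match.group(1)] += 1
    return counts


# Hypothetical baseline content in the documented layout.
SAMPLE = (
    "src/module.py\n"
    "    15: DOC101: Docstring contains fewer arguments than in function signature.\n"
    "    23: DOC201: does not have a return section in docstring\n"
    "--------------------\n"
    "src/utils.py\n"
    "    8: DOC101: Docstring contains fewer arguments than in function signature.\n"
    "--------------------\n"
)

counts = count_codes(SAMPLE)
for code, n in counts.most_common():
    print(code, n)
# DOC101 2
# DOC201 1
```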

#### Per-Module Baselines

```bash
# Create separate baselines for different modules
pydoclint --generate-baseline=True --baseline=core-baseline.txt src/core/
pydoclint --generate-baseline=True --baseline=utils-baseline.txt src/utils/
pydoclint --generate-baseline=True --baseline=api-baseline.txt src/api/

# Check modules independently
pydoclint --baseline=core-baseline.txt src/core/
pydoclint --baseline=utils-baseline.txt src/utils/
pydoclint --baseline=api-baseline.txt src/api/
```

#### CI/CD Integration

```yaml
# .github/workflows/docstring-check.yml
name: Docstring Check
on: [push, pull_request]

jobs:
  docstring-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install pydoclint
        run: pip install pydoclint
      - name: Check docstrings against baseline
        run: |
          if [ -f pydoclint-baseline.txt ]; then
            pydoclint --baseline=pydoclint-baseline.txt src/
          else
            pydoclint src/
          fi
```

### Baseline Maintenance

```python
# Script to maintain baseline health
from pathlib import Path
from pydoclint.baseline import parseBaseline, generateBaseline, reEvaluateBaseline
from pydoclint.main import _checkPaths

def maintain_baseline(baseline_path: Path, source_paths: tuple[str, ...]) -> bool:
    """Drop fixed violations from the baseline; return True if no new violations exist."""

    # Get current violations
    current_violations = _checkPaths(source_paths, style="numpy")

    if not baseline_path.exists():
        print("No baseline exists, generating new one")
        generateBaseline(current_violations, baseline_path)
        return True  # everything found is now baselined, so nothing counts as new

    # Parse existing baseline
    baseline = parseBaseline(baseline_path)

    # Check if baseline needs update
    needs_regen, unfixed_baseline, new_violations = reEvaluateBaseline(
        baseline, current_violations
    )

    if needs_regen:
        print(f"Updating baseline - {len(baseline)} -> {len(unfixed_baseline)} files")
        generateBaseline(unfixed_baseline, baseline_path)

    new_count = sum(len(v) for v in new_violations.values())
    if new_count > 0:
        print(f"Found {new_count} new violations not in baseline")
        return False

    return True

# Usage
success = maintain_baseline(Path("violations.txt"), ("src/",))
```