# Baseline Management

Baseline files enable gradual adoption of pydoclint in large codebases: they record existing violations so that only new ones are reported.

## Capabilities

### Baseline Generation

Functions for creating, parsing, and re-evaluating baseline files, which track existing violations to enable gradual adoption.

```python { .api }
def generateBaseline(
    violationsAllFiles: dict[str, list[Violation]] | dict[str, list[str]],
    path: Path,
) -> None:
    """
    Generate baseline file based on passed violations.

    Creates a baseline file containing all current violations, allowing
    future runs to only report new violations not present in the baseline.

    Parameters:
    - violationsAllFiles: Mapping of file paths to their violations
    - path: Path where baseline file should be written

    The baseline file format:
    - Each file section starts with the file path
    - Violations are indented with 4 spaces
    - File sections are separated by 20 dashes
    """

def parseBaseline(path: Path) -> dict[str, list[str]]:
    """
    Parse existing baseline file.

    Reads and parses a baseline file created by generateBaseline,
    returning the violations organized by file path.

    Parameters:
    - path: Path to baseline file to parse

    Returns:
    dict[str, list[str]]: Mapping of file paths to violation strings

    Raises:
    FileNotFoundError: If baseline file doesn't exist
    """

def reEvaluateBaseline(
    baseline: dict[str, list[str]],
    actualViolationsInAllFiles: dict[str, list[Violation]],
) -> tuple[bool, dict[str, list[str]], dict[str, list[Violation]]]:
    """
    Compare current violations against baseline and determine changes.

    Evaluates current violations against the baseline to identify:
    - Whether baseline regeneration is needed (violations were fixed)
    - Which baseline violations are still present
    - Which violations are new (not in baseline)

    Parameters:
    - baseline: Parsed baseline violations by file
    - actualViolationsInAllFiles: Current violations found in files

    Returns:
    tuple containing:
    - bool: Whether baseline regeneration is needed
    - dict[str, list[str]]: Unfixed baseline violations still present
    - dict[str, list[Violation]]: New violations not in baseline
    """
```
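To make the three return values of `reEvaluateBaseline` concrete, here is a minimal sketch of the comparison it describes. It is not pydoclint's implementation: the function name `re_evaluate` is hypothetical, violations are plain strings rather than `Violation` objects, and matching by exact string equality is an assumption made for illustration.

```python
def re_evaluate(
    baseline: dict[str, list[str]],
    current: dict[str, list[str]],
) -> tuple[bool, dict[str, list[str]], dict[str, list[str]]]:
    """Illustrative only: split violations into baseline leftovers and new ones."""
    needs_regen = False
    unfixed: dict[str, list[str]] = {}
    new: dict[str, list[str]] = {}

    # Baseline entries that no longer occur were fixed, so the baseline is stale.
    for file, old in baseline.items():
        now = set(current.get(file, []))
        still_present = [v for v in old if v in now]
        if len(still_present) < len(old):
            needs_regen = True
        if still_present:
            unfixed[file] = still_present

    # Current violations that the baseline does not list are new.
    for file, found in current.items():
        known = set(baseline.get(file, []))
        fresh = [v for v in found if v not in known]
        if fresh:
            new[file] = fresh

    return needs_regen, unfixed, new
```

In this sketch a file whose baseline violations were all fixed disappears from the "unfixed" mapping entirely, which is why a regenerated baseline can contain fewer files than the old one.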

### Baseline File Format Constants

Constants defining the baseline file format structure.

```python { .api }
SEPARATOR: str   # "--------------------\n" (20 dashes)
LEN_INDENT: int  # 4 (indentation length)
ONE_SPACE: str   # " " (single space)
INDENT: str      # "    " (4 spaces for violation indentation)
```
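The sketch below shows how constants with these values could combine into the file layout described above. The constants are redefined locally so the example runs on its own, and `render_baseline` is a hypothetical helper, not part of pydoclint. Skipping files with no violations is an assumption of this sketch.

```python
# Local copies of the documented values, so the sketch is self-contained.
SEPARATOR = "-" * 20 + "\n"
LEN_INDENT = 4
ONE_SPACE = " "
INDENT = ONE_SPACE * LEN_INDENT


def render_baseline(violations_by_file: dict[str, list[str]]) -> str:
    """Illustrative only: build baseline text from violation strings."""
    parts: list[str] = []
    for file, violations in violations_by_file.items():
        if not violations:
            continue  # assumption: files without violations get no section
        parts.append(file + "\n")
        parts.extend(f"{INDENT}{violation}\n" for violation in violations)
        parts.append(SEPARATOR)
    return "".join(parts)
```

Each section is a file path, its indented violations, and a closing separator line, which keeps the file easy to diff in code review.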

## Usage Examples

### Basic Baseline Workflow

```bash
# Step 1: Generate initial baseline from current violations
pydoclint --generate-baseline=True --baseline=violations-baseline.txt src/

# Step 2: Run normally - only new violations reported
pydoclint --baseline=violations-baseline.txt src/

# Step 3: Auto-regenerate baseline when violations are fixed
pydoclint --baseline=violations-baseline.txt --auto-regenerate-baseline=True src/
```

### Programmatic Baseline Management

```python
from pathlib import Path
from pydoclint.baseline import generateBaseline, parseBaseline, reEvaluateBaseline
# Note: _checkPaths is a private helper (leading underscore), so it may change between releases
from pydoclint.main import _checkPaths

# Check files and generate initial baseline
violations = _checkPaths(
    paths=("src/",),
    style="numpy"
)

baseline_path = Path("current-violations.txt")
generateBaseline(violations, baseline_path)
print(f"Generated baseline with {sum(len(v) for v in violations.values())} violations")

# Later: check against baseline
current_violations = _checkPaths(
    paths=("src/",),
    style="numpy"
)

# Parse existing baseline
baseline = parseBaseline(baseline_path)

# Compare current violations against baseline
needs_regen, unfixed_baseline, new_violations = reEvaluateBaseline(
    baseline, current_violations
)

if needs_regen:
    print("Some violations were fixed - baseline needs regeneration")
    generateBaseline(unfixed_baseline, baseline_path)

print(f"New violations: {sum(len(v) for v in new_violations.values())}")
```

### Baseline File Format Example

```text
src/module.py
    15: DOC101: Docstring contains fewer arguments than in function signature.
    23: DOC201: does not have a return section in docstring
    45: DOC103: Docstring arguments are different from function arguments.
--------------------
src/utils.py
    8: DOC102: Docstring contains more arguments than in function signature.
    34: DOC105: Argument names match, but type hints in these args do not match: x
--------------------
```
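A file in this layout can be read back with a few lines of Python. The following is a rough sketch, not pydoclint's `parseBaseline`: the name `parse_baseline_text` is hypothetical, it works on a string rather than a `Path`, and it assumes violation lines start with exactly the 4-space indent shown above.

```python
def parse_baseline_text(text: str) -> dict[str, list[str]]:
    """Illustrative only: map each file path to its violation strings."""
    separator = "-" * 20
    result: dict[str, list[str]] = {}
    current_file = None

    for line in text.splitlines():
        if line.strip() == separator:
            current_file = None  # the section for this file is closed
        elif line.startswith("    "):
            # An indented line is a violation of the file currently open.
            if current_file is not None:
                result[current_file].append(line.strip())
        elif line.strip():
            # Any other non-empty line starts a new file section.
            current_file = line.strip()
            result[current_file] = []

    return result
```

Because violation strings include the line number, a violation that moves to a different line would no longer match its baseline entry under plain string comparison.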

### Configuration-Based Baseline

```toml
# pyproject.toml
[tool.pydoclint]
style = "google"
baseline = "pydoclint-violations.txt"
auto-regenerate-baseline = true
exclude = "tests/|migrations/"
```

```bash
# Configuration automatically handles baseline
pydoclint src/  # Uses baseline from config

# Generate new baseline
pydoclint --generate-baseline=True src/
```

### Advanced Baseline Workflows

#### Gradual Migration Strategy

```bash
# Phase 1: Generate baseline for entire codebase
pydoclint --generate-baseline=True --baseline=phase1-baseline.txt .

# Phase 2: Fix critical violations, update baseline
pydoclint --baseline=phase1-baseline.txt . 2>&1 | grep "DOC1" > critical-violations.txt
# Fix DOC1xx violations manually
pydoclint --generate-baseline=True --baseline=phase2-baseline.txt .

# Phase 3: Continue incremental improvement
pydoclint --baseline=phase2-baseline.txt --auto-regenerate-baseline=True .
```

#### Per-Module Baselines

```bash
# Create separate baselines for different modules
pydoclint --generate-baseline=True --baseline=core-baseline.txt src/core/
pydoclint --generate-baseline=True --baseline=utils-baseline.txt src/utils/
pydoclint --generate-baseline=True --baseline=api-baseline.txt src/api/

# Check modules independently
pydoclint --baseline=core-baseline.txt src/core/
pydoclint --baseline=utils-baseline.txt src/utils/
pydoclint --baseline=api-baseline.txt src/api/
```

#### CI/CD Integration

```yaml
# .github/workflows/docstring-check.yml
name: Docstring Check
on: [push, pull_request]

jobs:
  docstring-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install pydoclint
        run: pip install pydoclint
      - name: Check docstrings against baseline
        run: |
          if [ -f pydoclint-baseline.txt ]; then
            pydoclint --baseline=pydoclint-baseline.txt src/
          else
            pydoclint src/
          fi
```
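The same fallback logic can be kept in a small Python helper if the check also needs to run outside GitHub Actions. This is only a sketch: `pydoclint_command` is a hypothetical name, and the helper merely builds the argument list so a caller can pass it to `subprocess.run`.

```python
from pathlib import Path


def pydoclint_command(baseline: Path, target: str = "src/") -> list[str]:
    """Illustrative only: mirror the shell conditional from the workflow."""
    if baseline.is_file():
        # A committed baseline exists, so only new violations are reported.
        return ["pydoclint", f"--baseline={baseline}", target]
    # No baseline yet: run a plain check over the target.
    return ["pydoclint", target]


print(pydoclint_command(Path("pydoclint-baseline.txt")))
```

Building a list instead of a shell string avoids quoting problems when a path contains spaces.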

### Baseline Maintenance

```python
# Script to maintain baseline health
from pathlib import Path
from pydoclint.baseline import parseBaseline, generateBaseline, reEvaluateBaseline
from pydoclint.main import _checkPaths

def maintain_baseline(baseline_path: Path, source_paths: tuple[str, ...]) -> bool:
    """Maintain baseline by cleaning up fixed violations.

    Returns True when no violations outside the baseline were found.
    """

    # Get current violations
    current_violations = _checkPaths(source_paths, style="numpy")

    if not baseline_path.exists():
        print("No baseline exists, generating new one")
        generateBaseline(current_violations, baseline_path)
        return True  # everything found so far is now part of the baseline

    # Parse existing baseline
    baseline = parseBaseline(baseline_path)

    # Check if baseline needs update
    needs_regen, unfixed_baseline, new_violations = reEvaluateBaseline(
        baseline, current_violations
    )

    if needs_regen:
        print(f"Updating baseline - {len(baseline)} -> {len(unfixed_baseline)} files")
        generateBaseline(unfixed_baseline, baseline_path)

    new_count = sum(len(v) for v in new_violations.values())
    if new_count > 0:
        print(f"Found {new_count} new violations not in baseline")
        return False

    return True

# Usage
success = maintain_baseline(Path("violations.txt"), ("src/",))
```