# String Processing

Text manipulation utilities for docstring processing: indentation detection, line normalization, summary formatting, and text splitting. These operations form the foundation of docformatter's text processing.

## Capabilities

### Indentation Analysis

Functions for analyzing and working with text indentation patterns.

```python { .api }
def find_shortest_indentation(lines: List[str]) -> str:
    """
    Determine the shortest indentation in a list of lines.

    Args:
        lines (List[str]): List of text lines to analyze

    Returns:
        str: The shortest indentation string found in non-empty lines
    """
```
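To make the "shortest indentation in non-empty lines" rule concrete, here is a minimal standalone sketch of one way it could be computed. The name `shortest_indentation_sketch` is hypothetical and this is not docformatter's source; it only assumes that blank and whitespace-only lines are skipped and that an empty string is returned when no line qualifies.

```python
from typing import List, Optional


def shortest_indentation_sketch(lines: List[str]) -> str:
    """Return the shortest leading-whitespace prefix among non-blank lines."""
    shortest: Optional[str] = None
    for line in lines:
        if not line.strip():
            # Blank and whitespace-only lines say nothing about indentation.
            continue
        indent = line[: len(line) - len(line.lstrip())]
        if shortest is None or len(indent) < len(shortest):
            shortest = indent
    # Assumption: fall back to an empty string when no line qualifies.
    return shortest or ""


print(repr(shortest_indentation_sketch(["    a", "        b", "", "  c"])))  # '  '
```

Returning the prefix itself rather than its length keeps tabs and spaces intact, so a caller can reuse the prefix when re-indenting.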

### Line Normalization

Utilities for normalizing line endings and line content.

```python { .api }
def normalize_line(line: str, newline: str) -> str:
    """
    Return line with fixed ending, if ending was present.

    Args:
        line (str): The line to normalize
        newline (str): The newline character to use

    Returns:
        str: Line with normalized ending
    """

def normalize_line_endings(lines, newline):
    """
    Return text with normalized line endings.

    Args:
        lines: Text lines to normalize
        newline: Newline character to use

    Returns:
        str: Text with consistent line endings
    """
```
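The phrase "fixed ending, if ending was present" can be read as: strip any trailing CR/LF characters and append the requested newline only when something was stripped. The sketch below illustrates that reading with hypothetical helper names; it is an assumption about the behavior, not the library's implementation.

```python
from typing import Iterable


def fix_line_ending_sketch(line: str, newline: str) -> str:
    """Replace a trailing CR, LF, or CRLF with `newline`, if one is present."""
    body = line.rstrip("\r\n")
    # A line without any ending is returned untouched.
    return body + newline if body != line else line


def fix_line_endings_sketch(lines: Iterable[str], newline: str) -> str:
    """Join lines into one string after normalizing each ending."""
    return "".join(fix_line_ending_sketch(line, newline) for line in lines)


print(repr(fix_line_endings_sketch(["a\r\n", "b\r", "c"], "\n")))  # 'a\nb\nc'
```

Leaving unterminated lines alone means a final line without a newline stays that way, so the presence or absence of a trailing newline in the input is preserved.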

### Summary Processing

Functions for processing and formatting docstring summaries.

```python { .api }
def normalize_summary(summary: str, noncap: Optional[List[str]] = None) -> str:
    """
    Return normalized docstring summary.

    Normalizes summary by capitalizing first word (unless in noncap list)
    and adding period at end if missing.

    Args:
        summary (str): The summary string to normalize
        noncap (List[str], optional): Words not to capitalize when first

    Returns:
        str: Normalized summary with proper capitalization and punctuation
    """
```
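The two rules in the docstring above (capitalize the first word unless it is listed in `noncap`, then add a missing period) can be sketched as follows. The function name `normalize_summary_sketch` and the "ends in a letter or digit" test for a missing period are assumptions made for illustration; the real function may apply further rules.

```python
from typing import List, Optional


def normalize_summary_sketch(summary: str, noncap: Optional[List[str]] = None) -> str:
    """Capitalize the first word (unless exempt) and ensure a trailing period."""
    noncap = noncap or []
    summary = summary.rstrip()
    if not summary:
        return summary

    # Capitalize only the first character so the rest of the word is untouched.
    first_word = summary.split(maxsplit=1)[0]
    if first_word not in noncap:
        summary = summary[0].upper() + summary[1:]

    # Assumption: a period is "missing" when the text ends in a letter or digit.
    if summary[-1].isalnum():
        summary += "."
    return summary


print(normalize_summary_sketch("format docstrings according to pep 257"))
# Format docstrings according to pep 257.
print(normalize_summary_sketch("pandas wraps tabular data", noncap=["pandas"]))
# pandas wraps tabular data.
```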

### Sentence Detection

Functions for detecting and working with sentence boundaries.

```python { .api }
def is_probably_beginning_of_sentence(line: str) -> Union[Match[str], None, bool]:
    """
    Determine if the line begins a sentence.

    Uses heuristics to detect parameter lists and sentence beginnings
    by looking for specific patterns and tokens.

    Args:
        line (str): The line to test

    Returns:
        Union[Match[str], None, bool]: A truthy value (True or a regex
            match object) if the line probably begins a sentence,
            otherwise a falsy value (False or None)
    """
```
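As a rough illustration of what "specific patterns and tokens" might look like, the sketch below treats list tokens surrounded by whitespace and certain leading markers as sentence starts, while excluding Sphinx-style `:field:` lines. The regular expressions and the name `looks_like_sentence_start` are assumptions chosen for this example, not patterns confirmed by the library.

```python
import re

# Assumed patterns, for illustration only.
_LIST_TOKEN = re.compile(r"\s[@\-*]\s")   # "@", "-", or "*" with whitespace on both sides
_LEADING_MARKER = re.compile(r"[-@)]")    # line starts with "-", "@", or ")"
_FIELD_ROLE = re.compile(r":\w+:")        # Sphinx-style ":param:" prefix


def looks_like_sentence_start(line: str) -> bool:
    """Heuristically decide whether a line opens a new sentence or list item."""
    if _LIST_TOKEN.search(line):
        return True
    stripped = line.strip()
    return bool(_LEADING_MARKER.match(stripped)) and not _FIELD_ROLE.match(stripped)


for sample in ["  - item: text", "  @param name: text", "  :param name: text", "  plain text"]:
    print(f"{sample.strip()!r} -> {looks_like_sentence_start(sample)}")
```

Unlike the documented signature, this sketch always returns a plain `bool`, which is simpler to test but loses the match object.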

### Text Splitting

Functions for splitting text into components.

```python { .api }
def split_first_sentence(text):
    """
    Split text into first sentence and remainder.

    Handles common abbreviations and false sentence endings.
    Recognizes periods, question marks, exclamation marks, and
    colons at line endings as sentence boundaries.

    Args:
        text: Text to split

    Returns:
        tuple: (first_sentence, remaining_text)
    """

def split_summary_and_description(contents):
    """
    Split docstring into summary and description parts.

    Uses empty lines, sentence boundaries, and heuristics to
    determine where summary ends and description begins.

    Args:
        contents: Docstring content to split

    Returns:
        tuple: (summary, description)
    """
```
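The sentence-boundary rules described for `split_first_sentence` can be sketched as a word-by-word scan: stop at `.`, `?`, or `!`, stop at a colon that ends a line, and skip known abbreviations. The abbreviation list and the name `split_first_sentence_sketch` below are assumptions for illustration, not the library's code.

```python
import re
from typing import Tuple

# Assumed list of abbreviations that should not end a sentence.
_FALSE_ENDINGS = ("e.g.", "i.e.", "Dr.", "Mr.", "Mrs.", "Ms.")


def split_first_sentence_sketch(text: str) -> Tuple[str, str]:
    """Split text into (first_sentence, remainder) using simple heuristics."""
    sentence = ""
    rest = text
    while rest:
        # Peel off one word plus the single whitespace character after it.
        parts = re.split(r"(\s)", rest, maxsplit=1)
        word = parts[0]
        delimiter, rest = (parts[1], parts[2]) if len(parts) == 3 else ("", "")
        sentence += word

        if sentence.endswith(_FALSE_ENDINGS):
            pass  # An abbreviation, not a sentence end.
        elif sentence.endswith((".", "?", "!")):
            return sentence, delimiter + rest
        elif sentence.endswith(":") and delimiter == "\n":
            # A colon at the end of a line often introduces a parameter list.
            return sentence, delimiter + rest

        sentence += delimiter
    return sentence, ""


first, rest = split_first_sentence_sketch("See e.g. the documentation. More info follows.")
print(repr(first))  # 'See e.g. the documentation.'
print(repr(rest))   # ' More info follows.'
```

The remainder keeps its leading whitespace, which matches the expected outputs shown in the usage examples of this document.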

## Usage Examples

### Indentation Analysis

```python
from docformatter import find_shortest_indentation

# Analyze indentation in code block
lines = [
    "    def function():",
    "        '''Docstring.",
    "        ",
    "        Description here.",
    "        '''",
    "        pass"
]

shortest = find_shortest_indentation(lines)
print(f"Shortest indentation: '{shortest}'")  # "    "
```

### Line Ending Normalization

```python
from docformatter import normalize_line, normalize_line_endings

# Normalize single line
line = "Text with mixed endings\r\n"
normalized = normalize_line(line, "\n")
print(repr(normalized))  # 'Text with mixed endings\n'

# Normalize multiple lines
text_lines = ["Line 1\r\n", "Line 2\r", "Line 3\n"]
normalized_text = normalize_line_endings(text_lines, "\n")
print(repr(normalized_text))  # 'Line 1\nLine 2\nLine 3\n'
```

### Summary Normalization

```python
from docformatter import normalize_summary

# Basic summary normalization
summary = "format docstrings according to pep 257"
normalized = normalize_summary(summary)
print(normalized)  # "Format docstrings according to pep 257."

# With non-capitalization list
summary = "API documentation generator"
normalized = normalize_summary(summary, noncap=["API"])
print(normalized)  # "API documentation generator."

# Already properly formatted
summary = "Process the input data."
normalized = normalize_summary(summary)
print(normalized)  # "Process the input data." (unchanged)
```

### Text Splitting Operations

```python
from docformatter import split_first_sentence, split_summary_and_description

# Split first sentence
text = "This is the first sentence. This is the second sentence."
first, rest = split_first_sentence(text)
print(f"First: '{first}'")  # "This is the first sentence."
print(f"Rest: '{rest}'")    # " This is the second sentence."

# Handle abbreviations
text = "See e.g. the documentation. More info follows."
first, rest = split_first_sentence(text)
print(f"First: '{first}'")  # "See e.g. the documentation."
print(f"Rest: '{rest}'")    # " More info follows."

# Split summary and description
docstring = """Process input data.

This function processes the input data according to
the specified parameters and returns the results.

Args:
    data: Input data to process
"""

summary, description = split_summary_and_description(docstring)
print(f"Summary: '{summary}'")
print(f"Description: '{description}'")
```

### Complex Text Processing

```python
from docformatter import (
    find_shortest_indentation,
    normalize_summary,
    split_summary_and_description
)

def process_docstring(docstring_content):
    """Process a complete docstring."""
    # Split into parts
    summary, description = split_summary_and_description(docstring_content)

    # Normalize summary
    normalized_summary = normalize_summary(summary)

    # Analyze description indentation if present
    if description:
        desc_lines = description.splitlines()
        base_indent = find_shortest_indentation(desc_lines)
        print(f"Description base indentation: '{base_indent}'")

    return normalized_summary, description

# Example usage
docstring = """process the data

This function processes input data and returns
processed results.
"""

summary, desc = process_docstring(docstring)
print(f"Processed summary: '{summary}'")
```

### Sentence Boundary Detection

```python
from docformatter import is_probably_beginning_of_sentence

# Test various line types
test_lines = [
    " - Parameter: description",    # Bullet list
    " @param name: description",    # Epytext parameter
    " :param name: description",    # Sphinx parameter
    " Normal sentence text",        # Regular text
    " ) Closing parenthesis",       # Special case
]

for line in test_lines:
    is_beginning = is_probably_beginning_of_sentence(line)
    print(f"'{line.strip()}' -> {is_beginning}")
```

## Text Processing Patterns

### Docstring Content Analysis

```python
from docformatter import split_summary_and_description, normalize_summary

def analyze_docstring(content):
    """Analyze docstring structure and content."""
    summary, description = split_summary_and_description(content)

    print(f"Summary length: {len(summary)}")
    print(f"Has description: {bool(description.strip())}")

    # Check if summary needs normalization
    normalized = normalize_summary(summary)
    needs_normalization = summary != normalized

    return {
        'summary': summary,
        'description': description,
        'normalized_summary': normalized,
        'needs_normalization': needs_normalization,
        'has_description': bool(description.strip())
    }
```

### Indentation Preservation

```python
from docformatter import find_shortest_indentation

def preserve_relative_indentation(lines):
    """Preserve relative indentation while normalizing base level."""
    base_indent = find_shortest_indentation(lines)
    base_level = len(base_indent)

    processed_lines = []
    for line in lines:
        if line.strip():  # Non-empty line
            current_indent = len(line) - len(line.lstrip())
            relative_indent = current_indent - base_level
            # Re-indent to a four-space base level (width assumed),
            # keeping each line's offset relative to the shortest indent
            new_line = "    " + " " * relative_indent + line.lstrip()
            processed_lines.append(new_line)
        else:
            processed_lines.append(line)

    return processed_lines
```

## Integration with Other Components

The string processing functions integrate closely with other docformatter components:

- **Syntax Analysis**: Provides text splitting for field list processing
- **Formatter**: Supplies normalization for docstring content
- **Encoder**: Works with line ending detection and normalization
- **Configuration**: Respects non-capitalization settings

## Error Handling

String processing functions handle various edge cases:

- **Empty Input**: Functions gracefully handle empty strings and lists
- **Mixed Line Endings**: Normalization functions handle CR, LF, and CRLF
- **Unicode Content**: All functions work with Unicode text
- **Malformed Input**: Robust handling of unexpected input patterns
- **Whitespace Variations**: Consistent handling of tabs, spaces, and mixed whitespace