# String Processing

Text manipulation utilities for docstring processing: indentation detection, line normalization, summary formatting, and text splitting. These functions form the foundation of docformatter's text processing.

## Capabilities

### Indentation Analysis

Functions for analyzing and working with text indentation patterns.

```python { .api }
def find_shortest_indentation(lines: List[str]) -> str:
    """
    Determine the shortest indentation in a list of lines.

    Args:
        lines (List[str]): List of text lines to analyze

    Returns:
        str: The shortest indentation string found in non-empty lines
    """
```
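
To make the contract above concrete, here is a minimal sketch of one way such a function could behave. `shortest_indent_sketch` is a hypothetical name used only for illustration. It is not docformatter's implementation, and it assumes that whitespace-only lines are skipped and that an empty string is returned when no line has content.

```python
from typing import List, Optional


def shortest_indent_sketch(lines: List[str]) -> str:
    """Return the shortest leading whitespace among non-blank lines."""
    shortest: Optional[str] = None
    for line in lines:
        if not line.strip():
            # Blank and whitespace-only lines carry no indentation signal.
            continue
        # Everything before the first non-whitespace character is the indent.
        indent = line[: len(line) - len(line.lstrip())]
        if shortest is None or len(indent) < len(shortest):
            shortest = indent
    # Assumption: no usable line means "no indentation".
    return shortest or ""
```

Returning the indentation string itself, rather than its length, keeps tabs and spaces distinguishable for the caller.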

### Line Normalization

Utilities for normalizing line endings and line content.

```python { .api }
def normalize_line(line: str, newline: str) -> str:
    """
    Return line with fixed ending, if ending was present.

    Args:
        line (str): The line to normalize
        newline (str): The newline character to use

    Returns:
        str: Line with normalized ending
    """

def normalize_line_endings(lines, newline):
    """
    Return text with normalized line endings.

    Args:
        lines: Text lines to normalize
        newline: Newline character to use

    Returns:
        str: Text with consistent line endings
    """
```
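
The phrase "if ending was present" is the key detail: a line without a trailing newline should come back unchanged. The sketch below illustrates that rule under stated assumptions. `fix_line_ending_sketch` and `fix_all_endings_sketch` are hypothetical names, not part of docformatter, and the sketch assumes that any trailing run of `\r` and `\n` counts as one ending.

```python
def fix_line_ending_sketch(line: str, newline: str) -> str:
    """Swap an existing line ending for `newline`; leave other lines alone."""
    body = line.rstrip("\r\n")
    if body == line:
        # No ending was present, so nothing is appended.
        return line
    return body + newline


def fix_all_endings_sketch(lines, newline):
    """Fix each line's ending and join the results into one string."""
    return "".join(fix_line_ending_sketch(line, newline) for line in lines)
```

Joining the lines into a single string matches the documented `str` return type of `normalize_line_endings`.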

### Summary Processing

Functions for processing and formatting docstring summaries.

```python { .api }
def normalize_summary(summary: str, noncap: Optional[List[str]] = None) -> str:
    """
    Return normalized docstring summary.

    Normalizes summary by capitalizing first word (unless in noncap list)
    and adding period at end if missing.

    Args:
        summary (str): The summary string to normalize
        noncap (List[str], optional): Words not to capitalize when first

    Returns:
        str: Normalized summary with proper capitalization and punctuation
    """
```
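
The two rules described above (capitalize the first word unless it is listed in `noncap`, and add a period if one is missing) can be sketched as follows. `normalize_summary_sketch` is a hypothetical stand-in written for illustration. It assumes a period is only appended after a letter or digit, and it does not reproduce any further special cases the library may handle.

```python
from typing import List, Optional


def normalize_summary_sketch(summary: str, noncap: Optional[List[str]] = None) -> str:
    """Capitalize the first word and end the summary with a period."""
    noncap = noncap or []
    summary = summary.rstrip()
    if not summary:
        # Nothing to normalize in an empty summary.
        return summary
    # Capture the first word before any punctuation is appended.
    first_word = summary.split(None, 1)[0]
    # Assumption: only add a period after a letter or digit.
    if summary[-1].isalnum():
        summary += "."
    # Words listed by the caller keep their original casing.
    if first_word not in noncap:
        summary = summary[0].upper() + summary[1:]
    return summary
```

Only the first character is upper-cased, so the rest of the first word keeps its original casing.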

### Sentence Detection

Functions for detecting and working with sentence boundaries.

```python { .api }
def is_probably_beginning_of_sentence(line: str) -> Union[Match[str], None, bool]:
    """
    Determine if the line begins a sentence.

    Uses heuristics to detect parameter lists and sentence beginnings
    by looking for specific patterns and tokens.

    Args:
        line (str): The line to test

    Returns:
        Union[Match[str], None, bool]: A truthy value if the line probably
            begins a sentence, otherwise a falsy value (None or False)
    """
```
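
A heuristic of this kind can be expressed with two regular-expression checks. The sketch below is an assumption about what "specific patterns and tokens" might look like, not the library's code. `looks_like_sentence_start_sketch` is a hypothetical name, and the token set (`-`, `@`, `*`, `)`) is chosen for illustration.

```python
import re


def looks_like_sentence_start_sketch(line: str) -> bool:
    """Guess whether a line starts a new sentence or list item."""
    # A standalone "-", "@" or "*" surrounded by whitespace suggests a list item.
    if re.search(r"\s[-@*]\s", line):
        return True
    # Otherwise, look at the first visible character of the line.
    stripped = line.strip()
    return bool(re.match(r"[-@)]", stripped))
```

Unlike the documented signature, this sketch always returns a plain `bool`, which is easier to print and compare in examples.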

### Text Splitting

Functions for splitting text into components.

```python { .api }
def split_first_sentence(text):
    """
    Split text into first sentence and remainder.

    Handles common abbreviations and false sentence endings.
    Recognizes periods, question marks, exclamation marks, and
    colons at line endings as sentence boundaries.

    Args:
        text: Text to split

    Returns:
        tuple: (first_sentence, remaining_text)
    """

def split_summary_and_description(contents):
    """
    Split docstring into summary and description parts.

    Uses empty lines, sentence boundaries, and heuristics to
    determine where summary ends and description begins.

    Args:
        contents: Docstring content to split

    Returns:
        tuple: (summary, description)
    """
```
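
The sentence-boundary rules described for `split_first_sentence` can be sketched as a word-by-word scan. This is an illustrative approximation under stated assumptions: `split_first_sentence_sketch` and `ABBREVIATIONS` are hypothetical names, and the abbreviation list is an example rather than the library's actual list.

```python
import re
from typing import Tuple

# Abbreviations assumed not to end a sentence (illustrative list).
ABBREVIATIONS = ("e.g.", "i.e.", "Dr.", "Mr.", "Mrs.", "Ms.")


def split_first_sentence_sketch(text: str) -> Tuple[str, str]:
    """Split text into (first sentence, remainder), one word at a time."""
    sentence = ""
    rest = text
    pending = ""  # whitespace that separated the previous word from this one
    while rest:
        # Split off the next word and the single whitespace character after it.
        parts = re.split(r"(\s)", rest, maxsplit=1)
        word = parts[0]
        delimiter, rest = (parts[1], parts[2]) if len(parts) == 3 else ("", "")
        sentence += pending + word
        if sentence.endswith(ABBREVIATIONS):
            pass  # a period inside an abbreviation is not a boundary
        elif sentence.endswith((".", "?", "!")):
            return sentence, delimiter + rest
        elif sentence.endswith(":") and delimiter == "\n":
            # A colon at the end of a line often introduces a parameter list.
            return sentence, delimiter + rest
        pending = delimiter
    # No boundary found: the whole text is the "sentence".
    return sentence, pending
```

The remainder keeps its leading whitespace, so concatenating the two parts reproduces the original text.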

## Usage Examples

### Indentation Analysis

```python
from docformatter import find_shortest_indentation

# Analyze indentation in code block
lines = [
    "    def function():",
    "        '''Docstring.",
    "    ",
    "        Description here.",
    "        '''",
    "        pass"
]

shortest = find_shortest_indentation(lines)
print(f"Shortest indentation: '{shortest}'")  # '    ' (four spaces)
```

### Line Ending Normalization

```python
from docformatter import normalize_line, normalize_line_endings

# Normalize single line
line = "Text with mixed endings\r\n"
normalized = normalize_line(line, "\n")
print(repr(normalized))  # 'Text with mixed endings\n'

# Normalize multiple lines
text_lines = ["Line 1\r\n", "Line 2\r", "Line 3\n"]
normalized_text = normalize_line_endings(text_lines, "\n")
print(repr(normalized_text))  # 'Line 1\nLine 2\nLine 3\n'
```

### Summary Normalization

```python
from docformatter import normalize_summary

# Basic summary normalization
summary = "format docstrings according to pep 257"
normalized = normalize_summary(summary)
print(normalized)  # "Format docstrings according to pep 257."

# With non-capitalization list: the first word stays lowercase
summary = "docformatter formats docstrings"
normalized = normalize_summary(summary, noncap=["docformatter"])
print(normalized)  # "docformatter formats docstrings."

# Already properly formatted
summary = "Process the input data."
normalized = normalize_summary(summary)
print(normalized)  # "Process the input data." (unchanged)
```

### Text Splitting Operations

```python
from docformatter import split_first_sentence, split_summary_and_description

# Split first sentence
text = "This is the first sentence. This is the second sentence."
first, rest = split_first_sentence(text)
print(f"First: '{first}'")  # "This is the first sentence."
print(f"Rest: '{rest}'")    # " This is the second sentence."

# Handle abbreviations
text = "See e.g. the documentation. More info follows."
first, rest = split_first_sentence(text)
print(f"First: '{first}'")  # "See e.g. the documentation."
print(f"Rest: '{rest}'")    # " More info follows."

# Split summary and description
docstring = """Process input data.

This function processes the input data according to
the specified parameters and returns the results.

Args:
    data: Input data to process
"""

summary, description = split_summary_and_description(docstring)
print(f"Summary: '{summary}'")
print(f"Description: '{description}'")
```

### Complex Text Processing

```python
from docformatter import (
    find_shortest_indentation,
    normalize_summary,
    split_summary_and_description
)

def process_docstring(docstring_content):
    """Process a complete docstring."""
    # Split into parts
    summary, description = split_summary_and_description(docstring_content)

    # Normalize summary
    normalized_summary = normalize_summary(summary)

    # Analyze description indentation if present
    if description:
        desc_lines = description.splitlines()
        base_indent = find_shortest_indentation(desc_lines)
        print(f"Description base indentation: '{base_indent}'")

    return normalized_summary, description

# Example usage
docstring = """process the data

This function processes input data and returns
processed results.
"""

summary, desc = process_docstring(docstring)
print(f"Processed summary: '{summary}'")
```

### Sentence Boundary Detection

```python
from docformatter import is_probably_beginning_of_sentence

# Test various line types
test_lines = [
    " - Parameter: description",   # Bullet list
    " @param name: description",   # Epytext parameter
    " :param name: description",   # Sphinx parameter
    " Normal sentence text",       # Regular text
    " ) Closing parenthesis",      # Special case
]

for line in test_lines:
    is_beginning = is_probably_beginning_of_sentence(line)
    print(f"'{line.strip()}' -> {is_beginning}")
```

## Text Processing Patterns

### Docstring Content Analysis

```python
from docformatter import split_summary_and_description, normalize_summary

def analyze_docstring(content):
    """Analyze docstring structure and content."""
    summary, description = split_summary_and_description(content)
    has_description = bool(description.strip())

    print(f"Summary length: {len(summary)}")
    print(f"Has description: {has_description}")

    # Check if summary needs normalization
    normalized = normalize_summary(summary)
    needs_normalization = summary != normalized

    return {
        'summary': summary,
        'description': description,
        'normalized_summary': normalized,
        'needs_normalization': needs_normalization,
        'has_description': has_description
    }
```

### Indentation Preservation

```python
from docformatter import find_shortest_indentation

def preserve_relative_indentation(lines):
    """Preserve relative indentation while normalizing base level."""
    base_indent = find_shortest_indentation(lines)
    base_level = len(base_indent)

    processed_lines = []
    for line in lines:
        if line.strip():  # Non-empty line
            current_indent = len(line) - len(line.lstrip())
            relative_indent = current_indent - base_level
            # Re-base every line on a four-space indent
            new_line = "    " + " " * relative_indent + line.lstrip()
            processed_lines.append(new_line)
        else:
            processed_lines.append(line)

    return processed_lines
```

## Integration with Other Components

The string processing functions integrate closely with other docformatter components:

- **Syntax Analysis**: Provides text splitting for field list processing
- **Formatter**: Supplies normalization for docstring content
- **Encoder**: Works with line ending detection and normalization
- **Configuration**: Respects non-capitalization settings

## Error Handling

String processing functions handle various edge cases:

- **Empty Input**: Functions gracefully handle empty strings and lists
- **Mixed Line Endings**: Normalization functions handle CR, LF, and CRLF
- **Unicode Content**: All functions work with Unicode text
- **Malformed Input**: Robust handling of unexpected input patterns
- **Whitespace Variations**: Consistent handling of tabs, spaces, and mixed whitespace