Tessl Tile for pypi/docformatter@1.7.0

# String Processing

Text manipulation utilities for docstring processing, including indentation detection, line normalization, summary formatting, and text splitting. These operations form the foundation of docformatter's text processing capabilities.

## Capabilities

### Indentation Analysis

Functions for analyzing and working with text indentation patterns.

```python { .api }
def find_shortest_indentation(lines: List[str]) -> str:
    """
    Determine the shortest indentation in a list of lines.

    Args:
        lines (List[str]): List of text lines to analyze

    Returns:
        str: The shortest indentation string found in non-empty lines
    """
```
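
To make the described behavior concrete, here is a minimal, self-contained sketch of how a shortest-indentation scan could work. It is not docformatter's implementation; the name `shortest_indentation_sketch` is hypothetical, and the sketch assumes that blank lines are skipped and that an empty string is returned when no non-empty line exists.

```python
from typing import List, Optional


def shortest_indentation_sketch(lines: List[str]) -> str:
    """Return the shortest leading whitespace among non-blank lines."""
    shortest: Optional[str] = None
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no indentation information
        # Leading whitespace is everything before the first non-space character.
        indent = line[: len(line) - len(line.lstrip())]
        if shortest is None or len(indent) < len(shortest):
            shortest = indent
    # Assumption of this sketch: no usable lines means "no indentation".
    return shortest or ""


print(repr(shortest_indentation_sketch(["    a", "", "      b"])))  # '    '
```

Returning the indentation as a string rather than a width keeps tabs and spaces intact, which matters when the result is later prepended to reformatted lines.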

### Line Normalization

Utilities for normalizing line endings and line content.

```python { .api }
def normalize_line(line: str, newline: str) -> str:
    """
    Return line with fixed ending, if ending was present.

    Args:
        line (str): The line to normalize
        newline (str): The newline character to use

    Returns:
        str: Line with normalized ending
    """

def normalize_line_endings(lines, newline):
    """
    Return text with normalized line endings.

    Args:
        lines: Text lines to normalize
        newline: Newline character to use

    Returns:
        str: Text with consistent line endings
    """
```
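
The phrase "if ending was present" is the key detail: a line that has no line ending is left alone. The following sketch illustrates that rule with hypothetical helper names (`normalize_line_sketch`, `normalize_line_endings_sketch`); it is an illustration under that assumption, not the library's code.

```python
from typing import Iterable


def normalize_line_sketch(line: str, newline: str) -> str:
    """Swap an existing line ending for `newline`; leave unterminated lines alone."""
    stripped = line.rstrip("\r\n")
    # Only append the target newline if there was an ending to replace.
    return stripped + newline if stripped != line else line


def normalize_line_endings_sketch(lines: Iterable[str], newline: str) -> str:
    """Join the lines into one string with consistent endings."""
    return "".join(normalize_line_sketch(line, newline) for line in lines)


print(repr(normalize_line_endings_sketch(["a\r\n", "b\r", "c"], "\n")))  # 'a\nb\nc'
```

Because the final line `"c"` has no ending, none is added, so a file that did not end with a newline keeps that property.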

### Summary Processing

Functions for processing and formatting docstring summaries.

```python { .api }
def normalize_summary(summary: str, noncap: Optional[List[str]] = None) -> str:
    """
    Return normalized docstring summary.

    Normalizes summary by capitalizing first word (unless in noncap list)
    and adding period at end if missing.

    Args:
        summary (str): The summary string to normalize
        noncap (List[str], optional): Words not to capitalize when first

    Returns:
        str: Normalized summary with proper capitalization and punctuation
    """
```
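
The two rules in the docstring (capitalize the first word unless it is listed in `noncap`, then add a period if one is missing) can be sketched as follows. The function name `normalize_summary_sketch` is hypothetical, and the test for a "missing" period (the last character is alphanumeric) is an assumption of this sketch rather than a statement about docformatter's exact rule.

```python
from typing import List, Optional


def normalize_summary_sketch(summary: str, noncap: Optional[List[str]] = None) -> str:
    """Capitalize the first word (unless excluded) and end with a period."""
    noncap = noncap or []
    summary = summary.rstrip()
    if not summary:
        return summary  # nothing to normalize

    first_word = summary.split(" ", 1)[0]
    if first_word not in noncap:
        summary = summary[0].upper() + summary[1:]

    # Assumed rule: only append a period after a letter or digit.
    if summary[-1].isalnum():
        summary += "."
    return summary


print(normalize_summary_sketch("format docstrings"))  # Format docstrings.
print(normalize_summary_sketch("docformatter formats docstrings", noncap=["docformatter"]))
```

The second call shows the purpose of `noncap`: a lowercase project or tool name at the start of a summary stays lowercase while the period is still added.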

### Sentence Detection

Functions for detecting and working with sentence boundaries.

```python { .api }
def is_probably_beginning_of_sentence(line: str) -> Union[Match[str], None, bool]:
    """
    Determine if the line begins a sentence.

    Uses heuristics to detect parameter lists and sentence beginnings
    by looking for specific patterns and tokens.

    Args:
        line (str): The line to test

    Returns:
        Union[Match[str], None, bool]: A truthy value (True or a match
        object) if the line probably begins a sentence, otherwise a
        falsy value (None or False)
    """
```
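
As a rough illustration of this kind of heuristic, the sketch below flags lines that start with a list or parameter marker. The marker set (`-`, `*`, `@`) and the name `looks_like_new_item` are assumptions made for this example; the tokens docformatter actually checks may differ.

```python
import re

# Hypothetical marker set for this sketch: bullet ("-", "*") and
# Epytext-style ("@") prefixes at the start of a line.
_MARKER = re.compile(r"^\s*[-*@]")


def looks_like_new_item(line: str) -> bool:
    """Return True if the line starts with one of the assumed markers."""
    return _MARKER.match(line) is not None


for sample in ["    - item one", "    @param x: value", "    plain text"]:
    print(f"{sample.strip()!r} -> {looks_like_new_item(sample)}")
```

Note that this sketch always returns a `bool`, whereas the documented function may return a match object or `None`, so callers of the real function should rely on truthiness rather than comparing against `True`.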

### Text Splitting

Functions for splitting text into components.

```python { .api }
def split_first_sentence(text):
    """
    Split text into first sentence and remainder.

    Handles common abbreviations and false sentence endings.
    Recognizes periods, question marks, exclamation marks, and
    colons at line endings as sentence boundaries.

    Args:
        text: Text to split

    Returns:
        tuple: (first_sentence, remaining_text)
    """

def split_summary_and_description(contents):
    """
    Split docstring into summary and description parts.

    Uses empty lines, sentence boundaries, and heuristics to
    determine where summary ends and description begins.

    Args:
        contents: Docstring content to split

    Returns:
        tuple: (summary, description)
    """
```
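
The boundary rules listed for `split_first_sentence` can be sketched as a word-by-word scan. This is a simplified illustration, not the library's source: the function name `split_first_sentence_sketch` is hypothetical, and the abbreviation list is an assumption chosen for the example.

```python
import re
from typing import Tuple

# Abbreviations treated as false sentence endings (assumed list).
ABBREVIATIONS = ("e.g.", "i.e.", "Dr.", "Mr.", "Mrs.", "Ms.")


def split_first_sentence_sketch(text: str) -> Tuple[str, str]:
    """Return (first_sentence, remainder), keeping the separating whitespace."""
    sentence, rest, previous_delimiter = "", text, ""
    while rest:
        # Peel off one word plus the single whitespace character after it.
        parts = re.split(r"(\s)", rest, maxsplit=1)
        word = parts[0]
        delimiter, rest = (parts[1], parts[2]) if len(parts) == 3 else ("", "")
        sentence += previous_delimiter + word

        if sentence.endswith(ABBREVIATIONS):
            pass  # false ending, keep scanning
        elif sentence.endswith((".", "?", "!")):
            return sentence, delimiter + rest
        elif sentence.endswith(":") and delimiter == "\n":
            return sentence, delimiter + rest  # colon that closes a line
        previous_delimiter = delimiter
    return sentence, ""


print(split_first_sentence_sketch("See e.g. the docs. More info."))
# ('See e.g. the docs.', ' More info.')
```

The remainder keeps its leading whitespace, so concatenating the two parts reproduces the original text whenever a boundary is found.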

## Usage Examples

### Indentation Analysis

```python
from docformatter import find_shortest_indentation

# Analyze indentation in code block
lines = [
    "    def function():",
    "        '''Docstring.",
    "        ",
    "        Description here.",
    "        '''",
    "        pass"
]

shortest = find_shortest_indentation(lines)
print(f"Shortest indentation: '{shortest}'")  # "    "
```

### Line Ending Normalization

```python
from docformatter import normalize_line, normalize_line_endings

# Normalize a single line that ends with CRLF
line = "Text with mixed endings\r\n"
normalized = normalize_line(line, "\n")
print(repr(normalized))  # 'Text with mixed endings\n'

# Normalize multiple lines with CRLF, CR, and LF endings
text_lines = ["Line 1\r\n", "Line 2\r", "Line 3\n"]
normalized_text = normalize_line_endings(text_lines, "\n")
print(repr(normalized_text))  # 'Line 1\nLine 2\nLine 3\n'
```

### Summary Normalization

```python
from docformatter import normalize_summary

# Basic summary normalization
summary = "format docstrings according to pep 257"
normalized = normalize_summary(summary)
print(normalized)  # "Format docstrings according to pep 257."

# With non-capitalization list (the first word is kept exactly as written)
summary = "API documentation generator"
normalized = normalize_summary(summary, noncap=["API"])
print(normalized)  # "API documentation generator."

# Already properly formatted
summary = "Process the input data."
normalized = normalize_summary(summary)
print(normalized)  # "Process the input data." (unchanged)
```

### Text Splitting Operations

```python
from docformatter import split_first_sentence, split_summary_and_description

# Split first sentence
text = "This is the first sentence. This is the second sentence."
first, rest = split_first_sentence(text)
print(f"First: '{first}'")   # "This is the first sentence."
print(f"Rest: '{rest}'")     # " This is the second sentence."

# Handle abbreviations
text = "See e.g. the documentation. More info follows."
first, rest = split_first_sentence(text)
print(f"First: '{first}'")   # "See e.g. the documentation."
print(f"Rest: '{rest}'")     # " More info follows."

# Split summary and description
docstring = """Process input data.

This function processes the input data according to
the specified parameters and returns the results.

Args:
    data: Input data to process
"""

summary, description = split_summary_and_description(docstring)
print(f"Summary: '{summary}'")
print(f"Description: '{description}'")
```

### Complex Text Processing

```python
from docformatter import (
    find_shortest_indentation,
    normalize_summary,
    split_summary_and_description
)

def process_docstring(docstring_content):
    """Process a complete docstring."""
    # Split into parts
    summary, description = split_summary_and_description(docstring_content)

    # Normalize summary
    normalized_summary = normalize_summary(summary)

    # Analyze description indentation if present
    if description:
        desc_lines = description.splitlines()
        base_indent = find_shortest_indentation(desc_lines)
        print(f"Description base indentation: '{base_indent}'")

    return normalized_summary, description

# Example usage
docstring = """process the data

    This function processes input data and returns
    processed results.
    """

summary, desc = process_docstring(docstring)
print(f"Processed summary: '{summary}'")
```

### Sentence Boundary Detection

```python
from docformatter import is_probably_beginning_of_sentence

# Test various line types
test_lines = [
    "    - Parameter: description",  # Bullet list
    "    @param name: description",  # Epytext parameter
    "    :param name: description",  # Sphinx parameter
    "    Normal sentence text",      # Regular text
    "    ) Closing parenthesis",     # Special case
]

for line in test_lines:
    # The result may be a match object, None, or a bool, so coerce it
    is_beginning = bool(is_probably_beginning_of_sentence(line))
    print(f"'{line.strip()}' -> {is_beginning}")
```

## Text Processing Patterns

### Docstring Content Analysis

```python
from docformatter import split_summary_and_description, normalize_summary

def analyze_docstring(content):
    """Analyze docstring structure and content."""
    summary, description = split_summary_and_description(content)
    has_description = bool(description.strip())

    print(f"Summary length: {len(summary)}")
    print(f"Has description: {has_description}")

    # Check if summary needs normalization
    normalized = normalize_summary(summary)
    needs_normalization = summary != normalized

    return {
        'summary': summary,
        'description': description,
        'normalized_summary': normalized,
        'needs_normalization': needs_normalization,
        'has_description': has_description
    }
```

### Indentation Preservation

```python
from docformatter import find_shortest_indentation

def preserve_relative_indentation(lines):
    """Preserve relative indentation while normalizing base level."""
    base_indent = find_shortest_indentation(lines)
    base_level = len(base_indent)

    processed_lines = []
    for line in lines:
        if line.strip():  # Non-empty line
            current_indent = len(line) - len(line.lstrip())
            relative_indent = current_indent - base_level
            new_line = "    " + " " * relative_indent + line.lstrip()
            processed_lines.append(new_line)
        else:
            processed_lines.append(line)

    return processed_lines
```

## Integration with Other Components

The string processing functions integrate closely with other docformatter components:

- **Syntax Analysis**: Provides text splitting for field list processing
- **Formatter**: Supplies normalization for docstring content
- **Encoder**: Works with line ending detection and normalization
- **Configuration**: Respects non-capitalization settings

## Error Handling

String processing functions handle various edge cases:

- **Empty Input**: Functions gracefully handle empty strings and lists
- **Mixed Line Endings**: Normalization functions handle CR, LF, and CRLF
- **Unicode Content**: All functions work with Unicode text
- **Malformed Input**: Robust handling of unexpected input patterns
- **Whitespace Variations**: Consistent handling of tabs, spaces, and mixed whitespace
