# File I/O and Encoding

File encoding detection, line ending handling, and file opening utilities for robust text file processing across different encodings and platforms. This module ensures docformatter can handle files with various encodings and line ending conventions.

## Capabilities

### Encoder Class

The main class for handling file encoding detection and file I/O operations.

```python { .api }
class Encoder:
    """Encoding and decoding of files."""

    # Line ending constants
    CR = "\r"      # Carriage return (Mac classic)
    LF = "\n"      # Line feed (Unix/Linux)
    CRLF = "\r\n"  # Carriage return + line feed (Windows)

    # Default encoding
    DEFAULT_ENCODING = "latin-1"

    def __init__(self):
        """
        Initialize an Encoder instance.

        Sets up encoding detection with the default fallback encoding
        and system encoding detection.
        """

    # Instance attributes after initialization
    encoding: str         # Current detected/set file encoding
    system_encoding: str  # System preferred encoding
```
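
For orientation, the constants and attributes documented above can be inspected directly; the import path matches the usage examples later on this page:

```python
from docformatter import Encoder

encoder = Encoder()

# Class-level constants are plain strings.
print(repr(Encoder.LF), repr(Encoder.CRLF), repr(Encoder.CR))  # '\n' '\r\n' '\r'
print(Encoder.DEFAULT_ENCODING)                                # latin-1

# Instance attributes described above.
print(encoder.encoding, encoder.system_encoding)
```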

### Encoding Detection

Methods for detecting and working with file encodings.

```python { .api }
def do_detect_encoding(self, filename) -> None:
    """
    Detect and set the encoding for a file.

    Uses the charset_normalizer library to detect the file encoding with
    high accuracy. Falls back to DEFAULT_ENCODING if detection fails.

    Args:
        filename (str): Path to file for encoding detection

    Side Effects:
        Sets self.encoding to the detected encoding
    """
```
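
Since the docstring above names charset_normalizer as the detection backend, a standalone sketch of that kind of detection may help; this is an illustrative, hypothetical helper with a latin-1 fallback, not the method's actual implementation:

```python
from charset_normalizer import from_path

def detect_encoding(filename, fallback="latin-1"):
    """Return the highest-confidence encoding guess for filename, or fallback."""
    try:
        best = from_path(filename).best()  # best CharsetMatch, or None
    except OSError:
        return fallback
    return best.encoding if best is not None else fallback

print(detect_encoding("example.py"))
```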

### Line Ending Detection

Methods for detecting and normalizing line endings.

```python { .api }
def do_find_newline(self, source: List[str]) -> str:
    """
    Determine the predominant newline style in source lines.

    Analyzes line endings to determine whether the file uses Unix (LF),
    Windows (CRLF), or Mac classic (CR) line endings.

    Args:
        source (List[str]): List of source code lines

    Returns:
        str: Predominant newline character(s) (LF, CRLF, or CR)
    """
```
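
To make "predominant newline" concrete, the idea can be sketched as a simple majority vote over line endings; this is an illustration of the approach, not necessarily how `do_find_newline()` is implemented:

```python
import collections

CR, LF, CRLF = "\r", "\n", "\r\n"

def find_newline(source):
    """Return the most common line ending in source, defaulting to LF."""
    counts = collections.Counter()
    for line in source:
        if line.endswith(CRLF):
            counts[CRLF] += 1
        elif line.endswith(CR):
            counts[CR] += 1
        elif line.endswith(LF):
            counts[LF] += 1
    return counts.most_common(1)[0][0] if counts else LF

print(repr(find_newline(["a\r\n", "b\r\n", "c\n"])))  # '\r\n'
```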

### File Opening

Methods for opening files with proper encoding handling.

```python { .api }
def do_open_with_encoding(self, filename, mode: str = "r"):
    """
    Open a file with the detected encoding.

    Opens the file using the encoding detected by do_detect_encoding().
    Handles encoding errors gracefully.

    Args:
        filename (str): Path to file to open
        mode (str): File opening mode (default: "r")

    Returns:
        File object opened with the proper encoding

    Raises:
        IOError: If the file cannot be opened
        UnicodeDecodeError: If the encoding is incorrect
    """
```
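
Conceptually, opening a file with a previously detected encoding resembles the standard-library call below. This is a minimal sketch of the general pattern, not necessarily the method's internals; `newline=""` keeps the original line endings intact so they can be detected and preserved later:

```python
import io

def open_with_encoding(filename, encoding, mode="r"):
    """Open filename with the given encoding, without translating newlines."""
    return io.open(filename, mode=mode, encoding=encoding, newline="")

with open_with_encoding("example.py", "utf-8") as handle:
    first_line = handle.readline()
    print(repr(first_line))
```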

### Utility Functions

File discovery and processing utilities.

```python { .api }
def find_py_files(sources, recursive, exclude=None):
    """
    Find Python source files in the given sources.

    Generator function that yields Python files (.py extension)
    from the specified sources, with support for recursive directory
    traversal and exclusion patterns.

    Args:
        sources: Iterable of file/directory paths
        recursive (bool): Whether to search directories recursively
        exclude (list, optional): Patterns to exclude from the search

    Yields:
        str: Path to each Python file found
    """

def has_correct_length(length_range, start, end):
    """
    Check whether a docstring is within the specified length range.

    Used with the --docstring-length option to filter docstrings
    by their line count.

    Args:
        length_range (list): [min_length, max_length] or None
        start (int): Starting line number of the docstring
        end (int): Ending line number of the docstring

    Returns:
        bool: True if within range or no range specified
    """

def is_in_range(line_range, start, end):
    """
    Check whether a docstring is within the specified line range.

    Used with the --range option to process only docstrings
    within specific line numbers.

    Args:
        line_range (list): [start_line, end_line] or None
        start (int): Starting line number of the docstring
        end (int): Ending line number of the docstring

    Returns:
        bool: True if in range or no range specified
    """
```
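
To make the generator behaviour concrete, here is one hypothetical way such discovery could be written with `os.walk`; the real `find_py_files()` may match exclusion patterns differently:

```python
import os

def iter_py_files(sources, recursive, exclude=None):
    """Yield .py paths from sources, skipping any path containing an exclude pattern."""
    exclude = exclude or []

    def is_excluded(path):
        return any(pattern in path for pattern in exclude)

    for source in sources:
        if os.path.isfile(source):
            if source.endswith(".py") and not is_excluded(source):
                yield source
        elif os.path.isdir(source):
            for root, _dirs, files in os.walk(source):
                for name in files:
                    path = os.path.join(root, name)
                    if name.endswith(".py") and not is_excluded(path):
                        yield path
                if not recursive:
                    break

for path in iter_py_files(["."], recursive=True, exclude=["__pycache__"]):
    print(path)
```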

## Usage Examples

### Basic Encoding Detection

```python
from docformatter import Encoder

# Create encoder instance
encoder = Encoder()

# Detect encoding for a file
encoder.do_detect_encoding("example.py")
print(f"Detected encoding: {encoder.encoding}")
print(f"System encoding: {encoder.system_encoding}")

# Open file with detected encoding
with encoder.do_open_with_encoding("example.py") as f:
    content = f.read()
    print(f"File content length: {len(content)}")
```

### Line Ending Detection

```python
from docformatter import Encoder

# Read file and detect line endings
encoder = Encoder()
encoder.do_detect_encoding("mixed_endings.py")

with encoder.do_open_with_encoding("mixed_endings.py") as f:
    lines = f.readlines()

# Detect predominant line ending
newline_style = encoder.do_find_newline(lines)
print(f"Detected line ending: {repr(newline_style)}")

if newline_style == encoder.LF:
    print("Unix/Linux line endings")
elif newline_style == encoder.CRLF:
    print("Windows line endings")
elif newline_style == encoder.CR:
    print("Mac classic line endings")
```

### File Processing with Encoding

```python
from docformatter import Encoder

def process_python_file(filename):
    """Process a Python file with proper encoding handling."""
    encoder = Encoder()

    # Detect encoding
    try:
        encoder.do_detect_encoding(filename)
        print(f"Processing {filename} with encoding: {encoder.encoding}")

        # Read file content
        with encoder.do_open_with_encoding(filename) as f:
            lines = f.readlines()

        # Detect line endings
        newline_style = encoder.do_find_newline(lines)

        # Process content (example: count docstring markers)
        content = ''.join(lines)
        docstring_count = content.count('"""') + content.count("'''")

        return {
            'filename': filename,
            'encoding': encoder.encoding,
            'line_ending': newline_style,
            'line_count': len(lines),
            'docstring_markers': docstring_count,
        }

    except Exception as e:
        print(f"Error processing {filename}: {e}")
        return None

# Example usage
result = process_python_file("example.py")
if result:
    print(f"File info: {result}")
```

### Finding Python Files

```python
from docformatter import find_py_files

# Find all .py files in the current directory
files = list(find_py_files(["."], recursive=False))
print(f"Found {len(files)} Python files")

# Find files recursively, excluding test directories
files = list(find_py_files(
    ["."],
    recursive=True,
    exclude=["tests", "__pycache__", ".git"],
))
print(f"Found {len(files)} Python files (excluding tests)")

# Process multiple source locations
sources = ["src/", "scripts/", "tools/"]
for filename in find_py_files(sources, recursive=True):
    print(f"Processing: {filename}")
```

### Range and Length Filtering

```python
from docformatter import has_correct_length, is_in_range

# Check docstring length filtering
length_range = [5, 20]  # Only process docstrings 5-20 lines long
start_line = 10
end_line = 15

if has_correct_length(length_range, start_line, end_line):
    print("Docstring is within length range")

# Check line range filtering
line_range = [1, 100]  # Only process docstrings in lines 1-100
if is_in_range(line_range, start_line, end_line):
    print("Docstring is within line range")

# Example usage in file processing
def should_process_docstring(start, end, length_filter=None, line_filter=None):
    """Determine if docstring should be processed based on filters."""
    if length_filter and not has_correct_length(length_filter, start, end):
        return False
    if line_filter and not is_in_range(line_filter, start, end):
        return False
    return True

# Test with various docstrings
docstrings = [
    (5, 8),    # Lines 5-8 (4 lines)
    (10, 25),  # Lines 10-25 (16 lines)
    (50, 75),  # Lines 50-75 (26 lines)
]

for start, end in docstrings:
    should_process = should_process_docstring(
        start, end,
        length_filter=[3, 20],  # 3-20 lines
        line_filter=[1, 30],    # Lines 1-30
    )
    print(f"Docstring lines {start}-{end}: {'Process' if should_process else 'Skip'}")
```

### Advanced File Processing

```python
from docformatter import Encoder, find_py_files

class FileProcessor:
    def __init__(self):
        self.encoder = Encoder()
        self.processed_files = []

    def process_directory(self, directory, recursive=True, exclude=None):
        """Process all Python files in directory."""
        files = find_py_files([directory], recursive, exclude)

        for filename in files:
            try:
                result = self.process_file(filename)
                if result:
                    self.processed_files.append(result)
            except Exception as e:
                print(f"Error processing {filename}: {e}")

        return self.processed_files

    def process_file(self, filename):
        """Process individual file with encoding detection."""
        # Detect encoding
        self.encoder.do_detect_encoding(filename)

        # Read file
        with self.encoder.do_open_with_encoding(filename) as f:
            lines = f.readlines()

        # Analyze file
        newline_style = self.encoder.do_find_newline(lines)

        return {
            'filename': filename,
            'encoding': self.encoder.encoding,
            'line_ending': repr(newline_style),
            'lines': len(lines),
            'size': sum(len(line.encode(self.encoder.encoding)) for line in lines),
        }

# Usage
processor = FileProcessor()
results = processor.process_directory(
    "src/",
    recursive=True,
    exclude=["__pycache__", "*.pyc", "tests/"],
)

# Print summary
for result in results:
    print(f"{result['filename']}: {result['encoding']} encoding, "
          f"{result['lines']} lines, {result['size']} bytes")
```

### Error Handling

```python
from docformatter import Encoder

def safe_file_processing(filename):
    """Process file with comprehensive error handling."""
    encoder = Encoder()

    try:
        # Try to detect encoding
        encoder.do_detect_encoding(filename)
        print(f"Detected encoding: {encoder.encoding}")

    except FileNotFoundError:
        print(f"File not found: {filename}")
        return None

    except PermissionError:
        print(f"Permission denied: {filename}")
        return None

    except Exception as e:
        print(f"Encoding detection failed: {e}")
        print(f"Using fallback encoding: {encoder.DEFAULT_ENCODING}")
        encoder.encoding = encoder.DEFAULT_ENCODING

    try:
        # Try to open and read file
        with encoder.do_open_with_encoding(filename) as f:
            content = f.read()

        return {
            'success': True,
            'encoding': encoder.encoding,
            'content_length': len(content),
        }

    except UnicodeDecodeError as e:
        print(f"Unicode decode error: {e}")
        print("File may have mixed encodings or be binary")
        return None

    except Exception as e:
        print(f"File reading error: {e}")
        return None

# Test with various files
test_files = ["example.py", "unicode_file.py", "binary_file.so", "missing.py"]

for filename in test_files:
    result = safe_file_processing(filename)
    if result:
        print(f"Successfully processed {filename}")
    else:
        print(f"Failed to process {filename}")
```

## Integration with Docformatter

The file I/O and encoding module integrates with other components:

- **Formatter**: Provides encoding-aware file reading and writing
- **Configuration**: Supports file discovery with exclusion patterns
- **String Processing**: Ensures proper handling of Unicode content
- **Command-Line Interface**: Enables robust batch file processing

## Platform Considerations

The module handles platform-specific differences:

- **Line Endings**: Detects and preserves the original line ending style (see the sketch after this list)
- **Encodings**: Handles Windows-1252, UTF-8, Latin-1, and other encodings
- **File Paths**: Works with both Unix and Windows path conventions
- **Permissions**: Graceful handling of permission-denied errors
- **Unicode**: Full support for international characters and symbols
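
For example, preserving a detected line-ending style when writing a file back out can be done with the built-in `open()` by passing that ending as the `newline` argument; a minimal sketch, assuming `detected_newline` came from `do_find_newline()`:

```python
def write_with_newline(filename, lines, encoding, detected_newline):
    """Write lines so the file keeps the detected line-ending style."""
    with open(filename, "w", encoding=encoding, newline=detected_newline) as handle:
        handle.writelines(lines)  # each "\n" written is translated to detected_newline

write_with_newline("output.py", ["def main():\n", "    pass\n"], "utf-8", "\r\n")
```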

## Performance Considerations

- **Encoding Detection**: Uses fast heuristic-based detection
- **File Reading**: Efficient line-by-line processing for large files
- **Memory Usage**: Streams large files rather than loading them entirely into memory
- **Caching**: Reuses encoding detection results within the same session