# File I/O and Encoding

This module provides file encoding detection, line-ending handling, and file-opening utilities, so that docformatter can process text files reliably across different encodings, line-ending conventions, and platforms.

## Capabilities

### Encoder Class

The main class for handling file encoding detection and file I/O operations.

```python { .api }
class Encoder:
    """Encoding and decoding of files."""

    # Line ending constants
    CR = "\r"      # Carriage return (Mac classic)
    LF = "\n"      # Line feed (Unix/Linux)
    CRLF = "\r\n"  # Carriage return + Line feed (Windows)

    # Default encoding
    DEFAULT_ENCODING = "latin-1"

    def __init__(self):
        """
        Initialize an Encoder instance.

        Sets up encoding detection with default fallback encoding
        and system encoding detection.
        """

    # Instance attributes after initialization
    encoding: str         # Current detected/set file encoding
    system_encoding: str  # System preferred encoding
```

### Encoding Detection

Methods for detecting and working with file encodings.

```python { .api }
def do_detect_encoding(self, filename) -> None:
    """
    Detect and set the encoding for a file.

    Uses charset_normalizer library to detect file encoding with high
    accuracy. Falls back to DEFAULT_ENCODING if detection fails.

    Args:
        filename (str): Path to file for encoding detection

    Side Effects:
        Sets self.encoding to detected encoding
    """
```
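
The detect-then-fall-back behaviour described above can be illustrated with the standard library alone. This is a minimal sketch and not docformatter's implementation: a simple trial decode stands in for the statistical detection done by charset_normalizer, and `guess_encoding` and `_write_temp` are hypothetical helper names. Only the `latin-1` fallback mirrors `DEFAULT_ENCODING`.

```python
import os
import tempfile

DEFAULT_ENCODING = "latin-1"  # mirrors Encoder.DEFAULT_ENCODING


def guess_encoding(filename, candidates=("utf-8",)):
    """Return the first candidate that decodes the file, else the fallback.

    Hypothetical helper: a trial decode stands in for charset_normalizer.
    """
    with open(filename, "rb") as handle:
        raw = handle.read()
    for candidate in candidates:
        try:
            raw.decode(candidate)
        except UnicodeDecodeError:
            continue
        return candidate
    # latin-1 maps every byte to a code point, so decoding with it cannot fail.
    return DEFAULT_ENCODING


def _write_temp(data):
    """Write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


utf8_path = _write_temp("x = 'café'\n".encode("utf-8"))
latin_path = _write_temp("x = 'café'\n".encode("latin-1"))

utf8_guess = guess_encoding(utf8_path)
latin_guess = guess_encoding(latin_path)
print(utf8_guess, latin_guess)

os.remove(utf8_path)
os.remove(latin_path)
```

A single-byte fallback such as `latin-1` is a reasonable last resort because it never raises on decode, although the resulting text may be wrong for files that were really written in another encoding.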

### Line Ending Detection

Methods for detecting and normalizing line endings.

```python { .api }
def do_find_newline(self, source: List[str]) -> str:
    """
    Determine the predominant newline style in source lines.

    Analyzes line endings to determine whether file uses Unix (LF),
    Windows (CRLF), or Mac classic (CR) line endings.

    Args:
        source (List[str]): List of source code lines

    Returns:
        str: Predominant newline character(s) (LF, CRLF, or CR)
    """
```
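
One way to picture "predominant newline style" is a simple tally of line endings. The sketch below is an illustration under assumptions, not the library's code: `predominant_newline` is a hypothetical standalone function, and returning LF when no line carries an ending is an assumed default.

```python
from collections import Counter
from typing import List

CR, LF, CRLF = "\r", "\n", "\r\n"


def predominant_newline(lines: List[str]) -> str:
    """Return the most common line ending found in ``lines``."""
    counts = Counter()
    for line in lines:
        # Test CRLF first, because a CRLF line also ends with LF.
        if line.endswith(CRLF):
            counts[CRLF] += 1
        elif line.endswith(CR):
            counts[CR] += 1
        elif line.endswith(LF):
            counts[LF] += 1
    if not counts:
        return LF  # assumed default when no line carries an ending
    return counts.most_common(1)[0][0]


sample = ["a = 1\r\n", "b = 2\r\n", "c = 3\n"]
style = predominant_newline(sample)
print(repr(style))
```

The lines passed in must still carry their original endings, which means the file has to be read without newline translation (for example with `newline=""` in `open`).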

### File Opening

Methods for opening files with proper encoding handling.

```python { .api }
def do_open_with_encoding(self, filename, mode: str = "r"):
    """
    Open file with detected encoding.

    Opens file using the encoding detected by do_detect_encoding().
    Handles encoding errors gracefully.

    Args:
        filename (str): Path to file to open
        mode (str): File opening mode (default: "r")

    Returns:
        File object opened with proper encoding

    Raises:
        IOError: If file cannot be opened
        UnicodeDecodeError: If encoding is incorrect
    """
```

### Utility Functions

File discovery and processing utilities.

```python { .api }
def find_py_files(sources, recursive, exclude=None):
    """
    Find Python source files in given sources.

    Generator function that yields Python files (.py extension)
    from the specified sources, with support for recursive directory
    traversal and exclusion patterns.

    Args:
        sources: Iterable of file/directory paths
        recursive (bool): Whether to search directories recursively
        exclude (list, optional): Patterns to exclude from search

    Yields:
        str: Path to each Python file found
    """

def has_correct_length(length_range, start, end):
    """
    Check if docstring is within specified length range.

    Used with --docstring-length option to filter docstrings
    by their line count.

    Args:
        length_range (list): [min_length, max_length] or None
        start (int): Starting line number of docstring
        end (int): Ending line number of docstring

    Returns:
        bool: True if within range or no range specified
    """

def is_in_range(line_range, start, end):
    """
    Check if docstring is within specified line range.

    Used with --range option to process only docstrings
    within specific line numbers.

    Args:
        line_range (list): [start_line, end_line] or None
        start (int): Starting line number of docstring
        end (int): Ending line number of docstring

    Returns:
        bool: True if in range or no range specified
    """
```
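
The two filters can be read as small predicates over inclusive line numbers. The following sketch shows one plausible reading of the docstrings above; `length_ok` and `overlaps_range` are hypothetical names, and treating a docstring as "in range" when any of its lines falls inside the range is an assumption rather than documented behaviour.

```python
from typing import Optional, Sequence


def length_ok(length_range: Optional[Sequence[int]], start: int, end: int) -> bool:
    """True if the inclusive line count lies in [min, max], or no range is set."""
    if length_range is None:
        return True
    min_length, max_length = length_range
    return min_length <= end - start + 1 <= max_length


def overlaps_range(line_range: Optional[Sequence[int]], start: int, end: int) -> bool:
    """True if any line in start..end lies inside line_range, or no range is set."""
    if line_range is None:
        return True
    first, last = line_range
    return start <= last and end >= first


# A docstring spanning lines 10-15 is 6 lines long.
print(length_ok([5, 20], 10, 15))
print(overlaps_range([1, 100], 10, 15))
```

Counting lines inclusively means a docstring on lines 5 to 8 is 4 lines long, so it would fail a `[5, 20]` length filter.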

## Usage Examples

### Basic Encoding Detection

```python
from docformatter import Encoder

# Create encoder instance
encoder = Encoder()

# Detect encoding for a file
encoder.do_detect_encoding("example.py")
print(f"Detected encoding: {encoder.encoding}")
print(f"System encoding: {encoder.system_encoding}")

# Open file with detected encoding
with encoder.do_open_with_encoding("example.py") as f:
    content = f.read()
    print(f"File content length: {len(content)}")
```

### Line Ending Detection

```python
from docformatter import Encoder

# Read file and detect line endings
encoder = Encoder()
encoder.do_detect_encoding("mixed_endings.py")

with encoder.do_open_with_encoding("mixed_endings.py") as f:
    lines = f.readlines()

# Detect predominant line ending
newline_style = encoder.do_find_newline(lines)
print(f"Detected line ending: {repr(newline_style)}")

if newline_style == encoder.LF:
    print("Unix/Linux line endings")
elif newline_style == encoder.CRLF:
    print("Windows line endings")
elif newline_style == encoder.CR:
    print("Mac classic line endings")
```

### File Processing with Encoding

```python
from docformatter import Encoder

def process_python_file(filename):
    """Process a Python file with proper encoding handling."""
    encoder = Encoder()

    # Detect encoding
    try:
        encoder.do_detect_encoding(filename)
        print(f"Processing {filename} with encoding: {encoder.encoding}")

        # Read file content
        with encoder.do_open_with_encoding(filename) as f:
            lines = f.readlines()

        # Detect line endings
        newline_style = encoder.do_find_newline(lines)

        # Process content (example: count docstrings)
        content = ''.join(lines)
        docstring_count = content.count('"""') + content.count("'''")

        return {
            'filename': filename,
            'encoding': encoder.encoding,
            'line_ending': newline_style,
            'line_count': len(lines),
            'docstring_markers': docstring_count
        }

    except Exception as e:
        print(f"Error processing {filename}: {e}")
        return None

# Example usage
result = process_python_file("example.py")
if result:
    print(f"File info: {result}")
```

### Finding Python Files

```python
from docformatter import find_py_files

# Find all .py files in current directory
files = list(find_py_files(["."], recursive=False))
print(f"Found {len(files)} Python files")

# Find files recursively, excluding test directories
files = list(find_py_files(
    ["."],
    recursive=True,
    exclude=["tests", "__pycache__", ".git"]
))
print(f"Found {len(files)} Python files (excluding tests)")

# Process multiple source locations
sources = ["src/", "scripts/", "tools/"]
for filename in find_py_files(sources, recursive=True):
    print(f"Processing: {filename}")
```

### Range and Length Filtering

```python
from docformatter import has_correct_length, is_in_range

# Check docstring length filtering
length_range = [5, 20]  # Only process docstrings 5-20 lines long
start_line = 10
end_line = 15

if has_correct_length(length_range, start_line, end_line):
    print("Docstring is within length range")

# Check line range filtering
line_range = [1, 100]  # Only process docstrings in lines 1-100
if is_in_range(line_range, start_line, end_line):
    print("Docstring is within line range")

# Example usage in file processing
def should_process_docstring(start, end, length_filter=None, line_filter=None):
    """Determine if docstring should be processed based on filters."""
    if length_filter and not has_correct_length(length_filter, start, end):
        return False
    if line_filter and not is_in_range(line_filter, start, end):
        return False
    return True

# Test with various docstrings
docstrings = [
    (5, 8),    # Lines 5-8 (4 lines)
    (10, 25),  # Lines 10-25 (16 lines)
    (50, 75),  # Lines 50-75 (26 lines)
]

for start, end in docstrings:
    should_process = should_process_docstring(
        start, end,
        length_filter=[3, 20],  # 3-20 lines
        line_filter=[1, 30]     # Lines 1-30
    )
    print(f"Docstring lines {start}-{end}: {'Process' if should_process else 'Skip'}")
```

### Advanced File Processing

```python
from docformatter import Encoder, find_py_files

class FileProcessor:
    def __init__(self):
        self.encoder = Encoder()
        self.processed_files = []

    def process_directory(self, directory, recursive=True, exclude=None):
        """Process all Python files in directory."""
        files = find_py_files([directory], recursive, exclude)

        for filename in files:
            try:
                result = self.process_file(filename)
                if result:
                    self.processed_files.append(result)
            except Exception as e:
                print(f"Error processing {filename}: {e}")

        return self.processed_files

    def process_file(self, filename):
        """Process individual file with encoding detection."""
        # Detect encoding
        self.encoder.do_detect_encoding(filename)

        # Read file
        with self.encoder.do_open_with_encoding(filename) as f:
            lines = f.readlines()

        # Analyze file
        newline_style = self.encoder.do_find_newline(lines)

        return {
            'filename': filename,
            'encoding': self.encoder.encoding,
            'line_ending': repr(newline_style),
            'lines': len(lines),
            'size': sum(len(line.encode(self.encoder.encoding)) for line in lines)
        }

# Usage
processor = FileProcessor()
results = processor.process_directory(
    "src/",
    recursive=True,
    exclude=["__pycache__", "*.pyc", "tests/"]
)

# Print summary
for result in results:
    print(f"{result['filename']}: {result['encoding']} encoding, "
          f"{result['lines']} lines, {result['size']} bytes")
```

### Error Handling

```python
from docformatter import Encoder

def safe_file_processing(filename):
    """Process file with comprehensive error handling."""
    encoder = Encoder()

    try:
        # Try to detect encoding
        encoder.do_detect_encoding(filename)
        print(f"Detected encoding: {encoder.encoding}")

    except FileNotFoundError:
        print(f"File not found: {filename}")
        return None

    except PermissionError:
        print(f"Permission denied: {filename}")
        return None

    except Exception as e:
        print(f"Encoding detection failed: {e}")
        print(f"Using fallback encoding: {encoder.DEFAULT_ENCODING}")
        encoder.encoding = encoder.DEFAULT_ENCODING

    try:
        # Try to open and read file
        with encoder.do_open_with_encoding(filename) as f:
            content = f.read()

        return {
            'success': True,
            'encoding': encoder.encoding,
            'content_length': len(content)
        }

    except UnicodeDecodeError as e:
        print(f"Unicode decode error: {e}")
        print("File may have mixed encodings or be binary")
        return None

    except Exception as e:
        print(f"File reading error: {e}")
        return None

# Test with various files
test_files = ["example.py", "unicode_file.py", "binary_file.so", "missing.py"]

for filename in test_files:
    result = safe_file_processing(filename)
    if result:
        print(f"Successfully processed {filename}")
    else:
        print(f"Failed to process {filename}")
```

## Integration with Docformatter

The file I/O and encoding module integrates with other components:

- **Formatter**: Provides encoding-aware file reading and writing
- **Configuration**: Supports file discovery with exclusion patterns
- **String Processing**: Ensures proper handling of Unicode content
- **Command-Line Interface**: Enables robust batch file processing

## Platform Considerations

The module handles platform-specific differences:

- **Line Endings**: Detects and preserves original line ending style
- **Encodings**: Handles Windows-1252, UTF-8, Latin-1, and other encodings
- **File Paths**: Works with both Unix and Windows path conventions
- **Permissions**: Graceful handling of permission-denied errors
- **Unicode**: Full support for international characters and symbols
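
Preserving the original line-ending style depends on reading and writing without newline translation. The sketch below shows the idea using only the standard library; `rewrite_preserving_newlines` is a hypothetical helper written for illustration and is not part of docformatter.

```python
import os
import tempfile


def rewrite_preserving_newlines(path, transform, encoding="utf-8"):
    """Apply ``transform`` to each line's text while keeping its original ending."""
    # newline="" disables translation on read, so "\r\n" stays "\r\n".
    with open(path, "r", encoding=encoding, newline="") as handle:
        lines = handle.readlines()

    rewritten = []
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # "", "\n", "\r", or "\r\n"
        rewritten.append(transform(body) + ending)

    # newline="" on write also disables translation, so endings go out unchanged.
    with open(path, "w", encoding=encoding, newline="") as handle:
        handle.writelines(rewritten)


# Demo: a file with Windows line endings keeps them after rewriting.
fd, demo_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"a = 1\r\nb = 2\r\n")

rewrite_preserving_newlines(demo_path, str.upper)

with open(demo_path, "rb") as handle:
    result = handle.read()
print(result)
os.remove(demo_path)
```

Without `newline=""`, Python's text mode would translate every ending to `"\n"` on read and to the platform default on write, which silently changes the file's line-ending style.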

## Performance Considerations

- **Encoding Detection**: Uses fast heuristic-based detection
- **File Reading**: Efficient line-by-line processing for large files
- **Memory Usage**: Streams large files rather than loading entirely
- **Caching**: Reuses encoding detection results within same session