# CLI Interface

Command-line interface and programmatic CLI functions for charset detection and file normalization. Provides both a shell command and importable Python functions for CLI operations.

## Capabilities

### Command-Line Detection

Entry point of the `normalizer` command, callable from Python. It takes the same arguments as the shell command, analyzes each file, and prints the results to STDOUT (as JSON by default).

```python { .api }
def cli_detect(argv: list[str] | None = None) -> int:
    """
    Run the `normalizer` command-line tool in-process.

    Parameters:
    - argv: Command-line arguments, exactly as they would be typed after
      `normalizer` in a shell (file paths followed by options).
      If None, arguments are read from sys.argv.

    Options accepted in argv:
    - FILE [FILE ...]: One or more file paths to analyze
    - --with-alternative: Output complementary possibilities if any (top-level JSON becomes a list)
    - --normalize: Permit normalization of input files; without it, nothing is written
    - --minimal: Only output the detected charset to STDOUT, disabling JSON output
    - --replace: Replace files when normalizing instead of creating new ones (use with --normalize)
    - --force: Replace files without asking for confirmation (use with --replace)
    - --threshold VALUE: Custom maximum chaos allowed in decoded content (0.0-1.0, default 0.2)
    - --verbose: Display complementary information and detection logs

    Returns:
    int: 0 if everything went fine, a non-zero value otherwise

    Note: Results are printed to STDOUT, as a JSON object for a single result
    or a JSON list for several results (multiple files or --with-alternative)
    """
```

**Usage Example:**

```python
from charset_normalizer.cli import cli_detect

# Analyze a single file
cli_detect(['document.txt'])

# Analyze with alternatives and verbose output
cli_detect(['data.csv', '--with-alternative', '--verbose'])

# Normalize files in place, without confirmation prompts
cli_detect(['file1.txt', 'file2.csv', '--normalize', '--replace', '--force'])

# Use a custom detection threshold and check the exit code
exit_code = cli_detect(['mixed_encoding.txt', '--threshold', '0.15', '--verbose'])
if exit_code != 0:
    print("Detection reported a problem")
```
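
Code written against keyword-style options can translate them into the argument list that `cli_detect` parses. The sketch below is a hypothetical helper (`build_argv` is not part of charset_normalizer); it only assembles the long flags listed above, and its checks simply mirror the parameter descriptions (threshold range, `--replace` with `--normalize`, `--force` with `--replace`).

```python
def build_argv(
    paths: list[str],
    *,
    alternatives: bool = False,
    normalize: bool = False,
    minimal: bool = False,
    replace: bool = False,
    force: bool = False,
    threshold: float = 0.2,
    verbose: bool = False,
) -> list[str]:
    """Hypothetical helper: turn keyword options into cli_detect() arguments."""
    if not paths:
        raise ValueError("at least one path is required")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if replace and not normalize:
        raise ValueError("replace only makes sense together with normalize")
    if force and not replace:
        raise ValueError("force only makes sense together with replace")

    argv = list(paths)
    # Boolean options map one-to-one onto long flags (dict order is preserved).
    flags = {
        "--with-alternative": alternatives,
        "--normalize": normalize,
        "--minimal": minimal,
        "--replace": replace,
        "--force": force,
        "--verbose": verbose,
    }
    argv.extend(flag for flag, enabled in flags.items() if enabled)
    # Only pass --threshold when it differs from the documented default.
    if threshold != 0.2:
        argv += ["--threshold", str(threshold)]
    return argv
```

For example, `cli_detect(build_argv(['data.csv'], alternatives=True, verbose=True))` is equivalent to `cli_detect(['data.csv', '--with-alternative', '--verbose'])`.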

### Interactive Confirmation

Helper function for interactive yes/no prompts in CLI operations.

```python { .api }
def query_yes_no(question: str, default: str = "yes") -> bool:
    """
    Ask a yes/no question via input() and return the answer.

    Parameters:
    - question: Question string presented to the user
    - default: Presumed answer if the user just hits Enter ("yes", "no", or None;
      None means an explicit answer is required)

    Returns:
    bool: True for "yes", False for "no"

    Raises:
    ValueError: If default is not "yes", "no", or None

    Note: Used internally by the CLI for confirmation prompts, such as
    replacing a file when --replace is given without --force
    """
```

**Usage Example:**

```python
from charset_normalizer.cli import query_yes_no

# Basic yes/no prompt (pressing Enter answers "yes")
if query_yes_no("Do you want to continue?"):
    print("Proceeding...")
else:
    print("Cancelled")

# Default to "no"
if query_yes_no("Delete all files?", default="no"):
    print("Files deleted")

# Require an explicit answer (pressing Enter alone asks again)
answer = query_yes_no("Are you sure?", default=None)
```

## Shell Command Usage

The charset-normalizer package installs the `normalizer` command-line tool:

```bash
# Basic detection
normalizer document.txt

# Multiple files with alternatives
normalizer file1.txt file2.csv --with-alternative

# Normalize files in place without a confirmation prompt
normalizer data.txt --normalize --replace --force

# Verbose detection with custom threshold
normalizer mixed_encoding.txt --verbose --threshold 0.15

# Minimal output (encoding name only)
normalizer simple.txt --minimal
```

## JSON Output Format

Unless `--minimal` is given, the CLI prints structured JSON for programmatic consumption. A single result is printed as one object:

```json
{
    "path": "/path/to/document.txt",
    "encoding": "utf_8",
    "encoding_aliases": ["utf-8", "u8", "utf8"],
    "alternative_encodings": ["ascii"],
    "language": "English",
    "alphabets": ["Basic Latin"],
    "has_sig_or_bom": false,
    "chaos": 0.02,
    "coherence": 85.0,
    "unicode_path": null,
    "is_preferred": true
}
```

`encoding` uses Python codec names, `chaos` and `coherence` are expressed as percentages, and `unicode_path` is only filled in when the file has been normalized.

When `--with-alternative` is used, or when several files are analyzed, the output becomes an array of results. Each element carries the same fields as above; only a subset is shown here for brevity:

```json
[
    {
        "path": "/path/to/document.txt",
        "encoding": "utf_8",
        "language": "English",
        "chaos": 0.02,
        "coherence": 85.0,
        "is_preferred": true
    },
    {
        "path": "/path/to/document.txt",
        "encoding": "latin_1",
        "language": "English",
        "chaos": 0.05,
        "coherence": 82.0,
        "is_preferred": false
    }
]
```
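
Because the top-level JSON value is either an object or a list, consumers may want to normalize it before use. The sketch below uses only the standard library `json` module; `load_results` and `preferred_encodings` are hypothetical helpers (not part of charset_normalizer) that assume the field names shown above.

```python
import json
from typing import Any, Dict, List, Optional


def load_results(text: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: parse normalizer JSON output into a list of result dicts."""
    data = json.loads(text)
    # A single result is printed as an object; several results as a list.
    return data if isinstance(data, list) else [data]


def preferred_encodings(results: List[Dict[str, Any]]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each path to its preferred encoding (None if undetected)."""
    return {r["path"]: r["encoding"] for r in results if r.get("is_preferred")}
```

For example, `preferred_encodings(load_results(buffer.getvalue()))` turns output captured as in the Script Integration pattern into a path-to-encoding mapping, whichever shape the CLI printed.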

## Integration Patterns

### Script Integration

```python
import io
import json
from contextlib import redirect_stdout

from charset_normalizer.cli import cli_detect

# Capture CLI output programmatically; stdout is restored automatically
buffer = io.StringIO()
with redirect_stdout(buffer):
    exit_code = cli_detect(['document.txt'])

if exit_code == 0:
    result = json.loads(buffer.getvalue())
    print(f"Detected encoding: {result['encoding']}")
```

### Batch Processing

```python
import os

from charset_normalizer.cli import cli_detect

# Process all .txt files in the current directory
text_files = [f for f in os.listdir('.') if f.endswith('.txt')]
if text_files:  # the CLI requires at least one file
    cli_detect(text_files + ['--with-alternative', '--verbose'])
```
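
`os.listdir` only looks at the top level of one directory. To cover a whole tree, a hypothetical helper such as `collect_files` below (not part of charset_normalizer) can gather matching paths recursively with `pathlib`; the suffix list is an assumption to adjust as needed.

```python
from pathlib import Path
from typing import List, Tuple


def collect_files(root: str, suffixes: Tuple[str, ...] = (".txt", ".csv")) -> List[str]:
    """Hypothetical helper: recursively list regular files whose suffix matches."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        # Skip directories; compare suffixes case-insensitively.
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

For example, `cli_detect(collect_files('data') + ['--minimal'])` analyzes every `.txt` and `.csv` file under `data/`; check that the list is not empty first, since the CLI requires at least one file.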

### Safe File Normalization

```python
from charset_normalizer.cli import cli_detect, query_yes_no


def safe_normalize_files(file_paths):
    """Normalize files in place after a single user confirmation."""
    # First, report detected encodings without writing anything
    cli_detect(file_paths + ['--verbose'])

    # Ask once for the whole batch, defaulting to "no"
    if query_yes_no(f"Normalize {len(file_paths)} files in place?", default="no"):
        # --force skips the per-file prompt the CLI would otherwise show
        exit_code = cli_detect(file_paths + ['--normalize', '--replace', '--force'])
        print("Files normalized" if exit_code == 0 else "Normalization failed")
    else:
        print("Normalization cancelled")


# Usage
safe_normalize_files(['doc1.txt', 'doc2.csv'])
```
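
## Error Handling

The CLI functions report most problems through messages and exit codes rather than exceptions:

- **File not found or unreadable**: files are opened while arguments are parsed, so a missing or inaccessible path makes argparse print an error and raise `SystemExit` (status 2) before any file is analyzed
- **Undetectable content** (for example, binary files): a message is written to STDERR and the result is reported with a `null` encoding
- **Invalid options**: a message is written to STDERR and a non-zero status is returned (for example, a `--threshold` outside 0.0-1.0)
- **Write failures during normalization**: the error is written to STDERR and a non-zero status is returned
- **User interruption**: Ctrl+C raises `KeyboardInterrupt` in the calling code

For programmatic usage, check the return value and catch `SystemExit` (which is not a subclass of `Exception`) as well as `KeyboardInterrupt`:

```python
from charset_normalizer.cli import cli_detect

try:
    exit_code = cli_detect(['problematic_file.bin'])
    if exit_code != 0:
        print(f"CLI reported a problem (exit code {exit_code})")
except SystemExit as e:
    # argparse exits on invalid arguments or files that cannot be opened
    print(f"CLI aborted with status {e.code}")
except KeyboardInterrupt:
    print("Detection interrupted by user")
```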
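
When several calls are made in a row, the exit-code and `SystemExit` handling can be folded into one wrapper. The sketch below is hypothetical (`run_cli` and `demo_entry` are not part of charset_normalizer): `run_cli` accepts any argparse-style entry point, and `demo_entry` is a stand-in that opens its files with `argparse.FileType` so the wrapper can be exercised without the library. With the real function it would be called as `run_cli(cli_detect, ['document.txt'])`.

```python
import argparse
from typing import Callable, List, Optional, Sequence


def run_cli(entry: Callable[[List[str]], Optional[int]], argv: Sequence[str]) -> int:
    """Hypothetical wrapper: call a CLI entry point and always return an int status."""
    try:
        status = entry(list(argv))
    except SystemExit as exc:
        # argparse raises SystemExit(2) for bad arguments; code may also be None or a message.
        if exc.code is None:
            return 0
        return exc.code if isinstance(exc.code, int) else 1
    return 0 if status is None else status


def demo_entry(argv: List[str]) -> int:
    """Stand-in entry point that opens its files at parse time, like argparse.FileType."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("files", type=argparse.FileType("rb"), nargs="+")
    args = parser.parse_args(argv)
    for handle in args.files:
        handle.close()
    return 0
```

Returning a plain integer keeps callers to a single `if status != 0` check, regardless of whether the failure was reported through a return value or through argparse exiting.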