The Real First Universal Charset Detector providing modern, fast, and reliable character encoding detection as an alternative to chardet.
Command-line interface and programmatic CLI functions for charset detection and file normalization. Provides both shell command capabilities and importable Python functions for CLI operations.
Primary CLI detection function that processes files and outputs structured results in JSON format.
def cli_detect(argv: list[str] | None = None) -> int:
    """
    CLI detection function for processing one or more files.

    Parameters:
    - argv: Command-line style argument list (file paths plus any of the
      flags below); when None, arguments are read from sys.argv

    Flags accepted in argv:
    - --with-alternative: Output complementary possibilities if any (JSON list format)
    - --normalize: Permit normalization of input files
    - --minimal: Only output charset to STDOUT, disabling JSON output
    - --replace: Replace files when normalizing instead of creating new ones
    - --force: Replace files without asking for confirmation
    - --threshold: Custom maximum chaos allowed in decoded content (0.0-1.0, default 0.2)
    - --verbose: Display complementary information and detection logs

    Returns:
        int: 0 if everything went fine, non-zero otherwise

    Note: This function handles multiple files and outputs JSON results to stdout
    """

Usage Example:

from charset_normalizer.cli import cli_detect

# Paths and flags are passed together as an argv-style list,
# exactly as they would appear on the command line

# Analyze single file
cli_detect(['document.txt'])

# Analyze with alternatives and verbose output
cli_detect(['data.csv', '--with-alternative', '--verbose'])

# Normalize files with replacement
cli_detect(['file1.txt', 'file2.csv', '--normalize', '--replace', '--force'])

# Use custom detection threshold
cli_detect(['mixed_encoding.txt', '--threshold', '0.15', '--verbose'])

Helper function for interactive yes/no prompts in CLI operations.
def query_yes_no(question: str, default: str = "yes") -> bool:
    """
    Ask a yes/no question via input() and return the answer.

    Parameters:
    - question: Question string presented to the user
    - default: Presumed answer if user just hits Enter ("yes", "no", or None)

    Returns:
        bool: True for "yes", False for "no"

    Raises:
        ValueError: If default is not "yes", "no", or None

    Note: Used internally by CLI for confirmation prompts
    """

Usage Example:
from charset_normalizer.cli import query_yes_no

# Basic yes/no prompt
if query_yes_no("Do you want to continue?"):
    print("Proceeding...")
else:
    print("Cancelled")

# Default to "no"
if query_yes_no("Delete all files?", default="no"):
    print("Files deleted")

# Require explicit answer
answer = query_yes_no("Are you sure?", default=None)

The charset-normalizer package provides the normalizer command-line tool:
# Basic detection
normalizer document.txt
# Multiple files with alternatives
normalizer file1.txt file2.csv --with-alternative
# Normalize files in place
normalizer data.txt --normalize --replace --force
# Verbose detection with custom threshold
normalizer mixed_encoding.txt --verbose --threshold 0.15
# Minimal output (encoding name only)
normalizer simple.txt --minimal

The CLI outputs structured JSON results for programmatic consumption:
{
    "path": "/path/to/document.txt",
    "encoding": "utf_8",
    "encoding_aliases": ["utf-8", "u8", "utf8"],
    "alternative_encodings": ["ascii"],
    "language": "English",
    "alphabets": ["Basic Latin"],
    "has_sig_or_bom": false,
    "chaos": 0.02,
    "coherence": 0.85,
    "unicode_path": null,
    "is_preferred": true
}

When --with-alternative is used, output becomes an array of results:
[
    {
        "path": "/path/to/document.txt",
        "encoding": "utf_8",
        "language": "English",
        "chaos": 0.02,
        "coherence": 0.85,
        "is_preferred": true
    },
    {
        "path": "/path/to/document.txt",
        "encoding": "iso-8859-1",
        "language": "English",
        "chaos": 0.05,
        "coherence": 0.82,
        "is_preferred": false
    }
]

import sys
import json
from charset_normalizer.cli import cli_detect
from io import StringIO

# Capture CLI output programmatically
old_stdout = sys.stdout
sys.stdout = buffer = StringIO()
try:
    cli_detect(['document.txt'])
finally:
    # Always restore stdout, even if detection fails
    sys.stdout = old_stdout

# Parse and print only after stdout is restored; printing inside the
# try block would write into the buffer instead of the terminal
result = json.loads(buffer.getvalue())
print(f"Detected encoding: {result['encoding']}")

from charset_normalizer.cli import cli_detect
import os

# Process all text files in directory
text_files = [f for f in os.listdir('.') if f.endswith('.txt')]
# Flags go in the same argv-style list as the paths
cli_detect(text_files + ['--with-alternative', '--verbose'])

from charset_normalizer.cli import cli_detect, query_yes_no
def safe_normalize_files(file_paths):
    """Safely normalize files with user confirmation."""
    # First, detect encodings (flags share the argv-style list with the paths)
    cli_detect([*file_paths, '--verbose'])

    # Ask for confirmation
    if query_yes_no(f"Normalize {len(file_paths)} files?"):
        cli_detect([*file_paths, '--normalize', '--replace'])
        print("Files normalized successfully")
    else:
        print("Normalization cancelled")

# Usage
safe_normalize_files(['doc1.txt', 'doc2.csv'])

The CLI functions handle various error conditions:
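One such condition can be sketched without the real CLI. Assuming `cli_detect` parses its arguments with `argparse` (an assumption about the implementation, not something stated here), invalid arguments end in `SystemExit` rather than an ordinary exception, and `SystemExit` is not a subclass of `Exception`. The stand-in parser below is hypothetical (`build_parser` and `parse_or_status` are illustrative names, not part of charset-normalizer) and only demonstrates that standard-library behavior:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a CLI parser; not charset-normalizer's own parser
    parser = argparse.ArgumentParser(prog="normalizer-sketch")
    parser.add_argument("files", nargs="+", help="File(s) to analyze")
    parser.add_argument("-t", "--threshold", type=float, default=0.2)
    return parser


def parse_or_status(argv: list[str]) -> int:
    """Return 0 when argv parses, or the exit status argparse asked for."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse prints a usage message to stderr, then raises SystemExit.
        # SystemExit derives from BaseException, so "except Exception" misses it.
        return exc.code if isinstance(exc.code, int) else 1
    return 0


print(parse_or_status(["document.txt"]))                      # valid arguments
print(parse_or_status([]))                                     # no file given
print(parse_or_status(["document.txt", "--threshold", "x"]))  # not a float
```

If `cli_detect` behaves this way, an `except SystemExit` clause next to the other handlers in a try/except wrapper would let a program keep running after a bad argument list.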
For programmatic usage, wrap CLI calls in try/except blocks:

try:
    cli_detect(['problematic_file.bin'])
except KeyboardInterrupt:
    print("Detection interrupted by user")
except Exception as e:
    print(f"CLI error: {e}")

Install with Tessl CLI:
npx tessl i tessl/pypi-charset-normalizer
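
The JSON output described earlier comes in two shapes: a single object, or an array when alternatives are requested. A consumer can normalize both into a list before reading fields. The sketch below uses only the standard library and trimmed copies of the sample outputs; `load_results` and `preferred_encoding` are hypothetical helper names, not part of charset-normalizer.

```python
import json
from typing import Any, Optional


def load_results(raw: str) -> list[dict[str, Any]]:
    """Parse CLI JSON output and always return a list of result dicts."""
    data = json.loads(raw)
    return data if isinstance(data, list) else [data]


def preferred_encoding(results: list[dict[str, Any]]) -> Optional[str]:
    """Return the encoding of the first entry flagged as preferred, if any."""
    for entry in results:
        if entry.get("is_preferred"):
            return entry.get("encoding")
    return None


# Trimmed copies of the sample outputs (single object, then array)
single = '{"path": "/path/to/document.txt", "encoding": "utf_8", "is_preferred": true}'
several = """[
  {"path": "/path/to/document.txt", "encoding": "utf_8", "is_preferred": true},
  {"path": "/path/to/document.txt", "encoding": "iso-8859-1", "is_preferred": false}
]"""

print(preferred_encoding(load_results(single)))
print(len(load_results(several)))
```

Wrapping the single-object case in a list keeps downstream code free of shape checks, at the cost of one extra indexing step when only one result is expected.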