Educational collection of surprising Python code snippets that demonstrate counter-intuitive behaviors and language internals
Overall
score
93%
Build a tool that validates Python identifiers for potential security issues caused by Unicode lookalike characters. The tool should detect when identifiers contain characters that visually resemble ASCII letters but are actually different Unicode code points (such as Cyrillic characters).
Create a Python module that provides functionality to:
Analyze identifiers: Check if a given string contains Unicode lookalike characters that could cause confusion with ASCII characters.
Report findings: For each suspicious identifier, report:
Categorize risk level: Classify identifiers as:
safe: Contains only ASCII characterssuspicious: Contains Unicode characters that look similar to ASCIImixed: Contains both ASCII and non-ASCII characters without visual confusion@generates
def analyze_identifier(identifier: str) -> dict:
"""
Analyze a Python identifier for Unicode lookalike characters.
Args:
identifier: The string to analyze
Returns:
A dictionary containing:
- 'risk_level': str - 'safe', 'suspicious', or 'mixed'
- 'lookalikes': list[dict] - List of lookalike character info
Each dict contains:
- 'char': str - The suspicious character
- 'code_point': str - Unicode code point (e.g., 'U+04BB')
- 'resembles': str - The ASCII character it resembles
- 'position': int - Position in the identifier
"""
passProvides access to Unicode character database for character name and category lookup.
Install with Tessl CLI
npx tessl i tessl/pypi-wtfpythondocs
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10