
tessl/pypi-wtfpython

Educational collection of surprising Python code snippets that demonstrate counter-intuitive behaviors and language internals

Overall score: 93%


evals/scenario-9/task.md

Unicode Identifier Validator

Build a tool that validates Python identifiers for potential security issues caused by Unicode lookalike characters. The tool should detect when identifiers contain characters that visually resemble ASCII letters but are actually different Unicode code points (such as Cyrillic characters).
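The following minimal sketch illustrates the hazard using only the standard library. Python normalizes identifiers with NFKC (PEP 3131), but NFKC does not map Cyrillic letters to Latin ones. As a result, an identifier spelled with Cyrillic `е` (U+0435) is a different name from its all-ASCII twin, even though the two look identical on screen. The variable names here are illustrative.

```python
import unicodedata

# "\u0435" is CYRILLIC SMALL LETTER IE, which renders like Latin "e".
lookalike = "t\u0435st"

# The two spellings look the same but are different strings...
print(lookalike == "test")  # False
# ...and NFKC normalization, which Python applies to identifiers,
# leaves the Cyrillic letter alone, so they remain distinct names.
print(unicodedata.normalize("NFKC", lookalike) == "test")  # False
print(unicodedata.name(lookalike[1]))  # CYRILLIC SMALL LETTER IE

# Both are valid identifiers, so both can be bound as separate variables.
namespace = {}
exec(f"test = 1\n{lookalike} = 2", namespace)
print(namespace["test"], namespace[lookalike])  # 1 2
```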

Requirements

Create a Python module that provides functionality to:

  1. Analyze identifiers: Check if a given string contains Unicode lookalike characters that could cause confusion with ASCII characters.

  2. Report findings: For each suspicious identifier, report:

    • The character(s) that are lookalikes
    • Their Unicode code points
    • What ASCII character they resemble
  3. Categorize risk level: Classify identifiers as:

    • safe: Contains only ASCII characters
    • suspicious: Contains at least one non-ASCII character that looks like an ASCII character
    • mixed: Contains both ASCII and non-ASCII characters, but none of the non-ASCII characters resembles an ASCII character
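The three risk levels can be expressed as a small decision function. This sketch rests on two assumptions that the spec does not settle. First, a single lookalike outranks any other non-ASCII content. Second, an identifier made only of non-ASCII characters that resemble nothing is reported as `mixed`. `classify_risk` and its `confusables` table are hypothetical names.

```python
def classify_risk(identifier: str, confusables: dict[str, str]) -> str:
    """Return 'safe', 'suspicious' or 'mixed' for an identifier.

    `confusables` is a hypothetical lookup table mapping each known
    non-ASCII lookalike to the ASCII character it resembles.
    """
    if identifier.isascii():
        return "safe"  # nothing outside ASCII at all
    if any(ch in confusables for ch in identifier):
        return "suspicious"  # assumption: any lookalike outranks 'mixed'
    # Non-ASCII present but none of it confusable (assumed 'mixed'
    # even when the identifier has no ASCII characters).
    return "mixed"


# Tiny illustrative table: Cyrillic SHHA and IE only.
table = {"\u04bb": "h", "\u0435": "e"}
print(classify_risk("hello", table))              # safe
print(classify_risk("\u04bbello", table))         # suspicious
print(classify_risk("hello\u4e16\u754c", table))  # mixed
```

Checking `isascii()` first means that plain ASCII names never touch the lookup table.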

Test Cases

  • Given the identifier "hello", it should be classified as safe @test
  • Given the identifier "һello" (with Cyrillic 'һ'), it should be classified as suspicious and report that 'һ' (U+04BB) resembles ASCII 'h' @test
  • Given the identifier "tеst" (with Cyrillic 'е'), it should be classified as suspicious and report that 'е' (U+0435) resembles ASCII 'e' @test
  • Given the identifier "hello世界", it should be classified as mixed since it contains Chinese characters that don't resemble ASCII @test
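Lookalike test inputs cannot be verified by eye, and editors or copy-paste steps can silently replace them. One option is to write the inputs with `\u` escapes and print their code points. This is an assumption, not part of the spec, and `CASES` and `code_points` are illustrative names.

```python
# Test inputs spelled with escapes, so the lookalikes cannot be lost or
# "fixed" by an editor, font, or copy-paste step.
CASES = {
    "hello": "safe",
    "\u04bbello": "suspicious",  # U+04BB CYRILLIC SMALL LETTER SHHA
    "t\u0435st": "suspicious",  # U+0435 CYRILLIC SMALL LETTER IE
    "hello\u4e16\u754c": "mixed",  # U+4E16 U+754C, the CJK characters 世界
}


def code_points(text: str) -> list[str]:
    """List every character of `text` as a U+XXXX string."""
    return [f"U+{ord(ch):04X}" for ch in text]


for text, expected in CASES.items():
    # Print the raw code points next to the expected risk level.
    print(code_points(text), expected)
```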

Implementation

@generates

API

```python
def analyze_identifier(identifier: str) -> dict:
    """
    Analyze a Python identifier for Unicode lookalike characters.

    Args:
        identifier: The string to analyze

    Returns:
        A dictionary containing:
        - 'risk_level': str - 'safe', 'suspicious', or 'mixed'
        - 'lookalikes': list[dict] - List of lookalike character info
          Each dict contains:
          - 'char': str - The suspicious character
          - 'code_point': str - Unicode code point (e.g., 'U+04BB')
          - 'resembles': str - The ASCII character it resembles
          - 'position': int - Position in the identifier
    """
    pass
```
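Below is one possible way to fill in the stub. It is a sketch, not the reference implementation. It assumes a small, hand-picked confusables table. A real tool would more likely load the full Unicode confusables data published with Unicode Technical Standard #39. It also assumes that `position` is a 0-based index, which the spec does not state. `CONFUSABLES` is a hypothetical name.

```python
# Hypothetical, deliberately incomplete table: non-ASCII character ->
# the ASCII character it resembles.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u04bb": "h",  # CYRILLIC SMALL LETTER SHHA
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0458": "j",  # CYRILLIC SMALL LETTER JE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
    "\u041e": "O",  # CYRILLIC CAPITAL LETTER O
}


def analyze_identifier(identifier: str) -> dict:
    """Analyze a Python identifier for Unicode lookalike characters."""
    lookalikes = [
        {
            "char": ch,
            "code_point": f"U+{ord(ch):04X}",
            "resembles": CONFUSABLES[ch],
            "position": i,  # 0-based index (assumption; the spec does not say)
        }
        for i, ch in enumerate(identifier)
        if ch in CONFUSABLES
    ]
    if identifier.isascii():
        risk_level = "safe"
    elif lookalikes:
        risk_level = "suspicious"
    else:
        risk_level = "mixed"
    return {"risk_level": risk_level, "lookalikes": lookalikes}
```

With this sketch, `"t\u0435st"` is reported as `suspicious`, with one lookalike at position 1 that resembles `e`. Any non-ASCII character missing from the table, such as 世, yields `mixed`. The quality of the results therefore depends entirely on how complete the table is.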

Dependencies { .dependencies }

unicodedata { .dependency }

Python standard-library module (no installation needed) that provides access to the Unicode Character Database, used here for character name and category lookups.
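`unicodedata` supplies the raw facts a report can include: each character's official name and its general category. The sketch below prints both. It also adds `script_hint`, a hypothetical helper that guesses the script from the first word of the name. That is only a heuristic, because `unicodedata` does not expose the Unicode Script property.

```python
import unicodedata


def script_hint(ch: str) -> str:
    """Rough script guess: the first word of the character's Unicode name.

    Heuristic only: names such as "DIGIT ZERO" carry no script word.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


for ch in ("h", "\u04bb", "\u0435", "\u4e16"):
    # code point, official character name, general category, script guess
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch), script_hint(ch))

# U+0068 LATIN SMALL LETTER H Ll LATIN
# U+04BB CYRILLIC SMALL LETTER SHHA Ll CYRILLIC
# U+0435 CYRILLIC SMALL LETTER IE Ll CYRILLIC
# U+4E16 CJK UNIFIED IDEOGRAPH-4E16 Lo CJK
```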

Install with the Tessl CLI:

npx tessl i tessl/pypi-wtfpython
