Workspace: tessl
Visibility: Public
Describes: pkg:pypi/wtfpython@3.0.x

tessl/pypi-wtfpython

tessl install tessl/pypi-wtfpython@3.0.0

Educational collection of surprising Python code snippets that demonstrate counter-intuitive behaviors and language internals

Agent Success

Agent success rate when using this tile

93%

Improvement

Agent success rate improvement when using this tile compared to baseline

1.06x

Baseline

Agent success rate without this tile

88%

evals/scenario-10/task.md

Unicode Identifier Validator

Build a tool that validates Python identifiers for potential security issues caused by Unicode lookalike characters. The tool should detect when identifiers contain characters that visually resemble ASCII letters but are actually different Unicode code points (such as Cyrillic characters).

Requirements

Create a Python module that provides functionality to:

  1. Analyze identifiers: Check if a given string contains Unicode lookalike characters that could cause confusion with ASCII characters.

  2. Report findings: For each suspicious identifier, report:

    • The character(s) that are lookalikes
    • Their Unicode code points
    • What ASCII character they resemble

  3. Categorize risk level: Classify identifiers as:

    • safe: Contains only ASCII characters
    • suspicious: Contains Unicode characters that look similar to ASCII
    • mixed: Contains both ASCII and non-ASCII characters without visual confusion

Test Cases

  • Given the identifier "hello", it should be classified as safe @test
  • Given the identifier "һello" (with Cyrillic 'һ'), it should be classified as suspicious and report that 'һ' (U+04BB) resembles ASCII 'h' @test
  • Given the identifier "tеst" (with Cyrillic 'е'), it should be classified as suspicious and report that 'е' (U+0435) resembles ASCII 'e' @test
  • Given the identifier "hello世界", it should be classified as mixed since it contains Chinese characters that don't resemble ASCII @test

Implementation

@generates

API

def analyze_identifier(identifier: str) -> dict:
    """
    Analyze a Python identifier for Unicode lookalike characters.

    Args:
        identifier: The string to analyze

    Returns:
        A dictionary containing:
        - 'risk_level': str - 'safe', 'suspicious', or 'mixed'
        - 'lookalikes': list[dict] - List of lookalike character info
          Each dict contains:
          - 'char': str - The suspicious character
          - 'code_point': str - Unicode code point (e.g., 'U+04BB')
          - 'resembles': str - The ASCII character it resembles
          - 'position': int - Position in the identifier
    """
    pass
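
The stub above leaves two things open: which characters count as lookalikes, and which risk level wins when several apply. The sketch below is one possible way to fill it in, not a reference solution. `LOOKALIKES` is a hypothetical, hand-picked table covering only a few Cyrillic letters. The sketch also assumes that `suspicious` takes priority whenever any lookalike is present (consistent with the "һello" test case) and that any other identifier with non-ASCII characters falls back to `mixed`.

```python
# Hypothetical lookalike table: a tiny illustrative subset, not a complete
# confusables list. Keys are non-ASCII characters, values are the ASCII
# letters they resemble. Escapes are used because the glyphs look identical.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u04bb": "h",  # CYRILLIC SMALL LETTER SHHA
}


def analyze_identifier(identifier: str) -> dict:
    """Return the risk level and lookalike details for an identifier."""
    lookalikes = [
        {
            "char": ch,
            "code_point": f"U+{ord(ch):04X}",
            "resembles": LOOKALIKES[ch],
            "position": pos,
        }
        for pos, ch in enumerate(identifier)
        if ch in LOOKALIKES
    ]

    # Assumed precedence: any lookalike makes the identifier suspicious.
    if lookalikes:
        risk_level = "suspicious"
    elif identifier.isascii():
        risk_level = "safe"
    else:
        # Non-ASCII characters are present, but none are in the table.
        risk_level = "mixed"

    return {"risk_level": risk_level, "lookalikes": lookalikes}


if __name__ == "__main__":
    for name in ["hello", "\u04bbello", "t\u0435st", "hello\u4e16\u754c"]:
        print(ascii(name), analyze_identifier(name))
```

A fuller implementation would probably need a much larger mapping, for example one derived from the confusables data published with Unicode Technical Standard #39. It would also have to decide how to classify identifiers that contain no ASCII characters at all, which the requirements do not cover.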

Dependencies { .dependencies }

unicodedata { .dependency }

Provides access to Unicode character database for character name and category lookup.
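
As a rough illustration of the name and category lookups mentioned above, the following sketch inspects the characters used in the test cases. `describe` and its `script_hint` field are hypothetical helpers introduced here. Treating the first word of a character's name as a script hint is a heuristic assumption, not a documented feature of `unicodedata`.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarize one character using the Unicode character database."""
    name = unicodedata.name(ch, "")  # "" when the character has no name
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": name,
        "category": unicodedata.category(ch),
        # Heuristic: the first word of the name usually names the script.
        "script_hint": name.split(" ")[0] if name else "",
    }


if __name__ == "__main__":
    # ASCII 'h', Cyrillic U+04BB, Cyrillic U+0435, and CJK U+4E16.
    for ch in "h\u04bb\u0435\u4e16":
        print(describe(ch))
```

Comparing the `script_hint` of each character against `LATIN` is one simple way to flag characters from other scripts, though on its own it cannot say which ASCII letter a character resembles.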