tessl install tessl/pypi-w3lib@2.3.0
Library of web-related functions for HTML manipulation, HTTP processing, URL handling, and encoding detection
Agent Success: 84% (agent success rate when using this tile)
Improvement: 0.91x (agent success rate improvement when using this tile compared to baseline)
Baseline: 92% (agent success rate without this tile)
Build a text processing utility that converts HTML entity references into their corresponding Unicode characters. The utility should handle various types of HTML entities that appear in web content and provide options for selective entity preservation.
Your implementation should process HTML text containing entities and convert them to readable Unicode characters:
Entity Types: Support for named entities (like &amp;), decimal numeric entities (like &#65;), and hexadecimal numeric entities (like &#x41;).
Selective Preservation: Provide the ability to keep specific entities unconverted. For example, preserve &lt;, &gt;, and &amp; while converting others.
Entity Detection: Implement a check to determine if a given text contains any HTML entities before processing.
Illegal Character Handling: Handle removal of illegal XML/HTML character references according to standard specifications.
"Hello &amp; goodbye" produces "Hello & goodbye" @test
"Price: &euro;100" produces "Price: €100" @test
"Copyright &copy; 2024" produces "Copyright © 2024" @test
"&lt;div&gt;" with preserved entities ['lt', 'gt'] keeps "&lt;div&gt;" unchanged @test
"No entities here" contains entities returns False @test
"Has entity" contains entities returns True @test
@generates
def convert_entities(text: str, keep: list[str] | None = None, remove_illegal: bool = True) -> str:
    """
    Convert HTML entities in text to Unicode characters.

    Args:
        text: The text containing HTML entities
        keep: Optional list of entity names to preserve (e.g., ['amp', 'lt', 'gt'])
        remove_illegal: Whether to remove illegal character references

    Returns:
        Text with entities converted to Unicode characters
    """
    pass


def has_entities(text: str) -> bool:
    """
    Check if text contains any HTML entities.

    Args:
        text: The text to check

    Returns:
        True if the text contains HTML entities, False otherwise
    """
    pass

Provides web utility functions for HTML processing.
@satisfied-by
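To make the expected behaviour concrete, here is a minimal standard-library sketch of the two functions described in this task. It is not w3lib's implementation and does not call w3lib. The regular expression, the helper `_is_legal_xml_char`, and the decision to treat "illegal" as "outside the XML 1.0 `Char` production" are assumptions made for illustration; unknown named entities are simply left untouched here, which may differ from what the library does.

```python
from __future__ import annotations

import re
from html.entities import name2codepoint

# Matches hexadecimal (&#x41;), decimal (&#65;), and named (&amp;) references.
_ENTITY_RE = re.compile(
    r"&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));"
)


def _is_legal_xml_char(cp: int) -> bool:
    """Hypothetical helper: True if cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def convert_entities(
    text: str, keep: list[str] | None = None, remove_illegal: bool = True
) -> str:
    """Convert entity references to characters, skipping names listed in keep."""
    kept = {name.lower() for name in (keep or [])}

    def _replace(match: re.Match) -> str:
        hex_digits, dec_digits, name = match.groups()
        if name is not None:
            if name.lower() in kept:
                return match.group(0)  # preserved on request
            cp = name2codepoint.get(name)
            # Assumption: unknown names are left as written.
            return chr(cp) if cp is not None else match.group(0)
        cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        if _is_legal_xml_char(cp):
            return chr(cp)
        # Illegal numeric reference: drop it or leave it, depending on the flag.
        return "" if remove_illegal else match.group(0)

    return _ENTITY_RE.sub(_replace, text)


def has_entities(text: str) -> bool:
    """Syntactic check: True if text contains anything shaped like an entity."""
    return _ENTITY_RE.search(text) is not None
```

Note that `has_entities` in this sketch is purely syntactic, so a string such as `&foo;` counts as containing an entity even though `convert_entities` would leave it unchanged.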