tessl/pypi-presidio-anonymizer

Presidio Anonymizer package - replaces analyzed text with desired values.


Core Anonymization

Name: tessl/pypi-presidio-anonymizer
Author: tessl

The AnonymizerEngine provides the primary anonymization functionality, taking text and analyzer results to apply configurable transformation operators on detected PII entities.

Capabilities

Text Anonymization

Main anonymization method that processes text with PII entity locations and applies transformation operators.

def anonymize(
    self,
    text: str,
    analyzer_results: List[RecognizerResult],
    operators: Optional[Dict[str, OperatorConfig]] = None,
    conflict_resolution: ConflictResolutionStrategy = ConflictResolutionStrategy.MERGE_SIMILAR_OR_CONTAINED
) -> EngineResult:
    """
    Anonymize method to anonymize the given text.

    Parameters:
    - text (str): The text to anonymize
    - analyzer_results (List[RecognizerResult]): Results from analyzer containing PII locations
    - operators (Optional[Dict[str, OperatorConfig]]): Operator configuration keyed by entity type; the special key "DEFAULT" sets the fallback operator
    - conflict_resolution (ConflictResolutionStrategy): Strategy for handling overlapping entities

    Returns:
    EngineResult: The anonymized text (result.text) plus one OperatorResult per transformed entity (result.items)
    """

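Offsets in RecognizerResult are character indices into text with an exclusive end, so text[start:end] is the entity itself (in the first example below, text[11:19] is "John Doe"). Counting characters by hand is error-prone. The helper below is hypothetical and not part of Presidio; it is a small stdlib sketch that computes the pair with str.find.

```python
from typing import Tuple


def span_of(text: str, entity: str, occurrence: int = 1) -> Tuple[int, int]:
    """Return (start, end) of the n-th occurrence of entity in text.

    end is exclusive, so text[start:end] == entity.
    Hypothetical helper for writing RecognizerResult offsets by hand.
    """
    start = -1
    for _ in range(occurrence):
        # Search just past the previous match (from 0 on the first pass)
        start = text.find(entity, start + 1)
        if start == -1:
            raise ValueError(f"{entity!r} occurrence {occurrence} not found")
    return start, start + len(entity)


text = "Contact John at john@email.com or 555-1234"
print(span_of(text, "John"))            # (8, 12)
print(span_of(text, "john@email.com"))  # (16, 30)
print(span_of(text, "555-1234"))        # (34, 42)
```

The returned pair can be unpacked into the start and end arguments of RecognizerResult. The search is case-sensitive, which is why "John" does not match the lowercase "john" in the email address.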
Usage Examples:

from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorConfig

engine = AnonymizerEngine()

# Basic replacement
result = engine.anonymize(
    text="My name is John Doe",
    analyzer_results=[RecognizerResult("PERSON", 11, 19, 0.9)],
    operators={"PERSON": OperatorConfig("replace", {"new_value": "[REDACTED]"})}
)
print(result.text)  # "My name is [REDACTED]"

# Multiple operators
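operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "[PERSON]"}),
    # mask validates masking_char, chars_to_mask and from_end, so all three are passed
    "EMAIL_ADDRESS": OperatorConfig("mask", {"masking_char": "*", "chars_to_mask": 5, "from_end": False}),
    "PHONE_NUMBER": OperatorConfig("redact")
}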

result = engine.anonymize(
    text="Contact John at john@email.com or 555-1234",
    analyzer_results=[
        RecognizerResult("PERSON", 8, 12, 0.9),
        RecognizerResult("EMAIL_ADDRESS", 16, 30, 0.9),
        RecognizerResult("PHONE_NUMBER", 34, 42, 0.8)
    ],
    operators=operators
)
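To see what each operator does to its span in the multiple-operator call above, here is a stdlib-only simulation. It is a simplified model, not Presidio's implementation: it assumes replace substitutes new_value, mask masks the first chars_to_mask characters (the from_end=False case), and redact deletes the span. Spans are applied from right to left so that changing one span's length does not shift the offsets of spans still to be processed.

```python
from typing import Callable, List, Tuple


# Simplified stand-ins for the replace, mask and redact operators
def replace(s: str, new_value: str) -> str:
    return new_value


def mask(s: str, masking_char: str, chars_to_mask: int) -> str:
    n = min(chars_to_mask, len(s))  # never mask more than the entity length
    return masking_char * n + s[n:]


def redact(s: str) -> str:
    return ""


def apply_spans(text: str, spans: List[Tuple[int, int, Callable[[str], str]]]) -> str:
    # Rightmost span first, so earlier (start, end) pairs stay valid
    for start, end, op in sorted(spans, key=lambda sp: sp[0], reverse=True):
        text = text[:start] + op(text[start:end]) + text[end:]
    return text


text = "Contact John at john@email.com or 555-1234"
out = apply_spans(text, [
    (8, 12, lambda s: replace(s, "[PERSON]")),
    (16, 30, lambda s: mask(s, "*", 5)),
    (34, 42, redact),
])
print(repr(out))  # 'Contact [PERSON] at *****email.com or '
```

Note that redact leaves the surrounding whitespace untouched, which is why the simulated output ends with a trailing space.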

Operator Management

Add or remove custom anonymization operators at runtime.

def add_anonymizer(self, anonymizer_cls: Type[Operator]) -> None:
    """
    Add a new anonymizer to the engine.

    Parameters:
    - anonymizer_cls (Type[Operator]): The anonymizer class to add
    """

def remove_anonymizer(self, anonymizer_cls: Type[Operator]) -> None:
    """
    Remove an anonymizer from the engine.

    Parameters:
    - anonymizer_cls (Type[Operator]): The anonymizer class to remove
    """

Usage Example:

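import hashlib
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig, RecognizerResult
from presidio_anonymizer.operators import Operator, OperatorType

class CustomHasher(Operator):
    def operate(self, text=None, params=None):
        # Stable SHA-256 digest; built-in hash() of a str is salted per process
        return "HASH_" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

    def validate(self, params=None):
        pass  # this operator takes no parameters

    def operator_name(self):
        return "custom_hasher"  # the name OperatorConfig refers to

    def operator_type(self):
        return OperatorType.Anonymize

engine = AnonymizerEngine()
engine.add_anonymizer(CustomHasher)

# Use the custom operator
operators = {"PERSON": OperatorConfig("custom_hasher")}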
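Conceptually, adding and removing operators amounts to maintaining a registry of operator classes keyed by the name each class reports, which is the string later passed to OperatorConfig. The toy registry below illustrates that pattern under that assumption. OperatorRegistry, ToyOperator and Upper are hypothetical names for this sketch; it is not Presidio's internal factory, whose details may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class ToyOperator(ABC):
    """Hypothetical minimal operator interface for this sketch."""

    @abstractmethod
    def operate(self, text: str) -> str: ...

    @abstractmethod
    def operator_name(self) -> str: ...


class OperatorRegistry:
    """Hypothetical registry mapping operator names to operator classes."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Type[ToyOperator]] = {}

    def add(self, cls: Type[ToyOperator]) -> None:
        # Instantiate once to ask the class for the name it registers under
        self._by_name[cls().operator_name()] = cls

    def remove(self, cls: Type[ToyOperator]) -> None:
        self._by_name.pop(cls().operator_name(), None)

    def names(self) -> List[str]:
        return sorted(self._by_name)

    def run(self, name: str, text: str) -> str:
        if name not in self._by_name:
            raise KeyError(f"unknown operator {name!r}")
        return self._by_name[name]().operate(text)


class Upper(ToyOperator):
    def operate(self, text: str) -> str:
        return text.upper()

    def operator_name(self) -> str:
        return "upper"


registry = OperatorRegistry()
registry.add(Upper)
print(registry.names())               # ['upper']
print(registry.run("upper", "john"))  # JOHN
registry.remove(Upper)
print(registry.names())               # []
```

Keying by the reported name is what lets a configuration refer to an operator by a plain string while the registry resolves it to a class at run time.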

Available Operators

Get a list of the names of all available anonymization operators.

def get_anonymizers(self) -> List[str]:
    """
    Return a list of supported anonymizers.

    Returns:
    List[str]: Names of available anonymizer operators
    """

Usage Example:

engine = AnonymizerEngine()
available = engine.get_anonymizers()
print(available)  # names such as 'replace', 'redact', 'mask', 'hash', 'encrypt', 'keep'; the exact set and order depend on the installed version
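Because OperatorConfig refers to operators by name, a misspelled name only surfaces when anonymize() runs. A small pre-check against the list returned by get_anonymizers() can catch it earlier. The sketch below is stdlib-only and hypothetical: `available` is a hard-coded stand-in for that list, and the configuration is modeled as plain entity-to-operator-name pairs.

```python
from typing import Dict, List


def unknown_operators(operators: Dict[str, str], available: List[str]) -> Dict[str, str]:
    """Return the entity -> operator-name pairs whose operator is not available."""
    known = set(available)
    return {entity: name for entity, name in operators.items() if name not in known}


# Stand-in for engine.get_anonymizers(); the real list depends on the installed version
available = ["replace", "redact", "mask", "hash", "encrypt", "keep"]
requested = {"PERSON": "replace", "EMAIL_ADDRESS": "mask", "PHONE_NUMBER": "redcat"}
print(unknown_operators(requested, available))  # {'PHONE_NUMBER': 'redcat'}
```

With real OperatorConfig objects, the name to check is stored in the operator_name attribute, so the input mapping can be built as {entity: cfg.operator_name for entity, cfg in operators.items()}.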

Conflict Resolution Strategies

When detected PII entities overlap, the engine resolves the conflict with one of two strategies, selected through the conflict_resolution parameter:

MERGE_SIMILAR_OR_CONTAINED (Default)

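Merges overlapping entities of the same type into a single entity covering both (keeping the higher score). An entity fully contained in another entity is then dropped; when two entities have identical boundaries, the one with the lower score is dropped.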

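from presidio_anonymizer.entities import ConflictResolutionStrategy, RecognizerResult

text = "My name is John Doe"
# Two overlapping PERSON spans: "John" (11-15) and "John Doe" (11-19)
overlapping_results = [
    RecognizerResult("PERSON", 11, 15, 0.6),
    RecognizerResult("PERSON", 11, 19, 0.85),
]
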
result = engine.anonymize(
    text=text,
    analyzer_results=overlapping_results,
    conflict_resolution=ConflictResolutionStrategy.MERGE_SIMILAR_OR_CONTAINED
)
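The behavior of this strategy can be modeled in a few lines. This is a simplified, stdlib-only sketch, not Presidio's code: spans are (entity_type, start, end, score) tuples, overlapping spans of the same type are merged into their union with the higher score, and any span contained in another span is then dropped. The tie-breaking for identical boundaries with equal scores (keep the earlier span) is an assumption of the sketch.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    entity_type: str
    start: int
    end: int  # exclusive
    score: float


def merge_similar_or_contained(spans: List[Span]) -> List[Span]:
    # Step 1: merge overlapping spans of the same type into their union,
    # keeping the higher score (sorting by type, then start, groups them).
    merged: List[Span] = []
    for span in sorted(spans, key=lambda s: (s.entity_type, s.start)):
        last = merged[-1] if merged else None
        if last is not None and last.entity_type == span.entity_type and span.start < last.end:
            merged[-1] = last._replace(end=max(last.end, span.end),
                                       score=max(last.score, span.score))
        else:
            merged.append(span)

    # Step 2: drop spans contained in another span. For identical bounds,
    # keep the higher score (the earlier span on a tie).
    kept: List[Span] = []
    for i, a in enumerate(merged):
        dominated = False
        for j, b in enumerate(merged):
            if i == j:
                continue
            if (a.start, a.end) == (b.start, b.end):
                if a.score < b.score or (a.score == b.score and j < i):
                    dominated = True
            elif b.start <= a.start and a.end <= b.end:
                dominated = True
        if not dominated:
            kept.append(a)
    return sorted(kept, key=lambda s: s.start)


spans = [
    Span("PERSON", 11, 15, 0.6),    # "John"
    Span("PERSON", 11, 19, 0.85),   # "John Doe"
    Span("LOCATION", 16, 19, 0.5),  # "Doe", contained in the merged PERSON span
]
print(merge_similar_or_contained(spans))
# [Span(entity_type='PERSON', start=11, end=19, score=0.85)]
```

Spans that merely touch (one ends exactly where the next starts) are not merged, since they share no characters.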

REMOVE_INTERSECTIONS

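First applies the merge step described above, then adjusts the boundaries of any entities that still overlap so that no two spans intersect: the higher-scored entity keeps its full span and the other is shortened.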

result = engine.anonymize(
    text=text,
    analyzer_results=overlapping_results,
    conflict_resolution=ConflictResolutionStrategy.REMOVE_INTERSECTIONS
)
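The boundary adjustment can be sketched as a left-to-right sweep over spans sorted by start: when two neighbors overlap, the lower-scored one gives up the overlapping characters, and spans left empty are discarded. This is a simplified, stdlib-only model (spans as (entity_type, start, end, score) tuples); it does not reproduce Presidio's exact tie-breaking, and here an equal score favors the left span.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    entity_type: str
    start: int
    end: int  # exclusive
    score: float


def remove_intersections(spans: List[Span]) -> List[Span]:
    items = sorted(spans, key=lambda s: s.start)
    i = 0
    while i < len(items) - 1:
        cur, nxt = items[i], items[i + 1]
        if cur.end <= nxt.start:
            i += 1  # no overlap with the next span
            continue
        if cur.score >= nxt.score:
            # Left span wins: push the next span's start past it
            items[i + 1] = nxt._replace(start=cur.end)
        else:
            # Next span wins: cut the current span back to where it begins
            items[i] = cur._replace(end=nxt.start)
        # Drop spans that became empty and restore start order
        items = sorted((s for s in items if s.start < s.end), key=lambda s: s.start)
    return items


print(remove_intersections([Span("A", 0, 6, 0.3), Span("B", 4, 9, 0.8)]))
# [Span(entity_type='A', start=0, end=4, score=0.3), Span(entity_type='B', start=4, end=9, score=0.8)]
```

Every adjustment strictly shortens one span, so the loop always terminates.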

Default Behavior

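Default Operator: If no operator is configured for an entity type, the engine falls back to "replace", which substitutes <ENTITY_TYPE> (for example, <PERSON>). A "DEFAULT" key in operators overrides this fallback.
Default Parameters: Some operators run without parameters (replace defaults to <ENTITY_TYPE>; redact and keep take none), but others, such as mask, require their parameters (masking_char, chars_to_mask, from_end) to be passed explicitly.
Whitespace Merging: Adjacent entities of the same type separated only by whitespace are merged into a single entity before operators are applied
Sorted Processing: Entities are sorted by start position before operators are applied, giving consistent results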
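The whitespace-merging rule can be illustrated with a short sketch. It is a simplified model, not Presidio's code: spans are (entity_type, start, end) tuples, and a span is folded into the previous one when both have the same type and the text between them is non-empty and consists only of whitespace. The sketch treats anything str.isspace() accepts as whitespace; the library's exact character set is an assumption here and may be narrower.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    entity_type: str
    start: int
    end: int  # exclusive


def merge_whitespace_separated(text: str, spans: List[Span]) -> List[Span]:
    merged: List[Span] = []
    for span in sorted(spans, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        gap = text[prev.end:span.start] if prev is not None else ""
        if prev is not None and prev.entity_type == span.entity_type and gap and gap.isspace():
            # Same type separated only by whitespace: extend the previous span
            merged[-1] = prev._replace(end=span.end)
        else:
            merged.append(span)
    return merged


text = "Patient John Smith called"
spans = [Span("PERSON", 8, 12), Span("PERSON", 13, 18)]  # "John", "Smith"
print(merge_whitespace_separated(text, spans))
# [Span(entity_type='PERSON', start=8, end=18)]
```

After merging, a single operator call handles "John Smith", so a replace operator emits one <PERSON> token rather than two tokens separated by a space.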

Install with Tessl CLI

npx tessl i tessl/pypi-presidio-anonymizer
