Tessl Tile for pypi/presidio-anonymizer@2.2.0

# Presidio Anonymizer

Presidio Anonymizer is a Python library for anonymizing and deanonymizing text by replacing or transforming personally identifiable information (PII) entities. It takes analyzer results that give the location of each PII entity and applies configurable operators such as masking, replacement, encryption, or redaction.

## Package Information

- **Package Name**: presidio-anonymizer (imported as `presidio_anonymizer`)
- **Language**: Python
- **Installation**: `pip install presidio-anonymizer`
- **Server Mode**: `pip install presidio-anonymizer[server]` (includes Flask REST API)
- **Azure Health Services**: `pip install presidio-anonymizer[ahds]` (includes AHDS operators)

## Core Imports

```python
from presidio_anonymizer import (
    AnonymizerEngine,
    DeanonymizeEngine,
    BatchAnonymizerEngine,
    RecognizerResult,
    OperatorConfig,
    ConflictResolutionStrategy
)
```

## Basic Usage

```python
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorConfig

# Initialize the anonymizer engine
engine = AnonymizerEngine()

# Define the text to anonymize
text = "My name is John Doe and my email is john.doe@example.com"

# Define analyzer results (typically from presidio-analyzer).
# start is inclusive and end is exclusive, so text[start:end] is the entity.
analyzer_results = [
    RecognizerResult(entity_type="PERSON", start=11, end=19, score=0.9),
    RecognizerResult(entity_type="EMAIL_ADDRESS", start=36, end=56, score=0.9)
]

# Configure anonymization operators
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "[PERSON]"}),
    "EMAIL_ADDRESS": OperatorConfig("mask", {"masking_char": "*", "chars_to_mask": 5, "from_end": True})
}

# Anonymize the text
result = engine.anonymize(
    text=text,
    analyzer_results=analyzer_results,
    operators=operators
)

# The mask operator replaces the last 5 characters of the email address
print(result.text)  # "My name is [PERSON] and my email is john.doe@exampl*****"
```

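Hand-counted offsets are easy to get wrong, and an `end` that runs past the text is invalid input. As a minimal sketch using only the standard library, the spans for the sample text can be derived with `str.index` instead of being typed by hand. The helper name `span_of` is hypothetical and is not part of presidio-anonymizer; in practice the offsets usually come from presidio-analyzer.

```python
from typing import Tuple


def span_of(text: str, needle: str) -> Tuple[int, int]:
    """Return (start, end) of the first occurrence of needle; end is exclusive."""
    start = text.index(needle)  # raises ValueError if needle is absent
    return start, start + len(needle)


text = "My name is John Doe and my email is john.doe@example.com"

person_span = span_of(text, "John Doe")
email_span = span_of(text, "john.doe@example.com")

print(person_span)  # (11, 19)
print(email_span)   # (36, 56)
```

Each tuple can then be unpacked into the `start` and `end` arguments of `RecognizerResult`.
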
## Architecture

Presidio Anonymizer uses a modular architecture:

- **Engines**: Core processing engines (AnonymizerEngine, DeanonymizeEngine, BatchAnonymizerEngine)
- **Operators**: Pluggable transformation modules (Replace, Mask, Hash, Encrypt, etc.)
- **Entities**: Data structures for configuration and results (RecognizerResult, OperatorConfig, EngineResult)
- **Conflict Resolution**: Strategies for handling overlapping PII entities
- **Factory Pattern**: OperatorsFactory manages available operators and enables custom extensions

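As a rough mental model of how an engine combines entity spans with operator output, the sketch below applies replacements from the end of the text toward the start, so that earlier offsets stay valid while later spans change length. This is a conceptual illustration in plain Python, not the library's implementation: the function name `apply_replacements` is hypothetical, and the sketch assumes the spans do not overlap (handling overlaps is the job of conflict resolution).

```python
from typing import List, Tuple


def apply_replacements(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each (start, end, new_value) span, working right to left."""
    # Sorting in reverse puts the span with the largest start first, so a
    # replacement never shifts the offsets of spans still to be processed.
    for start, end, new_value in sorted(spans, reverse=True):
        text = text[:start] + new_value + text[end:]
    return text


text = "My name is John Doe and my email is john.doe@example.com"
spans = [
    (11, 19, "[PERSON]"),  # "John Doe"
    (36, 56, "[EMAIL]"),   # "john.doe@example.com"
]

result = apply_replacements(text, spans)
print(result)  # My name is [PERSON] and my email is [EMAIL]
```
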
## Capabilities

### Core Anonymization

Primary anonymization functionality using the AnonymizerEngine. Processes text with analyzer results and applies configurable operators to transform PII entities.

```python { .api }
class AnonymizerEngine:
    def anonymize(
        self,
        text: str,
        analyzer_results: List[RecognizerResult],
        operators: Optional[Dict[str, OperatorConfig]] = None,
        conflict_resolution: ConflictResolutionStrategy = ConflictResolutionStrategy.MERGE_SIMILAR_OR_CONTAINED
    ) -> EngineResult: ...

    def get_anonymizers(self) -> List[str]: ...
    def add_anonymizer(self, anonymizer_cls: Type[Operator]) -> None: ...
    def remove_anonymizer(self, anonymizer_cls: Type[Operator]) -> None: ...
```

[Core Anonymization](./core-anonymization.md)

### Deanonymization

Reverses anonymization using the DeanonymizeEngine. Restores the original text from anonymized content when a reversible operator, such as encryption, was used.

```python { .api }
class DeanonymizeEngine:
    def deanonymize(
        self,
        text: str,
        entities: List[OperatorResult],
        operators: Dict[str, OperatorConfig]
    ) -> EngineResult: ...

    def get_deanonymizers(self) -> List[str]: ...
    def add_deanonymizer(self, deanonymizer_cls: Type[Operator]) -> None: ...
    def remove_deanonymizer(self, deanonymizer_cls: Type[Operator]) -> None: ...
```

[Deanonymization](./deanonymization.md)

### Batch Processing

Batch anonymization for lists and dictionaries using the BatchAnonymizerEngine. Processes multiple texts or dictionary-structured data in a single call.

```python { .api }
class BatchAnonymizerEngine:
    def __init__(self, anonymizer_engine: Optional[AnonymizerEngine] = None): ...

    def anonymize_list(
        self,
        texts: List[Optional[Union[str, bool, int, float]]],
        recognizer_results_list: List[List[RecognizerResult]],
        **kwargs
    ) -> List[Union[str, Any]]: ...

    def anonymize_dict(
        self,
        analyzer_results: Iterable[DictRecognizerResult],
        **kwargs
    ) -> Dict[str, str]: ...
```

[Batch Processing](./batch-processing.md)

### Anonymization Operators

Built-in transformation operators for different anonymization strategies including replacement, masking, encryption, hashing, and redaction.

```python { .api }
# Replacement operators
class Replace(Operator): ...
class Redact(Operator): ...
class Keep(Operator): ...

# Transformation operators
class Mask(Operator): ...
class Hash(Operator): ...

# Encryption operators
class Encrypt(Operator): ...
class Decrypt(Operator): ...

# Custom operators
class Custom(Operator): ...
```

[Anonymization Operators](./operators.md)

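To make the masking parameters concrete, here is a small standalone sketch of what `masking_char`, `chars_to_mask`, and `from_end` mean for a single entity string. It reuses the parameter names from the Basic Usage example but is plain Python written for illustration, not the code of the `Mask` operator. The function name `mask` and the clamping of `chars_to_mask` to the string length are assumptions of this sketch.

```python
def mask(text: str, masking_char: str, chars_to_mask: int, from_end: bool) -> str:
    """Replace chars_to_mask characters at one end of text with masking_char."""
    # Assumption of this sketch: never mask more characters than the text has.
    n = max(0, min(chars_to_mask, len(text)))
    if from_end:
        return text[: len(text) - n] + masking_char * n
    return masking_char * n + text[n:]


masked_email = mask("john.doe@example.com", "*", 5, from_end=True)
masked_card = mask("4111111111111111", "*", 12, from_end=False)

print(masked_email)  # john.doe@exampl*****
print(masked_card)   # ************1111
```

Note that `from_end=True` masks the tail of the entity, so for an email address it hides the end of the domain rather than the local part.
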
### Data Entities

Core data structures for configuration, input, and results including recognizer results, operator configurations, and engine outputs.

```python { .api }
class RecognizerResult:
    def __init__(self, entity_type: str, start: int, end: int, score: float): ...
    @classmethod
    def from_json(cls, data: Dict) -> "RecognizerResult": ...

class OperatorConfig:
    def __init__(self, operator_name: str, params: Dict = None): ...
    @classmethod
    def from_json(cls, params: Dict) -> "OperatorConfig": ...

class EngineResult:
    text: str
    items: List[OperatorResult]
```

[Data Entities](./entities.md)

## Types

```python { .api }
from enum import Enum
from typing import Dict, List, Optional, Union, Type, Iterable, Any

class ConflictResolutionStrategy(Enum):
    MERGE_SIMILAR_OR_CONTAINED = "merge_similar_or_contained"
    REMOVE_INTERSECTIONS = "remove_intersections"

class InvalidParamError(Exception):
    """Exception raised for invalid parameters."""

# Type aliases
OperatorType = Enum  # Anonymize, Deanonymize
PIIEntity = object   # Base entity with start/end positions
```

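Because the enum values are plain strings, a strategy can be selected by value, for example from a configuration file. The sketch below redefines the enum locally, copying the two members listed above, so that it runs without the package installed; real code would import `ConflictResolutionStrategy` from `presidio_anonymizer` instead. The helper `strategy_from_config` is hypothetical.

```python
from enum import Enum


class ConflictResolutionStrategy(Enum):
    # Local copy of the documented members, for illustration only.
    MERGE_SIMILAR_OR_CONTAINED = "merge_similar_or_contained"
    REMOVE_INTERSECTIONS = "remove_intersections"


def strategy_from_config(value: str) -> ConflictResolutionStrategy:
    """Look up a strategy by its string value and fail with a readable message."""
    try:
        return ConflictResolutionStrategy(value)
    except ValueError:
        allowed = ", ".join(s.value for s in ConflictResolutionStrategy)
        raise ValueError(
            f"unknown strategy {value!r}; expected one of: {allowed}"
        ) from None


strategy = strategy_from_config("remove_intersections")
print(strategy.name)  # REMOVE_INTERSECTIONS
```

The resulting member can be passed as the `conflict_resolution` argument of `AnonymizerEngine.anonymize`.
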