# Presidio Anonymizer

Presidio Anonymizer is a Python library for anonymizing and deanonymizing text by replacing or transforming personally identifiable information (PII) entities. It takes analyzer results containing PII locations and applies configurable anonymization operators like masking, replacement, encryption, or redaction.

## Package Information

- **Package Name**: presidio_anonymizer
- **Language**: Python
- **Installation**: `pip install presidio-anonymizer`
- **Server Mode**: `pip install presidio-anonymizer[server]` (includes Flask REST API)
- **Azure Health Services**: `pip install presidio-anonymizer[ahds]` (includes AHDS operators)

## Core Imports

```python
from presidio_anonymizer import (
    AnonymizerEngine,
    DeanonymizeEngine,
    BatchAnonymizerEngine,
    RecognizerResult,
    OperatorConfig,
    ConflictResolutionStrategy,
)
```

## Basic Usage

```python
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorConfig

# Initialize the anonymizer engine
engine = AnonymizerEngine()

# Define the text to anonymize
text = "My name is John Doe and my email is john.doe@example.com"

# Define analyzer results (typically from presidio-analyzer).
# start is inclusive and end is exclusive; end must not exceed len(text).
analyzer_results = [
    RecognizerResult(entity_type="PERSON", start=11, end=19, score=0.9),
    RecognizerResult(entity_type="EMAIL_ADDRESS", start=36, end=56, score=0.9),
]

# Configure anonymization operators
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "[PERSON]"}),
    # Mask the last 5 characters of the matched email address
    "EMAIL_ADDRESS": OperatorConfig("mask", {"masking_char": "*", "chars_to_mask": 5, "from_end": True}),
}

# Anonymize the text
result = engine.anonymize(
    text=text,
    analyzer_results=analyzer_results,
    operators=operators,
)

print(result.text)  # "My name is [PERSON] and my email is john.doe@exampl*****"
```

## Architecture

Presidio Anonymizer uses a modular architecture:

- **Engines**: Core processing engines (AnonymizerEngine, DeanonymizeEngine, BatchAnonymizerEngine)
- **Operators**: Pluggable transformation modules (Replace, Mask, Hash, Encrypt, etc.)
- **Entities**: Data structures for configuration and results (RecognizerResult, OperatorConfig, EngineResult)
- **Conflict Resolution**: Strategies for handling overlapping PII entities
- **Factory Pattern**: OperatorsFactory manages available operators and enables custom extensions

## Capabilities

### Core Anonymization

Primary anonymization functionality using the AnonymizerEngine. Processes text with analyzer results and applies configurable operators to transform PII entities.

```python { .api }
class AnonymizerEngine:
    def anonymize(
        self,
        text: str,
        analyzer_results: List[RecognizerResult],
        operators: Optional[Dict[str, OperatorConfig]] = None,
        conflict_resolution: ConflictResolutionStrategy = ConflictResolutionStrategy.MERGE_SIMILAR_OR_CONTAINED
    ) -> EngineResult: ...

    def get_anonymizers(self) -> List[str]: ...
    def add_anonymizer(self, anonymizer_cls: Type[Operator]) -> None: ...
    def remove_anonymizer(self, anonymizer_cls: Type[Operator]) -> None: ...
```

[Core Anonymization](./core-anonymization.md)

### Deanonymization

Reverse anonymization operations using the DeanonymizeEngine. Restores original text from anonymized content when using reversible operators like encryption.

```python { .api }
class DeanonymizeEngine:
    def deanonymize(
        self,
        text: str,
        entities: List[OperatorResult],
        operators: Dict[str, OperatorConfig]
    ) -> EngineResult: ...

    def get_deanonymizers(self) -> List[str]: ...
    def add_deanonymizer(self, deanonymizer_cls: Type[Operator]) -> None: ...
    def remove_deanonymizer(self, deanonymizer_cls: Type[Operator]) -> None: ...
```

[Deanonymization](./deanonymization.md)

### Batch Processing

Batch anonymization for lists and dictionaries using the BatchAnonymizerEngine. Efficiently processes multiple texts or structured data formats.

```python { .api }
class BatchAnonymizerEngine:
    def __init__(self, anonymizer_engine: Optional[AnonymizerEngine] = None): ...

    def anonymize_list(
        self,
        texts: List[Optional[Union[str, bool, int, float]]],
        recognizer_results_list: List[List[RecognizerResult]],
        **kwargs
    ) -> List[Union[str, Any]]: ...

    def anonymize_dict(
        self,
        analyzer_results: Iterable[DictRecognizerResult],
        **kwargs
    ) -> Dict[str, str]: ...
```

[Batch Processing](./batch-processing.md)

### Anonymization Operators

Built-in transformation operators for different anonymization strategies including replacement, masking, encryption, hashing, and redaction.

```python { .api }
# Replacement operators
class Replace(Operator): ...
class Redact(Operator): ...
class Keep(Operator): ...

# Transformation operators
class Mask(Operator): ...
class Hash(Operator): ...

# Encryption operators
class Encrypt(Operator): ...
class Decrypt(Operator): ...

# Custom operators
class Custom(Operator): ...
```

[Anonymization Operators](./operators.md)

### Data Entities

Core data structures for configuration, input, and results including recognizer results, operator configurations, and engine outputs.

```python { .api }
class RecognizerResult:
    def __init__(self, entity_type: str, start: int, end: int, score: float): ...
    @classmethod
    def from_json(cls, data: Dict) -> "RecognizerResult": ...

class OperatorConfig:
    def __init__(self, operator_name: str, params: Dict = None): ...
    @classmethod
    def from_json(cls, params: Dict) -> "OperatorConfig": ...

class EngineResult:
    text: str
    items: List[OperatorResult]
```

[Data Entities](./entities.md)

## Types

```python { .api }
from enum import Enum
from typing import Dict, List, Optional, Union, Type, Iterable, Any

class ConflictResolutionStrategy(Enum):
    MERGE_SIMILAR_OR_CONTAINED = "merge_similar_or_contained"
    REMOVE_INTERSECTIONS = "remove_intersections"

class InvalidParamError(Exception):
    """Exception raised for invalid parameters."""

# Type aliases
OperatorType = Enum  # Anonymize, Deanonymize
PIIEntity = object  # Base entity with start/end positions
```