# Anonymization Operators

Presidio Anonymizer provides built-in transformation operators for different anonymization strategies. Operators are configurable modules that transform detected PII entities according to specific algorithms.

## Capabilities

### Text Replacement

Replace detected PII with a specified text value.

```python { .api }
class Replace(Operator):
    """Replaces detected text with specified replacement text."""

# Configuration
OperatorConfig("replace", {
    "new_value": str  # Text to replace with
})
```

**Usage Examples:**

```python
from presidio_anonymizer.entities import OperatorConfig

# Replace with placeholder
replace_config = OperatorConfig("replace", {"new_value": "[REDACTED]"})

# Replace with entity type
replace_config = OperatorConfig("replace", {"new_value": "<PERSON>"})

# Replace with custom text
replace_config = OperatorConfig("replace", {"new_value": "CONFIDENTIAL"})
```

### Text Redaction

Completely remove detected PII from text.

```python { .api }
class Redact(Operator):
    """Completely removes/redacts detected text."""

# Configuration (no parameters required)
OperatorConfig("redact")
```

**Usage Example:**

```python
from presidio_anonymizer.entities import OperatorConfig

# Simple redaction - removes text entirely
redact_config = OperatorConfig("redact")

# Result: "Contact at " (name completely removed)
```
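
To make the effect of redaction concrete, here is a minimal pure-Python sketch that removes a character span from a string. It does not call Presidio and is not its implementation; the helper name `redact_span` and the sample sentence are hypothetical. It also shows that whitespace on either side of the removed span stays in place.

```python
def redact_span(text: str, start: int, end: int) -> str:
    """Remove the characters in text[start:end] and close the gap."""
    if not 0 <= start <= end <= len(text):
        raise ValueError("span is outside the text")
    return text[:start] + text[end:]


sample = "Contact John at noon"
# "John" occupies characters 8..12 (end-exclusive) in the sample.
redacted = redact_span(sample, 8, 12)
print(repr(redacted))  # the spaces on both sides of the name remain
```

If the doubled space matters for downstream processing, normalizing whitespace after redaction is one option.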

### Text Masking

Mask characters in detected text with a specified masking character.

```python { .api }
class Mask(Operator):
    """Masks characters in detected text with specified masking character."""

# Configuration
OperatorConfig("mask", {
    "masking_char": str,   # Character to use for masking (default: "*")
    "chars_to_mask": int,  # Number of characters to mask (default: all)
    "from_end": bool       # Mask from end instead of beginning (default: False)
})
```

**Usage Examples:**

```python
from presidio_anonymizer.entities import OperatorConfig

# Mask all characters
mask_all = OperatorConfig("mask", {"masking_char": "*"})

# Mask last 4 characters
mask_end = OperatorConfig("mask", {
    "masking_char": "*",
    "chars_to_mask": 4,
    "from_end": True
})

# Mask first 3 characters
mask_start = OperatorConfig("mask", {
    "masking_char": "#",
    "chars_to_mask": 3,
    "from_end": False
})

# Examples:
# mask_end:   "john@email.com" -> "john@email****" (last 4 characters masked)
# mask_start: "555-1234"       -> "###-1234"       (first 3 characters masked)
```
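
The interaction of `chars_to_mask` and `from_end` can be checked with a small pure-Python sketch. This is an illustration of the semantics described above, not Presidio's own code; the helper name `mask_text` is hypothetical, and clamping `chars_to_mask` to the text length is an assumption of this sketch.

```python
def mask_text(text: str, masking_char: str, chars_to_mask: int, from_end: bool) -> str:
    """Mask up to `chars_to_mask` characters from one end of `text`."""
    if len(masking_char) != 1:
        raise ValueError("masking_char must be a single character")
    count = max(0, min(chars_to_mask, len(text)))  # never mask more than exists
    if from_end:
        return text[: len(text) - count] + masking_char * count
    return masking_char * count + text[count:]


print(mask_text("john@email.com", "*", 4, from_end=True))   # john@email****
print(mask_text("555-1234", "#", 3, from_end=False))        # ###-1234
```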

### Hashing

Replace detected text with hash values.

```python { .api }
class Hash(Operator):
    """Replaces detected text with hash value."""

# Configuration
OperatorConfig("hash", {
    "hash_type": str  # Hash algorithm (default: "sha256")
})
```

**Usage Example:**

```python
from presidio_anonymizer.entities import OperatorConfig

# SHA256 hash (default)
hash_config = OperatorConfig("hash")

# Specific hash algorithm
hash_sha512 = OperatorConfig("hash", {"hash_type": "sha512"})

# Result: "Contact a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3 at..."
```
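
For intuition about what a hash-based replacement looks like, the sketch below computes a plain, unsalted SHA-256 digest with the standard library. It is an assumption that this matches the operator's output exactly; whether the operator adds a salt or other processing may depend on the Presidio version. The helper name `sha256_hex` is hypothetical.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the hex-encoded SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


first = sha256_hex("123")
second = sha256_hex("123")
print(first)
print(first == second)  # identical inputs produce identical digests
print(len(first))       # a SHA-256 hex digest is 64 characters long
```

Because an unsalted digest is deterministic, short or guessable values (such as phone numbers) can be recovered by hashing candidate values, which is worth weighing when choosing this operator.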

### Encryption (Reversible)

Encrypt detected text using AES encryption for reversible anonymization.

```python { .api }
class Encrypt(Operator):
    """Encrypts detected text using AES encryption."""

# Configuration
OperatorConfig("encrypt", {
    "key": str  # AES key of 16, 24, or 32 characters (128, 192, or 256 bits); required
})
```

**Usage Example:**

```python
from presidio_anonymizer.entities import OperatorConfig

# AES encryption with key
encrypt_config = OperatorConfig("encrypt", {
    "key": "my-secret-key-32-characters-long"  # Exactly 32 characters (256-bit key)
})

# Result: encrypted text that can be decrypted later
```
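
AES accepts keys of 128, 192, or 256 bits, so a key string that is off by even one character will be rejected. The following standard-library sketch checks the length up front; the helper name `is_valid_aes_key` is hypothetical and the check only mirrors the AES key-size rule, it does not call Presidio.

```python
VALID_AES_KEY_SIZES = (16, 24, 32)  # bytes, i.e. 128, 192, or 256 bits


def is_valid_aes_key(key: str) -> bool:
    """Return True when the UTF-8 encoded key has a valid AES key length."""
    return len(key.encode("utf-8")) in VALID_AES_KEY_SIZES


print(is_valid_aes_key("my-secret-key-32-characters-long"))    # 32 bytes
print(is_valid_aes_key("my-secret-key-32-characters-long12"))  # 34 bytes
```

The length is measured in bytes after encoding, so non-ASCII characters count for more than one.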

### Keep Unchanged

Keep original text without any transformation.

```python { .api }
class Keep(Operator):
    """Keeps original text unchanged."""

# Configuration (no parameters required)
OperatorConfig("keep")
```

**Usage Example:**

```python
from presidio_anonymizer.entities import OperatorConfig

# Keep certain entity types unchanged
keep_config = OperatorConfig("keep")

# Useful for preserving certain PII types while anonymizing others
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "[PERSON]"}),
    "LOCATION": OperatorConfig("keep")  # Keep locations unchanged
}
```
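
The idea of an `operators` dictionary keyed by entity type can be illustrated without Presidio. The sketch below is a simplified stand-in, not the engine's implementation: `apply_operators`, the `Span` tuple, and the sample text are hypothetical, and leaving unlisted entity types untouched is a simplification that may differ from the engine's default behaviour.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type), end-exclusive


def apply_operators(text: str, spans: List[Span],
                    transforms: Dict[str, Callable[[str], str]]) -> str:
    """Apply one transform per entity type; unlisted types are left as-is."""
    # Work right to left so earlier offsets stay valid after each edit.
    for start, end, entity_type in sorted(spans, reverse=True):
        transform = transforms.get(entity_type, lambda original: original)
        text = text[:start] + transform(text[start:end]) + text[end:]
    return text


text = "Alice lives in Paris"
spans = [(0, 5, "PERSON"), (15, 20, "LOCATION")]
transforms = {
    "PERSON": lambda _: "[PERSON]",         # mirrors the "replace" entry
    "LOCATION": lambda original: original,  # mirrors the "keep" entry
}
print(apply_operators(text, spans, transforms))  # [PERSON] lives in Paris
```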

### Custom Operators

Apply custom anonymization logic by passing a function to the built-in `custom` operator. To write a full operator class instead, see Creating Custom Operators.

```python { .api }
class Custom(Operator):
    """Allows custom anonymization logic."""

# Configuration
OperatorConfig("custom", {
    "lambda": Callable[[str], str]  # Function that receives the detected text and returns its replacement
})
```

**Usage Example:**

```python
from presidio_anonymizer.entities import OperatorConfig

# Custom transformation using a lambda
# Note: pass a callable, not a string; it must accept the detected text and return a str
custom_config = OperatorConfig("custom", {
    "lambda": lambda x: x.upper().replace(' ', '_')
})

# Result: "JOHN_DOE" instead of "John Doe"
```
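
The transformation itself is plain Python, so it can be exercised on its own before it is wired into a config. The sketch below defines the same upper-snake-case transformation as a function object and runs a simple sanity check on it; `check_transform` and its probe value are hypothetical helpers for illustration, not part of Presidio.

```python
from typing import Callable


def check_transform(func: Callable[[str], str], probe: str = "PII") -> None:
    """Raise if `func` is not callable or does not return a string."""
    if not callable(func):
        raise TypeError("transform must be callable")
    if not isinstance(func(probe), str):
        raise TypeError("transform must return a str")


to_upper_snake = lambda x: x.upper().replace(" ", "_")
check_transform(to_upper_snake)
print(to_upper_snake("John Doe"))  # JOHN_DOE
```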

## Deanonymization Operators

### Decryption

Decrypt previously encrypted text using the matching key.

```python { .api }
class Decrypt(Operator):
    """Decrypts previously encrypted text."""

# Configuration
OperatorConfig("decrypt", {
    "key": str  # Must match original encryption key
})
```

**Usage Example:**

```python
from presidio_anonymizer.entities import OperatorConfig

# Must use same key as encryption
decrypt_config = OperatorConfig("decrypt", {
    "key": "my-secret-key-32-characters-long"  # Same as encrypt key
})
```

### Deanonymize Keep

Keep text unchanged during deanonymization.

```python { .api }
class DeanonymizeKeep(Operator):
    """Keeps text unchanged during deanonymization."""

# Configuration (no parameters required)
OperatorConfig("deanonymize_keep")
```

## Azure Health Data Services (Optional)

When the `ahds` optional dependency is installed, additional operators are available:

### AHDS Surrogate

Uses Azure Health Data Services for surrogate key generation.

```python { .api }
class AHDSSurrogate(Operator):
    """Azure Health Data Services surrogate key operator."""
    # Available only when 'ahds' dependencies are installed
```

**Installation:**

```bash
# Install with AHDS support (quoted so shells such as zsh do not expand the brackets)
pip install "presidio-anonymizer[ahds]"
```

**Availability Check:**

```python
from presidio_anonymizer.entities import OperatorConfig
from presidio_anonymizer.operators import AHDS_AVAILABLE

if AHDS_AVAILABLE:
    from presidio_anonymizer.operators import AHDSSurrogate

    # Configure AHDS surrogate operator
    ahds_config = OperatorConfig("ahds_surrogate", {
        # Configuration parameters for AHDS integration
    })

    # Use in operators dictionary
    operators = {
        "PERSON": ahds_config,
        "PHONE_NUMBER": ahds_config
    }
else:
    print("AHDS operators not available - install with: pip install presidio-anonymizer[ahds]")
```

**Feature Check in Code:**

```python
# Programmatically check for AHDS availability
def get_available_operators():
    """Get list of all available operators including optional ones."""
    from presidio_anonymizer.operators import AHDS_AVAILABLE

    operators = ['replace', 'mask', 'hash', 'encrypt', 'redact', 'keep', 'custom']
    if AHDS_AVAILABLE:
        operators.append('ahds_surrogate')

    return operators
```
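
Availability flags for optional dependencies are commonly derived from whether a module can be found at import time. The sketch below shows one generic way to do this with the standard library; it is not necessarily how `AHDS_AVAILABLE` is computed, and `is_module_available` and the missing module name are hypothetical.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True when `module_name` can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names whose parent package is missing, or bad names.
        return False


print(is_module_available("json"))                      # stdlib, always present
print(is_module_available("surely_not_installed_pkg"))  # hypothetical missing module
```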

## Operator Factory

Manage and extend available operators using the factory pattern.

```python { .api }
class OperatorsFactory:
    """Factory for managing anonymization and deanonymization operators."""

    def get_anonymizers(self) -> Dict[str, Type[Operator]]:
        """
        Get all available anonymization operators.

        Returns:
        Dict[str, Type[Operator]]: Dictionary mapping operator names to operator classes
        """

    def get_deanonymizers(self) -> Dict[str, Type[Operator]]:
        """
        Get all available deanonymization operators.

        Returns:
        Dict[str, Type[Operator]]: Dictionary mapping operator names to operator classes
        """

    def add_anonymize_operator(self, operator_cls: Type[Operator]) -> None:
        """
        Add a new anonymization operator.

        Parameters:
        - operator_cls (Type[Operator]): The operator class to add
        """

    def add_deanonymize_operator(self, operator_cls: Type[Operator]) -> None:
        """
        Add a new deanonymization operator.

        Parameters:
        - operator_cls (Type[Operator]): The operator class to add
        """

    def remove_anonymize_operator(self, operator_cls: Type[Operator]) -> None:
        """
        Remove an anonymization operator.

        Parameters:
        - operator_cls (Type[Operator]): The operator class to remove
        """

    def remove_deanonymize_operator(self, operator_cls: Type[Operator]) -> None:
        """
        Remove a deanonymization operator.

        Parameters:
        - operator_cls (Type[Operator]): The operator class to remove
        """
```

**Usage Example:**

```python
from presidio_anonymizer.operators import Operator, OperatorsFactory, OperatorType

# Access the factory (typically used internally by engines)
factory = OperatorsFactory()

# Get available operators
anonymizers = factory.get_anonymizers()
print(anonymizers.keys())  # e.g. 'replace', 'mask', 'hash', 'encrypt', 'redact', 'keep', 'custom'

deanonymizers = factory.get_deanonymizers()
print(deanonymizers.keys())  # e.g. 'decrypt', 'deanonymize_keep'

# Add custom operator; the Operator base class declares operate, validate,
# operator_name, and operator_type as abstract, so all four are implemented
class MyCustomOperator(Operator):
    def operate(self, text, params=None):
        return f"CUSTOM_{text.upper()}"

    def validate(self, params=None):
        pass  # No parameters to validate

    def operator_name(self):
        return "my_custom"

    def operator_type(self):
        return OperatorType.Anonymize

factory.add_anonymize_operator(MyCustomOperator)
```
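
For readers unfamiliar with the pattern, the sketch below shows a minimal name-to-class registry in plain Python. It only illustrates the general idea behind add, remove, and lookup by operator name; `BaseOp`, `UpperOp`, and `Registry` are hypothetical and do not reflect how `OperatorsFactory` is implemented internally.

```python
from typing import Dict, Type


class BaseOp:
    """Hypothetical stand-in for an operator base class."""
    name = "base"

    def operate(self, text: str) -> str:
        raise NotImplementedError


class UpperOp(BaseOp):
    name = "upper"

    def operate(self, text: str) -> str:
        return text.upper()


class Registry:
    """Maps operator names to classes, keyed by each class's `name`."""

    def __init__(self) -> None:
        self._ops: Dict[str, Type[BaseOp]] = {}

    def add(self, op_cls: Type[BaseOp]) -> None:
        if op_cls.name in self._ops:
            raise ValueError(f"operator {op_cls.name!r} is already registered")
        self._ops[op_cls.name] = op_cls

    def remove(self, op_cls: Type[BaseOp]) -> None:
        self._ops.pop(op_cls.name)  # raises KeyError for unknown operators

    def get_all(self) -> Dict[str, Type[BaseOp]]:
        return dict(self._ops)  # copy, so callers cannot mutate the registry


registry = Registry()
registry.add(UpperOp)
print(sorted(registry.get_all()))                     # ['upper']
print(registry.get_all()["upper"]().operate("john"))  # JOHN
```

Storing classes rather than instances lets the caller create a fresh operator for each use, which avoids sharing state between runs.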

## Creating Custom Operators

Extend the base Operator class to create custom transformation logic:

```python
import codecs

from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig
from presidio_anonymizer.operators import Operator, OperatorType

class CustomROT13(Operator):
    def operate(self, text, params=None):
        """Apply ROT13 transformation."""
        # codecs provides the text-to-text 'rot_13' codec on Python 3
        return codecs.encode(text, 'rot_13')

    def validate(self, params=None):
        """Validate operator parameters."""
        pass  # No parameters needed for ROT13

    def operator_name(self):
        """Name used to reference this operator in OperatorConfig."""
        return "custom_rot13"

    def operator_type(self):
        """Register as an anonymization operator."""
        return OperatorType.Anonymize

# Register with engine
engine = AnonymizerEngine()
engine.add_anonymizer(CustomROT13)

# Use in configuration
operators = {"PERSON": OperatorConfig("custom_rot13")}
```
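
The ROT13 transformation can be tried in isolation with the standard library. The sketch below uses the `rot_13` codec from `codecs`, which maps text to text on Python 3; the wrapper name `rot13` and the sample input are hypothetical.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot_13")


encoded = rot13("John Doe")
print(encoded)         # Wbua Qbr
print(rot13(encoded))  # applying ROT13 twice restores the input
```

Since ROT13 is its own inverse and needs no key, it serves here only as a compact example of a custom transformation and offers no real protection for PII.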

## Operator Selection Guidelines

- **Replace**: General purpose, good for placeholder values
- **Mask**: Partial visibility, good for debugging or partial concealment
- **Redact**: Complete removal, nothing of the original value remains in the text
- **Hash**: Consistent anonymization, enables analytics on anonymized data
- **Encrypt**: Reversible with the key, good for temporary anonymization
- **Keep**: Selective anonymization, preserves certain data types unchanged