# Data Entities

Core data structures for configuration, input, and results in presidio-anonymizer operations. These entities carry structured data between components and give each operator a typed configuration object.

## Capabilities

### Recognizer Results

Represents a detected PII entity with its location and confidence score, typically produced by presidio-analyzer.

```python { .api }
class RecognizerResult:
    def __init__(self, entity_type: str, start: int, end: int, score: float):
        """
        Initialize RecognizerResult.

        Parameters:
        - entity_type (str): Type of PII entity (e.g., "PERSON", "EMAIL_ADDRESS")
        - start (int): Start position in text (inclusive)
        - end (int): End position in text (exclusive)
        - score (float): Confidence score (0.0 to 1.0)
        """

    @classmethod
    def from_json(cls, data: Dict) -> "RecognizerResult":
        """
        Create RecognizerResult from JSON dictionary.

        Parameters:
        - data (Dict): Dictionary with keys: entity_type, start, end, score

        Returns:
        RecognizerResult: New instance from JSON data
        """

    def has_conflict(self, other: "RecognizerResult") -> bool:
        """
        Check if this result conflicts with another result.

        Parameters:
        - other (RecognizerResult): Another recognizer result to check against

        Returns:
        bool: True if this result has the same indices as `other` and a score
              that is not higher, or if this result is contained within `other`
        """

    def intersects(self, other: "RecognizerResult") -> int:
        """
        Check intersection with another result.

        Parameters:
        - other (RecognizerResult): Another recognizer result

        Returns:
        int: Number of intersecting characters (0 if no intersection)
        """

    def equal_indices(self, other: "RecognizerResult") -> bool:
        """
        Check if two results have the same start and end positions.

        Parameters:
        - other (RecognizerResult): Another recognizer result

        Returns:
        bool: True if indices match
        """

    def contains(self, other: "RecognizerResult") -> bool:
        """
        Check if this result contains another result within its boundaries.

        Parameters:
        - other (RecognizerResult): Another recognizer result

        Returns:
        bool: True if this result contains the other result (equal boundaries count)
        """
```

**Usage Examples:**

```python
from presidio_anonymizer.entities import RecognizerResult

# Create from constructor
result = RecognizerResult(
    entity_type="PERSON",
    start=11,
    end=19,
    score=0.85
)

# Create from JSON (typically used in API scenarios)
json_data = {
    "entity_type": "EMAIL_ADDRESS",
    "start": 25,
    "end": 42,
    "score": 0.95
}
result = RecognizerResult.from_json(json_data)

# Check for conflicts: result1 (10-20) lies entirely inside result2 (5-25)
result1 = RecognizerResult("PERSON", 10, 20, 0.8)
result2 = RecognizerResult("LOCATION", 5, 25, 0.9)
if result1.has_conflict(result2):
    print("result1 is contained in result2 and may need conflict resolution")
```
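
The span comparisons documented above (`intersects`, `contains`, `equal_indices`) are easier to reason about with concrete numbers. The following is a minimal sketch using a hypothetical `Span` dataclass as a stand-in; it is not a presidio class, and it only assumes the half-open `[start, end)` convention stated in the constructor docstring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a recognizer result; not part of presidio."""
    entity_type: str
    start: int  # inclusive
    end: int    # exclusive
    score: float

    def equal_indices(self, other: "Span") -> bool:
        # Same start and same end position.
        return self.start == other.start and self.end == other.end

    def contains(self, other: "Span") -> bool:
        # True when `other` fits inside this span (equal boundaries count).
        return self.start <= other.start and self.end >= other.end

    def intersects(self, other: "Span") -> int:
        # Length of the shared region, or 0 when the spans are disjoint.
        return max(0, min(self.end, other.end) - max(self.start, other.start))


person = Span("PERSON", 10, 20, 0.8)
location = Span("LOCATION", 15, 25, 0.9)
wide = Span("LOCATION", 5, 25, 0.9)

print(person.intersects(location))  # 5 (positions 15 through 19 are shared)
print(wide.contains(person))        # True
print(person.equal_indices(wide))   # False
```

Partial overlap and containment are different situations: `person` and `location` share five characters, but neither contains the other.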

### Operator Configuration

Configuration object that specifies which operator to use and its parameters for anonymizing specific entity types.

```python { .api }
class OperatorConfig:
    def __init__(self, operator_name: str, params: Dict = None):
        """
        Create operator configuration.

        Parameters:
        - operator_name (str): Name of the operator ("replace", "mask", "encrypt", etc.)
        - params (Dict, optional): Parameters for the operator
        """

    @classmethod
    def from_json(cls, params: Dict) -> "OperatorConfig":
        """
        Create OperatorConfig from JSON dictionary.

        Parameters:
        - params (Dict): Dictionary with "type" key and operator parameters

        Returns:
        OperatorConfig: New configuration instance
        """

    def __eq__(self, other: "OperatorConfig") -> bool:
        """Check equality with another OperatorConfig."""
```

**Usage Examples:**

```python
from presidio_anonymizer.entities import OperatorConfig

# Simple operator without parameters
redact_config = OperatorConfig("redact")

# Operator with parameters
mask_config = OperatorConfig("mask", {
    "masking_char": "*",
    "chars_to_mask": 4,
    "from_end": True
})

# Encryption operator; the AES key must be 16, 24, or 32 characters long
# (this sample key is exactly 32 characters; never hard-code real keys)
encrypt_config = OperatorConfig("encrypt", {
    "key": "my-secret-key-32-characters-long"
})

# Create from JSON (API scenarios)
json_config = {
    "type": "replace",
    "new_value": "[CONFIDENTIAL]"
}
replace_config = OperatorConfig.from_json(json_config)

# Use in operators dictionary
operators = {
    "PERSON": mask_config,
    "EMAIL_ADDRESS": encrypt_config,
    "PHONE_NUMBER": redact_config
}
```
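
The JSON form of an operator configuration keeps the operator name under the `"type"` key and treats every other key as a parameter. The sketch below illustrates that convention with a hypothetical helper, `split_operator_json`; it is not part of presidio and makes no claim about how `OperatorConfig.from_json` is implemented internally.

```python
from typing import Any, Dict, Optional, Tuple


def split_operator_json(payload: Dict[str, Any]) -> Tuple[Optional[str], Dict[str, Any]]:
    """Hypothetical helper: separate the "type" key from the operator parameters."""
    params = dict(payload)  # copy so the caller's dictionary is left untouched
    operator_name = params.pop("type", None)
    return operator_name, params


name, params = split_operator_json({"type": "replace", "new_value": "[CONFIDENTIAL]"})
print(name)    # replace
print(params)  # {'new_value': '[CONFIDENTIAL]'}
```

Copying the payload first is a deliberate choice in this sketch: the same JSON dictionary can then be reused or logged after it has been parsed.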

### Engine Results

Contains the results of anonymization or deanonymization operations, including the transformed text and metadata.

```python { .api }
class EngineResult:
    """
    Result of anonymization or deanonymization operation.

    Attributes:
    - text (str): The transformed text
    - items (List[OperatorResult]): Metadata about applied transformations
    """

    def __init__(self, text: str = None, items: List[OperatorResult] = None):
        """
        Initialize EngineResult.

        Parameters:
        - text (str, optional): The transformed text
        - items (List[OperatorResult], optional): Metadata about transformations, defaults to empty list
        """

    def to_json(self) -> str:
        """
        Convert result to JSON string.

        Returns:
        str: JSON representation of the result
        """

    def set_text(self, text: str) -> None:
        """
        Set the text attribute.

        Parameters:
        - text (str): The text to set
        """

    def add_item(self, item: OperatorResult) -> None:
        """
        Add an operator result item.

        Parameters:
        - item (OperatorResult): The operator result to add
        """
```

**Usage Example:**

```python
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig, RecognizerResult

# Sample inputs; analyzer_results would normally come from presidio-analyzer
text = "My name is John Doe"
analyzer_results = [RecognizerResult("PERSON", 11, 19, 0.85)]
operators = {"PERSON": OperatorConfig("replace", {"new_value": "<PERSON>"})}

engine = AnonymizerEngine()
result = engine.anonymize(text, analyzer_results, operators)

print(f"Anonymized text: {result.text}")
print(f"Number of transformations: {len(result.items)}")

# Access transformation details
for item in result.items:
    print(f"Entity {item.entity_type} at {item.start}-{item.end} -> '{item.text}'")

# Convert to JSON for API responses
json_result = result.to_json()
```

### Operator Results

Metadata about individual operator transformations applied during anonymization.

```python { .api }
class OperatorResult:
    """
    Result of applying an operator to a PII entity.

    Attributes:
    - start (int): Start position of transformed text
    - end (int): End position of transformed text
    - entity_type (str): Type of PII entity that was transformed
    - text (str): The replacement text that was inserted
    - operator (str): Name of operator that was applied
    """

    def __init__(self, start: int, end: int, entity_type: str, text: str = None, operator: str = None):
        """
        Initialize OperatorResult.

        Parameters:
        - start (int): Start position of transformed text
        - end (int): End position of transformed text
        - entity_type (str): Type of PII entity that was transformed
        - text (str, optional): The replacement text that was inserted
        - operator (str, optional): Name of operator that was applied
        """

    def to_dict(self) -> Dict:
        """
        Convert to dictionary representation.

        Returns:
        Dict: Dictionary with all attributes
        """

    @classmethod
    def from_json(cls, json: Dict) -> "OperatorResult":
        """
        Create OperatorResult from JSON dictionary.

        Parameters:
        - json (Dict): Dictionary with operator result data

        Returns:
        OperatorResult: New instance from JSON data
        """
```

**Usage Example:**

```python
# Access operator results from engine result
# (assumes `engine`, `text`, `analyzer_results`, and `operators` are already defined)
result = engine.anonymize(text, analyzer_results, operators)

for op_result in result.items:
    print(f"Applied {op_result.operator} to {op_result.entity_type}")
    print(f"Position: {op_result.start}-{op_result.end}")
    print(f"Replacement: '{op_result.text}'")
```

### PII Entity Base

Base class for PII entities with position information.

```python { .api }
class PIIEntity:
    """
    Base class for PII entities with position information.

    Attributes:
    - start (int): Start position in text
    - end (int): End position in text
    - entity_type (str): Type of PII entity
    """

    def __init__(self, start: int, end: int, entity_type: str):
        """
        Initialize PIIEntity.

        Parameters:
        - start (int): Start position in text
        - end (int): End position in text
        - entity_type (str): Type of PII entity
        """
```

### Dictionary Recognizer Results

Specialized recognizer result for structured data like dictionaries and nested objects.

```python { .api }
@dataclass
class DictRecognizerResult:
    """
    Recognizer result for dictionary/structured data analysis.

    This is a dataclass that represents analyzer results for structured data like dictionaries.

    Attributes:
    - key (str): Dictionary key or field name
    - value (Union[str, List[str], dict]): The value associated with the key
    - recognizer_results (Union[List[RecognizerResult], List[List[RecognizerResult]], Iterator["DictRecognizerResult"]]): Analyzer results for the value
    """
```

**Usage Example:**

```python
from presidio_anonymizer.entities import DictRecognizerResult, RecognizerResult

# Used with BatchAnonymizerEngine for dictionary data
dict_result = DictRecognizerResult(
    key="user_email",
    value="john@example.com",
    recognizer_results=[RecognizerResult("EMAIL_ADDRESS", 0, 16, 0.9)]
)
```

### Conflict Resolution Strategy

Enumeration defining strategies for handling overlapping PII entities.

```python { .api }
class ConflictResolutionStrategy(Enum):
    MERGE_SIMILAR_OR_CONTAINED = "merge_similar_or_contained"
    REMOVE_INTERSECTIONS = "remove_intersections"
```

**Usage Example:**

```python
from presidio_anonymizer.entities import ConflictResolutionStrategy

# Use in anonymization calls
# (assumes `engine`, `text`, `analyzer_results`, and `operators` are already defined)
result = engine.anonymize(
    text=text,
    analyzer_results=analyzer_results,
    operators=operators,
    conflict_resolution=ConflictResolutionStrategy.REMOVE_INTERSECTIONS
)
```
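
When the strategy comes from a configuration file or request payload, it usually arrives as a plain string. The sketch below shows one way to map such a string to an enum member. `Strategy` is a local stand-in that mirrors the two values listed above so the snippet runs without presidio installed, and `parse_strategy` is a hypothetical helper, not a library function.

```python
from enum import Enum


class Strategy(Enum):
    """Stand-in mirroring the two documented values; not imported from presidio."""
    MERGE_SIMILAR_OR_CONTAINED = "merge_similar_or_contained"
    REMOVE_INTERSECTIONS = "remove_intersections"


def parse_strategy(raw: str) -> Strategy:
    """Hypothetical helper: map a configuration string to an enum member."""
    try:
        # Enum lookup by value; normalize case and surrounding whitespace first.
        return Strategy(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(member.value for member in Strategy)
        raise ValueError(f"Unknown strategy {raw!r}; expected one of: {allowed}") from None


print(parse_strategy("REMOVE_INTERSECTIONS").name)  # REMOVE_INTERSECTIONS
```

The same value-based lookup should apply to the real `ConflictResolutionStrategy`, since it is declared as a standard `Enum` with string values.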

### Exception Classes

Custom exceptions for error handling.

```python { .api }
class InvalidParamError(Exception):
    """Exception raised for invalid parameters."""

    def __init__(self, msg: str):
        """
        Initialize InvalidParamError.

        Parameters:
        - msg (str): Error message
        """
```

**Usage Example:**

```python
from presidio_anonymizer.entities import InvalidParamError

# `engine`, `text`, and `analyzer_results` are assumed to be defined;
# `invalid_operators` is a placeholder for a misconfigured operators dictionary
try:
    result = engine.anonymize(text, analyzer_results, invalid_operators)
except InvalidParamError as e:
    print(f"Invalid configuration: {e}")
```

## Entity Relationships

- **Input Flow**: `RecognizerResult` + `OperatorConfig` → Engine → `EngineResult`
- **Output Details**: `EngineResult` contains `OperatorResult` items
- **Batch Processing**: `DictRecognizerResult` for structured data
- **Error Handling**: `InvalidParamError` for configuration issues
- **Base Classes**: `PIIEntity` provides common position functionality
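
The input flow in the list above (detections plus per-entity configuration in, transformed text plus per-item metadata out) can be illustrated with plain Python. This is a sketch under stated assumptions, not presidio's implementation: `Detection` and `apply_replacements` are hypothetical names, spans are assumed not to overlap, and the only "operator" is simple replacement.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """Hypothetical stand-in for a recognizer result."""
    entity_type: str
    start: int  # inclusive
    end: int    # exclusive


def apply_replacements(
    text: str, detections: List[Detection], replacements: Dict[str, str]
) -> Tuple[str, List[dict]]:
    """Replace each detected span, working right to left so earlier offsets stay valid."""
    items = []
    for det in sorted(detections, key=lambda d: d.start, reverse=True):
        # Fall back to a "<ENTITY_TYPE>" placeholder when no replacement is configured.
        new_value = replacements.get(det.entity_type, f"<{det.entity_type}>")
        text = text[:det.start] + new_value + text[det.end:]
        items.append({"entity_type": det.entity_type, "text": new_value})
    return text, items


text = "My name is John Doe, email john@example.com"
detections = [Detection("PERSON", 11, 19), Detection("EMAIL_ADDRESS", 27, 43)]

anonymized, items = apply_replacements(text, detections, {"PERSON": "<PERSON>"})
print(anonymized)  # My name is <PERSON>, email <EMAIL_ADDRESS>
```

Processing from the end of the text toward the start means a replacement of a different length never shifts the offsets of spans that have not been handled yet.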

## JSON Serialization

`RecognizerResult`, `OperatorConfig`, and `OperatorResult` can be built from JSON dictionaries with `from_json`, and `EngineResult` serializes to a JSON string with `to_json`:

```python
from presidio_anonymizer.entities import OperatorConfig, RecognizerResult

# RecognizerResult from JSON
result = RecognizerResult.from_json({
    "entity_type": "PERSON",
    "start": 0,
    "end": 8,
    "score": 0.9
})

# OperatorConfig from JSON
config = OperatorConfig.from_json({
    "type": "mask",
    "masking_char": "*",
    "chars_to_mask": 4
})

# EngineResult to JSON (`engine_result` is the value returned by engine.anonymize)
json_output = engine_result.to_json()
```
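
On the receiving side, the JSON string can be read back with the standard `json` module. The sketch below uses a hand-written payload whose keys follow the `EngineResult` and `OperatorResult` attributes documented above; the exact shape emitted by `to_json()` is an assumption here and should be checked against real output.

```python
import json

# Hypothetical payload; key names follow the documented attributes, not verified output.
payload = json.dumps({
    "text": "Hello <PERSON>",
    "items": [
        {"start": 6, "end": 14, "entity_type": "PERSON",
         "text": "<PERSON>", "operator": "replace"}
    ],
})

data = json.loads(payload)
for item in data["items"]:
    # start/end index into the transformed text, so the slice is the replacement itself.
    replaced = data["text"][item["start"]:item["end"]]
    print(f"{item['operator']} on {item['entity_type']}: {replaced}")
```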