Tessl Tile for pypi/presidio-anonymizer@2.2.0

# Data Entities

Core data structures used for configuration, input, and results in presidio-anonymizer operations. These entities enable structured communication between components and provide type-safe configuration.

## Capabilities

### Recognizer Results

Represents detected PII entities with location and confidence information, typically provided by presidio-analyzer.

```python { .api }
class RecognizerResult:
    def __init__(self, entity_type: str, start: int, end: int, score: float):
        """
        Initialize RecognizerResult.

        Parameters:
        - entity_type (str): Type of PII entity (e.g., "PERSON", "EMAIL_ADDRESS")
        - start (int): Start position in text (inclusive)
        - end (int): End position in text (exclusive)
        - score (float): Confidence score (0.0 to 1.0)
        """

    @classmethod
    def from_json(cls, data: Dict) -> "RecognizerResult":
        """
        Create RecognizerResult from JSON dictionary.

        Parameters:
        - data (Dict): Dictionary with keys: entity_type, start, end, score

        Returns:
        RecognizerResult: New instance from JSON data
        """

    def has_conflict(self, other: "RecognizerResult") -> bool:
        """
        Check if this result conflicts with another result.

        Parameters:
        - other (RecognizerResult): Another recognizer result to check against

        Returns:
        bool: True if this result has the same span as `other` with a score that is
        not higher, or if this result's span is contained in `other`
        """

    def intersects(self, other: "RecognizerResult") -> int:
        """
        Check intersection with another result.

        Parameters:
        - other (RecognizerResult): Another recognizer result

        Returns:
        int: Number of intersecting characters (0 if no intersection)
        """

    def equal_indices(self, other: "RecognizerResult") -> bool:
        """
        Check if two results have the same start and end positions.

        Parameters:
        - other (RecognizerResult): Another recognizer result

        Returns:
        bool: True if indices match
        """

    def contains(self, other: "RecognizerResult") -> bool:
        """
        Check if this result contains another result within its boundaries.

        Parameters:
        - other (RecognizerResult): Another recognizer result

        Returns:
        bool: True if this result contains the other result
        """
```

**Usage Examples:**

```python
from presidio_anonymizer.entities import RecognizerResult

# Create from constructor
result = RecognizerResult(
    entity_type="PERSON",
    start=11,
    end=19,
    score=0.85
)

# Create from JSON (typically used in API scenarios)
json_data = {
    "entity_type": "EMAIL_ADDRESS",
    "start": 25,
    "end": 42,
    "score": 0.95
}
result = RecognizerResult.from_json(json_data)

# Check for conflicts: result1 conflicts with result2 when result1's span lies
# inside result2's span (or the spans match and result1 does not score higher)
result1 = RecognizerResult("PERSON", 10, 20, 0.8)
result2 = RecognizerResult("LOCATION", 5, 25, 0.9)
if result1.has_conflict(result2):
    print("Results overlap and may need conflict resolution")
```
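
The comparison methods above treat each result as a half-open `[start, end)` range. The arithmetic can be sketched with the standard library alone. `Span` below is a hypothetical stand-in written for illustration, not the library's `RecognizerResult`, and its method bodies are an assumption based on the descriptions in this section.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for a recognizer result: a half-open [start, end) range."""
    entity_type: str
    start: int
    end: int
    score: float

    def intersects(self, other: "Span") -> int:
        # Overlap length of two half-open ranges; 0 when they do not overlap.
        return max(0, min(self.end, other.end) - max(self.start, other.start))

    def contains(self, other: "Span") -> bool:
        # True when the other span lies entirely within this one.
        return self.start <= other.start and self.end >= other.end

    def equal_indices(self, other: "Span") -> bool:
        # True when both boundaries match exactly.
        return self.start == other.start and self.end == other.end


person = Span("PERSON", 10, 20, 0.8)
location = Span("LOCATION", 15, 25, 0.9)
wide = Span("LOCATION", 5, 25, 0.9)

overlap = person.intersects(location)  # positions 15 through 19 are shared
print(overlap)
print(wide.contains(person))
print(person.equal_indices(location))
```

Because `end` is exclusive, two spans that merely touch (one ends where the other starts) share no characters under this arithmetic.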

### Operator Configuration

Configuration object that specifies which operator to use and its parameters for anonymizing specific entity types.

```python { .api }
class OperatorConfig:
    def __init__(self, operator_name: str, params: Dict = None):
        """
        Create operator configuration.

        Parameters:
        - operator_name (str): Name of the operator ("replace", "mask", "encrypt", etc.)
        - params (Dict, optional): Parameters for the operator
        """

    @classmethod
    def from_json(cls, params: Dict) -> "OperatorConfig":
        """
        Create OperatorConfig from JSON dictionary.

        Parameters:
        - params (Dict): Dictionary with "type" key and operator parameters

        Returns:
        OperatorConfig: New configuration instance
        """

    def __eq__(self, other: "OperatorConfig") -> bool:
        """Check equality with another OperatorConfig."""
```

**Usage Examples:**

```python
from presidio_anonymizer.entities import OperatorConfig

# Simple operator without parameters
redact_config = OperatorConfig("redact")

# Operator with parameters
mask_config = OperatorConfig("mask", {
    "masking_char": "*",
    "chars_to_mask": 4,
    "from_end": True
})

# Complex operator configuration
# The AES key must be 128, 192, or 256 bits long (16, 24, or 32 characters)
encrypt_config = OperatorConfig("encrypt", {
    "key": "my-secret-key-32-characters-long"
})

# Create from JSON (API scenarios)
json_config = {
    "type": "replace",
    "new_value": "[CONFIDENTIAL]"
}
replace_config = OperatorConfig.from_json(json_config)

# Use in operators dictionary
operators = {
    "PERSON": mask_config,
    "EMAIL_ADDRESS": encrypt_config,
    "PHONE_NUMBER": redact_config
}
```
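
The JSON form folds the operator name and its parameters into one flat dictionary, with the name stored under `"type"`. The following stdlib-only sketch shows that mapping. `split_operator_json` is a hypothetical helper written for illustration, not part of presidio-anonymizer, and the library's own `from_json` may treat its input differently (for example, it may modify the dictionary it receives).

```python
from typing import Any, Dict, Tuple


def split_operator_json(config: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: separate the "type" key from the operator parameters."""
    params = dict(config)  # copy so the caller's dictionary is left untouched
    operator_name = params.pop("type")
    return operator_name, params


source = {"type": "replace", "new_value": "[CONFIDENTIAL]"}
name, params = split_operator_json(source)
print(name)
print(params)
```

Copying the dictionary before removing `"type"` keeps the original payload reusable, which matters when the same JSON configuration is parsed more than once.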

### Engine Results

Contains the results of anonymization or deanonymization operations, including the transformed text and metadata.

```python { .api }
class EngineResult:
    """
    Result of anonymization or deanonymization operation.

    Attributes:
    - text (str): The transformed text
    - items (List[OperatorResult]): Metadata about applied transformations
    """

    def __init__(self, text: str = None, items: List[OperatorResult] = None):
        """
        Initialize EngineResult.

        Parameters:
        - text (str, optional): The transformed text
        - items (List[OperatorResult], optional): Metadata about transformations, defaults to empty list
        """

    def to_json(self) -> str:
        """
        Convert result to JSON string.

        Returns:
        str: JSON representation of the result
        """

    def set_text(self, text: str) -> None:
        """
        Set the text attribute.

        Parameters:
        - text (str): The text to set
        """

    def add_item(self, item: OperatorResult) -> None:
        """
        Add an operator result item.

        Parameters:
        - item (OperatorResult): The operator result to add
        """
```

**Usage Example:**

```python
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig, RecognizerResult

# Sample inputs (illustrative values)
text = "My name is John Doe"
analyzer_results = [RecognizerResult("PERSON", 11, 19, 0.85)]
operators = {"PERSON": OperatorConfig("replace", {"new_value": "<PERSON>"})}

engine = AnonymizerEngine()
result = engine.anonymize(text, analyzer_results, operators)

print(f"Anonymized text: {result.text}")
print(f"Number of transformations: {len(result.items)}")

# Access transformation details
for item in result.items:
    print(f"Entity {item.entity_type} at {item.start}-{item.end} -> '{item.text}'")

# Convert to JSON for API responses
json_result = result.to_json()
```
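
For API responses, it helps to picture a result as a text field plus a list of per-entity records. The sketch below builds such a payload with the standard library. `ResultSketch` and `ItemSketch` are hypothetical stand-ins whose fields follow the attributes documented in this section. The exact string that `to_json` produces is not shown in this document, so the shape here is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ItemSketch:
    """Hypothetical stand-in for one transformation record."""
    start: int
    end: int
    entity_type: str
    text: str
    operator: str


@dataclass
class ResultSketch:
    """Hypothetical stand-in for an engine result: text plus item metadata."""
    text: str
    items: List[ItemSketch] = field(default_factory=list)


sketch = ResultSketch(
    text="My name is <PERSON>",
    items=[ItemSketch(11, 19, "PERSON", "<PERSON>", "replace")],
)

payload = json.dumps(asdict(sketch))  # serialize nested dataclasses to JSON
decoded = json.loads(payload)         # parse back to verify the round trip
print(decoded["items"][0]["entity_type"])
```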

### Operator Results

Metadata about individual operator transformations applied during anonymization.

```python { .api }
class OperatorResult:
    """
    Result of applying an operator to a PII entity.

    Attributes:
    - start (int): Start position of transformed text
    - end (int): End position of transformed text
    - entity_type (str): Type of PII entity that was transformed
    - text (str): The replacement text that was inserted
    - operator (str): Name of operator that was applied
    """

    def __init__(self, start: int, end: int, entity_type: str, text: str = None, operator: str = None):
        """
        Initialize OperatorResult.

        Parameters:
        - start (int): Start position of transformed text
        - end (int): End position of transformed text
        - entity_type (str): Type of PII entity that was transformed
        - text (str, optional): The replacement text that was inserted
        - operator (str, optional): Name of operator that was applied
        """

    def to_dict(self) -> Dict:
        """
        Convert to dictionary representation.

        Returns:
        Dict: Dictionary with all attributes
        """

    @classmethod
    def from_json(cls, json: Dict) -> "OperatorResult":
        """
        Create OperatorResult from JSON dictionary.

        Parameters:
        - json (Dict): Dictionary with operator result data

        Returns:
        OperatorResult: New instance from JSON data
        """
```

**Usage Example:**

```python
# Access operator results from engine result
# (assumes engine, text, analyzer_results, and operators are already defined)
result = engine.anonymize(text, analyzer_results, operators)

for op_result in result.items:
    print(f"Applied {op_result.operator} to {op_result.entity_type}")
    print(f"Position: {op_result.start}-{op_result.end}")
    print(f"Replacement: '{op_result.text}'")
```
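
Since `start` and `end` are described as positions in the transformed text, slicing the anonymized string with them should recover each replacement. The sketch below illustrates this with hand-written dictionaries. The values are illustrative and not output captured from the library, and an exclusive `end` is assumed, matching the convention documented for `RecognizerResult`.

```python
# Illustrative anonymized text and item metadata (hand-written, not library output)
anonymized_text = "My name is <PERSON>, email: <EMAIL_ADDRESS>"
items = [
    {"start": 28, "end": 43, "entity_type": "EMAIL_ADDRESS",
     "text": "<EMAIL_ADDRESS>", "operator": "replace"},
    {"start": 11, "end": 19, "entity_type": "PERSON",
     "text": "<PERSON>", "operator": "replace"},
]

# Slice the transformed text with each item's offsets (end is exclusive)
slices = [anonymized_text[item["start"]:item["end"]] for item in items]

for item, fragment in zip(items, slices):
    matches = fragment == item["text"]
    print(f"{item['entity_type']}: {fragment!r} matches={matches}")
```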

### PII Entity Base

Base class for PII entities with position information.

```python { .api }
class PIIEntity:
    """
    Base class for PII entities with position information.

    Attributes:
    - start (int): Start position in text
    - end (int): End position in text
    - entity_type (str): Type of PII entity
    """

    def __init__(self, start: int, end: int, entity_type: str):
        """
        Initialize PIIEntity.

        Parameters:
        - start (int): Start position in text
        - end (int): End position in text
        - entity_type (str): Type of PII entity
        """
```

### Dictionary Recognizer Results

Specialized recognizer result for structured data like dictionaries and nested objects.

```python { .api }
@dataclass
class DictRecognizerResult:
    """
    Recognizer result for dictionary/structured data analysis.

    This is a dataclass that represents analyzer results for structured data like dictionaries.

    Attributes:
    - key (str): Dictionary key or field name
    - value (Union[str, List[str], dict]): The value associated with the key
    - recognizer_results (Union[List[RecognizerResult], List[List[RecognizerResult]], Iterator["DictRecognizerResult"]]): Analyzer results for the value
    """
```

**Usage Example:**

```python
from presidio_anonymizer.entities import DictRecognizerResult, RecognizerResult

# Used with BatchAnonymizerEngine for dictionary data
dict_result = DictRecognizerResult(
    key="user_email",
    value="john@example.com",
    recognizer_results=[RecognizerResult("EMAIL_ADDRESS", 0, 16, 0.9)]
)
```
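
The `recognizer_results` attribute takes one of three shapes depending on whether `value` is a string, a list of strings, or a nested dictionary. The sketch below walks all three with the standard library. `Finding`, `DictFinding`, and `count_findings` are hypothetical stand-ins written for illustration. They mirror the documented attributes but are not the library's classes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Finding:
    """Hypothetical stand-in for RecognizerResult."""
    entity_type: str
    start: int
    end: int
    score: float


@dataclass
class DictFinding:
    """Hypothetical stand-in mirroring key, value, and recognizer_results."""
    key: str
    value: Any
    recognizer_results: Any


def count_findings(entry: DictFinding) -> int:
    """Count leaf findings across the three documented value shapes."""
    total = 0
    for element in entry.recognizer_results:
        if isinstance(element, DictFinding):  # value is a nested dict
            total += count_findings(element)
        elif isinstance(element, list):       # value is a list of strings
            total += len(element)
        else:                                 # value is a single string
            total += 1
    return total


single = DictFinding("email", "john@example.com",
                     [Finding("EMAIL_ADDRESS", 0, 16, 0.9)])
multi = DictFinding("names", ["John", "Ann Lee"],
                    [[Finding("PERSON", 0, 4, 0.8)], [Finding("PERSON", 0, 7, 0.8)]])
inner = DictFinding("email", "john@example.com",
                    [Finding("EMAIL_ADDRESS", 0, 16, 0.9)])
nested = DictFinding("user", {"email": "john@example.com"}, iter([inner]))

# The nested case holds an iterator, so each entry is counted exactly once here
counts = [count_findings(single), count_findings(multi), count_findings(nested)]
print(counts)
```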

### Conflict Resolution Strategy

Enumeration defining strategies for handling overlapping PII entities.

```python { .api }
class ConflictResolutionStrategy(Enum):
    MERGE_SIMILAR_OR_CONTAINED = "merge_similar_or_contained"
    REMOVE_INTERSECTIONS = "remove_intersections"
```

**Usage Example:**

```python
from presidio_anonymizer.entities import ConflictResolutionStrategy

# Use in anonymization calls
result = engine.anonymize(
    text=text,
    analyzer_results=analyzer_results,
    operators=operators,
    conflict_resolution=ConflictResolutionStrategy.REMOVE_INTERSECTIONS
)
```

### Exception Classes

Custom exceptions for error handling.

```python { .api }
class InvalidParamError(Exception):
    """Exception raised for invalid parameters."""

    def __init__(self, msg: str):
        """
        Initialize InvalidParamError.

        Parameters:
        - msg (str): Error message
        """
```

**Usage Example:**

```python
from presidio_anonymizer.entities import InvalidParamError

try:
    result = engine.anonymize(text, analyzer_results, invalid_operators)
except InvalidParamError as e:
    print(f"Invalid configuration: {e}")
```

## Entity Relationships

- **Input Flow**: `RecognizerResult` + `OperatorConfig` → Engine → `EngineResult`
- **Output Details**: `EngineResult` contains `OperatorResult` items
- **Batch Processing**: `DictRecognizerResult` for structured data
- **Error Handling**: `InvalidParamError` for configuration issues
- **Base Classes**: `PIIEntity` provides common position functionality

## JSON Serialization

Most entities support JSON serialization for API integration:

```python
from presidio_anonymizer.entities import OperatorConfig, RecognizerResult

# RecognizerResult from JSON
result = RecognizerResult.from_json({
    "entity_type": "PERSON",
    "start": 0,
    "end": 8,
    "score": 0.9
})

# OperatorConfig from JSON
config = OperatorConfig.from_json({
    "type": "mask",
    "masking_char": "*",
    "chars_to_mask": 4
})

# EngineResult to JSON (engine_result is the value returned by engine.anonymize)
json_output = engine_result.to_json()
```
