# Presidio Anonymizer

Presidio Anonymizer is a Python library for anonymizing and deanonymizing text by replacing or transforming personally identifiable information (PII) entities. It takes analyzer results containing PII locations and applies configurable anonymization operators like masking, replacement, encryption, or redaction.

## Package Information

- **Package Name**: presidio_anonymizer
- **Language**: Python
- **Installation**: `pip install presidio-anonymizer`
- **Server Mode**: `pip install presidio-anonymizer[server]` (includes Flask REST API)
- **Azure Health Services**: `pip install presidio-anonymizer[ahds]` (includes AHDS operators)

## Core Imports

```python
from presidio_anonymizer import (
    AnonymizerEngine,
    DeanonymizeEngine,
    BatchAnonymizerEngine,
    RecognizerResult,
    OperatorConfig,
    ConflictResolutionStrategy
)
```

## Basic Usage

```python
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorConfig

# Initialize the anonymizer engine
engine = AnonymizerEngine()

# Define the text to anonymize
text = "My name is John Doe and my email is john.doe@example.com"

# Define analyzer results (typically from presidio-analyzer).
# Offsets are 0-based and `end` is exclusive; the text is 56 characters long.
analyzer_results = [
    RecognizerResult(entity_type="PERSON", start=11, end=19, score=0.9),
    RecognizerResult(entity_type="EMAIL_ADDRESS", start=36, end=56, score=0.9)
]

# Configure anonymization operators
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "[PERSON]"}),
    # Mask the last 5 characters of the email address
    "EMAIL_ADDRESS": OperatorConfig("mask", {"masking_char": "*", "chars_to_mask": 5, "from_end": True})
}

# Anonymize the text
result = engine.anonymize(
    text=text,
    analyzer_results=analyzer_results,
    operators=operators
)

print(result.text)  # "My name is [PERSON] and my email is john.doe@exampl*****"
```
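
When analyzer results are written by hand rather than produced by presidio-analyzer, the offsets are easy to get wrong. The sketch below uses only the standard library to compute them, assuming 0-based offsets with an exclusive `end` so that `text[start:end]` is the entity. The helper `span_of` is hypothetical and not part of Presidio.

```python
from typing import Tuple

text = "My name is John Doe and my email is john.doe@example.com"


def span_of(haystack: str, needle: str) -> Tuple[int, int]:
    """Return (start, end) of the first occurrence of needle; end is exclusive."""
    start = haystack.find(needle)
    if start == -1:
        raise ValueError(f"{needle!r} not found in text")
    return start, start + len(needle)


person_span = span_of(text, "John Doe")
email_span = span_of(text, "john.doe@example.com")

print(person_span)  # (11, 19)
print(email_span)   # (36, 56)
```

The resulting pairs can be passed as `start` and `end` when constructing a `RecognizerResult`.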

## Architecture

Presidio Anonymizer uses a modular architecture:

- **Engines**: Core processing engines (AnonymizerEngine, DeanonymizeEngine, BatchAnonymizerEngine)
- **Operators**: Pluggable transformation modules (Replace, Mask, Hash, Encrypt, etc.)
- **Entities**: Data structures for configuration and results (RecognizerResult, OperatorConfig, EngineResult)
- **Conflict Resolution**: Strategies for handling overlapping PII entities
- **Factory Pattern**: OperatorsFactory manages available operators and enables custom extensions

## Capabilities

### Core Anonymization

Primary anonymization functionality using the AnonymizerEngine. Processes text with analyzer results and applies configurable operators to transform PII entities.

```python { .api }
class AnonymizerEngine:
    def anonymize(
        self,
        text: str,
        analyzer_results: List[RecognizerResult],
        operators: Optional[Dict[str, OperatorConfig]] = None,
        conflict_resolution: ConflictResolutionStrategy = ConflictResolutionStrategy.MERGE_SIMILAR_OR_CONTAINED
    ) -> EngineResult: ...

    def get_anonymizers(self) -> List[str]: ...
    def add_anonymizer(self, anonymizer_cls: Type[Operator]) -> None: ...
    def remove_anonymizer(self, anonymizer_cls: Type[Operator]) -> None: ...
```

[Core Anonymization](./core-anonymization.md)

### Deanonymization

Reverse anonymization operations using the DeanonymizeEngine. Restores original text from anonymized content when using reversible operators like encryption.

```python { .api }
class DeanonymizeEngine:
    def deanonymize(
        self,
        text: str,
        entities: List[OperatorResult],
        operators: Dict[str, OperatorConfig]
    ) -> EngineResult: ...

    def get_deanonymizers(self) -> List[str]: ...
    def add_deanonymizer(self, deanonymizer_cls: Type[Operator]) -> None: ...
    def remove_deanonymizer(self, deanonymizer_cls: Type[Operator]) -> None: ...
```

[Deanonymization](./deanonymization.md)

### Batch Processing

Batch anonymization for lists and dictionaries using the BatchAnonymizerEngine. Processes multiple texts or structured data in a single call, forwarding extra keyword arguments such as `operators` to the underlying AnonymizerEngine.

```python { .api }
class BatchAnonymizerEngine:
    def __init__(self, anonymizer_engine: Optional[AnonymizerEngine] = None): ...

    def anonymize_list(
        self,
        texts: List[Optional[Union[str, bool, int, float]]],
        recognizer_results_list: List[List[RecognizerResult]],
        **kwargs
    ) -> List[Union[str, Any]]: ...

    def anonymize_dict(
        self,
        analyzer_results: Iterable[DictRecognizerResult],
        **kwargs
    ) -> Dict[str, str]: ...
```

[Batch Processing](./batch-processing.md)

### Anonymization Operators

Built-in transformation operators for different anonymization strategies including replacement, masking, encryption, hashing, and redaction.

```python { .api }
# Replacement operators
class Replace(Operator): ...
class Redact(Operator): ...
class Keep(Operator): ...

# Transformation operators
class Mask(Operator): ...
class Hash(Operator): ...

# Encryption operators
class Encrypt(Operator): ...
class Decrypt(Operator): ...

# Custom operators
class Custom(Operator): ...
```

[Anonymization Operators](./operators.md)

### Data Entities

Core data structures for configuration, input, and results including recognizer results, operator configurations, and engine outputs.

```python { .api }
class RecognizerResult:
    def __init__(self, entity_type: str, start: int, end: int, score: float): ...
    @classmethod
    def from_json(cls, data: Dict) -> "RecognizerResult": ...

class OperatorConfig:
    def __init__(self, operator_name: str, params: Dict = None): ...
    @classmethod
    def from_json(cls, params: Dict) -> "OperatorConfig": ...

class EngineResult:
    text: str
    items: List[OperatorResult]
```

[Data Entities](./entities.md)

## Types

```python { .api }
from enum import Enum
from typing import Dict, List, Optional, Union, Type, Iterable, Any

class ConflictResolutionStrategy(Enum):
    MERGE_SIMILAR_OR_CONTAINED = "merge_similar_or_contained"
    REMOVE_INTERSECTIONS = "remove_intersections"

class InvalidParamError(Exception):
    """Exception raised for invalid parameters."""

# Type aliases
OperatorType = Enum  # Anonymize, Deanonymize
PIIEntity = object  # Base entity with start/end positions
```