
# Anonymization Operators

Presidio Anonymizer provides built-in transformation operators for different anonymization strategies. Operators are configurable modules that transform detected PII entities according to specific algorithms.

## Capabilities

### Text Replacement

Replace detected PII with specified text values.

```python { .api }
class Replace(Operator):
    """Replaces detected text with specified replacement text."""

# Configuration
OperatorConfig("replace", {
    "new_value": str  # Text to replace with
})
```

**Usage Examples:**

```python
from presidio_anonymizer.entities import OperatorConfig

# Replace with placeholder
replace_config = OperatorConfig("replace", {"new_value": "[REDACTED]"})

# Replace with entity type
replace_config = OperatorConfig("replace", {"new_value": "<PERSON>"})

# Replace with custom text
replace_config = OperatorConfig("replace", {"new_value": "CONFIDENTIAL"})
```
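
To make the effect of a replacement concrete, here is a minimal plain-Python sketch of what substituting detected spans amounts to. It does not use Presidio: the `Span` tuple and the `apply_replacements` helper are hypothetical names for illustration, and the spans are assumed not to overlap.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for a detected PII entity (not a Presidio type)."""
    start: int  # inclusive index
    end: int    # exclusive index


def apply_replacements(text: str, spans: List[Span], new_value: str) -> str:
    """Replace each span with new_value, assuming the spans do not overlap."""
    # Work from the last span to the first so earlier indices stay valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + new_value + text[span.end:]
    return text


text = "Contact John Doe at the office"
spans = [Span(start=8, end=16)]  # covers "John Doe"
print(apply_replacements(text, spans, "[REDACTED]"))
print(apply_replacements(text, spans, ""))  # an empty replacement acts like redaction
```

Processing spans from right to left is the design choice that matters here: a replacement of a different length would otherwise shift the offsets of every span after it.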

### Text Redaction

Completely remove detected PII from text.

```python { .api }
class Redact(Operator):
    """Completely removes/redacts detected text."""

# Configuration (no parameters required)
OperatorConfig("redact")
```

**Usage Example:**

```python
# Simple redaction - removes text entirely
redact_config = OperatorConfig("redact")

# Result: "Contact at " (name completely removed)
```

### Text Masking

Mask characters in detected text with specified masking character.

```python { .api }
class Mask(Operator):
    """Masks characters in detected text with specified masking character."""

# Configuration
OperatorConfig("mask", {
    "masking_char": str,   # Character to use for masking (default: "*")
    "chars_to_mask": int,  # Number of characters to mask (default: all)
    "from_end": bool       # Mask from end instead of beginning (default: False)
})
```

**Usage Examples:**

```python
# Mask all characters
mask_all = OperatorConfig("mask", {"masking_char": "*"})

# Mask last 4 characters
mask_end = OperatorConfig("mask", {
    "masking_char": "*",
    "chars_to_mask": 4,
    "from_end": True
})

# Mask first 3 characters
mask_start = OperatorConfig("mask", {
    "masking_char": "#",
    "chars_to_mask": 3,
    "from_end": False
})

# Examples:
# "john@email.com" -> "john@email****" (last 4 from end)
# "555-1234" -> "###-1234" (first 3)
```
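
The interaction between `chars_to_mask` and `from_end` is easier to see in a few lines of plain Python. The sketch below is an illustration only, not Presidio's implementation: `mask_text` is a hypothetical helper, and its default argument values are assumptions made for the sketch.

```python
def mask_text(text: str, masking_char: str = "*", chars_to_mask: int = 0,
              from_end: bool = False) -> str:
    """Plain-Python illustration of masking; not Presidio's implementation."""
    # Clamp the count so it never exceeds the text length or drops below zero.
    count = max(0, min(chars_to_mask, len(text)))
    if from_end:
        # Keep the leading characters and mask the trailing ones.
        return text[:len(text) - count] + masking_char * count
    # Mask the leading characters and keep the rest.
    return masking_char * count + text[count:]


print(mask_text("john@email.com", "*", 4, from_end=True))
print(mask_text("555-1234", "#", 3, from_end=False))
```

Asking for more characters than the text contains masks the whole value in this sketch, since the count is clamped to the text length.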

### Hashing

Replace detected text with hash values.

```python { .api }
class Hash(Operator):
    """Replaces detected text with hash value."""

# Configuration
OperatorConfig("hash", {
    "hash_type": str  # Hash algorithm (default: "sha256")
})
```

**Usage Example:**

```python
# SHA256 hash (default)
hash_config = OperatorConfig("hash")

# Specific hash algorithm
hash_md5 = OperatorConfig("hash", {"hash_type": "md5"})

# Result: "Contact a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3 at..."
```
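
As a rough illustration of what a hash operator produces, the following sketch computes a hex digest with the standard library's `hashlib`. It is not Presidio's implementation, and `hash_text` is a hypothetical helper name.

```python
import hashlib


def hash_text(text: str, hash_type: str = "sha256") -> str:
    """Return the hex digest of text; a sketch, not Presidio's implementation."""
    # hashlib.new accepts algorithm names such as "sha256" or "sha512".
    return hashlib.new(hash_type, text.encode("utf-8")).hexdigest()


first = hash_text("John Doe")
second = hash_text("John Doe")
print(first == second)  # identical input gives an identical digest
print(len(first))       # a SHA-256 hex digest is 64 characters long
```

Because the mapping in this sketch is deterministic, equal values stay equal after hashing, which is what makes counting or joining on the anonymized field possible.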

### Encryption (Reversible)

Encrypt detected text using AES encryption for reversible anonymization.

```python { .api }
class Encrypt(Operator):
    """Encrypts detected text using AES encryption."""

# Configuration
OperatorConfig("encrypt", {
    "key": str  # 32-character encryption key (required)
})
```

**Usage Example:**

```python
# AES encryption with key
encrypt_config = OperatorConfig("encrypt", {
    "key": "my-secret-key-32-characters-long"  # Exactly 32 characters
})

# Result: encrypted text that can be decrypted later
```
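
Key length is an easy thing to get wrong by a character or two, so it can help to check it before building the configuration. AES itself defines 128-, 192-, and 256-bit keys (16, 24, or 32 bytes). The helper below is a hypothetical pre-flight check written for this document, not part of Presidio.

```python
VALID_AES_KEY_BYTES = (16, 24, 32)  # AES-128, AES-192, AES-256


def check_aes_key(key: str) -> int:
    """Return the key size in bits, or raise ValueError for an invalid length."""
    size = len(key.encode("utf-8"))  # length in bytes, not characters
    if size not in VALID_AES_KEY_BYTES:
        raise ValueError(f"key is {size} bytes; AES needs 16, 24, or 32")
    return size * 8


print(check_aes_key("my-secret-key-32-characters-long"))  # a 32-byte key
```

Measuring the encoded byte length rather than the character count matters when a key contains non-ASCII characters, which occupy more than one byte in UTF-8.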

### Keep Unchanged

Keep original text without any transformation.

```python { .api }
class Keep(Operator):
    """Keeps original text unchanged."""

# Configuration (no parameters required)
OperatorConfig("keep")
```

**Usage Example:**

```python
# Keep certain entity types unchanged
keep_config = OperatorConfig("keep")

# Useful for preserving certain PII types while anonymizing others
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "[PERSON]"}),
    "LOCATION": OperatorConfig("keep")  # Keep locations unchanged
}
```
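
The per-entity mapping above can be pictured as a simple dispatch table. The sketch below mimics that idea in plain Python, without Presidio: the `Entity` tuple, the `anonymize` function, and the choice to leave unmapped entity types untouched are all assumptions of the sketch.

```python
from typing import Callable, Dict, List, Tuple

# (entity_type, start, end) triples; a hypothetical stand-in for analyzer results.
Entity = Tuple[str, int, int]


def anonymize(text: str, entities: List[Entity],
              operators: Dict[str, Callable[[str], str]]) -> str:
    """Apply the operator registered for each entity type, last span first."""
    for entity_type, start, end in sorted(entities, key=lambda e: e[1], reverse=True):
        # Entity types without an entry are left as they are in this sketch.
        operator = operators.get(entity_type, lambda original: original)
        text = text[:start] + operator(text[start:end]) + text[end:]
    return text


operators = {
    "PERSON": lambda original: "[PERSON]",
    "LOCATION": lambda original: original,  # behaves like "keep"
}
text = "John Doe lives in Paris"
entities = [("PERSON", 0, 8), ("LOCATION", 18, 23)]
print(anonymize(text, entities, operators))
```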

### Custom Operators

Create custom anonymization logic by passing a function to the built-in `custom` operator.

```python { .api }
class Custom(Operator):
    """Allows custom anonymization logic."""

# Configuration
OperatorConfig("custom", {
    "lambda": Callable[[str], str]  # Function applied to the detected text; must return a string
})
```

**Usage Example:**

```python
# Custom transformation using a lambda
# Note: the "lambda" value is a Python callable, not a string; it runs in-process, so only pass trusted functions
custom_config = OperatorConfig("custom", {
    "lambda": lambda x: x.upper().replace(" ", "_")
})

# Result: "JOHN_DOE" instead of "John Doe"
```

## Deanonymization Operators

### Decryption

Decrypt previously encrypted text using the matching key.

```python { .api }
class Decrypt(Operator):
    """Decrypts previously encrypted text."""

# Configuration
OperatorConfig("decrypt", {
    "key": str  # Must match original encryption key
})
```

**Usage Example:**

```python
# Must use same key as encryption
decrypt_config = OperatorConfig("decrypt", {
    "key": "my-secret-key-32-characters-long"  # Same as encrypt key
})
```

### Deanonymize Keep

Keep text unchanged during deanonymization.

```python { .api }
class DeanonymizeKeep(Operator):
    """Keeps text unchanged during deanonymization."""

# Configuration (no parameters required)
OperatorConfig("deanonymize_keep")
```

## Azure Health Data Services (Optional)

When the `ahds` optional dependency is installed, additional operators are available:

### AHDS Surrogate

Uses Azure Health Data Services for surrogate key generation.

```python { .api }
class AHDSSurrogate(Operator):
    """Azure Health Data Services surrogate key operator."""
    # Available only when 'ahds' dependencies are installed
```

**Installation:**

```bash
# Install with AHDS support
pip install presidio-anonymizer[ahds]
```

**Availability Check:**

```python
from presidio_anonymizer.entities import OperatorConfig
from presidio_anonymizer.operators import AHDS_AVAILABLE

if AHDS_AVAILABLE:
    from presidio_anonymizer.operators import AHDSSurrogate

    # Configure AHDS surrogate operator
    ahds_config = OperatorConfig("ahds_surrogate", {
        # Configuration parameters for AHDS integration
    })

    # Use in operators dictionary
    operators = {
        "PERSON": ahds_config,
        "PHONE_NUMBER": ahds_config
    }
else:
    print("AHDS operators not available - install with: pip install presidio-anonymizer[ahds]")
```

**Feature Check in Code:**

```python
# Programmatically check for AHDS availability
def get_available_operators():
    """Get list of all available operators including optional ones."""
    from presidio_anonymizer.operators import AHDS_AVAILABLE

    operators = ['replace', 'mask', 'hash', 'encrypt', 'redact', 'keep', 'custom']
    if AHDS_AVAILABLE:
        operators.append('ahds_surrogate')

    return operators
```

## Operator Factory

Manage and extend available operators using the factory pattern.

```python { .api }
class OperatorsFactory:
    """Factory for managing anonymization and deanonymization operators."""

    def get_anonymizers(self) -> Dict[str, Type[Operator]]:
        """
        Get all available anonymization operators.

        Returns:
            Dict[str, Type[Operator]]: Dictionary mapping operator names to operator classes
        """

    def get_deanonymizers(self) -> Dict[str, Type[Operator]]:
        """
        Get all available deanonymization operators.

        Returns:
            Dict[str, Type[Operator]]: Dictionary mapping operator names to operator classes
        """

    def add_anonymize_operator(self, operator_cls: Type[Operator]) -> None:
        """
        Add a new anonymization operator.

        Parameters:
        - operator_cls (Type[Operator]): The operator class to add
        """

    def add_deanonymize_operator(self, operator_cls: Type[Operator]) -> None:
        """
        Add a new deanonymization operator.

        Parameters:
        - operator_cls (Type[Operator]): The operator class to add
        """

    def remove_anonymize_operator(self, operator_cls: Type[Operator]) -> None:
        """
        Remove an anonymization operator.

        Parameters:
        - operator_cls (Type[Operator]): The operator class to remove
        """

    def remove_deanonymize_operator(self, operator_cls: Type[Operator]) -> None:
        """
        Remove a deanonymization operator.

        Parameters:
        - operator_cls (Type[Operator]): The operator class to remove
        """
```

**Usage Example:**

```python
from presidio_anonymizer.operators import Operator, OperatorsFactory, OperatorType

# Access the factory (typically used internally by engines)
factory = OperatorsFactory()

# Get available operators
anonymizers = factory.get_anonymizers()
print(list(anonymizers.keys()))  # e.g. ['replace', 'mask', 'hash', 'encrypt', 'redact', 'keep', 'custom']

deanonymizers = factory.get_deanonymizers()
print(list(deanonymizers.keys()))  # e.g. ['decrypt', 'deanonymize_keep']

# Add custom operator (all four abstract methods of Operator must be implemented)
class MyCustomOperator(Operator):
    def operate(self, text, params=None):
        return f"CUSTOM_{text.upper()}"

    def validate(self, params=None):
        pass  # No parameters to validate

    def operator_name(self):
        return "my_custom"

    def operator_type(self):
        return OperatorType.Anonymize

factory.add_anonymize_operator(MyCustomOperator)
```

## Creating Custom Operators

Extend the base Operator class to create custom transformation logic:

```python
import codecs

from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig
from presidio_anonymizer.operators import Operator, OperatorType

class CustomROT13(Operator):
    def operate(self, text, params=None):
        """Apply ROT13 transformation."""
        # ROT13 is a text-to-text codec, so use codecs.encode rather than str.encode
        return codecs.encode(text, "rot_13")

    def validate(self, params=None):
        """Validate operator parameters."""
        pass  # No parameters needed for ROT13

    def operator_name(self):
        """Name used to reference this operator in OperatorConfig."""
        return "custom_rot13"

    def operator_type(self):
        return OperatorType.Anonymize

# Register with engine
engine = AnonymizerEngine()
engine.add_anonymizer(CustomROT13)

# Use in configuration
operators = {"PERSON": OperatorConfig("custom_rot13")}
```
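
The transformation itself can be tried in isolation with the standard library. The sketch below is independent of Presidio, and `rot13` is a hypothetical helper name. ROT13 is convenient for demonstrating the operator interface, but it is trivially reversible, so it illustrates the mechanics rather than a real anonymization scheme.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter by 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot_13")


original = "John Doe"
scrambled = rot13(original)
print(scrambled)         # letters shifted, space preserved
print(rot13(scrambled))  # applying ROT13 twice restores the input
```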

## Operator Selection Guidelines

- **Replace**: General purpose, good for placeholder values
- **Mask**: Partial visibility, good for debugging or partial concealment
- **Redact**: Complete removal, highest security
- **Hash**: Consistent anonymization, enables analytics on anonymized data
- **Encrypt**: Reversible, good for temporary anonymization
- **Keep**: Selective anonymization, preserve certain data types