# Data Entities

Core data structures used for configuration, input, and results in presidio-anonymizer operations. These entities enable structured communication between components and provide type-safe configuration.

## Capabilities

### Recognizer Results

Represents detected PII entities with location and confidence information, typically provided by presidio-analyzer.

```python { .api }
class RecognizerResult:
    def __init__(self, entity_type: str, start: int, end: int, score: float):
        """
        Initialize RecognizerResult.

        Parameters:
        - entity_type (str): Type of PII entity (e.g., "PERSON", "EMAIL_ADDRESS")
        - start (int): Start position in text (inclusive)
        - end (int): End position in text (exclusive)
        - score (float): Confidence score (0.0 to 1.0)
        """

    @classmethod
    def from_json(cls, data: Dict) -> "RecognizerResult":
        """
        Create RecognizerResult from JSON dictionary.

        Parameters:
        - data (Dict): Dictionary with keys: entity_type, start, end, score

        Returns:
        RecognizerResult: New instance from JSON data
        """

    def has_conflict(self, other: "RecognizerResult") -> bool:
        """
        Check if this result conflicts with another result.

        Parameters:
        - other (RecognizerResult): Another recognizer result to check against

        Returns:
        bool: True if this result has the same indices as the other with a lower or equal score, or is contained within the other
        """

    def intersects(self, other: "RecognizerResult") -> int:
        """
        Check intersection with another result.

        Parameters:
        - other (RecognizerResult): Another recognizer result

        Returns:
        int: Number of intersecting characters (0 if no intersection)
        """

    def equal_indices(self, other: "RecognizerResult") -> bool:
        """
        Check if two results have the same start and end positions.

        Parameters:
        - other (RecognizerResult): Another recognizer result

        Returns:
        bool: True if indices match
        """

    def contains(self, other: "RecognizerResult") -> bool:
        """
        Check if this result contains another result within its boundaries.

        Parameters:
        - other (RecognizerResult): Another recognizer result

        Returns:
        bool: True if this result contains the other result
        """
```

**Usage Examples:**

```python
from presidio_anonymizer.entities import RecognizerResult

# Create from constructor
result = RecognizerResult(
    entity_type="PERSON",
    start=11,
    end=19,
    score=0.85
)

# Create from JSON (typically used in API scenarios)
json_data = {
    "entity_type": "EMAIL_ADDRESS",
    "start": 25,
    "end": 42,
    "score": 0.95
}
result = RecognizerResult.from_json(json_data)

# Check for overlap (intersects() returns the number of shared characters)
result1 = RecognizerResult("PERSON", 10, 20, 0.8)
result2 = RecognizerResult("LOCATION", 15, 25, 0.9)
if result1.intersects(result2) > 0:
    print("Results overlap and may need conflict resolution")
```
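
The `start`/`end` convention above (inclusive start, exclusive end) keeps overlap arithmetic simple. As a rough illustration only, the sketch below uses a hypothetical `Span` dataclass, not the library's `RecognizerResult`, to show how overlap length, containment, and equal indices can be computed on half-open ranges. The method names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a recognizer result; not part of presidio."""
    entity_type: str
    start: int  # inclusive
    end: int    # exclusive
    score: float

    def overlap(self, other: "Span") -> int:
        # Shared characters between two half-open ranges (0 if disjoint).
        return max(0, min(self.end, other.end) - max(self.start, other.start))

    def contains(self, other: "Span") -> bool:
        # True when `other` lies entirely within this span.
        return self.start <= other.start and other.end <= self.end

    def same_indices(self, other: "Span") -> bool:
        # True when both spans cover exactly the same range.
        return (self.start, self.end) == (other.start, other.end)


person = Span("PERSON", 10, 20, 0.8)
location = Span("LOCATION", 15, 25, 0.9)
full_name = Span("PERSON", 8, 22, 0.7)

print(person.overlap(location))       # positions 15 through 19 are shared
print(full_name.contains(person))
print(person.same_indices(location))
```

Because `end` is exclusive, two spans that merely touch (one ends where the next starts) share no characters in this sketch.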

### Operator Configuration

Configuration object that specifies which operator to use and its parameters for anonymizing specific entity types.

```python { .api }
class OperatorConfig:
    def __init__(self, operator_name: str, params: Dict = None):
        """
        Create operator configuration.

        Parameters:
        - operator_name (str): Name of the operator ("replace", "mask", "encrypt", etc.)
        - params (Dict, optional): Parameters for the operator
        """

    @classmethod
    def from_json(cls, params: Dict) -> "OperatorConfig":
        """
        Create OperatorConfig from JSON dictionary.

        Parameters:
        - params (Dict): Dictionary with "type" key and operator parameters

        Returns:
        OperatorConfig: New configuration instance
        """

    def __eq__(self, other: "OperatorConfig") -> bool:
        """Check equality with another OperatorConfig."""
```

**Usage Examples:**

```python
from presidio_anonymizer.entities import OperatorConfig

# Simple operator without parameters
redact_config = OperatorConfig("redact")

# Operator with parameters
mask_config = OperatorConfig("mask", {
    "masking_char": "*",
    "chars_to_mask": 4,
    "from_end": True
})

# Complex operator configuration
# (the AES key must be 128, 192, or 256 bits; this one is 32 characters)
encrypt_config = OperatorConfig("encrypt", {
    "key": "my-secret-key-32-characters-long"
})

# Create from JSON (API scenarios)
json_config = {
    "type": "replace",
    "new_value": "[CONFIDENTIAL]"
}
replace_config = OperatorConfig.from_json(json_config)

# Use in operators dictionary
operators = {
    "PERSON": mask_config,
    "EMAIL_ADDRESS": encrypt_config,
    "PHONE_NUMBER": redact_config
}
```
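
`OperatorConfig.from_json` is described above as taking a dictionary with a `"type"` key plus the operator's parameters. The sketch below shows that flat shape being separated into an operator name and a parameter dictionary. `split_operator_json` is a hypothetical helper written for illustration; it does not call presidio-anonymizer and makes no claim about how the library implements `from_json`.

```python
from typing import Any, Dict, Tuple


def split_operator_json(payload: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: split a flat JSON config into (name, params)."""
    if "type" not in payload:
        raise ValueError('operator config needs a "type" key')
    # Build a new dict so the caller's payload is left untouched.
    params = {key: value for key, value in payload.items() if key != "type"}
    return payload["type"], params


name, params = split_operator_json(
    {"type": "mask", "masking_char": "*", "chars_to_mask": 4}
)
print(name, params)
```

Keeping the operator name and its parameters in one flat object is convenient for request bodies, at the cost of reserving `"type"` as a key that no operator parameter can use.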

### Engine Results

Contains the results of anonymization or deanonymization operations, including the transformed text and metadata.

```python { .api }
class EngineResult:
    """
    Result of anonymization or deanonymization operation.

    Attributes:
    - text (str): The transformed text
    - items (List[OperatorResult]): Metadata about applied transformations
    """

    def __init__(self, text: str = None, items: List[OperatorResult] = None):
        """
        Initialize EngineResult.

        Parameters:
        - text (str, optional): The transformed text
        - items (List[OperatorResult], optional): Metadata about transformations, defaults to empty list
        """

    def to_json(self) -> str:
        """
        Convert result to JSON string.

        Returns:
        str: JSON representation of the result
        """

    def set_text(self, text: str) -> None:
        """
        Set the text attribute.

        Parameters:
        - text (str): The text to set
        """

    def add_item(self, item: OperatorResult) -> None:
        """
        Add an operator result item.

        Parameters:
        - item (OperatorResult): The operator result to add
        """
```

**Usage Example:**

```python
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig, RecognizerResult

# Sample inputs; analyzer results normally come from presidio-analyzer
text = "My name is John Doe"
analyzer_results = [RecognizerResult("PERSON", 11, 19, 0.85)]
operators = {"PERSON": OperatorConfig("replace", {"new_value": "<PERSON>"})}

engine = AnonymizerEngine()
result = engine.anonymize(text, analyzer_results, operators)

print(f"Anonymized text: {result.text}")
print(f"Number of transformations: {len(result.items)}")

# Access transformation details
for item in result.items:
    print(f"Entity {item.entity_type} at {item.start}-{item.end} -> '{item.text}'")

# Convert to JSON for API responses
json_result = result.to_json()
```
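
For API consumers, the string returned by `to_json()` can be parsed with the standard `json` module. The payload below is hand-written for illustration and assumes the JSON keys mirror the `text` and `items` attributes listed above; check the actual output of your installed version before relying on the field names.

```python
import json

# Hypothetical payload, assumed to mirror EngineResult's attributes.
payload = json.dumps({
    "text": "My name is <PERSON>",
    "items": [
        {"start": 11, "end": 19, "entity_type": "PERSON",
         "text": "<PERSON>", "operator": "replace"},
    ],
})

parsed = json.loads(payload)
for item in parsed["items"]:
    # start/end index into the transformed text, so the slice equals item["text"].
    replaced = parsed["text"][item["start"]:item["end"]]
    print(f'{item["operator"]} on {item["entity_type"]}: {replaced!r}')
```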

### Operator Results

Metadata about individual operator transformations applied during anonymization.

```python { .api }
class OperatorResult:
    """
    Result of applying an operator to a PII entity.

    Attributes:
    - start (int): Start position of transformed text
    - end (int): End position of transformed text
    - entity_type (str): Type of PII entity that was transformed
    - text (str): The replacement text that was inserted
    - operator (str): Name of operator that was applied
    """

    def __init__(self, start: int, end: int, entity_type: str, text: str = None, operator: str = None):
        """
        Initialize OperatorResult.

        Parameters:
        - start (int): Start position of transformed text
        - end (int): End position of transformed text
        - entity_type (str): Type of PII entity that was transformed
        - text (str, optional): The replacement text that was inserted
        - operator (str, optional): Name of operator that was applied
        """

    def to_dict(self) -> Dict:
        """
        Convert to dictionary representation.

        Returns:
        Dict: Dictionary with all attributes
        """

    @classmethod
    def from_json(cls, json: Dict) -> "OperatorResult":
        """
        Create OperatorResult from JSON dictionary.

        Parameters:
        - json (Dict): Dictionary with operator result data

        Returns:
        OperatorResult: New instance from JSON data
        """
```

**Usage Example:**

```python
# Access operator results from engine result
# (assumes `engine`, `text`, `analyzer_results`, and `operators` are already defined)
result = engine.anonymize(text, analyzer_results, operators)

for op_result in result.items:
    print(f"Applied {op_result.operator} to {op_result.entity_type}")
    print(f"Position: {op_result.start}-{op_result.end}")
    print(f"Replacement: '{op_result.text}'")
```

### PII Entity Base

Base class for PII entities with position information.

```python { .api }
class PIIEntity:
    """
    Base class for PII entities with position information.

    Attributes:
    - start (int): Start position in text
    - end (int): End position in text
    - entity_type (str): Type of PII entity
    """

    def __init__(self, start: int, end: int, entity_type: str):
        """
        Initialize PIIEntity.

        Parameters:
        - start (int): Start position in text
        - end (int): End position in text
        - entity_type (str): Type of PII entity
        """
```

### Dictionary Recognizer Results

Specialized recognizer result for structured data like dictionaries and nested objects.

```python { .api }
@dataclass
class DictRecognizerResult:
    """
    Recognizer result for dictionary/structured data analysis.

    This is a dataclass that represents analyzer results for structured data like dictionaries.

    Attributes:
    - key (str): Dictionary key or field name
    - value (Union[str, List[str], dict]): The value associated with the key
    - recognizer_results (Union[List[RecognizerResult], List[List[RecognizerResult]], Iterator["DictRecognizerResult"]]): Analyzer results for the value
    """
```

**Usage Example:**

```python
from presidio_anonymizer.entities import DictRecognizerResult, RecognizerResult

# Used with BatchAnonymizerEngine for dictionary data
dict_result = DictRecognizerResult(
    key="user_email",
    value="john@example.com",
    recognizer_results=[RecognizerResult("EMAIL_ADDRESS", 0, 16, 0.9)]
)
```

### Conflict Resolution Strategy

Enumeration defining strategies for handling overlapping PII entities.

```python { .api }
class ConflictResolutionStrategy(Enum):
    MERGE_SIMILAR_OR_CONTAINED = "merge_similar_or_contained"
    REMOVE_INTERSECTIONS = "remove_intersections"
```

**Usage Example:**

```python
from presidio_anonymizer.entities import ConflictResolutionStrategy

# Use in anonymization calls
# (assumes `engine`, `text`, `analyzer_results`, and `operators` are already defined)
result = engine.anonymize(
    text=text,
    analyzer_results=analyzer_results,
    operators=operators,
    conflict_resolution=ConflictResolutionStrategy.REMOVE_INTERSECTIONS
)
```

### Exception Classes

Custom exceptions for error handling.

```python { .api }
class InvalidParamError(Exception):
    """Exception raised for invalid parameters."""

    def __init__(self, msg: str):
        """
        Initialize InvalidParamError.

        Parameters:
        - msg (str): Error message
        """
```

**Usage Example:**

```python
from presidio_anonymizer.entities import InvalidParamError

# Assumes `engine`, `text`, and `analyzer_results` are already defined, and that
# `invalid_operators` is an operators dictionary with a bad configuration
try:
    result = engine.anonymize(text, analyzer_results, invalid_operators)
except InvalidParamError as e:
    print(f"Invalid configuration: {e}")
```

## Entity Relationships

- **Input Flow**: `RecognizerResult` + `OperatorConfig` → Engine → `EngineResult`
- **Output Details**: `EngineResult` contains `OperatorResult` items
- **Batch Processing**: `DictRecognizerResult` for structured data
- **Error Handling**: `InvalidParamError` for configuration issues
- **Base Classes**: `PIIEntity` provides common position functionality
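
The input flow in the list above can be pictured with a toy stand-in for the engine. This is a minimal sketch under simplifying assumptions (non-overlapping spans, operators modelled as plain callables). `toy_anonymize` and the tuple layout are hypothetical and do not reflect how `AnonymizerEngine` is implemented.

```python
from typing import Callable, Dict, List, Tuple

Detected = Tuple[str, int, int]  # (entity_type, start, end), end exclusive


def toy_anonymize(
    text: str,
    detected: List[Detected],
    operators: Dict[str, Callable[[str], str]],
) -> Tuple[str, List[dict]]:
    """Hypothetical engine: apply one operator per detected span."""
    pieces: List[str] = []
    items: List[dict] = []
    cursor = 0    # position in the original text
    out_len = 0   # length of the output built so far
    for entity_type, start, end in sorted(detected, key=lambda d: d[1]):
        # Copy the untouched text between the previous span and this one.
        pieces.append(text[cursor:start])
        out_len += start - cursor
        replacement = operators[entity_type](text[start:end])
        # Record positions relative to the transformed text.
        items.append({"entity_type": entity_type, "start": out_len,
                      "end": out_len + len(replacement), "text": replacement})
        pieces.append(replacement)
        out_len += len(replacement)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), items


anonymized, items = toy_anonymize(
    "Call John at 555-1234",
    [("PERSON", 5, 9), ("PHONE_NUMBER", 13, 21)],
    {"PERSON": lambda v: "<PERSON>", "PHONE_NUMBER": lambda v: "*" * len(v)},
)
print(anonymized)
print(items)
```

The sketch records each item's positions against the output text, which matches the way `OperatorResult` describes `start` and `end` as positions of the transformed text.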

## JSON Serialization

Most entities support JSON serialization for API integration:

```python
from presidio_anonymizer.entities import OperatorConfig, RecognizerResult

# RecognizerResult from JSON
result = RecognizerResult.from_json({
    "entity_type": "PERSON",
    "start": 0,
    "end": 8,
    "score": 0.9
})

# OperatorConfig from JSON
config = OperatorConfig.from_json({
    "type": "mask",
    "masking_char": "*",
    "chars_to_mask": 4
})

# EngineResult to JSON (engine_result is the value returned by engine.anonymize)
json_output = engine_result.to_json()
```