# Predefined Recognizers

Presidio Analyzer includes over 50 built-in recognizers for common PII types, organized into generic entities, country-specific identifiers, NLP-based recognizers, and third-party service integrations.

## Generic Recognizers

Universal PII types that apply across multiple countries and contexts.

### Email Recognition

```python { .api }
class EmailRecognizer(PatternRecognizer):
    """
    Detects email addresses using comprehensive regex patterns.

    Supported entities: ["EMAIL_ADDRESS"]
    Languages: Multi-language support
    """
```

### Phone Number Recognition

```python { .api }
class PhoneRecognizer(PatternRecognizer):
    """
    Detects phone numbers in various international formats.

    Supported entities: ["PHONE_NUMBER"]
    Languages: Multi-language support with region-specific patterns
    """
```

### Credit Card Recognition

```python { .api }
class CreditCardRecognizer(PatternRecognizer):
    """
    Detects credit card numbers with format validation.

    Supported entities: ["CREDIT_CARD"]
    Languages: Multi-language support
    Features: Supports major card types (Visa, MasterCard, American Express, etc.)
    """
```
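
The summary at the end of this page mentions Luhn validation for credit cards. As a standalone illustration of that checksum (not Presidio's own code; the function name `luhn_is_valid` is hypothetical), a minimal sketch could look like this:

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep digits only, so separators such as spaces or hyphens are ignored.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111-1111-1111-1111"))  # True (common test number)
print(luhn_is_valid("4111-1111-1111-1112"))  # False (last digit altered)
```

A number that matches the card pattern but fails this check is typically a false positive, which is why pattern matching and checksum validation are usually combined.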

### URL Recognition

```python { .api }
class UrlRecognizer(PatternRecognizer):
    """
    Detects URLs and web addresses.

    Supported entities: ["URL"]
    Languages: Multi-language support
    Features: HTTP/HTTPS, FTP, and other protocol detection
    """
```

### IP Address Recognition

```python { .api }
class IpRecognizer(PatternRecognizer):
    """
    Detects IPv4 and IPv6 addresses.

    Supported entities: ["IP_ADDRESS"]
    Languages: Multi-language support
    Features: Validates IP address format
    """
```
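
To illustrate what validating an IP address format can mean, the sketch below uses Python's standard `ipaddress` module. It is an independent example, and Presidio's recognizer may validate differently; the helper name `ip_version` is hypothetical.

```python
import ipaddress
from typing import Optional


def ip_version(candidate: str) -> Optional[int]:
    """Return 4 or 6 for a well-formed IP address, or None otherwise."""
    try:
        return ipaddress.ip_address(candidate).version
    except ValueError:
        # Raised for anything that is not a valid IPv4 or IPv6 literal.
        return None


print(ip_version("192.168.1.1"))   # 4
print(ip_version("2001:db8::1"))   # 6
print(ip_version("999.1.1.1"))     # None
```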

### Cryptocurrency Recognition

```python { .api }
class CryptoRecognizer(PatternRecognizer):
    """
    Detects cryptocurrency wallet addresses.

    Supported entities: ["CRYPTO"]
    Languages: Multi-language support
    Features: Bitcoin, Ethereum, and other major cryptocurrencies
    """
```

### IBAN Recognition

```python { .api }
class IbanRecognizer(PatternRecognizer):
    """
    Detects International Bank Account Numbers.

    Supported entities: ["IBAN_CODE"]
    Languages: Multi-language support
    Features: IBAN format validation and country code verification
    """
```
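
IBANs carry two check digits verified with the ISO 13616 mod-97 rule. The sketch below shows that rule on its own. It deliberately skips the per-country length check, it is not taken from Presidio, and `iban_is_valid` is a hypothetical name.

```python
def iban_is_valid(iban: str) -> bool:
    """Check the mod-97 checksum of an IBAN (country lengths not checked)."""
    compact = "".join(iban.split()).upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35, as the rule requires.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))  # True
print(iban_is_valid("GB82 WEST 1234 5698 7654 33"))  # False
```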

### Date Recognition

```python { .api }
class DateRecognizer(PatternRecognizer):
    """
    Detects date patterns in various formats.

    Supported entities: ["DATE_TIME"]
    Languages: Multi-language support
    Features: Multiple date formats (MM/DD/YYYY, DD-MM-YYYY, etc.)
    """
```

## United States Recognizers

PII types specific to the United States.

### Social Security Number

```python { .api }
class UsSsnRecognizer(PatternRecognizer):
    """
    Detects US Social Security Numbers with validation.

    Supported entities: ["US_SSN"]
    Languages: ["en"]
    Features: Format validation and invalid number filtering
    """
```

### US Driver License

```python { .api }
class UsLicenseRecognizer(PatternRecognizer):
    """
    Detects US driver license numbers for all 50 states.

    Supported entities: ["US_DRIVER_LICENSE"]
    Languages: ["en"]
    Features: State-specific format patterns
    """
```

### US Passport

```python { .api }
class UsPassportRecognizer(PatternRecognizer):
    """
    Detects US passport numbers.

    Supported entities: ["US_PASSPORT"]
    Languages: ["en"]
    Features: Current and legacy passport number formats
    """
```

### US Bank Account

```python { .api }
class UsBankRecognizer(PatternRecognizer):
    """
    Detects US bank account numbers.

    Supported entities: ["US_BANK_NUMBER"]
    Languages: ["en"]
    Features: Account number pattern recognition
    """
```

### US ITIN

```python { .api }
class UsItinRecognizer(PatternRecognizer):
    """
    Detects US Individual Taxpayer Identification Numbers.

    Supported entities: ["US_ITIN"]
    Languages: ["en"]
    Features: ITIN format validation
    """
```

### ABA Routing Number

```python { .api }
class AbaRoutingRecognizer(PatternRecognizer):
    """
    Detects US ABA routing numbers with checksum validation.

    Supported entities: ["ABA_ROUTING_NUMBER"]
    Languages: ["en"]
    Features: 9-digit routing number validation
    """
```
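
The checksum mentioned above is the standard ABA weighting scheme: the nine digits are multiplied by the repeating weights 3, 7, 1, and the sum must be divisible by 10. A minimal standalone sketch (the name `aba_is_valid` is hypothetical, and Presidio's implementation may differ in detail):

```python
def aba_is_valid(routing_number: str) -> bool:
    """Check the 3-7-1 weighted checksum of a 9-digit ABA routing number."""
    if len(routing_number) != 9:
        return False
    if not (routing_number.isascii() and routing_number.isdigit()):
        return False
    weights = (3, 7, 1) * 3
    total = sum(w * int(d) for w, d in zip(weights, routing_number))
    return total % 10 == 0


print(aba_is_valid("121000248"))  # True
print(aba_is_valid("121000249"))  # False
```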

### Medical License

```python { .api }
class MedicalLicenseRecognizer(PatternRecognizer):
    """
    Detects US medical license numbers.

    Supported entities: ["MEDICAL_LICENSE"]
    Languages: ["en"]
    Features: State-specific medical license patterns
    """
```

## United Kingdom Recognizers

### NHS Number

```python { .api }
class NhsRecognizer(PatternRecognizer):
    """
    Detects UK NHS (National Health Service) numbers.

    Supported entities: ["UK_NHS"]
    Languages: ["en"]
    Features: NHS number format validation
    """
```
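
NHS numbers are ten digits long, and the last digit is a modulus-11 check digit. The sketch below illustrates that public rule only. It is not Presidio's code, and `nhs_number_is_valid` is a hypothetical helper.

```python
def nhs_number_is_valid(number: str) -> bool:
    """Check the modulus-11 check digit of a 10-digit NHS number."""
    # Keep digits only, so spaces or hyphens between groups are ignored.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) != 10:
        return False
    # Weights 10 down to 2 apply to the first nine digits.
    total = sum(w * d for w, d in zip(range(10, 1, -1), digits[:9]))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number can never be valid.
    return check != 10 and check == digits[9]


print(nhs_number_is_valid("401 023 2137"))  # True
print(nhs_number_is_valid("123 456 7890"))  # False
```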

### UK National Insurance Number

```python { .api }
class UkNinoRecognizer(PatternRecognizer):
    """
    Detects UK National Insurance Numbers.

    Supported entities: ["UK_NINO"]
    Languages: ["en"]
    Features: NINO format validation and invalid prefix filtering
    """
```

## European Union Recognizers

### Italy

```python { .api }
class ItFiscalCodeRecognizer(PatternRecognizer):
    """
    Detects Italian fiscal codes (Codice Fiscale).

    Supported entities: ["IT_FISCAL_CODE"]
    Languages: ["en", "it"]
    Features: Fiscal code format validation
    """

class ItDriverLicenseRecognizer(PatternRecognizer):
    """
    Detects Italian driver license numbers.

    Supported entities: ["IT_DRIVER_LICENSE"]
    Languages: ["en", "it"]
    """

class ItVatCodeRecognizer(PatternRecognizer):
    """
    Detects Italian VAT codes.

    Supported entities: ["IT_VAT_CODE"]
    Languages: ["en", "it"]
    Features: VAT code validation
    """

class ItIdentityCardRecognizer(PatternRecognizer):
    """
    Detects Italian identity card numbers.

    Supported entities: ["IT_IDENTITY_CARD"]
    Languages: ["en", "it"]
    """

class ItPassportRecognizer(PatternRecognizer):
    """
    Detects Italian passport numbers.

    Supported entities: ["IT_PASSPORT"]
    Languages: ["en", "it"]
    """
```

### Spain

```python { .api }
class EsNifRecognizer(PatternRecognizer):
    """
    Detects Spanish NIF (National Identity Document) numbers.

    Supported entities: ["ES_NIF"]
    Languages: ["en", "es"]
    Features: NIF checksum validation
    """

class EsNieRecognizer(PatternRecognizer):
    """
    Detects Spanish NIE (Foreign Identity Number) numbers.

    Supported entities: ["ES_NIE"]
    Languages: ["en", "es"]
    Features: NIE format validation
    """
```
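
The NIF checksum is a control letter derived from the eight-digit number modulo 23. The following sketch shows that rule in isolation. It covers only the plain eight-digits-plus-letter form, it is not Presidio's implementation, and `nif_is_valid` is a hypothetical name.

```python
# Control letters indexed by (number % 23).
NIF_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def nif_is_valid(nif: str) -> bool:
    """Check the control letter of an 8-digit Spanish NIF."""
    nif = nif.strip().upper()
    if len(nif) != 9 or not nif.isascii() or not nif[:8].isdigit():
        return False
    return NIF_LETTERS[int(nif[:8]) % 23] == nif[8]


print(nif_is_valid("12345678Z"))  # True
print(nif_is_valid("12345678A"))  # False
```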

### Poland

```python { .api }
class PlPeselRecognizer(PatternRecognizer):
    """
    Detects Polish PESEL (Personal Identity Number) numbers.

    Supported entities: ["PL_PESEL"]
    Languages: ["en", "pl"]
    Features: PESEL checksum validation
    """
```
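
A PESEL has eleven digits, and the last is a check digit computed from the first ten with the weights 1, 3, 7, 9 repeated. The sketch below illustrates the checksum alone. It does not validate the encoded birth date, it is not Presidio's code, and `pesel_is_valid` is a hypothetical name.

```python
def pesel_is_valid(pesel: str) -> bool:
    """Check the final check digit of an 11-digit PESEL."""
    if len(pesel) != 11 or not (pesel.isascii() and pesel.isdigit()):
        return False
    weights = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
    # zip stops after the ten weighted digits; the eleventh is the check digit.
    total = sum(w * int(d) for w, d in zip(weights, pesel))
    return (10 - total % 10) % 10 == int(pesel[10])


print(pesel_is_valid("44051401359"))  # True
print(pesel_is_valid("44051401358"))  # False
```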

### Finland

```python { .api }
class FiPersonalIdentityCodeRecognizer(PatternRecognizer):
    """
    Detects Finnish personal identity codes.

    Supported entities: ["FI_PERSONAL_IDENTITY_CODE"]
    Languages: ["en", "fi"]
    Features: Finnish ID format validation
    """
```
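
A Finnish personal identity code has the form `DDMMYY`, a century sign, a three-digit individual number, and a check character chosen by taking the nine digits modulo 31. The sketch below shows that check only. The set of accepted century signs is an assumption, the birth date is not validated, and `finnish_id_is_valid` is a hypothetical name rather than part of Presidio.

```python
import re

# Check characters indexed by (nine-digit number % 31).
CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"
# Assumed century signs: "+", "-", and the letters A-F and U-Y.
_PATTERN = re.compile(r"(\d{6})([-+A-FU-Y])(\d{3})([0-9A-Y])", re.ASCII)


def finnish_id_is_valid(code: str) -> bool:
    """Check the final check character of a Finnish personal identity code."""
    match = _PATTERN.fullmatch(code.strip().upper())
    if match is None:
        return False
    birth, _century, individual, check = match.groups()
    return CHECK_CHARS[int(birth + individual) % 31] == check


print(finnish_id_is_valid("131052-308T"))  # True
print(finnish_id_is_valid("131052-308A"))  # False
```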

## Asia-Pacific Recognizers

### Australia

```python { .api }
class AuAbnRecognizer(PatternRecognizer):
    """
    Detects Australian Business Numbers (ABN).

    Supported entities: ["AU_ABN"]
    Languages: ["en"]
    Features: ABN checksum validation
    """

class AuAcnRecognizer(PatternRecognizer):
    """
    Detects Australian Company Numbers (ACN).

    Supported entities: ["AU_ACN"]
    Languages: ["en"]
    Features: ACN format validation
    """

class AuTfnRecognizer(PatternRecognizer):
    """
    Detects Australian Tax File Numbers (TFN).

    Supported entities: ["AU_TFN"]
    Languages: ["en"]
    Features: TFN format validation
    """

class AuMedicareRecognizer(PatternRecognizer):
    """
    Detects Australian Medicare numbers.

    Supported entities: ["AU_MEDICARE"]
    Languages: ["en"]
    Features: Medicare number format validation
    """
```
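
The ABN checksum subtracts 1 from the first digit, applies fixed weights to all eleven digits, and requires the sum to be divisible by 89. The sketch below shows that published rule as an independent example. `abn_is_valid` is a hypothetical name, and Presidio's code may be organized differently.

```python
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def abn_is_valid(abn: str) -> bool:
    """Check the modulus-89 checksum of an 11-digit ABN."""
    # Keep digits only, so the usual space-separated grouping is accepted.
    digits = [int(ch) for ch in abn if ch.isdigit()]
    if len(digits) != 11:
        return False
    digits[0] -= 1  # the rule subtracts 1 from the leading digit
    total = sum(w * d for w, d in zip(ABN_WEIGHTS, digits))
    return total % 89 == 0


print(abn_is_valid("51 824 753 556"))  # True
print(abn_is_valid("12 345 678 901"))  # False
```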

### Singapore

```python { .api }
class SgFinRecognizer(PatternRecognizer):
    """
    Detects Singapore FIN (Foreign Identification Number) numbers.

    Supported entities: ["SG_NRIC_FIN"]
    Languages: ["en"]
    Features: FIN checksum validation
    """

class SgUenRecognizer(PatternRecognizer):
    """
    Detects Singapore UEN (Unique Entity Number) numbers.

    Supported entities: ["SG_UEN"]
    Languages: ["en"]
    Features: UEN format validation
    """
```

### South Korea

```python { .api }
class KrRrnRecognizer(PatternRecognizer):
    """
    Detects Korean Resident Registration Numbers.

    Supported entities: ["KR_RRN"]
    Languages: ["en", "ko"]
    Features: RRN format validation
    """
```

## India Recognizers

```python { .api }
class InAadhaarRecognizer(PatternRecognizer):
    """
    Detects Indian Aadhaar (Unique Identity) numbers.

    Supported entities: ["IN_AADHAAR"]
    Languages: ["en"]
    Features: Aadhaar format validation
    """

class InPanRecognizer(PatternRecognizer):
    """
    Detects Indian PAN (Permanent Account Number) numbers.

    Supported entities: ["IN_PAN"]
    Languages: ["en"]
    Features: PAN format validation
    """

class InPassportRecognizer(PatternRecognizer):
    """
    Detects Indian passport numbers.

    Supported entities: ["IN_PASSPORT"]
    Languages: ["en"]
    Features: Indian passport format patterns
    """

class InVehicleRegistrationRecognizer(PatternRecognizer):
    """
    Detects Indian vehicle registration numbers.

    Supported entities: ["IN_VEHICLE_REGISTRATION"]
    Languages: ["en"]
    Features: Indian vehicle registration format patterns
    """

class InVoterRecognizer(PatternRecognizer):
    """
    Detects Indian voter ID numbers.

    Supported entities: ["IN_VOTER"]
    Languages: ["en"]
    Features: Indian voter ID format validation
    """
```

## NLP-Based Recognizers

Recognizers that use Natural Language Processing models for entity detection.

### spaCy NLP Recognizer

```python { .api }
class SpacyRecognizer(LocalRecognizer):
    """
    Uses spaCy NLP models for named entity recognition.

    Supported entities: ["PERSON", "LOCATION", "ORGANIZATION"] and others
    Languages: Multiple languages supported by spaCy models
    Features: Leverages spaCy's pre-trained NER models
    """
    def __init__(
        self,
        supported_entities: List[str] = None,
        check_label_groups: Tuple[Set, Set] = None,
        supported_language: str = "en",
        ner_strength: float = 0.85
    ): ...
```

### Stanza NLP Recognizer

```python { .api }
class StanzaRecognizer(LocalRecognizer):
    """
    Uses Stanford Stanza NLP models for named entity recognition.

    Supported entities: ["PERSON", "LOCATION", "ORGANIZATION"] and others
    Languages: Multiple languages supported by Stanza
    Features: Stanford NLP Group's state-of-the-art NER models
    """
    def __init__(
        self,
        supported_entities: List[str] = None,
        check_label_groups: Tuple[Set, Set] = None,
        supported_language: str = "en",
        ner_strength: float = 0.85
    ): ...
```

### Transformers NLP Recognizer

```python { .api }
class TransformersRecognizer(LocalRecognizer):
    """
    Uses Hugging Face Transformers models for named entity recognition.

    Supported entities: ["PERSON", "LOCATION", "ORGANIZATION"] and others
    Languages: Multiple languages depending on model
    Features: Supports BERT, RoBERTa, and other transformer models
    """
    def __init__(
        self,
        model_id_or_path: str = None,
        aggregation_strategy: str = "simple",
        supported_entities: List[str] = None,
        pipeline_kwargs: Dict = None,
        model_kwargs: Dict = None
    ): ...
```

## Third-Party Service Recognizers

Integrations with external PII detection services.

### Azure AI Language

```python { .api }
class AzureAILanguageRecognizer(RemoteRecognizer):
    """
    Integrates with Azure AI Language service for PII detection.

    Supported entities: Multiple PII types supported by Azure
    Languages: Multiple languages supported by Azure AI
    Features: Cloud-based detection with high accuracy
    """
    def __init__(
        self,
        endpoint: str = None,
        credential: str = None,
        supported_entities: List[str] = None,
        supported_language: str = "en"
    ): ...
```

### Azure Health De-identification

```python { .api }
class AzureHealthDeidRecognizer(RemoteRecognizer):
    """
    Integrates with Azure Health De-identification service.

    Supported entities: Healthcare-specific PII types
    Languages: ["en"]
    Features: Specialized for healthcare and medical text
    """
    def __init__(
        self,
        deid_service_name: str,
        supported_entities: List[str] = None,
        supported_language: str = "en"
    ): ...
```

## Usage Examples

### Using Specific Recognizers

```python
from presidio_analyzer import AnalyzerEngine

# Initialize analyzer
analyzer = AnalyzerEngine()

# Detect only US-specific PII
text = "John's SSN is 123-45-6789 and his driver license is D1234567"
results = analyzer.analyze(
    text=text,
    language="en",
    entities=["US_SSN", "US_DRIVER_LICENSE"]
)

for result in results:
    detected_text = text[result.start:result.end]
    print(f"Found {result.entity_type}: {detected_text}")
```

### Multi-Country Detection

```python
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()

# Detect various international identifiers
text = """
Contact information:
- UK: NHS 123 456 7890
- Italy: Fiscal Code RSSMRA80A01H501U
- Spain: NIF 12345678Z
- Australia: ABN 12 345 678 901
"""

results = analyzer.analyze(text=text, language="en")

# Group results by entity type
entity_groups = {}
for result in results:
    if result.entity_type not in entity_groups:
        entity_groups[result.entity_type] = []
    entity_groups[result.entity_type].append(text[result.start:result.end])

for entity_type, detected_values in entity_groups.items():
    print(f"{entity_type}: {detected_values}")
```

### Financial Data Detection

```python
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()

# Detect financial identifiers
# (the card number is a common Luhn-valid test number, so checksum validation can pass)
text = """
Payment details:
- Credit Card: 4111-1111-1111-1111
- IBAN: GB82 WEST 1234 5698 7654 32
- ABA Routing: 121000248
- Account: 1234567890
"""

financial_entities = [
    "CREDIT_CARD",
    "IBAN_CODE",
    "ABA_ROUTING_NUMBER",
    "US_BANK_NUMBER"
]

results = analyzer.analyze(
    text=text,
    language="en",
    entities=financial_entities
)

print(f"Found {len(results)} financial identifiers")
for result in results:
    # Mask the detected span instead of printing the raw value
    masked_value = "X" * (result.end - result.start)
    print(f"{result.entity_type}: {masked_value} (score: {result.score:.2f})")
```

### Healthcare Data Detection

```python
from presidio_analyzer import AnalyzerEngine

# Configure for healthcare context
analyzer = AnalyzerEngine()

healthcare_text = """
Patient: John Smith (DOB: 01/15/1980)
SSN: 123-45-6789
Phone: 555-123-4567
Email: john.smith@email.com
Medical License: MD123456
"""

# Detect healthcare-relevant PII
healthcare_entities = [
    "PERSON",
    "DATE_TIME",
    "US_SSN",
    "PHONE_NUMBER",
    "EMAIL_ADDRESS",
    "MEDICAL_LICENSE"
]

results = analyzer.analyze(
    text=healthcare_text,
    language="en",
    entities=healthcare_entities,
    context=["patient", "medical", "healthcare", "doctor"]
)

print(f"Healthcare PII detected: {len(results)} items")
```

### Custom Entity Type Priority

```python
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()

# Prioritize certain entity types with higher thresholds
text = "Contact: john.doe@company.com, phone: 555-0123, SSN: 123-45-6789"

# High-confidence detection for sensitive data
sensitive_results = analyzer.analyze(
    text=text,
    language="en",
    entities=["US_SSN"],
    score_threshold=0.9  # Very high confidence only
)

# Standard detection for contact info
contact_results = analyzer.analyze(
    text=text,
    language="en",
    entities=["EMAIL_ADDRESS", "PHONE_NUMBER"],
    score_threshold=0.5  # Standard confidence
)

print(f"High-confidence sensitive data: {len(sensitive_results)}")
print(f"Contact information: {len(contact_results)}")
```
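
The example above scans the same text once per threshold. An alternative is to analyze once and apply per-entity minimum scores afterwards in plain Python. The sketch below assumes each result exposes `entity_type` and `score`, as the examples on this page do. `Detection` is a hypothetical stand-in for an analyzer result so the snippet runs without Presidio, and `filter_by_threshold` is likewise an illustrative helper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    """Hypothetical stand-in for an analyzer result."""
    entity_type: str
    start: int
    end: int
    score: float


def filter_by_threshold(
    detections: List[Detection],
    thresholds: Dict[str, float],
    default: float = 0.5,
) -> List[Detection]:
    """Keep detections whose score meets the minimum for their entity type."""
    return [
        d for d in detections
        if d.score >= thresholds.get(d.entity_type, default)
    ]


# Made-up detections, for illustration only.
detections = [
    Detection("US_SSN", 51, 62, 0.85),
    Detection("EMAIL_ADDRESS", 9, 29, 1.0),
    Detection("PHONE_NUMBER", 38, 46, 0.4),
]
kept = filter_by_threshold(detections, {"US_SSN": 0.9})
print([d.entity_type for d in kept])  # ['EMAIL_ADDRESS']
```

Entity types missing from the `thresholds` mapping fall back to `default`, so only the stricter or looser cases need to be listed.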

## Recognizer Categories Summary

### By Confidence Level
- **High confidence (0.9+)**: Validated formats (SSN, credit cards with Luhn validation)
- **Medium confidence (0.7-0.9)**: Well-defined patterns (phone numbers, emails)
- **Lower confidence (0.5-0.7)**: Contextual or NLP-based detection

### By Validation Features
- **Checksum validation**: Credit cards, IBANs, ABA routing numbers, ABNs
- **Format validation**: SSNs, phone numbers, emails, passport numbers
- **Pattern matching**: License plates, product codes
- **NLP-based**: Person names, locations, organizations

### By Language Support
- **Multi-language**: Email, phone, credit card, URL, IP, crypto
- **English only**: Most US-specific recognizers
- **Regional**: Country-specific recognizers in local languages

### By Processing Requirements
- **Pattern-only**: Most built-in recognizers (fast, no NLP needed)
- **NLP-dependent**: spaCy, Stanza, Transformers recognizers
- **External service**: Azure AI Language, Health De-identification