# Predefined Recognizers

Presidio Analyzer includes over 50 built-in recognizers for common PII types, organized into generic entities, country-specific identifiers, NLP-based recognizers, and third-party service integrations.

## Generic Recognizers

Universal PII types that apply across multiple countries and contexts.

### Email Recognition

```python { .api }
class EmailRecognizer(PatternRecognizer):
    """
    Detects email addresses using comprehensive regex patterns.

    Supported entities: ["EMAIL_ADDRESS"]
    Languages: Multi-language support
    """
```

### Phone Number Recognition

```python { .api }
class PhoneRecognizer(PatternRecognizer):
    """
    Detects phone numbers in various international formats.

    Supported entities: ["PHONE_NUMBER"]
    Languages: Multi-language support with region-specific patterns
    """
```

### Credit Card Recognition

```python { .api }
class CreditCardRecognizer(PatternRecognizer):
    """
    Detects credit card numbers with format validation.

    Supported entities: ["CREDIT_CARD"]
    Languages: Multi-language support
    Features: Supports major card types (Visa, MasterCard, American Express, etc.)
    """
```
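To give a feel for what validating a card number involves beyond matching its shape, here is a minimal standalone sketch of the Luhn checksum. It is not taken from Presidio's source; the function name `luhn_is_valid` is made up for this example, and the recognizer's actual logic may differ.

```python
def luhn_is_valid(card_number: str) -> bool:
    """Return True if the digits in card_number pass the Luhn checksum.

    Illustrative helper only (hypothetical name, not a Presidio API).
    """
    # Keep digits only, so separators such as "-" or " " are ignored.
    digits = [int(ch) for ch in card_number if ch.isdecimal()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111-1111-1111-1111"))  # True
print(luhn_is_valid("4111-1111-1111-1112"))  # False
```

A number that matches the card pattern but fails this check is most likely a typo or a random digit string, which is why checksum-backed matches can be scored with more confidence.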

### URL Recognition

```python { .api }
class UrlRecognizer(PatternRecognizer):
    """
    Detects URLs and web addresses.

    Supported entities: ["URL"]
    Languages: Multi-language support
    Features: HTTP/HTTPS, FTP, and other protocol detection
    """
```

### IP Address Recognition

```python { .api }
class IpRecognizer(PatternRecognizer):
    """
    Detects IPv4 and IPv6 addresses.

    Supported entities: ["IP_ADDRESS"]
    Languages: Multi-language support
    Features: Validates IP address format
    """
```
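As a rough illustration of IP format validation, the sketch below uses Python's standard `ipaddress` module to decide whether a candidate string is a well-formed IPv4 or IPv6 address. The helper name `classify_ip` is an assumption for this example, and this is not necessarily how `IpRecognizer` validates matches internally.

```python
import ipaddress
from typing import Optional


def classify_ip(candidate: str) -> Optional[str]:
    """Return "IPv4" or "IPv6" for a well-formed address, else None.

    Hypothetical helper for illustration; not part of Presidio.
    """
    try:
        address = ipaddress.ip_address(candidate)
    except ValueError:
        # Raised for anything that is not a valid IPv4/IPv6 literal.
        return None
    return f"IPv{address.version}"


print(classify_ip("192.168.1.10"))  # IPv4
print(classify_ip("2001:db8::1"))   # IPv6
print(classify_ip("999.1.1.1"))     # None
```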

### Cryptocurrency Recognition

```python { .api }
class CryptoRecognizer(PatternRecognizer):
    """
    Detects cryptocurrency wallet addresses.

    Supported entities: ["CRYPTO"]
    Languages: Multi-language support
    Features: Bitcoin, Ethereum, and other major cryptocurrencies
    """
```

### IBAN Recognition

```python { .api }
class IbanRecognizer(PatternRecognizer):
    """
    Detects International Bank Account Numbers.

    Supported entities: ["IBAN_CODE"]
    Languages: Multi-language support
    Features: IBAN format validation and country code verification
    """
```
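For readers unfamiliar with IBAN validation, the following standalone sketch shows the standard mod-97 check-digit test. It deliberately skips the per-country length and layout rules, and the name `iban_is_valid` is hypothetical; treat it as an illustration of the technique, not as the recognizer's implementation.

```python
def iban_is_valid(iban: str) -> bool:
    """Check the IBAN mod-97 check digits (no per-country length check).

    Illustrative helper only (hypothetical name, not a Presidio API).
    """
    compact = "".join(iban.split()).upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False

    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Replace letters with numbers: A=10, B=11, ..., Z=35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))  # True
print(iban_is_valid("GB83 WEST 1234 5698 7654 32"))  # False
```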

### Date Recognition

```python { .api }
class DateRecognizer(PatternRecognizer):
    """
    Detects date patterns in various formats.

    Supported entities: ["DATE_TIME"]
    Languages: Multi-language support
    Features: Multiple date formats (MM/DD/YYYY, DD-MM-YYYY, etc.)
    """
```

## United States Recognizers

PII types specific to the United States.

### Social Security Number

```python { .api }
class UsSsnRecognizer(PatternRecognizer):
    """
    Detects US Social Security Numbers with validation.

    Supported entities: ["US_SSN"]
    Languages: ["en"]
    Features: Format validation and invalid number filtering
    """
```

### US Driver License

```python { .api }
class UsLicenseRecognizer(PatternRecognizer):
    """
    Detects US driver license numbers for all 50 states.

    Supported entities: ["US_DRIVER_LICENSE"]
    Languages: ["en"]
    Features: State-specific format patterns
    """
```

### US Passport

```python { .api }
class UsPassportRecognizer(PatternRecognizer):
    """
    Detects US passport numbers.

    Supported entities: ["US_PASSPORT"]
    Languages: ["en"]
    Features: Current and legacy passport number formats
    """
```

### US Bank Account

```python { .api }
class UsBankRecognizer(PatternRecognizer):
    """
    Detects US bank account numbers.

    Supported entities: ["US_BANK_NUMBER"]
    Languages: ["en"]
    Features: Account number pattern recognition
    """
```

### US ITIN

```python { .api }
class UsItinRecognizer(PatternRecognizer):
    """
    Detects US Individual Taxpayer Identification Numbers.

    Supported entities: ["US_ITIN"]
    Languages: ["en"]
    Features: ITIN format validation
    """
```

### ABA Routing Number

```python { .api }
class AbaRoutingRecognizer(PatternRecognizer):
    """
    Detects US ABA routing numbers with checksum validation.

    Supported entities: ["ABA_ROUTING_NUMBER"]
    Languages: ["en"]
    Features: 9-digit routing number validation
    """
```
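The checksum on a 9-digit ABA routing number is a weighted sum with repeating weights 3, 7, 1 that must be divisible by 10. The sketch below shows that rule in isolation; `aba_is_valid` is a hypothetical name for this example, and the recognizer's own code may be organized differently.

```python
def aba_is_valid(routing_number: str) -> bool:
    """Return True if a 9-digit ABA routing number passes its checksum.

    Illustrative helper only (hypothetical name, not a Presidio API).
    """
    if len(routing_number) != 9 or not routing_number.isdecimal():
        return False
    # Weights 3, 7, 1 repeat across the nine digits.
    weights = (3, 7, 1) * 3
    total = sum(int(digit) * weight for digit, weight in zip(routing_number, weights))
    return total % 10 == 0


print(aba_is_valid("121000248"))  # True
print(aba_is_valid("121000249"))  # False
```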

### Medical License

```python { .api }
class MedicalLicenseRecognizer(PatternRecognizer):
    """
    Detects US medical license numbers.

    Supported entities: ["MEDICAL_LICENSE"]
    Languages: ["en"]
    Features: State-specific medical license patterns
    """
```

## United Kingdom Recognizers

### NHS Number

```python { .api }
class NhsRecognizer(PatternRecognizer):
    """
    Detects UK NHS (National Health Service) numbers.

    Supported entities: ["UK_NHS"]
    Languages: ["en"]
    Features: NHS number format validation
    """
```
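NHS numbers are 10 digits long, and the last digit is a modulus-11 check digit. As a hedged illustration of what validating one can look like, here is a standalone sketch; the name `nhs_is_valid` is made up for this example and the code is not copied from Presidio.

```python
def nhs_is_valid(nhs_number: str) -> bool:
    """Return True if a 10-digit NHS number passes the modulus-11 check.

    Illustrative helper only (hypothetical name, not a Presidio API).
    """
    # Keep digits only, so "943 476 5919" and "9434765919" are equivalent.
    digits = [int(ch) for ch in nhs_number if ch.isdecimal()]
    if len(digits) != 10:
        return False

    # Weight the first nine digits 10, 9, ..., 2.
    total = sum(digit * weight for digit, weight in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number can never be valid.
    return check != 10 and check == digits[9]


print(nhs_is_valid("943 476 5919"))  # True
print(nhs_is_valid("123 456 7890"))  # False
```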

### UK National Insurance Number

```python { .api }
class UkNinoRecognizer(PatternRecognizer):
    """
    Detects UK National Insurance Numbers.

    Supported entities: ["UK_NINO"]
    Languages: ["en"]
    Features: NINO format validation and invalid prefix filtering
    """
```

## European Union Recognizers

### Italy

```python { .api }
class ItFiscalCodeRecognizer(PatternRecognizer):
    """
    Detects Italian fiscal codes (Codice Fiscale).

    Supported entities: ["IT_FISCAL_CODE"]
    Languages: ["en", "it"]
    Features: Fiscal code format validation
    """

class ItDriverLicenseRecognizer(PatternRecognizer):
    """
    Detects Italian driver license numbers.

    Supported entities: ["IT_DRIVER_LICENSE"]
    Languages: ["en", "it"]
    """

class ItVatCodeRecognizer(PatternRecognizer):
    """
    Detects Italian VAT codes.

    Supported entities: ["IT_VAT_CODE"]
    Languages: ["en", "it"]
    Features: VAT code validation
    """

class ItIdentityCardRecognizer(PatternRecognizer):
    """
    Detects Italian identity card numbers.

    Supported entities: ["IT_IDENTITY_CARD"]
    Languages: ["en", "it"]
    """

class ItPassportRecognizer(PatternRecognizer):
    """
    Detects Italian passport numbers.

    Supported entities: ["IT_PASSPORT"]
    Languages: ["en", "it"]
    """
```

### Spain

```python { .api }
class EsNifRecognizer(PatternRecognizer):
    """
    Detects Spanish NIF (National Identity Document) numbers.

    Supported entities: ["ES_NIF"]
    Languages: ["en", "es"]
    Features: NIF checksum validation
    """

class EsNieRecognizer(PatternRecognizer):
    """
    Detects Spanish NIE (Foreign Identity Number) numbers.

    Supported entities: ["ES_NIE"]
    Languages: ["en", "es"]
    Features: NIE format validation
    """
```
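The control letter at the end of a Spanish NIF is derived from the 8-digit number modulo 23, and an NIE uses the same table after its leading X, Y, or Z is replaced by 0, 1, or 2. The sketch below illustrates that rule on its own; the helper names are hypothetical, and it is not claimed to match the recognizers' internal code.

```python
# Control-letter table indexed by (number mod 23).
CONTROL_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def nif_is_valid(nif: str) -> bool:
    """Check an 8-digit + letter NIF (hypothetical helper, not a Presidio API)."""
    nif = nif.strip().upper()
    if len(nif) != 9 or not nif[:8].isdecimal():
        return False
    return CONTROL_LETTERS[int(nif[:8]) % 23] == nif[8]


def nie_is_valid(nie: str) -> bool:
    """Check an X/Y/Z + 7-digit + letter NIE (hypothetical helper)."""
    nie = nie.strip().upper()
    if len(nie) != 9 or nie[0] not in "XYZ" or not nie[1:8].isdecimal():
        return False
    # X, Y, Z map to 0, 1, 2 before the same mod-23 lookup.
    number = int(str("XYZ".index(nie[0])) + nie[1:8])
    return CONTROL_LETTERS[number % 23] == nie[8]


print(nif_is_valid("12345678Z"))  # True
print(nif_is_valid("12345678A"))  # False
print(nie_is_valid("X1234567L"))  # True
```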

### Poland

```python { .api }
class PlPeselRecognizer(PatternRecognizer):
    """
    Detects Polish PESEL (Personal Identity Number) numbers.

    Supported entities: ["PL_PESEL"]
    Languages: ["en", "pl"]
    Features: PESEL checksum validation
    """
```
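A PESEL has 11 digits, the last of which is a check digit computed from the first ten with the repeating weights 1, 3, 7, 9. The following sketch shows that calculation in isolation, assuming the input has no separators; `pesel_is_valid` is a hypothetical name and the recognizer's own implementation may differ in detail.

```python
def pesel_is_valid(pesel: str) -> bool:
    """Return True if an 11-digit PESEL passes its check-digit test.

    Illustrative helper only (hypothetical name, not a Presidio API).
    """
    if len(pesel) != 11 or not pesel.isdecimal():
        return False
    weights = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
    # zip stops after ten pairs, so the check digit itself is not weighted.
    total = sum(int(digit) * weight for digit, weight in zip(pesel, weights))
    expected_check = (10 - total % 10) % 10
    return expected_check == int(pesel[10])


print(pesel_is_valid("44051401458"))  # True
print(pesel_is_valid("44051401459"))  # False
```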

### Finland

```python { .api }
class FiPersonalIdentityCodeRecognizer(PatternRecognizer):
    """
    Detects Finnish personal identity codes.

    Supported entities: ["FI_PERSONAL_IDENTITY_CODE"]
    Languages: ["en", "fi"]
    Features: Finnish ID format validation
    """
```

## Asia-Pacific Recognizers

### Australia

```python { .api }
class AuAbnRecognizer(PatternRecognizer):
    """
    Detects Australian Business Numbers (ABN).

    Supported entities: ["AU_ABN"]
    Languages: ["en"]
    Features: ABN checksum validation
    """

class AuAcnRecognizer(PatternRecognizer):
    """
    Detects Australian Company Numbers (ACN).

    Supported entities: ["AU_ACN"]
    Languages: ["en"]
    Features: ACN format validation
    """

class AuTfnRecognizer(PatternRecognizer):
    """
    Detects Australian Tax File Numbers (TFN).

    Supported entities: ["AU_TFN"]
    Languages: ["en"]
    Features: TFN format validation
    """

class AuMedicareRecognizer(PatternRecognizer):
    """
    Detects Australian Medicare numbers.

    Supported entities: ["AU_MEDICARE"]
    Languages: ["en"]
    Features: Medicare number format validation
    """
```
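The ABN checksum subtracts 1 from the first digit, applies the weights 10, 1, 3, 5, ..., 19 to the 11 digits, and requires the sum to be divisible by 89. The sketch below illustrates the rule as a standalone function; `abn_is_valid` is a hypothetical name and the code is not taken from `AuAbnRecognizer`.

```python
def abn_is_valid(abn: str) -> bool:
    """Return True if an 11-digit ABN passes the modulus-89 check.

    Illustrative helper only (hypothetical name, not a Presidio API).
    """
    # Keep digits only, so "51 824 753 556" and "51824753556" are equivalent.
    digits = [int(ch) for ch in abn if ch.isdecimal()]
    if len(digits) != 11:
        return False

    digits[0] -= 1  # the algorithm subtracts 1 from the leading digit
    weights = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)
    total = sum(digit * weight for digit, weight in zip(digits, weights))
    return total % 89 == 0


print(abn_is_valid("51 824 753 556"))  # True
print(abn_is_valid("12 345 678 901"))  # False
```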

### Singapore

```python { .api }
class SgFinRecognizer(PatternRecognizer):
    """
    Detects Singapore FIN (Foreign Identification Number) numbers.

    Supported entities: ["SG_NRIC_FIN"]
    Languages: ["en"]
    Features: FIN checksum validation
    """

class SgUenRecognizer(PatternRecognizer):
    """
    Detects Singapore UEN (Unique Entity Number) numbers.

    Supported entities: ["SG_UEN"]
    Languages: ["en"]
    Features: UEN format validation
    """
```

### South Korea

```python { .api }
class KrRrnRecognizer(PatternRecognizer):
    """
    Detects Korean Resident Registration Numbers.

    Supported entities: ["KR_RRN"]
    Languages: ["en", "ko"]
    Features: RRN format validation
    """
```

## India Recognizers

```python { .api }
class InAadhaarRecognizer(PatternRecognizer):
    """
    Detects Indian Aadhaar (Unique Identity) numbers.

    Supported entities: ["IN_AADHAAR"]
    Languages: ["en"]
    Features: Aadhaar format validation
    """

class InPanRecognizer(PatternRecognizer):
    """
    Detects Indian PAN (Permanent Account Number) numbers.

    Supported entities: ["IN_PAN"]
    Languages: ["en"]
    Features: PAN format validation
    """

class InPassportRecognizer(PatternRecognizer):
    """
    Detects Indian passport numbers.

    Supported entities: ["IN_PASSPORT"]
    Languages: ["en"]
    Features: Indian passport format patterns
    """

class InVehicleRegistrationRecognizer(PatternRecognizer):
    """
    Detects Indian vehicle registration numbers.

    Supported entities: ["IN_VEHICLE_REGISTRATION"]
    Languages: ["en"]
    Features: Indian vehicle registration format patterns
    """

class InVoterRecognizer(PatternRecognizer):
    """
    Detects Indian voter ID numbers.

    Supported entities: ["IN_VOTER"]
    Languages: ["en"]
    Features: Indian voter ID format validation
    """
```

## NLP-Based Recognizers

Recognizers that use Natural Language Processing models for entity detection.

### spaCy NLP Recognizer

```python { .api }
class SpacyRecognizer(LocalRecognizer):
    """
    Uses spaCy NLP models for named entity recognition.

    Supported entities: ["PERSON", "LOCATION", "ORGANIZATION"] and others
    Languages: Multiple languages supported by spaCy models
    Features: Leverages spaCy's pre-trained NER models
    """
    def __init__(
        self,
        supported_entities: List[str] = None,
        check_label_groups: Tuple[Set, Set] = None,
        supported_language: str = "en",
        ner_strength: float = 0.85
    ): ...
```

### Stanza NLP Recognizer

```python { .api }
class StanzaRecognizer(LocalRecognizer):
    """
    Uses Stanford Stanza NLP models for named entity recognition.

    Supported entities: ["PERSON", "LOCATION", "ORGANIZATION"] and others
    Languages: Multiple languages supported by Stanza
    Features: Stanford NLP Group's state-of-the-art NER models
    """
    def __init__(
        self,
        supported_entities: List[str] = None,
        check_label_groups: Tuple[Set, Set] = None,
        supported_language: str = "en",
        ner_strength: float = 0.85
    ): ...
```

### Transformers NLP Recognizer

```python { .api }
class TransformersRecognizer(LocalRecognizer):
    """
    Uses Hugging Face Transformers models for named entity recognition.

    Supported entities: ["PERSON", "LOCATION", "ORGANIZATION"] and others
    Languages: Multiple languages depending on model
    Features: Supports BERT, RoBERTa, and other transformer models
    """
    def __init__(
        self,
        model_id_or_path: str = None,
        aggregation_strategy: str = "simple",
        supported_entities: List[str] = None,
        pipeline_kwargs: Dict = None,
        model_kwargs: Dict = None
    ): ...
```

## Third-Party Service Recognizers

Integrations with external PII detection services.

### Azure AI Language

```python { .api }
class AzureAILanguageRecognizer(RemoteRecognizer):
    """
    Integrates with Azure AI Language service for PII detection.

    Supported entities: Multiple PII types supported by Azure
    Languages: Multiple languages supported by Azure AI
    Features: Cloud-based detection with high accuracy
    """
    def __init__(
        self,
        endpoint: str = None,
        credential: str = None,
        supported_entities: List[str] = None,
        supported_language: str = "en"
    ): ...
```

### Azure Health De-identification

```python { .api }
class AzureHealthDeidRecognizer(RemoteRecognizer):
    """
    Integrates with Azure Health De-identification service.

    Supported entities: Healthcare-specific PII types
    Languages: ["en"]
    Features: Specialized for healthcare and medical text
    """
    def __init__(
        self,
        deid_service_name: str,
        supported_entities: List[str] = None,
        supported_language: str = "en"
    ): ...
```

## Usage Examples

### Using Specific Recognizers

```python
from presidio_analyzer import AnalyzerEngine

# Initialize analyzer
analyzer = AnalyzerEngine()

# Detect only US-specific PII
text = "John's SSN is 123-45-6789 and his driver license is D1234567"
results = analyzer.analyze(
    text=text,
    language="en",
    entities=["US_SSN", "US_DRIVER_LICENSE"]
)

for result in results:
    detected_text = text[result.start:result.end]
    print(f"Found {result.entity_type}: {detected_text}")
```

### Multi-Country Detection

```python
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()

# Detect various international identifiers.
# The sample NHS number and ABN are chosen to pass their checksums.
text = """
Contact information:
- UK: NHS 943 476 5919
- Italy: Fiscal Code RSSMRA80A01H501U
- Spain: NIF 12345678Z
- Australia: ABN 51 824 753 556
"""

results = analyzer.analyze(text=text, language="en")

# Group results by entity type
entity_groups = {}
for result in results:
    if result.entity_type not in entity_groups:
        entity_groups[result.entity_type] = []
    entity_groups[result.entity_type].append(text[result.start:result.end])

for entity_type, detected_values in entity_groups.items():
    print(f"{entity_type}: {detected_values}")
```
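The grouping loop can be written a little more compactly with `collections.defaultdict`. The sketch below assumes only that each result exposes `entity_type`, `start`, and `end`; the `Detection` class and `group_by_entity` function are hypothetical stand-ins so the snippet runs without Presidio installed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical stand-in with the three fields the grouping code reads."""
    entity_type: str
    start: int
    end: int


def group_by_entity(text: str, results: Iterable[Detection]) -> Dict[str, List[str]]:
    """Map each entity type to the list of matched substrings."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for result in results:
        groups[result.entity_type].append(text[result.start:result.end])
    return dict(groups)


sample_text = "NIF 12345678Z, email a@b.co, NIF 87654321X"


def detection_for(entity_type: str, value: str) -> Detection:
    # Derive offsets from the text so the sample spans cannot drift.
    start = sample_text.index(value)
    return Detection(entity_type, start, start + len(value))


sample_results = [
    detection_for("ES_NIF", "12345678Z"),
    detection_for("EMAIL_ADDRESS", "a@b.co"),
    detection_for("ES_NIF", "87654321X"),
]

print(group_by_entity(sample_text, sample_results))
```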

### Financial Data Detection

```python
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()

# Detect financial identifiers.
# The sample card number is a test value that passes the Luhn check.
text = """
Payment details:
- Credit Card: 4111-1111-1111-1111
- IBAN: GB82 WEST 1234 5698 7654 32
- ABA Routing: 121000248
- Account: 1234567890
"""

financial_entities = [
    "CREDIT_CARD",
    "IBAN_CODE",
    "ABA_ROUTING_NUMBER",
    "US_BANK_NUMBER"
]

results = analyzer.analyze(
    text=text,
    language="en",
    entities=financial_entities
)

print(f"Found {len(results)} financial identifiers")
for result in results:
    # Print a mask of the same length instead of the raw value
    masked_value = "X" * (result.end - result.start)
    print(f"{result.entity_type}: {masked_value} (score: {result.score:.2f})")
```

### Healthcare Data Detection

```python
from presidio_analyzer import AnalyzerEngine

# Configure for healthcare context
analyzer = AnalyzerEngine()

healthcare_text = """
Patient: John Smith (DOB: 01/15/1980)
SSN: 123-45-6789
Phone: 555-123-4567
Email: john.smith@email.com
Medical License: MD123456
"""

# Detect healthcare-relevant PII
healthcare_entities = [
    "PERSON",
    "DATE_TIME",
    "US_SSN",
    "PHONE_NUMBER",
    "EMAIL_ADDRESS",
    "MEDICAL_LICENSE"
]

results = analyzer.analyze(
    text=healthcare_text,
    language="en",
    entities=healthcare_entities,
    context=["patient", "medical", "healthcare", "doctor"]
)

print(f"Healthcare PII detected: {len(results)} items")
```

### Custom Entity Type Priority

```python
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()

# Prioritize certain entity types with higher thresholds
text = "Contact: john.doe@company.com, phone: 555-0123, SSN: 123-45-6789"

# High-confidence detection for sensitive data
sensitive_results = analyzer.analyze(
    text=text,
    language="en",
    entities=["US_SSN"],
    score_threshold=0.9  # Very high confidence only
)

# Standard detection for contact info
contact_results = analyzer.analyze(
    text=text,
    language="en",
    entities=["EMAIL_ADDRESS", "PHONE_NUMBER"],
    score_threshold=0.5  # Standard confidence
)

print(f"High-confidence sensitive data: {len(sensitive_results)}")
print(f"Contact information: {len(contact_results)}")
```
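An alternative to running one analysis per threshold is to analyze once and filter the results per entity type afterwards. The sketch below shows only the filtering step, using a hypothetical `Detection` stand-in with `entity_type` and `score` fields; `filter_by_threshold` and the threshold values are assumptions for illustration, not part of Presidio.

```python
from typing import Dict, Iterable, List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical stand-in with the two fields the filter reads."""
    entity_type: str
    score: float


def filter_by_threshold(
    results: Iterable[Detection],
    thresholds: Dict[str, float],
    default: float = 0.5,
) -> List[Detection]:
    """Keep results whose score meets the threshold for their entity type."""
    return [
        result
        for result in results
        if result.score >= thresholds.get(result.entity_type, default)
    ]


sample_results = [
    Detection("US_SSN", 0.85),
    Detection("US_SSN", 0.95),
    Detection("EMAIL_ADDRESS", 1.0),
    Detection("PHONE_NUMBER", 0.4),
]

# Require 0.9 for SSNs; everything else falls back to the 0.5 default.
kept = filter_by_threshold(sample_results, {"US_SSN": 0.9})
for result in kept:
    print(f"{result.entity_type}: {result.score:.2f}")
```

With the sample data, the 0.85 SSN and the 0.4 phone number are dropped, while the 0.95 SSN and the email address are kept.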

## Recognizer Categories Summary

### By Confidence Level
- **High confidence (0.9+)**: Validated formats (SSN, credit cards with Luhn validation)
- **Medium confidence (0.7-0.9)**: Well-defined patterns (phone numbers, emails) and NLP-based detection at the default `ner_strength` of 0.85
- **Lower confidence (0.5-0.7)**: Weak patterns that depend on surrounding context

### By Validation Features
- **Checksum validation**: Credit cards, IBANs, ABA routing numbers, ABNs
- **Format validation**: SSNs, phone numbers, emails, passport numbers
- **Pattern matching**: Driver license numbers, vehicle registration numbers
- **NLP-based**: Person names, locations, organizations

### By Language Support
- **Multi-language**: Email, phone, credit card, URL, IP, crypto
- **English only**: Most US-specific recognizers
- **Regional**: Country-specific recognizers in local languages

### By Processing Requirements
- **Pattern-only**: Most built-in recognizers (fast, no NLP needed)
- **NLP-dependent**: spaCy, Stanza, Transformers recognizers
- **External service**: Azure AI Language, Health De-identification