# Google Cloud DLP

Google Cloud Data Loss Prevention (DLP) API enables organizations to discover, classify, and protect sensitive data across their cloud and hybrid environments. It provides comprehensive content inspection, data transformation, risk analysis, and automated data discovery capabilities with extensive configuration options for compliance and privacy requirements.

## Package Information

- **Package Name**: google-cloud-dlp
- **Language**: Python
- **Installation**: `pip install google-cloud-dlp`

## Core Imports

```python
from google.cloud import dlp
```

For direct access to the v2 API:

```python
from google.cloud import dlp_v2
```

## Basic Usage

```python
from google.cloud import dlp

# Placeholder: replace with your Google Cloud project ID
project_id = "your-project-id"

# Initialize the DLP client
client = dlp.DlpServiceClient()

# Basic content inspection
parent = f"projects/{project_id}/locations/global"
content_item = dlp.ContentItem(value="My SSN is 123-45-6789")

# Configure inspection
inspect_config = dlp.InspectConfig(
    info_types=[dlp.InfoType(name="US_SOCIAL_SECURITY_NUMBER")],
    include_quote=True,  # Without this, finding.quote is returned empty
)

# Create request
request = dlp.InspectContentRequest(
    parent=parent,
    inspect_config=inspect_config,
    item=content_item,
)

# Inspect content
response = client.inspect_content(request=request)

# Process findings
for finding in response.result.findings:
    print(f"Found {finding.info_type.name}: {finding.quote}")
```

## Architecture

The Google Cloud DLP API follows a service-oriented architecture with distinct functional areas:

- **Client Libraries**: Synchronous and asynchronous clients for API interaction
- **Content Analysis**: Real-time inspection, de-identification, and image redaction
- **Job Management**: Long-running batch operations with triggers and scheduling
- **Data Discovery**: Automated scanning and profiling of cloud data sources
- **Template System**: Reusable configurations for inspection and transformation
- **Type System**: Extensive type definitions for configuration and results

The API supports both immediate operations for small datasets and batch processing for enterprise-scale data protection workflows.

## Capabilities

### Content Analysis

Real-time analysis of text and images to detect, redact, and transform sensitive information. Supports immediate inspection with customizable info types and confidence levels.

```python { .api }
def inspect_content(
    request: dlp.InspectContentRequest,
    *,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.InspectContentResponse: ...

def deidentify_content(
    request: dlp.DeidentifyContentRequest,
    *,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.DeidentifyContentResponse: ...

def redact_image(
    request: dlp.RedactImageRequest,
    *,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.RedactImageResponse: ...
```
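
As a rough illustration of `deidentify_content`, the sketch below masks detected values with `#`. It assumes the request is passed as a plain dictionary shaped like `DeidentifyContentRequest`; the helper names `build_mask_request` and `mask_text` and the project ID are hypothetical, and the client call needs valid credentials to run. Treat it as one possible arrangement under those assumptions.

```python
from typing import Sequence


def build_mask_request(project_id: str, text: str, info_types: Sequence[str]) -> dict:
    """Build a DeidentifyContentRequest-shaped dict that masks each finding."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "inspect_config": {"info_types": [{"name": name} for name in info_types]},
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            # number_to_mask is left unset, so the whole match is masked
                            "character_mask_config": {"masking_character": "#"}
                        }
                    }
                ]
            }
        },
        "item": {"value": text},
    }


def mask_text(project_id: str, text: str, info_types: Sequence[str]) -> str:
    """Send the request and return the de-identified text (requires credentials)."""
    # Imported lazily so the builder above can be used without the library installed.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request=build_mask_request(project_id, text, info_types)
    )
    return response.item.value
```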

[Content Analysis](./content-analysis.md)

### Template Management

Reusable configurations for inspection and de-identification operations. Templates standardize DLP policies across an organization and simplify repeated operations.

```python { .api }
def create_inspect_template(
    request: dlp.CreateInspectTemplateRequest,
    *,
    parent: Optional[str] = None,
    inspect_template: Optional[dlp.InspectTemplate] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.InspectTemplate: ...

def create_deidentify_template(
    request: dlp.CreateDeidentifyTemplateRequest,
    *,
    parent: Optional[str] = None,
    deidentify_template: Optional[dlp.DeidentifyTemplate] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.DeidentifyTemplate: ...
```
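
To make the reuse idea concrete, here is a minimal sketch of a template request and of a later inspection that refers to the stored template by name. The helper names, the template ID, and the chosen likelihood threshold are illustrative assumptions. The dictionaries are meant to be passed as `request=` to `create_inspect_template` and `inspect_content`, with the template name taken from the `name` field of the returned `InspectTemplate`.

```python
from typing import Sequence


def build_inspect_template_request(
    project_id: str,
    template_id: str,
    info_types: Sequence[str],
    display_name: str = "",
) -> dict:
    """Build a CreateInspectTemplateRequest-shaped dict."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "template_id": template_id,
        "inspect_template": {
            "display_name": display_name,
            "inspect_config": {
                "info_types": [{"name": name} for name in info_types],
                "min_likelihood": "LIKELY",  # illustrative threshold
                "include_quote": True,
            },
        },
    }


def build_inspect_with_template(project_id: str, template_name: str, text: str) -> dict:
    """Build an InspectContentRequest-shaped dict that reuses a stored template."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        # Full resource name, e.g. the `name` of the created InspectTemplate
        "inspect_template_name": template_name,
        "item": {"value": text},
    }
```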

[Template Management](./template-management.md)

### Job Management

Long-running batch operations for processing large datasets, including scheduled triggers, hybrid content inspection, and job lifecycle management.

```python { .api }
def create_dlp_job(
    request: dlp.CreateDlpJobRequest,
    *,
    parent: Optional[str] = None,
    inspect_job: Optional[dlp.InspectJobConfig] = None,
    risk_job: Optional[dlp.RiskAnalysisJobConfig] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.DlpJob: ...

def create_job_trigger(
    request: dlp.CreateJobTriggerRequest,
    *,
    parent: Optional[str] = None,
    job_trigger: Optional[dlp.JobTrigger] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.JobTrigger: ...
```
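
A batch inspection job could be described along the following lines. This is a sketch that assumes a Cloud Storage bucket as the data source and a dictionary request passed to `create_dlp_job`; the helper names, bucket name, and findings limit are hypothetical. Because jobs run asynchronously, a caller would typically poll `get_dlp_job` with the returned job name and stop once the state name is one of the terminal values checked by `is_terminal`.

```python
from typing import Sequence

# Job states after which polling can stop
TERMINAL_STATES = {"DONE", "CANCELED", "FAILED"}


def build_gcs_inspect_job_request(
    project_id: str,
    bucket: str,
    info_types: Sequence[str],
    max_findings: int = 100,
) -> dict:
    """Build a CreateDlpJobRequest-shaped dict that scans every object in a bucket."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "inspect_job": {
            "storage_config": {
                "cloud_storage_options": {"file_set": {"url": f"gs://{bucket}/*"}}
            },
            "inspect_config": {
                "info_types": [{"name": name} for name in info_types],
                "limits": {"max_findings_per_request": max_findings},
            },
        },
    }


def is_terminal(state_name: str) -> bool:
    """Return True when a job state name means the job will not progress further."""
    return state_name in TERMINAL_STATES
```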

[Job Management](./job-management.md)

### Data Discovery

Automated scanning and profiling of cloud data sources to understand data distribution, sensitivity, and compliance posture across BigQuery, Cloud Storage, Cloud SQL, and more.

```python { .api }
def create_discovery_config(
    request: dlp.CreateDiscoveryConfigRequest,
    *,
    parent: Optional[str] = None,
    discovery_config: Optional[dlp.DiscoveryConfig] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.DiscoveryConfig: ...
```

[Data Discovery](./data-discovery.md)

### Data Profiling

Access to data profiles and insights generated by discovery scans, providing visibility into data sensitivity, distribution, and risk levels across projects, tables, columns, and file stores.

```python { .api }
def get_project_data_profile(
    request: dlp.GetProjectDataProfileRequest,
    *,
    name: Optional[str] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.ProjectDataProfile: ...

def get_table_data_profile(
    request: dlp.GetTableDataProfileRequest,
    *,
    name: Optional[str] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.TableDataProfile: ...

def get_column_data_profile(
    request: dlp.GetColumnDataProfileRequest,
    *,
    name: Optional[str] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.ColumnDataProfile: ...

def get_file_store_data_profile(
    request: dlp.GetFileStoreDataProfileRequest,
    *,
    name: Optional[str] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.FileStoreDataProfile: ...
```

[Data Profiling](./data-profiling.md)

### Stored Info Types

Custom sensitive information detection patterns for organization-specific data types. Extends built-in detectors with custom dictionaries (including large dictionaries sourced from Cloud Storage or BigQuery) and regular expressions.

```python { .api }
def create_stored_info_type(
    request: dlp.CreateStoredInfoTypeRequest,
    *,
    parent: Optional[str] = None,
    config: Optional[dlp.StoredInfoTypeConfig] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.StoredInfoType: ...
```
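
One way to define a regex-based stored info type and then refer to it during inspection is sketched below. The helper names, the `EMPLOYEE_ID` label, and the `EMP-` pattern used in testing are hypothetical. The first dictionary is intended as the `request=` argument to `create_stored_info_type`, and the second as an entry in `InspectConfig.custom_info_types`, using the `name` of the returned `StoredInfoType`. The local `re.compile` call is only a quick sanity check; the service evaluates RE2 syntax, so a pattern that compiles in Python is not guaranteed to be accepted.

```python
import re


def build_stored_info_type_request(
    project_id: str,
    stored_info_type_id: str,
    pattern: str,
    display_name: str = "",
) -> dict:
    """Build a CreateStoredInfoTypeRequest-shaped dict for a regex detector."""
    # Fail fast on an obviously malformed pattern before calling the API.
    re.compile(pattern)
    return {
        "parent": f"projects/{project_id}/locations/global",
        "stored_info_type_id": stored_info_type_id,
        "config": {"display_name": display_name, "regex": {"pattern": pattern}},
    }


def custom_info_type_for(stored_info_type_name: str, info_type_name: str) -> dict:
    """Build a custom_info_types entry that points at a stored info type."""
    return {
        "info_type": {"name": info_type_name},
        "stored_type": {"name": stored_info_type_name},
    }
```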

[Stored Info Types](./stored-info-types.md)

### Connection Management

External data source connections for accessing data outside Google Cloud, including database connections, cloud storage from other providers, and hybrid environments.

```python { .api }
def create_connection(
    request: dlp.CreateConnectionRequest,
    *,
    parent: Optional[str] = None,
    connection: Optional[dlp.Connection] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.Connection: ...
```

[Connection Management](./connection-management.md)

## Core Types

### Client Classes

```python { .api }
class DlpServiceClient:
    """Synchronous client for Google Cloud DLP service operations."""

    def __init__(
        self,
        *,
        credentials: Optional[ga_credentials.Credentials] = None,
        transport: Optional[DlpServiceTransport] = None,
        client_options: Optional[ClientOptions] = None,
        client_info: gapic_v1.client_info.ClientInfo = DEFAULT_CLIENT_INFO,
    ) -> None: ...

class DlpServiceAsyncClient:
    """Asynchronous client for Google Cloud DLP service operations."""

    def __init__(
        self,
        *,
        credentials: Optional[ga_credentials.Credentials] = None,
        transport: Optional[DlpServiceAsyncTransport] = None,
        client_options: Optional[ClientOptions] = None,
        client_info: gapic_v1.client_info.ClientInfo = DEFAULT_CLIENT_INFO,
    ) -> None: ...
```

### Core Data Types

```python { .api }
class ContentItem:
    """Container for content to be inspected."""

    value: str
    table: Table
    byte_item: ByteContentItem

class InfoType:
    """Type of information detector."""

    name: str
    version: str
    sensitivity_score: SensitivityScore

class Finding:
    """Detected sensitive information."""

    info_type: InfoType
    likelihood: Likelihood
    location: Location
    quote: str
    quote_info: QuoteInfo

class InspectConfig:
    """Configuration for content inspection."""

    info_types: Sequence[InfoType]
    min_likelihood: Likelihood
    limits: InspectConfig.FindingLimits
    include_quote: bool
    exclude_info_types: bool
```
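
The `table` field of `ContentItem` carries structured rows rather than a single string. As an illustration, the hypothetical helper below converts headers and rows into a dictionary shaped like a table `ContentItem`, storing every cell as a `string_value`. It is a sketch that assumes string cells are sufficient; other value types are not handled.

```python
from typing import Sequence


def rows_to_table_item(headers: Sequence[str], rows: Sequence[Sequence[object]]) -> dict:
    """Convert headers and rows into a ContentItem-shaped dict using the table field."""
    for index, row in enumerate(rows):
        if len(row) != len(headers):
            raise ValueError(
                f"row {index} has {len(row)} values, expected {len(headers)}"
            )
    return {
        "table": {
            "headers": [{"name": header} for header in headers],
            "rows": [
                # Every cell is stringified; numeric or date typing is not preserved
                {"values": [{"string_value": str(value)} for value in row]}
                for row in rows
            ],
        }
    }


item = rows_to_table_item(
    ["name", "ssn"],
    [["Alice", "123-45-6789"], ["Bob", "987-65-4321"]],
)
```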

### Transformation Types

```python { .api }
class DeidentifyConfig:
    """Configuration for content de-identification."""

    info_type_transformations: InfoTypeTransformations
    record_transformations: RecordTransformations
    transformation_error_handling: TransformationErrorHandling

class PrimitiveTransformation:
    """Basic data transformation operations."""

    replace_config: ReplaceValueConfig
    redact_config: RedactConfig
    character_mask_config: CharacterMaskConfig
    crypto_replace_ffx_fpe_config: CryptoReplaceFfxFpeConfig
    fixed_size_bucketing_config: FixedSizeBucketingConfig
    bucketing_config: BucketingConfig
    replace_dictionary_config: ReplaceDictionaryConfig
    time_part_config: TimePartConfig
    crypto_hash_config: CryptoHashConfig
    date_shift_config: DateShiftConfig
    crypto_deterministic_config: CryptoDeterministicConfig
```

### Enumeration Types

```python { .api }
class Likelihood(proto.Enum):
    """Likelihood levels for detection confidence."""

    LIKELIHOOD_UNSPECIFIED = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5

class FileType(proto.Enum):
    """Supported file types for processing."""

    FILE_TYPE_UNSPECIFIED = 0
    BINARY_FILE = 1
    TEXT_FILE = 2
    IMAGE = 3
    WORD = 5
    PDF = 6
    AVRO = 7
    CSV = 8
    TSV = 9
    POWERPOINT = 11
    EXCEL = 12
```
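
Because the `Likelihood` values listed above are ordered, a threshold comparison is straightforward. The sketch below uses a local `IntEnum` stand-in that mirrors those values, plus hypothetical finding dictionaries, to show client-side filtering. In practice `InspectConfig.min_likelihood` applies the threshold on the service side, so this is only an illustration of the ordering.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    """Local stand-in mirroring the values listed for dlp.Likelihood."""

    LIKELIHOOD_UNSPECIFIED = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


def filter_findings(findings, minimum: Likelihood):
    """Keep findings whose likelihood name ranks at or above the minimum."""
    return [f for f in findings if Likelihood[f["likelihood"]] >= minimum]


# Hypothetical findings, reduced to the two fields this sketch needs
sample = [
    {"info_type": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY"},
    {"info_type": "PERSON_NAME", "likelihood": "POSSIBLE"},
    {"info_type": "PHONE_NUMBER", "likelihood": "UNLIKELY"},
]
kept = filter_findings(sample, Likelihood.POSSIBLE)
print([f["info_type"] for f in kept])  # ['EMAIL_ADDRESS', 'PERSON_NAME']
```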