# Content Analysis

Real-time analysis of text and images to detect, redact, and transform sensitive information. Content analysis operations process data immediately and return results synchronously, making them ideal for interactive applications and small-scale data processing.

## Capabilities

### Content Inspection

Analyzes text content to identify sensitive information using built-in and custom detectors. Supports various content formats including plain text, structured tables, and document metadata.

```python { .api }
def inspect_content(
    request: dlp.InspectContentRequest,
    *,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.InspectContentResponse:
    """
    Finds potentially sensitive info in content.

    Args:
        request: InspectContentRequest containing content and configuration
        retry: Retry configuration for failed requests
        timeout: Timeout for the request
        metadata: Request metadata

    Returns:
        InspectContentResponse with detected findings
    """
```

#### Usage Example

```python
from google.cloud import dlp

client = dlp.DlpServiceClient()
project_id = "your-project-id"  # Placeholder: replace with your Google Cloud project ID
parent = f"projects/{project_id}/locations/global"

# Configure what to detect
inspect_config = dlp.InspectConfig(
    info_types=[
        dlp.InfoType(name="PHONE_NUMBER"),
        dlp.InfoType(name="EMAIL_ADDRESS"),
        dlp.InfoType(name="CREDIT_CARD_NUMBER"),
    ],
    min_likelihood=dlp.Likelihood.POSSIBLE,
    include_quote=True,
)

# Content to inspect
content_item = dlp.ContentItem(
    value="Contact John at john@example.com or 555-123-4567"
)

# Create and send request
request = dlp.InspectContentRequest(
    parent=parent,
    inspect_config=inspect_config,
    item=content_item,
)

response = client.inspect_content(request=request)

# Process findings; .name prints the enum label (e.g. LIKELY) rather than its number
for finding in response.result.findings:
    print(f"Found {finding.info_type.name}: {finding.quote}")
    print(f"Likelihood: {finding.likelihood.name}")
```
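
The same request can also be assembled as a plain dictionary, which may be convenient when the configuration is loaded from JSON or a settings file. The sketch below assumes, as is generally the case for GAPIC-generated clients, that `inspect_content` accepts a dict wherever an `InspectContentRequest` is expected. `build_inspect_request` is a hypothetical helper for illustration and is not part of the library; it only assembles data and never calls the API.

```python
from typing import Any, Dict, List


def build_inspect_request(
    project_id: str, text: str, info_type_names: List[str]
) -> Dict[str, Any]:
    """Build a dict mirroring the InspectContentRequest fields.

    Hypothetical helper: it only assembles data and does not call the DLP API.
    """
    return {
        "parent": f"projects/{project_id}/locations/global",
        "inspect_config": {
            "info_types": [{"name": name} for name in info_type_names],
            "include_quote": True,
        },
        "item": {"value": text},
    }


request = build_inspect_request(
    "your-project-id",  # placeholder project ID
    "Contact John at john@example.com or 555-123-4567",
    ["PHONE_NUMBER", "EMAIL_ADDRESS"],
)
print(request["inspect_config"]["info_types"])
```

Given a `DlpServiceClient` instance named `client`, the dict would then be passed as `client.inspect_content(request=request)`.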

### Content De-identification

Transforms sensitive information in content using various techniques including masking, encryption, tokenization, and bucketing. Supports both reversible and irreversible transformations.

```python { .api }
def deidentify_content(
    request: dlp.DeidentifyContentRequest,
    *,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.DeidentifyContentResponse:
    """
    De-identifies potentially sensitive info from a ContentItem.

    Args:
        request: DeidentifyContentRequest with content and transformation config
        retry: Retry configuration for failed requests
        timeout: Timeout for the request
        metadata: Request metadata

    Returns:
        DeidentifyContentResponse with transformed content
    """
```

#### Usage Example

```python
from google.cloud import dlp

client = dlp.DlpServiceClient()
project_id = "your-project-id"  # Placeholder: replace with your Google Cloud project ID
parent = f"projects/{project_id}/locations/global"

# Limit inspection to the info type being transformed
inspect_config = dlp.InspectConfig(
    info_types=[dlp.InfoType(name="EMAIL_ADDRESS")]
)

# Configure de-identification: mask up to 5 characters of each email address
deidentify_config = dlp.DeidentifyConfig(
    info_type_transformations=dlp.InfoTypeTransformations(
        transformations=[
            dlp.InfoTypeTransformations.InfoTypeTransformation(
                info_types=[dlp.InfoType(name="EMAIL_ADDRESS")],
                primitive_transformation=dlp.PrimitiveTransformation(
                    character_mask_config=dlp.CharacterMaskConfig(
                        masking_character="*",
                        number_to_mask=5,
                    )
                ),
            )
        ]
    )
)

# Content to de-identify
content_item = dlp.ContentItem(
    value="Contact support at support@example.com"
)

request = dlp.DeidentifyContentRequest(
    parent=parent,
    deidentify_config=deidentify_config,
    inspect_config=inspect_config,
    item=content_item,
)

response = client.deidentify_content(request=request)
print(f"De-identified: {response.item.value}")
```
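
To make the effect of `number_to_mask` concrete, the following sketch imitates character masking locally in plain Python. It is an approximation for illustration only, not the service's implementation: `mask_prefix` is a hypothetical helper, and it assumes the mask is applied from the start of the matched value (the behavior when `reverse_order` is not set) with no `characters_to_ignore`.

```python
def mask_prefix(
    value: str, masking_character: str = "*", number_to_mask: int = 0
) -> str:
    """Mask the first `number_to_mask` characters of `value`.

    A non-positive `number_to_mask` masks the whole value, which only
    approximates the "unset means mask everything" rule of the real config.
    """
    count = len(value) if number_to_mask <= 0 else min(number_to_mask, len(value))
    return masking_character * count + value[count:]


masked = mask_prefix("support@example.com", "*", 5)
print(masked)  # *****rt@example.com
```

In a real de-identification call, only the detected email address should be transformed this way; the surrounding text ("Contact support at ") is expected to pass through unchanged.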

### Content Re-identification

Reverses de-identification transformations to restore the original sensitive values. This works only for reversible transformations, such as deterministic encryption or format-preserving encryption (tokenization), and requires the same key that was used to de-identify the data. Irreversible transformations such as masking or bucketing cannot be undone.

```python { .api }
def reidentify_content(
    request: dlp.ReidentifyContentRequest,
    *,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.ReidentifyContentResponse:
    """
    Re-identifies content that has been de-identified.

    Args:
        request: ReidentifyContentRequest with de-identified content and config
        retry: Retry configuration for failed requests
        timeout: Timeout for the request
        metadata: Request metadata

    Returns:
        ReidentifyContentResponse with original content restored
    """
```
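
As a minimal sketch of what a re-identification request might look like, the block below assembles one for content that was de-identified with deterministic encryption. It assumes the data was tokenized with a `crypto_deterministic_config` that used a Cloud KMS-wrapped key and a surrogate info type, and that the client accepts a plain dict in place of a `ReidentifyContentRequest`. `build_reidentify_request` and every argument value shown are hypothetical placeholders.

```python
from typing import Any, Dict


def build_reidentify_request(
    project_id: str,
    text: str,
    surrogate_type: str,
    kms_key_name: str,
    wrapped_key: bytes,
) -> Dict[str, Any]:
    """Assemble a dict mirroring ReidentifyContentRequest (illustrative only)."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        # The inspector must recognize the surrogate tokens in the text.
        "inspect_config": {
            "custom_info_types": [
                {"info_type": {"name": surrogate_type}, "surrogate_type": {}}
            ]
        },
        # Must match the key and surrogate type used for de-identification.
        "reidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "crypto_deterministic_config": {
                                "crypto_key": {
                                    "kms_wrapped": {
                                        "wrapped_key": wrapped_key,
                                        "crypto_key_name": kms_key_name,
                                    }
                                },
                                "surrogate_info_type": {"name": surrogate_type},
                            }
                        }
                    }
                ]
            }
        },
        "item": {"value": text},
    }


request = build_reidentify_request(
    project_id="your-project-id",  # placeholder
    text="<de-identified text containing surrogate tokens>",  # placeholder
    surrogate_type="EMAIL_TOKEN",  # hypothetical surrogate name
    kms_key_name="projects/your-project-id/locations/global/keyRings/your-ring/cryptoKeys/your-key",
    wrapped_key=b"<wrapped key bytes>",  # placeholder
)
print(sorted(request))
```

Given a `DlpServiceClient` instance named `client`, the dict would be sent with `client.reidentify_content(request=request)`, and the restored text read from `response.item.value`.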

### Image Redaction

Redacts sensitive information from images by detecting and obscuring text within image files. Supports various image formats and redaction methods.

```python { .api }
def redact_image(
    request: dlp.RedactImageRequest,
    *,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.RedactImageResponse:
    """
    Redacts potentially sensitive info from an image.

    Args:
        request: RedactImageRequest with image data and redaction config
        retry: Retry configuration for failed requests
        timeout: Timeout for the request
        metadata: Request metadata

    Returns:
        RedactImageResponse with redacted image
    """
```

#### Usage Example

```python
from google.cloud import dlp

client = dlp.DlpServiceClient()
project_id = "your-project-id"  # Placeholder: replace with your Google Cloud project ID
parent = f"projects/{project_id}/locations/global"

# Read image file
with open("document.png", "rb") as f:
    image_data = f.read()

# Configure redaction
inspect_config = dlp.InspectConfig(
    info_types=[dlp.InfoType(name="PHONE_NUMBER")]
)

# info_type and redact_all_text belong to the same oneof, so set only one of them
image_redaction_config = dlp.RedactImageRequest.ImageRedactionConfig(
    info_type=dlp.InfoType(name="PHONE_NUMBER"),
    redaction_color=dlp.Color(red=0.5, green=0.5, blue=0.5),
)

request = dlp.RedactImageRequest(
    parent=parent,
    byte_item=dlp.ByteContentItem(
        type_=dlp.ByteContentItem.BytesType.IMAGE_PNG,
        data=image_data,
    ),
    inspect_config=inspect_config,
    image_redaction_configs=[image_redaction_config],
)

response = client.redact_image(request=request)

# Save redacted image
with open("redacted_document.png", "wb") as f:
    f.write(response.redacted_image)
```
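
When the input file is not always a PNG, the `BytesType` member has to be chosen to match the image. The sketch below shows one possible way to derive the member name from a file extension. `bytes_type_name` and its extension table are hypothetical, not part of the library, and the mapping covers only a few common formats.

```python
from pathlib import Path

# File extension -> BytesType member name (names as in dlp.ByteContentItem.BytesType).
_IMAGE_BYTES_TYPES = {
    ".png": "IMAGE_PNG",
    ".jpg": "IMAGE_JPEG",
    ".jpeg": "IMAGE_JPEG",
    ".bmp": "IMAGE_BMP",
}


def bytes_type_name(path: str) -> str:
    """Return the BytesType member name for an image path.

    Falls back to the generic "IMAGE" member for unknown extensions.
    """
    return _IMAGE_BYTES_TYPES.get(Path(path).suffix.lower(), "IMAGE")


print(bytes_type_name("document.png"))  # IMAGE_PNG
```

With the library installed, the name can be resolved to the enum member by standard enum lookup, for example `dlp.ByteContentItem.BytesType[bytes_type_name("document.png")]`.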

### Info Type Discovery

Lists the sensitive information types that the DLP API supports through its built-in detectors. Custom stored info types are not part of this listing; they are managed as separate stored info type resources.

```python { .api }
def list_info_types(
    request: dlp.ListInfoTypesRequest,
    *,
    parent: Optional[str] = None,
    retry: OptionalRetry = gapic_v1.method.DEFAULT,
    timeout: Union[float, object] = gapic_v1.method.DEFAULT,
    metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
) -> dlp.ListInfoTypesResponse:
    """
    Returns a list of the sensitive information types that the DLP API supports.

    Args:
        request: ListInfoTypesRequest
        parent: Parent resource name (format: locations/{location_id})
        retry: Retry configuration for failed requests
        timeout: Timeout for the request
        metadata: Request metadata

    Returns:
        ListInfoTypesResponse with available info types
    """
```

#### Usage Example

```python
from google.cloud import dlp

client = dlp.DlpServiceClient()
# Parent uses the locations/{location_id} format given in the API description
parent = "locations/global"

request = dlp.ListInfoTypesRequest(parent=parent)
response = client.list_info_types(request=request)

print("Available Info Types:")
for info_type in response.info_types:
    print(f"- {info_type.name}: {info_type.display_name}")
    if info_type.description:
        print(f"  {info_type.description[:100]}...")
```
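
The `language_code` and `filter` fields of `ListInfoTypesRequest` can narrow and localize the listing. The sketch below builds such a request as a plain dict, assuming the client accepts a dict in place of the request message and using the `locations/{location_id}` parent format given in the API description. `build_list_info_types_request` is a hypothetical helper, and the `supported_by=INSPECT` filter value is given as an example.

```python
from typing import Dict


def build_list_info_types_request(
    language_code: str = "en-US",
    supported_by: str = "INSPECT",
    location_id: str = "global",
) -> Dict[str, str]:
    """Build a dict mirroring ListInfoTypesRequest (illustrative only)."""
    return {
        "parent": f"locations/{location_id}",
        "language_code": language_code,
        # Restrict results to info types usable by one part of the API.
        "filter": f"supported_by={supported_by}",
    }


request = build_list_info_types_request()
print(request)
```

Given a `DlpServiceClient` instance named `client`, this would be passed as `client.list_info_types(request=request)`.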

## Types

### Request Types

```python { .api }
class InspectContentRequest:
    """Request to inspect content for sensitive information."""

    parent: str
    inspect_config: InspectConfig
    item: ContentItem
    inspect_template_name: str
    location_id: str

class DeidentifyContentRequest:
    """Request to de-identify sensitive content."""

    parent: str
    deidentify_config: DeidentifyConfig
    inspect_config: InspectConfig
    item: ContentItem
    inspect_template_name: str
    deidentify_template_name: str
    location_id: str

class ReidentifyContentRequest:
    """Request to re-identify previously de-identified content."""

    parent: str
    reidentify_config: DeidentifyConfig
    inspect_config: InspectConfig
    item: ContentItem
    inspect_template_name: str
    reidentify_template_name: str
    location_id: str

class RedactImageRequest:
    """Request to redact sensitive information from images."""

    parent: str
    location_id: str
    inspect_config: InspectConfig
    image_redaction_configs: Sequence[ImageRedactionConfig]
    include_findings: bool
    byte_item: ByteContentItem

class ListInfoTypesRequest:
    """Request to list available information types."""

    parent: str
    language_code: str
    filter: str
    location_id: str
```

### Response Types

```python { .api }
class InspectContentResponse:
    """Response from content inspection."""

    result: InspectResult

class DeidentifyContentResponse:
    """Response from content de-identification."""

    item: ContentItem
    overview: TransformationOverview

class ReidentifyContentResponse:
    """Response from content re-identification."""

    item: ContentItem
    overview: TransformationOverview

class RedactImageResponse:
    """Response from image redaction."""

    redacted_image: bytes
    extracted_text: str
    inspect_result: InspectResult

class ListInfoTypesResponse:
    """Response listing available information types."""

    info_types: Sequence[InfoTypeDescription]
```

### Configuration Types

```python { .api }
class InspectConfig:
    """Configuration for content inspection."""

    info_types: Sequence[InfoType]
    min_likelihood: Likelihood
    limits: FindingLimits
    include_quote: bool
    exclude_info_types: bool
    custom_info_types: Sequence[CustomInfoType]
    content_options: Sequence[ContentOption]
    rule_set: Sequence[InspectionRuleSet]

class DeidentifyConfig:
    """Configuration for de-identification transformations."""

    info_type_transformations: InfoTypeTransformations
    record_transformations: RecordTransformations
    transformation_error_handling: TransformationErrorHandling

class ContentItem:
    """Container for content to be processed."""

    value: str
    table: Table
    byte_item: ByteContentItem
```
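
A `ContentItem` carries one kind of payload at a time: `value`, `table`, or `byte_item`. As a rough sketch of the `table` form, the helper below builds a dict with headers and rows of string values, assuming the client accepts a dict in place of a `ContentItem`. `table_content_item` is a hypothetical helper and the sample data is made up for illustration.

```python
from typing import Any, Dict, List


def table_content_item(headers: List[str], rows: List[List[str]]) -> Dict[str, Any]:
    """Build a dict mirroring ContentItem with its `table` field set."""
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("each row must have one value per header")
    return {
        "table": {
            "headers": [{"name": header} for header in headers],
            "rows": [
                {"values": [{"string_value": cell} for cell in row]}
                for row in rows
            ],
        }
    }


item = table_content_item(
    ["name", "email"],
    [["John", "john@example.com"], ["Support", "support@example.com"]],
)
print(item["table"]["headers"])
```

The resulting dict would take the place of the `item` argument in an inspect or de-identify request.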