# Google Cloud Data Catalog

Google Cloud Data Catalog is a fully managed and highly scalable data discovery and metadata management service. It provides comprehensive APIs for cataloging, organizing, and managing metadata for data assets across Google Cloud services and beyond.

## Package Information

- **Package Name**: google-cloud-datacatalog
- **Language**: Python
- **Installation**: `pip install google-cloud-datacatalog`

## Core Imports

```python
from google.cloud import datacatalog_v1
```

Individual client imports:

```python
from google.cloud.datacatalog_v1 import DataCatalogClient
from google.cloud.datacatalog_v1 import PolicyTagManagerClient
from google.cloud.datacatalog_v1 import PolicyTagManagerSerializationClient
```

Async client imports:

```python
from google.cloud.datacatalog_v1 import DataCatalogAsyncClient
from google.cloud.datacatalog_v1 import PolicyTagManagerAsyncClient
from google.cloud.datacatalog_v1 import PolicyTagManagerSerializationAsyncClient
```

Import types and request/response objects:

```python
from google.cloud.datacatalog_v1.types import (
    Entry, EntryGroup, Tag, TagTemplate, PolicyTag, Taxonomy,
    CreateEntryRequest, SearchCatalogRequest,  # ... other types
)
```

## Basic Usage

```python
from google.cloud import datacatalog_v1

# Create a client (uses Application Default Credentials)
client = datacatalog_v1.DataCatalogClient()

# Search catalog
search_request = datacatalog_v1.SearchCatalogRequest(
    scope=datacatalog_v1.SearchCatalogRequest.Scope(
        include_org_ids=["my-org-id"]
    ),
    query="type=table"
)
search_results = client.search_catalog(request=search_request)
# The returned pager fetches further pages as it is iterated
for result in search_results:
    print(result.relative_resource_name)

# Create entry group
entry_group = datacatalog_v1.EntryGroup(
    display_name="My Entry Group",
    description="A sample entry group"
)
create_entry_group_request = datacatalog_v1.CreateEntryGroupRequest(
    parent="projects/my-project/locations/us-central1",
    entry_group_id="my-entry-group",
    entry_group=entry_group
)
created_group = client.create_entry_group(request=create_entry_group_request)

# Create a custom entry. Entries created through the API must be FILESET,
# CLUSTER, DATA_STREAM, or a user-specified type; TABLE entries are created
# by Data Catalog itself when it ingests metadata from integrated systems.
entry = datacatalog_v1.Entry(
    display_name="My Table",
    description="A sample table entry",
    user_specified_system="my_system",
    user_specified_type="my_table"
)
create_entry_request = datacatalog_v1.CreateEntryRequest(
    parent=created_group.name,
    entry_id="my-table",
    entry=entry
)
created_entry = client.create_entry(request=create_entry_request)
```
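
The `parent` and `name` values in the example above are hierarchical resource-name strings. The following is a minimal sketch, in plain Python, of how those strings are composed. The helper names are hypothetical and written here only for illustration; the generated clients also ship their own path helpers (for example `DataCatalogClient.entry_group_path`), which are the better choice in real code.

```python
def location_path(project: str, location: str) -> str:
    """Parent of entry groups, tag templates, and taxonomies."""
    return f"projects/{project}/locations/{location}"


def entry_group_path(project: str, location: str, entry_group: str) -> str:
    """Resource name of an entry group; also the parent of its entries."""
    return f"{location_path(project, location)}/entryGroups/{entry_group}"


def entry_path(project: str, location: str, entry_group: str, entry: str) -> str:
    """Resource name of a single entry."""
    return f"{entry_group_path(project, location, entry_group)}/entries/{entry}"


print(entry_path("my-project", "us-central1", "my-entry-group", "my-table"))
# projects/my-project/locations/us-central1/entryGroups/my-entry-group/entries/my-table
```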

## Architecture

The Data Catalog API is organized around three service clients:

- **DataCatalogClient**: Core catalog operations including entry management, search, tagging, and templates
- **PolicyTagManagerClient**: Data governance through hierarchical policy tags and taxonomies for access control
- **PolicyTagManagerSerializationClient**: Import/export capabilities for policy taxonomies across regions

Each client has a synchronous and an asynchronous variant. List and search methods return pagers that fetch further pages on demand, and bulk imports and tag reconciliation run as long-running operations.

## Capabilities

### Data Catalog Management

Core catalog operations including search, entry groups, entries, tagging, and tag templates. This is the primary interface for discovering and managing metadata about data assets.

```python { .api }
class DataCatalogClient:
    def search_catalog(self, request: SearchCatalogRequest = None, **kwargs) -> SearchCatalogPager: ...
    def create_entry_group(self, request: CreateEntryGroupRequest = None, **kwargs) -> EntryGroup: ...
    def get_entry_group(self, request: GetEntryGroupRequest = None, **kwargs) -> EntryGroup: ...
    def update_entry_group(self, request: UpdateEntryGroupRequest = None, **kwargs) -> EntryGroup: ...
    def delete_entry_group(self, request: DeleteEntryGroupRequest = None, **kwargs) -> None: ...
    def list_entry_groups(self, request: ListEntryGroupsRequest = None, **kwargs) -> ListEntryGroupsPager: ...
    def create_entry(self, request: CreateEntryRequest = None, **kwargs) -> Entry: ...
    def get_entry(self, request: GetEntryRequest = None, **kwargs) -> Entry: ...
    def update_entry(self, request: UpdateEntryRequest = None, **kwargs) -> Entry: ...
    def delete_entry(self, request: DeleteEntryRequest = None, **kwargs) -> None: ...
    def list_entries(self, request: ListEntriesRequest = None, **kwargs) -> ListEntriesPager: ...
    def lookup_entry(self, request: LookupEntryRequest = None, **kwargs) -> Entry: ...
    def set_iam_policy(self, request: SetIamPolicyRequest = None, **kwargs) -> Policy: ...
    def get_iam_policy(self, request: GetIamPolicyRequest = None, **kwargs) -> Policy: ...
    def test_iam_permissions(self, request: TestIamPermissionsRequest = None, **kwargs) -> TestIamPermissionsResponse: ...
```

[Data Catalog Management](./data-catalog.md)
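
Unlike `get_entry`, `lookup_entry` finds an entry from the resource it describes rather than from its catalog name. The sketch below builds the `linked_resource` string for a BigQuery table and passes it as a dict-form request. The two helper functions are hypothetical names introduced for illustration, and `client` is assumed to be any object exposing `lookup_entry(request=...)`, such as a `DataCatalogClient`.

```python
def bigquery_table_resource(project: str, dataset: str, table: str) -> str:
    """Full resource name of a BigQuery table, as used in linked_resource."""
    return (
        "//bigquery.googleapis.com/"
        f"projects/{project}/datasets/{dataset}/tables/{table}"
    )


def lookup_bigquery_table(client, project: str, dataset: str, table: str):
    """Return the catalog entry that describes the given BigQuery table."""
    request = {"linked_resource": bigquery_table_resource(project, dataset, table)}
    return client.lookup_entry(request=request)
```

A call such as `lookup_bigquery_table(client, "my-project", "sales", "orders")` would then return the `Entry` for that table, assuming the caller has permission to see it.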
120
121
### Policy Tag Management
122
123
Data governance through hierarchical taxonomies and policy tags for fine-grained access control. Enables creation and management of data classification policies.
124
125
```python { .api }
126
class PolicyTagManagerClient:
127
def create_taxonomy(self, request: CreateTaxonomyRequest = None, **kwargs) -> Taxonomy: ...
128
def get_taxonomy(self, request: GetTaxonomyRequest = None, **kwargs) -> Taxonomy: ...
129
def update_taxonomy(self, request: UpdateTaxonomyRequest = None, **kwargs) -> Taxonomy: ...
130
def delete_taxonomy(self, request: DeleteTaxonomyRequest = None, **kwargs) -> None: ...
131
def list_taxonomies(self, request: ListTaxonomiesRequest = None, **kwargs) -> ListTaxonomiesPager: ...
132
def create_policy_tag(self, request: CreatePolicyTagRequest = None, **kwargs) -> PolicyTag: ...
133
def get_policy_tag(self, request: GetPolicyTagRequest = None, **kwargs) -> PolicyTag: ...
134
def update_policy_tag(self, request: UpdatePolicyTagRequest = None, **kwargs) -> PolicyTag: ...
135
def delete_policy_tag(self, request: DeletePolicyTagRequest = None, **kwargs) -> None: ...
136
def list_policy_tags(self, request: ListPolicyTagsRequest = None, **kwargs) -> ListPolicyTagsPager: ...
137
```
138
139
[Policy Tag Management](./policy-tags.md)
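
`list_policy_tags` returns a flat sequence, while the hierarchy is carried by each tag's `parent_policy_tag` field (empty for top-level tags). As a rough sketch, the tree can be rebuilt as shown here. `render_policy_tag_tree` is a hypothetical helper, and it only assumes objects with `name`, `display_name`, and `parent_policy_tag` attributes.

```python
from collections import defaultdict
from typing import Iterable, List


def render_policy_tag_tree(policy_tags: Iterable) -> List[str]:
    """Return display names indented by depth, siblings sorted by name."""
    # Group tags by the resource name of their parent ("" for top level).
    children = defaultdict(list)
    for tag in policy_tags:
        children[tag.parent_policy_tag or ""].append(tag)

    lines: List[str] = []

    def walk(parent_name: str, depth: int) -> None:
        for tag in sorted(children[parent_name], key=lambda t: t.display_name):
            lines.append("  " * depth + tag.display_name)
            walk(tag.name, depth + 1)

    walk("", 0)
    return lines
```

With a real client, the input would typically be the pager returned by `client.list_policy_tags(request={"parent": taxonomy_name})`, where `taxonomy_name` is the taxonomy's resource name.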

### Taxonomy Serialization

Import and export capabilities for taxonomies, enabling cross-regional taxonomy management and backup/restore operations.

```python { .api }
class PolicyTagManagerSerializationClient:
    def replace_taxonomy(self, request: ReplaceTaxonomyRequest = None, **kwargs) -> Taxonomy: ...
    def import_taxonomies(self, request: ImportTaxonomiesRequest = None, **kwargs) -> ImportTaxonomiesResponse: ...
    def export_taxonomies(self, request: ExportTaxonomiesRequest = None, **kwargs) -> ExportTaxonomiesResponse: ...
```

[Taxonomy Serialization](./taxonomy-serialization.md)
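
One way to use these methods for cross-regional copies is to export taxonomies from one location in serialized form and import them inline into another. The sketch below is written under assumptions: `copy_taxonomies` is a hypothetical helper, `client` is any object with `export_taxonomies` and `import_taxonomies` methods (such as a `PolicyTagManagerSerializationClient`), and the dict keys follow the request field names as I understand them from the v1 API, so they should be checked against the installed library version.

```python
from typing import Iterable, List


def copy_taxonomies(
    client, source_parent: str, taxonomy_names: Iterable[str], dest_parent: str
) -> List:
    """Export taxonomies from one location and import them into another."""
    exported = client.export_taxonomies(
        request={
            "parent": source_parent,
            "taxonomies": list(taxonomy_names),
            # Ask for nested, serialized taxonomies (policy tags included).
            "serialized_taxonomies": True,
        }
    )
    imported = client.import_taxonomies(
        request={
            "parent": dest_parent,
            "inline_source": {"taxonomies": list(exported.taxonomies)},
        }
    )
    return list(imported.taxonomies)
```

Both parents are location paths of the form `projects/{project}/locations/{location}`.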

### Tag Templates and Tags

Custom metadata schema definition and attachment to catalog resources. Tag templates define the structure of custom metadata that can be attached to entries.

```python { .api }
class DataCatalogClient:
    def create_tag_template(self, request: CreateTagTemplateRequest = None, **kwargs) -> TagTemplate: ...
    def get_tag_template(self, request: GetTagTemplateRequest = None, **kwargs) -> TagTemplate: ...
    def update_tag_template(self, request: UpdateTagTemplateRequest = None, **kwargs) -> TagTemplate: ...
    def delete_tag_template(self, request: DeleteTagTemplateRequest = None, **kwargs) -> None: ...
    def create_tag(self, request: CreateTagRequest = None, **kwargs) -> Tag: ...
    def update_tag(self, request: UpdateTagRequest = None, **kwargs) -> Tag: ...
    def delete_tag(self, request: DeleteTagRequest = None, **kwargs) -> None: ...
    def list_tags(self, request: ListTagsRequest = None, **kwargs) -> ListTagsPager: ...
```

[Tag Templates and Tags](./tags.md)

### Entry Metadata Management

Management of entry overview information, contacts, and starring functionality for organizing and maintaining entry metadata.

```python { .api }
class DataCatalogClient:
    def modify_entry_overview(self, request: ModifyEntryOverviewRequest = None, **kwargs) -> EntryOverview: ...
    def modify_entry_contacts(self, request: ModifyEntryContactsRequest = None, **kwargs) -> Contacts: ...
    def star_entry(self, request: StarEntryRequest = None, **kwargs) -> StarEntryResponse: ...
    def unstar_entry(self, request: UnstarEntryRequest = None, **kwargs) -> UnstarEntryResponse: ...
```

[Entry Metadata Management](./entry-metadata.md)

### Bulk Operations

Long-running operations for bulk entry import and tag reconciliation, designed for large-scale metadata management tasks.

```python { .api }
class DataCatalogClient:
    def import_entries(self, request: ImportEntriesRequest = None, **kwargs) -> Operation: ...
    def reconcile_tags(self, request: ReconcileTagsRequest = None, **kwargs) -> Operation: ...
```

[Bulk Operations](./bulk-operations.md)
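
Both methods return an operation handle rather than the final result, so callers normally block on it or poll it. A minimal sketch of the blocking style follows; `reconcile_and_wait` is a hypothetical helper, `client` is assumed to expose `reconcile_tags(request=...)`, and the dict keys reflect the `ReconcileTagsRequest` fields as I understand them.

```python
from typing import Iterable


def reconcile_and_wait(
    client,
    entry_name: str,
    tag_template: str,
    tags: Iterable,
    timeout_s: float = 300.0,
):
    """Start a tag reconciliation and block until it finishes or times out."""
    operation = client.reconcile_tags(
        request={
            "parent": entry_name,          # entry whose tags are reconciled
            "tag_template": tag_template,  # template the tags belong to
            "tags": list(tags),
            # Keep existing tags that are absent from `tags`.
            "force_delete_missing": False,
        }
    )
    # result() waits for completion and raises if the operation failed.
    return operation.result(timeout=timeout_s)
```

Setting `force_delete_missing` to `True` would instead remove tags of that template that are not in the supplied list, so the conservative default is used here.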

## Core Types

```python { .api }
# Primary Resources
class Entry:
    name: str
    linked_resource: str
    fully_qualified_name: str
    display_name: str
    description: str
    business_context: BusinessContext
    schema: Schema
    source_system_timestamps: SystemTimestamps
    usage_signal: UsageSignal
    integrated_system: IntegratedSystem
    user_specified_type: str
    user_specified_system: str
    personal_details: PersonalDetails
    contacts: Contacts
    labels: MutableMapping[str, str]
    type_: EntryType

class EntryGroup:
    name: str
    display_name: str
    description: str
    data_catalog_timestamps: SystemTimestamps

class Tag:
    name: str
    template: str
    template_display_name: str
    column: str
    fields: MutableMapping[str, TagField]

class TagTemplate:
    name: str
    display_name: str
    is_publicly_readable: bool
    fields: MutableMapping[str, TagTemplateField]
    dataplex_transfer_status: DataplexTransferStatus

class Taxonomy:
    name: str
    display_name: str
    description: str
    policy_tag_count: int
    taxonomy_timestamps: SystemTimestamps
    activated_policy_types: Sequence[PolicyType]
    service: Service  # nested Taxonomy.Service: managing system name and identity

class PolicyTag:
    name: str
    display_name: str
    description: str
    parent_policy_tag: str
    child_policy_tags: Sequence[str]

# Search and Response Types
class SearchCatalogResult:
    search_result_type: SearchResultType
    search_result_subtype: str
    relative_resource_name: str
    linked_resource: str
    modify_time: timestamp_pb2.Timestamp
    integrated_system: IntegratedSystem
    user_specified_system: str
    fully_qualified_name: str
    display_name: str
    description: str

class Schema:
    columns: Sequence[ColumnSchema]

class ColumnSchema:
    column: str
    type_: str
    description: str
    mode: str
    default_value: str
    ordinal_position: int
    highest_indexing_type: IndexingType
    subcolumns: Sequence['ColumnSchema']
    looker_column_spec: LookerColumnSpec
    range_element_type: FieldElementType  # nested ColumnSchema.FieldElementType
    gc_rule: str

# IAM Types
class Policy:
    version: int
    bindings: Sequence[Binding]
    etag: bytes

class Binding:
    role: str
    members: Sequence[str]
    condition: Expr

class SetIamPolicyRequest:
    resource: str
    policy: Policy

class GetIamPolicyRequest:
    resource: str
    options: GetPolicyOptions

class TestIamPermissionsRequest:
    resource: str
    permissions: Sequence[str]

class TestIamPermissionsResponse:
    permissions: Sequence[str]
```
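
Because `ColumnSchema.subcolumns` nests recursively, code that needs every column usually has to walk the schema. The following sketch yields dotted paths for all columns, which is also the format `Tag.column` uses to address a nested column. `iter_column_paths` is a hypothetical helper that relies only on the `column` and `subcolumns` attributes listed above.

```python
from typing import Iterable, Iterator


def iter_column_paths(columns: Iterable, prefix: str = "") -> Iterator[str]:
    """Yield 'outer.inner' style paths for columns and all their subcolumns."""
    for col in columns:
        path = f"{prefix}.{col.column}" if prefix else col.column
        yield path
        # Recurse depth-first so children follow their parent in the output.
        yield from iter_column_paths(col.subcolumns, path)
```

For an entry with a schema, `list(iter_column_paths(entry.schema.columns))` gives the flattened column list.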

## Enums

```python { .api }
class EntryType(proto.Enum):
    ENTRY_TYPE_UNSPECIFIED = 0
    TABLE = 2
    MODEL = 5
    DATA_STREAM = 3
    FILESET = 4
    CLUSTER = 6
    DATABASE = 7
    DATA_SOURCE_CONNECTION = 8
    ROUTINE = 9
    LAKE = 10
    ZONE = 11
    SERVICE = 14
    DATABASE_SCHEMA = 15
    DASHBOARD = 16
    EXPLORE = 17
    LOOK = 18

class SearchResultType(proto.Enum):
    SEARCH_RESULT_TYPE_UNSPECIFIED = 0
    ENTRY = 1
    TAG_TEMPLATE = 2
    ENTRY_GROUP = 3

class IntegratedSystem(proto.Enum):
    INTEGRATED_SYSTEM_UNSPECIFIED = 0
    BIGQUERY = 1
    CLOUD_PUBSUB = 2
    DATAPROC_METASTORE = 3
    DATAPLEX = 4
    CLOUD_SPANNER = 6
    CLOUD_BIGTABLE = 7
    CLOUD_SQL = 8
    LOOKER = 9
    VERTEX_AI = 10
```

## Pager Types

```python { .api }
class SearchCatalogPager:
    """Pager for search_catalog method results"""
    def __iter__(self) -> Iterator[SearchCatalogResult]: ...

class ListEntryGroupsPager:
    """Pager for list_entry_groups method results"""
    def __iter__(self) -> Iterator[EntryGroup]: ...

class ListEntriesPager:
    """Pager for list_entries method results"""
    def __iter__(self) -> Iterator[Entry]: ...

class ListTagsPager:
    """Pager for list_tags method results"""
    def __iter__(self) -> Iterator[Tag]: ...

class ListTaxonomiesPager:
    """Pager for list_taxonomies method results"""
    def __iter__(self) -> Iterator[Taxonomy]: ...

class ListPolicyTagsPager:
    """Pager for list_policy_tags method results"""
    def __iter__(self) -> Iterator[PolicyTag]: ...
```
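
To make the pager behaviour concrete, here is a plain-Python sketch of the page-token loop that such an iterator hides. It illustrates the pattern only and is not the library's implementation; `iterate_pages` and `fetch_page` are hypothetical names, with `fetch_page` standing in for one list or search call.

```python
from typing import Callable, Iterator, List, Tuple


def iterate_pages(
    fetch_page: Callable[[str], Tuple[List[str], str]]
) -> Iterator[str]:
    """Yield items across pages; fetch_page(token) -> (items, next_token)."""
    token = ""
    while True:
        items, token = fetch_page(token)
        yield from items
        # An empty next-page token marks the last page.
        if not token:
            break
```

With the real pagers none of this is needed: a plain `for item in pager` loop requests further pages lazily as iteration proceeds.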