# Google Cloud Dataproc Metastore

A Python client library for Google Cloud Dataproc Metastore, a fully managed, highly available, autoscaled, autohealing, OSS-native metastore service that greatly simplifies technical metadata management. Built on Apache Hive metastore, it serves as a critical component for enterprise data lakes.

## Package Information

- **Package Name**: google-cloud-dataproc-metastore
- **Language**: Python
- **Installation**: `pip install google-cloud-dataproc-metastore`
## Core Imports

```python
from google.cloud import metastore
```

Version-specific imports:

```python
from google.cloud import metastore_v1
from google.cloud import metastore_v1alpha
from google.cloud import metastore_v1beta
```
## Basic Usage

```python
from google.cloud import metastore

# Initialize the client
client = metastore.DataprocMetastoreClient()

# List all metastore services in a location
parent = "projects/my-project/locations/us-central1"
services = client.list_services(parent=parent)

for service in services:
    print(f"Service: {service.name}")
    print(f"State: {service.state}")
    print(f"Endpoint URI: {service.endpoint_uri}")

# Get a specific service
service_name = "projects/my-project/locations/us-central1/services/my-metastore"
service = client.get_service(name=service_name)
print(f"Service tier: {service.tier}")
print(f"Hive version: {service.hive_metastore_config.version}")

# Create a new backup
backup_request = metastore.CreateBackupRequest(
    parent="projects/my-project/locations/us-central1/services/my-metastore",
    backup_id="my-backup",
    backup=metastore.Backup(
        description="Automated backup for disaster recovery"
    )
)
operation = client.create_backup(request=backup_request)
backup = operation.result()  # Wait for completion
print(f"Backup created: {backup.name}")
```
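
The listing and backup calls above rely on two behaviours of the client: list methods return a pager that fetches further pages on demand, and mutating methods return a long-running operation. The following is a minimal sketch of both, not a definitive pattern. The helper names `iter_service_names` and `wait_for_result` and their default values are hypothetical, and `client` is assumed to be a `metastore.DataprocMetastoreClient` instance that the caller passes in.

```python
from typing import Any, Iterator


def iter_service_names(client: Any, parent: str, page_size: int = 50) -> Iterator[str]:
    """Yield the resource name of every service under `parent`."""
    # A dict request lets the caller set page_size; the returned pager is
    # expected to request further pages lazily as the loop advances.
    pager = client.list_services(request={"parent": parent, "page_size": page_size})
    for service in pager:
        yield service.name


def wait_for_result(operation: Any, timeout_seconds: float = 600.0) -> Any:
    """Block until a long-running operation finishes or the timeout expires."""
    # result(timeout=...) raises if the operation fails or the deadline
    # passes, so callers can wrap this call in their own error handling.
    return operation.result(timeout=timeout_seconds)
```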
## Architecture

The Google Cloud Dataproc Metastore client library follows Google's standard client library patterns:

- **Client Classes**: Synchronous and asynchronous clients for different API versions
- **Resource Management**: Standardized CRUD operations for services, backups, and federations
- **Long-Running Operations**: Built-in support for async operations with progress tracking
- **Authentication**: Integrated with Google Cloud authentication (ADC, service accounts)
- **Error Handling**: Comprehensive error handling with retry logic and timeout configuration
- **Paging**: Automatic handling of paginated API responses
## Capabilities

### Service Management

Comprehensive lifecycle management for Dataproc Metastore services including creation, configuration, updates, and deletion. Supports multiple service tiers and Hive metastore versions with advanced networking and security options.

```python { .api }
class DataprocMetastoreClient:
    def list_services(self, request=None, *, parent=None, **kwargs): ...
    def get_service(self, request=None, *, name=None, **kwargs): ...
    def create_service(self, request=None, *, parent=None, service=None, service_id=None, **kwargs): ...
    def update_service(self, request=None, *, service=None, update_mask=None, **kwargs): ...
    def delete_service(self, request=None, *, name=None, **kwargs): ...
```
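
As an illustration of the creation call listed above, the sketch below builds a dict shaped like a create-service request and waits for the resulting operation. It is one possible arrangement under stated assumptions: the helper names, the `"3.1.2"` default Hive version, and the 30-minute timeout are hypothetical, and `client` is assumed to be a `metastore.DataprocMetastoreClient` instance supplied by the caller.

```python
from typing import Any, Dict


def build_create_service_request(
    project: str, location: str, service_id: str, hive_version: str = "3.1.2"
) -> Dict[str, Any]:
    """Build a dict with the parent, service_id, and service fields."""
    return {
        "parent": f"projects/{project}/locations/{location}",
        "service_id": service_id,
        # Only the Hive version is set here; other service fields are
        # left at their server-side defaults.
        "service": {"hive_metastore_config": {"version": hive_version}},
    }


def create_service_and_wait(
    client: Any, request: Dict[str, Any], timeout_seconds: float = 1800.0
) -> Any:
    """Start service creation and block until the operation resolves."""
    operation = client.create_service(request=request)
    return operation.result(timeout=timeout_seconds)
```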
[Service Management](./service-management.md)

### Backup and Restore Operations

Complete backup and restore functionality for metastore services including scheduled backups, point-in-time recovery, and cross-region backup management for disaster recovery scenarios.

```python { .api }
class DataprocMetastoreClient:
    def list_backups(self, request=None, *, parent=None, **kwargs): ...
    def get_backup(self, request=None, *, name=None, **kwargs): ...
    def create_backup(self, request=None, *, parent=None, backup=None, backup_id=None, **kwargs): ...
    def delete_backup(self, request=None, *, name=None, **kwargs): ...
    def restore_service(self, request=None, *, service=None, backup=None, **kwargs): ...
```
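
One way the listing and restore calls above might be combined is to restore a service from its most recent usable backup. The sketch below assumes each backup exposes `name`, `state`, and `create_time`, and that a usable backup reports the state `ACTIVE`. The helper names and the one-hour timeout are hypothetical, and `client` is assumed to be a `metastore.DataprocMetastoreClient` instance.

```python
from typing import Any, Optional


def latest_active_backup(client: Any, service_name: str) -> Optional[Any]:
    """Return the newest backup in the ACTIVE state, or None if there is none."""
    active = [
        backup
        for backup in client.list_backups(parent=service_name)
        if backup.state.name == "ACTIVE"
    ]
    if not active:
        return None
    return max(active, key=lambda backup: backup.create_time)


def restore_from_latest(
    client: Any, service_name: str, timeout_seconds: float = 3600.0
) -> Any:
    """Restore the service from its newest ACTIVE backup and wait for completion."""
    backup = latest_active_backup(client, service_name)
    if backup is None:
        raise LookupError(f"no ACTIVE backup found for {service_name}")
    operation = client.restore_service(
        request={"service": service_name, "backup": backup.name}
    )
    return operation.result(timeout=timeout_seconds)
```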
[Backup and Restore](./backup-restore.md)

### Metadata Import and Export

Import metadata from external sources and export metastore data to Google Cloud Storage. Supports various database formats including MySQL and PostgreSQL dumps with comprehensive validation and error handling.

```python { .api }
class DataprocMetastoreClient:
    def list_metadata_imports(self, request=None, *, parent=None, **kwargs): ...
    def get_metadata_import(self, request=None, *, name=None, **kwargs): ...
    def create_metadata_import(self, request=None, *, parent=None, metadata_import=None, metadata_import_id=None, **kwargs): ...
    def update_metadata_import(self, request=None, *, metadata_import=None, update_mask=None, **kwargs): ...
    def export_metadata(self, request=None, *, service=None, **kwargs): ...
```
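
The export call above can be sketched as follows, using a dict request rather than flattened keyword arguments. The helper name `export_to_gcs`, the `gs://` prefix check, and the one-hour timeout are hypothetical. The request field `destination_gcs_folder` and the result field `destination_gcs_uri` are assumed to match the API version in use, and `client` is assumed to be a `metastore.DataprocMetastoreClient` instance.

```python
from typing import Any


def export_to_gcs(
    client: Any, service_name: str, gcs_folder: str, timeout_seconds: float = 3600.0
) -> str:
    """Export metastore metadata to a Cloud Storage folder and return the output URI."""
    # Fail early on an obviously wrong destination instead of waiting for
    # the server to reject the request.
    if not gcs_folder.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {gcs_folder!r}")
    operation = client.export_metadata(
        request={"service": service_name, "destination_gcs_folder": gcs_folder}
    )
    metadata_export = operation.result(timeout=timeout_seconds)
    return metadata_export.destination_gcs_uri
```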
[Metadata Import and Export](./metadata-import-export.md)

### Federation Management

Manage metastore federation services that provide unified access to multiple backend metastores. Supports cross-cloud and multi-region federation scenarios for enterprise data lake architectures.

```python { .api }
class DataprocMetastoreFederationClient:
    def list_federations(self, request=None, *, parent=None, **kwargs): ...
    def get_federation(self, request=None, *, name=None, **kwargs): ...
    def create_federation(self, request=None, *, parent=None, federation=None, federation_id=None, **kwargs): ...
    def update_federation(self, request=None, *, federation=None, update_mask=None, **kwargs): ...
    def delete_federation(self, request=None, *, name=None, **kwargs): ...
```
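
To show how the listing call above might be used to inspect which backends each federation fronts, here is a small sketch. It assumes each federation exposes `backend_metastores` as a mapping from an integer rank to a backend object with a `name` field. The helper name `summarize_federations` is hypothetical, and `client` is assumed to be a `DataprocMetastoreFederationClient` instance.

```python
from typing import Any, Dict, List


def summarize_federations(client: Any, parent: str) -> Dict[str, List[str]]:
    """Map each federation name to its backend metastore names, ordered by rank."""
    summary: Dict[str, List[str]] = {}
    for federation in client.list_federations(parent=parent):
        # Sort on the rank key only, so backend objects are never compared.
        ranked = sorted(federation.backend_metastores.items(), key=lambda item: item[0])
        summary[federation.name] = [backend.name for _rank, backend in ranked]
    return summary
```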
[Federation Management](./federation-management.md)

### Metadata Query Operations

Execute Hive and Spark SQL queries directly against metastore metadata for advanced analytics and metadata management operations including table movement and resource location management.

```python { .api }
class DataprocMetastoreClient:
    def query_metadata(self, request=None, *, service=None, query=None, **kwargs): ...
    def move_table_to_database(self, request=None, *, service=None, table_name=None, db_name=None, destination_db_name=None, **kwargs): ...
    def alter_metadata_resource_location(self, request=None, *, service=None, resource_name=None, location_uri=None, **kwargs): ...
```
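
A hedged sketch of the query call above follows. It passes a dict request, treats the call as a long-running operation, and reads `result_manifest_uri` from the response; that field name, and the expectation that the query is read-only SQL, are assumptions to check against the API version in use. The helper name `run_metadata_query` and the ten-minute timeout are hypothetical, and `client` is assumed to be a `metastore.DataprocMetastoreClient` instance.

```python
from typing import Any


def run_metadata_query(
    client: Any, service_name: str, query: str, timeout_seconds: float = 600.0
) -> str:
    """Run a metadata query and return the URI of the result manifest."""
    if not query.strip():
        raise ValueError("query must not be empty")
    operation = client.query_metadata(
        request={"service": service_name, "query": query}
    )
    response = operation.result(timeout=timeout_seconds)
    # The response is assumed to point at a manifest file rather than
    # carrying the result rows inline.
    return response.result_manifest_uri
```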
[Metadata Query Operations](./metadata-query.md)

### Asynchronous Operations

Asynchronous client implementations for all operations with full async/await support, enabling high-performance concurrent operations and integration with async Python frameworks.

```python { .api }
class DataprocMetastoreAsyncClient:
    async def list_services(self, request=None, *, parent=None, **kwargs): ...
    async def get_service(self, request=None, *, name=None, **kwargs): ...
    async def create_service(self, request=None, *, parent=None, service=None, service_id=None, **kwargs): ...
    # ... all methods have async equivalents
```
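
The async methods above can be combined with `asyncio.gather` to issue several lookups concurrently. The sketch below is illustrative only: the helper names are hypothetical, `client` is assumed to be a `DataprocMetastoreAsyncClient` instance, and the awaited `list_services` call is assumed to return a pager that supports `async for`.

```python
import asyncio
from typing import Any, List


async def fetch_services(client: Any, names: List[str]) -> List[Any]:
    """Fetch several services concurrently; results keep the order of `names`."""
    return await asyncio.gather(*(client.get_service(name=name) for name in names))


async def list_service_names(client: Any, parent: str) -> List[str]:
    """Collect the names of all services under `parent`."""
    pager = await client.list_services(parent=parent)
    return [service.name async for service in pager]
```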
[Asynchronous Operations](./async-operations.md)

## Common Types

```python { .api }
# Service states
class Service:
    class State(enum.Enum):
        CREATING = 1
        ACTIVE = 2
        SUSPENDING = 3
        SUSPENDED = 4
        UPDATING = 5
        DELETING = 6
        ERROR = 7

    class Tier(enum.Enum):
        DEVELOPER = 1
        ENTERPRISE = 3

    class ReleaseChannel(enum.Enum):
        CANARY = 1
        STABLE = 2

# Configuration classes
class HiveMetastoreConfig:
    version: str
    config_overrides: Dict[str, str]
    kerberos_config: Optional[KerberosConfig]
    auxiliary_versions: List[AuxiliaryVersionConfig]

class NetworkConfig:
    consumers: List[NetworkConsumer]
    enable_private_ip: bool

class EncryptionConfig:
    kms_key: str

# Resource path helpers
def service_path(project: str, location: str, service: str) -> str: ...
def backup_path(project: str, location: str, service: str, backup: str) -> str: ...
def federation_path(project: str, location: str, federation: str) -> str: ...
```
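
The state enum and path helpers above might be used together as in the sketch below. It assumes the path helpers are reachable as methods on the client object and that enum values expose a `name` attribute. The function names `is_ready` and `backup_resource_name` are hypothetical.

```python
from typing import Any


def is_ready(service: Any) -> bool:
    """Return True when the service reports the ACTIVE state."""
    return service.state.name == "ACTIVE"


def backup_resource_name(
    client: Any, project: str, location: str, service_id: str, backup_id: str
) -> str:
    """Build a backup resource name with the client's path helper."""
    # Delegating to the helper avoids hand-formatting the resource path.
    return client.backup_path(project, location, service_id, backup_id)
```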