# Operations Management

Long-running operation monitoring and management for Google Kubernetes Engine operations. This module provides functionality for tracking cluster and node pool changes, monitoring operation status, and managing operation lifecycle.

## Capabilities

### Listing Operations

Retrieve all operations in a project within a specified zone or across all zones.

```python { .api }
def list_operations(
    self,
    request=None, *,
    project_id=None,
    zone=None,
    parent=None,
    retry=gapic_v1.method.DEFAULT,
    timeout=None,
    metadata=()
) -> ListOperationsResponse:
    """
    Lists all operations in a project in a specific zone or all zones.

    Args:
        project_id (str): Deprecated. The Google Developers Console project ID or project number.
        zone (str): Deprecated. The name of the Google Compute Engine zone.
        parent (str): The parent (project and location) where the operations will be listed.
            Format: projects/{project_id}/locations/{location}
        retry: Retry configuration.
        timeout (float): Request timeout in seconds.
        metadata: Additional gRPC metadata.

    Returns:
        ListOperationsResponse: Response containing the list of operations.
    """
```

Usage example:

```python
from google.cloud import container

client = container.ClusterManagerClient()

# List all operations in a zone (deprecated project_id/zone arguments)
operations = client.list_operations(
    project_id="my-project",
    zone="us-central1-a"
)

# Or use the new parent format
operations = client.list_operations(
    parent="projects/my-project/locations/us-central1-a"
)

for operation in operations.operations:
    print(f"Operation: {operation.name}")
    print(f"Type: {operation.operation_type}")
    print(f"Status: {operation.status}")
    print(f"Target: {operation.target_link}")
    # start_time and end_time are empty strings until they are set
    if operation.start_time:
        print(f"Started: {operation.start_time}")
    if operation.end_time:
        print(f"Ended: {operation.end_time}")
```
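
To list across all zones and regions, the API documents `-` as a wildcard location in `parent`. The sketch below builds that path and tallies a response by status. `operations_parent` and `count_by_status` are hypothetical helper names, not part of the library, and the code only assumes that each operation exposes a `status` attribute.

```python
from collections import Counter


def operations_parent(project_id: str, location: str = "-") -> str:
    """Build the `parent` argument for list_operations.

    The default location "-" is the wildcard for all zones and regions.
    """
    return f"projects/{project_id}/locations/{location}"


def count_by_status(operations) -> Counter:
    """Tally operations by status name, e.g. "RUNNING" or "DONE"."""
    # Enum statuses expose `.name`; fall back to str() for anything else.
    return Counter(getattr(op.status, "name", str(op.status)) for op in operations)
```

With the client from the example above, this could be used as `count_by_status(client.list_operations(parent=operations_parent("my-project")).operations)`.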

### Getting Operation Details

Retrieve detailed information about a specific operation.

```python { .api }
def get_operation(
    self,
    request=None, *,
    project_id=None,
    zone=None,
    operation_id=None,
    name=None,
    retry=gapic_v1.method.DEFAULT,
    timeout=None,
    metadata=()
) -> Operation:
    """
    Gets the specified operation.

    Args:
        project_id (str): Deprecated. The Google Developers Console project ID or project number.
        zone (str): Deprecated. The name of the Google Compute Engine zone.
        operation_id (str): Deprecated. The server-assigned name of the operation.
        name (str): The name (project, location, operation) of the operation to get.
            Format: projects/{project_id}/locations/{location}/operations/{operation_id}
        retry: Retry configuration.
        timeout (float): Request timeout in seconds.
        metadata: Additional gRPC metadata.

    Returns:
        Operation: The operation information.
    """
```

Usage example:

```python
import time

from google.cloud import container

# `client` is the ClusterManagerClient created in the listing example
operation = client.get_operation(
    project_id="my-project",
    zone="us-central1-a",
    operation_id="operation-1234567890123-5f2a7b4d-a1b2c3d4"
)

print(f"Operation name: {operation.name}")
print(f"Operation type: {operation.operation_type}")
print(f"Status: {operation.status}")
print(f"Detail: {operation.detail}")

if operation.progress:
    print(f"Progress: {operation.progress}")

# `error` is a google.rpc.Status message; a non-zero code indicates a failure
if operation.error.code:
    print(f"Error: {operation.error}")

# Monitor operation status
def wait_for_operation(client, operation_name):
    """Wait for an operation to complete."""
    while True:
        op = client.get_operation(name=operation_name)

        # `status` is an Operation.Status enum, so compare against enum members
        if op.status == container.Operation.Status.DONE:
            if op.error.code:
                print(f"Operation failed: {op.error.message}")
                return False
            else:
                print("Operation completed successfully")
                return True
        elif op.status == container.Operation.Status.ABORTING:
            print("Operation is aborting")
            return False
        else:
            print(f"Operation status: {op.status}")
            time.sleep(10)  # Wait 10 seconds before checking again

# Example usage
operation = client.create_cluster(...)
# `operation.name` is the server-assigned ID; `name=` expects the full path
operation_name = (
    f"projects/my-project/locations/us-central1-a/operations/{operation.name}"
)
success = wait_for_operation(client, operation_name)
```
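
The `wait_for_operation` loop above polls at a fixed interval and never gives up if an operation stalls. A minimal sketch of a generic poller with a deadline and capped exponential backoff follows. `poll_until` is a hypothetical helper, not a library function: `fetch` stands in for a call such as `lambda: client.get_operation(name=operation_name)`, and `is_done` for the status check.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout: float = 600.0,
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until is_done(result) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        result = fetch()
        if is_done(result):
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"not done after {timeout} seconds")
        # Sleep for the current delay, but never past the deadline.
        sleep(min(delay, remaining))
        # Double the delay after each attempt, up to max_delay.
        delay = min(delay * 2, max_delay)
```

The `sleep` parameter is injectable so the backoff schedule can be exercised in tests without waiting.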

### Cancelling Operations

Cancel a running operation.

```python { .api }
def cancel_operation(
    self,
    request=None, *,
    project_id=None,
    zone=None,
    operation_id=None,
    name=None,
    retry=gapic_v1.method.DEFAULT,
    timeout=None,
    metadata=()
) -> None:
    """
    Cancels the specified operation.

    Args:
        project_id (str): Deprecated. The Google Developers Console project ID or project number.
        zone (str): Deprecated. The name of the Google Compute Engine zone.
        operation_id (str): Deprecated. The server-assigned name of the operation.
        name (str): The name (project, location, operation) of the operation to cancel.
            Format: projects/{project_id}/locations/{location}/operations/{operation_id}
        retry: Retry configuration.
        timeout (float): Request timeout in seconds.
        metadata: Additional gRPC metadata.
    """
```

Usage example:

```python
# Cancel a running operation
client.cancel_operation(
    project_id="my-project",
    zone="us-central1-a",
    operation_id="operation-1234567890123-5f2a7b4d-a1b2c3d4"
)

print("Operation cancellation requested")

# Verify cancellation
operation = client.get_operation(
    project_id="my-project",
    zone="us-central1-a",
    operation_id="operation-1234567890123-5f2a7b4d-a1b2c3d4"
)

print(f"Operation status after cancellation: {operation.status}")
```

### Monitoring Operation Progress

GKE operations are long-running and can take several minutes to complete. You can monitor progress by polling `get_operation`, for example with a helper that reports each stage:

```python
import time

from google.cloud import container

def monitor_operation_with_callback(client, operation_name, callback=None):
    """
    Monitor operation with optional progress callback.

    Args:
        client: ClusterManagerClient instance
        operation_name: Full operation name
            (projects/{project}/locations/{location}/operations/{operation})
        callback: Optional function called with the latest Operation on each poll

    Returns:
        Boolean indicating success/failure
    """
    # `status` is an Operation.Status enum, so compare against enum members
    status_enum = container.Operation.Status

    while True:
        operation = client.get_operation(name=operation_name)

        # Call progress callback if provided
        if callback:
            callback(operation)

        if operation.status == status_enum.DONE:
            # `error` is a google.rpc.Status; a non-zero code indicates a failure
            if operation.error.code:
                print(f"Operation failed: {operation.error.message}")
                return False
            else:
                print("Operation completed successfully")
                return True

        elif operation.status == status_enum.ABORTING:
            print(f"Operation is aborting: {operation.status_message}")
            return False

        else:
            # Operation is still running
            progress_info = []
            if operation.progress:
                for stage in operation.progress.stages:
                    progress_info.append(f"{stage.name}: {stage.status}")

            status_msg = f"Status: {operation.status}"
            if progress_info:
                status_msg += f" - {', '.join(progress_info)}"
            if operation.status_message:
                status_msg += f" - {operation.status_message}"

            print(status_msg)
            time.sleep(15)  # Check every 15 seconds

# Example with progress callback
def progress_callback(operation):
    print(f"Operation {operation.name}: {operation.status}")
    if operation.progress and operation.progress.stages:
        for stage in operation.progress.stages:
            print(f"  Stage {stage.name}: {stage.status}")

# Use the monitor
operation = client.create_cluster(...)
# `operation.name` is the server-assigned ID; `name=` expects the full path
operation_name = (
    f"projects/my-project/locations/us-central1-a/operations/{operation.name}"
)
success = monitor_operation_with_callback(
    client,
    operation_name,
    callback=progress_callback
)
```
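
The monitor above reads only the first level of `progress.stages`, but the `stages` field is recursive: each stage can carry substages of its own. A minimal sketch of a depth-first walk over that tree follows. `Stage` is a hypothetical stand-in that mirrors only the `name` and `stages` fields, so the example runs without the client library, and `walk_stages` is likewise an illustrative helper.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Stage:
    """Stand-in for OperationProgress with only the fields used here."""
    name: str
    stages: List["Stage"] = field(default_factory=list)


def walk_stages(progress, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, name) for every stage below `progress`, depth-first."""
    for stage in progress.stages:
        yield depth, stage.name
        # Recurse into substages, one level deeper.
        yield from walk_stages(stage, depth + 1)
```

Because the walk relies only on attribute access, passing `operation.progress` from a real `Operation` should work the same way, for instance to indent stage names by `depth` when printing.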

## Types

```python { .api }
class ListOperationsRequest:
    """ListOperationsRequest lists operations."""
    project_id: str  # Deprecated
    zone: str  # Deprecated
    parent: str  # Required. Format: projects/{project}/locations/{location}

class ListOperationsResponse:
    """ListOperationsResponse is the result of ListOperationsRequest."""
    operations: MutableSequence[Operation]
    missing_zones: MutableSequence[str]

class GetOperationRequest:
    """GetOperationRequest gets a single operation."""
    project_id: str  # Deprecated
    zone: str  # Deprecated
    operation_id: str  # Deprecated
    name: str  # Required. Format: projects/{project}/locations/{location}/operations/{operation}

class CancelOperationRequest:
    """CancelOperationRequest cancels a single operation."""
    project_id: str  # Deprecated
    zone: str  # Deprecated
    operation_id: str  # Deprecated
    name: str  # Required. Format: projects/{project}/locations/{location}/operations/{operation}

class Operation:
    """This operation resource represents operations that may have happened or are happening on the cluster."""
    name: str  # The server-assigned ID for the operation
    zone: str  # The name of the Google Compute Engine zone (deprecated)
    operation_type: Operation.Type  # The operation type (enum, see Operation Types)
    status: Operation.Status  # The current status of the operation (enum, see Operation Status Values)
    detail: str  # Detailed operation progress, if available
    status_message: str  # Output only. If an error has occurred, a textual description of the error (deprecated; use error)
    self_link: str  # Server-defined URL for this resource
    target_link: str  # Server-defined URL for the target of the operation
    location: str  # The name of the Google Compute Engine location
    start_time: str  # The time the operation started
    end_time: str  # The time the operation completed
    progress: OperationProgress  # Output only. Progress information for an operation
    cluster_conditions: MutableSequence[StatusCondition]  # Which conditions caused the current cluster state
    nodepool_conditions: MutableSequence[StatusCondition]  # Which conditions caused the current node pool state
    error: Status  # The error result of the operation in case of failure (google.rpc.Status)

class OperationProgress:
    """Information about operation (or operation stage) progress."""
    name: str  # A non-parameterized string describing an operation stage
    status: Operation.Status  # Status of an operation stage
    metrics: MutableSequence[OperationProgress.Metric]  # Progress metric bundle
    stages: MutableSequence[OperationProgress]  # Substages of an operation or a stage

    class Metric:
        """Progress metric is (string, int|float|string) pair."""
        name: str  # Required. Metric name
        int_value: int  # For metrics with integer value
        double_value: float  # For metrics with floating point value
        string_value: str  # For metrics with string value

class StatusCondition:
    """StatusCondition describes why a cluster or a node pool has a certain status."""
    code: StatusCondition.Code  # Machine-friendly representation of the condition (enum)
    message: str  # Human-friendly representation of the condition
    canonical_code: Code  # Canonical code of the condition (google.rpc.Code)
```

## Operation Types

Common operation types you'll encounter:

- `CREATE_CLUSTER` - Cluster creation
- `DELETE_CLUSTER` - Cluster deletion
- `UPGRADE_MASTER` - Master version upgrade
- `UPGRADE_NODES` - Node version upgrade
- `REPAIR_CLUSTER` - Cluster repair
- `UPDATE_CLUSTER` - Cluster configuration update
- `CREATE_NODE_POOL` - Node pool creation
- `DELETE_NODE_POOL` - Node pool deletion
- `SET_NODE_POOL_MANAGEMENT` - Node pool management update
- `AUTO_REPAIR_NODES` - Automatic node repair
- `AUTO_UPGRADE_NODES` - Automatic node upgrade
- `SET_LABELS` - Label updates
- `SET_MASTER_AUTH` - Master authentication updates
- `SET_NODE_POOL_SIZE` - Node pool size changes
- `SET_NETWORK_POLICY` - Network policy updates
- `SET_MAINTENANCE_POLICY` - Maintenance policy updates

## Operation Status Values

- `PENDING` - Operation has been created and is queued
- `RUNNING` - Operation is in progress
- `DONE` - Operation has finished, either completed or cancelled; check `error` to detect failures
- `ABORTING` - Operation is being cancelled
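
As the polling examples check `error` even after `DONE`, a status alone does not say whether an operation succeeded. The sketch below folds the status name and the error code into one coarse outcome. `describe_outcome` is a hypothetical helper that takes plain values (for instance `op.status.name` and `op.error.code`), so it runs without the client library.

```python
def describe_outcome(status_name: str, error_code: int = 0) -> str:
    """Map a status name plus a google.rpc error code to a coarse outcome."""
    if status_name == "DONE":
        # Code 0 is OK; any other code means the operation failed.
        return "succeeded" if error_code == 0 else "failed"
    if status_name == "ABORTING":
        return "aborting"
    if status_name in ("PENDING", "RUNNING"):
        return "in progress"
    # Covers unset or unrecognized statuses.
    return "unknown"
```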