# Image Generation

Generate, edit, upscale, and segment images using Imagen models. Capabilities include text-to-image generation, mask-based image editing, upscaling for higher resolution, product image recontextualization, and semantic segmentation.

## Capabilities

### Generate Images

Generate images from text prompts using Imagen models. Supports various aspect ratios, safety filtering, and configuration options.

```python { .api }
def generate_images(
    *,
    model: str,
    prompt: str,
    config: Optional[GenerateImagesConfig] = None
) -> GenerateImagesResponse:
    """
    Generate images from a text prompt.

    Parameters:
        model (str): Model identifier (e.g., 'imagen-3.0-generate-001', 'imagen-3.0-fast-generate-001').
        prompt (str): Text description of the image to generate. More detailed prompts
            typically produce better results.
        config (GenerateImagesConfig, optional): Generation configuration including:
            - number_of_images: Number of images to generate (1-8)
            - aspect_ratio: Image aspect ratio ('1:1', '16:9', '9:16', '4:3', '3:4')
            - negative_prompt: What to avoid in the image
            - safety_filter_level: Safety filtering level
            - person_generation: Control person generation in images
            - language: Language of the prompt

    Returns:
        GenerateImagesResponse: Response containing generated images and metadata.

    Raises:
        ClientError: For client errors (4xx status codes)
        ServerError: For server errors (5xx status codes)
    """
    ...

async def generate_images(
    *,
    model: str,
    prompt: str,
    config: Optional[GenerateImagesConfig] = None
) -> GenerateImagesResponse:
    """Async version of generate_images."""
    ...
```

**Usage Example:**

```python
from google.genai import Client
from google.genai.types import GenerateImagesConfig

client = Client(vertexai=True, project='PROJECT_ID', location='us-central1')

config = GenerateImagesConfig(
    number_of_images=4,
    aspect_ratio='16:9',
    negative_prompt='blurry, low quality',
    safety_filter_level='block_some'
)

response = client.models.generate_images(
    model='imagen-3.0-generate-001',
    prompt='A serene mountain landscape at sunset with a lake',
    config=config
)

for i, image in enumerate(response.generated_images):
    image.pil_image.save(f'generated_{i}.png')
```
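
The async variant is called the same way. A minimal sketch, assuming the async methods are exposed under `client.aio` (as in current `google-genai` releases); adjust if your SDK version surfaces them differently:

```python
import asyncio

from google.genai import Client
from google.genai.types import GenerateImagesConfig

async def main():
    client = Client(vertexai=True, project='PROJECT_ID', location='us-central1')

    # Same keyword arguments as the sync call, awaited via the async surface.
    response = await client.aio.models.generate_images(
        model='imagen-3.0-generate-001',
        prompt='A serene mountain landscape at sunset with a lake',
        config=GenerateImagesConfig(number_of_images=1)
    )
    response.generated_images[0].pil_image.save('generated_async.png')

asyncio.run(main())
```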

### Edit Image

Edit existing images using text prompts and optional masks to specify editing regions.

```python { .api }
def edit_image(
    *,
    model: str,
    prompt: str,
    reference_images: Sequence[ReferenceImage],
    config: Optional[EditImageConfig] = None
) -> EditImageResponse:
    """
    Edit an image using a text prompt and reference images.

    Parameters:
        model (str): Model identifier (e.g., 'imagen-3.0-capability-001').
        prompt (str): Text description of desired edits.
        reference_images (Sequence[ReferenceImage]): Reference images for editing.
            Can include base image, mask image, and control images.
        config (EditImageConfig, optional): Editing configuration including:
            - number_of_images: Number of edited variations
            - negative_prompt: What to avoid
            - edit_mode: Editing mode (INPAINT_INSERTION, INPAINT_REMOVAL, OUTPAINT, etc.)
            - mask_mode: Mask interpretation mode
            - mask_dilation: Mask dilation amount
            - guidance_scale: Prompt adherence strength
            - safety_filter_level: Safety filtering

    Returns:
        EditImageResponse: Response containing edited images.

    Raises:
        ClientError: For client errors (4xx status codes)
        ServerError: For server errors (5xx status codes)
    """
    ...

async def edit_image(
    *,
    model: str,
    prompt: str,
    reference_images: Sequence[ReferenceImage],
    config: Optional[EditImageConfig] = None
) -> EditImageResponse:
    """Async version of edit_image."""
    ...
```

**Usage Example:**

```python
from google.genai import Client
from google.genai.types import (
    EditImageConfig,
    ReferenceImage,
    Image,
    EditMode
)

client = Client(vertexai=True, project='PROJECT_ID', location='us-central1')

# Load images
base_image = Image.from_file('photo.jpg')
mask_image = Image.from_file('mask.png')  # White areas will be edited

reference_images = [
    ReferenceImage(reference_type='reference_image', reference_image=base_image),
    ReferenceImage(reference_type='mask_image', reference_image=mask_image)
]

config = EditImageConfig(
    number_of_images=2,
    edit_mode=EditMode.INPAINT_INSERTION,
    negative_prompt='distorted, blurry'
)

response = client.models.edit_image(
    model='imagen-3.0-capability-001',
    prompt='A red sports car',
    reference_images=reference_images,
    config=config
)

for i, image in enumerate(response.generated_images):
    image.pil_image.save(f'edited_{i}.png')
```

### Upscale Image

Upscale images to higher resolutions while preserving quality and details.

```python { .api }
def upscale_image(
    *,
    model: str,
    image: Image,
    upscale_factor: str,
    config: Optional[UpscaleImageConfig] = None
) -> UpscaleImageResponse:
    """
    Upscale an image to higher resolution.

    Parameters:
        model (str): Model identifier (e.g., 'imagen-3.0-capability-001').
        image (Image): Image to upscale.
        upscale_factor (str): Upscaling factor ('x2' or 'x4').
        config (UpscaleImageConfig, optional): Upscaling configuration including:
            - number_of_images: Number of upscaled variations
            - safety_filter_level: Safety filtering level

    Returns:
        UpscaleImageResponse: Response containing upscaled images.

    Raises:
        ClientError: For client errors (4xx status codes)
        ServerError: For server errors (5xx status codes)
    """
    ...

async def upscale_image(
    *,
    model: str,
    image: Image,
    upscale_factor: str,
    config: Optional[UpscaleImageConfig] = None
) -> UpscaleImageResponse:
    """Async version of upscale_image."""
    ...
```

**Usage Example:**

```python
from google.genai import Client
from google.genai.types import Image, UpscaleImageConfig

client = Client(vertexai=True, project='PROJECT_ID', location='us-central1')

image = Image.from_file('low_res.jpg')

config = UpscaleImageConfig(
    number_of_images=1
)

response = client.models.upscale_image(
    model='imagen-3.0-capability-001',
    image=image,
    upscale_factor='x4',
    config=config
)

response.generated_images[0].pil_image.save('upscaled_4x.png')
```

### Recontext Image

Recontextualize product images by changing their backgrounds or settings while preserving the main subject.

```python { .api }
def recontext_image(
    *,
    model: str,
    prompt: str,
    source: RecontextImageSource,
    config: Optional[RecontextImageConfig] = None
) -> RecontextImageResponse:
    """
    Recontextualize a product image with a new background.

    Parameters:
        model (str): Model identifier (e.g., 'imagen-3.0-capability-001').
        prompt (str): Description of the new context/background.
        source (RecontextImageSource): Source containing product image and optional mask.
        config (RecontextImageConfig, optional): Configuration including:
            - number_of_images: Number of variations
            - negative_prompt: What to avoid
            - safety_filter_level: Safety filtering

    Returns:
        RecontextImageResponse: Response with recontextualized images.
    """
    ...

async def recontext_image(
    *,
    model: str,
    prompt: str,
    source: RecontextImageSource,
    config: Optional[RecontextImageConfig] = None
) -> RecontextImageResponse:
    """Async version of recontext_image."""
    ...
```
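
**Usage Example** (a sketch following the pattern of the other examples; the field names come from the Types section, but availability of `recontext_image` may vary by SDK version):

```python
from google.genai import Client
from google.genai.types import Image, RecontextImageConfig, RecontextImageSource

client = Client(vertexai=True, project='PROJECT_ID', location='us-central1')

# Wrap the product photo; a product mask is optional.
source = RecontextImageSource(
    product_image=Image.from_file('product.jpg')
)

config = RecontextImageConfig(
    number_of_images=2,
    negative_prompt='cluttered, busy background'
)

response = client.models.recontext_image(
    model='imagen-3.0-capability-001',
    prompt='On a marble countertop in a bright modern kitchen',
    source=source,
    config=config
)

for i, image in enumerate(response.generated_images):
    image.pil_image.save(f'recontext_{i}.png')
```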

### Segment Image

Perform semantic segmentation on images to identify and isolate different objects and regions.

```python { .api }
def segment_image(
    *,
    model: str,
    source: SegmentImageSource,
    config: Optional[SegmentImageConfig] = None
) -> SegmentImageResponse:
    """
    Segment an image to identify objects and regions.

    Parameters:
        model (str): Model identifier (e.g., 'imagen-3.0-capability-001').
        source (SegmentImageSource): Source image and optional entity labels.
        config (SegmentImageConfig, optional): Segmentation configuration including:
            - segment_mode: Segmentation mode (FOREGROUND, ENTITY)

    Returns:
        SegmentImageResponse: Response with segmentation masks.
    """
    ...

async def segment_image(
    *,
    model: str,
    source: SegmentImageSource,
    config: Optional[SegmentImageConfig] = None
) -> SegmentImageResponse:
    """Async version of segment_image."""
    ...
```
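
**Usage Example** (a sketch following the pattern of the other examples; all names are taken from the signature above and the Types section, but availability of `segment_image` may vary by SDK version):

```python
from google.genai import Client
from google.genai.types import (
    Image,
    SegmentImageConfig,
    SegmentImageSource,
    SegmentMode
)

client = Client(vertexai=True, project='PROJECT_ID', location='us-central1')

source = SegmentImageSource(
    image=Image.from_file('photo.jpg')
)

config = SegmentImageConfig(
    segment_mode=SegmentMode.FOREGROUND
)

response = client.models.segment_image(
    model='imagen-3.0-capability-001',
    source=source,
    config=config
)

# Each mask carries an optional entity label (mostly relevant in ENTITY mode).
for mask in response.generated_masks:
    if mask.entity_label:
        print(mask.entity_label.label, mask.entity_label.score)
```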

## Types

```python { .api }
from typing import Optional, List, Sequence, Literal
from enum import Enum
import PIL.Image

# Configuration types
class GenerateImagesConfig:
    """
    Configuration for image generation.

    Attributes:
        number_of_images (int, optional): Number of images to generate (1-8). Default: 1.
        aspect_ratio (str, optional): Aspect ratio ('1:1', '16:9', '9:16', '4:3', '3:4'). Default: '1:1'.
        negative_prompt (str, optional): What to avoid in generated images.
        safety_filter_level (str, optional): Safety filtering ('block_most', 'block_some', 'block_few').
        person_generation (PersonGeneration, optional): Control person generation.
        include_safety_attributes (bool, optional): Include safety attributes in response.
        language (str, optional): Language of the prompt.
        add_watermark (bool, optional): Add watermark to images.
    """
    number_of_images: Optional[int] = None
    aspect_ratio: Optional[str] = None
    negative_prompt: Optional[str] = None
    safety_filter_level: Optional[str] = None
    person_generation: Optional[PersonGeneration] = None
    include_safety_attributes: Optional[bool] = None
    language: Optional[str] = None
    add_watermark: Optional[bool] = None

class EditImageConfig:
    """
    Configuration for image editing.

    Attributes:
        number_of_images (int, optional): Number of edited variations.
        negative_prompt (str, optional): What to avoid.
        edit_mode (EditMode, optional): Editing mode.
        mask_mode (MaskMode, optional): Mask interpretation.
        mask_dilation (float, optional): Mask dilation amount.
        guidance_scale (float, optional): Prompt adherence (1-20).
        safety_filter_level (str, optional): Safety filtering.
        person_generation (PersonGeneration, optional): Person generation control.
        include_safety_attributes (bool, optional): Include safety attributes.
    """
    number_of_images: Optional[int] = None
    negative_prompt: Optional[str] = None
    edit_mode: Optional[EditMode] = None
    mask_mode: Optional[MaskMode] = None
    mask_dilation: Optional[float] = None
    guidance_scale: Optional[float] = None
    safety_filter_level: Optional[str] = None
    person_generation: Optional[PersonGeneration] = None
    include_safety_attributes: Optional[bool] = None

class UpscaleImageConfig:
    """
    Configuration for image upscaling.

    Attributes:
        number_of_images (int, optional): Number of upscaled variations.
        safety_filter_level (str, optional): Safety filtering.
        include_safety_attributes (bool, optional): Include safety attributes.
    """
    number_of_images: Optional[int] = None
    safety_filter_level: Optional[str] = None
    include_safety_attributes: Optional[bool] = None

class RecontextImageConfig:
    """Configuration for image recontextualization."""
    number_of_images: Optional[int] = None
    negative_prompt: Optional[str] = None
    safety_filter_level: Optional[str] = None
    include_safety_attributes: Optional[bool] = None

class SegmentImageConfig:
    """
    Configuration for image segmentation.

    Attributes:
        segment_mode (SegmentMode, optional): Segmentation mode.
    """
    segment_mode: Optional[SegmentMode] = None

# Response types
class GenerateImagesResponse:
    """
    Response from image generation.

    Attributes:
        generated_images (list[GeneratedImage]): Generated images with metadata.
        rai_filtered_reason (str, optional): Reason if filtered by safety.
    """
    generated_images: list[GeneratedImage]
    rai_filtered_reason: Optional[str] = None

class EditImageResponse:
    """Response from image editing."""
    generated_images: list[GeneratedImage]
    rai_filtered_reason: Optional[str] = None

class UpscaleImageResponse:
    """Response from image upscaling."""
    generated_images: list[GeneratedImage]
    rai_filtered_reason: Optional[str] = None

class RecontextImageResponse:
    """Response from image recontextualization."""
    generated_images: list[GeneratedImage]
    rai_filtered_reason: Optional[str] = None

class SegmentImageResponse:
    """
    Response from image segmentation.

    Attributes:
        generated_masks (list[GeneratedImageMask]): Segmentation masks.
    """
    generated_masks: list[GeneratedImageMask]

class GeneratedImage:
    """
    Generated or edited image.

    Attributes:
        image (Image): Image object.
        pil_image (PIL.Image.Image): PIL Image for easy manipulation.
        generation_seed (int, optional): Seed used for generation.
        rai_info (str, optional): RAI information.
        safety_attributes (SafetyAttributes, optional): Safety attributes.
    """
    image: Image
    pil_image: PIL.Image.Image
    generation_seed: Optional[int] = None
    rai_info: Optional[str] = None
    safety_attributes: Optional[SafetyAttributes] = None

class GeneratedImageMask:
    """
    Segmentation mask.

    Attributes:
        mask_image (Image): Mask image.
        entity_label (EntityLabel, optional): Entity label.
    """
    mask_image: Image
    entity_label: Optional[EntityLabel] = None

# Input types
class ReferenceImage:
    """
    Reference image for editing.

    Attributes:
        reference_type (str): Type ('reference_image', 'mask_image', 'control_image', etc.).
        reference_image (Image): The reference image.
        reference_id (int, optional): Reference identifier.
    """
    reference_type: str
    reference_image: Image
    reference_id: Optional[int] = None

class RecontextImageSource:
    """
    Source for recontextualization.

    Attributes:
        product_image (Image): Product image.
        product_mask (Image, optional): Product mask.
    """
    product_image: Image
    product_mask: Optional[Image] = None

class SegmentImageSource:
    """
    Source for segmentation.

    Attributes:
        image (Image): Image to segment.
        entity_labels (list[EntityLabel], optional): Entity labels to identify.
    """
    image: Image
    entity_labels: Optional[list[EntityLabel]] = None

class Image:
    """
    Image data supporting multiple formats.

    Can be created from:
    - File: Image.from_file('path.jpg')
    - URL: Image.from_url('https://...')
    - Bytes: Image.from_bytes(data, mime_type='image/jpeg')
    - PIL: Image.from_pil(pil_image)
    """
    @staticmethod
    def from_file(path: str) -> Image: ...

    @staticmethod
    def from_url(url: str) -> Image: ...

    @staticmethod
    def from_bytes(data: bytes, mime_type: str) -> Image: ...

    @staticmethod
    def from_pil(pil_image: PIL.Image.Image) -> Image: ...

# Enum types
class PersonGeneration(Enum):
    """Person generation control."""
    DONT_ALLOW = 'dont_allow'
    ALLOW_ADULT = 'allow_adult'
    ALLOW_ALL = 'allow_all'

class EditMode(Enum):
    """Image editing modes."""
    EDIT_MODE_UNSPECIFIED = 'EDIT_MODE_UNSPECIFIED'
    INPAINT_INSERTION = 'INPAINT_INSERTION'
    INPAINT_REMOVAL = 'INPAINT_REMOVAL'
    OUTPAINT = 'OUTPAINT'
    PRODUCT_IMAGE = 'PRODUCT_IMAGE'

class MaskMode(Enum):
    """Mask interpretation modes."""
    MASK_MODE_UNSPECIFIED = 'MASK_MODE_UNSPECIFIED'
    MASK_MODE_BACKGROUND = 'MASK_MODE_BACKGROUND'
    MASK_MODE_FOREGROUND = 'MASK_MODE_FOREGROUND'
    MASK_MODE_SEMANTIC = 'MASK_MODE_SEMANTIC'

class SegmentMode(Enum):
    """Segmentation modes."""
    SEGMENT_MODE_UNSPECIFIED = 'SEGMENT_MODE_UNSPECIFIED'
    FOREGROUND = 'FOREGROUND'
    ENTITY = 'ENTITY'

class SafetyAttributes:
    """Safety attributes for images."""
    blocked: bool
    scores: dict[str, float]

class EntityLabel:
    """Entity label for segmentation."""
    label: str
    score: Optional[float] = None
```
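
Segmentation masks are returned as images; once exported, they can be applied locally with plain PIL. The helper below is an illustrative sketch, not part of the SDK, and assumes a grayscale mask in which white marks the segmented region:

```python
from PIL import Image as PILImage

def apply_segmentation_mask(photo: PILImage.Image, mask: PILImage.Image) -> PILImage.Image:
    """Return `photo` with transparency outside the masked (white) region."""
    rgba = photo.convert('RGBA')
    # Resize the mask to match the photo and use it directly as the alpha channel.
    alpha = mask.convert('L').resize(rgba.size)
    rgba.putalpha(alpha)
    return rgba

# Demo with synthetic images: a solid red "photo" and a mask whose left half is white.
photo = PILImage.new('RGB', (4, 4), (255, 0, 0))
mask = PILImage.new('L', (4, 4), 0)
for x in range(2):
    for y in range(4):
        mask.putpixel((x, y), 255)

cutout = apply_segmentation_mask(photo, mask)
print(cutout.getpixel((0, 0)))  # opaque red: (255, 0, 0, 255)
print(cutout.getpixel((3, 0)))  # fully transparent: (255, 0, 0, 0)
```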