Guide for implementing Google Gemini API image generation - create high-quality images from text prompts using gemini-2.5-flash-image model. Use when generating images, creating visual content, or implementing text-to-image features. Supports text-to-image, image editing, multi-image composition, and iterative refinement.
Generate high-quality images using Google's Gemini 2.5 Flash Image model with text prompts, image editing, and multi-image composition capabilities.
Use this skill when you need to generate images, create visual content, or implement text-to-image features.
The skill automatically detects your GEMINI_API_KEY in this order:

1. Environment variable: export GEMINI_API_KEY="your-key"
2. .claude/skills/gemini-image-gen/.env
3. ./.env (project root)

Get your API key: Visit Google AI Studio
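
The lookup order above can be sketched as a small resolver. This is a minimal illustration, not the helper script's actual code: `resolve_api_key`, `parse_env_file`, `ENV_FILE_CANDIDATES`, and the simple `KEY=value` parsing are all assumptions made for the example.

```python
import os
from pathlib import Path

# Hypothetical search order mirroring the list above; the real helper may differ.
ENV_FILE_CANDIDATES = [
    Path(".claude/skills/gemini-image-gen/.env"),
    Path(".env"),
]


def parse_env_file(path, name="GEMINI_API_KEY"):
    """Return the value of `name` from a KEY=value file, or None if absent."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


def resolve_api_key(environ=None, candidates=ENV_FILE_CANDIDATES):
    """Check the process environment first, then each .env file in order."""
    environ = os.environ if environ is None else environ
    if environ.get("GEMINI_API_KEY"):
        return environ["GEMINI_API_KEY"]
    for path in candidates:
        value = parse_env_file(Path(path))
        if value:
            return value
    return None
```

Calling `resolve_api_key()` with no arguments reads the real environment and the two default paths. Passing a dictionary and a list of paths makes the order easy to check in isolation.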
Create .env file with:

```
GEMINI_API_KEY=your_api_key_here
```

Install required package:

```bash
pip install google-genai
```

```python
from google import genai
from google.genai import types
import os

# Read the API key from the environment (the helper script automates detection)
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        # In recent google-genai releases the aspect ratio is passed through
        # image_config rather than as a top-level config field
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

# Save to ./docs/assets/ (create the directory first if it is missing)
os.makedirs('./docs/assets', exist_ok=True)
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'./docs/assets/generated-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
```

For convenience, use the provided helper script that handles API key detection and file saving:

```bash
# Generate single image
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "A futuristic city with flying cars" \
  --aspect-ratio 16:9 \
  --output ./docs/assets/city.png

# Generate with specific modalities
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "Modern architecture design" \
  --response-modalities image text \
  --aspect-ratio 1:1
```

| Ratio | Resolution | Use Case | Token Cost |
|---|---|---|---|
| 1:1 | 1024×1024 | Social media, avatars | 1290 |
| 16:9 | 1344×768 | Landscapes, banners | 1290 |
| 9:16 | 768×1344 | Mobile, portraits | 1290 |
| 4:3 | 1152×896 | Traditional media | 1290 |
| 3:4 | 896×1152 | Vertical posters | 1290 |
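
When a target size does not match one of the listed ratios, one option is to pick the nearest supported ratio before calling the API. The sketch below copies its values from the table above. The names `SUPPORTED_RATIOS`, `closest_ratio`, and `estimated_output_tokens` are hypothetical helpers, not part of the skill or the SDK.

```python
# Values copied from the table above: ratio -> (width, height) in pixels.
SUPPORTED_RATIOS = {
    "1:1": (1024, 1024),
    "16:9": (1344, 768),
    "9:16": (768, 1344),
    "4:3": (1152, 896),
    "3:4": (896, 1152),
}
TOKENS_PER_IMAGE = 1290  # the table lists the same cost for every ratio


def _ratio_value(name):
    """Width divided by height for a ratio listed in the table."""
    width, height = SUPPORTED_RATIOS[name]
    return width / height


def closest_ratio(width, height):
    """Return the supported ratio whose width/height is nearest the target's."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height
    return min(SUPPORTED_RATIOS, key=lambda name: abs(_ratio_value(name) - target))


def estimated_output_tokens(image_count):
    """Rough token estimate for a number of generated images."""
    return image_count * TOKENS_PER_IMAGE
```

For example, `closest_ratio(1920, 1080)` selects `"16:9"`, and `estimated_output_tokens(3)` gives 3870.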

Response modalities:

- ['image']: Generate only images
- ['text']: Generate only text descriptions
- ['image', 'text']: Generate both images and descriptions

Provide an existing image plus text instructions to modify it:

```python
import PIL.Image

# Reuses the client created in the basic generation example
img = PIL.Image.open('original.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ]
)
```

Combine up to 3 source images (recommended):

```python
import PIL.Image

# Reuses the client created in the basic generation example
img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2
    ]
)
```

Structure effective prompts with three elements:
Example: "A robot in a futuristic city, cyberpunk style with neon lighting and rain-slicked streets"
Quality modifiers:
Text in images:
See references/prompting-guide.md for comprehensive prompt engineering strategies.
The model includes adjustable safety filters. Configure per-request:

```python
from google.genai import types

config = types.GenerateContentConfig(
    response_modalities=['image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)
```

See references/safety-settings.md for detailed configuration options.
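
When a filter does block a request, the troubleshooting notes in this guide point to `response.prompt_feedback.block_reason`. A defensive way to read it might look like the sketch below. `describe_block` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the attributes being read, so the example runs without calling the API.

```python
from types import SimpleNamespace


def describe_block(response):
    """Return a short message if the prompt was blocked, otherwise None."""
    feedback = getattr(response, "prompt_feedback", None)
    reason = getattr(feedback, "block_reason", None)
    if reason is None:
        return None
    # Enum members expose .name; plain strings are used as-is.
    return f"Prompt blocked: {getattr(reason, 'name', reason)}"


# Stand-in responses (assumed shapes, not real SDK objects).
blocked = SimpleNamespace(prompt_feedback=SimpleNamespace(block_reason="SAFETY"))
allowed = SimpleNamespace(prompt_feedback=None)

print(describe_block(blocked))  # Prompt blocked: SAFETY
print(describe_block(allowed))  # None
```

Using `getattr` with a default avoids an `AttributeError` when the response carries no prompt feedback at all.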
All generated images should be saved to the ./docs/assets/ directory:

```bash
# Create directory if needed
mkdir -p ./docs/assets
```

The helper script automatically saves to this location with timestamped filenames.
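
A timestamped save routine could be sketched as follows. The helper script's real naming scheme is not shown in this guide, so the `prefix-YYYYMMDD-HHMMSS-index.png` pattern, the `.png` extension, and the function name `save_image_parts` are assumptions made for illustration only.

```python
from datetime import datetime
from pathlib import Path
from types import SimpleNamespace


def save_image_parts(parts, out_dir="./docs/assets", prefix="generated", now=None):
    """Write every part carrying inline image bytes to out_dir; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    saved = []
    for index, part in enumerate(parts):
        inline = getattr(part, "inline_data", None)
        if inline is None or not inline.data:
            continue  # text parts carry no image bytes
        path = out / f"{prefix}-{stamp}-{index}.png"
        path.write_bytes(inline.data)
        saved.append(path)
    return saved
```

With a real response, the call would be `save_image_parts(response.candidates[0].content.parts)`. The optional `now` argument exists only to make filenames predictable in tests.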
Model: gemini-2.5-flash-image
Common issues and solutions:
API key not found:

```bash
# Check environment variables
echo $GEMINI_API_KEY

# Verify .env file exists
cat .claude/skills/gemini-image-gen/.env
# or
cat .env
```

Safety filter blocking:

```python
response.prompt_feedback.block_reason
```

Token limit exceeded:
For detailed information, see:

- references/api-reference.md - Complete API specifications
- references/prompting-guide.md - Advanced prompt engineering
- references/safety-settings.md - Safety configuration details
- references/code-examples.md - Additional implementation examples