# Image Description

Generate human-readable descriptions of image content in complete sentences (English by default). The service analyzes visual content and produces natural language descriptions that capture the main elements, activities, and context of an image.

## Capabilities

### Image Description Generation

Create natural language descriptions of image content, returned as one or more candidate captions with confidence scores.

```python { .api }
def describe_image(url, max_candidates=None, language="en", description_exclude=None, model_version="latest", custom_headers=None, raw=False, **operation_config):
    """
    Generate human-readable description of image content.

    Args:
        url (str): Publicly reachable URL of an image
        max_candidates (int, optional): Maximum number of description candidates to return (default: 1)
        language (str, optional): Output language for descriptions.
            Supported: "en", "es", "ja", "pt", "zh". Default: "en"
        description_exclude (list[DescriptionExclude], optional): Domain models to exclude.
            Available values: Celebrities, Landmarks
        model_version (str, optional): AI model version. Default: "latest"
        custom_headers (dict, optional): Custom HTTP headers
        raw (bool, optional): Return raw response. Default: False

    Returns:
        ImageDescription: Generated descriptions with confidence scores and tags

    Raises:
        ComputerVisionErrorResponseException: API error occurred
    """

def describe_image_in_stream(image, max_candidates=None, language="en", description_exclude=None, model_version="latest", custom_headers=None, raw=False, **operation_config):
    """
    Generate description from binary image stream.

    Args:
        image (Generator): Binary image data stream
        max_candidates (int, optional): Maximum description candidates
        language (str, optional): Output language
        description_exclude (list[DescriptionExclude], optional): Domain models to exclude

    Returns:
        ImageDescription: Generated descriptions and metadata
    """
```

## Usage Examples

### Basic Image Description

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Initialize client
credentials = CognitiveServicesCredentials("your-api-key")
client = ComputerVisionClient("https://your-endpoint.cognitiveservices.azure.com/", credentials)

# Generate description for image
image_url = "https://example.com/park-scene.jpg"
description_result = client.describe_image(image_url)

# Get the best description
if description_result.captions:
    best_caption = description_result.captions[0]
    print(f"Description: {best_caption.text}")
    print(f"Confidence: {best_caption.confidence:.3f}")

# Show related tags
print("\nRelated tags:")
for tag in description_result.tags:
    print(f"  - {tag}")
```

### Multiple Description Candidates

```python
# Get multiple description candidates (reuses the `client` created in the basic example)
image_url = "https://example.com/complex-scene.jpg"
description_result = client.describe_image(
    image_url,
    max_candidates=3  # Get up to 3 different descriptions
)

print("Description candidates:")
for i, caption in enumerate(description_result.captions, 1):
    print(f"{i}. {caption.text} (confidence: {caption.confidence:.3f})")

# Choose description with highest confidence; max() raises on an empty list, so guard first
if description_result.captions:
    best_caption = max(description_result.captions, key=lambda c: c.confidence)
    print(f"\nBest description: {best_caption.text}")
```

### Multilingual Descriptions

```python
# Generate descriptions in different languages
image_url = "https://example.com/street-scene.jpg"

languages = ["en", "es", "ja"]
descriptions = {}

for lang in languages:
    try:
        result = client.describe_image(image_url, language=lang)
        if result.captions:
            descriptions[lang] = result.captions[0].text
    except Exception as e:
        print(f"Failed to get description in {lang}: {e}")

# Display results
for lang, description in descriptions.items():
    print(f"{lang}: {description}")
```

### Description from Local File

```python
# Generate description from local image
with open("vacation_photo.jpg", "rb") as image_stream:
    description_result = client.describe_image_in_stream(
        image_stream,
        max_candidates=2
    )

print("Descriptions:")
for caption in description_result.captions:
    print(f"  {caption.text} (confidence: {caption.confidence:.3f})")

print("\nDetected elements:")
for tag in description_result.tags:
    print(f"  - {tag}")
```

### Excluding Domain Models

```python
from azure.cognitiveservices.vision.computervision.models import DescriptionExclude

# Generate description excluding celebrity and landmark information
image_url = "https://example.com/tourist-photo.jpg"
description_result = client.describe_image(
    image_url,
    description_exclude=[DescriptionExclude.celebrities, DescriptionExclude.landmarks]
)

# This will focus on general scene description rather than identifying specific people or places
for caption in description_result.captions:
    print(f"General description: {caption.text}")
```

### Batch Description Processing

```python
# Process multiple images for descriptions
image_urls = [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg",
    "https://example.com/image3.jpg"
]

descriptions = []
for i, url in enumerate(image_urls):
    try:
        result = client.describe_image(url)
        if result.captions:
            descriptions.append({
                'url': url,
                'description': result.captions[0].text,
                'confidence': result.captions[0].confidence,
                'tags': result.tags
            })
        print(f"Processed image {i+1}/{len(image_urls)}")
    except Exception as e:
        print(f"Error processing {url}: {e}")

# Display results
for desc in descriptions:
    print(f"\nImage: {desc['url']}")
    print(f"Description: {desc['description']}")
    print(f"Confidence: {desc['confidence']:.3f}")
    print(f"Tags: {', '.join(desc['tags'][:5])}")  # Show first 5 tags
```

## Response Data Types

### ImageDescription

```python { .api }
class ImageDescription:
    """
    Image description generation result.

    Attributes:
        tags (list[str]): Descriptive tags related to image content
        captions (list[ImageCaption]): Generated description candidates with confidence scores
        description_details (ImageDescriptionDetails): Additional description metadata
        request_id (str): Request identifier
        metadata (ImageMetadata): Image metadata (dimensions, format)
        model_version (str): AI model version used
    """
```

### ImageCaption

```python { .api }
class ImageCaption:
    """
    Generated image caption with confidence score.

    Attributes:
        text (str): Natural language description of the image
        confidence (float): Confidence score for the description (0.0 to 1.0)
    """
```

### ImageDescriptionDetails

```python { .api }
class ImageDescriptionDetails:
    """
    Additional details about the description generation process.

    Attributes:
        tags (list[str]): Extended list of descriptive tags
        celebrities (list): Celebrity information (if applicable)
        landmarks (list): Landmark information (if applicable)
    """
```

### ImageMetadata

```python { .api }
class ImageMetadata:
    """
    Image metadata information.

    Attributes:
        height (int): Image height in pixels
        width (int): Image width in pixels
        format (str): Image format (e.g., "Jpeg", "Png")
    """
```
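
When results need to be logged or stored, it can be convenient to flatten a result into plain data. The following is a minimal sketch, not part of the SDK: `description_to_dict` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins shaped like the classes documented above so the example runs without calling the service. It reads only the attributes listed in this section.

```python
import json
from types import SimpleNamespace


def description_to_dict(result):
    """Flatten an ImageDescription-like object into JSON-serializable data.

    Hypothetical helper: it only reads the attributes documented above and
    tolerates missing ones by falling back to None or an empty list.
    """
    metadata = getattr(result, "metadata", None)
    return {
        "request_id": getattr(result, "request_id", None),
        "model_version": getattr(result, "model_version", None),
        "tags": list(getattr(result, "tags", None) or []),
        "captions": [
            {"text": c.text, "confidence": c.confidence}
            for c in (getattr(result, "captions", None) or [])
        ],
        "image": None if metadata is None else {
            "width": metadata.width,
            "height": metadata.height,
            "format": metadata.format,
        },
    }


# Stand-in object with made-up values (no service call is made here).
sample = SimpleNamespace(
    request_id="example-request-id",
    model_version="latest",
    tags=["outdoor", "grass"],
    captions=[SimpleNamespace(text="a dog running on grass", confidence=0.87)],
    metadata=SimpleNamespace(width=640, height=480, format="Jpeg"),
)
print(json.dumps(description_to_dict(sample), indent=2))
```

In real use, the object returned by `describe_image` would be passed in place of `sample`.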

## Language Support

The description service supports multiple languages for output:

- **English (en)**: Full feature support, highest accuracy
- **Spanish (es)**: Complete descriptions with good accuracy
- **Japanese (ja)**: Natural language descriptions
- **Portuguese (pt)**: Comprehensive description capability
- **Chinese (zh)**: Simplified Chinese descriptions

English typically provides the most detailed and accurate descriptions, while other languages may have varying levels of detail and accuracy.
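
Since only the five codes listed above are accepted, it may help to normalize a requested locale before passing it as `language`. The sketch below is a hypothetical helper (`SUPPORTED_LANGUAGES` and `resolve_language` are illustrative names, not SDK APIs) that reduces a locale such as `pt-BR` to its base code and falls back to English when the code is not in the list.

```python
# Hypothetical helper, not part of the SDK: choose a supported output language.
SUPPORTED_LANGUAGES = {"en", "es", "ja", "pt", "zh"}  # codes listed in this document


def resolve_language(requested, default="en"):
    """Return a supported language code for `requested`, or `default`."""
    if not requested:
        return default
    # Reduce locale strings such as "pt-BR" or "es_MX" to their base code.
    base = requested.strip().lower().replace("_", "-").split("-")[0]
    return base if base in SUPPORTED_LANGUAGES else default


print(resolve_language("pt-BR"))  # base code "pt" is supported
print(resolve_language("fr"))     # unsupported, falls back to the default
```

The result can then be passed as `language=resolve_language(user_locale)`. Note that this sketch maps every `zh-*` locale to `zh`, which this document describes as Simplified Chinese.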

## Description Quality

### Confidence Scores

- **High confidence (0.8 to 1.0)**: Very reliable descriptions that capture the main scene elements accurately
- **Medium confidence (0.5 up to 0.8)**: Generally accurate, but may miss some details or nuances
- **Low confidence (below 0.5)**: Basic description that may be vague or incomplete
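
The bands above can be expressed as a small classifier. This is a minimal sketch with a hypothetical function name (`confidence_band`); it assumes the boundary values 0.8 and 0.5 belong to the higher band, which is a choice made here rather than something the service defines.

```python
def confidence_band(confidence):
    """Map a caption confidence in [0.0, 1.0] to "high", "medium", or "low".

    Assumption: scores of exactly 0.8 and 0.5 fall into the higher band.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0.0 and 1.0, got {confidence}")
    if confidence >= 0.8:
        return "high"
    if confidence >= 0.5:
        return "medium"
    return "low"


for score in (0.92, 0.65, 0.31):
    print(f"{score:.2f} -> {confidence_band(score)}")
```

Applied to a result, `confidence_band(caption.confidence)` gives a label that can drive decisions such as whether to show a caption directly or flag it for review.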

### Typical Description Elements

The service typically includes:

- **Main subjects**: People, animals, prominent objects
- **Settings/locations**: Indoor/outdoor, specific environments (kitchen, park, office)
- **Activities**: Actions being performed (sitting, walking, playing)
- **Relationships**: Spatial relationships between objects
- **Colors and appearance**: Dominant colors, notable visual characteristics
- **Atmosphere**: General mood or scene type (busy, peaceful, formal)

### Best Practices

- Use multiple candidates (`max_candidates > 1`) for important applications
- Check confidence scores to assess description reliability
- Combine with tags for additional context and details
- Consider the target audience when selecting description candidates
- For critical applications, use the highest confidence description
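
Several of these practices can be combined in one selection step. The sketch below is illustrative only: `Caption` is a stand-in for the SDK's `ImageCaption`, `pick_description` is a hypothetical helper, and the `0.5` threshold and the "Image containing: ..." fallback wording are assumptions to adjust per application. It picks the highest-confidence candidate and falls back to tags when no candidate is confident enough.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """Stand-in for the SDK's ImageCaption (text and confidence only)."""
    text: str
    confidence: float


def pick_description(captions, tags, min_confidence=0.5, max_tags=5):
    """Return the highest-confidence caption text, or a tag-based fallback.

    Returns None when there is neither a usable caption nor any tag.
    """
    best = max(captions, key=lambda c: c.confidence, default=None)
    if best is not None and best.confidence >= min_confidence:
        return best.text
    if tags:
        # Fall back to a short list of tags when no caption is reliable enough.
        return "Image containing: " + ", ".join(tags[:max_tags])
    return None


# Made-up candidates for illustration (no service call is made here).
candidates = [Caption("a dog in a park", 0.42), Caption("a dog running on grass", 0.87)]
print(pick_description(candidates, ["dog", "grass", "outdoor"]))
```

With a real result, the call would be `pick_description(description_result.captions, description_result.tags)`, since the helper only reads `text` and `confidence` from each caption.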