Tessl Tile for pypi/azure-cognitiveservices-vision-computervision@0.9.0

# Image Description

Generate human-readable descriptions of image content in complete sentences (English by default). The service analyzes the visual content and produces natural-language descriptions that capture the main elements, activities, and context of an image.

## Capabilities

### Image Description Generation

Create natural-language descriptions of image content. Each of the one or more description candidates comes with a confidence score.

```python { .api }
def describe_image(url, max_candidates=1, language="en", description_exclude=None, model_version="latest", custom_headers=None, raw=False, **operation_config):
    """
    Generate a human-readable description of image content.

    Args:
        url (str): Publicly reachable URL of an image
        max_candidates (int, optional): Maximum number of description candidates to return. Default: 1
        language (str, optional): Output language for descriptions.
            Supported: "en", "es", "ja", "pt", "zh". Default: "en"
        description_exclude (list[DescriptionExclude], optional): Domain models to exclude.
            Available values: Celebrities, Landmarks
        model_version (str, optional): AI model version. Default: "latest"
        custom_headers (dict, optional): Custom HTTP headers
        raw (bool, optional): Return raw response. Default: False

    Returns:
        ImageDescription: Generated descriptions with confidence scores and tags

    Raises:
        ComputerVisionErrorResponseException: API error occurred
    """

def describe_image_in_stream(image, max_candidates=1, language="en", description_exclude=None, model_version="latest", custom_headers=None, raw=False, **operation_config):
    """
    Generate a description from a binary image stream.

    Args:
        image (Generator): Binary image data stream
        max_candidates (int, optional): Maximum number of description candidates. Default: 1
        language (str, optional): Output language. Default: "en"
        description_exclude (list[DescriptionExclude], optional): Domain models to exclude
        model_version (str, optional): AI model version. Default: "latest"
        custom_headers (dict, optional): Custom HTTP headers
        raw (bool, optional): Return raw response. Default: False

    Returns:
        ImageDescription: Generated descriptions and metadata

    Raises:
        ComputerVisionErrorResponseException: API error occurred
    """
```
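
The `language` parameter accepts only the five codes listed in the docstring above. A caller that takes the language from user input may therefore want to fail fast before making a network call. The following is a minimal sketch: `SUPPORTED_DESCRIPTION_LANGUAGES` and `normalize_language` are hypothetical helpers, not part of the SDK, and the code set is copied from the docstring for this SDK version.

```python
# Hypothetical helper (not part of the SDK): validate a language code
# before passing it to describe_image / describe_image_in_stream.
SUPPORTED_DESCRIPTION_LANGUAGES = frozenset({"en", "es", "ja", "pt", "zh"})


def normalize_language(language: str) -> str:
    """Return a stripped, lower-cased language code, or raise ValueError
    if the description endpoint does not accept the code."""
    code = language.strip().lower()
    if code not in SUPPORTED_DESCRIPTION_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_DESCRIPTION_LANGUAGES))
        raise ValueError(f"Unsupported language {language!r}; expected one of: {supported}")
    return code


# Usage (requires a configured client):
# client.describe_image(image_url, language=normalize_language(user_language))
```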

## Usage Examples

### Basic Image Description

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Initialize the client
credentials = CognitiveServicesCredentials("your-api-key")
client = ComputerVisionClient("https://your-endpoint.cognitiveservices.azure.com/", credentials)

# Generate a description for the image
image_url = "https://example.com/park-scene.jpg"
description_result = client.describe_image(image_url)

# Captions are sorted by confidence, so the first one is the best candidate
if description_result.captions:
    best_caption = description_result.captions[0]
    print(f"Description: {best_caption.text}")
    print(f"Confidence: {best_caption.confidence:.3f}")

# Show related tags
print("\nRelated tags:")
for tag in description_result.tags:
    print(f"  - {tag}")
```

### Multiple Description Candidates

```python
# Request multiple description candidates
image_url = "https://example.com/complex-scene.jpg"
description_result = client.describe_image(
    image_url,
    max_candidates=3  # Up to 3 different descriptions; the service may return fewer
)

print("Description candidates:")
for i, caption in enumerate(description_result.captions, 1):
    print(f"{i}. {caption.text} (confidence: {caption.confidence:.3f})")

# Choose the description with the highest confidence (max() fails on an empty list)
if description_result.captions:
    best_caption = max(description_result.captions, key=lambda c: c.confidence)
    print(f"\nBest description: {best_caption.text}")
```

### Multilingual Descriptions

```python
from azure.cognitiveservices.vision.computervision.models import ComputerVisionErrorResponseException

# Generate descriptions in different languages
image_url = "https://example.com/street-scene.jpg"

languages = ["en", "es", "ja"]
descriptions = {}

for lang in languages:
    try:
        result = client.describe_image(image_url, language=lang)
        if result.captions:
            descriptions[lang] = result.captions[0].text
    except ComputerVisionErrorResponseException as e:
        print(f"Failed to get description in {lang}: {e}")

# Display results
for lang, description in descriptions.items():
    print(f"{lang}: {description}")
```

### Description from Local File

```python
# Generate a description from a local image file
with open("vacation_photo.jpg", "rb") as image_stream:
    description_result = client.describe_image_in_stream(
        image_stream,
        max_candidates=2
    )

print("Descriptions:")
for caption in description_result.captions:
    print(f"  {caption.text} (confidence: {caption.confidence:.3f})")

print("\nDetected elements:")
for tag in description_result.tags:
    print(f"  - {tag}")
```
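
When the image is already in memory (for example, downloaded bytes or a web-form upload), it can be wrapped in `io.BytesIO` instead of being written to disk, because the stream operation reads from a file-like object just like the open file in the example above. The sketch below also rejects empty or oversized payloads before upload. `prepare_image_stream` is a hypothetical helper, not part of the SDK, and the 4 MB default is an assumed limit; check the documented upload limit for your API version.

```python
import io

# Assumed size cap; verify against the limit documented for your API version.
MAX_IMAGE_BYTES = 4 * 1024 * 1024


def prepare_image_stream(data: bytes, max_bytes: int = MAX_IMAGE_BYTES) -> io.BytesIO:
    """Wrap in-memory image bytes in a file-like stream suitable for
    describe_image_in_stream, rejecting empty or oversized payloads."""
    if not data:
        raise ValueError("Image data is empty")
    if len(data) > max_bytes:
        raise ValueError(f"Image is {len(data)} bytes; limit is {max_bytes} bytes")
    return io.BytesIO(data)


# Usage (requires a configured client):
# stream = prepare_image_stream(image_bytes)
# description_result = client.describe_image_in_stream(stream, max_candidates=2)
```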

### Excluding Domain Models

```python
from azure.cognitiveservices.vision.computervision.models import DescriptionExclude

# Generate a description without the celebrity and landmark domain models
image_url = "https://example.com/tourist-photo.jpg"
description_result = client.describe_image(
    image_url,
    description_exclude=[DescriptionExclude.celebrities, DescriptionExclude.landmarks]
)

# Captions describe the general scene rather than naming specific people or places
for caption in description_result.captions:
    print(f"General description: {caption.text}")
```

### Batch Description Processing

```python
# Process multiple images for descriptions
image_urls = [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg",
    "https://example.com/image3.jpg"
]

descriptions = []
for i, url in enumerate(image_urls, 1):
    try:
        result = client.describe_image(url)
        if result.captions:
            descriptions.append({
                'url': url,
                'description': result.captions[0].text,
                'confidence': result.captions[0].confidence,
                'tags': result.tags
            })
        print(f"Processed image {i}/{len(image_urls)}")
    except Exception as e:
        print(f"Error processing {url}: {e}")

# Display results
for desc in descriptions:
    print(f"\nImage: {desc['url']}")
    print(f"Description: {desc['description']}")
    print(f"Confidence: {desc['confidence']:.3f}")
    print(f"Tags: {', '.join(desc['tags'][:5])}")  # Show the first 5 tags
```
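
Once the batch has finished, the `descriptions` list built above can be post-processed without further API calls. The sketch below uses `split_by_confidence`, a hypothetical helper that is not part of the SDK. It sorts the results by caption confidence and separates entries below a review threshold. The 0.5 cutoff mirrors the low-confidence band under Description Quality and is an assumption to tune for your data.

```python
from typing import Dict, List, Tuple


def split_by_confidence(
    results: List[Dict], threshold: float = 0.5
) -> Tuple[List[Dict], List[Dict]]:
    """Sort batch results by their 'confidence' key (highest first) and split
    them into (accepted, needs_review) lists using the given threshold."""
    ranked = sorted(results, key=lambda r: r["confidence"], reverse=True)
    accepted = [r for r in ranked if r["confidence"] >= threshold]
    needs_review = [r for r in ranked if r["confidence"] < threshold]
    return accepted, needs_review


# Usage with the batch results above:
# accepted, needs_review = split_by_confidence(descriptions)
```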

## Response Data Types

### ImageDescription

```python { .api }
class ImageDescription:
    """
    Image description generation result.

    Attributes:
        tags (list[str]): Descriptive tags related to the image content
        captions (list[ImageCaption]): Generated description candidates, sorted by confidence
        request_id (str): Request identifier
        metadata (ImageMetadata): Image metadata (dimensions, format)
        model_version (str): AI model version used
    """
```

### ImageCaption

```python { .api }
class ImageCaption:
    """
    Generated image caption with confidence score.

    Attributes:
        text (str): Natural language description of the image
        confidence (float): Confidence score for the description (0.0 to 1.0)
    """
```

### ImageDescriptionDetails

```python { .api }
class ImageDescriptionDetails:
    """
    Description details, returned as the `description` attribute of an
    ImageAnalysis result when descriptions are requested through image analysis.

    Attributes:
        tags (list[str]): Descriptive tags related to the image content
        captions (list[ImageCaption]): Description candidates, sorted by confidence
    """
```

### ImageMetadata

```python { .api }
class ImageMetadata:
    """
    Image metadata information.

    Attributes:
        height (int): Image height in pixels
        width (int): Image width in pixels
        format (str): Image format (e.g., "Jpeg", "Png")
    """
```

## Language Support

The description service supports the following output languages:

- **English (en)**: Default
- **Spanish (es)**
- **Japanese (ja)**
- **Portuguese (pt)**
- **Chinese (zh)**: Simplified Chinese

English typically provides the most detailed and accurate descriptions; other languages may vary in detail and accuracy.
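
Because quality can vary by language, one option is to request the preferred language and fall back to English when no caption comes back or the call fails. The sketch below is `describe_with_fallback`, a hypothetical helper rather than an SDK function. It takes the describe call as a parameter (for example `client.describe_image`), so it can be exercised without a live endpoint. It treats any exception as a reason to fall back, which you may want to narrow to `ComputerVisionErrorResponseException`.

```python
from typing import Any, Callable, Optional, Tuple


def describe_with_fallback(
    describe: Callable[..., Any],
    url: str,
    language: str,
    fallback: str = "en",
) -> Tuple[Optional[str], Optional[str]]:
    """Return (caption_text, language_used). Tries `language` first, then
    `fallback`; returns (None, None) if neither yields a caption."""
    for lang in dict.fromkeys([language, fallback]):  # de-duplicate, keep order
        try:
            result = describe(url, language=lang)
        except Exception:
            continue  # this language failed; try the next one
        if result.captions:
            # Captions are sorted by confidence, so take the first
            return result.captions[0].text, lang
    return None, None


# Usage (requires a configured client):
# text, used = describe_with_fallback(client.describe_image, image_url, "ja")
```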

## Description Quality

### Confidence Scores

Confidence scores range from 0.0 to 1.0. As a rough guide:

- **High confidence (0.8 to 1.0)**: Very reliable descriptions that capture the main scene elements accurately
- **Medium confidence (0.5 to below 0.8)**: Generally accurate, but may miss some details or nuances
- **Low confidence (below 0.5)**: Basic descriptions that may be vague or incomplete
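
These bands can be applied mechanically, for example to decide whether a caption is shown as-is or routed for human review. A minimal sketch follows. `confidence_band` is a hypothetical helper, and the 0.8 and 0.5 thresholds come from the guide above rather than from the service itself.

```python
def confidence_band(confidence: float) -> str:
    """Map a caption confidence score (0.0 to 1.0) to 'high', 'medium', or 'low'
    using the 0.8 and 0.5 cut-offs described above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence}")
    if confidence >= 0.8:
        return "high"
    if confidence >= 0.5:
        return "medium"
    return "low"


# Usage: confidence_band(description_result.captions[0].confidence)
```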

### Typical Description Elements

Descriptions typically include:

- **Main subjects**: People, animals, prominent objects
- **Settings/locations**: Indoor/outdoor, specific environments (kitchen, park, office)
- **Activities**: Actions being performed (sitting, walking, playing)
- **Relationships**: Spatial relationships between objects
- **Colors and appearance**: Dominant colors, notable visual characteristics
- **Atmosphere**: General mood or scene type (busy, peaceful, formal)

### Best Practices

- Request multiple candidates (`max_candidates > 1`) for important applications
- Check confidence scores to assess description reliability
- Combine captions with tags for additional context and detail
- Consider the target audience when selecting among description candidates
- For critical applications, use the highest-confidence description
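
Several of these practices can be combined into one selection step. Take the highest-confidence candidate, reject it if it falls below a minimum score, and append a few tags for context, for example when building alt text. The sketch below uses a small `Caption` dataclass as a stand-in for the SDK's `ImageCaption`. Only the `text` and `confidence` attributes are used, so the function can also be called directly on `description_result.captions`. `build_alt_text` and its 0.5 default threshold are hypothetical choices, not SDK behavior.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Caption:
    """Stand-in for ImageCaption: only .text and .confidence are used."""
    text: str
    confidence: float


def build_alt_text(
    captions: Sequence[Caption],
    tags: Sequence[str],
    min_confidence: float = 0.5,
    max_tags: int = 3,
) -> Optional[str]:
    """Return the highest-confidence caption plus up to `max_tags` tags,
    or None if no caption reaches `min_confidence`."""
    if not captions:
        return None
    best = max(captions, key=lambda c: c.confidence)
    if best.confidence < min_confidence:
        return None
    text = best.text[:1].upper() + best.text[1:]  # capitalize the first letter for display
    if tags:
        text += f" (tags: {', '.join(tags[:max_tags])})"
    return text


# Usage: build_alt_text(description_result.captions, description_result.tags)
```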
