
# Image Description

Generate human-readable descriptions of image content in complete sentences. The service analyzes an image and returns natural language descriptions that capture its main elements, activities, and context.

## Capabilities

### Image Description Generation

Return one or more candidate descriptions (captions) for an image, each with a confidence score, along with related tags.

```python { .api }
def describe_image(url, max_candidates=1, language="en", description_exclude=None, model_version="latest", custom_headers=None, raw=False, **operation_config):
    """
    Generate human-readable description of image content.

    Args:
        url (str): Publicly reachable URL of an image
        max_candidates (int, optional): Maximum number of description candidates to return. Default: 1
        language (str, optional): Output language for descriptions.
            Supported: "en", "es", "ja", "pt", "zh". Default: "en"
        description_exclude (list[DescriptionExclude], optional): Domain models to exclude.
            Available values: Celebrities, Landmarks
        model_version (str, optional): AI model version. Default: "latest"
        custom_headers (dict, optional): Custom HTTP headers
        raw (bool, optional): Return raw response. Default: False

    Returns:
        ImageDescription: Generated descriptions with confidence scores and tags

    Raises:
        ComputerVisionErrorResponseException: API error occurred
    """

def describe_image_in_stream(image, max_candidates=1, language="en", description_exclude=None, model_version="latest", custom_headers=None, raw=False, **operation_config):
    """
    Generate description from binary image stream.

    Args:
        image (Generator): Binary image data stream
        max_candidates (int, optional): Maximum description candidates
        language (str, optional): Output language
        description_exclude (list[DescriptionExclude], optional): Domain models to exclude

    Returns:
        ImageDescription: Generated descriptions and metadata
    """
```

## Usage Examples

### Basic Image Description

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Initialize client
credentials = CognitiveServicesCredentials("your-api-key")
client = ComputerVisionClient("https://your-endpoint.cognitiveservices.azure.com/", credentials)

# Generate description for image
image_url = "https://example.com/park-scene.jpg"
description_result = client.describe_image(image_url)

# Get the best description
if description_result.captions:
    best_caption = description_result.captions[0]
    print(f"Description: {best_caption.text}")
    print(f"Confidence: {best_caption.confidence:.3f}")

# Show related tags
print("\nRelated tags:")
for tag in description_result.tags:
    print(f" - {tag}")
```

### Multiple Description Candidates

```python
# Get multiple description candidates
# (reuses the `client` created in the basic example)
image_url = "https://example.com/complex-scene.jpg"
description_result = client.describe_image(
    image_url,
    max_candidates=3  # Get up to 3 different descriptions
)

print("Description candidates:")
for i, caption in enumerate(description_result.captions, 1):
    print(f"{i}. {caption.text} (confidence: {caption.confidence:.3f})")

# Choose description with highest confidence
# (guarded, because max() raises ValueError on an empty list)
if description_result.captions:
    best_caption = max(description_result.captions, key=lambda c: c.confidence)
    print(f"\nBest description: {best_caption.text}")
```

### Multilingual Descriptions

```python
# Generate descriptions in different languages
image_url = "https://example.com/street-scene.jpg"

languages = ["en", "es", "ja"]
descriptions = {}

for lang in languages:
    try:
        result = client.describe_image(image_url, language=lang)
        if result.captions:
            descriptions[lang] = result.captions[0].text
    except Exception as e:
        print(f"Failed to get description in {lang}: {e}")

# Display results
for lang, description in descriptions.items():
    print(f"{lang}: {description}")
```

### Description from Local File

```python
# Generate description from local image
with open("vacation_photo.jpg", "rb") as image_stream:
    description_result = client.describe_image_in_stream(
        image_stream,
        max_candidates=2
    )

print("Descriptions:")
for caption in description_result.captions:
    print(f" {caption.text} (confidence: {caption.confidence:.3f})")

print("\nDetected elements:")
for tag in description_result.tags:
    print(f" - {tag}")
```

### Excluding Domain Models

```python
from azure.cognitiveservices.vision.computervision.models import DescriptionExclude

# Generate description excluding celebrity and landmark information
image_url = "https://example.com/tourist-photo.jpg"
description_result = client.describe_image(
    image_url,
    description_exclude=[DescriptionExclude.celebrities, DescriptionExclude.landmarks]
)

# This will focus on general scene description rather than identifying specific people or places
for caption in description_result.captions:
    print(f"General description: {caption.text}")
```

### Batch Description Processing

```python
# Process multiple images for descriptions
image_urls = [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg",
    "https://example.com/image3.jpg"
]

descriptions = []
for i, url in enumerate(image_urls):
    try:
        result = client.describe_image(url)
        if result.captions:
            descriptions.append({
                'url': url,
                'description': result.captions[0].text,
                'confidence': result.captions[0].confidence,
                'tags': result.tags
            })
        print(f"Processed image {i+1}/{len(image_urls)}")
    except Exception as e:
        print(f"Error processing {url}: {e}")

# Display results
for desc in descriptions:
    print(f"\nImage: {desc['url']}")
    print(f"Description: {desc['description']}")
    print(f"Confidence: {desc['confidence']:.3f}")
    print(f"Tags: {', '.join(desc['tags'][:5])}")  # Show first 5 tags
```

## Response Data Types

### ImageDescription

```python { .api }
class ImageDescription:
    """
    Image description generation result.

    Attributes:
        tags (list[str]): Descriptive tags related to image content
        captions (list[ImageCaption]): Generated description candidates, sorted by confidence level
        request_id (str): Request identifier
        metadata (ImageMetadata): Image metadata (dimensions, format)
        model_version (str): AI model version used
    """
```

### ImageCaption

```python { .api }
class ImageCaption:
    """
    Generated image caption with confidence score.

    Attributes:
        text (str): Natural language description of the image
        confidence (float): Confidence score for the description (0.0 to 1.0)
    """
```

### ImageDescriptionDetails

```python { .api }
class ImageDescriptionDetails:
    """
    Collection of content tags and captions describing an image.

    Attributes:
        tags (list[str]): Descriptive tags related to image content
        captions (list[ImageCaption]): Description candidates, sorted by confidence level
    """
```

### ImageMetadata

```python { .api }
class ImageMetadata:
    """
    Image metadata information.

    Attributes:
        height (int): Image height in pixels
        width (int): Image width in pixels
        format (str): Image format (e.g., "Jpeg", "Png")
    """
```
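
To make the shapes of these result types concrete, the sketch below flattens a result into a plain dictionary, for example for logging or JSON serialization. `summarize_description` is a hypothetical helper, not part of the SDK, and the `SimpleNamespace` objects are stand-ins with made-up values that only mimic the documented attributes; a real `ImageDescription` returned by `describe_image` should behave the same way as long as it exposes those attributes.

```python
import json
from types import SimpleNamespace


def summarize_description(result):
    """Flatten an ImageDescription-like object into a JSON-serializable dict.

    Hypothetical helper: it only reads the tags, captions, request_id,
    metadata, and model_version attributes.
    """
    captions = result.captions or []
    metadata = result.metadata
    return {
        "request_id": result.request_id,
        "model_version": result.model_version,
        "tags": list(result.tags or []),
        "captions": [
            {"text": c.text, "confidence": c.confidence} for c in captions
        ],
        "image": None if metadata is None else {
            "width": metadata.width,
            "height": metadata.height,
            "format": metadata.format,
        },
    }


# Stand-in objects with made-up values, used instead of a live API call.
sample = SimpleNamespace(
    request_id="example-request-id",
    model_version="latest",
    tags=["outdoor", "grass", "dog"],
    captions=[SimpleNamespace(text="a dog running on grass", confidence=0.92)],
    metadata=SimpleNamespace(width=1024, height=768, format="Jpeg"),
)

print(json.dumps(summarize_description(sample), indent=2))
```

Treating missing captions, tags, or metadata as empty keeps the helper usable on partial results.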

## Language Support

The description service supports multiple languages for output:

- **English (en)**: Full feature support, highest accuracy
- **Spanish (es)**: Complete descriptions with good accuracy
- **Japanese (ja)**: Natural language descriptions
- **Portuguese (pt)**: Comprehensive description capability
- **Chinese (zh)**: Simplified Chinese descriptions

English typically provides the most detailed and accurate descriptions, while other languages may have varying levels of detail and accuracy.
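
Because only a fixed set of language codes is accepted, it can help to validate a code before making a request. The sketch below is one way to do that; `SUPPORTED_LANGUAGES` mirrors the codes listed in this section, and `resolve_language` is a hypothetical helper (not an SDK function) that falls back to English for anything else.

```python
SUPPORTED_LANGUAGES = ("en", "es", "ja", "pt", "zh")


def resolve_language(requested, default="en"):
    """Return a supported language code, falling back to `default`.

    Hypothetical helper: normalizes case and surrounding whitespace only.
    """
    if not requested:
        return default
    code = requested.strip().lower()
    return code if code in SUPPORTED_LANGUAGES else default


for requested in ["es", "JA", "fr", None]:
    print(f"{requested!r} -> {resolve_language(requested)}")
```

The returned code can then be passed as the `language` argument of `describe_image` or `describe_image_in_stream`.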

## Description Quality

### Confidence Scores

- **High confidence (0.8-1.0)**: Very reliable descriptions, captures main scene elements accurately
- **Medium confidence (0.5-0.8)**: Generally accurate, may miss some details or nuances
- **Low confidence (0.0-0.5)**: Basic description, may be vague or incomplete
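
These bands can be applied in code with a small classifier. The following is a minimal sketch: `confidence_band` is a hypothetical helper, and because the ranges share their endpoints, it assumes that a boundary value belongs to the higher band.

```python
def confidence_band(confidence):
    """Map a caption confidence score to "high", "medium", or "low".

    Assumption: boundary values go to the higher band (0.8 -> "high",
    0.5 -> "medium"), since the documented ranges overlap at those points.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within [0.0, 1.0], got {confidence}")
    if confidence >= 0.8:
        return "high"
    if confidence >= 0.5:
        return "medium"
    return "low"


for score in (0.93, 0.8, 0.64, 0.21):
    print(f"{score:.2f} -> {confidence_band(score)}")
```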

### Typical Description Elements

Generated descriptions typically mention:

- **Main subjects**: People, animals, prominent objects
- **Settings/locations**: Indoor/outdoor, specific environments (kitchen, park, office)
- **Activities**: Actions being performed (sitting, walking, playing)
- **Relationships**: Spatial relationships between objects
- **Colors and appearance**: Dominant colors, notable visual characteristics
- **Atmosphere**: General mood or scene type (busy, peaceful, formal)

### Best Practices

- Use multiple candidates (`max_candidates > 1`) for important applications
- Check confidence scores to assess description reliability
- Combine with tags for additional context and details
- Consider the target audience when selecting description candidates
- For critical applications, use the highest confidence description
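
Several of these practices can be combined in one selection step. The sketch below picks the highest-confidence caption and falls back to a tag-based summary when no caption is confident enough. `pick_description` is a hypothetical helper, the `min_confidence` and `max_tags` defaults are illustrative rather than values recommended by the service, and the `SimpleNamespace` objects are stand-ins with made-up values in place of live API responses.

```python
from types import SimpleNamespace


def pick_description(result, min_confidence=0.5, max_tags=5):
    """Return the best caption text, or a tag-based fallback, or None.

    Hypothetical helper; `min_confidence` and `max_tags` are illustrative
    defaults, not thresholds defined by the service.
    """
    captions = result.captions or []
    if captions:
        best = max(captions, key=lambda c: c.confidence)
        if best.confidence >= min_confidence:
            return best.text
    tags = list(result.tags or [])[:max_tags]
    if tags:
        return "Image containing: " + ", ".join(tags)
    return None


# Stand-in results with made-up values instead of live API responses.
confident = SimpleNamespace(
    captions=[
        SimpleNamespace(text="a group of people at a table", confidence=0.71),
        SimpleNamespace(text="people sitting in a restaurant", confidence=0.88),
    ],
    tags=["person", "indoor", "table"],
)
uncertain = SimpleNamespace(
    captions=[SimpleNamespace(text="a blurry photo", confidence=0.22)],
    tags=["blur", "light"],
)

print(pick_description(confident))
print(pick_description(uncertain))
```

Returning `None` when neither captions nor tags are available leaves it to the caller to decide what to show, for instance a generic placeholder.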