# Image Translation

OCR-based translation of text embedded in images, returned line by line. Two API endpoints are available: `ImageTranslate`, the standard OCR pipeline covering 13 languages, and `ImageTranslateLLM`, an LLM-enhanced pipeline covering 18 languages.

## Capabilities

### Standard Image Translation

Recognizes and translates text in images line by line for 13 languages using OCR technology. Suitable for documents, signs, and text-heavy images.

```python { .api }
def ImageTranslate(self, request: models.ImageTranslateRequest) -> models.ImageTranslateResponse:
    """
    Translate text within images using OCR recognition.

    Args:
        request: ImageTranslateRequest with image data and parameters

    Returns:
        ImageTranslateResponse with translated text records and positions

    Raises:
        TencentCloudSDKException: For various error conditions
    """
```

**Usage Example:**

```python
import base64
from tencentcloud.common import credential
from tencentcloud.tmt.v20180321.tmt_client import TmtClient
from tencentcloud.tmt.v20180321 import models

# Initialize client
cred = credential.Credential("SecretId", "SecretKey")
client = TmtClient(cred, "ap-beijing")

# Read and encode image
with open("document.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Create image translation request
req = models.ImageTranslateRequest()
req.SessionUuid = "unique-session-id"
req.Scene = "doc"  # Document scene
req.Data = image_data
req.Source = "en"
req.Target = "zh"
req.ProjectId = 0

# Perform image translation
resp = client.ImageTranslate(req)
print(f"Session: {resp.SessionUuid}")
print(f"Translation: {resp.Source} -> {resp.Target}")

# Process translated text records
for item in resp.ImageRecord.Value:
    print(f"Original: {item.SourceText}")
    print(f"Translated: {item.TargetText}")
    print(f"Position: ({item.X}, {item.Y}) {item.W}x{item.H}")
```

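The example above hard-codes `SessionUuid`. As a minimal sketch, the request fields could instead be assembled by a small helper that generates a fresh identifier per call. The helper name `build_image_translate_params` is hypothetical and not part of the SDK; it only produces a plain dict whose keys mirror the `ImageTranslateRequest` attributes.

```python
import base64
import json
import uuid


def build_image_translate_params(image_bytes, source, target, scene="doc", project_id=0):
    """Return a dict whose keys mirror the ImageTranslateRequest attributes.

    Hypothetical helper for illustration; it does not call the SDK.
    """
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    return {
        "SessionUuid": str(uuid.uuid4()),  # fresh identifier for every request
        "Scene": scene,
        "Data": base64.b64encode(image_bytes).decode("ascii"),
        "Source": source,
        "Target": target,
        "ProjectId": project_id,
    }


# Placeholder bytes stand in for a real image file
params = build_image_translate_params(b"\x89PNG fake bytes", "en", "zh")
print(json.dumps({k: v for k, v in params.items() if k != "Data"}, indent=2))
```

With the SDK installed, a dict like this should be loadable into a request object through `req.from_json_string(json.dumps(params))`, the pattern used in Tencent Cloud's own samples.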
### Enhanced LLM Image Translation

Advanced image translation for 18 languages using LLM technology, providing improved accuracy and context understanding.

```python { .api }
def ImageTranslateLLM(self, request: models.ImageTranslateLLMRequest) -> models.ImageTranslateLLMResponse:
    """
    Translate text within images using enhanced LLM processing.

    Args:
        request: ImageTranslateLLMRequest with image data and parameters

    Returns:
        ImageTranslateLLMResponse with translated results and output image URL

    Raises:
        TencentCloudSDKException: For various error conditions
    """
```

**Usage Example:**

```python
import base64

# Reuses `client`, `models`, and `image_data` from the standard example

# Create enhanced image translation request
req = models.ImageTranslateLLMRequest()
req.Data = image_data  # Base64 encoded image
req.Target = "zh"
# Alternatively, use URL instead of Data:
# req.Url = "https://example.com/image.jpg"

# Perform enhanced translation
resp = client.ImageTranslateLLM(req)
print("Enhanced translation completed")
print(f"Source language: {resp.Source}")
print(f"Full source text: {resp.SourceText}")
print(f"Full translated text: {resp.TargetText}")

# Save result image (resp.Data is a Base64 encoded JPG)
with open("translated_image.jpg", "wb") as f:
    f.write(base64.b64decode(resp.Data))

# Process translation details
for detail in resp.TransDetails:
    print(f"Line: {detail.SourceLineText} -> {detail.TargetLineText}")
    print(f"Position: ({detail.BoundingBox.X}, {detail.BoundingBox.Y})")
```

## Request/Response Models

### ImageTranslateRequest

```python { .api }
class ImageTranslateRequest:
    """
    Request parameters for standard image translation.

    Attributes:
        SessionUuid (str): Unique session identifier
        Scene (str): Scene type (e.g., "doc" for documents)
        Data (str): Base64 encoded image data
        Source (str): Source language code
        Target (str): Target language code
        ProjectId (int): Project ID (default: 0)
    """
```

### ImageTranslateResponse

```python { .api }
class ImageTranslateResponse:
    """
    Response from standard image translation.

    Attributes:
        SessionUuid (str): Session identifier from request
        Source (str): Source language
        Target (str): Target language
        ImageRecord (ImageRecord): Image translation result
        RequestId (str): Unique request identifier
    """
```

### ImageTranslateLLMRequest

```python { .api }
class ImageTranslateLLMRequest:
    """
    Request parameters for enhanced LLM image translation.

    Attributes:
        Data (str): Base64 encoded image data (PNG, JPG, JPEG)
        Target (str): Target language code
        Url (str): Image URL (alternative to Data)
    """
```

### ImageTranslateLLMResponse

```python { .api }
class ImageTranslateLLMResponse:
    """
    Response from enhanced LLM image translation.

    Attributes:
        Data (str): Base64 encoded result image (JPG format)
        Source (str): Detected source language
        Target (str): Target language
        SourceText (str): All original text from image
        TargetText (str): All translated text
        Angle (float): Image rotation angle (0-359 degrees)
        TransDetails (list[TransDetail]): Translation detail information
        RequestId (str): Unique request identifier
    """
```

### ImageRecord

```python { .api }
class ImageRecord:
    """
    Image translation record container.

    Attributes:
        Value (list[ItemValue]): List of translated text items with positions
    """
```

### ItemValue

```python { .api }
class ItemValue:
    """
    Individual translated text item with position information.

    Attributes:
        SourceText (str): Original text
        TargetText (str): Translated text
        X (int): X coordinate
        Y (int): Y coordinate
        W (int): Width
        H (int): Height
    """
```

### TransDetail

```python { .api }
class TransDetail:
    """
    LLM translation detail for each text line.

    Attributes:
        SourceLineText (str): Original line text
        TargetLineText (str): Translated line text
        BoundingBox (BoundingBox): Text position and dimensions
        LinesCount (int): Number of lines
        LineHeight (int): Line height in pixels
        SpamCode (int): Content safety check result (0=normal)
    """
```

### BoundingBox

```python { .api }
class BoundingBox:
    """
    Bounding box coordinates for text positioning.

    Attributes:
        X (int): Left edge X coordinate
        Y (int): Top edge Y coordinate
        Width (int): Box width in pixels
        Height (int): Box height in pixels
    """
```

## Supported Image Formats

### Input Formats (Both APIs)

- **PNG**: Portable Network Graphics
- **JPG/JPEG**: Joint Photographic Experts Group

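Since only PNG and JPEG input is listed, a client could check a file's leading bytes before encoding and uploading it. This is a minimal sketch based on the standard file signatures for the two formats; `detect_image_format` is a hypothetical helper, and it inspects the header only, without validating the rest of the file.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG file signature
JPEG_SOI = b"\xff\xd8\xff"            # JPEG start-of-image marker plus next marker byte


def detect_image_format(data):
    """Return "png", "jpeg", or None based on the leading bytes."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SOI):
        return "jpeg"
    return None


print(detect_image_format(PNG_SIGNATURE + b"\x00" * 8))
print(detect_image_format(b"GIF89a"))
```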
### Output Formats

- **Standard API**: Text records with position data
- **LLM API**: JPG image with translated text + text records

## Language Support

### Standard Image Translation (13 languages)

Core language support for document translation:

- Chinese (zh, zh-TW, zh-HK, zh-TR)
- English (en), Japanese (ja), Korean (ko)
- European: French (fr), German (de), Spanish (es), Italian (it)
- Others: Russian (ru), Arabic (ar)

### Enhanced LLM Translation (18 languages)

Extended language support with improved accuracy:

- All standard languages plus additional coverage
- Better context understanding for complex layouts
- Improved handling of mixed-language content

## Scene Types

### Document Scene ("doc")

Optimized for:

- Text documents and PDFs
- Business documents
- Academic papers
- Technical documentation
- Forms and contracts

### General Scene

Suitable for:

- Street signs and signage
- Product labels
- Handwritten notes
- Mixed content images

## Best Practices

### Image Quality

- Use high-resolution images (minimum 300 DPI recommended)
- Ensure good contrast between text and background
- Avoid blurry or distorted images
- Minimize image compression artifacts

### Text Layout

- Works best with horizontal text layouts
- Supports line-by-line processing
- Handles multiple text blocks per image
- Preserves relative positioning information

### API Selection

- **Use ImageTranslate** for: Simple document translation, cost-sensitive applications
- **Use ImageTranslateLLM** for: Complex layouts, mixed languages, higher accuracy requirements

## Error Handling

Common error scenarios for image translation:

- **FAILEDOPERATION_DOWNLOADERR**: Image data processing error
- **FAILEDOPERATION_LANGUAGERECOGNITIONERR**: Language detection failure
- **UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE**: Language pair not supported
- **INVALIDPARAMETER**: Invalid image data or parameters

Example error handling:

```python
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.tmt.v20180321 import errorcodes

try:
    resp = client.ImageTranslate(req)
    # ImageRecord.Value holds the list of translated items
    for item in resp.ImageRecord.Value:
        print(f"Translated: {item.TargetText}")
except TencentCloudSDKException as e:
    # Compare against the errorcodes constants, which hold the code strings returned by the service
    if e.code == errorcodes.FAILEDOPERATION_LANGUAGERECOGNITIONERR:
        print("Could not detect text in image")
    elif e.code == errorcodes.UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE:
        print("Language pair not supported for image translation")
    else:
        print(f"Image translation error: {e.code} - {e.message}")
```
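
When more codes need handling, a lookup table can replace the `if`/`elif` chain. The sketch below is runnable without the SDK: `FakeSDKError` is a hypothetical stand-in for `TencentCloudSDKException` that exposes only `code` and `message`, and the dictionary keys reuse the names listed above as placeholders.

```python
class FakeSDKError(Exception):  # hypothetical stand-in for TencentCloudSDKException
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message


# Keys reuse the names listed above. With the SDK installed, the matching
# constants in the errorcodes module are the safer choice, since the strings
# the service returns may be spelled differently.
FRIENDLY_MESSAGES = {
    "FAILEDOPERATION_DOWNLOADERR": "Image data could not be processed",
    "FAILEDOPERATION_LANGUAGERECOGNITIONERR": "Could not detect the language of the text",
    "UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE": "Language pair not supported",
    "INVALIDPARAMETER": "Invalid image data or parameters",
}


def describe_error(err):
    """Map a known code to a friendly message, else fall back to code and message."""
    fallback = f"Image translation error: {err.code} - {err.message}"
    return FRIENDLY_MESSAGES.get(err.code, fallback)


try:
    raise FakeSDKError("INVALIDPARAMETER", "Data is not valid Base64")
except FakeSDKError as e:
    print(describe_error(e))
```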