# Image Translation

OCR-based translation of text within images, returned line by line. Two API endpoints take different approaches: `ImageTranslate` performs standard OCR translation for 13 languages, and `ImageTranslateLLM` performs enhanced, LLM-powered translation for 18 languages.

## Capabilities

### Standard Image Translation

Recognizes and translates text in images line by line for 13 languages using OCR. Suitable for documents, signs, and other text-heavy images.

```python { .api }
def ImageTranslate(self, request: models.ImageTranslateRequest) -> models.ImageTranslateResponse:
    """
    Translate text within images using OCR recognition.

    Args:
        request: ImageTranslateRequest with image data and parameters

    Returns:
        ImageTranslateResponse with translated text records and positions

    Raises:
        TencentCloudSDKException: For various error conditions
    """
```

**Usage Example:**

```python
import base64
import os
import uuid

from tencentcloud.common import credential
from tencentcloud.tmt.v20180321.tmt_client import TmtClient
from tencentcloud.tmt.v20180321 import models

# Initialize client (credentials are read from environment variables instead of being hard-coded)
cred = credential.Credential(
    os.environ["TENCENTCLOUD_SECRET_ID"],
    os.environ["TENCENTCLOUD_SECRET_KEY"],
)
client = TmtClient(cred, "ap-beijing")

# Read the image and Base64-encode it
with open("document.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Create the image translation request
req = models.ImageTranslateRequest()
req.SessionUuid = str(uuid.uuid4())  # unique identifier for this session
req.Scene = "doc"  # document scene
req.Data = image_data
req.Source = "en"
req.Target = "zh"
req.ProjectId = 0

# Perform image translation
resp = client.ImageTranslate(req)
print(f"Session: {resp.SessionUuid}")
print(f"Translation: {resp.Source} -> {resp.Target}")

# Process translated text records
for item in resp.ImageRecord.Value:
    print(f"Original: {item.SourceText}")
    print(f"Translated: {item.TargetText}")
    print(f"Position: ({item.X}, {item.Y}) {item.W}x{item.H}")
```
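
`ImageRecord.Value` holds one `ItemValue` per recognized text item, each with `X`, `Y`, `W`, `H` position fields. To reassemble the translated text in reading order, you can sort the items yourself. The sketch below is minimal and rests on two assumptions: `X`/`Y` are top-left pixel coordinates, and items whose `Y` values fall within a small tolerance share a visual row. `ItemLike`, `reading_order`, and the 10-pixel tolerance are hypothetical choices, not part of the SDK:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ItemLike:
    """Stand-in for an ItemValue record (only the fields used here)."""
    SourceText: str
    TargetText: str
    X: int
    Y: int
    W: int
    H: int


def reading_order(items: List[ItemLike], row_tolerance: int = 10) -> List[ItemLike]:
    """Sort items top-to-bottom, then left-to-right within a visual row.

    Assumption: items whose Y is within row_tolerance pixels of the first
    item in the current row belong to that same row.
    """
    rows: List[List[ItemLike]] = []
    for item in sorted(items, key=lambda it: it.Y):
        if rows and abs(item.Y - rows[-1][0].Y) <= row_tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])
    ordered: List[ItemLike] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda it: it.X))
    return ordered


items = [
    ItemLike("World", "世界", X=120, Y=52, W=80, H=20),
    ItemLike("Hello", "你好", X=10, Y=50, W=90, H=20),
    ItemLike("Footer", "页脚", X=10, Y=300, W=60, H=18),
]
print(" ".join(it.TargetText for it in reading_order(items)))
```

The function only reads attributes, so the same call should work on `resp.ImageRecord.Value` from a real response.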

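### Enhanced LLM Image Translation

LLM-powered image translation for 18 languages. It is intended to give better accuracy and contextual understanding than the standard endpoint. Besides the text results, the response includes a rendered JPG image with the translated text.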

```python { .api }
def ImageTranslateLLM(self, request: models.ImageTranslateLLMRequest) -> models.ImageTranslateLLMResponse:
    """
    Translate text within images using enhanced LLM processing.

    Args:
        request: ImageTranslateLLMRequest with image data and parameters

    Returns:
        ImageTranslateLLMResponse with translated results and output image data

    Raises:
        TencentCloudSDKException: For various error conditions
    """
```

**Usage Example:**

```python
import base64

# Reuses `client`, `models`, and `image_data` from the standard example above
req = models.ImageTranslateLLMRequest()
req.Data = image_data  # Base64-encoded image
req.Target = "zh"
# Alternatively, pass an image URL instead of Data:
# req.Url = "https://example.com/image.jpg"

# Perform enhanced translation
resp = client.ImageTranslateLLM(req)
print("Enhanced translation completed")
print(f"Source language: {resp.Source}")
print(f"Full source text: {resp.SourceText}")
print(f"Full translated text: {resp.TargetText}")

# Save the result image (Base64-encoded JPG)
with open("translated_image.jpg", "wb") as f:
    f.write(base64.b64decode(resp.Data))

# Process per-line translation details
for detail in resp.TransDetails:
    print(f"Line: {detail.SourceLineText} -> {detail.TargetLineText}")
    print(f"Position: ({detail.BoundingBox.X}, {detail.BoundingBox.Y})")
```

## Request/Response Models

### ImageTranslateRequest

```python { .api }
class ImageTranslateRequest:
    """
    Request parameters for standard image translation.

    Attributes:
        SessionUuid (str): Unique session identifier
        Scene (str): Scene type (e.g., "doc" for documents)
        Data (str): Base64-encoded image data
        Source (str): Source language code
        Target (str): Target language code
        ProjectId (int): Project ID (default: 0)
    """
```
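
The SDK request models inherit a `from_json_string` method, so one option is to assemble the parameters as a plain dictionary first. The sketch below is minimal. `build_image_translate_params` is a hypothetical helper. Generating `SessionUuid` with `uuid.uuid4()` is one way to meet the "unique session identifier" requirement, not a documented rule:

```python
import base64
import json
import uuid


def build_image_translate_params(image_bytes: bytes, source: str, target: str,
                                 scene: str = "doc", project_id: int = 0) -> dict:
    """Assemble ImageTranslateRequest fields as a plain dict (hypothetical helper)."""
    return {
        "SessionUuid": str(uuid.uuid4()),  # fresh identifier per request
        "Scene": scene,
        "Data": base64.b64encode(image_bytes).decode("ascii"),  # Base64 text, not raw bytes
        "Source": source,
        "Target": target,
        "ProjectId": project_id,
    }


params = build_image_translate_params(b"\x89PNG fake image bytes", "en", "zh")
# Print everything except the (potentially large) Base64 payload
print(json.dumps({k: v for k, v in params.items() if k != "Data"}))
```

With the SDK installed, `req = models.ImageTranslateRequest()` followed by `req.from_json_string(json.dumps(params))` should populate the same fields that the usage example assigns one by one.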

### ImageTranslateResponse

```python { .api }
class ImageTranslateResponse:
    """
    Response from standard image translation.

    Attributes:
        SessionUuid (str): Session identifier from the request
        Source (str): Source language
        Target (str): Target language
        ImageRecord (ImageRecord): Image translation result
        RequestId (str): Unique request identifier
    """
```

### ImageTranslateLLMRequest

```python { .api }
class ImageTranslateLLMRequest:
    """
    Request parameters for enhanced LLM image translation.

    Attributes:
        Data (str): Base64-encoded image data (PNG, JPG, JPEG)
        Target (str): Target language code
        Url (str): Image URL (alternative to Data)
    """
```
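
`Url` is documented as an alternative to `Data`. The minimal sketch below enforces exactly one image source when building the parameters. It assumes the two fields are mutually exclusive, which the reference only implies with the word "alternative". `build_llm_params` is a hypothetical helper:

```python
import base64
from typing import Optional


def build_llm_params(target: str, image_bytes: Optional[bytes] = None,
                     url: Optional[str] = None) -> dict:
    """Assemble ImageTranslateLLMRequest fields, requiring exactly one image source."""
    # Assumption: Data and Url should not be sent together.
    if (image_bytes is None) == (url is None):
        raise ValueError("provide exactly one of image_bytes or url")
    params = {"Target": target}
    if image_bytes is not None:
        params["Data"] = base64.b64encode(image_bytes).decode("ascii")
    else:
        params["Url"] = url
    return params


print(build_llm_params("zh", url="https://example.com/image.jpg"))
# {'Target': 'zh', 'Url': 'https://example.com/image.jpg'}
```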

### ImageTranslateLLMResponse

```python { .api }
class ImageTranslateLLMResponse:
    """
    Response from enhanced LLM image translation.

    Attributes:
        Data (str): Base64-encoded result image (JPG format)
        Source (str): Detected source language
        Target (str): Target language
        SourceText (str): All original text from the image
        TargetText (str): All translated text
        Angle (float): Image rotation angle (0-359 degrees)
        TransDetails (list[TransDetail]): Per-line translation details
        RequestId (str): Unique request identifier
    """
```

### ImageRecord

```python { .api }
class ImageRecord:
    """
    Image translation record container.

    Attributes:
        Value (list[ItemValue]): List of translated text items with positions
    """
```

### ItemValue

```python { .api }
class ItemValue:
    """
    Individual translated text item with position information.

    Attributes:
        SourceText (str): Original text
        TargetText (str): Translated text
        X (int): X coordinate
        Y (int): Y coordinate
        W (int): Width
        H (int): Height
    """
```

### TransDetail

```python { .api }
class TransDetail:
    """
    LLM translation detail for each text line.

    Attributes:
        SourceLineText (str): Original line text
        TargetLineText (str): Translated line text
        BoundingBox (BoundingBox): Text position and dimensions
        LinesCount (int): Number of lines
        LineHeight (int): Line height in pixels
        SpamCode (int): Content safety check result (0 = normal)
    """
```
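
`TransDetail.SpamCode` reports the result of the content safety check, where `0` means normal. To display only lines that passed the check, a simple filter is enough. The sketch below is minimal. Treating every non-zero code as "hide" is an assumption, since the other code values are not documented here. `DetailLike` and `safe_lines` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class DetailLike:
    """Stand-in for a TransDetail record (only the fields used here)."""
    SourceLineText: str
    TargetLineText: str
    SpamCode: int


def safe_lines(details: Iterable[DetailLike]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for lines whose SpamCode is 0 (normal)."""
    return [(d.SourceLineText, d.TargetLineText) for d in details if d.SpamCode == 0]


details = [
    DetailLike("Open", "营业中", 0),
    DetailLike("flagged line", "(hidden)", 1),
]
print(safe_lines(details))
```

Because the function only reads attributes, `safe_lines(resp.TransDetails)` should also work on a real `ImageTranslateLLM` response.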

### BoundingBox

```python { .api }
class BoundingBox:
    """
    Bounding box coordinates for text positioning.

    Attributes:
        X (int): Left edge X coordinate
        Y (int): Top edge Y coordinate
        Width (int): Box width in pixels
        Height (int): Box height in pixels
    """
```
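
The two endpoints describe positions differently. `ItemValue` uses `X`, `Y`, `W`, `H`, while `TransDetail.BoundingBox` uses `X`, `Y`, `Width`, `Height`. The minimal sketch below converts either shape to corner coordinates `(left, top, right, bottom)`, for example to draw overlays. It assumes both shapes use top-left pixel coordinates. The `BoundingBox` attributes are documented that way, but for `ItemValue` this is an assumption. `to_corners` is a hypothetical helper:

```python
from types import SimpleNamespace
from typing import Tuple


def to_corners(box) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for an ItemValue- or BoundingBox-like object.

    Accepts objects exposing W/H (ItemValue) or Width/Height (BoundingBox).
    """
    width = box.W if hasattr(box, "W") else box.Width
    height = box.H if hasattr(box, "H") else box.Height
    return (box.X, box.Y, box.X + width, box.Y + height)


item = SimpleNamespace(X=10, Y=20, W=100, H=30)           # ItemValue-shaped
bbox = SimpleNamespace(X=10, Y=20, Width=100, Height=30)  # BoundingBox-shaped
print(to_corners(item), to_corners(bbox))
# (10, 20, 110, 50) (10, 20, 110, 50)
```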

## Supported Image Formats

### Input Formats (Both APIs)

- **PNG**: Portable Network Graphics
- **JPG/JPEG**: Joint Photographic Experts Group

### Output Formats

- **Standard API**: Text records with position data
- **LLM API**: JPG image with the translated text, plus text records
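
Both endpoints accept only PNG and JPG/JPEG input, so it can help to reject other files before Base64-encoding and sending them. The minimal sketch below inspects the file signature. PNG files begin with the 8-byte signature `89 50 4E 47 0D 0A 1A 0A`, and JPEG files begin with `FF D8 FF`. `detect_image_format` is a hypothetical helper, and a file that passes this check is not guaranteed to be accepted by the service:

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # standard 8-byte PNG file signature
JPEG_SIGNATURE = b"\xff\xd8\xff"      # JPEG SOI marker followed by the next marker's 0xFF


def detect_image_format(data: bytes) -> Optional[str]:
    """Return "png", "jpeg", or None based on the leading file signature."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


print(detect_image_format(PNG_SIGNATURE + b"\x00" * 8))  # png
print(detect_image_format(b"GIF89a"))                    # None
```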

## Language Support

### Standard Image Translation (13 languages)

Core language support for document translation:

- Chinese (zh, zh-TW, zh-HK, zh-TR)
- English (en), Japanese (ja), Korean (ko)
- European: French (fr), German (de), Spanish (es), Italian (it)
- Others: Russian (ru), Arabic (ar)

### Enhanced LLM Translation (18 languages)

Extended language support with improved accuracy:

- All standard languages, plus additional coverage
- Better context understanding for complex layouts
- Improved handling of mixed-language content
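
For the standard endpoint, a quick client-side check can catch an unlisted language code before the request fails with an unsupported-language error. The minimal sketch below uses the 13 codes listed for standard image translation. It only checks that each code is on that list, not that every source/target pairing is supported. `STANDARD_IMAGE_LANGS` and `check_standard_pair` are hypothetical names:

```python
# Codes copied from the "Standard Image Translation (13 languages)" list.
STANDARD_IMAGE_LANGS = frozenset({
    "zh", "zh-TW", "zh-HK", "zh-TR",
    "en", "ja", "ko",
    "fr", "de", "es", "it",
    "ru", "ar",
})


def check_standard_pair(source: str, target: str) -> None:
    """Raise ValueError if either code is missing from the documented standard list."""
    unknown = [code for code in (source, target) if code not in STANDARD_IMAGE_LANGS]
    if unknown:
        raise ValueError(f"not in the standard image-translation list: {unknown}")


check_standard_pair("en", "zh")  # passes silently
```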

## Scene Types

### Document Scene ("doc")

Optimized for:

- Text documents, including PDF pages supplied as PNG or JPG images
- Business documents
- Academic papers
- Technical documentation
- Forms and contracts

### General Scene

Suitable for:

- Street signs and signage
- Product labels
- Handwritten notes
- Mixed-content images

## Best Practices

### Image Quality

- Use high-resolution images (a minimum of 300 DPI is recommended)
- Ensure good contrast between text and background
- Avoid blurry or distorted images
- Minimize image compression artifacts
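
The 300 DPI guideline translates into a minimum pixel size for a given physical page: pixels = inches × DPI. The minimal sketch below does that arithmetic. `min_pixels` is a hypothetical helper, and US Letter (8.5 × 11 in) is only an example page size:

```python
def min_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple:
    """Pixel dimensions needed to capture a page of the given size (inches) at `dpi`."""
    return (round(width_in * dpi), round(height_in * dpi))


print(min_pixels(8.5, 11))  # (2550, 3300) for US Letter at 300 DPI
```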

### Text Layout

- Works best with horizontal text layouts
- Processes text line by line
- Handles multiple text blocks per image
- Preserves relative positioning information

### API Selection

- **Use ImageTranslate** for simple document translation and cost-sensitive applications
- **Use ImageTranslateLLM** for complex layouts, mixed languages, and higher accuracy requirements

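## Error Handling

Common error scenarios for image translation. Each entry gives the SDK constant name, followed by the error code string that the raised exception's `code` attribute carries:

- **FAILEDOPERATION_DOWNLOADERR** (`FailedOperation.DownloadErr`): the image could not be downloaded or processed
- **FAILEDOPERATION_LANGUAGERECOGNITIONERR** (`FailedOperation.LanguageRecognitionErr`): language detection failed
- **UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE** (`UnsupportedOperation.UnsupportedLanguage`): the language pair is not supported
- **INVALIDPARAMETER** (`InvalidParameter`): invalid image data or parameters

Example error handling:

```python
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException

try:
    resp = client.ImageTranslate(req)
    # ImageRecord is a container; the translated items live in its Value list
    for item in resp.ImageRecord.Value:
        print(f"Translated: {item.TargetText}")
except TencentCloudSDKException as e:
    # e.code holds the API error code string, e.g. "FailedOperation.LanguageRecognitionErr"
    if e.code == "FailedOperation.LanguageRecognitionErr":
        print("Could not identify the language of the text in the image")
    elif e.code == "UnsupportedOperation.UnsupportedLanguage":
        print("Language pair not supported for image translation")
    else:
        print(f"Image translation error: {e.code} - {e.message}")
```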
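
The constant-style names in the error list correspond to dotted API error codes such as `FailedOperation.DownloadErr`. Each constant name appears to be the dotted code upper-cased, with `.` replaced by `_`. The minimal sketch below performs that conversion, which can help when matching a code taken from an exception against the names in the list. `constant_name` is a hypothetical helper, and the naming rule is inferred from these examples rather than documented:

```python
def constant_name(api_code: str) -> str:
    """Convert a dotted API error code to its constant-style name (inferred rule)."""
    return api_code.upper().replace(".", "_")


for code in (
    "FailedOperation.DownloadErr",
    "FailedOperation.LanguageRecognitionErr",
    "UnsupportedOperation.UnsupportedLanguage",
    "InvalidParameter",
):
    print(f"{code} -> {constant_name(code)}")
```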