
tessl/pypi-tencentcloud-sdk-python-tmt

Tencent Cloud Machine Translation (TMT) SDK for Python providing comprehensive text, file, image, and speech translation capabilities


Image Translation

OCR-based translation of text within images, with line-by-line results. Two API endpoints take different approaches: standard OCR translation (ImageTranslate, 13 languages) and enhanced LLM-powered translation (ImageTranslateLLM, 18 languages).

Capabilities

Standard Image Translation

Recognizes and translates text in images line by line for 13 languages using OCR technology. Suitable for documents, signs, and text-heavy images.

def ImageTranslate(self, request: models.ImageTranslateRequest) -> models.ImageTranslateResponse:
    """
    Translate text within images using OCR recognition.
    
    Args:
        request: ImageTranslateRequest with image data and parameters
        
    Returns:
        ImageTranslateResponse with translated text records and positions
        
    Raises:
        TencentCloudSDKException: For various error conditions
    """

Usage Example:

import base64
from tencentcloud.common import credential
from tencentcloud.tmt.v20180321.tmt_client import TmtClient
from tencentcloud.tmt.v20180321 import models

# Initialize client
cred = credential.Credential("SecretId", "SecretKey")
client = TmtClient(cred, "ap-beijing")

# Read and encode image
with open("document.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Create image translation request
req = models.ImageTranslateRequest()
req.SessionUuid = "unique-session-id"
req.Scene = "doc"  # Document scene
req.Data = image_data
req.Source = "en"
req.Target = "zh"
req.ProjectId = 0

# Perform image translation
resp = client.ImageTranslate(req)
print(f"Session: {resp.SessionUuid}")
print(f"Translation: {resp.Source} -> {resp.Target}")

# Process translated text records
for item in resp.ImageRecord.Value:
    print(f"Original: {item.SourceText}")
    print(f"Translated: {item.TargetText}")
    print(f"Position: ({item.X}, {item.Y}) {item.W}x{item.H}")
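
The records carry position data, so they can be reassembled into plain text. The sketch below is illustrative only: it assumes (X, Y) is the top-left corner of each record and that the response order is not guaranteed to be reading order. The helper names are hypothetical, and `SimpleNamespace` objects stand in for the SDK's `ItemValue` so the snippet runs without credentials.

```python
from types import SimpleNamespace


def records_in_reading_order(items):
    """Sort records top-to-bottom, then left-to-right, by their (Y, X) corner."""
    return sorted(items, key=lambda it: (it.Y, it.X))


def joined_translation(items, sep="\n"):
    """Join the TargetText of all records in reading order."""
    return sep.join(it.TargetText for it in records_in_reading_order(items))


# Stand-in objects with the same attribute names as ItemValue (assumption)
sample = [
    SimpleNamespace(SourceText="world", TargetText="世界", X=10, Y=40, W=50, H=12),
    SimpleNamespace(SourceText="hello", TargetText="你好", X=10, Y=10, W=50, H=12),
]
print(joined_translation(sample))
```

With a real response, `joined_translation(resp.ImageRecord.Value)` would be the equivalent call.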

Enhanced LLM Image Translation

Advanced image translation for 18 languages using LLM technology, providing improved accuracy and context understanding.

def ImageTranslateLLM(self, request: models.ImageTranslateLLMRequest) -> models.ImageTranslateLLMResponse:
    """
    Translate text within images using enhanced LLM processing.
    
    Args:
        request: ImageTranslateLLMRequest with image data and parameters
        
    Returns:
        ImageTranslateLLMResponse with translated results and output image URL
        
    Raises:
        TencentCloudSDKException: For various error conditions
    """

Usage Example:

import base64

# Reuses `client`, `models`, and `image_data` from the standard example above

# Create enhanced image translation request
req = models.ImageTranslateLLMRequest()
req.Data = image_data  # Base64 encoded image
req.Target = "zh"
# Alternatively, use URL instead of Data:
# req.Url = "https://example.com/image.jpg"

# Perform enhanced translation
resp = client.ImageTranslateLLM(req)
print("Enhanced translation completed")
print(f"Source language: {resp.Source}")
print(f"Full source text: {resp.SourceText}")
print(f"Full translated text: {resp.TargetText}")

# Save result image (resp.Data is the Base64 encoded JPG)
with open("translated_image.jpg", "wb") as f:
    f.write(base64.b64decode(resp.Data))

# Process translation details
for detail in resp.TransDetails:
    print(f"Line: {detail.SourceLineText} -> {detail.TargetLineText}")
    print(f"Position: ({detail.BoundingBox.X}, {detail.BoundingBox.Y})")

Request/Response Models

ImageTranslateRequest

class ImageTranslateRequest:
    """
    Request parameters for standard image translation.
    
    Attributes:
        SessionUuid (str): Unique session identifier
        Scene (str): Scene type (e.g., "doc" for documents)
        Data (str): Base64 encoded image data
        Source (str): Source language code
        Target (str): Target language code
        ProjectId (int): Project ID (default: 0)
    """
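
Since `SessionUuid` must be unique per session, one way to produce it is with the standard `uuid` module. This is a minimal sketch that assumes any unique string is accepted; the `session-` prefix and the helper name are illustrative, not part of the SDK.

```python
import uuid


def new_session_uuid(prefix="session"):
    """Return a unique identifier such as 'session-<32 hex characters>'."""
    return f"{prefix}-{uuid.uuid4().hex}"


print(new_session_uuid())
```

The value could then be assigned with `req.SessionUuid = new_session_uuid()`.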

ImageTranslateResponse

class ImageTranslateResponse:
    """
    Response from standard image translation.
    
    Attributes:
        SessionUuid (str): Session identifier from request
        Source (str): Source language
        Target (str): Target language
        ImageRecord (ImageRecord): Image translation result
        RequestId (str): Unique request identifier
    """

ImageTranslateLLMRequest

class ImageTranslateLLMRequest:
    """
    Request parameters for enhanced LLM image translation.
    
    Attributes:
        Data (str): Base64 encoded image data (PNG, JPG, JPEG)
        Target (str): Target language code
        Url (str): Image URL (alternative to Data)
    """
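
Because `Data` and `Url` are alternatives, it can help to validate the inputs before building a request. The sketch below is an assumption-laden illustration: it insists on exactly one of the two, a choice made here to avoid ambiguity rather than a documented SDK rule, and `build_llm_params` is a hypothetical helper.

```python
def build_llm_params(target, data=None, url=None):
    """Build a parameter dict for ImageTranslateLLM with exactly one image source."""
    if not target:
        raise ValueError("target language code is required")
    if (data is None) == (url is None):
        # Either both were given or neither was; this sketch rejects both cases
        raise ValueError("provide exactly one of data or url")
    params = {"Target": target}
    if data is not None:
        params["Data"] = data
    else:
        params["Url"] = url
    return params


print(build_llm_params("zh", url="https://example.com/image.jpg"))
```

The resulting keys mirror the request attributes, so they can be copied onto an `ImageTranslateLLMRequest` instance one by one.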

ImageTranslateLLMResponse

class ImageTranslateLLMResponse:
    """
    Response from enhanced LLM image translation.
    
    Attributes:
        Data (str): Base64 encoded result image (JPG format)
        Source (str): Detected source language
        Target (str): Target language
        SourceText (str): All original text from image
        TargetText (str): All translated text
        Angle (float): Image rotation angle (0-359 degrees)
        TransDetails (list[TransDetail]): Translation detail information
        RequestId (str): Unique request identifier
    """

ImageRecord

class ImageRecord:
    """
    Image translation record container.
    
    Attributes:
        Value (list[ItemValue]): List of translated text items with positions
    """

ItemValue

class ItemValue:
    """
    Individual translated text item with position information.
    
    Attributes:
        SourceText (str): Original text
        TargetText (str): Translated text
        X (int): X coordinate
        Y (int): Y coordinate
        W (int): Width
        H (int): Height
    """

TransDetail

class TransDetail:
    """
    LLM translation detail for each text line.
    
    Attributes:
        SourceLineText (str): Original line text
        TargetLineText (str): Translated line text
        BoundingBox (BoundingBox): Text position and dimensions
        LinesCount (int): Number of lines
        LineHeight (int): Line height in pixels
        SpamCode (int): Content safety check result (0=normal)
    """

BoundingBox

class BoundingBox:
    """
    Bounding box coordinates for text positioning.
    
    Attributes:
        X (int): Left edge X coordinate
        Y (int): Top edge Y coordinate  
        Width (int): Box width in pixels
        Height (int): Box height in pixels
    """
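
The two APIs name their position fields differently (`X`, `Y`, `W`, `H` on `ItemValue` versus `X`, `Y`, `Width`, `Height` on `BoundingBox`). A small adapter can normalize both into one shape. This sketch assumes a top-left origin with pixel units for both; the function names are hypothetical and `SimpleNamespace` objects stand in for the SDK models.

```python
from types import SimpleNamespace


def to_ltrb(x, y, width, height):
    """Convert origin plus size into a (left, top, right, bottom) tuple."""
    return (x, y, x + width, y + height)


def item_value_box(item):
    """Box for a standard-API record (fields X, Y, W, H)."""
    return to_ltrb(item.X, item.Y, item.W, item.H)


def trans_detail_box(detail):
    """Box for an LLM-API detail (nested BoundingBox with X, Y, Width, Height)."""
    box = detail.BoundingBox
    return to_ltrb(box.X, box.Y, box.Width, box.Height)


# Stand-ins mirroring the attribute names documented above (assumption)
item = SimpleNamespace(X=10, Y=20, W=100, H=30)
detail = SimpleNamespace(BoundingBox=SimpleNamespace(X=10, Y=20, Width=100, Height=30))
print(item_value_box(item), trans_detail_box(detail))
```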

Supported Image Formats

Input Formats (Both APIs)

  • PNG: Portable Network Graphics
  • JPG/JPEG: Joint Photographic Experts Group
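
Since only PNG and JPG/JPEG input is listed, a local check before upload can catch unsupported files early. The sketch below inspects the standard file signatures for the two formats and then Base64 encodes the bytes; the helper names are illustrative and not part of the SDK.

```python
import base64

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG file signature
JPEG_MAGIC = b"\xff\xd8\xff"      # JPEG start-of-image marker plus next marker byte


def detect_image_format(raw):
    """Return 'png', 'jpeg', or None based on the leading bytes."""
    if raw.startswith(PNG_MAGIC):
        return "png"
    if raw.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


def encode_image(raw):
    """Base64 encode image bytes, rejecting anything that is not PNG or JPEG."""
    if detect_image_format(raw) is None:
        raise ValueError("unsupported image format: expected PNG or JPEG")
    return base64.b64encode(raw).decode("ascii")
```

Usage would look like `req.Data = encode_image(open("document.png", "rb").read())`.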

Output Formats

  • Standard API: Text records with position data
  • LLM API: JPG image with translated text + text records

Language Support

Standard Image Translation (13 languages)

Core language support for document translation:

  • Chinese (zh, zh-TW, zh-HK, zh-TR)
  • English (en), Japanese (ja), Korean (ko)
  • European: French (fr), German (de), Spanish (es), Italian (it)
  • Others: Russian (ru), Arabic (ar)

Enhanced LLM Translation (18 languages)

Extended language support with improved accuracy:

  • All standard languages plus additional coverage
  • Better context understanding for complex layouts
  • Improved handling of mixed-language content

Scene Types

Document Scene ("doc")

Optimized for:

  • Text documents and PDFs
  • Business documents
  • Academic papers
  • Technical documentation
  • Forms and contracts

General Scene

Suitable for:

  • Street signs and signage
  • Product labels
  • Handwritten notes
  • Mixed content images

Best Practices

Image Quality

  • Use high-resolution images (minimum 300 DPI recommended)
  • Ensure good contrast between text and background
  • Avoid blurry or distorted images
  • Minimize image compression artifacts

Text Layout

  • Works best with horizontal text layouts
  • Supports line-by-line processing
  • Handles multiple text blocks per image
  • Preserves relative positioning information

API Selection

  • Use ImageTranslate for: Simple document translation, cost-sensitive applications
  • Use ImageTranslateLLM for: Complex layouts, mixed languages, higher accuracy requirements
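
The selection guidance above can be sketched as a small function. This is only an illustration: the language set is copied from the standard list in this document, membership of both codes does not guarantee that the specific pair is supported, and `choose_image_api` is a hypothetical helper.

```python
# Codes taken from the "Standard Image Translation (13 languages)" list
STANDARD_LANGS = frozenset({
    "zh", "zh-TW", "zh-HK", "zh-TR",
    "en", "ja", "ko",
    "fr", "de", "es", "it",
    "ru", "ar",
})


def choose_image_api(source, target, complex_layout=False):
    """Return the name of the API that fits the request under this sketch's rules."""
    if complex_layout:
        return "ImageTranslateLLM"
    if source in STANDARD_LANGS and target in STANDARD_LANGS:
        return "ImageTranslate"
    # Fall back to the API with broader language coverage
    return "ImageTranslateLLM"


print(choose_image_api("en", "zh"))
```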

Error Handling

Common error scenarios for image translation, listed by their constant names in the SDK's errorcodes module:

  • FAILEDOPERATION_DOWNLOADERR: Image data processing error
  • FAILEDOPERATION_LANGUAGERECOGNITIONERR: Language detection failure
  • UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE: Language pair not supported
  • INVALIDPARAMETER: Invalid image data or parameters

Example error handling:

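from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.tmt.v20180321 import errorcodes

try:
    resp = client.ImageTranslate(req)
    # ImageRecord.Value holds the list of ItemValue records
    for item in resp.ImageRecord.Value:
        print(f"Translated: {item.TargetText}")
except TencentCloudSDKException as e:
    # e.code holds the code string; compare against the errorcodes constants
    if e.code == errorcodes.FAILEDOPERATION_LANGUAGERECOGNITIONERR:
        print("Could not detect the language of the text in the image")
    elif e.code == errorcodes.UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE:
        print("Language pair not supported for image translation")
    else:
        print(f"Image translation error: {e.code} - {e.message}")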

Install with Tessl CLI

npx tessl i tessl/pypi-tencentcloud-sdk-python-tmt@3.0.1
