tessl/pypi-tencentcloud-sdk-python-tmt

Tencent Cloud Machine Translation (TMT) SDK for Python providing comprehensive text, file, image, and speech translation capabilities

Image Translation

Name: tessl/pypi-tencentcloud-sdk-python-tmt
Author: tessl

OCR-based translation of text embedded in images, performed line by line. Two API endpoints take different approaches: standard OCR translation (13 languages) and enhanced LLM-powered translation (18 languages).

Capabilities

Standard Image Translation

Recognizes and translates text in images line by line for 13 languages using OCR technology. Suitable for documents, signs, and text-heavy images.

def ImageTranslate(self, request: models.ImageTranslateRequest) -> models.ImageTranslateResponse:
    """
    Translate text within images using OCR recognition.
    
    Args:
        request: ImageTranslateRequest with image data and parameters
        
    Returns:
        ImageTranslateResponse with translated text records and positions
        
    Raises:
        TencentCloudSDKException: For various error conditions
    """

Usage Example:

import base64
from tencentcloud.common import credential
from tencentcloud.tmt.v20180321.tmt_client import TmtClient
from tencentcloud.tmt.v20180321 import models

# Initialize client
cred = credential.Credential("SecretId", "SecretKey")
client = TmtClient(cred, "ap-beijing")

# Read and encode image
with open("document.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Create image translation request
req = models.ImageTranslateRequest()
req.SessionUuid = "unique-session-id"
req.Scene = "doc"  # Document scene
req.Data = image_data
req.Source = "en"
req.Target = "zh"
req.ProjectId = 0

# Perform image translation
resp = client.ImageTranslate(req)
print(f"Session: {resp.SessionUuid}")
print(f"Translation: {resp.Source} -> {resp.Target}")

# Process translated text records
for item in resp.ImageRecord.Value:
    print(f"Original: {item.SourceText}")
    print(f"Translated: {item.TargetText}")
    print(f"Position: ({item.X}, {item.Y}) {item.W}x{item.H}")
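
The records come back as a flat list, so it can help to put them in reading order before displaying them. The following is a minimal sketch, assuming that coordinates use a top-left origin and that items on the same line share the same `Y`; both are assumptions, not documented behavior. `records_in_reading_order` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for `ItemValue`.

```python
from types import SimpleNamespace


def records_in_reading_order(items):
    """Return (source, target) pairs sorted top-to-bottom, then left-to-right.

    Assumes a top-left coordinate origin (an assumption, not confirmed
    by the API documentation).
    """
    ordered = sorted(items, key=lambda it: (it.Y, it.X))
    return [(it.SourceText, it.TargetText) for it in ordered]


# Stand-in objects exposing the ItemValue attributes used above.
sample = [
    SimpleNamespace(SourceText="World", TargetText="Monde", X=120, Y=10, W=80, H=20),
    SimpleNamespace(SourceText="Hello", TargetText="Bonjour", X=10, Y=10, W=90, H=20),
    SimpleNamespace(SourceText="Bye", TargetText="Au revoir", X=10, Y=50, W=60, H=20),
]
pairs = records_in_reading_order(sample)
```

With a real response, the same helper could be called as `records_in_reading_order(resp.ImageRecord.Value)`. Real OCR output may place items on the same visual line at slightly different `Y` values, in which case grouping by a tolerance would be more robust than an exact sort.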

Enhanced LLM Image Translation

Advanced image translation for 18 languages using LLM technology, providing improved accuracy and context understanding.

def ImageTranslateLLM(self, request: models.ImageTranslateLLMRequest) -> models.ImageTranslateLLMResponse:
    """
    Translate text within images using enhanced LLM processing.
    
    Args:
        request: ImageTranslateLLMRequest with image data and parameters
        
    Returns:
        ImageTranslateLLMResponse with translated results and output image URL
        
    Raises:
        TencentCloudSDKException: For various error conditions
    """

Usage Example:

import base64

# Reuses `client`, `models`, and `image_data` from the standard example above

# Create enhanced image translation request
req = models.ImageTranslateLLMRequest()
req.Data = image_data  # Base64 encoded image
req.Target = "zh"
# Alternatively, use URL instead of Data:
# req.Url = "https://example.com/image.jpg"

# Perform enhanced translation
resp = client.ImageTranslateLLM(req)
print("Enhanced translation completed")
print(f"Source language: {resp.Source}")
print(f"Full source text: {resp.SourceText}")
print(f"Full translated text: {resp.TargetText}")

# Save result image (resp.Data is a Base64 encoded JPG)
with open("translated_image.jpg", "wb") as f:
    f.write(base64.b64decode(resp.Data))

# Process translation details
for detail in resp.TransDetails:
    print(f"Line: {detail.SourceLineText} -> {detail.TargetLineText}")
    print(f"Position: ({detail.BoundingBox.X}, {detail.BoundingBox.Y})")
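
Before writing the result image to disk, it may be worth checking that the decoded bytes actually look like a JPEG. The sketch below is one way to do that with the standard library only. `decode_result_image` is a hypothetical helper, and strict Base64 validation is a design choice here, not something the API is known to require.

```python
import base64
import binascii

JPEG_MAGIC = b"\xff\xd8\xff"  # first bytes of every JPEG file


def decode_result_image(data_b64):
    """Decode a Base64 string and verify that it starts with a JPEG signature."""
    try:
        raw = base64.b64decode(data_b64, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("result image is not valid Base64") from exc
    if not raw.startswith(JPEG_MAGIC):
        raise ValueError("decoded data does not start with a JPEG signature")
    return raw


# Self-contained demonstration with synthetic bytes instead of a real response.
fake_response_data = base64.b64encode(JPEG_MAGIC + b"\xe0\x00\x10JFIF").decode("ascii")
image_bytes = decode_result_image(fake_response_data)
```

With a real response this would be called as `decode_result_image(resp.Data)`. If the service ever returns Base64 with embedded line breaks, `validate=True` would reject it and the flag would need to be dropped.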

Request/Response Models

ImageTranslateRequest

class ImageTranslateRequest:
    """
    Request parameters for standard image translation.
    
    Attributes:
        SessionUuid (str): Unique session identifier
        Scene (str): Scene type (e.g., "doc" for documents)
        Data (str): Base64 encoded image data
        Source (str): Source language code
        Target (str): Target language code
        ProjectId (int): Project ID (default: 0)
    """

ImageTranslateResponse

class ImageTranslateResponse:
    """
    Response from standard image translation.
    
    Attributes:
        SessionUuid (str): Session identifier from request
        Source (str): Source language
        Target (str): Target language
        ImageRecord (ImageRecord): Image translation result
        RequestId (str): Unique request identifier
    """

ImageTranslateLLMRequest

class ImageTranslateLLMRequest:
    """
    Request parameters for enhanced LLM image translation.
    
    Attributes:
        Data (str): Base64 encoded image data (PNG, JPG, JPEG)
        Target (str): Target language code
        Url (str): Image URL (alternative to Data)
    """

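Because `Url` is described as an alternative to `Data`, a small guard can catch requests that set both or neither. This is a sketch under the assumption that exactly one of the two should be supplied; the source does not state how the service behaves when both are present. `build_llm_params` is a hypothetical helper name.

```python
import json


def build_llm_params(target, data=None, url=None):
    """Build the parameter dict for ImageTranslateLLMRequest.

    Assumes exactly one of `data` (Base64 image) or `url` must be given.
    """
    if (data is None) == (url is None):
        raise ValueError("provide exactly one of data or url")
    params = {"Target": target}
    if data is not None:
        params["Data"] = data
    else:
        params["Url"] = url
    return params


payload = json.dumps(build_llm_params("zh", url="https://example.com/image.jpg"))
```

The resulting JSON string can then be loaded into a request object with the SDK's `req.from_json_string(payload)` instead of assigning attributes one by one.
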
ImageTranslateLLMResponse

class ImageTranslateLLMResponse:
    """
    Response from enhanced LLM image translation.
    
    Attributes:
        Data (str): Base64 encoded result image (JPG format)
        Source (str): Detected source language
        Target (str): Target language
        SourceText (str): All original text from image
        TargetText (str): All translated text
        Angle (float): Image rotation angle (0-359 degrees)
        TransDetails (list[TransDetail]): Translation detail information
        RequestId (str): Unique request identifier
    """

ImageRecord

class ImageRecord:
    """
    Image translation record container.
    
    Attributes:
        Value (list[ItemValue]): List of translated text items with positions
    """

ItemValue

class ItemValue:
    """
    Individual translated text item with position information.
    
    Attributes:
        SourceText (str): Original text
        TargetText (str): Translated text
        X (int): X coordinate
        Y (int): Y coordinate
        W (int): Width
        H (int): Height
    """

TransDetail

class TransDetail:
    """
    LLM translation detail for each text line.
    
    Attributes:
        SourceLineText (str): Original line text
        TargetLineText (str): Translated line text
        BoundingBox (BoundingBox): Text position and dimensions
        LinesCount (int): Number of lines
        LineHeight (int): Line height in pixels
        SpamCode (int): Content safety check result (0=normal)
    """

BoundingBox

class BoundingBox:
    """
    Bounding box coordinates for text positioning.
    
    Attributes:
        X (int): Left edge X coordinate
        Y (int): Top edge Y coordinate  
        Width (int): Box width in pixels
        Height (int): Box height in pixels
    """

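Many imaging libraries expect a `(left, top, right, bottom)` tuple, not an origin plus size. The sketch below converts the documented `BoundingBox` fields into that form and clamps the result to the image bounds. `Box` is a hypothetical stand-in for the SDK model, and the top-left origin is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in mirroring the documented BoundingBox fields."""
    X: int
    Y: int
    Width: int
    Height: int


def to_corners(box, image_width, image_height):
    """Convert origin-plus-size to (left, top, right, bottom), clamped to the image."""
    left = max(0, box.X)
    top = max(0, box.Y)
    right = min(image_width, box.X + box.Width)
    bottom = min(image_height, box.Y + box.Height)
    return left, top, right, bottom


corners = to_corners(Box(X=10, Y=20, Width=100, Height=40), image_width=80, image_height=200)
```

Since the function only reads the `X`, `Y`, `Width`, and `Height` attributes, a `detail.BoundingBox` object from a real response could be passed in place of `Box`.
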
Supported Image Formats

Input Formats (Both APIs)

PNG: Portable Network Graphics
JPG/JPEG: Joint Photographic Experts Group
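
Since only PNG and JPG/JPEG input is listed, checking the file signature before encoding can avoid a wasted API call. A minimal sketch using the standard library follows; `sniff_image_format` and `encode_image_bytes` are hypothetical helper names, and the check looks only at the leading magic bytes, not at whether the whole file is valid.

```python
import base64

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_image_format(raw):
    """Return "png" or "jpeg" based on the leading magic bytes, else None."""
    if raw.startswith(PNG_MAGIC):
        return "png"
    if raw.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


def encode_image_bytes(raw):
    """Base64-encode image bytes after confirming a supported signature."""
    if sniff_image_format(raw) is None:
        raise ValueError("expected PNG or JPEG data")
    return base64.b64encode(raw).decode("ascii")
```

The return value of `encode_image_bytes` is a plain string suitable for assigning to `req.Data`.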

Output Formats

Standard API: Text records with position data
LLM API: JPG image with translated text + text records

Language Support

Standard Image Translation (13 languages)

Core language support for document translation:

Chinese (zh, zh-TW, zh-HK, zh-TR)
English (en), Japanese (ja), Korean (ko)
European: French (fr), German (de), Spanish (es), Italian (it)
Others: Russian (ru), Arabic (ar)
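
The codes listed above can be used for a quick client-side check before sending a request. This is a sketch only: it verifies that each code appears in the list, not that the service supports the specific source/target pair, so a request can still be rejected for an unsupported language pair. `check_language_codes` is a hypothetical helper.

```python
# Codes copied from the list above (13 in total).
STANDARD_CODES = frozenset({
    "zh", "zh-TW", "zh-HK", "zh-TR",
    "en", "ja", "ko",
    "fr", "de", "es", "it",
    "ru", "ar",
})


def check_language_codes(source, target):
    """Raise ValueError if either code is not in the documented list."""
    for role, code in (("source", source), ("target", target)):
        if code not in STANDARD_CODES:
            raise ValueError(f"unsupported {role} language code: {code!r}")


check_language_codes("en", "zh")  # passes silently
```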

Enhanced LLM Translation (18 languages)

Extended language support with improved accuracy:

All standard languages plus additional coverage
Better context understanding for complex layouts
Improved handling of mixed-language content

Scene Types

Document Scene ("doc")

Optimized for:

Text documents and PDFs
Business documents
Academic papers
Technical documentation
Forms and contracts

General Scene

Suitable for:

Street signs and signage
Product labels
Handwritten notes
Mixed content images

Best Practices

Image Quality

Use high-resolution images (minimum 300 DPI recommended)
Ensure good contrast between text and background
Avoid blurry or distorted images
Minimize image compression artifacts

Text Layout

Works best with horizontal text layouts
Supports line-by-line processing
Handles multiple text blocks per image
Preserves relative positioning information

API Selection

Use ImageTranslate for: Simple document translation, cost-sensitive applications
Use ImageTranslateLLM for: Complex layouts, mixed languages, higher accuracy requirements

Error Handling

Common error scenarios for image translation:

FAILEDOPERATION_DOWNLOADERR: Image data processing error
FAILEDOPERATION_LANGUAGERECOGNITIONERR: Language detection failure
UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE: Language pair not supported
INVALIDPARAMETER: Invalid image data or parameters

Example error handling:

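from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.tmt.v20180321 import errorcodes

try:
    resp = client.ImageTranslate(req)
    # ImageRecord is a container; the translated items are in its Value list
    for item in resp.ImageRecord.Value:
        print(f"Translated: {item.TargetText}")
except TencentCloudSDKException as e:
    # e.code carries the dotted API code (e.g. "FailedOperation.LanguageRecognitionErr");
    # the errorcodes module maps the upper-case constant names to those strings
    if e.code == errorcodes.FAILEDOPERATION_LANGUAGERECOGNITIONERR:
        print("Could not detect the language of the text in the image")
    elif e.code == errorcodes.UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE:
        print("Language pair not supported for image translation")
    else:
        print(f"Image translation error: {e.code} - {e.message}")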
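
The error names listed above can also be turned into a lookup table for user-facing messages. The sketch below normalizes codes so that both the upper-case constant style and a dotted style such as `FailedOperation.LanguageRecognitionErr` resolve to the same entry; that the service reports codes in dotted form is an assumption here. `describe_error` is a hypothetical helper, and the messages are taken from the list above.

```python
ERROR_MESSAGES = {
    "FAILEDOPERATION_DOWNLOADERR": "Image data processing error",
    "FAILEDOPERATION_LANGUAGERECOGNITIONERR": "Language detection failure",
    "UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE": "Language pair not supported",
    "INVALIDPARAMETER": "Invalid image data or parameters",
}


def _normalize(code):
    """Strip separators and case so different spellings of a code compare equal."""
    return code.replace(".", "").replace("_", "").upper()


_BY_KEY = {_normalize(name): text for name, text in ERROR_MESSAGES.items()}


def describe_error(code):
    """Return a readable message for a known code, or a generic fallback."""
    return _BY_KEY.get(_normalize(code), f"Unrecognized error code: {code}")
```

Inside an `except TencentCloudSDKException as e:` block, this could be used as `print(describe_error(e.code))`.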

Install with Tessl CLI

npx tessl i tessl/pypi-tencentcloud-sdk-python-tmt
