# Speech Translation

Audio translation that combines speech recognition and machine translation for bidirectional Chinese-English processing. Supports both streaming and batch audio processing across multiple audio formats.

## Capabilities

### Audio Translation

Recognizes speech in audio files and translates the recognized text into the target language. Supports real-time streaming and batch processing modes.

```python { .api }
def SpeechTranslate(self, request: models.SpeechTranslateRequest) -> models.SpeechTranslateResponse:
    """
    Translate speech audio to text in the target language.

    Args:
        request: SpeechTranslateRequest with audio data and parameters

    Returns:
        SpeechTranslateResponse with the translated text result

    Raises:
        TencentCloudSDKException: For various error conditions
    """
```

**Usage Example (Single Audio File):**

```python
import base64
from tencentcloud.common import credential
from tencentcloud.tmt.v20180321.tmt_client import TmtClient
from tencentcloud.tmt.v20180321 import models

# Initialize the client
cred = credential.Credential("SecretId", "SecretKey")
client = TmtClient(cred, "ap-beijing")

# Read and base64-encode the audio file
with open("speech.wav", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode()

# Create the speech translation request
req = models.SpeechTranslateRequest()
req.SessionUuid = "unique-session-id"
req.Source = "zh"    # Chinese input
req.Target = "en"    # English output
req.AudioFormat = 1  # PCM format
req.Data = audio_data
req.Seq = 0          # Sequence number
req.IsEnd = 1        # Single file, mark as end
req.ProjectId = 0

# Perform speech translation
resp = client.SpeechTranslate(req)
print(f"Session: {resp.SessionUuid}")
print(f"Translation: {resp.Source} -> {resp.Target}")
print(f"Original: {resp.SourceText}")
print(f"Translated: {resp.TargetText}")
print(f"Recognition status: {resp.RecognizeStatus}")
```

**Usage Example (Streaming Audio):**

```python
import base64
from tencentcloud.tmt.v20180321 import models

def stream_audio_translation(client, audio_chunks, session_uuid):
    """
    Process streaming audio chunks for real-time translation.

    Args:
        client: TmtClient instance
        audio_chunks: List of audio data chunks (200-500ms each)
        session_uuid: Unique session identifier

    Returns:
        List of translation results
    """
    results = []

    for i, chunk in enumerate(audio_chunks):
        req = models.SpeechTranslateRequest()
        req.SessionUuid = session_uuid
        req.Source = "en"
        req.Target = "zh"
        req.AudioFormat = 1  # PCM only for streaming
        req.Data = base64.b64encode(chunk).decode()
        req.Seq = i
        req.IsEnd = 1 if i == len(audio_chunks) - 1 else 0
        req.ProjectId = 0

        try:
            resp = client.SpeechTranslate(req)
            if resp.TargetText:
                results.append(resp.TargetText)
                print(f"Chunk {i}: {resp.SourceText} -> {resp.TargetText}")
        except Exception as e:
            print(f"Error processing chunk {i}: {e}")

    return results

# Example usage
session_id = "streaming-session-001"
# audio_chunks would be your segmented audio data
# results = stream_audio_translation(client, audio_chunks, session_id)
```

## Request/Response Models

### SpeechTranslateRequest

```python { .api }
class SpeechTranslateRequest:
    """
    Request parameters for speech translation.

    Attributes:
        SessionUuid (str): Unique session identifier for tracking
        Source (str): Source language code (zh, en)
        Target (str): Target language code (zh, en)
        AudioFormat (int): Audio format (1: PCM, 2: MP3, 3: SPEEX)
        Data (str): Base64-encoded audio data
        Seq (int): Sequence number for streaming (starts from 0)
        IsEnd (int): End flag (0: more chunks, 1: final chunk)
        ProjectId (int): Project ID (default: 0)
    """
```

### SpeechTranslateResponse

```python { .api }
class SpeechTranslateResponse:
    """
    Response from speech translation.

    Attributes:
        SessionUuid (str): Session identifier from the request
        RecognizeStatus (int): Speech recognition status (1=processing, 0=complete)
        SourceText (str): Recognized original text
        TargetText (str): Translated text result
        Seq (int): Audio fragment sequence number
        Source (str): Source language
        Target (str): Target language
        VadSeq (int): Voice activity detection sequence number
        RequestId (str): Unique request identifier
    """
```

## Supported Audio Formats

### Format Specifications

**PCM (Format ID: 1)**
- **Sampling Rate**: 16kHz
- **Bit Depth**: 16-bit
- **Channels**: Mono (single channel)
- **Streaming Support**: Yes (required for real-time)
- **Chunk Duration**: 200-500ms per chunk
- **Use Case**: Real-time streaming translation
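
For 16kHz, 16-bit mono PCM, the recommended 200-500ms chunk duration maps to a fixed byte count. A small helper makes the arithmetic explicit (the function name is illustrative, not part of the SDK):

```python
def pcm_chunk_size(duration_ms, sample_rate=16000, sample_width=2, channels=1):
    """Bytes of raw PCM audio for a chunk of the given duration.

    Defaults match the streaming requirement: 16 kHz, 16-bit (2 bytes), mono.
    """
    return sample_rate * sample_width * channels * duration_ms // 1000

print(pcm_chunk_size(200))  # 200 ms -> 6400 bytes
print(pcm_chunk_size(500))  # 500 ms -> 16000 bytes
```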

**MP3 (Format ID: 2)**
- **Streaming Support**: No (batch only)
- **Max Duration**: 8 seconds
- **Use Case**: Pre-recorded audio files
- **Quality**: Variable bitrate supported

**SPEEX (Format ID: 3)**
- **Streaming Support**: No (batch only)
- **Max Duration**: 8 seconds
- **Use Case**: Compressed voice recordings
- **Quality**: Optimized for speech

## Language Support

Speech translation currently supports **Chinese-English bidirectional translation**:

### Supported Language Pairs
- **Chinese to English**: zh → en
- **English to Chinese**: en → zh

### Language Codes
- **zh**: Simplified Chinese (Mandarin)
- **en**: English
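
A minimal guard for the supported pairs, useful before issuing a request (this helper is a sketch, not part of the SDK):

```python
# The only pairs the service accepts, per the table above
SUPPORTED_PAIRS = {("zh", "en"), ("en", "zh")}

def is_supported_pair(source, target):
    """Return True if (source, target) is a supported language pair."""
    return (source, target) in SUPPORTED_PAIRS
```

Checking locally avoids a round trip that would fail with `UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE`.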

## Processing Modes

### Streaming Mode (PCM only)
- Real-time processing of audio chunks
- 200-500ms chunk duration recommended
- Sequential processing with Seq numbering
- IsEnd=1 for final chunk
- Immediate translation results

### Batch Mode (All formats)
- Single audio file processing
- Maximum 8 seconds duration (MP3, SPEEX)
- No duration limit for PCM
- IsEnd=1, Seq=0 for single file
- Complete translation after processing
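
The streaming rules above (Seq starting at 0, IsEnd=1 only on the final chunk) can be captured in a small generator. The helper name and the 6400-byte default (200 ms of 16 kHz, 16-bit mono PCM) are illustrative choices:

```python
def iter_pcm_chunks(pcm_bytes, chunk_bytes=6400):
    """Yield (seq, is_end, chunk) triples for streaming requests.

    seq counts from 0; is_end is 1 only for the last chunk, matching
    the Seq/IsEnd fields of SpeechTranslateRequest.
    """
    chunks = [pcm_bytes[i:i + chunk_bytes]
              for i in range(0, len(pcm_bytes), chunk_bytes)]
    for seq, chunk in enumerate(chunks):
        yield seq, 1 if seq == len(chunks) - 1 else 0, chunk
```

Feeding each triple into a `SpeechTranslateRequest` keeps the sequence numbering and end flag consistent across the session.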

## Audio Quality Requirements

### Clear Speech
- Minimal background noise
- Clear pronunciation
- Avoid overlapping speakers
- Consistent volume levels

### Technical Requirements
- Proper sampling rate (16kHz for PCM)
- Adequate bit depth (16-bit minimum)
- Stable audio stream without dropouts
- Proper audio encoding
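
When the source is a WAV file, the technical requirements above can be checked locally with the standard-library `wave` module before any data is sent (the helper name is illustrative):

```python
import wave

def validate_wav_for_pcm(path):
    """Check a WAV file against the PCM streaming requirements:
    16 kHz sample rate, 16-bit samples, mono. Returns a list of
    problem descriptions; an empty list means the file passes."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            problems.append(f"sample rate {w.getframerate()} Hz, expected 16000")
        if w.getsampwidth() != 2:
            problems.append(f"{w.getsampwidth() * 8}-bit samples, expected 16-bit")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected mono")
    return problems
```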

## Session Management

### Session UUID
- Unique identifier for each translation session
- Required for tracking streaming sessions
- Use a consistent UUID across all chunks in a session
- Helps correlate results with audio input

### Sequence Numbers
- Start from 0 for the first chunk
- Increment by 1 for each subsequent chunk
- Used for proper ordering in streaming mode
- Critical for maintaining audio continuity
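
Both rules can be bundled in a tiny bookkeeping class. The class is a sketch; the API only requires a unique string for SessionUuid, and using `uuid.uuid4()` for it is a convenient assumption, not a requirement:

```python
import uuid

class StreamingSession:
    """Track the SessionUuid and Seq counter for one streaming session."""

    def __init__(self):
        # Any unique string works; a UUID4 is a convenient choice
        self.session_uuid = str(uuid.uuid4())
        self._next_seq = 0

    def next_seq(self):
        """Return the next sequence number: 0 for the first chunk, then 1, 2, ..."""
        seq = self._next_seq
        self._next_seq += 1
        return seq
```

Creating one `StreamingSession` per audio stream guarantees a consistent UUID and gap-free sequence numbers across all chunks.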

## Error Handling

Common error scenarios for speech translation:

- **UNSUPPORTEDOPERATION_AUDIODURATIONEXCEED**: Audio exceeds the maximum duration
- **UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE**: Language pair not supported
- **FAILEDOPERATION_REQUESTAILABERR**: Audio processing failure
- **INVALIDPARAMETER_SEQINTERVALTOOLARGE**: Invalid sequence numbering
- **INVALIDPARAMETER_DUPLICATEDSESSIONIDANDSEQ**: Duplicate session/sequence combination

Example error handling:

```python
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException

def safe_speech_translate(client, request):
    """Safely perform speech translation with error handling."""
    try:
        response = client.SpeechTranslate(request)
        return response.TargetText
    except TencentCloudSDKException as e:
        if e.code == "UNSUPPORTEDOPERATION_AUDIODURATIONEXCEED":
            print("Audio file too long, split into smaller chunks")
        elif e.code == "UNSUPPORTEDOPERATION_UNSUPPORTEDLANGUAGE":
            print("Language pair not supported, use zh<->en only")
        elif e.code == "FAILEDOPERATION_REQUESTAILABERR":
            print("Audio processing failed, check audio quality")
        else:
            print(f"Speech translation error: {e.code} - {e.message}")
        return None

# Usage
result = safe_speech_translate(client, req)
if result:
    print(f"Translation: {result}")
```

## Best Practices

### Audio Preparation
- Use high-quality recording equipment
- Record in quiet environments
- Maintain a consistent speaking pace
- Avoid background music or noise

### Streaming Implementation
- Buffer audio in 200-500ms chunks
- Implement proper sequence numbering
- Handle network interruptions gracefully
- Process results as they arrive

### Error Recovery
- Implement retry logic for transient errors
- Validate audio format before submission
- Monitor session state across chunks
- Provide user feedback on processing status
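
The retry recommendation above can be sketched as a small wrapper with exponential backoff. Here `send_chunk` is a placeholder for whatever zero-argument callable issues your SpeechTranslate request; the wrapper itself is not part of the SDK:

```python
import time

def send_with_retry(send_chunk, max_attempts=3, base_delay=0.5):
    """Retry a chunk upload on failure with exponential backoff.

    Re-raises the last exception once max_attempts is exhausted, so
    permanent errors (e.g. unsupported language) still surface.
    """
    for attempt in range(max_attempts):
        try:
            return send_chunk()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

In practice you would retry only on errors you believe are transient (network failures, timeouts) rather than on every exception.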