Open-source neural machine translation library based on OpenNMT's CTranslate2
npx @tessl/cli install tessl/pypi-argostranslate@1.9.00
# Argostranslate

Argostranslate is an open-source, offline neural machine translation library: translation runs locally, without internet connectivity or external API calls. Built on OpenNMT's CTranslate2 framework, it supports automatic language detection, pivots through intermediate languages when no direct translation path exists, and manages installable language model packages. The library provides a Python API, command-line tools, and GUI applications, so it can be embedded in other applications while keeping translation offline and private.

## Package Information

- **Package Name**: argostranslate
- **Language**: Python
- **Installation**: `pip install argostranslate`
- **License**: MIT
- **Python Version**: >=3.5

## Core Imports

```python
import argostranslate.translate
import argostranslate.package
```

For translation functionality:

```python
from argostranslate import translate
```

For package management:

```python
from argostranslate import package
```

## Basic Usage

```python
from argostranslate import translate, package

# Install a translation package first (if not already installed)
package.update_package_index()  # refresh the local list of downloadable packages
available_packages = package.get_available_packages()
en_to_es_packages = [p for p in available_packages if p.from_code == "en" and p.to_code == "es"]
if en_to_es_packages:
    en_to_es_packages[0].install()

# Perform translation
translated_text = translate.translate("Hello world", "en", "es")
print(translated_text)  # e.g. "Hola mundo" (exact output depends on the installed model)

# List installed languages
installed_languages = translate.get_installed_languages()
for lang in installed_languages:
    print(f"{lang.code}: {lang.name}")

# Get a translation object for reuse
translation = translate.get_translation_from_codes("en", "es")
if translation:
    result = translation.translate("How are you?")
    print(result)  # e.g. "¿Cómo estás?"
```

## Architecture

Argostranslate uses a modular architecture built around several key components:

- **Translation Engine**: Core translation functionality with support for multiple backends (OpenNMT, LibreTranslate, OpenAI)
- **Package System**: Manages downloadable language model packages with automatic dependency resolution
- **Language Detection**: Automatic source language identification when not specified
- **Translation Pivoting**: Enables indirect translation through intermediate languages when direct models aren't available
- **Caching Layer**: Performance optimization through translation result caching
- **CLI Tools**: Command-line interfaces for both translation (`argos-translate`) and package management (`argospm`)

The library supports multiple translation backends, allowing users to choose between offline neural models (default), remote API services, or large language model providers based on their needs.

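
The pivoting idea described above can be illustrated with a small, standard-library-only sketch. It is not the library's implementation: the function name `find_translation_path` and the set-of-pairs representation are hypothetical, chosen only to show how a chain such as `es -> en -> fr` can be found when no direct `es -> fr` model is installed.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def find_translation_path(
    pairs: Set[Tuple[str, str]], from_code: str, to_code: str
) -> Optional[List[str]]:
    """Return the shortest chain of language codes linking from_code to to_code.

    `pairs` is a hypothetical stand-in for the installed (source, target) models.
    Returns None when no chain exists.
    """
    # Build an adjacency list: source code -> reachable target codes.
    graph: Dict[str, List[str]] = {}
    for src, dst in sorted(pairs):
        graph.setdefault(src, []).append(dst)

    # Breadth-first search finds the chain with the fewest hops.
    queue = deque([[from_code]])
    visited = {from_code}
    while queue:
        path = queue.popleft()
        if path[-1] == to_code:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


installed_pairs = {("es", "en"), ("en", "fr"), ("en", "es")}
print(find_translation_path(installed_pairs, "es", "fr"))  # ['es', 'en', 'fr']
print(find_translation_path(installed_pairs, "fr", "es"))  # None
```

Each extra hop runs another model over already-translated text, so a shortest-path search is a reasonable default for a sketch like this.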
## Capabilities

### Core Translation

Primary translation functionality including simple text translation, multiple translation hypotheses, language detection, and translation chaining through intermediate languages when direct translation models are unavailable.

```python { .api }
def translate(q: str, from_code: str, to_code: str) -> str:
    """Main translation function for simple text translation."""

def get_installed_languages() -> list[Language]:
    """Get list of installed languages."""

def get_language_from_code(code: str) -> Language | None:
    """Get language object from ISO code."""

def get_translation_from_codes(from_code: str, to_code: str) -> ITranslation | None:
    """Get translation object for reuse."""
```

```python { .api }
class Language:
    def __init__(self, code: str, name: str): ...
    def get_translation(self, to: Language) -> ITranslation | None: ...

class ITranslation:
    def translate(self, input_text: str) -> str: ...
    def hypotheses(self, input_text: str, num_hypotheses: int = 4) -> list[Hypothesis]: ...

class Hypothesis:
    def __init__(self, value: str, score: float): ...
```

[Translation](./translation.md)

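
As a rough illustration of working with multiple hypotheses, the sketch below ranks candidates by score. `SketchHypothesis` is a hypothetical stand-in that mirrors only the `value` and `score` fields listed above, and the assumption that a higher score means a better candidate is mine, not something this page states.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchHypothesis:
    """Hypothetical stand-in mirroring the value/score fields of Hypothesis."""

    value: str
    score: float


def best_hypothesis(hypotheses: List[SketchHypothesis]) -> SketchHypothesis:
    """Return the candidate with the highest score (assumed: higher is better)."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    return max(hypotheses, key=lambda h: h.score)


candidates = [
    SketchHypothesis("Hola mundo", -0.2),
    SketchHypothesis("Hola, mundo", -0.9),
    SketchHypothesis("Buenas mundo", -2.4),
]
print(best_hypothesis(candidates).value)  # Hola mundo
```

With the real library, the list would presumably come from `ITranslation.hypotheses(...)` as declared above.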
### Package Management

Installation and management of translation model packages, including downloading from remote repositories, installing from local files, and managing package dependencies and updates.

```python { .api }
def get_installed_packages(path: Path = None) -> list[Package]:
    """Get list of installed translation packages."""

def get_available_packages() -> list[AvailablePackage]:
    """Get list of packages available for download."""

def install_from_path(path: Path):
    """Install package from local file path."""

def uninstall(pkg: Package):
    """Remove installed package."""
```

```python { .api }
class Package:
    def __init__(self, package_path: Path): ...
    def update(self): ...
    def get_readme(self) -> str | None: ...

class AvailablePackage:
    def __init__(self, metadata): ...
    def download(self) -> Path: ...
    def install(self): ...
```

[Package Management](./package-management.md)

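
Selecting a package for a language pair is a simple filter over the package list. The helper below is a hypothetical convenience function, not part of the library; it relies only on the `from_code` and `to_code` attributes used in Basic Usage, and it is demonstrated on stand-in objects so that it runs without any packages installed.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_package(packages: Iterable[Any], from_code: str, to_code: str) -> Optional[Any]:
    """Return the first package matching the language pair, or None."""
    for pkg in packages:
        if pkg.from_code == from_code and pkg.to_code == to_code:
            return pkg
    return None


# Stand-in objects; real entries would come from package.get_available_packages().
fake_index = [
    SimpleNamespace(from_code="en", to_code="es"),
    SimpleNamespace(from_code="en", to_code="fr"),
]
match = find_package(fake_index, "en", "fr")
print(match.to_code if match else "no package")  # fr
print(find_package(fake_index, "fr", "en"))  # None
```

In real use, one would pass the result of `package.get_available_packages()` and call `install()` on the returned `AvailablePackage`, checking for `None` first.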
### External API Integration

Integration with external translation services including LibreTranslate API and OpenAI language models, providing alternative translation backends when offline models are insufficient or unavailable.

```python { .api }
class LibreTranslateAPI:
    def __init__(self, url: str = None, api_key: str = None): ...
    def translate(self, q: str, source: str = "en", target: str = "es") -> str: ...
    def languages(self): ...
    def detect(self, q: str): ...

class OpenAIAPI:
    def __init__(self, api_key: str): ...
    def infer(self, prompt: str) -> str | None: ...
```

[External APIs](./external-apis.md)

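
To make the `q` / `source` / `target` parameters more concrete, here is a sketch of how a request to a LibreTranslate server's `/translate` endpoint might be assembled with the standard library. It is not how `LibreTranslateAPI` is implemented; the helper name `build_translate_request` and the local server URL are assumptions for illustration, and the request is only built, never sent.

```python
import json
import urllib.request
from typing import Optional


def build_translate_request(
    base_url: str, q: str, source: str, target: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a /translate endpoint."""
    payload = {"q": q, "source": source, "target": target}
    if api_key is not None:
        payload["api_key"] = api_key
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical self-hosted server address.
request = build_translate_request("http://localhost:5000/", "Hello", "en", "es")
print(request.full_url)  # http://localhost:5000/translate
print(request.get_method())  # POST
```

Sending the request with `urllib.request.urlopen(request)` and decoding the JSON response is left out so that the sketch runs without network access.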
### Text Processing

Advanced text processing capabilities including tokenization, sentence boundary detection, format preservation during translation, and byte pair encoding support for high-quality neural machine translation.

```python { .api }
# Tokenization interfaces
class Tokenizer:
    def encode(self, sentence: str) -> List[str]: ...
    def decode(self, tokens: List[str]) -> str: ...

class SentencePieceTokenizer(Tokenizer):
    def __init__(self, model_file: Path): ...

class BPETokenizer(Tokenizer):
    def __init__(self, model_file: Path, from_code: str, to_code: str): ...

# Sentence boundary detection
def get_sbd_package() -> Package | None: ...
def detect_sentence(input_text: str, sbd_translation, sentence_guess_length: int = 150) -> int: ...

# Format preservation
class ITag:
    translateable: bool
    def text(self) -> str: ...

class Tag(ITag):
    def __init__(self, children: ITag | str, translateable: bool = True): ...

def translate_preserve_formatting(underlying_translation: ITranslation, input_text: str) -> str: ...

# Byte Pair Encoding
class BPE:
    def __init__(self, codes, merges: int = -1, separator: str = '@@', vocab = None, glossaries = None): ...
    def segment(self, sentence): ...
```

[Text Processing](./text-processing.md)

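
The tag-based approach to format preservation can be sketched as a tree in which some nodes are marked as not translatable. The code below is a deliberately naive illustration, not the library's algorithm: `SketchTag` and `translate_tree` are hypothetical names, children are modeled as a list, and `str.upper` stands in for a real translation function.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class SketchTag:
    """Hypothetical tree node; the flag spelling follows the ITag field above."""

    children: List[Union["SketchTag", str]]
    translateable: bool = True

    def text(self) -> str:
        """Concatenate the text of all children, depth first."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


def translate_tree(tag: SketchTag, translate_fn: Callable[[str], str]) -> SketchTag:
    """Translate string leaves, leaving non-translatable subtrees untouched."""
    if not tag.translateable:
        return tag
    new_children = [
        translate_fn(c) if isinstance(c, str) else translate_tree(c, translate_fn)
        for c in tag.children
    ]
    return SketchTag(new_children, tag.translateable)


doc = SketchTag(["hello ", SketchTag(["<code>x = 1</code>"], translateable=False), " world"])
result = translate_tree(doc, str.upper)  # str.upper stands in for a translator
print(result.text())  # HELLO <code>x = 1</code> WORLD
```

Translating each leaf in isolation loses sentence context, which is presumably why a real implementation needs something more careful than this sketch.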
### Configuration and Settings

Configuration management including data directories, cache settings, remote repository URLs, device selection (CPU/CUDA), API keys, and experimental feature flags.

```python { .api }
# Configuration variables
debug: bool
data_dir: Path
package_data_dir: Path
cache_dir: Path
remote_repo: str
device: str
libretranslate_api_key: str
openai_api_key: str

# Model provider selection
class ModelProvider:
    OPENNMT = 0
    LIBRETRANSLATE = 1
    OPENAI = 2

model_provider: ModelProvider
```

[Configuration](./configuration.md)

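
Settings such as `device` are typically resolved from the environment before the library is imported. The sketch below shows one way an application might validate such a value up front. The environment variable name `ARGOS_DEVICE_TYPE` is taken from the upstream project's README rather than from this page, and the helper `resolve_device` and its allowed-value list are assumptions.

```python
import os
from typing import Mapping

# Assumed set of accepted values; not confirmed by this page.
ALLOWED_DEVICES = ("cpu", "cuda", "auto")


def resolve_device(env: Mapping[str, str] = os.environ) -> str:
    """Read the device type from a mapping of environment variables.

    Falls back to "cpu" when the variable is unset and rejects unknown values.
    """
    value = env.get("ARGOS_DEVICE_TYPE", "cpu").strip().lower()
    if value not in ALLOWED_DEVICES:
        raise ValueError(f"unsupported device type: {value!r}")
    return value


print(resolve_device({}))  # cpu
print(resolve_device({"ARGOS_DEVICE_TYPE": "CUDA"}))  # cuda
```

Passing the mapping as a parameter keeps the function testable without touching the real process environment.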
## Command Line Interfaces

### argos-translate CLI

Main command-line interface for performing translations directly from the terminal.

**Entry Point**: `argos-translate`

**Usage**: Provides command-line access to translation functionality with support for different languages and input/output options.

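
When driving the CLI from Python, it is usually safer to build the argument list explicitly than to interpolate a shell string. The sketch below only constructs and prints the command. The `--from-lang` and `--to-lang` flags reflect my understanding of the upstream CLI and are not documented on this page, so treat them as assumptions and verify them against `argos-translate --help`.

```python
import shlex
from typing import List


def build_translate_command(text: str, from_code: str, to_code: str) -> List[str]:
    """Build an argv list for the argos-translate entry point (flags assumed)."""
    return ["argos-translate", "--from-lang", from_code, "--to-lang", to_code, text]


command = build_translate_command("Hello World!", "en", "es")
print(shlex.join(command))  # argos-translate --from-lang en --to-lang es 'Hello World!'
```

The list can be passed to `subprocess.run(command, capture_output=True, text=True, check=True)` once the tool and a matching language package are installed.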
### argospm CLI

Package manager for installing and managing translation model packages.

**Entry Point**: `argospm`

**Common Commands**:

- Update package index
- Install packages
- List installed packages
- Search available packages
- Remove packages

## Error Handling

The library uses standard Python exceptions. Common error scenarios include:

- `FileNotFoundError`: When package files or directories are not found
- Network errors during package downloads
- Translation errors when unsupported language pairs are requested
- API authentication errors for external services

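
One way to handle the unsupported-language-pair case is to check for a missing translation object before calling it. The sketch below assumes, per the signature listed under Core Translation, that the lookup returns `None` when no translation is available. The exception class `UnsupportedLanguagePair` and the helper `translate_or_raise` are hypothetical, and a fake lookup is used so the example runs without installed models.

```python
from typing import Any, Callable, Optional


class UnsupportedLanguagePair(Exception):
    """Hypothetical application-level error for a missing translation path."""


def translate_or_raise(
    lookup: Callable[[str, str], Optional[Any]], text: str, from_code: str, to_code: str
) -> str:
    """Translate text, raising a descriptive error when the pair is unavailable."""
    translation = lookup(from_code, to_code)
    if translation is None:
        raise UnsupportedLanguagePair(
            f"No installed translation from {from_code!r} to {to_code!r}"
        )
    return translation.translate(text)


class _EchoTranslation:
    """Fake translation object used only for this demonstration."""

    def translate(self, input_text: str) -> str:
        return f"[es] {input_text}"


def fake_lookup(from_code: str, to_code: str) -> Optional[_EchoTranslation]:
    return _EchoTranslation() if (from_code, to_code) == ("en", "es") else None


print(translate_or_raise(fake_lookup, "Hello", "en", "es"))  # [es] Hello
try:
    translate_or_raise(fake_lookup, "Hello", "en", "xx")
except UnsupportedLanguagePair as exc:
    print(exc)  # No installed translation from 'en' to 'xx'
```

In real use, `translate.get_translation_from_codes` would be passed as `lookup`.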
## Supported Languages

Argostranslate supports over 30 languages with automatic pivoting capabilities. Language support depends on installed translation packages. The library includes a comprehensive language database (`languages.csv`) with ISO codes and names for 184 languages.

## Advanced Features

- **Translation Caching**: Automatic caching of translation results for improved performance
- **Sentence Boundary Detection**: Intelligent text segmentation for better translation quality
- **Format Preservation**: Maintains text formatting during translation using tag-based processing
- **BPE Tokenization**: Byte Pair Encoding support for improved translation accuracy
- **Few-shot Translation**: Integration with large language models for advanced translation scenarios