Fast inference engine for Transformer models
npx @tessl/cli install tessl/pypi-ctranslate2@4.6.0
# CTranslate2

A high-performance C++ and Python library for efficient inference with Transformer models, including encoder-decoder models (Transformer, BART, T5, Whisper), decoder-only models (GPT-2, Llama, Mistral), and encoder-only models (BERT, RoBERTa). The library implements a custom runtime that applies optimization techniques such as weight quantization, layer fusion, batch reordering, and careful memory management to accelerate inference and reduce memory usage on both CPU and GPU.

## Package Information

- **Package Name**: ctranslate2
- **Package Type**: PyPI
- **Language**: Python (with C++ backend)
- **Installation**: `pip install ctranslate2`

## Core Imports
```python
import ctranslate2
```

Common usage patterns:
```python
from ctranslate2 import Translator, Generator, Encoder, contains_model
from ctranslate2.converters import TransformersConverter
```
## Basic Usage
```python
import ctranslate2

# Translation example (seq2seq models); inputs are pre-tokenized
translator = ctranslate2.Translator("path/to/ct2_model", device="cpu")
results = translator.translate_batch([["Hello", "world"]])
print(results[0].hypotheses[0])  # List of translated tokens

# Generation example (language models); start tokens must belong to the model's vocabulary
generator = ctranslate2.Generator("path/to/ct2_model", device="cpu")
results = generator.generate_batch([["The", "quick", "brown"]])
print(results[0].sequences[0])  # List of generated tokens

# Model conversion example
converter = ctranslate2.converters.TransformersConverter("microsoft/DialoGPT-medium")
converter.convert("ct2_model_output")
```
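Because the batch APIs operate on pre-tokenized text, a tokenizer is needed on both sides of the model. A minimal round trip with a Hugging Face tokenizer (the model names here are placeholders; any converted seq2seq model works the same way):

```python
import ctranslate2
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
translator = ctranslate2.Translator("opus-mt-en-de-ct2", device="cpu")

# Encode to token strings, translate, then decode the output tokens
source = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello world!"))
results = translator.translate_batch([source])
target = results[0].hypotheses[0]

print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
```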
## Architecture
CTranslate2 follows a modular architecture; the sketch after the list below shows where each piece lives in the `ctranslate2` package:
- **Core Inference Classes**: `Translator`, `Generator`, `Encoder` for different model types
- **Model Converters**: framework-specific converters for Transformers, Fairseq, OpenNMT, etc.
- **Model Specifications**: programmatic model definition classes for building models from scratch
- **Specialized Models**: domain-specific classes such as `Whisper` for speech recognition
- **Storage and Configuration**: `StorageView` for efficient tensor operations, device management
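
A quick orientation to where these pieces live in the Python package (paths reflect the public `ctranslate2` namespace):

```python
import ctranslate2

ctranslate2.Translator        # core inference for seq2seq models
ctranslate2.Generator         # core inference for decoder-only models
ctranslate2.Encoder           # core inference for encoder-only models
ctranslate2.converters        # TransformersConverter, FairseqConverter, ...
ctranslate2.specs             # TransformerSpec and related model specifications
ctranslate2.models            # Whisper, Wav2Vec2, and other specialized models
ctranslate2.StorageView       # tensor exchange with NumPy and other frameworks
```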
## Capabilities
### Model Inference
Core inference functionality for running Transformer models with high performance. Supports translation, generation, and encoding tasks with batching, streaming, and asynchronous processing.
```python { .api }
class Translator:
    def __init__(self, model_path: str, device: str = "auto",
                 device_index: int = 0, compute_type: str = "default",
                 inter_threads: int = 1, intra_threads: int = 0,
                 max_queued_batches: int = 0, flash_attention: bool = False,
                 tensor_parallel: bool = False, files: dict = None): ...

    def translate_batch(self, source: list, target_prefix: list = None, **kwargs) -> list: ...
    def score_batch(self, source: list, target: list, **kwargs) -> list: ...

class Generator:
    def __init__(self, model_path: str, device: str = "auto",
                 device_index: int = 0, compute_type: str = "default",
                 inter_threads: int = 1, intra_threads: int = 0,
                 max_queued_batches: int = 0, flash_attention: bool = False,
                 tensor_parallel: bool = False, files: dict = None): ...

    def generate_batch(self, start_tokens: list, **kwargs) -> list: ...
    def score_batch(self, tokens: list, **kwargs) -> list: ...

class Encoder:
    def __init__(self, model_path: str, device: str = "auto",
                 device_index: int = 0, compute_type: str = "default",
                 inter_threads: int = 1, intra_threads: int = 0,
                 max_queued_batches: int = 0, files: dict = None): ...

    def forward_batch(self, inputs: list, **kwargs) -> list: ...
```
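A usage sketch for these classes (the model path is a placeholder; the options shown are standard `translate_batch` parameters):

```python
import ctranslate2

translator = ctranslate2.Translator("path/to/ct2_model", device="auto")

# Translate a batch with explicit decoding options
results = translator.translate_batch(
    [["Hello", "world"], ["How", "are", "you", "?"]],
    beam_size=4,         # wider beams trade speed for quality
    max_batch_size=32,   # split large inputs into sub-batches
    return_scores=True,  # attach a score to each hypothesis
)
for result in results:
    print(result.hypotheses[0], result.scores[0])

# Score existing target sequences instead of generating them
scores = translator.score_batch([["Hello", "world"]], [["Bonjour", "le", "monde"]])
print(scores[0].log_probs)  # per-token log probabilities
```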
[Model Inference](./inference.md)
### Model Conversion
Convert models from popular frameworks (Transformers, Fairseq, OpenNMT, etc.) to CTranslate2 format for optimized inference. Supports quantization, file copying, and various framework-specific options.
```python { .api }
class TransformersConverter:
    def __init__(self, model_name_or_path: str, activation_scales: str = None,
                 copy_files: list = None, load_as_float16: bool = False,
                 revision: str = None, low_cpu_mem_usage: bool = False,
                 trust_remote_code: bool = False): ...

    def convert(self, output_dir: str, vmap: str = None,
                quantization: str = None, force: bool = False): ...

# Additional converters
class FairseqConverter: ...
class OpenNMTPyConverter: ...
class OpenNMTTFConverter: ...
class MarianConverter: ...
class OpusMTConverter: ...
class OpenAIGPT2Converter: ...
```
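A typical conversion (the model name and output directory are examples):

```python
from ctranslate2.converters import TransformersConverter

# Convert a Hugging Face checkpoint to CTranslate2 format with int8 weights
converter = TransformersConverter("Helsinki-NLP/opus-mt-en-de")
converter.convert("opus-mt-en-de-ct2", quantization="int8", force=True)
```

The same conversion is available from the shell: `ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de --output_dir opus-mt-en-de-ct2 --quantization int8`.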
[Model Conversion](./converters.md)
### Model Specifications
Programmatically define and build Transformer model architectures from scratch. Supports various model types including sequence-to-sequence, decoder-only, and encoder-only models with extensive configuration options.
```python { .api }
class TransformerSpec:
    def __init__(self, encoder: TransformerEncoderSpec, decoder: TransformerDecoderSpec): ...
    @classmethod
    def from_config(cls, num_layers: int, num_heads: int, **kwargs): ...

    def save(self, output_dir: str): ...
    def validate(self): ...
    def optimize(self, quantization: str = None): ...

class TransformerDecoderModelSpec:
    def __init__(self, decoder: TransformerDecoderSpec): ...
    @classmethod
    def from_config(cls, num_layers: int, num_heads: int, **kwargs): ...

class TransformerEncoderModelSpec:
    def __init__(self, encoder: TransformerEncoderSpec, pooling_layer: bool = False): ...
```
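A minimal sketch of declaring an architecture programmatically; note that a spec's weight variables must be populated (as the converters do) before it can be validated or saved:

```python
from ctranslate2.specs import TransformerSpec

# Declare a base Transformer topology: 6 layers, 8 attention heads
spec = TransformerSpec.from_config(num_layers=6, num_heads=8)

# A converter would now assign trained weights to the spec's variables;
# calling spec.validate() or spec.save(...) before that raises an error
# because the variables are still unset.
```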
[Model Specifications](./specifications.md)
### Specialized Models
Domain-specific model classes for speech recognition and audio processing tasks. Includes Whisper for speech-to-text and Wav2Vec2 for speech representation learning.
```python { .api }
class Whisper:
    def __init__(self, model_path: str, device: str = "auto", **kwargs): ...
    def generate(self, features: StorageView, prompts: list, **kwargs) -> list: ...
    def detect_language(self, features: StorageView, **kwargs) -> list: ...

class Wav2Vec2:
    def __init__(self, model_path: str, device: str = "auto", **kwargs): ...
    def encode(self, features: StorageView, **kwargs) -> StorageView: ...

class Wav2Vec2Bert:
    def __init__(self, model_path: str, device: str = "auto", **kwargs): ...
    def encode(self, features: StorageView, **kwargs) -> StorageView: ...
```
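A transcription sketch for `Whisper` (the model path is a placeholder, and the random array stands in for a real 80-channel log-Mel spectrogram produced by an external feature extractor):

```python
import ctranslate2
import numpy as np

model = ctranslate2.models.Whisper("whisper-ct2", device="cpu")

# Stand-in for the log-Mel features of a 30-second audio window
mel = np.random.rand(1, 80, 3000).astype(np.float32)
features = ctranslate2.StorageView.from_array(mel)

# Most probable language for the segment
language, probability = model.detect_language(features)[0][0]

# Decode with the standard Whisper prompt tokens
prompt = ["<|startoftranscript|>", language, "<|transcribe|>", "<|notimestamps|>"]
results = model.generate(features, [prompt])
print(results[0].sequences[0])  # transcribed tokens
```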
[Specialized Models](./specialized.md)
### Utilities and Configuration
Helper functions for model management, device configuration, logging, and tensor operations. Includes utilities for checking model compatibility and managing computational resources.
```python { .api }
def contains_model(path: str) -> bool: ...
def get_cuda_device_count() -> int: ...
def get_supported_compute_types(device: str, device_index: int = 0) -> list: ...
def set_random_seed(seed: int): ...
def get_log_level() -> int: ...
def set_log_level(level: int): ...

class StorageView:
    def __init__(self, array=None, dtype=None): ...
    @classmethod
    def from_array(cls, array): ...
    def numpy(self): ...
    def copy(self): ...
    def to(self, dtype: str): ...

    @property
    def shape(self) -> tuple: ...
    @property
    def size(self) -> int: ...
    @property
    def dtype(self) -> str: ...
```
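A short tour of these helpers (the model path is a placeholder):

```python
import ctranslate2
import numpy as np

# Check whether a directory contains a converted CTranslate2 model
print(ctranslate2.contains_model("path/to/ct2_model"))

# Inspect available hardware and the compute types it supports
print(ctranslate2.get_cuda_device_count())
print(ctranslate2.get_supported_compute_types("cpu"))

# Wrap a NumPy array for exchange with the C++ runtime
array = np.ones((2, 4), dtype=np.float32)
view = ctranslate2.StorageView.from_array(array)
print(view.shape, view.dtype)
print(np.asarray(view))  # back to NumPy via the array interface
```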
[Utilities](./utilities.md)
## Types
```python { .api }
# Result classes
class TranslationResult:
    hypotheses: list[list[str]]
    scores: list[float]

class GenerationResult:
    sequences: list[list[str]]
    scores: list[float]

class ScoringResult:
    tokens: list[str]
    log_probs: list[float]

class GenerationStepResult:
    token: str
    token_id: int
    is_last: bool
    log_prob: float

class EncoderForwardOutput:
    last_hidden_state: StorageView
    pooler_output: StorageView

# Enumerations
class DataType:
    FLOAT32: str
    FLOAT16: str
    INT8: str
    INT16: str
    INT32: str

class Device:
    CPU: str
    CUDA: str
    AUTO: str

# Configuration classes
class ExecutionStats:
    num_tokens: int
    num_examples: int
    total_time_in_ms: float

class MpiInfo:
    rank: int
    size: int
```
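`GenerationStepResult` is what the streaming APIs yield one token at a time. A sketch using `Generator.generate_tokens`, available in recent releases (the model path and start tokens are placeholders):

```python
import ctranslate2

generator = ctranslate2.Generator("path/to/ct2_model", device="cpu")

# generate_tokens yields one GenerationStepResult per decoded token,
# enabling streaming output
for step in generator.generate_tokens(["The", "quick"], max_length=16):
    print(step.token, end="", flush=True)
    if step.is_last:
        print()
```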