# Model Conversion

Convert models from popular frameworks (Transformers, Fairseq, OpenNMT, etc.) to the CTranslate2 format for optimized inference. The converters support quantization, copying auxiliary files (such as tokenizer data) into the output directory, and framework-specific loading options.

## Capabilities

### Transformers Converter

Convert Hugging Face Transformers models to CTranslate2 format. Supports most popular model architectures, including BERT, GPT-2, T5, and BART.

```python { .api }
class TransformersConverter:
    def __init__(self, model_name_or_path: str, activation_scales: str = None,
                 copy_files: list = None, load_as_float16: bool = False,
                 revision: str = None, low_cpu_mem_usage: bool = False,
                 trust_remote_code: bool = False):
        """
        Initialize converter for Hugging Face Transformers models.

        Args:
            model_name_or_path (str): Model name on the Hub or local path
            activation_scales (str): Path to activation scales for SmoothQuant
            copy_files (list): Additional files to copy to the output directory
            load_as_float16 (bool): Load model weights in float16
            revision (str): Model revision/branch to use
            low_cpu_mem_usage (bool): Enable low CPU memory loading
            trust_remote_code (bool): Allow custom code execution
        """

    def convert(self, output_dir: str, vmap: str = None,
                quantization: str = None, force: bool = False) -> str:
        """
        Convert the model to CTranslate2 format.

        Args:
            output_dir (str): Output directory for the converted model
            vmap (str): Path to a vocabulary mapping file
            quantization (str): Quantization type (e.g., "int8", "int8_float16", "int16", "float16")
            force (bool): Overwrite the output directory if it exists

        Returns:
            str: Path to the converted model directory
        """

    def convert_from_args(self, args) -> str:
        """
        Convert the model using parsed command-line arguments.

        Args:
            args: Parsed arguments object

        Returns:
            str: Path to the converted model directory
        """

    @staticmethod
    def declare_arguments(parser):
        """
        Add converter-specific arguments to an argument parser.

        Args:
            parser: ArgumentParser instance to modify
        """
```

### Fairseq Converter

Convert Fairseq models to CTranslate2 format. Supports various Fairseq model architectures.

```python { .api }
class FairseqConverter:
    def __init__(self, model_path: str, data_dir: str = None):
        """
        Initialize converter for Fairseq models.

        Args:
            model_path (str): Path to Fairseq model checkpoint
            data_dir (str): Path to data directory with vocabularies
        """

    def convert(self, output_dir: str, vmap: str = None,
                quantization: str = None, force: bool = False) -> str:
        """
        Convert the Fairseq model to CTranslate2 format.

        Args:
            output_dir (str): Output directory for converted model
            vmap (str): Path to vocabulary mapping file
            quantization (str): Quantization type
            force (bool): Overwrite output directory if it exists

        Returns:
            str: Path to the converted model directory
        """
```

### OpenNMT Converters

Convert OpenNMT-py and OpenNMT-tf models to CTranslate2 format.

```python { .api }
class OpenNMTPyConverter:
    def __init__(self, model_path: str):
        """
        Initialize converter for OpenNMT-py models.

        Args:
            model_path (str): Path to OpenNMT-py model file
        """

    def convert(self, output_dir: str, vmap: str = None,
                quantization: str = None, force: bool = False) -> str:
        """Convert the OpenNMT-py model to CTranslate2 format."""

class OpenNMTTFConverter:
    def __init__(self, model_path: str):
        """
        Initialize converter for OpenNMT-tf models.

        Args:
            model_path (str): Path to OpenNMT-tf model checkpoint
        """

    def convert(self, output_dir: str, vmap: str = None,
                quantization: str = None, force: bool = False) -> str:
        """Convert the OpenNMT-tf model to CTranslate2 format."""
```

### Marian Converter

Convert Marian NMT models to CTranslate2 format.

```python { .api }
class MarianConverter:
    def __init__(self, model_path: str):
        """
        Initialize converter for Marian models.

        Args:
            model_path (str): Path to Marian model directory
        """

    def convert(self, output_dir: str, vmap: str = None,
                quantization: str = None, force: bool = False) -> str:
        """Convert the Marian model to CTranslate2 format."""
```

### OPUS-MT Converter

Convert OPUS-MT models to CTranslate2 format.

```python { .api }
class OpusMTConverter:
    def __init__(self, model_name: str):
        """
        Initialize converter for OPUS-MT models.

        Args:
            model_name (str): OPUS-MT model name from Hugging Face Hub
        """

    def convert(self, output_dir: str, vmap: str = None,
                quantization: str = None, force: bool = False) -> str:
        """Convert the OPUS-MT model to CTranslate2 format."""
```

### OpenAI GPT-2 Converter

Convert OpenAI GPT-2 models to CTranslate2 format.

```python { .api }
class OpenAIGPT2Converter:
    def __init__(self, model_name: str = "124M"):
        """
        Initialize converter for OpenAI GPT-2 models.

        Args:
            model_name (str): GPT-2 model size ("124M", "355M", "774M", "1558M")
        """

    def convert(self, output_dir: str, vmap: str = None,
                quantization: str = None, force: bool = False) -> str:
        """Convert the GPT-2 model to CTranslate2 format."""
```

### Base Converter Class

All converters inherit from this base class providing common functionality.

```python { .api }
class Converter:
    """Abstract base class for model converters."""

    def convert(self, output_dir: str, vmap: str = None,
                quantization: str = None, force: bool = False) -> str:
        """
        Convert model to CTranslate2 format.

        Args:
            output_dir (str): Output directory for converted model
            vmap (str): Path to vocabulary mapping file
            quantization (str): Quantization type
            force (bool): Overwrite output directory if it exists

        Returns:
            str: Path to the converted model directory
        """

    def convert_from_args(self, args) -> str:
        """
        Convert model using parsed command-line arguments.

        Args:
            args: Parsed arguments object with conversion parameters

        Returns:
            str: Path to the converted model directory
        """

    @staticmethod
    def declare_arguments(parser):
        """
        Add common converter arguments to argument parser.

        Args:
            parser: ArgumentParser instance to modify
        """
```
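
The two CLI-facing methods are meant to be used together: `declare_arguments()` registers the common conversion options on a parser, and `convert_from_args()` reads them back. A minimal sketch of how a conversion script is typically wired, assuming the base-class contract documented above (the `--model` flag here is added by the script itself, not by `declare_arguments()`):

```python
import argparse

import ctranslate2

parser = argparse.ArgumentParser(description="Convert a model to CTranslate2.")
parser.add_argument("--model", required=True, help="Model name or local path.")

# Register the shared options (output directory, quantization, force, ...).
ctranslate2.converters.TransformersConverter.declare_arguments(parser)
args = parser.parse_args()

# Build the converter from the script-specific flag, then let the base class
# drive the conversion from the parsed arguments.
converter = ctranslate2.converters.TransformersConverter(args.model)
output_dir = converter.convert_from_args(args)
print("Converted model written to", output_dir)
```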

## Console Scripts

CTranslate2 provides command-line tools for model conversion:

```python { .api }
# Available console scripts (entry points):
# ct2-transformers-converter - Convert Transformers models
# ct2-fairseq-converter - Convert Fairseq models
# ct2-opennmt-py-converter - Convert OpenNMT-py models
# ct2-opennmt-tf-converter - Convert OpenNMT-tf models
# ct2-marian-converter - Convert Marian models
# ct2-opus-mt-converter - Convert OPUS-MT models
# ct2-openai-gpt2-converter - Convert OpenAI GPT-2 models
```
243
244
## Conversion Utilities
245
246
Helper functions for model conversion and optimization.
247
248
```python { .api }
249
def fuse_linear(spec, layers: list):
250
"""
251
Fuse multiple linear layers for optimization.
252
253
Args:
254
spec: Model specification object
255
layers (list): List of linear layers to fuse
256
"""
257
258
def fuse_linear_prequant(spec, layers: list, axis: int):
259
"""
260
Fuse pre-quantized linear layers.
261
262
Args:
263
spec: Model specification object
264
layers (list): List of pre-quantized linear layers
265
axis (int): Axis along which to fuse
266
"""
267
268
def permute_for_sliced_rotary(weight, num_heads: int, rotary_dim: int = None):
269
"""
270
Permute weights for rotary position embeddings.
271
272
Args:
273
weight: Weight tensor to permute
274
num_heads (int): Number of attention heads
275
rotary_dim (int): Rotary embedding dimension
276
277
Returns:
278
Permuted weight tensor
279
"""
280
281
def smooth_activation(layer_norm, linear, activation_scales):
282
"""
283
Apply SmoothQuant activation smoothing technique.
284
285
Args:
286
layer_norm: Layer normalization module
287
linear: Linear layer module
288
activation_scales: Activation scaling factors
289
"""
290
```
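
These helpers are used inside converters when mapping framework weights onto a CTranslate2 model spec. A hypothetical sketch of `fuse_linear`, assuming it concatenates the weights of the given `LinearSpec` layers into `spec` (the `make_linear` helper and the random weights are illustrative only; actual handling of biases and scales may differ):

```python
import numpy as np

from ctranslate2.converters import utils
from ctranslate2.specs import common_spec

hidden = 512

def make_linear(out_dim, in_dim):
    # Hypothetical helper: build a standalone LinearSpec with random weights.
    spec = common_spec.LinearSpec()
    spec.weight = np.random.rand(out_dim, in_dim).astype(np.float32)
    spec.bias = np.zeros(out_dim, dtype=np.float32)
    return spec

# Separate Q/K/V projections from the source model...
q, k, v = (make_linear(hidden, hidden) for _ in range(3))

# ...fused into a single projection so the converted model runs one GEMM
# instead of three.
fused = common_spec.LinearSpec()
utils.fuse_linear(fused, [q, k, v])
print(fused.weight.shape)  # expected: (3 * hidden, hidden)
```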

## Usage Examples

### Converting Transformers Models

```python
import ctranslate2

# Convert a Hugging Face model
converter = ctranslate2.converters.TransformersConverter("microsoft/DialoGPT-medium")
converter.convert("ct2_model", quantization="int8")

# Convert with additional options
converter = ctranslate2.converters.TransformersConverter(
    "t5-small",
    copy_files=["config.json", "tokenizer.json"],
    load_as_float16=True
)
converter.convert("t5_ct2", quantization="int8_float16")

# Convert local model
converter = ctranslate2.converters.TransformersConverter("/path/to/local/model")
converter.convert("output_dir", force=True)
```
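
A converted directory is then loaded with the regular CTranslate2 runtime classes. A minimal sketch for the DialoGPT conversion above (a GPT-2 style decoder, so it loads as a `Generator`; the tokenizer still comes from Transformers):

```python
import ctranslate2
import transformers

generator = ctranslate2.Generator("ct2_model")  # directory produced above
tokenizer = transformers.AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")

# Generation works on token strings, not token ids.
prompt = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello there!"))
results = generator.generate_batch([prompt], max_length=64)
print(tokenizer.decode(results[0].sequences_ids[0]))
```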

### Converting Other Frameworks

```python
import ctranslate2

# Convert Fairseq model
fairseq_converter = ctranslate2.converters.FairseqConverter(
    "checkpoint_best.pt",
    data_dir="data-bin/wmt14_en_de"
)
fairseq_converter.convert("fairseq_ct2")

# Convert OpenNMT-py model
opennmt_converter = ctranslate2.converters.OpenNMTPyConverter("model.pt")
opennmt_converter.convert("opennmt_ct2")

# Convert OPUS-MT model
opus_converter = ctranslate2.converters.OpusMTConverter("Helsinki-NLP/opus-mt-en-de")
opus_converter.convert("opus_ct2")
```
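
The converters without a dedicated example above follow the same pattern. A sketch based on the signatures documented earlier (all paths are placeholders):

```python
import ctranslate2

# Convert Marian model
marian_converter = ctranslate2.converters.MarianConverter("/path/to/marian/model")
marian_converter.convert("marian_ct2")

# Convert OpenNMT-tf checkpoint
opennmt_tf_converter = ctranslate2.converters.OpenNMTTFConverter("/path/to/checkpoint")
opennmt_tf_converter.convert("opennmt_tf_ct2")

# Convert an original OpenAI GPT-2 checkpoint, selected by size
gpt2_converter = ctranslate2.converters.OpenAIGPT2Converter("355M")
gpt2_converter.convert("gpt2_ct2")
```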

### Using Command Line Tools

```bash
# Convert Transformers model
ct2-transformers-converter --model microsoft/DialoGPT-medium --output_dir ct2_model --quantization int8

# Convert with custom options
ct2-transformers-converter \
    --model t5-small \
    --output_dir t5_ct2 \
    --quantization int8_float16 \
    --copy_files config.json tokenizer.json \
    --load_as_float16

# Convert Fairseq model
ct2-fairseq-converter \
    --model_path checkpoint_best.pt \
    --data_dir data-bin/wmt14_en_de \
    --output_dir fairseq_ct2 \
    --quantization int8
```

### Quantization Options

```python
import ctranslate2

# Quantization types accepted by convert():
quantization_options = [
    "int8",           # 8-bit integer weights
    "int8_float32",   # 8-bit weights, float32 compute
    "int8_float16",   # 8-bit weights, float16 compute
    "int8_bfloat16",  # 8-bit weights, bfloat16 compute
    "int16",          # 16-bit integer weights
    "float16",        # 16-bit floating point
    "bfloat16",       # 16-bit brain floating point
]

# Example with different quantization levels
converter = ctranslate2.converters.TransformersConverter("gpt2")

# Fastest inference, smallest model
converter.convert("gpt2_int8", quantization="int8")

# Balanced speed/quality
converter.convert("gpt2_fp16", quantization="float16")

# Highest quality, larger model
converter.convert("gpt2_fp32")  # No quantization (default)
```
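
The quantization picked at conversion time is not binding at load time: if the target device does not support the stored type, CTranslate2 converts the weights to the closest supported type when the model is loaded. The compute types the local hardware supports can be queried directly:

```python
import ctranslate2

# Compute types the local CPU can run, e.g. {"float32", "int16", "int8", ...}
print(ctranslate2.get_supported_compute_types("cpu"))

# Same query for the first GPU, if one is available
if ctranslate2.get_cuda_device_count() > 0:
    print(ctranslate2.get_supported_compute_types("cuda"))
```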

## Types

```python { .api }
# Quantization types
class Quantization:
    CT2: str       # Standard CTranslate2 quantization
    AWQ_GEMM: str  # AWQ quantization with GEMM
    AWQ_GEMV: str  # AWQ quantization with GEMV
```
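
The AWQ entries refer to models that were already quantized with AWQ before conversion; in that case the scheme is read from the checkpoint itself rather than requested through the `quantization` argument. A hedged sketch (the model name is a placeholder for any AWQ checkpoint on the Hub):

```python
import ctranslate2

# Hypothetical: convert a checkpoint pre-quantized with AWQ. No quantization
# argument is passed; the GEMM/GEMV scheme comes from the checkpoint config.
converter = ctranslate2.converters.TransformersConverter("org/model-awq")
converter.convert("awq_ct2_model")
```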