# CTranslate2

CTranslate2 is a C++ and Python library for efficient inference with Transformer models. It supports encoder-decoder models (Transformer, BART, T5, Whisper), decoder-only models (GPT-2, Llama, Mistral), and encoder-only models (BERT, RoBERTa). Its custom runtime applies optimizations such as weight quantization, layer fusion, batch reordering, and memory management to speed up inference and reduce memory usage on both CPU and GPU.
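
Weight quantization, one of the optimizations named above, can be illustrated with a small pure-Python sketch. It shows symmetric int8 quantization with one scale per row. The function names are hypothetical and this is not CTranslate2's actual kernel code; it only illustrates why quantized weights need less memory at a small cost in precision.

```python
def quantize_row(row):
    """Map floats to int8-range integers plus one float scale (symmetric scheme)."""
    max_abs = max((abs(x) for x in row), default=0.0)
    # Guard against an all-zero row, which would otherwise divide by zero.
    scale = max_abs / 127.0 if max_abs else 1.0
    quantized = [max(-127, min(127, round(x / scale))) for x in row]
    return quantized, scale


def dequantize_row(quantized, scale):
    """Recover approximate float values from the quantized row."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
qrow, scale = quantize_row(weights)
restored = dequantize_row(qrow, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), while the row is stored as 8-bit integers plus a single float.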

## Package Information

- **Package Name**: ctranslate2
- **Package Type**: PyPI
- **Language**: Python (with C++ backend)
- **Installation**: `pip install ctranslate2`

## Core Imports

```python
import ctranslate2
```

Common usage patterns:

```python
from ctranslate2 import Translator, Generator, Encoder
from ctranslate2 import contains_model
# Converters live in the ctranslate2.converters submodule
from ctranslate2.converters import TransformersConverter
```

## Basic Usage

```python
import ctranslate2

# Translation example (seq2seq models). Inputs are lists of tokens, not raw strings.
translator = ctranslate2.Translator("path/to/ct2_model", device="cpu")
results = translator.translate_batch([["Hello", "world"]])
print(results[0].hypotheses[0])  # Best hypothesis, as a list of tokens

# Generation example (language models). Prompts are also lists of tokens.
generator = ctranslate2.Generator("path/to/ct2_model", device="cpu")
results = generator.generate_batch([["The", "quick", "brown"]])
print(results[0].sequences[0])  # Generated sequence, as a list of tokens

# Model conversion example
converter = ctranslate2.converters.TransformersConverter("microsoft/DialoGPT-medium")
converter.convert("ct2_model_output")
```

## Architecture

CTranslate2 follows a modular architecture:

- **Core Inference Classes**: `Translator`, `Generator`, `Encoder` for different model types
- **Model Converters**: Framework-specific converters for Transformers, Fairseq, OpenNMT, etc.
- **Model Specifications**: Programmatic model definition classes for building models from scratch
- **Specialized Models**: Domain-specific classes like `Whisper` for speech recognition
- **Storage and Configuration**: `StorageView` for efficient tensor operations, device management

## Capabilities

### Model Inference

Core inference functionality for running Transformer models with high performance. Supports translation, generation, and encoding tasks with batching, streaming, and asynchronous processing.

```python { .api }
class Translator:
    def __init__(self, model_path: str, device: str = "cpu",
                 device_index: int = 0, compute_type: str = "default",
                 inter_threads: int = 1, intra_threads: int = 0,
                 max_queued_batches: int = 0, flash_attention: bool = False,
                 tensor_parallel: bool = False, files: dict = None): ...

    def translate_batch(self, source: list, target_prefix: list = None, **kwargs) -> list: ...
    def score_batch(self, source: list, target: list, **kwargs) -> list: ...

class Generator:
    def __init__(self, model_path: str, device: str = "cpu",
                 device_index: int = 0, compute_type: str = "default",
                 inter_threads: int = 1, intra_threads: int = 0,
                 max_queued_batches: int = 0, flash_attention: bool = False,
                 tensor_parallel: bool = False, files: dict = None): ...

    def generate_batch(self, start_tokens: list, **kwargs) -> list: ...
    def score_batch(self, tokens: list, **kwargs) -> list: ...

class Encoder:
    def __init__(self, model_path: str, device: str = "cpu",
                 device_index: int = 0, compute_type: str = "default",
                 inter_threads: int = 1, intra_threads: int = 0,
                 max_queued_batches: int = 0, files: dict = None): ...

    def forward_batch(self, inputs: list, **kwargs) -> list: ...
```

[Model Inference](./inference.md)

### Model Conversion

Convert models from popular frameworks (Transformers, Fairseq, OpenNMT, etc.) to CTranslate2 format for optimized inference. Supports quantization, file copying, and various framework-specific options.

```python { .api }
class TransformersConverter:
    def __init__(self, model_name_or_path: str, activation_scales: str = None,
                 copy_files: list = None, load_as_float16: bool = False,
                 revision: str = None, low_cpu_mem_usage: bool = False,
                 trust_remote_code: bool = False): ...

    def convert(self, output_dir: str, vmap: str = None,
                quantization: str = None, force: bool = False): ...

# Additional converters
class FairseqConverter: ...
class OpenNMTPyConverter: ...
class OpenNMTTFConverter: ...
class MarianConverter: ...
class OpusMTConverter: ...
class OpenAIGPT2Converter: ...
```

[Model Conversion](./converters.md)
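
Conversion can be slow for large models, so a common pattern is to convert only when the output directory does not already hold a model. The sketch below takes the converter and the model check as arguments, so it runs with simple stand-ins; the helper name `ensure_converted` and the stand-in classes are hypothetical.

```python
def ensure_converted(converter, output_dir, contains_model, quantization=None):
    """Convert into output_dir unless it already holds a model.

    Returns True if a conversion was run, False if it was skipped.
    """
    if contains_model(output_dir):
        return False
    converter.convert(output_dir, quantization=quantization)
    return True


class _RecordingConverter:
    """Stand-in converter that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def convert(self, output_dir, quantization=None):
        self.calls.append((output_dir, quantization))


existing_models = {"already_converted"}
demo_converter = _RecordingConverter()
ran_first = ensure_converted(
    demo_converter, "new_model", existing_models.__contains__, quantization="int8"
)
ran_second = ensure_converted(
    demo_converter, "already_converted", existing_models.__contains__
)
```

With the real library, the arguments would presumably be a `TransformersConverter(...)` instance and `ctranslate2.contains_model`.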

### Model Specifications

Programmatically define and build Transformer model architectures from scratch. Supports various model types including sequence-to-sequence, decoder-only, and encoder-only models with extensive configuration options.

```python { .api }
class TransformerSpec:
    def __init__(self, encoder: TransformerEncoderSpec, decoder: TransformerDecoderSpec): ...
    @classmethod
    def from_config(cls, num_layers: int, num_heads: int, **kwargs): ...

    def save(self, output_dir: str): ...
    def validate(self): ...
    def optimize(self, quantization: str = None): ...

class TransformerDecoderModelSpec:
    def __init__(self, decoder: TransformerDecoderSpec): ...
    @classmethod
    def from_config(cls, num_layers: int, num_heads: int, **kwargs): ...

class TransformerEncoderModelSpec:
    def __init__(self, encoder: TransformerEncoderSpec, pooling_layer: bool = False): ...
```

[Model Specifications](./specifications.md)

### Specialized Models

Domain-specific model classes for speech recognition and audio processing tasks. Includes Whisper for speech-to-text and Wav2Vec2 for speech representation learning.

```python { .api }
class Whisper:
    def __init__(self, model_path: str, device: str = "cpu", **kwargs): ...
    # Transcription is exposed as generate(), which takes audio features and prompt tokens
    def generate(self, features: list, prompts: list, **kwargs) -> list: ...
    def detect_language(self, features: list, **kwargs) -> list: ...

class Wav2Vec2:
    def __init__(self, model_path: str, device: str = "cpu", **kwargs): ...
    def encode(self, features: list, **kwargs) -> list: ...

class Wav2Vec2Bert:
    def __init__(self, model_path: str, device: str = "cpu", **kwargs): ...
    def encode(self, features: list, **kwargs) -> list: ...
```

[Specialized Models](./specialized.md)
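
As a small illustration of consuming `detect_language` output, the sketch below picks the most likely language per audio input. It assumes the method returns, for each input, a list of `(language_token, probability)` pairs; that shape, the helper name `top_languages`, and the sample values are assumptions made for illustration.

```python
def top_languages(detections):
    """Return the highest-probability (language, probability) pair per input."""
    return [max(candidates, key=lambda pair: pair[1]) for candidates in detections]


# Hypothetical detection output for two audio inputs.
sample_detections = [
    [("<|en|>", 0.9), ("<|de|>", 0.1)],
    [("<|en|>", 0.4), ("<|fr|>", 0.6)],
]
best = top_languages(sample_detections)
```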

### Utilities and Configuration

Helper functions for model management, device configuration, logging, and tensor operations. Includes utilities for checking model compatibility and managing computational resources.

```python { .api }
def contains_model(path: str) -> bool: ...
def get_cuda_device_count() -> int: ...
def get_supported_compute_types(device: str, device_index: int = 0) -> set: ...
def set_random_seed(seed: int): ...
# Log levels are Python logging levels (e.g. logging.INFO)
def get_log_level() -> int: ...
def set_log_level(level: int): ...

class StorageView:
    def __init__(self, array=None, dtype=None): ...
    def numpy(self): ...
    def copy(self): ...
    def to(self, dtype: str): ...

    @property
    def shape(self) -> tuple: ...
    @property
    def size(self) -> int: ...
    @property
    def dtype(self) -> str: ...
```

[Utilities](./utilities.md)
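
One use of `get_supported_compute_types` is to pick the most compact compute type a device supports before loading a model. The sketch below is a pure function over the returned collection; the helper name `choose_compute_type` and the preference order are assumptions, not library behavior.

```python
def choose_compute_type(
    supported,
    preferences=("int8_float16", "float16", "int8", "float32"),
):
    """Return the first preferred compute type that the device supports."""
    for candidate in preferences:
        if candidate in supported:
            return candidate
    # Fall back to the library's own choice.
    return "default"


cpu_choice = choose_compute_type({"float32", "int8"})
gpu_choice = choose_compute_type({"float32", "float16", "int8", "int8_float16"})
fallback = choose_compute_type(set())
```

In practice the first argument would come from `ctranslate2.get_supported_compute_types("cpu")` or the `"cuda"` equivalent, and the result would be passed as `compute_type` when constructing a `Translator` or `Generator`.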

## Types

```python { .api }
# Result classes
class TranslationResult:
    hypotheses: list[list[str]]  # each hypothesis is a list of tokens
    scores: list[float]

class GenerationResult:
    sequences: list[list[str]]
    scores: list[float]

class ScoringResult:
    tokens: list[str]
    log_probs: list[float]

class GenerationStepResult:
    token: str
    token_id: int
    is_last: bool
    log_prob: float

class EncoderForwardOutput:
    last_hidden_state: StorageView
    pooler_output: StorageView

# Enumerations
class DataType:
    FLOAT32: str
    FLOAT16: str
    INT8: str
    INT16: str
    INT32: str

class Device:
    CPU: str
    CUDA: str
    AUTO: str

# Configuration classes
class ExecutionStats:
    num_tokens: int
    num_examples: int
    total_time_in_ms: float

class MpiInfo:
    rank: int
    size: int
```
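
To show how the `hypotheses` and `scores` fields of a translation result fit together, the sketch below pairs each hypothesis with its score. It assumes each hypothesis is a list of token strings and that `scores` may be empty when scores were not requested. The helper name `format_hypotheses` is hypothetical, and `SimpleNamespace` stands in for a real result object.

```python
from types import SimpleNamespace


def format_hypotheses(result, detokenize=" ".join):
    """Return (text, score) pairs; score is None when no scores are available."""
    scores = result.scores or [None] * len(result.hypotheses)
    return [
        (detokenize(tokens), score)
        for tokens, score in zip(result.hypotheses, scores)
    ]


scored = SimpleNamespace(
    hypotheses=[["Hallo", "Welt"], ["Hallo", "Erde"]],
    scores=[-0.5, -1.25],
)
unscored = SimpleNamespace(hypotheses=[["Hallo", "Welt"]], scores=[])

scored_pairs = format_hypotheses(scored)
unscored_pairs = format_hypotheses(unscored)
```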