
# Chat Completions

Advanced conversational AI interface supporting text, image, and video inputs with streaming capabilities, comprehensive configuration options, and both synchronous and asynchronous operations.

## Capabilities

### Basic Chat Completion

Creates chat completions with conversational context and message history.

```python { .api }
def create(
    *,
    messages: List[Dict[str, Any]],
    model: str,
    max_tokens: Optional[int] = None,
    stop: Optional[List[str]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    repetition_penalty: Optional[float] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    min_p: Optional[float] = None,
    logit_bias: Optional[Dict[str, float]] = None,
    seed: Optional[int] = None,
    stream: bool = False,
    logprobs: Optional[int] = None,
    echo: Optional[bool] = None,
    n: Optional[int] = None,
    safety_model: Optional[str] = None,
    response_format: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[Union[str, Dict[str, Union[str, Dict[str, str]]]]] = None,
    **kwargs
) -> Union[ChatCompletionResponse, Iterator[ChatCompletionChunk]]:
    """
    Create a chat completion with conversational messages.

    Args:
        messages: List of message objects with role and content (Dict[str, Any])
        model: Model identifier for chat completion
        max_tokens: Maximum number of tokens to generate in the response
        stop: List of stop sequences that end generation
        temperature: Sampling temperature (0.0 to 2.0)
        top_p: Nucleus sampling probability threshold
        top_k: Top-k sampling parameter
        repetition_penalty: Penalty for repeating tokens
        presence_penalty: Penalty for token presence (-2.0 to 2.0)
        frequency_penalty: Penalty for token frequency (-2.0 to 2.0)
        min_p: Minimum probability threshold for token consideration (0.0 to 1.0)
        logit_bias: Modify likelihood of specific tokens (-100 to 100)
        seed: Seed for reproducible generation
        stream: Enable streaming of response chunks
        logprobs: Number of log probabilities to return
        echo: Include the prompt in the response along with its logprobs
        n: Number of completion choices to generate
        safety_model: Safety moderation model to apply
        response_format: Output format specification
        tools: List of tool definitions for function calling
        tool_choice: Control tool selection behavior

    Returns:
        ChatCompletionResponse, or Iterator[ChatCompletionChunk] when streaming
    """
```

### Multi-Modal Chat

Supports messages with text, image, and video content in conversational context.

```python { .api }
def create(
    model: str,
    messages: List[Dict[str, Union[str, List[dict]]]],
    **kwargs
) -> ChatCompletionResponse:
    """
    Create multi-modal chat completions with images and video.

    Message content can be:
    - String for text-only messages
    - List of content objects for multi-modal messages

    Content object types:
    - {"type": "text", "text": str}
    - {"type": "image_url", "image_url": {"url": str}}
    - {"type": "video_url", "video_url": {"url": str}}
    """
```

### Streaming Chat

Real-time streaming of chat completion responses as they are generated.

```python { .api }
def create(
    model: str,
    messages: List[dict],
    stream: bool = True,
    **kwargs
) -> Iterator[ChatCompletionChunk]:
    """
    Stream chat completion chunks in real-time.

    Returns:
        Iterator yielding ChatCompletionChunk objects
    """
```

### Async Chat Completion

Asynchronous chat completion operations for concurrent processing.

```python { .api }
async def create(
    model: str,
    messages: List[dict],
    **kwargs
) -> ChatCompletionResponse:
    """
    Asynchronously create chat completions.

    Returns:
        ChatCompletionResponse with generated content
    """
```

## Usage Examples

### Simple Text Chat

```python
from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=300,
    temperature=0.7
)

print(response.choices[0].message.content)
```

### Multi-Modal Chat with Image

```python
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.jpg"
                }
            }
        ]
    }],
    max_tokens=200
)

print(response.choices[0].message.content)
```

### Video Analysis

```python
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Describe what happens in this video."
            },
            {
                "type": "video_url",
                "video_url": {
                    "url": "https://example.com/video.mp4"
                }
            }
        ]
    }],
    max_tokens=500
)

print(response.choices[0].message.content)
```

### Streaming Chat

```python
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Write a short story about AI"}],
    stream=True,
    max_tokens=500
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

### Async Chat Processing

```python
import asyncio
from together import AsyncTogether

async def process_chats():
    client = AsyncTogether()

    messages_list = [
        [{"role": "user", "content": "Explain machine learning"}],
        [{"role": "user", "content": "What is deep learning?"}],
        [{"role": "user", "content": "How do neural networks work?"}]
    ]

    tasks = [
        client.chat.completions.create(
            model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
            messages=messages,
            max_tokens=200
        )
        for messages in messages_list
    ]

    responses = await asyncio.gather(*tasks)

    for i, response in enumerate(responses):
        print(f"Response {i+1}: {response.choices[0].message.content}")

asyncio.run(process_chats())
```
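
### Function Calling with Tools

The `tools` and `tool_choice` parameters accept OpenAI-style function definitions. Below is a minimal sketch of assembling a tool-calling request; the `get_weather` schema and the `build_tool_request` helper are hypothetical, for illustration only:

```python
# Hypothetical tool schema in the OpenAI-compatible format that the
# `tools` parameter expects (the name and fields are illustrative).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    },
}

def build_tool_request(user_prompt: str) -> dict:
    # Assemble keyword arguments for client.chat.completions.create().
    return {
        "model": "meta-llama/Llama-3.2-3B-Instruct-Turbo",
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [get_weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

# Usage (requires a configured Together client):
# response = client.chat.completions.create(
#     **build_tool_request("What's the weather in Paris?")
# )
```

Whether the model actually emits a tool call depends on the model and on `tool_choice`; `"auto"` leaves the decision to the model.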

### Logprobs Analysis

```python
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
    messages=[{"role": "user", "content": "The capital of France is"}],
    logprobs=3,
    max_tokens=10
)

logprobs_data = response.choices[0].logprobs
for token, logprob in zip(logprobs_data.tokens, logprobs_data.token_logprobs):
    print(f"Token: '{token}', Log Probability: {logprob}")
```
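
### Structured JSON Output

The `response_format` parameter can request structured output. A minimal sketch assuming the OpenAI-compatible `{"type": "json_object"}` JSON mode (model support varies; check the endpoint docs for your model — the `build_json_request` helper is illustrative):

```python
import json

def build_json_request(question: str) -> dict:
    # Assemble keyword arguments for client.chat.completions.create().
    return {
        "model": "meta-llama/Llama-3.2-3B-Instruct-Turbo",
        "messages": [
            {"role": "system", "content": "Respond only with a JSON object."},
            {"role": "user", "content": question},
        ],
        # OpenAI-compatible JSON mode (assumed supported by the endpoint)
        "response_format": {"type": "json_object"},
    }

# Usage (requires a configured Together client):
# response = client.chat.completions.create(**build_json_request("List three primes."))
# data = json.loads(response.choices[0].message.content)
```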

## Types

### Request Types

```python { .api }
class ChatCompletionRequest:
    model: str
    messages: List[dict]
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    top_k: Optional[int] = None
    repetition_penalty: Optional[float] = None
    stream: bool = False
    logprobs: Optional[int] = None
    echo: Optional[bool] = None
    n: Optional[int] = None
    presence_penalty: Optional[float] = None
    frequency_penalty: Optional[float] = None
    logit_bias: Optional[Dict[str, float]] = None
    stop: Optional[Union[str, List[str]]] = None
    safety_model: Optional[str] = None
```

### Response Types

```python { .api }
class ChatCompletionResponse:
    id: str
    object: str
    created: int
    model: str
    choices: List[ChatChoice]
    usage: Usage

class ChatChoice:
    index: int
    message: ChatMessage
    finish_reason: Optional[str]
    logprobs: Optional[Logprobs]

class ChatMessage:
    role: str
    content: Optional[str]

class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class Logprobs:
    tokens: List[str]
    token_logprobs: List[Optional[float]]
    top_logprobs: Optional[List[Dict[str, float]]]
```

### Streaming Types

```python { .api }
class ChatCompletionChunk:
    id: str
    object: str
    created: int
    model: str
    choices: List[ChatChoiceDelta]

class ChatChoiceDelta:
    index: int
    delta: ChatDelta
    finish_reason: Optional[str]

class ChatDelta:
    role: Optional[str]
    content: Optional[str]
```

### Message Content Types

```python { .api }
class TextContent:
    type: Literal["text"]
    text: str

class ImageContent:
    type: Literal["image_url"]
    image_url: ImageUrl

class VideoContent:
    type: Literal["video_url"]
    video_url: VideoUrl

class ImageUrl:
    url: str
    detail: Optional[Literal["low", "high", "auto"]] = None

class VideoUrl:
    url: str
```