# Text Completions

Generate text completions using legacy completion models. This API is superseded by the Chat Completions API for most use cases, but remains available for compatibility with specific completion-optimized models.

## Capabilities

### Create Completion

Generate a text completion for a given prompt.

```python { .api }
def create(
    self,
    *,
    model: str,
    prompt: str | list[str] | list[int] | list[list[int]],
    best_of: int | Omit = omit,
    echo: bool | Omit = omit,
    frequency_penalty: float | Omit = omit,
    logit_bias: dict[str, int] | Omit = omit,
    logprobs: int | Omit = omit,
    max_tokens: int | Omit = omit,
    n: int | Omit = omit,
    presence_penalty: float | Omit = omit,
    seed: int | Omit = omit,
    stop: str | list[str] | Omit = omit,
    stream: bool | Omit = omit,
    stream_options: dict | None | Omit = omit,
    suffix: str | Omit = omit,
    temperature: float | Omit = omit,
    top_p: float | Omit = omit,
    user: str | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> Completion | Stream[Completion]:
    """
    Generate a text completion for a prompt.

    NOTE: Most use cases are better served by the Chat Completions API.
    This endpoint is primarily for completion-specific models like gpt-3.5-turbo-instruct.

    Args:
        model: Model ID. Supported models:
            - "gpt-3.5-turbo-instruct": Instruction-tuned model optimized for completions
            - "davinci-002": Legacy Davinci model
            - "babbage-002": Legacy Babbage model

        prompt: Text prompt(s) to complete. Can be:
            - Single string: "Once upon a time"
            - List of strings: ["Story 1", "Story 2"]
            - Token array: [123, 456, 789]
            - List of token arrays: [[123, 456], [789, 12]]

        best_of: Generates best_of completions server-side and returns the best one.
            Must be greater than n. Used with temperature for better quality.
            Cannot be used with stream.

        echo: If true, echoes the prompt in addition to the completion.

        frequency_penalty: Number between -2.0 and 2.0. Penalizes tokens based on
            their frequency in the text so far. Default 0.

        logit_bias: Modify token probabilities. Maps token IDs to bias values
            from -100 to 100.

        logprobs: Include log probabilities of the most likely tokens.
            Returns logprobs number of most likely tokens per position.
            Maximum value is 5.

        max_tokens: Maximum tokens to generate. Default 16.

        n: Number of completions to generate. Default 1.

        presence_penalty: Number between -2.0 and 2.0. Penalizes tokens based on
            whether they appear in the text so far. Default 0.

        seed: For deterministic sampling (Beta). The same seed and parameters
            should return the same result. Not guaranteed.

        stop: Up to 4 sequences where generation stops. Can be a string or a list.

        stream: If true, returns an SSE stream of partial completions.

        stream_options: Streaming configuration. Accepts a dict with:
            - "include_usage": bool - If true, includes token usage in the final chunk
            - "include_obfuscation": bool - If true (default), adds random characters
              to an obfuscation field on streaming delta events to normalize payload
              sizes as a mitigation for side-channel attacks. Set to false to save
              bandwidth.

        suffix: Text that comes after the completion. Useful for inserting text.

        temperature: Sampling temperature between 0 and 2. Higher values make
            output more random. Default 1. Alter this or top_p, not both.

        top_p: Nucleus sampling parameter between 0 and 1. Default 1.
            Alter this or temperature, not both.

        user: Unique end-user identifier for abuse monitoring.

        extra_headers: Additional HTTP headers.
        extra_query: Additional query parameters.
        extra_body: Additional JSON fields.
        timeout: Request timeout in seconds.

    Returns:
        Completion: If stream=False (default), returns the complete response.
        Stream[Completion]: If stream=True, returns a streaming response.

    Raises:
        BadRequestError: Invalid parameters
        AuthenticationError: Invalid API key
        RateLimitError: Rate limit exceeded
    """
```

Usage examples:

```python
from openai import OpenAI

client = OpenAI()

# Basic completion
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Once upon a time",
    max_tokens=50
)

print(response.choices[0].text)

# Multiple prompts
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=[
        "The capital of France is",
        "The largest ocean on Earth is"
    ],
    max_tokens=10
)

for i, choice in enumerate(response.choices):
    print(f"Completion {i + 1}: {choice.text}")

# With suffix for text insertion
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="def calculate_sum(a, b):\n    \"\"\"",
    suffix="\n    return a + b",
    max_tokens=50
)

print(response.choices[0].text)

# With logprobs
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="The weather today is",
    max_tokens=5,
    logprobs=2,
    echo=True
)

# Access log probabilities
for token, logprob in zip(response.choices[0].logprobs.tokens,
                          response.choices[0].logprobs.token_logprobs):
    print(f"Token: {token}, Log Prob: {logprob}")

# Streaming completion
stream = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Write a short poem about coding:",
    max_tokens=100,
    stream=True
)

for chunk in stream:
    if chunk.choices[0].text:
        print(chunk.choices[0].text, end="", flush=True)

# With best_of for higher quality
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Explain quantum computing in simple terms:",
    max_tokens=100,
    best_of=3,
    n=1,
    temperature=0.8
)

# With stop sequences
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="List three colors:\n1.",
    max_tokens=50,
    stop=["\n4.", "\n\n"]
)

# Deterministic completion with seed
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Generate a random number:",
    seed=42,
    max_tokens=10
)
```
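Since `RateLimitError` is one of the documented failure modes, production code often wraps calls in a small retry helper. A minimal sketch with exponential backoff; the helper name and backoff parameters are illustrative, not part of the SDK:

```python
import time


def with_retries(call, retriable=(Exception,), max_attempts=3, base_delay=1.0):
    """Invoke call() and retry on the given exception types with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
```

In practice you would pass `lambda: client.completions.create(...)` as `call` and `(RateLimitError,)` as `retriable`.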

## Types

```python { .api }
from typing import Generic, Iterator, Literal, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class Completion(BaseModel):
    """Completion response."""
    id: str
    choices: list[CompletionChoice]
    created: int
    model: str
    object: Literal["text_completion"]
    system_fingerprint: str | None
    usage: CompletionUsage | None


class CompletionChoice(BaseModel):
    """Single completion choice."""
    finish_reason: Literal["stop", "length", "content_filter"]
    index: int
    logprobs: Logprobs | None
    text: str


class Logprobs(BaseModel):
    """Log probability information."""
    text_offset: list[int]
    token_logprobs: list[float | None]
    tokens: list[str]
    top_logprobs: list[dict[str, float]] | None


class CompletionUsage(BaseModel):
    """Token usage statistics."""
    completion_tokens: int
    prompt_tokens: int
    total_tokens: int


# Stream wrapper
class Stream(Generic[T]):
    def __iter__(self) -> Iterator[T]: ...
    def __next__(self) -> T: ...
    def __enter__(self) -> Stream[T]: ...
    def __exit__(self, *args) -> None: ...
```
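The models above mirror the raw JSON payload returned by the endpoint. A quick sketch of navigating that shape with plain dicts; the field names follow the models above, while the concrete values are made up for illustration:

```python
# Hypothetical raw completion payload (values are illustrative)
sample = {
    "id": "cmpl-abc123",
    "object": "text_completion",
    "created": 1700000000,
    "model": "gpt-3.5-turbo-instruct",
    "choices": [
        {
            "index": 0,
            "text": " there was a robot.",
            "finish_reason": "stop",
            "logprobs": None,
        }
    ],
    "usage": {"prompt_tokens": 4, "completion_tokens": 5, "total_tokens": 9},
}

# The generated text lives under choices[i].text
text = sample["choices"][0]["text"]
# usage totals cover both prompt and completion tokens
total = sample["usage"]["total_tokens"]
```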

## Migration to Chat Completions

For most use cases, the Chat Completions API is recommended:

```python
# Legacy Completions
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Translate to French: Hello"
)

# Equivalent Chat Completion
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Translate to French: Hello"}
    ]
)
```
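One concrete difference when migrating is where the generated text appears in the response: legacy completions expose `choices[i].text`, while chat completions expose `choices[i].message.content`. A dict-shaped sketch (the payload fragments are illustrative):

```python
# Illustrative payload fragments for the same French translation
legacy = {"choices": [{"text": "Bonjour"}]}
chat = {"choices": [{"message": {"content": "Bonjour"}}]}

legacy_text = legacy["choices"][0]["text"]            # legacy Completions path
chat_text = chat["choices"][0]["message"]["content"]  # Chat Completions path
```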

## Async Usage

```python
import asyncio

from openai import AsyncOpenAI


async def complete_text():
    client = AsyncOpenAI()

    response = await client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt="Once upon a time",
        max_tokens=50
    )

    return response.choices[0].text


text = asyncio.run(complete_text())
```
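A common reason to use the async client is fanning out several prompts concurrently with `asyncio.gather`. A minimal sketch; the `complete` stand-in below only echoes its prompt so the example runs without an API key, and in real use its body would `await client.completions.create(...)` as shown above:

```python
import asyncio


async def complete(prompt: str) -> str:
    # Stand-in for:
    #   response = await client.completions.create(
    #       model="gpt-3.5-turbo-instruct", prompt=prompt, max_tokens=50)
    #   return response.choices[0].text
    await asyncio.sleep(0)
    return f"completion for: {prompt}"


async def complete_many(prompts: list[str]) -> list[str]:
    # gather() schedules all requests at once instead of awaiting them serially
    return list(await asyncio.gather(*(complete(p) for p in prompts)))


results = asyncio.run(complete_many(["Once upon a time", "The capital of France is"]))
```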
