# Legacy Completions

Legacy text completion API for prompt-continuation interactions. Supports text generation with parameters including temperature, top-p sampling, frequency penalties, and custom stop sequences. This API follows the traditional completion format, where the model continues from a given prompt.

## Capabilities

### Text Completion Creation

Creates text completions using the traditional prompt-based format with extensive configuration options for controlling generation behavior.

```python { .api }
def create(
    self,
    *,
    model: str,
    best_of: Optional[int] = NOT_GIVEN,
    echo: Optional[bool] = NOT_GIVEN,
    frequency_penalty: Optional[float] = NOT_GIVEN,
    logit_bias: Optional[Dict[str, int]] = NOT_GIVEN,
    logprobs: Optional[int] = NOT_GIVEN,
    max_tokens: Optional[int] = NOT_GIVEN,
    n: Optional[int] = NOT_GIVEN,
    presence_penalty: Optional[float] = NOT_GIVEN,
    prompt: Union[str, List[str], List[int], List[List[int]], None] = NOT_GIVEN,
    seed: Optional[int] = NOT_GIVEN,
    stop: Union[Optional[str], List[str], None] = NOT_GIVEN,
    stream: Optional[Literal[False]] | NotGiven = NOT_GIVEN,
    stream_options: Optional[completion_create_params.StreamOptions] | NotGiven = NOT_GIVEN,
    suffix: Optional[str] = NOT_GIVEN,
    temperature: Optional[float] = NOT_GIVEN,
    top_p: Optional[float] = NOT_GIVEN,
    user: str | NotGiven = NOT_GIVEN,
    grammar_root: Optional[str] = NOT_GIVEN,
    return_raw_tokens: Optional[bool] = NOT_GIVEN,
    **kwargs
) -> Completion:
    """
    Create a text completion.

    Parameters:
    - model: ID of the model to use (e.g., "llama3.1-70b")
    - best_of: Generate N completions server-side and return the best one
    - echo: Echo back the prompt in addition to the completion
    - frequency_penalty: Penalty for frequent token usage (-2.0 to 2.0)
    - logit_bias: Modify likelihood of specific tokens appearing
    - logprobs: Include log probabilities on most likely tokens (0-5)
    - max_tokens: Maximum number of tokens to generate
    - n: Number of completion choices to generate
    - presence_penalty: Penalty for token presence (-2.0 to 2.0)
    - prompt: Text prompt(s) to complete (string, list of strings, or token arrays)
    - seed: Random seed for deterministic generation
    - stop: Sequences where generation should stop
    - stream: Enable streaming response (use stream=True for streaming)
    - stream_options: Additional streaming options
    - suffix: Text that comes after the completion (for insertion tasks)
    - temperature: Sampling temperature (0.0 to 2.0)
    - top_p: Nucleus sampling parameter (0.0 to 1.0)
    - user: Unique identifier for the end-user
    - grammar_root: Grammar rule for structured output generation
    - return_raw_tokens: Return raw tokens instead of decoded text

    Returns:
    Completion object with generated text
    """
```

### Streaming Text Completion

Creates streaming text completions for real-time token generation.

```python { .api }
def create(
    self,
    *,
    model: str,
    prompt: Union[str, List[str], List[int], List[List[int]], None],
    stream: Literal[True],
    **kwargs
) -> Stream[CompletionChunk]:
    """
    Create a streaming text completion.

    Parameters:
    - stream: Must be True for streaming responses
    - All other parameters are the same as the non-streaming create()

    Returns:
    Stream object yielding CompletionChunk objects
    """
```

### Resource Classes

Synchronous and asynchronous resource classes that provide the completions API methods.

```python { .api }
class CompletionsResource(SyncAPIResource):
    """Synchronous completions resource."""

    @cached_property
    def with_raw_response(self) -> CompletionsResourceWithRawResponse: ...

    @cached_property
    def with_streaming_response(self) -> CompletionsResourceWithStreamingResponse: ...

class AsyncCompletionsResource(AsyncAPIResource):
    """Asynchronous completions resource."""

    @cached_property
    def with_raw_response(self) -> AsyncCompletionsResourceWithRawResponse: ...

    @cached_property
    def with_streaming_response(self) -> AsyncCompletionsResourceWithStreamingResponse: ...
```
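
The `with_raw_response` wrapper gives access to the HTTP layer in addition to the parsed result. A minimal sketch, assuming the SDK follows the usual generated-client pattern where the wrapped response exposes `headers` and a `parse()` method (verify against your installed SDK version):

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

# Call through with_raw_response to inspect the HTTP response;
# .headers and .parse() follow the convention of similar generated
# SDKs and are an assumption here, not confirmed by this document.
raw = client.completions.with_raw_response.create(
    model="llama3.1-70b",
    prompt="Hello,",
    max_tokens=10,
)
print(raw.headers.get("content-type"))

completion = raw.parse()  # recover the parsed Completion object
print(completion.choices[0].text)
```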

## Parameter Types

### Completion Parameters

```python { .api }
class CompletionCreateParams(TypedDict, total=False):
    """Parameters for creating text completions."""
    model: Required[str]

    best_of: Optional[int]
    echo: Optional[bool]
    frequency_penalty: Optional[float]
    logit_bias: Optional[Dict[str, int]]
    logprobs: Optional[int]
    max_tokens: Optional[int]
    n: Optional[int]
    presence_penalty: Optional[float]
    prompt: Union[str, List[str], List[int], List[List[int]], None]
    seed: Optional[int]
    stop: Union[Optional[str], List[str], None]
    stream: Optional[bool]
    stream_options: Optional[StreamOptions]
    suffix: Optional[str]
    temperature: Optional[float]
    top_p: Optional[float]
    user: Optional[str]

class StreamOptions(TypedDict, total=False):
    """Options for streaming completions."""
    include_usage: Optional[bool]
```
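
Setting `include_usage` asks the server to report token accounting on a streamed response. A short sketch, assuming (as in comparable streaming APIs) that `usage` is populated only on the final chunk, which may carry no choices:

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

stream = client.completions.create(
    model="llama3.1-70b",
    prompt="Streaming with usage:",
    max_tokens=50,
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Guard against a final usage-only chunk with no choices
    if chunk.choices:
        print(chunk.choices[0].text, end="", flush=True)
    # usage is assumed to arrive only on the last chunk
    if chunk.usage is not None:
        print(f"\nTotal tokens: {chunk.usage.total_tokens}")
```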

## Response Types

### Completion Response

```python { .api }
class Completion(BaseModel):
    """Complete text completion response."""
    id: str
    choices: List[CompletionChoice]
    created: int
    model: str
    object: Literal["text_completion"]
    system_fingerprint: Optional[str]
    usage: Optional[CompletionUsage]

class CompletionChoice(BaseModel):
    """Individual completion choice."""
    finish_reason: Optional[Literal["stop", "length", "content_filter"]]
    index: int
    logprobs: Optional[CompletionLogprobs]
    text: str

class CompletionUsage(BaseModel):
    """Token usage information."""
    completion_tokens: int
    prompt_tokens: int
    total_tokens: int

class CompletionLogprobs(BaseModel):
    """Log probability information."""
    text_offset: List[int]
    token_logprobs: List[Optional[float]]
    tokens: List[str]
    top_logprobs: Optional[List[Dict[str, float]]]
```
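
The `finish_reason` field distinguishes a natural stop from truncation: `"stop"` indicates the model hit a stop sequence or finished on its own, while `"length"` means generation exhausted `max_tokens`. A small example of detecting truncated output:

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="Summarize the water cycle:",
    max_tokens=20,  # deliberately small to provoke truncation
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # Output was cut off; retry with a larger budget if completeness matters
    print("Truncated:", choice.text)
else:
    print("Complete:", choice.text)
```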

### Streaming Response Types

```python { .api }
class CompletionChunk(BaseModel):
    """Streaming chunk in text completion."""
    id: str
    choices: List[CompletionChunkChoice]
    created: int
    model: str
    object: Literal["text_completion"]
    system_fingerprint: Optional[str]
    usage: Optional[CompletionUsage]

class CompletionChunkChoice(BaseModel):
    """Choice in streaming chunk."""
    finish_reason: Optional[Literal["stop", "length", "content_filter"]]
    index: int
    logprobs: Optional[CompletionLogprobs]
    text: str
```

## Usage Examples

### Basic Text Completion

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="The future of artificial intelligence is",
    max_tokens=100,
    temperature=0.7,
    stop=["\n", "."]
)

print(response.choices[0].text)
print(f"Used {response.usage.total_tokens} tokens")
```

### Text Completion with Multiple Choices

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="Complete this sentence: The most important skill in programming is",
    max_tokens=50,
    n=3,  # Generate 3 different completions
    temperature=0.8
)

for i, choice in enumerate(response.choices):
    print(f"Option {i+1}: {choice.text.strip()}")
```

### Streaming Text Completion

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

stream = client.completions.create(
    model="llama3.1-70b",
    prompt="Write a short poem about machine learning:",
    max_tokens=200,
    stream=True,
    temperature=0.8
)

print("Poem:", end="")
for chunk in stream:
    if chunk.choices[0].text:
        print(chunk.choices[0].text, end="", flush=True)
print()
```

### Text Completion with Log Probabilities

```python
import math

from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="The capital of France is",
    max_tokens=10,
    logprobs=5,  # Return top 5 log probabilities
    temperature=0.1
)

choice = response.choices[0]
print(f"Generated text: {choice.text}")

if choice.logprobs:
    print("\nToken probabilities:")
    for token, logprob in zip(choice.logprobs.tokens, choice.logprobs.token_logprobs):
        if logprob is not None:
            # Convert a log probability to a percentage: p = e^logprob
            probability = round(100 * math.exp(logprob), 2)
            print(f"  '{token}': {probability}%")
```

### Text Insertion (with Suffix)

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

# Fill in the function body between the prompt and the suffix
response = client.completions.create(
    model="llama3.1-70b",
    prompt="def fibonacci(n):\n    ",
    suffix="\n    return result",
    max_tokens=100,
    temperature=0.3
)

print("Generated code:")
print(response.choices[0].text)
```

### Best-of Sampling

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="Explain quantum computing in simple terms:",
    max_tokens=150,
    best_of=5,  # Generate 5 completions server-side, return the best one
    n=1,        # Return only the best completion
    temperature=0.8
)

print("Best completion:")
print(response.choices[0].text)
```

### Async Text Completion

```python
import asyncio
from cerebras.cloud.sdk import AsyncCerebras

async def complete_text():
    client = AsyncCerebras()

    response = await client.completions.create(
        model="llama3.1-70b",
        prompt="The benefits of renewable energy include",
        max_tokens=100,
        temperature=0.6
    )

    print(response.choices[0].text)
    await client.close()  # release the underlying HTTP connection

asyncio.run(complete_text())
```

### Batch Completions

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

prompts = [
    "The advantages of solar power are",
    "Wind energy is beneficial because",
    "Hydroelectric power works by"
]

response = client.completions.create(
    model="llama3.1-70b",
    prompt=prompts,  # Multiple prompts in one request
    max_tokens=50,
    temperature=0.5
)

for i, choice in enumerate(response.choices):
    print(f"Prompt {i+1} completion: {choice.text.strip()}")
```

### Frequency and Presence Penalties

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="List the planets in our solar system:",
    max_tokens=100,
    frequency_penalty=0.5,  # Reduce repetition
    presence_penalty=0.3,   # Encourage new topics
    temperature=0.7
)

print(response.choices[0].text)
```

### Stop Sequences

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="Q: What is photosynthesis?\nA:",
    max_tokens=200,
    stop=["Q:", "\n\n"],  # Stop at next question or double newline
    temperature=0.5
)

print(f"Answer: {response.choices[0].text.strip()}")
```
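
### Deterministic Generation with Seed

A sketch of seed-based reproducibility. Determinism is typically best-effort: the same `seed` with identical parameters should reproduce the same text only while the serving configuration is unchanged, which is what comparing `system_fingerprint` values lets you check:

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

# Two requests with the same seed and parameters should yield the
# same text, assuming the backend configuration has not changed.
params = dict(
    model="llama3.1-70b",
    prompt="A haiku about the ocean:",
    max_tokens=30,
    temperature=0.7,
    seed=1234,
)

first = client.completions.create(**params)
second = client.completions.create(**params)

print(first.choices[0].text)
print("Reproduced:", first.choices[0].text == second.choices[0].text)
print("Same fingerprint:", first.system_fingerprint == second.system_fingerprint)
```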