# Legacy Completions

Legacy text completion API for traditional completion-style interactions. It supports text generation with various parameters, including temperature, top-p sampling, frequency penalties, and custom stop sequences, and follows the traditional completion format in which the model continues from a given prompt.

## Capabilities

### Text Completion Creation

Creates text completions using the traditional prompt-based format, with extensive configuration options for controlling generation behavior.

```python { .api }
def create(
    self,
    *,
    model: str,
    best_of: Optional[int] = NOT_GIVEN,
    echo: Optional[bool] = NOT_GIVEN,
    frequency_penalty: Optional[float] = NOT_GIVEN,
    logit_bias: Optional[Dict[str, int]] = NOT_GIVEN,
    logprobs: Optional[int] = NOT_GIVEN,
    max_tokens: Optional[int] = NOT_GIVEN,
    n: Optional[int] = NOT_GIVEN,
    presence_penalty: Optional[float] = NOT_GIVEN,
    prompt: Union[str, List[str], List[int], List[List[int]], None] = NOT_GIVEN,
    seed: Optional[int] = NOT_GIVEN,
    stop: Union[Optional[str], List[str], None] = NOT_GIVEN,
    stream: Optional[Literal[False]] | NotGiven = NOT_GIVEN,
    stream_options: Optional[completion_create_params.StreamOptions] | NotGiven = NOT_GIVEN,
    suffix: Optional[str] = NOT_GIVEN,
    temperature: Optional[float] = NOT_GIVEN,
    top_p: Optional[float] = NOT_GIVEN,
    user: str | NotGiven = NOT_GIVEN,
    grammar_root: Optional[str] = NOT_GIVEN,
    return_raw_tokens: Optional[bool] = NOT_GIVEN,
    **kwargs
) -> Completion:
    """
    Create a text completion.

    Parameters:
    - model: ID of the model to use (e.g., "llama3.1-70b")
    - best_of: Generate best_of completions server-side and return the best one
    - echo: Echo back the prompt in addition to the completion
    - frequency_penalty: Penalty for frequent token usage (-2.0 to 2.0)
    - logit_bias: Modify the likelihood of specific tokens appearing
    - logprobs: Include log probabilities for the most likely tokens (0-5)
    - max_tokens: Maximum number of tokens to generate
    - n: Number of completion choices to generate
    - presence_penalty: Penalty for token presence (-2.0 to 2.0)
    - prompt: Text prompt(s) to complete (string, list of strings, or token arrays)
    - seed: Random seed for deterministic generation
    - stop: Sequences at which generation should stop
    - stream: Enable a streaming response (pass stream=True)
    - stream_options: Additional streaming options
    - suffix: Text that comes after the completion (for insertion tasks)
    - temperature: Sampling temperature (0.0 to 2.0)
    - top_p: Nucleus sampling parameter (0.0 to 1.0)
    - user: Unique identifier for the end user
    - grammar_root: Grammar rule for structured output generation
    - return_raw_tokens: Return raw tokens instead of decoded text

    Returns:
        Completion object with generated text
    """
```

### Streaming Text Completion

Creates streaming text completions for real-time token generation.

```python { .api }
def create(
    self,
    *,
    model: str,
    prompt: Union[str, List[str], List[int], List[List[int]], None],
    stream: Literal[True],
    **kwargs
) -> Stream[CompletionChunk]:
    """
    Create a streaming text completion.

    Parameters:
    - stream: Must be True for streaming responses
    - All other parameters are the same as for the non-streaming create()

    Returns:
        Stream object yielding CompletionChunk objects
    """
```

### Resource Classes

Synchronous and asynchronous resource classes that provide the completions API methods.

```python { .api }
class CompletionsResource(SyncAPIResource):
    """Synchronous completions resource."""

    @cached_property
    def with_raw_response(self) -> CompletionsResourceWithRawResponse: ...

    @cached_property
    def with_streaming_response(self) -> CompletionsResourceWithStreamingResponse: ...

class AsyncCompletionsResource(AsyncAPIResource):
    """Asynchronous completions resource."""

    @cached_property
    def with_raw_response(self) -> AsyncCompletionsResourceWithRawResponse: ...

    @cached_property
    def with_streaming_response(self) -> AsyncCompletionsResourceWithStreamingResponse: ...
```
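
The `with_raw_response` accessor wraps a call so HTTP-level details stay available alongside the parsed result. A minimal sketch, assuming the SDK follows the common Stainless-style wrapper pattern in which the raw-response object exposes a `parse()` method that recovers the typed `Completion`:

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

# with_raw_response keeps HTTP-level details accessible alongside the
# parsed result; parse() is assumed here to return the typed Completion.
raw = client.completions.with_raw_response.create(
    model="llama3.1-70b",
    prompt="Hello,",
    max_tokens=10,
)
completion = raw.parse()
print(completion.choices[0].text)
```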

## Parameter Types

### Completion Parameters

```python { .api }
class CompletionCreateParams(TypedDict, total=False):
    """Parameters for creating text completions."""
    model: Required[str]

    best_of: Optional[int]
    echo: Optional[bool]
    frequency_penalty: Optional[float]
    logit_bias: Optional[Dict[str, int]]
    logprobs: Optional[int]
    max_tokens: Optional[int]
    n: Optional[int]
    presence_penalty: Optional[float]
    prompt: Union[str, List[str], List[int], List[List[int]], None]
    seed: Optional[int]
    stop: Union[Optional[str], List[str], None]
    stream: Optional[bool]
    stream_options: Optional[StreamOptions]
    suffix: Optional[str]
    temperature: Optional[float]
    top_p: Optional[float]
    user: Optional[str]

class StreamOptions(TypedDict, total=False):
    """Options for streaming completions."""
    include_usage: Optional[bool]
```
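
The `include_usage` flag requests token-usage accounting for streamed responses. A sketch, assuming the common convention that usage arrives on a final chunk whose `usage` field is populated (and whose `choices` list may be empty):

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

stream = client.completions.create(
    model="llama3.1-70b",
    prompt="Streaming with usage accounting:",
    max_tokens=50,
    stream=True,
    stream_options={"include_usage": True},  # request usage on the final chunk
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].text:
        print(chunk.choices[0].text, end="", flush=True)
    if chunk.usage:  # populated once streaming finishes (assumed convention)
        print(f"\nTotal tokens: {chunk.usage.total_tokens}")
```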

## Response Types

### Completion Response

```python { .api }
class Completion(BaseModel):
    """Complete text completion response."""
    id: str
    choices: List[CompletionChoice]
    created: int
    model: str
    object: Literal["text_completion"]
    system_fingerprint: Optional[str]
    usage: Optional[CompletionUsage]

class CompletionChoice(BaseModel):
    """Individual completion choice."""
    finish_reason: Optional[Literal["stop", "length", "content_filter"]]
    index: int
    logprobs: Optional[CompletionLogprobs]
    text: str

class CompletionUsage(BaseModel):
    """Token usage information."""
    completion_tokens: int
    prompt_tokens: int
    total_tokens: int

class CompletionLogprobs(BaseModel):
    """Log probability information."""
    text_offset: List[int]
    token_logprobs: List[Optional[float]]
    tokens: List[str]
    top_logprobs: Optional[List[Dict[str, float]]]
```
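
The `finish_reason` field distinguishes a natural stop from truncation, so it is worth checking before trusting the output. A brief sketch using the fields defined above:

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="Summarize the water cycle:",
    max_tokens=40,
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # Generation hit max_tokens; the text is likely cut off mid-thought.
    print("Warning: completion was truncated; consider raising max_tokens.")
print(choice.text)
```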

### Streaming Response Types

```python { .api }
class CompletionChunk(BaseModel):
    """Streaming chunk in a text completion."""
    id: str
    choices: List[CompletionChunkChoice]
    created: int
    model: str
    object: Literal["text_completion"]
    system_fingerprint: Optional[str]
    usage: Optional[CompletionUsage]

class CompletionChunkChoice(BaseModel):
    """Choice in a streaming chunk."""
    finish_reason: Optional[Literal["stop", "length", "content_filter"]]
    index: int
    logprobs: Optional[CompletionLogprobs]
    text: str
```

## Usage Examples

### Basic Text Completion

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="The future of artificial intelligence is",
    max_tokens=100,
    temperature=0.7,
    stop=["\n", "."]
)

print(response.choices[0].text)
print(f"Used {response.usage.total_tokens} tokens")
```

### Text Completion with Multiple Choices

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="Complete this sentence: The most important skill in programming is",
    max_tokens=50,
    n=3,  # Generate 3 different completions
    temperature=0.8
)

for i, choice in enumerate(response.choices):
    print(f"Option {i+1}: {choice.text.strip()}")
```

### Streaming Text Completion

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

stream = client.completions.create(
    model="llama3.1-70b",
    prompt="Write a short poem about machine learning:",
    max_tokens=200,
    stream=True,
    temperature=0.8
)

print("Poem: ", end="")
for chunk in stream:
    if chunk.choices and chunk.choices[0].text:
        print(chunk.choices[0].text, end="", flush=True)
print()
```

### Text Completion with Log Probabilities

```python
import math

from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="The capital of France is",
    max_tokens=10,
    logprobs=5,  # Return log probabilities for the top 5 tokens
    temperature=0.1
)

choice = response.choices[0]
print(f"Generated text: {choice.text}")

if choice.logprobs:
    print("\nToken probabilities:")
    for token, logprob in zip(choice.logprobs.tokens, choice.logprobs.token_logprobs):
        if logprob is not None:
            probability = round(100 * math.exp(logprob), 2)  # Convert log probability to a percentage
            print(f"  '{token}': {probability}%")
```
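
The `top_logprobs` field holds the alternatives the model weighed at each position. Continuing the snippet above (and assuming `top_logprobs` is populated when `logprobs` is set):

```python
if choice.logprobs and choice.logprobs.top_logprobs:
    # Each entry maps candidate tokens to their log probabilities at that position.
    first_position = choice.logprobs.top_logprobs[0]
    for token, logprob in sorted(first_position.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  candidate '{token}': logprob {logprob:.3f}")
```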

### Text Insertion (with Suffix)

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

# Fill in the function body between the prompt and the suffix
response = client.completions.create(
    model="llama3.1-70b",
    prompt="def fibonacci(n):\n    ",
    suffix="\n    return result",
    max_tokens=100,
    temperature=0.3
)

print("Generated code:")
print(response.choices[0].text)
```

### Best-of Sampling

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="Explain quantum computing in simple terms:",
    max_tokens=150,
    best_of=5,  # Generate 5 completions server-side
    n=1,        # Return only the best one
    temperature=0.8
)

print("Best completion:")
print(response.choices[0].text)
```
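
### Deterministic Generation with a Seed

The `seed` parameter requests reproducible sampling: repeated calls with the same seed and otherwise identical parameters should return the same text. In most completion APIs this is a best-effort guarantee rather than an absolute one; the sketch below assumes that convention.

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

# Two calls with identical parameters and the same seed should match.
params = dict(
    model="llama3.1-70b",
    prompt="A fun fact about octopuses:",
    max_tokens=40,
    temperature=0.7,
    seed=1234,
)

first = client.completions.create(**params)
second = client.completions.create(**params)
print(first.choices[0].text == second.choices[0].text)  # Expected: True (best effort)
```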

### Async Text Completion

```python
import asyncio
from cerebras.cloud.sdk import AsyncCerebras

async def complete_text():
    client = AsyncCerebras()

    response = await client.completions.create(
        model="llama3.1-70b",
        prompt="The benefits of renewable energy include",
        max_tokens=100,
        temperature=0.6
    )

    print(response.choices[0].text)
    await client.close()

asyncio.run(complete_text())
```

### Batch Completions

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

prompts = [
    "The advantages of solar power are",
    "Wind energy is beneficial because",
    "Hydroelectric power works by"
]

response = client.completions.create(
    model="llama3.1-70b",
    prompt=prompts,  # Multiple prompts in one request
    max_tokens=50,
    temperature=0.5
)

for i, choice in enumerate(response.choices):
    print(f"Prompt {i+1} completion: {choice.text.strip()}")
```

### Frequency and Presence Penalties

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="List the planets in our solar system:",
    max_tokens=100,
    frequency_penalty=0.5,  # Reduce repetition
    presence_penalty=0.3,   # Encourage new topics
    temperature=0.7
)

print(response.choices[0].text)
```

### Stop Sequences

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="Q: What is photosynthesis?\nA:",
    max_tokens=200,
    stop=["Q:", "\n\n"],  # Stop at the next question or a double newline
    temperature=0.5
)

print(f"Answer: {response.choices[0].text.strip()}")
```
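
### Echoing the Prompt

Setting `echo=True` returns the prompt concatenated with the completion, which is convenient when logging full transcripts. A minimal sketch:

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.completions.create(
    model="llama3.1-70b",
    prompt="Roses are red,",
    max_tokens=20,
    echo=True,  # prepend the prompt to the returned text
)

# choices[0].text now starts with the prompt itself.
print(response.choices[0].text)
```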