The official Python library for the Cerebras API
npx @tessl/cli install tessl/pypi-cerebras-cloud-sdk@1.50.00
# Cerebras Cloud SDK
The official Python library for the Cerebras Cloud API, providing access to Cerebras' Wafer-Scale Engine-3 (WSE-3) powered AI inference capabilities. The SDK offers both synchronous and asynchronous clients with comprehensive type definitions, streaming support, and built-in retry mechanisms for high-throughput AI inference workloads.

## Package Information

- **Package Name**: cerebras_cloud_sdk
- **Language**: Python
- **Installation**: `pip install cerebras_cloud_sdk`
- **Python Requirements**: Python 3.8+

## Core Imports

```python
import cerebras.cloud.sdk as cerebras
```

Most common imports:

```python
from cerebras.cloud.sdk import Cerebras, AsyncCerebras
```

For type annotations:

```python
from cerebras.cloud.sdk.types.chat import ChatCompletion, CompletionCreateParams
from cerebras.cloud.sdk import types
```

Complete import options:

```python
# Main client classes
from cerebras.cloud.sdk import Cerebras, AsyncCerebras, Client, AsyncClient

# Core types and utilities
from cerebras.cloud.sdk import BaseModel, NOT_GIVEN, NotGiven, Omit, NoneType
from cerebras.cloud.sdk import Timeout, RequestOptions, Transport, ProxiesTypes

# Streaming classes
from cerebras.cloud.sdk import Stream, AsyncStream

# Response wrappers
from cerebras.cloud.sdk import APIResponse, AsyncAPIResponse

# Exception handling
from cerebras.cloud.sdk import (
    CerebrasError, APIError, APIStatusError, APITimeoutError,
    APIConnectionError, APIResponseValidationError,
    BadRequestError, AuthenticationError, PermissionDeniedError,
    NotFoundError, ConflictError, UnprocessableEntityError,
    RateLimitError, InternalServerError,
)

# Configuration constants
from cerebras.cloud.sdk import DEFAULT_TIMEOUT, DEFAULT_MAX_RETRIES, DEFAULT_CONNECTION_LIMITS

# HTTP clients
from cerebras.cloud.sdk import DefaultHttpxClient, DefaultAsyncHttpxClient, DefaultAioHttpClient

# Utility functions
from cerebras.cloud.sdk import file_from_path

# Direct resource access (alternative)
from cerebras.cloud.sdk import resources
```

## Basic Usage

```python
import os
from cerebras.cloud.sdk import Cerebras

# Initialize client (API key from CEREBRAS_API_KEY env var)
client = Cerebras(api_key=os.getenv("CEREBRAS_API_KEY"))

# Simple chat completion
response = client.chat.completions.create(
    model="llama3.1-70b",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    max_tokens=100,
)

print(response.choices[0].message.content)

# Async usage
import asyncio
from cerebras.cloud.sdk import AsyncCerebras

async def main():
    client = AsyncCerebras()
    response = await client.chat.completions.create(
        model="llama3.1-70b",
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=50,
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

## Architecture

The SDK follows a resource-based architecture:

- **Client Classes**: `Cerebras` (sync) and `AsyncCerebras` (async) as main entry points
- **Resource Objects**: Organized API endpoints (`chat`, `completions`, `models`)
- **Type System**: Comprehensive Pydantic models for requests and responses
- **Streaming Support**: Real-time response handling with `Stream` and `AsyncStream`
- **Error Handling**: Complete HTTP status code exception hierarchy
- **Response Wrappers**: Raw response and streaming response access patterns

This design enables both simple usage patterns and advanced customization while maintaining full type safety and async/await compatibility.

## Capabilities

### Client Management

Client initialization, configuration, and authentication for both synchronous and asynchronous usage patterns. Supports environment variable configuration, custom timeouts, retry policies, and HTTP client customization.

```python { .api }
class Cerebras:
    def __init__(
        self,
        *,
        api_key: str | None = None,
        base_url: str | httpx.URL | None = None,
        timeout: Union[float, Timeout, None, NotGiven] = NOT_GIVEN,
        max_retries: int = DEFAULT_MAX_RETRIES,
        default_headers: Mapping[str, str] | None = None,
        default_query: Mapping[str, object] | None = None,
        http_client: httpx.Client | None = None,
        _strict_response_validation: bool = False,
        warm_tcp_connection: bool = True,
    ) -> None: ...

class AsyncCerebras:
    def __init__(
        self,
        *,
        api_key: str | None = None,
        base_url: str | httpx.URL | None = None,
        timeout: Union[float, Timeout, None, NotGiven] = NOT_GIVEN,
        max_retries: int = DEFAULT_MAX_RETRIES,
        default_headers: Mapping[str, str] | None = None,
        default_query: Mapping[str, object] | None = None,
        http_client: httpx.AsyncClient | None = None,
        _strict_response_validation: bool = False,
        warm_tcp_connection: bool = True,
    ) -> None: ...
```

[Client Management](./client-management.md)

### Chat Completions

Modern chat completion API for conversational AI applications. Supports system messages, user messages, assistant messages, streaming responses, function calling, and comprehensive response metadata including token usage and timing information.

```python { .api }
def create(
    self,
    *,
    messages: Iterable[completion_create_params.Message],
    model: str,
    max_completion_tokens: Optional[int] | NotGiven = NOT_GIVEN,
    max_tokens: Optional[int] | NotGiven = NOT_GIVEN,
    min_completion_tokens: Optional[int] | NotGiven = NOT_GIVEN,
    parallel_tool_calls: Optional[bool] | NotGiven = NOT_GIVEN,
    reasoning_effort: Optional[Literal["low", "medium", "high"]] | NotGiven = NOT_GIVEN,
    service_tier: Optional[Literal["auto", "default"]] | NotGiven = NOT_GIVEN,
    temperature: Optional[float] | NotGiven = NOT_GIVEN,
    tool_choice: Optional[completion_create_params.ToolChoice] | NotGiven = NOT_GIVEN,
    tools: Optional[Iterable[completion_create_params.Tool]] | NotGiven = NOT_GIVEN,
    # ... additional parameters including cf_ray, x_amz_cf_id, extra_headers, etc.
) -> ChatCompletion | Stream[ChatCompletion]: ...
```

[Chat Completions](./chat-completions.md)
181
182
### Models
183
184
Model listing and information retrieval for discovering available models and their capabilities. Provides access to model metadata, supported features, and configuration options.
185
186
```python { .api }
187
def list(
188
self,
189
*
190
) -> ModelListResponse: ...
191
192
def retrieve(
193
self,
194
model_id: str,
195
*
196
) -> ModelRetrieveResponse: ...
197
```
198
199
[Models](./models.md)
200
201
### Legacy Completions
202
203
Legacy text completion API for traditional completion-style interactions. Supports text generation with various parameters including temperature, top-p sampling, frequency penalties, and custom stop sequences.
204
205
```python { .api }
206
def create(
207
self,
208
*,
209
model: str,
210
best_of: Optional[int] = NOT_GIVEN,
211
echo: Optional[bool] = NOT_GIVEN,
212
frequency_penalty: Optional[float] = NOT_GIVEN,
213
logit_bias: Optional[Dict[str, int]] = NOT_GIVEN,
214
logprobs: Optional[int] = NOT_GIVEN,
215
max_tokens: Optional[int] = NOT_GIVEN,
216
n: Optional[int] = NOT_GIVEN,
217
presence_penalty: Optional[float] = NOT_GIVEN,
218
prompt: Union[str, List[str], List[int], List[List[int]], None] = NOT_GIVEN,
219
seed: Optional[int] = NOT_GIVEN,
220
stop: Union[Optional[str], List[str], None] = NOT_GIVEN,
221
stream: Optional[Literal[False]] | NotGiven = NOT_GIVEN,
222
stream_options: Optional[completion_create_params.StreamOptions] | NotGiven = NOT_GIVEN,
223
suffix: Optional[str] = NOT_GIVEN,
224
temperature: Optional[float] = NOT_GIVEN,
225
top_p: Optional[float] = NOT_GIVEN,
226
user: str | NotGiven = NOT_GIVEN,
227
**kwargs
228
) -> Completion: ...
229
```
230
231
[Legacy Completions](./legacy-completions.md)

### Types and Configuration

Comprehensive type system, exception handling, and configuration utilities. Includes Pydantic models for all API responses, TypedDict parameter classes, complete exception hierarchy, and utility functions for file handling and configuration.

```python { .api }
# Core types
class BaseModel: ...
class NotGiven: ...
NOT_GIVEN: NotGiven

# Exception hierarchy
class CerebrasError(Exception): ...
class APIError(CerebrasError): ...
class APIStatusError(APIError): ...
class BadRequestError(APIStatusError): ...
class AuthenticationError(APIStatusError): ...
class RateLimitError(APIStatusError): ...

# Configuration types
Timeout: TypeAlias
Transport: TypeAlias
ProxiesTypes: TypeAlias
RequestOptions: TypeAlias

# Streaming classes
class Stream: ...
class AsyncStream: ...

# Response wrappers
class APIResponse: ...
class AsyncAPIResponse: ...
```

[Types and Configuration](./types-and-configuration.md)
267
268
## Response Format Patterns
269
270
All API methods return structured response objects with consistent patterns:
271
272
- **Chat Completions**: `ChatCompletion` objects with choices, usage metadata, and timing information
273
- **Legacy Completions**: `Completion` objects with generated text and token information
274
- **Model Operations**: `ModelListResponse` and `ModelRetrieveResponse` with model metadata
275
- **Error Responses**: Structured exception objects with detailed error information
276

## Streaming Support

Both chat and legacy completions support streaming responses for real-time token generation:

```python
# Streaming chat completion
stream = client.chat.completions.create(
    model="llama3.1-70b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

## Alternative Resource Access

The SDK provides an alternative way to access resources directly through the `resources` module:

```python
from cerebras.cloud.sdk import resources

# Direct resource access (not bound to a client instance)
# Note: still requires a configured client context
```