
tessl/pypi-cerebras-cloud-sdk

The official Python library for the Cerebras API

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/cerebras-cloud-sdk@1.50.x

To install, run

npx @tessl/cli install tessl/pypi-cerebras-cloud-sdk@1.50.0

# Cerebras Cloud SDK

The official Python library for the Cerebras Cloud API, providing access to Cerebras' Wafer-Scale Engine-3 (WSE-3) powered AI inference capabilities. The SDK offers both synchronous and asynchronous clients with comprehensive type definitions, streaming support, and built-in retry mechanisms for high-throughput AI inference workloads.

## Package Information

- **Package Name**: cerebras_cloud_sdk
- **Language**: Python
- **Installation**: `pip install cerebras_cloud_sdk`
- **Python Requirements**: Python 3.8+

## Core Imports

```python
import cerebras.cloud.sdk as cerebras
```

Most common imports:

```python
from cerebras.cloud.sdk import Cerebras, AsyncCerebras
```

For type annotations:

```python
from cerebras.cloud.sdk.types.chat import ChatCompletion, CompletionCreateParams
from cerebras.cloud.sdk import types
```

Complete import options:

```python
# Main client classes
from cerebras.cloud.sdk import Cerebras, AsyncCerebras, Client, AsyncClient

# Core types and utilities
from cerebras.cloud.sdk import BaseModel, NOT_GIVEN, NotGiven, Omit, NoneType
from cerebras.cloud.sdk import Timeout, RequestOptions, Transport, ProxiesTypes

# Streaming classes
from cerebras.cloud.sdk import Stream, AsyncStream

# Response wrappers
from cerebras.cloud.sdk import APIResponse, AsyncAPIResponse

# Exception handling
from cerebras.cloud.sdk import (
    CerebrasError, APIError, APIStatusError, APITimeoutError,
    APIConnectionError, APIResponseValidationError,
    BadRequestError, AuthenticationError, PermissionDeniedError,
    NotFoundError, ConflictError, UnprocessableEntityError,
    RateLimitError, InternalServerError
)

# Configuration constants
from cerebras.cloud.sdk import DEFAULT_TIMEOUT, DEFAULT_MAX_RETRIES, DEFAULT_CONNECTION_LIMITS

# HTTP clients
from cerebras.cloud.sdk import DefaultHttpxClient, DefaultAsyncHttpxClient, DefaultAioHttpClient

# Utility functions
from cerebras.cloud.sdk import file_from_path

# Direct resources access (alternative)
from cerebras.cloud.sdk import resources
```

## Basic Usage

```python
import os
from cerebras.cloud.sdk import Cerebras

# Initialize client (API key from CEREBRAS_API_KEY env var)
client = Cerebras(api_key=os.getenv("CEREBRAS_API_KEY"))

# Simple chat completion
response = client.chat.completions.create(
    model="llama3.1-70b",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)

# Async usage
import asyncio
from cerebras.cloud.sdk import AsyncCerebras

async def main():
    client = AsyncCerebras()
    response = await client.chat.completions.create(
        model="llama3.1-70b",
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=50
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

## Architecture

The SDK follows a resource-based architecture:

- **Client Classes**: `Cerebras` (sync) and `AsyncCerebras` (async) as main entry points
- **Resource Objects**: Organized API endpoints (`chat`, `completions`, `models`)
- **Type System**: Comprehensive Pydantic models for requests and responses
- **Streaming Support**: Real-time response handling with `Stream` and `AsyncStream`
- **Error Handling**: Complete HTTP status code exception hierarchy
- **Response Wrappers**: Raw response and streaming response access patterns

This design enables both simple usage patterns and advanced customization while maintaining full type safety and async/await compatibility.
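The resource layout can be sketched in plain Python. This is an illustrative model only, not the SDK's actual source; `_post` stands in for the HTTP layer:

```python
# Illustrative sketch of the resource-based layout: the client owns
# resource objects, which expose the endpoint methods.
class Completions:
    def __init__(self, client):
        self._client = client

    def create(self, *, model, messages):
        # Stands in for the HTTP POST performed by the real base client.
        return self._client._post("/v1/chat/completions",
                                  {"model": model, "messages": messages})

class Chat:
    def __init__(self, client):
        self.completions = Completions(client)

class Client:
    def __init__(self):
        self.chat = Chat(self)

    def _post(self, path, body):
        return {"path": path, "body": body}  # echo the request for the demo

client = Client()
resp = client.chat.completions.create(model="llama3.1-70b", messages=[])
print(resp["path"])  # /v1/chat/completions
```

The same shape explains why `client.chat.completions.create(...)` and the `resources` module can coexist: the dotted path is just attribute access on nested resource objects.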

## Capabilities

### Client Management

Client initialization, configuration, and authentication for both synchronous and asynchronous usage patterns. Supports environment variable configuration, custom timeouts, retry policies, and HTTP client customization.

```python { .api }
class Cerebras:
    def __init__(
        self,
        *,
        api_key: str | None = None,
        base_url: str | httpx.URL | None = None,
        timeout: Union[float, Timeout, None, NotGiven] = NOT_GIVEN,
        max_retries: int = DEFAULT_MAX_RETRIES,
        default_headers: Mapping[str, str] | None = None,
        default_query: Mapping[str, object] | None = None,
        http_client: httpx.Client | None = None,
        _strict_response_validation: bool = False,
        warm_tcp_connection: bool = True,
    ) -> None: ...

class AsyncCerebras:
    def __init__(
        self,
        *,
        api_key: str | None = None,
        base_url: str | httpx.URL | None = None,
        timeout: Union[float, Timeout, None, NotGiven] = NOT_GIVEN,
        max_retries: int = DEFAULT_MAX_RETRIES,
        default_headers: Mapping[str, str] | None = None,
        default_query: Mapping[str, object] | None = None,
        http_client: httpx.AsyncClient | None = None,
        _strict_response_validation: bool = False,
        warm_tcp_connection: bool = True,
    ) -> None: ...
```

[Client Management](./client-management.md)
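The `max_retries` parameter bounds an exponential-backoff retry loop in the base client. A minimal sketch of that pattern, illustrative only (the SDK's own retry logic also considers status codes and response headers; treat those details as assumptions here):

```python
import random
import time

def with_retries(send, max_retries=2, initial_delay=0.5, max_delay=8.0):
    """Call send(); on a retryable failure, back off exponentially and retry."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the last error
            delay = min(max_delay, initial_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jittered backoff

calls = {"count": 0}

def flaky_request():
    # Fails twice, then succeeds -- models a transient network problem.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network failure")
    return "ok"

print(with_retries(flaky_request))  # "ok" after two retried failures
```

Passing `max_retries=0` at client construction disables this behavior entirely, which is useful when you layer your own retry policy on top.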

### Chat Completions

Modern chat completion API for conversational AI applications. Supports system messages, user messages, assistant messages, streaming responses, function calling, and comprehensive response metadata including token usage and timing information.

```python { .api }
def create(
    self,
    *,
    messages: Iterable[completion_create_params.Message],
    model: str,
    max_completion_tokens: Optional[int] | NotGiven = NOT_GIVEN,
    max_tokens: Optional[int] | NotGiven = NOT_GIVEN,
    min_completion_tokens: Optional[int] | NotGiven = NOT_GIVEN,
    parallel_tool_calls: Optional[bool] | NotGiven = NOT_GIVEN,
    reasoning_effort: Optional[Literal["low", "medium", "high"]] | NotGiven = NOT_GIVEN,
    service_tier: Optional[Literal["auto", "default"]] | NotGiven = NOT_GIVEN,
    temperature: Optional[float] | NotGiven = NOT_GIVEN,
    tool_choice: Optional[completion_create_params.ToolChoice] | NotGiven = NOT_GIVEN,
    tools: Optional[Iterable[completion_create_params.Tool]] | NotGiven = NOT_GIVEN,
    # ... additional parameters including cf_ray, x_amz_cf_id, extra_headers, etc.
) -> ChatCompletion | Stream[ChatCompletion]: ...
```

[Chat Completions](./chat-completions.md)
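Because `create()` is stateless, multi-turn conversations work by resending the full message history each call, with the previous assistant reply appended. A self-contained sketch (here `reply_text` stands in for `response.choices[0].message.content` from a real API call):

```python
# Sketch: multi-turn chat keeps the whole message history and appends the
# assistant's reply before the next create() call.
def record_turn(messages, user_text, reply_text):
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": reply_text})
    return messages

history = [{"role": "system", "content": "You are a concise assistant."}]
record_turn(history, "What is WSE-3?",
            "Cerebras' third-generation wafer-scale engine.")
record_turn(history, "How large is it?",
            "It is a full-wafer processor.")

print(len(history))        # 5 messages: 1 system + 2 user + 2 assistant
print(history[1]["role"])  # user
```

In production code you would also trim or summarize old turns so the history stays within the model's context window.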

### Models

Model listing and information retrieval for discovering available models and their capabilities. Provides access to model metadata, supported features, and configuration options.

```python { .api }
def list(
    self,
    # ... request option parameters (extra_headers, timeout, etc.)
) -> ModelListResponse: ...

def retrieve(
    self,
    model_id: str,
    # ... request option parameters (extra_headers, timeout, etc.)
) -> ModelRetrieveResponse: ...
```

[Models](./models.md)
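A typical use of the listing endpoint is to collect the available model IDs. The payload shape below is a hypothetical stand-in dict, not a live call; the field names follow the common OpenAI-style convention and should be verified against `ModelListResponse`:

```python
# Hypothetical model-list payload (assumed shape, for illustration only).
payload = {
    "object": "list",
    "data": [
        {"id": "llama3.1-8b", "object": "model"},
        {"id": "llama3.1-70b", "object": "model"},
    ],
}

# Collect the IDs, e.g. to validate a user-supplied model name up front.
available = [model["id"] for model in payload["data"]]
print(available)  # ['llama3.1-8b', 'llama3.1-70b']
```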

### Legacy Completions

Legacy text completion API for traditional completion-style interactions. Supports text generation with various parameters including temperature, top-p sampling, frequency penalties, and custom stop sequences.

```python { .api }
def create(
    self,
    *,
    model: str,
    best_of: Optional[int] | NotGiven = NOT_GIVEN,
    echo: Optional[bool] | NotGiven = NOT_GIVEN,
    frequency_penalty: Optional[float] | NotGiven = NOT_GIVEN,
    logit_bias: Optional[Dict[str, int]] | NotGiven = NOT_GIVEN,
    logprobs: Optional[int] | NotGiven = NOT_GIVEN,
    max_tokens: Optional[int] | NotGiven = NOT_GIVEN,
    n: Optional[int] | NotGiven = NOT_GIVEN,
    presence_penalty: Optional[float] | NotGiven = NOT_GIVEN,
    prompt: Union[str, List[str], List[int], List[List[int]], None] | NotGiven = NOT_GIVEN,
    seed: Optional[int] | NotGiven = NOT_GIVEN,
    stop: Union[Optional[str], List[str], None] | NotGiven = NOT_GIVEN,
    stream: Optional[Literal[False]] | NotGiven = NOT_GIVEN,
    stream_options: Optional[completion_create_params.StreamOptions] | NotGiven = NOT_GIVEN,
    suffix: Optional[str] | NotGiven = NOT_GIVEN,
    temperature: Optional[float] | NotGiven = NOT_GIVEN,
    top_p: Optional[float] | NotGiven = NOT_GIVEN,
    user: str | NotGiven = NOT_GIVEN,
    **kwargs
) -> Completion: ...
```

[Legacy Completions](./legacy-completions.md)

### Types and Configuration

Comprehensive type system, exception handling, and configuration utilities. Includes Pydantic models for all API responses, TypedDict parameter classes, complete exception hierarchy, and utility functions for file handling and configuration.

```python { .api }
# Core types
class BaseModel: ...
class NotGiven: ...
NOT_GIVEN: NotGiven

# Exception hierarchy
class CerebrasError(Exception): ...
class APIError(CerebrasError): ...
class APIStatusError(APIError): ...
class BadRequestError(APIStatusError): ...
class AuthenticationError(APIStatusError): ...
class RateLimitError(APIStatusError): ...

# Configuration types
Timeout: TypeAlias
Transport: TypeAlias
ProxiesTypes: TypeAlias
RequestOptions: TypeAlias

# Streaming classes
class Stream: ...
class AsyncStream: ...

# Response wrappers
class APIResponse: ...
class AsyncAPIResponse: ...
```

[Types and Configuration](./types-and-configuration.md)
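The `NOT_GIVEN` sentinel lets the SDK distinguish "parameter omitted" (leave it out of the request body) from "explicitly `None`" (send `null`). A sketch of the pattern with local stand-in definitions, not the SDK's own classes:

```python
# Local stand-ins for the sentinel pattern, for illustration only.
class NotGiven:
    def __bool__(self):
        return False  # falsy, like None, so truthiness checks still work

    def __repr__(self):
        return "NOT_GIVEN"

NOT_GIVEN = NotGiven()

def build_body(max_tokens=NOT_GIVEN, temperature=NOT_GIVEN):
    # Only include parameters the caller actually passed.
    body = {}
    if not isinstance(max_tokens, NotGiven):
        body["max_tokens"] = max_tokens
    if not isinstance(temperature, NotGiven):
        body["temperature"] = temperature
    return body

print(build_body(max_tokens=100))   # {'max_tokens': 100}
print(build_body(max_tokens=None))  # {'max_tokens': None}
print(build_body())                 # {}
```

This is why the signatures above default to `NOT_GIVEN` rather than `None`: `None` is itself a meaningful value for many API parameters.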

## Response Format Patterns

All API methods return structured response objects with consistent patterns:

- **Chat Completions**: `ChatCompletion` objects with choices, usage metadata, and timing information
- **Legacy Completions**: `Completion` objects with generated text and token information
- **Model Operations**: `ModelListResponse` and `ModelRetrieveResponse` with model metadata
- **Error Responses**: Structured exception objects with detailed error information

## Streaming Support

Both chat and legacy completions support streaming responses for real-time token generation:

```python
# Streaming chat completion
stream = client.chat.completions.create(
    model="llama3.1-70b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
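To reconstruct the complete reply from a stream, accumulate the deltas as they arrive. A self-contained sketch with stand-in chunk objects (a real stream yields typed `ChatCompletion` chunk models instead):

```python
# Stand-in classes mimicking the chunk shape used in the loop above.
class Delta:
    def __init__(self, content):
        self.content = content

class Choice:
    def __init__(self, content):
        self.delta = Delta(content)

class Chunk:
    def __init__(self, content):
        self.choices = [Choice(content)]

fake_stream = [Chunk("Once "), Chunk("upon "), Chunk(None), Chunk("a time.")]

parts = []
for chunk in fake_stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (e.g. the final one) may carry no content
        parts.append(delta)

print("".join(parts))  # Once upon a time.
```

The `if delta:` guard matters: skipping content-less chunks avoids concatenating `None` into the output.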

## Alternative Resource Access

The SDK provides an alternative way to access resources directly through the resources module:

```python
from cerebras.cloud.sdk import resources

# Direct resource access (not bound to a client instance)
# Note: Still requires a configured client context
```