# Client Operations

Complete synchronous and asynchronous client classes providing the full Ollama API with configurable hosts, custom headers, timeouts, and comprehensive error handling.

**Type Imports**: The signatures in this documentation use these imports:

```python
from pathlib import Path
from typing import Union, Sequence, Mapping, Callable, Literal, Any, Iterator

from pydantic.json_schema import JsonSchemaValue
```

## Capabilities

### Client Class (Synchronous)

Synchronous HTTP client for Ollama API operations with configurable connection settings.

```python { .api }
class Client:
    def __init__(
        self,
        host: str = None,
        *,
        follow_redirects: bool = True,
        timeout: Any = None,
        headers: dict[str, str] = None,
        **kwargs
    ):
        """
        Create a synchronous Ollama client.

        Parameters:
        - host (str, optional): Ollama server host URL. Defaults to OLLAMA_HOST env var or localhost:11434
        - follow_redirects (bool): Whether to follow HTTP redirects. Default: True
        - timeout: Request timeout configuration
        - headers (dict[str, str], optional): Custom HTTP headers
        - **kwargs: Additional httpx client arguments
        """
```

#### Text Generation

Generate text completions from prompts with extensive configuration options.

```python { .api }
def generate(
    self,
    model: str = '',
    prompt: str = '',
    suffix: str = None,
    *,
    system: str = None,
    template: str = None,
    context: Sequence[int] = None,
    stream: bool = False,
    think: bool = None,
    raw: bool = None,
    format: str = None,
    images: Sequence[Union[str, bytes, Image]] = None,
    options: Union[Mapping[str, Any], Options] = None,
    keep_alive: Union[float, str] = None
) -> Union[GenerateResponse, Iterator[GenerateResponse]]:
    """
    Generate text from a prompt.

    Parameters:
    - model (str): Model name to use for generation. Default: ''
    - prompt (str): Text prompt for generation. Default: ''
    - suffix (str, optional): Text to append after generation
    - system (str, optional): System message to set context
    - template (str, optional): Custom prompt template
    - context (Sequence[int], optional): Token context from previous generation
    - stream (bool): Return streaming responses. Default: False
    - think (bool, optional): Enable thinking mode for reasoning models
    - raw (bool, optional): Use raw mode (no template processing)
    - format (str, optional): Response format ('json', etc.)
    - images (Sequence[str | bytes | Image], optional): Images for multimodal models
    - options (Mapping | Options, optional): Model configuration options
    - keep_alive (float | str, optional): Keep model loaded duration

    Returns:
    GenerateResponse or Iterator[GenerateResponse] if streaming
    """
```
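With `format='json'`, the model is asked to emit valid JSON, which the caller then parses. A minimal sketch, assuming a `client` that is an `ollama.Client` connected to a running server; the `generate_json` and `parse_json_output` helpers are illustrative, not part of the library:

```python
import json


def parse_json_output(text: str) -> dict:
    """Parse the JSON text that a format='json' generation returns."""
    return json.loads(text)


def generate_json(client, model: str, prompt: str) -> dict:
    """Run a JSON-format generation and parse the result into a dict."""
    response = client.generate(
        model=model,
        # Nudging the prompt toward JSON alongside format='json' tends to help
        prompt=prompt + ' Respond using JSON.',
        format='json',
    )
    return parse_json_output(response['response'])
```

Prompting for JSON in the text itself, in addition to setting `format`, is a common belt-and-braces pattern.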

#### Chat Operations

Conduct multi-turn conversations with context preservation and tool calling support.

```python { .api }
def chat(
    self,
    model: str = '',
    messages: Sequence[Union[Mapping[str, Any], Message]] = None,
    *,
    tools: Sequence[Union[Mapping[str, Any], Tool, Callable]] = None,
    stream: bool = False,
    think: Union[bool, Literal['low', 'medium', 'high']] = None,
    format: Union[Literal['', 'json'], JsonSchemaValue] = None,
    options: Union[Mapping[str, Any], Options] = None,
    keep_alive: Union[float, str] = None
) -> Union[ChatResponse, Iterator[ChatResponse]]:
    """
    Chat with a model using conversation history.

    Parameters:
    - model (str): Model name to use for chat. Default: ''
    - messages (Sequence[Mapping | Message], optional): Conversation messages. Default: None
    - tools (Sequence[Mapping | Tool | Callable], optional): Available tools for function calling
    - stream (bool): Return streaming responses. Default: False
    - think (bool | 'low' | 'medium' | 'high', optional): Enable thinking mode for reasoning models
    - format ('json' or a JSON schema, optional): Response format
    - options (Mapping | Options, optional): Model configuration options
    - keep_alive (float | str, optional): Keep model loaded duration

    Returns:
    ChatResponse or Iterator[ChatResponse] if streaming
    """
```
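Because `chat` receives the full message history on every call, multi-turn conversations are driven by appending each exchange to a list. A sketch, assuming `client` is an `ollama.Client`; the two helpers are illustrative:

```python
def append_exchange(history: list, user_text: str, assistant_text: str) -> list:
    """Return the history extended with one user/assistant exchange."""
    return history + [
        {'role': 'user', 'content': user_text},
        {'role': 'assistant', 'content': assistant_text},
    ]


def chat_turn(client, model: str, history: list, user_text: str) -> str:
    """Send one chat turn with prior history and return the reply text."""
    response = client.chat(
        model=model,
        messages=history + [{'role': 'user', 'content': user_text}],
    )
    return response['message']['content']
```

The caller then threads state through: after each `chat_turn`, record the pair with `append_exchange` so the next turn sees the whole conversation.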

#### Embeddings

Generate vector embeddings from text inputs for semantic similarity and search applications.

```python { .api }
def embed(
    self,
    model: str = '',
    input: Union[str, Sequence[str]] = '',
    truncate: bool = None,
    options: Options = None,
    keep_alive: str = None
) -> EmbedResponse:
    """
    Generate embeddings for input text(s).

    Parameters:
    - model (str): Embedding model name
    - input (str | list[str]): Text or list of texts to embed
    - truncate (bool, optional): Truncate inputs that exceed model limits
    - options (Options, optional): Model configuration options
    - keep_alive (str, optional): Keep model loaded duration

    Returns:
    EmbedResponse containing embedding vectors
    """

def embeddings(
    self,
    model: str,
    prompt: str,
    options: Options = None,
    keep_alive: str = None
) -> EmbeddingsResponse:
    """
    Generate embeddings (deprecated - use embed instead).

    Parameters:
    - model (str): Embedding model name
    - prompt (str): Text to embed
    - options (Options, optional): Model configuration options
    - keep_alive (str, optional): Keep model loaded duration

    Returns:
    EmbeddingsResponse containing a single embedding vector
    """
```
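`embed` returns vectors under the `embeddings` key; similarity scoring is then an application-side computation such as cosine similarity. A sketch, assuming `client` is an `ollama.Client`; `cosine` and `rank_by_similarity` are illustrative helpers:

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_by_similarity(client, model: str, query: str, documents: list) -> list:
    """Embed the query and documents in one call; return docs, most similar first."""
    vectors = client.embed(model=model, input=[query] + documents)['embeddings']
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = sorted(
        zip(documents, doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored]
```

Batching the query and documents into a single `input` list avoids one HTTP round trip per text.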

#### Model Management

Download, upload, create, and manage Ollama models with progress tracking.

```python { .api }
def pull(
    self,
    model: str,
    *,
    insecure: bool = False,
    stream: bool = False
) -> ProgressResponse | Iterator[ProgressResponse]:
    """
    Download a model from a model library.

    Parameters:
    - model (str): Model name to download
    - insecure (bool): Allow insecure connections. Default: False
    - stream (bool): Return streaming progress. Default: False

    Returns:
    ProgressResponse or Iterator[ProgressResponse] if streaming
    """

def push(
    self,
    model: str,
    *,
    insecure: bool = False,
    stream: bool = False
) -> ProgressResponse | Iterator[ProgressResponse]:
    """
    Upload a model to a model library.

    Parameters:
    - model (str): Model name to upload
    - insecure (bool): Allow insecure connections. Default: False
    - stream (bool): Return streaming progress. Default: False

    Returns:
    ProgressResponse or Iterator[ProgressResponse] if streaming
    """

def create(
    self,
    model: str,
    quantize: str = None,
    from_: str = None,
    files: dict = None,
    adapters: dict[str, str] = None,
    template: str = None,
    license: Union[str, list[str]] = None,
    system: str = None,
    parameters: dict = None,
    messages: list[Message] = None,
    *,
    stream: bool = False
) -> ProgressResponse | Iterator[ProgressResponse]:
    """
    Create a new model from a base model, files, or configuration.

    Parameters:
    - model (str): Name for the new model
    - quantize (str, optional): Quantization method
    - from_ (str, optional): Base model to inherit from
    - files (dict, optional): Additional files to include
    - adapters (dict[str, str], optional): Model adapters to apply
    - license (str | list[str], optional): Model license
    - template (str, optional): Prompt template
    - system (str, optional): System message template
    - parameters (dict, optional): Model parameters
    - messages (list[Message], optional): Example messages
    - stream (bool): Return streaming progress. Default: False

    Returns:
    ProgressResponse or Iterator[ProgressResponse] if streaming
    """

def create_blob(
    self,
    path: Union[str, Path]
) -> str:
    """
    Create a blob from a file for model creation.

    Parameters:
    - path (str | Path): Path to file to create blob from

    Returns:
    str: Blob digest hash
    """

def delete(
    self,
    model: str
) -> StatusResponse:
    """
    Delete a model.

    Parameters:
    - model (str): Name of model to delete

    Returns:
    StatusResponse with deletion status
    """

def copy(
    self,
    source: str,
    destination: str
) -> StatusResponse:
    """
    Copy a model.

    Parameters:
    - source (str): Source model name
    - destination (str): Destination model name

    Returns:
    StatusResponse with copy status
    """
```
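Combining `create` with `from_` is enough to derive a lightweight variant of an installed model, for example one with a system prompt baked in. A sketch, assuming `client` is an `ollama.Client`; `model_tag` and `create_variant` are illustrative helpers, and the names are placeholders:

```python
def model_tag(name: str, tag: str = 'latest') -> str:
    """Build a name:tag model reference."""
    return f'{name}:{tag}'


def create_variant(client, base: str, name: str, system_prompt: str):
    """Create a model derived from `base` with a custom system message."""
    return client.create(
        model=model_tag(name, 'custom'),
        from_=base,
        system=system_prompt,
    )
```

The variant shares the base model's weights, so creation is fast and the copy is cheap on disk.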

#### Model Information

Retrieve information about available and running models.

```python { .api }
def list(
    self
) -> ListResponse:
    """
    List available models.

    Returns:
    ListResponse containing model information
    """

def show(
    self,
    model: str
) -> ShowResponse:
    """
    Show information about a specific model.

    Parameters:
    - model (str): Model name to show information for

    Returns:
    ShowResponse with detailed model information
    """

def ps(
    self
) -> ProcessResponse:
    """
    List running models and their resource usage.

    Returns:
    ProcessResponse with currently running models
    """
```
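`list` and `ps` report model sizes in bytes, so a small formatter makes the output readable. A sketch, assuming `client` is an `ollama.Client` and that each entry in `client.list()['models']` exposes `model` and `size` fields; `format_size` and `print_models` are illustrative:

```python
def format_size(num_bytes: int) -> str:
    """Render a byte count as a human-readable size."""
    size = float(num_bytes)
    for unit in ('B', 'KB', 'MB', 'GB'):
        if size < 1024:
            return f'{size:.1f} {unit}'
        size /= 1024
    return f'{size:.1f} TB'


def print_models(client) -> None:
    """Print each installed model with a readable size."""
    for model in client.list()['models']:
        print(model['model'], format_size(model['size']))
```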

### AsyncClient Class (Asynchronous)

Asynchronous HTTP client for Ollama API operations with the same interface as Client but using async/await patterns.

```python { .api }
class AsyncClient:
    def __init__(
        self,
        host: str = None,
        *,
        follow_redirects: bool = True,
        timeout: Any = None,
        headers: dict[str, str] = None,
        **kwargs
    ):
        """
        Create an asynchronous Ollama client.

        Parameters: Same as Client class
        """

    async def generate(self, model: str = '', prompt: str = '', **kwargs):
        """Async version of Client.generate()"""

    async def chat(self, model: str = '', messages: Sequence[Union[Mapping, Message]] = None, **kwargs):
        """Async version of Client.chat()"""

    async def embed(self, model: str = '', input: Union[str, Sequence[str]] = '', **kwargs):
        """Async version of Client.embed()"""

    async def embeddings(self, model: str, prompt: str, **kwargs):
        """Async version of Client.embeddings() (deprecated)"""

    async def pull(self, model: str, **kwargs):
        """Async version of Client.pull()"""

    async def push(self, model: str, **kwargs):
        """Async version of Client.push()"""

    async def create(self, model: str, **kwargs):
        """Async version of Client.create()"""

    async def create_blob(self, path: Union[str, Path]) -> str:
        """Async version of Client.create_blob()"""

    async def delete(self, model: str) -> StatusResponse:
        """Async version of Client.delete()"""

    async def copy(self, source: str, destination: str) -> StatusResponse:
        """Async version of Client.copy()"""

    async def list(self) -> ListResponse:
        """Async version of Client.list()"""

    async def show(self, model: str) -> ShowResponse:
        """Async version of Client.show()"""

    async def ps(self) -> ProcessResponse:
        """Async version of Client.ps()"""
```
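With `stream=True`, the awaited async methods return an async iterator that is consumed with `async for`. A sketch of collecting a streamed chat reply, assuming `client` is an `ollama.AsyncClient` and that streamed chunks expose `message.content` as shown; the two helpers are illustrative:

```python
async def collect_stream(stream) -> str:
    """Join the content of streamed chat chunks into one string."""
    parts = []
    async for chunk in stream:
        parts.append(chunk['message']['content'])
    return ''.join(parts)


async def stream_chat(client, model: str, user_text: str) -> str:
    """Stream one chat turn and return the full reply text."""
    stream = await client.chat(
        model=model,
        messages=[{'role': 'user', 'content': user_text}],
        stream=True,
    )
    return await collect_stream(stream)
```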

## Usage Examples

### Custom Client Configuration

```python
from ollama import Client
import httpx

# Custom client with authentication
client = Client(
    host='https://my-ollama-server.com',
    headers={'Authorization': 'Bearer token'},
    timeout=httpx.Timeout(30.0)
)

# Generate with custom client
response = client.generate(
    model='custom-model',
    prompt='Hello, world!',
    options={'temperature': 0.7}
)
```

### Streaming with Progress Tracking

```python
from ollama import Client

client = Client()

# Stream text generation
print("Generating story...")
for chunk in client.generate(
    model='llama3.2',
    prompt='Write a short story about a robot',
    stream=True
):
    if chunk.get('response'):
        print(chunk['response'], end='', flush=True)

print("\n\nPulling model...")
# Stream model download progress
for progress in client.pull('phi3', stream=True):
    if progress.get('completed') and progress.get('total'):
        percent = (progress['completed'] / progress['total']) * 100
        print(f"Progress: {percent:.1f}%")
```

### Concurrent Async Requests

```python
import asyncio
from ollama import AsyncClient

async def main():
    client = AsyncClient()

    # Issue several generations concurrently
    tasks = [
        client.generate(model='llama3.2', prompt=f'Story {i}')
        for i in range(3)
    ]

    responses = await asyncio.gather(*tasks)
    for i, response in enumerate(responses):
        print(f"Story {i}: {response['response'][:100]}...")

asyncio.run(main())
```