# Client Operations

Complete synchronous and asynchronous client classes providing the full Ollama API with configurable hosts, custom headers, timeouts, and comprehensive error handling.

**Type Imports**: The signatures in this documentation use these imports:

```python
from pathlib import Path
from typing import Any, Callable, Iterator, Literal, Mapping, Optional, Sequence, Union

from pydantic.json_schema import JsonSchemaValue
```
## Capabilities

### Client Class (Synchronous)

Synchronous HTTP client for Ollama API operations with configurable connection settings.

```python { .api }
class Client:
    def __init__(
        self,
        host: Optional[str] = None,
        *,
        follow_redirects: bool = True,
        timeout: Any = None,
        headers: Optional[Mapping[str, str]] = None,
        **kwargs
    ):
        """
        Create a synchronous Ollama client.

        Parameters:
        - host (str, optional): Ollama server host URL. Defaults to the OLLAMA_HOST env var, or http://localhost:11434
        - follow_redirects (bool): Whether to follow HTTP redirects. Default: True
        - timeout: Request timeout configuration (e.g. a float in seconds or an httpx.Timeout)
        - headers (Mapping[str, str], optional): Custom HTTP headers sent with every request
        - **kwargs: Additional arguments forwarded to the underlying httpx client
        """
```
#### Text Generation

Generate text completions from prompts with extensive configuration options.

```python { .api }
def generate(
    self,
    model: str = '',
    prompt: str = '',
    suffix: Optional[str] = None,
    *,
    system: Optional[str] = None,
    template: Optional[str] = None,
    context: Optional[Sequence[int]] = None,
    stream: bool = False,
    think: Optional[bool] = None,
    raw: Optional[bool] = None,
    format: Optional[str] = None,
    images: Optional[Sequence[Union[str, bytes, Image]]] = None,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None
) -> Union[GenerateResponse, Iterator[GenerateResponse]]:
    """
    Generate text from a prompt.

    Parameters:
    - model (str): Model name to use for generation. Default: ''
    - prompt (str): Text prompt for generation. Default: ''
    - suffix (str, optional): Text to append after the generation
    - system (str, optional): System message to set context
    - template (str, optional): Custom prompt template
    - context (Sequence[int], optional): Token context from a previous generation
    - stream (bool): Return streaming responses. Default: False
    - think (bool, optional): Enable thinking mode for reasoning models
    - raw (bool, optional): Use raw mode (no template processing)
    - format (str, optional): Response format ('json', etc.)
    - images (Sequence[str | bytes | Image], optional): Images for multimodal models
    - options (Mapping | Options, optional): Model configuration options
    - keep_alive (float | str, optional): How long to keep the model loaded

    Returns:
        GenerateResponse, or Iterator[GenerateResponse] if streaming
    """
```
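The `context` tokens returned with each response can seed the next request so the model retains conversation state across `generate` calls. A minimal sketch of that chaining; `continue_conversation` is an illustrative helper (not part of the library) written against any object exposing the `generate` signature above:

```python
def continue_conversation(client, model, prompts):
    """Run a series of prompts, threading each response's token
    context into the next request so the model retains state."""
    context = None
    replies = []
    for prompt in prompts:
        resp = client.generate(model=model, prompt=prompt, context=context)
        context = resp['context']   # token context returned with each response
        replies.append(resp['response'])
    return replies

# With a running server:
#   from ollama import Client
#   continue_conversation(Client(), 'llama3.2', ['Name a planet.', 'Name another one.'])
```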
#### Chat Operations

Conduct multi-turn conversations with context preservation and tool calling support.

```python { .api }
def chat(
    self,
    model: str = '',
    messages: Optional[Sequence[Union[Mapping[str, Any], Message]]] = None,
    *,
    tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
    stream: bool = False,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None
) -> Union[ChatResponse, Iterator[ChatResponse]]:
    """
    Chat with a model using conversation history.

    Parameters:
    - model (str): Model name to use for chat. Default: ''
    - messages (Sequence[Mapping | Message], optional): Conversation messages
    - tools (Sequence[Mapping | Tool | Callable], optional): Available tools for function calling
    - stream (bool): Return streaming responses. Default: False
    - think (bool | 'low' | 'medium' | 'high', optional): Enable thinking mode for reasoning models
    - format ('' | 'json' | JSON schema, optional): Response format
    - options (Mapping | Options, optional): Model configuration options
    - keep_alive (float | str, optional): How long to keep the model loaded

    Returns:
        ChatResponse, or Iterator[ChatResponse] if streaming
    """
```
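Because `tools` accepts plain Python callables, a function's signature can serve as the tool schema; the model then returns `tool_calls` in its reply message, which your code must execute. A minimal sketch of that round trip; `run_with_tools` is an illustrative helper (not part of the library), written against any object exposing the `chat` signature above:

```python
def add_numbers(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

TOOLS = {'add_numbers': add_numbers}

def run_with_tools(client, model, prompt):
    """Send one user message, then execute any tool calls the model requested."""
    messages = [{'role': 'user', 'content': prompt}]
    response = client.chat(model=model, messages=messages, tools=[add_numbers])
    results = []
    # Each tool call names a function and supplies keyword arguments.
    for call in response['message'].get('tool_calls') or []:
        fn = TOOLS[call['function']['name']]
        results.append(fn(**call['function']['arguments']))
    return results

# With a running server:
#   from ollama import Client
#   run_with_tools(Client(), 'llama3.2', 'What is 2 + 3?')
```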
#### Embeddings

Generate vector embeddings from text inputs for semantic similarity and search applications.

```python { .api }
def embed(
    self,
    model: str = '',
    input: Union[str, Sequence[str]] = '',
    truncate: Optional[bool] = None,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None
) -> EmbedResponse:
    """
    Generate embeddings for input text(s).

    Parameters:
    - model (str): Embedding model name
    - input (str | Sequence[str]): Text or list of texts to embed
    - truncate (bool, optional): Truncate inputs that exceed the model's context length
    - options (Mapping | Options, optional): Model configuration options
    - keep_alive (float | str, optional): How long to keep the model loaded

    Returns:
        EmbedResponse containing embedding vectors
    """

def embeddings(
    self,
    model: str,
    prompt: str,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None
) -> EmbeddingsResponse:
    """
    Generate embeddings (deprecated - use embed instead).

    Parameters:
    - model (str): Embedding model name
    - prompt (str): Text to embed
    - options (Mapping | Options, optional): Model configuration options
    - keep_alive (float | str, optional): How long to keep the model loaded

    Returns:
        EmbeddingsResponse containing a single embedding vector
    """
```
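The vectors in `EmbedResponse.embeddings` are typically compared with cosine similarity. A self-contained sketch of that comparison (the model name in the commented usage is an example, not a recommendation):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# With a running server:
#   from ollama import Client
#   resp = Client().embed(model='nomic-embed-text', input=['cats', 'kittens', 'airplanes'])
#   e = resp['embeddings']
#   cosine_similarity(e[0], e[1])   # related texts score higher than unrelated ones
```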
#### Model Management

Download, upload, create, and manage Ollama models with progress tracking.

```python { .api }
def pull(
    self,
    model: str,
    *,
    insecure: bool = False,
    stream: bool = False
) -> Union[ProgressResponse, Iterator[ProgressResponse]]:
    """
    Download a model from a model library.

    Parameters:
    - model (str): Model name to download
    - insecure (bool): Allow insecure connections. Default: False
    - stream (bool): Return streaming progress. Default: False

    Returns:
        ProgressResponse, or Iterator[ProgressResponse] if streaming
    """

def push(
    self,
    model: str,
    *,
    insecure: bool = False,
    stream: bool = False
) -> Union[ProgressResponse, Iterator[ProgressResponse]]:
    """
    Upload a model to a model library.

    Parameters:
    - model (str): Model name to upload
    - insecure (bool): Allow insecure connections. Default: False
    - stream (bool): Return streaming progress. Default: False

    Returns:
        ProgressResponse, or Iterator[ProgressResponse] if streaming
    """

def create(
    self,
    model: str,
    quantize: Optional[str] = None,
    from_: Optional[str] = None,
    files: Optional[dict[str, str]] = None,
    adapters: Optional[dict[str, str]] = None,
    template: Optional[str] = None,
    license: Optional[Union[str, list[str]]] = None,
    system: Optional[str] = None,
    parameters: Optional[dict] = None,
    messages: Optional[list[Message]] = None,
    *,
    stream: bool = False
) -> Union[ProgressResponse, Iterator[ProgressResponse]]:
    """
    Create a new model.

    Parameters:
    - model (str): Name for the new model
    - quantize (str, optional): Quantization method
    - from_ (str, optional): Base model to inherit from
    - files (dict[str, str], optional): Additional files to include
    - adapters (dict[str, str], optional): Model adapters to apply
    - template (str, optional): Prompt template
    - license (str | list[str], optional): Model license(s)
    - system (str, optional): System message
    - parameters (dict, optional): Model parameters
    - messages (list[Message], optional): Example messages
    - stream (bool): Return streaming progress. Default: False

    Returns:
        ProgressResponse, or Iterator[ProgressResponse] if streaming
    """

def create_blob(
    self,
    path: Union[str, Path]
) -> str:
    """
    Create a blob from a file for model creation.

    Parameters:
    - path (str | Path): Path to the file to create the blob from

    Returns:
        str: Blob digest hash
    """

def delete(
    self,
    model: str
) -> StatusResponse:
    """
    Delete a model.

    Parameters:
    - model (str): Name of the model to delete

    Returns:
        StatusResponse with deletion status
    """

def copy(
    self,
    source: str,
    destination: str
) -> StatusResponse:
    """
    Copy a model.

    Parameters:
    - source (str): Source model name
    - destination (str): Destination model name

    Returns:
        StatusResponse with copy status
    """
```
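A common use of `create` is deriving a persona model from an existing base via `from_` and a baked-in system message. A minimal sketch; `create_persona` and the model names are illustrative, not part of the library:

```python
def create_persona(client, base_model, name, system_prompt):
    """Create a model derived from base_model with a fixed system
    prompt, and return the new model's name."""
    client.create(model=name, from_=base_model, system=system_prompt)
    return name

# With a running server:
#   from ollama import Client
#   client = Client()
#   create_persona(client, 'llama3.2', 'mario', 'You are Mario from Super Mario Bros.')
#   client.copy('mario', 'mario-backup')   # keep a snapshot
#   client.delete('mario-backup')
```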
#### Model Information

Retrieve information about available and running models.

```python { .api }
def list(
    self
) -> ListResponse:
    """
    List available models.

    Returns:
        ListResponse containing model information
    """

def show(
    self,
    model: str
) -> ShowResponse:
    """
    Show information about a specific model.

    Parameters:
    - model (str): Model name to show information for

    Returns:
        ShowResponse with detailed model information
    """

def ps(
    self
) -> ProcessResponse:
    """
    List running models and their resource usage.

    Returns:
        ProcessResponse with currently running models
    """
```
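Each entry in `ListResponse.models` carries at least a model name and a size in bytes, which is enough for a quick inventory. A sketch with an illustrative helper (`summarize_models` is not part of the library; the field names assume the response's dict-style access shown elsewhere in this document):

```python
def summarize_models(models):
    """Format model entries as '<name>: <size in GB>' lines."""
    return [f"{m['model']}: {m['size'] / 1e9:.1f} GB" for m in models]

# With a running server:
#   from ollama import Client
#   print('\n'.join(summarize_models(Client().list()['models'])))
```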
### AsyncClient Class (Asynchronous)

Asynchronous HTTP client for Ollama API operations with the same interface as Client, but using async/await patterns. Streaming calls return async iterators (consume them with `async for`).

```python { .api }
class AsyncClient:
    def __init__(
        self,
        host: Optional[str] = None,
        *,
        follow_redirects: bool = True,
        timeout: Any = None,
        headers: Optional[Mapping[str, str]] = None,
        **kwargs
    ):
        """
        Create an asynchronous Ollama client.

        Parameters: Same as Client class
        """

    async def generate(self, model: str = '', prompt: str = '', **kwargs):
        """Async version of Client.generate()"""

    async def chat(self, model: str = '', messages: Optional[Sequence[Union[Mapping[str, Any], Message]]] = None, **kwargs):
        """Async version of Client.chat()"""

    async def embed(self, model: str = '', input: Union[str, Sequence[str]] = '', **kwargs):
        """Async version of Client.embed()"""

    async def embeddings(self, model: str, prompt: str, **kwargs):
        """Async version of Client.embeddings() (deprecated)"""

    async def pull(self, model: str, **kwargs):
        """Async version of Client.pull()"""

    async def push(self, model: str, **kwargs):
        """Async version of Client.push()"""

    async def create(self, model: str, **kwargs):
        """Async version of Client.create()"""

    async def create_blob(self, path: Union[str, Path]) -> str:
        """Async version of Client.create_blob()"""

    async def delete(self, model: str) -> StatusResponse:
        """Async version of Client.delete()"""

    async def copy(self, source: str, destination: str) -> StatusResponse:
        """Async version of Client.copy()"""

    async def list(self) -> ListResponse:
        """Async version of Client.list()"""

    async def show(self, model: str) -> ShowResponse:
        """Async version of Client.show()"""

    async def ps(self) -> ProcessResponse:
        """Async version of Client.ps()"""
```
## Usage Examples

### Custom Client Configuration

```python
from ollama import Client
import httpx

# Custom client with authentication
client = Client(
    host='https://my-ollama-server.com',
    headers={'Authorization': 'Bearer token'},
    timeout=httpx.Timeout(30.0)
)

# Generate with the custom client
response = client.generate(
    model='custom-model',
    prompt='Hello, world!',
    options={'temperature': 0.7}
)
```
### Streaming with Progress Tracking

```python
from ollama import Client

client = Client()

# Stream text generation
print("Generating story...")
for chunk in client.generate(
    model='llama3.2',
    prompt='Write a short story about a robot',
    stream=True
):
    if chunk.get('response'):
        print(chunk['response'], end='', flush=True)

print("\n\nPulling model...")
# Stream model download progress
for progress in client.pull('phi3', stream=True):
    if progress.get('completed') and progress.get('total'):
        percent = (progress['completed'] / progress['total']) * 100
        print(f"Progress: {percent:.1f}%")
```
### Concurrent Async Requests

```python
import asyncio

from ollama import AsyncClient

async def main():
    client = AsyncClient()

    # Issue several generation requests concurrently
    tasks = [
        client.generate(model='llama3.2', prompt=f'Story {i}')
        for i in range(3)
    ]

    responses = await asyncio.gather(*tasks)
    for i, response in enumerate(responses):
        print(f"Story {i}: {response['response'][:100]}...")

asyncio.run(main())
```
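### Async Streaming

With `stream=True`, the awaited `AsyncClient.chat` call yields chunks as an async iterator. A sketch that collects a streamed reply into one string; the function takes the client as a parameter so it works with any object exposing the async `chat` signature above (`stream_chat` is an illustrative helper, not part of the library):

```python
import asyncio

async def stream_chat(client, model, prompt):
    """Collect a streamed chat reply into one string."""
    parts = []
    # With stream=True the awaited call returns an async iterator of chunks.
    async for chunk in await client.chat(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        stream=True,
    ):
        parts.append(chunk['message']['content'])
    return ''.join(parts)

# With a running server:
#   from ollama import AsyncClient
#   print(asyncio.run(stream_chat(AsyncClient(), 'llama3.2', 'Say hello')))
```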