# LLM Integration

Multi-provider language model support with consistent interfaces for OpenAI, Anthropic, Google, Groq, Azure OpenAI, and Ollama models. All chat models implement the `BaseChatModel` protocol for seamless integration with browser-use agents.

## Capabilities

### OpenAI Integration

OpenAI GPT model integration with support for GPT-4, GPT-3.5, and other OpenAI models.

```python { .api }
class ChatOpenAI:
    def __init__(
        self,
        model: str = "gpt-4o-mini",
        temperature: float = 0.2,
        frequency_penalty: float = 0.3,
        presence_penalty: float = 0.0,
        max_tokens: int | None = None,
        api_key: str | None = None,
        base_url: str | None = None,
        timeout: float = 60.0
    ):
        """
        Initialize OpenAI chat model.

        Parameters:
        - model: OpenAI model name (e.g., "gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo")
        - temperature: Randomness in generation (0.0-2.0)
        - frequency_penalty: Penalty for frequent tokens (-2.0 to 2.0)
        - presence_penalty: Penalty for token presence (-2.0 to 2.0)
        - max_tokens: Maximum tokens in response
        - api_key: OpenAI API key (uses OPENAI_API_KEY env var if not provided)
        - base_url: Custom API base URL
        - timeout: Request timeout in seconds
        """

    model: str
    provider: str = "openai"

    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] | None = None
    ) -> ChatInvokeCompletion:
        """
        Invoke OpenAI model with messages.

        Parameters:
        - messages: List of conversation messages
        - output_format: Optional Pydantic model for structured output

        Returns:
        ChatInvokeCompletion: Model response with content and metadata
        """
```

### Anthropic Integration

Anthropic Claude model integration with support for Claude 3 family models.

```python { .api }
class ChatAnthropic:
    def __init__(
        self,
        model: str = "claude-3-sonnet-20240229",
        temperature: float = 0.2,
        max_tokens: int = 4096,
        api_key: str | None = None,
        timeout: float = 60.0
    ):
        """
        Initialize Anthropic Claude model.

        Parameters:
        - model: Claude model name (e.g., "claude-3-sonnet-20240229", "claude-3-haiku-20240307")
        - temperature: Randomness in generation (0.0-1.0)
        - max_tokens: Maximum tokens in response
        - api_key: Anthropic API key (uses ANTHROPIC_API_KEY env var if not provided)
        - timeout: Request timeout in seconds
        """

    model: str
    provider: str = "anthropic"

    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] | None = None
    ) -> ChatInvokeCompletion:
        """Invoke Claude model with messages."""
```

### Google Integration

Google Gemini model integration with support for Gemini Pro and other Google models.

```python { .api }
class ChatGoogle:
    def __init__(
        self,
        model: str = "gemini-pro",
        temperature: float = 0.2,
        max_tokens: int | None = None,
        api_key: str | None = None,
        timeout: float = 60.0
    ):
        """
        Initialize Google Gemini model.

        Parameters:
        - model: Gemini model name (e.g., "gemini-pro", "gemini-pro-vision")
        - temperature: Randomness in generation (0.0-1.0)
        - max_tokens: Maximum tokens in response
        - api_key: Google API key (uses GOOGLE_API_KEY env var if not provided)
        - timeout: Request timeout in seconds
        """

    model: str
    provider: str = "google"

    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] | None = None
    ) -> ChatInvokeCompletion:
        """Invoke Gemini model with messages."""
```

### Groq Integration

Groq model integration for fast inference with Llama, Mixtral, and other supported models.

```python { .api }
class ChatGroq:
    def __init__(
        self,
        model: str = "llama3-70b-8192",
        temperature: float = 0.2,
        max_tokens: int | None = None,
        api_key: str | None = None,
        timeout: float = 60.0
    ):
        """
        Initialize Groq model.

        Parameters:
        - model: Groq model name (e.g., "llama3-70b-8192", "mixtral-8x7b-32768")
        - temperature: Randomness in generation (0.0-2.0)
        - max_tokens: Maximum tokens in response
        - api_key: Groq API key (uses GROQ_API_KEY env var if not provided)
        - timeout: Request timeout in seconds
        """

    model: str
    provider: str = "groq"

    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] | None = None
    ) -> ChatInvokeCompletion:
        """Invoke Groq model with messages."""
```

### Azure OpenAI Integration

Azure OpenAI service integration for enterprise deployments of OpenAI models.

```python { .api }
class ChatAzureOpenAI:
    def __init__(
        self,
        model: str,
        azure_endpoint: str,
        api_version: str = "2024-02-15-preview",
        temperature: float = 0.2,
        frequency_penalty: float = 0.3,
        max_tokens: int | None = None,
        api_key: str | None = None,
        timeout: float = 60.0
    ):
        """
        Initialize Azure OpenAI model.

        Parameters:
        - model: Azure deployment name
        - azure_endpoint: Azure OpenAI endpoint URL
        - api_version: Azure OpenAI API version
        - temperature: Randomness in generation (0.0-2.0)
        - frequency_penalty: Penalty for frequent tokens (-2.0 to 2.0)
        - max_tokens: Maximum tokens in response
        - api_key: Azure OpenAI API key
        - timeout: Request timeout in seconds
        """

    model: str
    provider: str = "azure_openai"

    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] | None = None
    ) -> ChatInvokeCompletion:
        """Invoke Azure OpenAI model with messages."""
```

### Ollama Integration

Integration with Ollama for running open models locally, with no external API calls.

```python { .api }
class ChatOllama:
    def __init__(
        self,
        model: str = "llama2",
        temperature: float = 0.2,
        base_url: str = "http://localhost:11434",
        timeout: float = 120.0
    ):
        """
        Initialize Ollama local model.

        Parameters:
        - model: Ollama model name (e.g., "llama2", "codellama", "mistral")
        - temperature: Randomness in generation (0.0-1.0)
        - base_url: Ollama server URL
        - timeout: Request timeout in seconds
        """

    model: str
    provider: str = "ollama"

    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] | None = None
    ) -> ChatInvokeCompletion:
        """Invoke local Ollama model with messages."""
```

### Base Chat Model Protocol

Protocol defining the interface that all chat models must implement.

```python { .api }
from typing import Protocol, TypeVar
from abc import abstractmethod

T = TypeVar('T')

class BaseChatModel(Protocol):
    """Protocol for chat model implementations."""

    model: str
    provider: str

    @abstractmethod
    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] | None = None
    ) -> ChatInvokeCompletion:
        """
        Invoke the chat model with messages.

        Parameters:
        - messages: Conversation messages
        - output_format: Optional structured output format

        Returns:
        ChatInvokeCompletion: Model response
        """
```

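Because `BaseChatModel` is a structural protocol, any object exposing `model`, `provider`, and an async `ainvoke` method can be passed to an agent as its LLM. Below is a minimal sketch of a custom implementation; the class name `CannedChatModel` is hypothetical, the snippet reuses `BaseMessage`, `ChatInvokeCompletion`, and `T` from the protocol definition above, and constructing `ChatInvokeCompletion` with keyword arguments is an assumption based on the fields listed in the Message Types section below.

```python
class CannedChatModel:
    """Hypothetical chat model that returns a fixed reply (useful for testing)."""

    model: str = "canned-v1"
    provider: str = "custom"

    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] | None = None
    ) -> ChatInvokeCompletion:
        # A real implementation would call a provider API here.
        last = messages[-1].content if messages else ""
        return ChatInvokeCompletion(
            content=f"Echo: {last}",
            model=self.model,
            usage={"prompt_tokens": 0, "completion_tokens": 0},
            finish_reason="stop",
        )
```
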
### Message Types

Message types for structured conversation handling.

```python { .api }
class BaseMessage:
    """Base class for conversation messages."""
    content: str
    role: str

class SystemMessage(BaseMessage):
    """System message for model prompting."""
    role: str = "system"

class HumanMessage(BaseMessage):
    """Human/user message."""
    role: str = "user"

class AIMessage(BaseMessage):
    """AI assistant message."""
    role: str = "assistant"

class ChatInvokeCompletion:
    """Chat model response."""
    content: str
    model: str
    usage: dict[str, int]
    finish_reason: str
```

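Chat models can also be invoked directly, outside of an agent. The sketch below builds a short conversation and reads fields from the returned `ChatInvokeCompletion`; importing the message classes from the top-level `browser_use` package and constructing them with a `content` keyword are assumptions, so adjust the import path and constructors to match the installed version.

```python
import asyncio

# Assumed import path for the message classes; they may live in a submodule instead.
from browser_use import ChatOpenAI, SystemMessage, HumanMessage

async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
    messages = [
        SystemMessage(content="You are a concise assistant."),
        HumanMessage(content="What is a headless browser? Answer in one sentence."),
    ]
    completion = await llm.ainvoke(messages)
    print(completion.content)  # generated text
    print(completion.usage)    # token counts, e.g. {"prompt_tokens": ..., "completion_tokens": ...}

asyncio.run(main())
```
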
## Usage Examples

### Basic Model Usage

```python
from browser_use import Agent, ChatOpenAI, ChatAnthropic, ChatGoogle

# OpenAI GPT-4
agent = Agent(
    task="Search for Python tutorials",
    llm=ChatOpenAI(model="gpt-4o", temperature=0.1)
)

# Anthropic Claude
agent = Agent(
    task="Analyze web page content",
    llm=ChatAnthropic(model="claude-3-sonnet-20240229")
)

# Google Gemini
agent = Agent(
    task="Extract structured data",
    llm=ChatGoogle(model="gemini-pro")
)
```

### Custom Model Configuration

```python
from browser_use import ChatOpenAI, ChatGroq, ChatOllama

# Custom OpenAI configuration
openai_model = ChatOpenAI(
    model="gpt-4o",
    temperature=0.0,        # deterministic output
    frequency_penalty=0.5,  # reduce repetition
    max_tokens=2000,
    timeout=30.0
)

# Fast inference with Groq
groq_model = ChatGroq(
    model="llama3-70b-8192",
    temperature=0.3,
    max_tokens=4000
)

# Local model with Ollama
local_model = ChatOllama(
    model="codellama:13b",
    temperature=0.1,
    base_url="http://localhost:11434"
)
```

### Azure OpenAI Enterprise Setup

```python
from browser_use import ChatAzureOpenAI, Agent

# Azure OpenAI configuration
azure_model = ChatAzureOpenAI(
    model="gpt-4-deployment",  # your Azure deployment name
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_version="2024-02-15-preview",
    api_key="your-azure-api-key",
    temperature=0.2
)

agent = Agent(
    task="Enterprise browser automation task",
    llm=azure_model
)
```

### Model Comparison Workflow

```python
from browser_use import Agent, ChatOpenAI, ChatAnthropic, ChatGoogle

task = "Analyze this webpage and extract key information"

# Test with different models
models = [
    ChatOpenAI(model="gpt-4o"),
    ChatAnthropic(model="claude-3-sonnet-20240229"),
    ChatGoogle(model="gemini-pro")
]

results = []
for model in models:
    agent = Agent(task=task, llm=model)
    result = agent.run_sync()
    results.append({
        'provider': model.provider,
        'model': model.model,
        'result': result.final_result(),
        'success': result.is_successful()
    })

# Compare results
for result in results:
    print(f"{result['provider']}: {result['success']}")
```

### Structured Output with Models

```python
from browser_use import Agent, ChatOpenAI
from pydantic import BaseModel

class WebPageInfo(BaseModel):
    title: str
    main_content: str
    links: list[str]
    images: list[str]

# Model with structured output
agent = Agent(
    task="Extract structured information from webpage",
    llm=ChatOpenAI(model="gpt-4o"),
    output_model_schema=WebPageInfo
)

result = agent.run_sync()
webpage_info = result.final_result()  # Returns WebPageInfo instance
print(f"Title: {webpage_info.title}")
print(f"Links found: {len(webpage_info.links)}")
```

### Error Handling and Fallbacks

```python
from browser_use import Agent, ChatOpenAI, ChatAnthropic, LLMException

primary_model = ChatOpenAI(model="gpt-4o")
fallback_model = ChatAnthropic(model="claude-3-haiku-20240307")

try:
    agent = Agent(task="Complex task", llm=primary_model)
    result = agent.run_sync()
except LLMException as e:
    print(f"Primary model failed: {e}")
    # Fall back to the alternative model
    agent = Agent(task="Complex task", llm=fallback_model)
    result = agent.run_sync()
```

### Local Model Setup

```python
from browser_use import ChatOllama, Agent

# Ensure Ollama is running:  ollama serve
# Pull the model first:      ollama pull llama2:13b

local_model = ChatOllama(
    model="llama2:13b",
    temperature=0.1,
    base_url="http://localhost:11434"
)

agent = Agent(
    task="Local browser automation task",
    llm=local_model
)

# Works offline with local inference
result = agent.run_sync()
```

## Model Selection Guidelines

### Performance Characteristics

- **GPT-4o**: Excellent reasoning, vision capabilities, reliable
- **Claude 3**: Strong analysis, long context, good at following instructions
- **Gemini Pro**: Good vision, fast inference, cost-effective
- **Groq**: Very fast inference, good for simple tasks
- **Local (Ollama)**: Privacy, offline operation, no API costs

### Use Case Recommendations

- **Complex reasoning**: GPT-4o, Claude 3 Sonnet
- **Fast, simple tasks**: Groq, Gemini Pro
- **Privacy/offline**: Ollama local models
- **Enterprise**: Azure OpenAI
- **Cost optimization**: GPT-4o-mini, Claude 3 Haiku

### Configuration Best Practices

- Use a low temperature (0.0-0.3) for deterministic browser automation
- Set timeouts appropriate to each model's typical response time
- Configure `max_tokens` based on the expected response length
- Use `frequency_penalty` to reduce repetitive actions

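As a reference point, the sketch below applies these guidelines to a `ChatOpenAI` configuration; the specific values are illustrative, not recommended defaults.

```python
from browser_use import Agent, ChatOpenAI

# Automation-oriented configuration following the guidelines above.
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.1,        # low temperature for repeatable action selection
    frequency_penalty=0.3,  # discourage repeating the same action
    max_tokens=2000,        # bound responses to the expected length
    timeout=45.0,           # fail fast instead of hanging on slow responses
)

agent = Agent(task="Fill out the signup form on example.com", llm=llm)
```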