# Agent Orchestration

Core agent functionality for autonomous browser task execution. The `Agent` class serves as the main orchestrator, coordinating language models, browser sessions, and action execution to complete complex web automation tasks.

## Capabilities

### Agent Creation and Configuration

The `Agent` class provides configuration options for task execution, browser control, and LLM integration.

```python { .api }
class Agent:
    def __init__(
        self,
        task: str,
        llm: BaseChatModel = ChatOpenAI(model='gpt-4o-mini'),
        # Optional browser parameters
        browser_profile: BrowserProfile = None,
        browser_session: BrowserSession = None,
        browser: BrowserSession = None,  # Alias for browser_session
        tools: Tools = None,
        controller: Tools = None,  # Alias for tools
        # Initial agent run parameters
        sensitive_data: dict[str, str | dict[str, str]] = None,
        initial_actions: list[dict[str, dict[str, Any]]] = None,
        # Cloud callbacks
        register_new_step_callback: Callable = None,
        register_done_callback: Callable = None,
        register_external_agent_status_raise_error_callback: Callable[[], Awaitable[bool]] = None,
        # Agent settings
        output_model_schema: type[AgentStructuredOutput] = None,
        use_vision: bool = True,
        save_conversation_path: str | Path = None,
        save_conversation_path_encoding: str = 'utf-8',
        max_failures: int = 3,
        override_system_message: str = None,
        extend_system_message: str = None,
        generate_gif: bool | str = False,
        available_file_paths: list[str] = None,
        include_attributes: list[str] = None,
        max_actions_per_step: int = 10,
        use_thinking: bool = True,
        flash_mode: bool = False,
        max_history_items: int = None,
        page_extraction_llm: BaseChatModel = None,
        # Advanced parameters
        injected_agent_state: AgentState = None,
        source: str = None,
        file_system_path: str = None,
        task_id: str = None,
        cloud_sync: CloudSync = None,
        calculate_cost: bool = False,
        display_files_in_done_text: bool = True,
        include_tool_call_examples: bool = False,
        vision_detail_level: Literal['auto', 'low', 'high'] = 'auto',
        llm_timeout: int = 90,
        step_timeout: int = 120,
        directly_open_url: bool = True,
        include_recent_events: bool = False,
        **kwargs
    ):
        """
        Create an AI agent for browser automation tasks.

        Parameters:
        - task: Description of the task to be performed
        - llm: Language model instance (defaults to ChatOpenAI(model='gpt-4o-mini'))
        - browser_profile: Browser configuration settings
        - browser_session: Existing browser session to use
        - browser: Alias for browser_session parameter
        - tools: Custom tools/actions registry
        - controller: Alias for tools parameter
        - sensitive_data: Credentials and sensitive information for the agent
        - initial_actions: Actions to execute before main task
        - register_new_step_callback: Callback for new step events
        - register_done_callback: Callback for task completion events
        - register_external_agent_status_raise_error_callback: Callback for external status checks
        - output_model_schema: Schema for structured output
        - use_vision: Enable vision capabilities for screenshot analysis
        - save_conversation_path: Path to save conversation history
        - save_conversation_path_encoding: Encoding for saved conversation files
        - max_failures: Maximum consecutive failures before stopping
        - override_system_message: Replace default system prompt
        - extend_system_message: Add to default system prompt
        - generate_gif: Generate GIF recording of agent actions
        - available_file_paths: Files available to the agent
        - include_attributes: DOM attributes to include in element descriptions
        - max_actions_per_step: Maximum actions per execution step
        - use_thinking: Enable internal reasoning mode
        - flash_mode: Enable faster execution mode with reduced prompting
        - max_history_items: Maximum history items to keep in memory
        - page_extraction_llm: Separate LLM for page content extraction
        - injected_agent_state: Pre-configured agent state for advanced usage
        - source: Source identifier for tracking
        - file_system_path: Path to agent file system
        - task_id: Unique identifier for the task
        - cloud_sync: Cloud synchronization service instance
        - calculate_cost: Calculate and track API costs
        - display_files_in_done_text: Show files in completion messages
        - include_tool_call_examples: Include examples in tool calls
        - vision_detail_level: Vision processing detail level ('auto', 'low', 'high')
        - llm_timeout: LLM request timeout in seconds
        - step_timeout: Step execution timeout in seconds
        - directly_open_url: Open URLs directly without confirmation
        - include_recent_events: Include recent browser events in context
        - **kwargs: Additional configuration parameters
        """
```
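
The `sensitive_data` annotation, `dict[str, str | dict[str, str]]`, admits two shapes. The sketch below assumes the flat shape maps placeholder names to secret values and the nested shape groups such mappings under a domain pattern. That reading, the helper `flatten_sensitive_data`, and the example keys and values are illustrative assumptions, not part of the documented API.

```python
from typing import Union

SensitiveData = dict[str, Union[str, dict[str, str]]]


def flatten_sensitive_data(data: SensitiveData) -> dict[str, str]:
    """Collect every placeholder -> value pair, ignoring domain grouping.

    Hypothetical helper for illustration; not a browser_use function.
    """
    flat: dict[str, str] = {}
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested shape: the key is assumed to be a domain pattern.
            flat.update(value)
        else:
            # Flat shape: the key is the placeholder name itself.
            flat[key] = value
    return flat


# Flat shape: placeholders not tied to a particular site.
flat_form: SensitiveData = {"x_user": "alice", "x_pass": "s3cret"}

# Nested shape: placeholders scoped to a domain pattern (assumed semantics).
scoped_form: SensitiveData = {
    "https://*.example.com": {"x_user": "alice", "x_pass": "s3cret"},
}
```

Either dictionary has the declared type, so both could be passed as `sensitive_data` under these assumptions.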

### Task Execution

Primary methods for running agent tasks, with both asynchronous and synchronous interfaces.

```python { .api }
async def run(self, max_steps: int = 100) -> AgentHistoryList:
    """
    Execute the agent task asynchronously.

    Parameters:
    - max_steps: Maximum number of execution steps

    Returns:
        AgentHistoryList: Complete execution history with results
    """

def run_sync(self, max_steps: int = 100) -> AgentHistoryList:
    """
    Execute the agent task synchronously.

    Parameters:
    - max_steps: Maximum number of execution steps

    Returns:
        AgentHistoryList: Complete execution history with results
    """
```
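
To make the interaction between `max_steps` and `max_failures` ("maximum consecutive failures before stopping") concrete, here is a toy loop written without the library. `run_loop` and `flaky_step` are hypothetical names; the real `run` method may handle failures differently, so treat this as a sketch of the documented limits only.

```python
import asyncio
from typing import Awaitable, Callable


async def run_loop(
    step: Callable[[int], Awaitable[bool]],
    max_steps: int = 100,
    max_failures: int = 3,
) -> str:
    """Drive `step` until it signals completion or a limit is hit.

    `step(n)` returns True when the task is done and raises on failure.
    Returns "done", "max_steps", or "max_failures".
    """
    consecutive_failures = 0
    for n in range(max_steps):
        try:
            done = await step(n)
        except Exception:
            consecutive_failures += 1
            if consecutive_failures >= max_failures:
                return "max_failures"
            continue
        consecutive_failures = 0  # a successful step resets the counter
        if done:
            return "done"
    return "max_steps"


async def flaky_step(n: int) -> bool:
    """Toy step: fails on step 1, finishes on step 4."""
    if n == 1:
        raise RuntimeError("transient failure")
    return n == 4


print(asyncio.run(run_loop(flaky_step, max_steps=10)))
```

Because the counter resets after each successful step, a single transient failure does not end the run; only an unbroken streak of `max_failures` failures does.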

### Step-by-Step Execution

Fine-grained control over agent execution for debugging and custom workflows.

```python { .api }
async def step(self, step_info: AgentStepInfo = None) -> None:
    """
    Execute a single step of the agent task.

    Parameters:
    - step_info: Optional step information for context
    """

async def take_step(self, step_info: AgentStepInfo = None) -> tuple[bool, bool]:
    """
    Take a step and return completion status.

    Parameters:
    - step_info: Optional step information for context

    Returns:
        tuple[bool, bool]: (is_done, is_valid)
    """
```
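
The `(is_done, is_valid)` return value of `take_step` lends itself to a bounded driver loop. The sketch below uses a stand-in `StubAgent` so it runs without a browser or an LLM. The stub, the `drive` helper, and the reading of `is_valid` as "the completion was accepted" are assumptions made for illustration.

```python
import asyncio


class StubAgent:
    """Stand-in with the same `take_step` shape; finishes after 3 steps."""

    def __init__(self) -> None:
        self.steps_taken = 0

    async def take_step(self, step_info=None) -> tuple[bool, bool]:
        self.steps_taken += 1
        is_done = self.steps_taken >= 3
        return is_done, True  # (is_done, is_valid)


async def drive(agent, max_steps: int = 10) -> int:
    """Call take_step until the task is done and valid; return steps used."""
    for used in range(1, max_steps + 1):
        is_done, is_valid = await agent.take_step()
        if is_done and is_valid:
            return used
    return max_steps


steps_used = asyncio.run(drive(StubAgent()))
print(steps_used)
```

Bounding the loop with `max_steps` keeps a task that never reports completion from running indefinitely.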

### Task Management

Methods for dynamic task modification and execution control.

```python { .api }
def add_new_task(self, new_task: str) -> None:
    """
    Add a new task to the agent's task list.

    Parameters:
    - new_task: Additional task description
    """

def pause(self) -> None:
    """Pause agent execution."""

def resume(self) -> None:
    """Resume paused agent execution."""

def stop(self) -> None:
    """Stop agent execution immediately."""
```
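
One common way to implement pause, resume, and stop around an async step loop is to gate each iteration on an `asyncio.Event`. The following is a minimal sketch of that pattern. `PausableLoop` is a hypothetical class, and nothing here claims to mirror how `Agent` implements these methods internally.

```python
import asyncio


class PausableLoop:
    """Toy step loop gated by an asyncio.Event (set = running, clear = paused)."""

    def __init__(self) -> None:
        self._running = asyncio.Event()
        self._running.set()
        self._stopped = False
        self.completed_steps = 0

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def stop(self) -> None:
        self._stopped = True
        self._running.set()  # wake the loop so it can observe the stop flag

    async def run(self, max_steps: int) -> int:
        for _ in range(max_steps):
            await self._running.wait()  # blocks here while paused
            if self._stopped:
                break
            self.completed_steps += 1
            await asyncio.sleep(0)  # yield control between steps
        return self.completed_steps


async def demo() -> tuple[int, int]:
    loop = PausableLoop()
    loop.pause()
    task = asyncio.create_task(loop.run(max_steps=5))
    await asyncio.sleep(0.01)
    while_paused = loop.completed_steps  # no progress while paused
    loop.resume()
    return while_paused, await task


print(asyncio.run(demo()))
```

The stop flag is checked only between steps, so in this sketch a step that has already started runs to completion before the loop exits.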

### History and State Management

Methods for saving, loading, and managing execution history.

```python { .api }
def save_history(self, file_path: str | Path = None) -> None:
    """
    Save execution history to file.

    Parameters:
    - file_path: Path to save history (optional)
    """

async def load_and_rerun(
    self,
    history_file: str | Path = None
) -> list[ActionResult]:
    """
    Load and replay execution history.

    Parameters:
    - history_file: Path to history file to replay

    Returns:
        list[ActionResult]: Results from replayed actions
    """

async def close(self) -> None:
    """Clean up resources and close connections."""
```

### System Prompt Management

Advanced prompt engineering capabilities for customizing agent behavior.

```python { .api }
class SystemPrompt:
    def __init__(
        self,
        action_description: str,
        max_actions_per_step: int = 10,
        override_system_message: str = None,
        extend_system_message: str = None,
        use_thinking: bool = True,
        flash_mode: bool = False
    ):
        """
        Manage system prompts for agent behavior.

        Parameters:
        - action_description: Description of available actions
        - max_actions_per_step: Maximum actions per step
        - override_system_message: Replace default system message
        - extend_system_message: Add to default system message
        - use_thinking: Enable thinking mode
        - flash_mode: Enable flash mode
        """

    def get_system_message(self) -> SystemMessage:
        """Get formatted system prompt message."""
```
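
The difference between `override_system_message` (replace the default) and `extend_system_message` (add to it) can be sketched as a small pure function. `compose_system_message` and `DEFAULT_PROMPT` are hypothetical, and the assumption that an extension is appended even when an override is present should be checked against the installed version.

```python
from typing import Optional

DEFAULT_PROMPT = "You are a browser automation agent."  # placeholder text


def compose_system_message(
    default: str = DEFAULT_PROMPT,
    override_system_message: Optional[str] = None,
    extend_system_message: Optional[str] = None,
) -> str:
    """Return the prompt text after applying override and extension."""
    # Override replaces the default prompt entirely.
    base = override_system_message if override_system_message is not None else default
    # Extension is appended to whichever base prompt is in effect.
    if extend_system_message:
        return f"{base}\n{extend_system_message}"
    return base


print(compose_system_message(extend_system_message="Be extra careful with form submissions."))
```

Extending is usually the safer choice, since it keeps the default instructions that describe the available actions.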

## Usage Examples

### Basic Agent Usage

```python
from browser_use import Agent, ChatOpenAI

# Simple task execution
agent = Agent(
    task="Go to Google and search for 'Python programming'",
    llm=ChatOpenAI(model="gpt-4o")
)

result = agent.run_sync()
print(f"Task completed: {result.is_done()}")
print(f"Final result: {result.final_result()}")
```

### Advanced Configuration

```python
import asyncio

from browser_use import Agent, BrowserProfile, Tools, ChatAnthropic

# Custom browser profile
profile = BrowserProfile(
    headless=False,
    user_data_dir="/tmp/browser-data",
    allowed_domains=["*.github.com", "*.stackoverflow.com"]
)

# Custom tools with exclusions
tools = Tools(exclude_actions=["search_google"])


async def main():
    # Agent with advanced configuration
    agent = Agent(
        task="Navigate to GitHub and find Python repositories",
        llm=ChatAnthropic(model="claude-3-sonnet-20240229"),
        browser_profile=profile,
        tools=tools,
        use_vision=True,
        max_failures=5,
        generate_gif=True,
        extend_system_message="Be extra careful with form submissions."
    )

    # run() is a coroutine, so it must be awaited inside an async function
    return await agent.run(max_steps=50)


result = asyncio.run(main())
```

### Structured Output

```python
from pydantic import BaseModel
from browser_use import Agent

class SearchResult(BaseModel):
    title: str
    url: str
    description: str

agent = Agent(
    task="Search for AI research papers and extract details",
    output_model_schema=SearchResult
)

result = agent.run_sync()
# Assumption: final_result() returns the final output as JSON text (or None),
# so it is validated against the schema to obtain a SearchResult instance.
raw = result.final_result()
structured_data = SearchResult.model_validate_json(raw) if raw else None
```

### Step-by-Step Execution

```python
import asyncio

from browser_use import Agent


async def main():
    agent = Agent(task="Multi-step web scraping task")

    # Execute step by step for debugging, with an upper bound on steps
    for step_number in range(1, 21):
        is_done, is_valid = await agent.take_step()
        print(f"Step {step_number}: done={is_done}, valid={is_valid}")
        if is_done:
            break

    # Save progress, then release browser resources
    agent.save_history("execution_log.json")
    await agent.close()


asyncio.run(main())
```

### History Replay

```python
import asyncio

from browser_use import Agent


async def main():
    agent = Agent(task="Replay previous execution")
    results = await agent.load_and_rerun("execution_log.json")

    for result in results:
        # ActionResult field names are assumed; check the installed version
        print(f"Success: {result.success}, Error: {result.error}")


asyncio.run(main())
```

## Type Definitions

```python { .api }
from typing import Any, Optional
from pathlib import Path

class AgentStepInfo:
    """Information context for agent step execution."""
    pass

class SystemMessage:
    """Formatted system message for LLM prompting."""
    content: str
```