Tessl Tile for pypi/browser-use@0.7.0

# Agent Orchestration

Core agent functionality for autonomous browser task execution. The Agent class serves as the main orchestrator, coordinating language models, browser sessions, and action execution to complete complex web automation tasks.

## Capabilities

### Agent Creation and Configuration

The Agent class provides comprehensive configuration options for task execution, browser control, and LLM integration.

```python { .api }
class Agent:
    def __init__(
        self,
        task: str,
        llm: BaseChatModel = ChatOpenAI(model='gpt-4o-mini'),
        # Optional browser parameters
        browser_profile: BrowserProfile = None,
        browser_session: BrowserSession = None,
        browser: BrowserSession = None,  # Alias for browser_session
        tools: Tools = None,
        controller: Tools = None,  # Alias for tools
        # Initial agent run parameters
        sensitive_data: dict[str, str | dict[str, str]] = None,
        initial_actions: list[dict[str, dict[str, Any]]] = None,
        # Cloud callbacks
        register_new_step_callback: Callable = None,
        register_done_callback: Callable = None,
        register_external_agent_status_raise_error_callback: Callable[[], Awaitable[bool]] = None,
        # Agent settings
        output_model_schema: type[AgentStructuredOutput] = None,
        use_vision: bool = True,
        save_conversation_path: str | Path = None,
        save_conversation_path_encoding: str = 'utf-8',
        max_failures: int = 3,
        override_system_message: str = None,
        extend_system_message: str = None,
        generate_gif: bool | str = False,
        available_file_paths: list[str] = None,
        include_attributes: list[str] = None,
        max_actions_per_step: int = 10,
        use_thinking: bool = True,
        flash_mode: bool = False,
        max_history_items: int = None,
        page_extraction_llm: BaseChatModel = None,
        # Advanced parameters
        injected_agent_state: AgentState = None,
        source: str = None,
        file_system_path: str = None,
        task_id: str = None,
        cloud_sync: CloudSync = None,
        calculate_cost: bool = False,
        display_files_in_done_text: bool = True,
        include_tool_call_examples: bool = False,
        vision_detail_level: Literal['auto', 'low', 'high'] = 'auto',
        llm_timeout: int = 90,
        step_timeout: int = 120,
        directly_open_url: bool = True,
        include_recent_events: bool = False,
        **kwargs
    ):
        """
        Create an AI agent for browser automation tasks.

        Parameters:
        - task: Description of the task to be performed
        - llm: Language model instance (defaults to ChatOpenAI(model='gpt-4o-mini'))
        - browser_profile: Browser configuration settings
        - browser_session: Existing browser session to use
        - browser: Alias for browser_session parameter
        - tools: Custom tools/actions registry
        - controller: Alias for tools parameter
        - sensitive_data: Credentials and sensitive information for the agent
        - initial_actions: Actions to execute before main task
        - register_new_step_callback: Callback for new step events
        - register_done_callback: Callback for task completion events
        - register_external_agent_status_raise_error_callback: Callback for external status checks
        - output_model_schema: Schema for structured output
        - use_vision: Enable vision capabilities for screenshot analysis
        - save_conversation_path: Path to save conversation history
        - save_conversation_path_encoding: Encoding for saved conversation files
        - max_failures: Maximum consecutive failures before stopping
        - override_system_message: Replace default system prompt
        - extend_system_message: Add to default system prompt
        - generate_gif: Generate GIF recording of agent actions
        - available_file_paths: Files available to the agent
        - include_attributes: DOM attributes to include in element descriptions
        - max_actions_per_step: Maximum actions per execution step
        - use_thinking: Enable internal reasoning mode
        - flash_mode: Enable faster execution mode with reduced prompting
        - max_history_items: Maximum history items to keep in memory
        - page_extraction_llm: Separate LLM for page content extraction
        - injected_agent_state: Pre-configured agent state for advanced usage
        - source: Source identifier for tracking
        - file_system_path: Path to agent file system
        - task_id: Unique identifier for the task
        - cloud_sync: Cloud synchronization service instance
        - calculate_cost: Calculate and track API costs
        - display_files_in_done_text: Show files in completion messages
        - include_tool_call_examples: Include examples in tool calls
        - vision_detail_level: Vision processing detail level ('auto', 'low', 'high')
        - llm_timeout: LLM request timeout in seconds
        - step_timeout: Step execution timeout in seconds
        - directly_open_url: Open URLs directly without confirmation
        - include_recent_events: Include recent browser events in context
        - **kwargs: Additional configuration parameters
        """
```

### Task Execution

Primary methods for running agent tasks with both asynchronous and synchronous interfaces.

```python { .api }
async def run(self, max_steps: int = 100) -> AgentHistoryList:
    """
    Execute the agent task asynchronously.

    Parameters:
    - max_steps: Maximum number of execution steps

    Returns:
    AgentHistoryList: Complete execution history with results
    """

def run_sync(self, max_steps: int = 100) -> AgentHistoryList:
    """
    Execute the agent task synchronously.

    Parameters:
    - max_steps: Maximum number of execution steps

    Returns:
    AgentHistoryList: Complete execution history with results
    """
```
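The relationship between the two entry points can be sketched with a stand-in class. `StubRunner` below is hypothetical and does no browser work; it assumes the synchronous method simply drives the coroutine on a fresh event loop, which the source does not confirm for the real `Agent`.

```python
import asyncio


class StubRunner:
    """Stand-in for Agent that records step labels instead of browsing."""

    def __init__(self, steps_needed: int) -> None:
        self.steps_needed = steps_needed

    async def run(self, max_steps: int = 100) -> list[str]:
        history: list[str] = []
        for n in range(1, max_steps + 1):
            await asyncio.sleep(0)  # yield to the event loop, as real I/O would
            history.append(f"step {n}")
            if n >= self.steps_needed:
                break  # task finished before the step budget ran out
        return history

    def run_sync(self, max_steps: int = 100) -> list[str]:
        # Assumption: the sync variant wraps the async one with asyncio.run.
        return asyncio.run(self.run(max_steps=max_steps))


history = StubRunner(steps_needed=3).run_sync()
capped = StubRunner(steps_needed=10).run_sync(max_steps=4)
```

In this sketch `max_steps` acts as a hard cap: `capped` stops after four steps even though the stub wanted ten. A wrapper built on `asyncio.run` cannot be called from code that is already inside a running event loop; in that situation, awaiting `run()` directly is the usual choice.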

### Step-by-Step Execution

Fine-grained control over agent execution for debugging and custom workflows.

```python { .api }
async def step(self, step_info: AgentStepInfo = None) -> None:
    """
    Execute a single step of the agent task.

    Parameters:
    - step_info: Optional step information for context
    """

async def take_step(self, step_info: AgentStepInfo = None) -> tuple[bool, bool]:
    """
    Take a step and return completion status.

    Parameters:
    - step_info: Optional step information for context

    Returns:
    tuple[bool, bool]: (is_done, is_valid)
    """
```
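A custom driver loop built on the `(is_done, is_valid)` return value might look like the following. `StubStepper` and `drive` are hypothetical names used only for this sketch; the stub mimics the shape of `take_step()` without touching a browser.

```python
import asyncio


class StubStepper:
    """Stand-in whose take_step() returns (is_done, is_valid) like the API above."""

    def __init__(self, finish_on: int) -> None:
        self.finish_on = finish_on
        self.steps_taken = 0

    async def take_step(self) -> tuple[bool, bool]:
        self.steps_taken += 1
        is_done = self.steps_taken >= self.finish_on
        return is_done, True  # this stub always reports a valid result


async def drive(agent: StubStepper, max_steps: int) -> str:
    """Call take_step() until the task is done and valid, or the budget runs out."""
    for _ in range(max_steps):
        is_done, is_valid = await agent.take_step()
        if is_done and is_valid:
            return "completed"
    return "step budget exhausted"


outcome = asyncio.run(drive(StubStepper(finish_on=3), max_steps=5))
exhausted = asyncio.run(drive(StubStepper(finish_on=9), max_steps=5))
```

Keeping the step budget in the caller makes it easy to add logging, breakpoints, or custom stop conditions between steps.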

### Task Management

Methods for dynamic task modification and execution control.

```python { .api }
def add_new_task(self, new_task: str) -> None:
    """
    Add a new task to the agent's task list.

    Parameters:
    - new_task: Additional task description
    """

def pause(self) -> None:
    """Pause agent execution."""

def resume(self) -> None:
    """Resume paused agent execution."""

def stop(self) -> None:
    """Stop agent execution immediately."""
```
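One common way to implement this kind of pause, resume, and stop control in asyncio code is an `asyncio.Event` that the run loop waits on before each step. The sketch below illustrates that general pattern with a hypothetical `PausableWorker`; it is not taken from the browser-use implementation.

```python
import asyncio


class PausableWorker:
    """Hypothetical stand-in showing event-gated execution control."""

    def __init__(self) -> None:
        self._running = asyncio.Event()
        self._running.set()  # start in the running state
        self.stopped = False
        self.completed: list[int] = []

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def stop(self) -> None:
        self.stopped = True
        self._running.set()  # release a paused loop so it can exit

    async def run(self, max_steps: int) -> None:
        for n in range(max_steps):
            await self._running.wait()  # blocks here while paused
            if self.stopped:
                break
            self.completed.append(n)
            await asyncio.sleep(0)


async def demo() -> tuple[int, int]:
    worker = PausableWorker()
    worker.pause()
    task = asyncio.create_task(worker.run(max_steps=3))
    await asyncio.sleep(0.01)  # give the worker a chance to run (it cannot)
    while_paused = len(worker.completed)
    worker.resume()
    await task
    return while_paused, len(worker.completed)


while_paused, after_resume = asyncio.run(demo())
```

No steps complete while the worker is paused, and all three complete after `resume()`.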

### History and State Management

Methods for saving, loading, and managing execution history.

```python { .api }
def save_history(self, file_path: str | Path = None) -> None:
    """
    Save execution history to file.

    Parameters:
    - file_path: Path to save history (optional)
    """

async def load_and_rerun(
    self,
    history_file: str | Path = None
) -> list[ActionResult]:
    """
    Load and replay execution history.

    Parameters:
    - history_file: Path to history file to replay

    Returns:
    list[ActionResult]: Results from replayed actions
    """

async def close(self) -> None:
    """Clean up resources and close connections."""
```
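Because `close()` releases resources, callers may want it to run even when the task fails. A minimal sketch of that pattern with `try`/`finally` follows; `StubResource` and `run_and_close` are hypothetical names, and the stub stands in for an agent whose `run()` raises.

```python
import asyncio


class StubResource:
    """Stand-in with the same run()/close() shape as the methods above."""

    def __init__(self) -> None:
        self.closed = False

    async def run(self) -> None:
        raise RuntimeError("simulated failure")

    async def close(self) -> None:
        self.closed = True


async def run_and_close(agent: StubResource) -> str:
    try:
        await agent.run()
        return "ok"
    except RuntimeError as exc:
        return f"failed: {exc}"
    finally:
        # Runs on success and on failure, so resources are always released.
        await agent.close()


resource = StubResource()
status = asyncio.run(run_and_close(resource))
```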

### System Prompt Management

Advanced prompt engineering capabilities for customizing agent behavior.

```python { .api }
class SystemPrompt:
    def __init__(
        self,
        action_description: str,
        max_actions_per_step: int = 10,
        override_system_message: str = None,
        extend_system_message: str = None,
        use_thinking: bool = True,
        flash_mode: bool = False
    ):
        """
        Manage system prompts for agent behavior.

        Parameters:
        - action_description: Description of available actions
        - max_actions_per_step: Maximum actions per step
        - override_system_message: Replace default system message
        - extend_system_message: Add to default system message
        - use_thinking: Enable thinking mode
        - flash_mode: Enable flash mode
        """

    def get_system_message(self) -> SystemMessage:
        """Get formatted system prompt message."""
```

## Usage Examples

### Basic Agent Usage

```python
from browser_use import Agent, ChatOpenAI

# Simple task execution
agent = Agent(
    task="Go to Google and search for 'Python programming'",
    llm=ChatOpenAI(model="gpt-4o")
)

result = agent.run_sync()
print(f"Task completed: {result.is_done()}")
print(f"Final result: {result.final_result()}")
```

### Advanced Configuration

```python
import asyncio

from browser_use import Agent, BrowserProfile, Tools, ChatAnthropic

# Custom browser profile
profile = BrowserProfile(
    headless=False,
    user_data_dir="/tmp/browser-data",
    allowed_domains=["*.github.com", "*.stackoverflow.com"]
)

# Custom tools with exclusions
tools = Tools(exclude_actions=["search_google"])

async def main():
    # Agent with advanced configuration
    agent = Agent(
        task="Navigate to GitHub and find Python repositories",
        llm=ChatAnthropic(model="claude-3-sonnet-20240229"),
        browser_profile=profile,
        tools=tools,
        use_vision=True,
        max_failures=5,
        generate_gif=True,
        extend_system_message="Be extra careful with form submissions."
    )
    # run() is a coroutine, so it is awaited inside an event loop
    return await agent.run(max_steps=50)

result = asyncio.run(main())
```

### Structured Output

```python
from pydantic import BaseModel
from browser_use import Agent

class SearchResult(BaseModel):
    title: str
    url: str
    description: str

agent = Agent(
    task="Search for AI research papers and extract details",
    output_model_schema=SearchResult
)

result = agent.run_sync()
raw = result.final_result()  # JSON text matching the schema, or None
# Parse the JSON text into a SearchResult instance
structured_data = SearchResult.model_validate_json(raw) if raw else None
```

### Step-by-Step Execution

```python
import asyncio

from browser_use import Agent

async def main():
    agent = Agent(task="Multi-step web scraping task")

    # Execute step by step for debugging
    while not agent.is_done():
        await agent.step()
        print(f"Current step: {agent.current_step}")
        if agent.has_error():
            print(f"Error: {agent.last_error}")
            break

    # Save progress
    agent.save_history("execution_log.json")

asyncio.run(main())
```

### History Replay

```python
import asyncio

from browser_use import Agent

async def main():
    agent = Agent(task="Replay previous execution")
    # Replay the actions recorded in a previously saved history file
    results = await agent.load_and_rerun("execution_log.json")

    for result in results:
        print(f"Action: {result.action}, Success: {result.success}")

asyncio.run(main())
```

## Type Definitions

```python { .api }
from typing import Any, Optional
from pathlib import Path

class AgentStepInfo:
    """Information context for agent step execution."""
    pass

class SystemMessage:
    """Formatted system message for LLM prompting."""
    content: str
```
