AI-powered browser automation library that enables language models to control web browsers for automated tasks
npx @tessl/cli install tessl/pypi-browser-use@0.7.00
# Browser-Use

Browser-use is a Python library that lets AI agents control web browsers to carry out automated tasks. It provides an agent framework that combines browser automation with language model integration, supports multiple LLM providers, and offers DOM manipulation, real-time browser control, and task execution features.

## Package Information

- **Package Name**: browser-use
- **Language**: Python
- **Installation**: `pip install browser-use`
- **Repository**: https://github.com/browser-use/browser-use
- **Documentation**: https://docs.browser-use.com

## Core Imports

```python
import browser_use
```

Common patterns for agent-based automation:

```python
from browser_use import Agent, BrowserSession, ChatOpenAI
```

Individual component imports:

```python
from browser_use import (
    Agent, BrowserSession, BrowserProfile, Tools,
    SystemPrompt, ActionResult, AgentHistoryList,
    ChatOpenAI, ChatAnthropic, ChatGoogle
)
```

## Basic Usage

```python
from browser_use import Agent, ChatOpenAI

# Create an agent with a task
agent = Agent(
    task="Search for weather in New York and extract the temperature",
    llm=ChatOpenAI(model="gpt-4o")
)

# Run the agent task (async; `await` must be used inside a coroutine)
result = await agent.run()

# Or run the agent task synchronously (alternative to the async call above)
result = agent.run_sync()

# Check if task completed successfully
if result.is_successful():
    print(f"Task completed: {result.final_result()}")
else:
    print(f"Task failed: {result.errors()}")
```
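
Because `await` is only valid inside a coroutine, a standalone script would typically wrap the async call in `asyncio.run`. The sketch below shows that pattern with a hypothetical `StandInAgent` that mimics only an async `run()` method; it is not the real `Agent` class and needs neither a browser nor an API key.

```python
import asyncio


class StandInAgent:
    """Hypothetical stand-in for Agent: exposes only an async run()."""

    def __init__(self, task: str) -> None:
        self.task = task

    async def run(self, max_steps: int = 100) -> str:
        # A real agent would drive a browser here; this only yields to the loop once.
        await asyncio.sleep(0)
        return f"finished: {self.task}"


async def main() -> str:
    agent = StandInAgent(task="Search for weather in New York")
    return await agent.run()


# asyncio.run creates an event loop, runs main() to completion, then closes the loop.
result = asyncio.run(main())
print(result)
```

With the real library, the body of `main()` would presumably construct an `Agent` and await its `run()` in the same way.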

```python
from browser_use import Agent, BrowserProfile, BrowserSession

# Custom browser configuration
profile = BrowserProfile(
    headless=True,
    allowed_domains=["*.google.com", "*.wikipedia.org"],
    downloads_path="/tmp/downloads"
)

# Create browser session with custom profile
session = BrowserSession(browser_profile=profile)

# Agent with custom browser
agent = Agent(
    task="Search Wikipedia for Python programming language",
    browser_session=session
)

result = agent.run_sync()
```
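
The `allowed_domains` entries above look like glob patterns. As a rough illustration of how such patterns could restrict navigation, the sketch below matches a URL's hostname against them with the standard library's `fnmatch`. The helper `is_allowed` is hypothetical, and the library's actual matching rules (for example, whether `*.google.com` also covers the bare `google.com`) may differ.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

ALLOWED = ["*.google.com", "*.wikipedia.org"]


def is_allowed(url: str, patterns: list) -> bool:
    """Return True if the URL's hostname matches any glob pattern (assumed semantics)."""
    host = urlparse(url).hostname or ""
    return any(fnmatch(host, pattern) for pattern in patterns)


print(is_allowed("https://en.wikipedia.org/wiki/Python", ALLOWED))  # subdomain matches
print(is_allowed("https://example.com/", ALLOWED))                  # unrelated domain
print(is_allowed("https://google.com/", ALLOWED))                   # bare domain, no subdomain
```

Under plain glob semantics the bare domain does not match `*.google.com`, so a strict allow-list of this kind would need a separate `google.com` entry.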

## Architecture

Browser-use implements a multi-layered architecture for AI-powered browser automation:

- **Agent Layer**: High-level task orchestration using language models for decision-making
- **Browser Session**: CDP-based browser control and state management
- **Tools Registry**: Extensible action system with built-in browser automation actions
- **DOM Service**: Intelligent DOM extraction, serialization, and element indexing
- **LLM Integration**: Multi-provider support (OpenAI, Anthropic, Google, Groq, Azure, Ollama)
- **Observation System**: Screenshot capture, text extraction, and state tracking

This design enables AI agents to understand web pages visually and semantically, make intelligent decisions about interactions, and execute complex multi-step browser workflows autonomously.
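
The way these layers cooperate can be pictured as an observe, decide, act loop. The following is a minimal sketch under stated assumptions: `ToyBrowser`, `Observation`, `decide`, and `run_loop` are hypothetical names, the "browser" is an in-memory object, and a scripted function stands in for the language model.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Observation:
    """What the agent sees each step (stand-in for a browser state summary)."""
    url: str
    elements: list


@dataclass
class ToyBrowser:
    """Hypothetical in-memory browser: tracks a URL and a log of executed actions."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def observe(self) -> Observation:
        elements = ["search box", "search button"] if self.url.endswith("/search") else []
        return Observation(self.url, elements)

    def execute(self, action: Tuple[str, str]) -> None:
        name, arg = action
        self.log.append(f"{name}({arg})")
        if name == "go_to_url":
            self.url = arg
        elif name == "click_element":
            self.url = "https://example.com/results"


def decide(task: str, obs: Observation) -> Tuple[str, str]:
    """Scripted policy standing in for the LLM's decision step."""
    if obs.url == "about:blank":
        return ("go_to_url", "https://example.com/search")
    if "search button" in obs.elements:
        return ("click_element", "1")
    return ("done", f"reached {obs.url}")


def run_loop(task: str, browser: ToyBrowser, max_steps: int = 10) -> Optional[str]:
    for _ in range(max_steps):
        obs = browser.observe()        # observation + DOM extraction
        action = decide(task, obs)     # language model decision
        if action[0] == "done":
            return action[1]
        browser.execute(action)        # tools registry acting on the session
    return None                        # step budget exhausted


browser = ToyBrowser()
outcome = run_loop("find results", browser)
print(outcome)
print(browser.log)
```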

## Capabilities

### Agent Orchestration

Core agent functionality for task execution, including the main Agent class, execution control, history management, and task configuration options.

```python { .api }
class Agent:
    def __init__(
        self,
        task: str,
        llm: BaseChatModel = ChatOpenAI(model='gpt-4o-mini'),
        browser_session: BrowserSession | None = None,
        tools: Tools | None = None,
        use_vision: bool = True,
        max_failures: int = 3,
        **kwargs
    ): ...

    async def run(self, max_steps: int = 100) -> AgentHistoryList: ...
    def run_sync(self, max_steps: int = 100) -> AgentHistoryList: ...
    async def step(self, step_info: AgentStepInfo | None = None) -> None: ...
```

[Agent Orchestration](./agent-orchestration.md)
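
The `max_steps` and `max_failures` parameters suggest two independent stopping conditions. The sketch below shows one plausible reading, in which failures are counted consecutively and reset on success; that interpretation and the helper `run_with_budget` are assumptions for illustration, not the library's documented behavior.

```python
from typing import Callable


def run_with_budget(
    step: Callable[[int], bool],
    max_steps: int = 100,
    max_failures: int = 3,
) -> str:
    """Call step(i) until the step budget runs out or it fails max_failures times in a row."""
    consecutive_failures = 0
    for i in range(max_steps):
        if step(i):
            consecutive_failures = 0          # a success resets the failure streak
        else:
            consecutive_failures += 1
            if consecutive_failures >= max_failures:
                return f"stopped after {i + 1} steps: too many failures"
    return f"stopped after {max_steps} steps: step budget exhausted"


# Alternating success and failure never reaches three failures in a row.
flaky_outcome = run_with_budget(lambda i: i % 2 == 0, max_steps=6)
# A step that always fails trips the failure limit first.
broken_outcome = run_with_budget(lambda i: False, max_steps=6)
print(flaky_outcome)
print(broken_outcome)
```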
118
119
### Browser Session Management
120
121
Browser session creation, configuration, and control including profile management, browser lifecycle, and basic navigation capabilities.
122
123
```python { .api }
124
class BrowserSession:
125
async def get_browser_state_summary(self, **kwargs) -> BrowserStateSummary: ...
126
async def get_tabs(self) -> list[TabInfo]: ...
127
async def get_element_by_index(self, index: int) -> EnhancedDOMTreeNode | None: ...
128
async def get_current_page_url(self) -> str: ...
129
async def get_current_page_title(self) -> str: ...
130
131
class BrowserProfile:
132
def __init__(
133
self,
134
headless: bool = False,
135
user_data_dir: str = None,
136
allowed_domains: list[str] = None,
137
proxy: ProxySettings = None,
138
**kwargs
139
): ...
140
```
141
142
[Browser Session Management](./browser-session.md)
143
144
### Browser Actions and Tools
145
146
Extensible action system with built-in browser automation capabilities including navigation, element interaction, form handling, and custom action registration.
147
148
```python { .api }
149
class Tools:
150
def __init__(
151
self,
152
exclude_actions: list[str] = None,
153
output_model: type = None
154
): ...
155
156
async def act(
157
self,
158
action: ActionModel,
159
browser_session: BrowserSession,
160
**kwargs
161
) -> ActionResult: ...
162
163
# Built-in actions available
164
def search_google(query: str): ...
165
def go_to_url(url: str): ...
166
def click_element(index: int): ...
167
def input_text(index: int, text: str): ...
168
def scroll(down: bool, num_pages: float): ...
169
def done(text: str): ...
170
```
171
172
[Browser Actions and Tools](./browser-actions.md)
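
To illustrate the idea of an extensible action registry with exclusions, here is a self-contained sketch. `ToyRegistry` and its decorator are hypothetical and do not reproduce the real `Tools` class; they only show how named actions could be registered, filtered by an `exclude_actions` list, and dispatched by name.

```python
from typing import Callable, Dict, Iterable, Optional


class ToyRegistry:
    """Hypothetical action registry; illustrates the concept, not the real Tools API."""

    def __init__(self, exclude_actions: Optional[Iterable[str]] = None) -> None:
        self._excluded = set(exclude_actions or [])
        self._actions: Dict[str, Callable[..., str]] = {}

    def action(self, description: str) -> Callable:
        """Decorator that registers a function under its own name unless excluded."""
        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            if func.__name__ not in self._excluded:
                func.description = description
                self._actions[func.__name__] = func
            return func
        return decorator

    def act(self, name: str, **params) -> str:
        """Dispatch to a registered action by name."""
        if name not in self._actions:
            raise KeyError(f"unknown or excluded action: {name}")
        return self._actions[name](**params)

    def available(self) -> list:
        return sorted(self._actions)


registry = ToyRegistry(exclude_actions=["search_google"])


@registry.action("Navigate to a URL")
def go_to_url(url: str) -> str:
    return f"navigated to {url}"


@registry.action("Search Google")
def search_google(query: str) -> str:
    return f"searched for {query}"


@registry.action("Save a note (custom action)")
def save_note(text: str) -> str:
    return f"saved note: {text}"


print(registry.available())
print(registry.act("save_note", text="hello"))
```

Excluded actions are simply never registered in this sketch, so dispatching to one raises `KeyError`.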
173
174
### LLM Integration
175
176
Multi-provider language model support with consistent interfaces for OpenAI, Anthropic, Google, Groq, Azure OpenAI, and Ollama models.
177
178
```python { .api }
179
class ChatOpenAI:
180
def __init__(
181
self,
182
model: str = "gpt-4o-mini",
183
temperature: float = 0.2,
184
frequency_penalty: float = 0.3
185
): ...
186
187
class ChatAnthropic:
188
def __init__(self, model: str = "claude-3-sonnet-20240229"): ...
189
190
class ChatGoogle:
191
def __init__(self, model: str = "gemini-pro"): ...
192
```
193
194
[LLM Integration](./llm-integration.md)
195
196
### DOM Processing and Element Interaction
197
198
Advanced DOM extraction, serialization, element indexing, and interaction capabilities for intelligent web page understanding.
199
200
```python { .api }
201
class DomService:
202
def __init__(
203
self,
204
browser_session: BrowserSession,
205
cross_origin_iframes: bool = False
206
): ...
207
```
208
209
[DOM Processing](./dom-processing.md)
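
Element indexing can be pictured as assigning a stable number to each interactive element so that actions can refer to it by index. The sketch below does this for a static HTML string with the standard library's `html.parser`. The tag set, the 1-based numbering, and the dictionary shape are assumptions for illustration; the real DOM service works against a live browser and is considerably more involved.

```python
from html.parser import HTMLParser

# Assumed set of tags treated as interactive for this illustration.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveIndexer(HTMLParser):
    """Collects interactive elements in document order and numbers them from 1."""

    def __init__(self) -> None:
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append(
                {"index": len(self.elements) + 1, "tag": tag, "attributes": dict(attrs)}
            )


def index_interactive(html: str) -> list:
    parser = InteractiveIndexer()
    parser.feed(html)
    parser.close()
    return parser.elements


page = '<div><a href="/docs">Docs</a><p>Intro</p><input name="q"><button>Go</button></div>'
indexed = index_interactive(page)
for element in indexed:
    print(element["index"], element["tag"], element["attributes"])
```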
210
211
### Task Results and History
212
213
Comprehensive result tracking, history management, and execution analysis including success/failure detection, error handling, and workflow replay capabilities.
214
215
```python { .api }
216
class ActionResult:
217
is_done: bool = None
218
success: bool = None
219
error: str = None
220
extracted_content: str = None
221
attachments: list[str] = None
222
223
class AgentHistoryList:
224
def is_done(self) -> bool: ...
225
def is_successful(self) -> bool: ...
226
def final_result(self) -> str: ...
227
def errors(self) -> list[str]: ...
228
def save_to_file(self, filepath: str) -> None: ...
229
```
230
231
[Task Results and History](./task-results.md)
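
One way to think about the history methods is as summaries computed over a list of per-step results. The sketch below uses hypothetical `ToyResult` and `ToyHistory` classes that mirror the field and method names listed above; how the real `AgentHistoryList` derives these values is an assumption here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyResult:
    """Hypothetical stand-in mirroring some of the ActionResult fields."""
    is_done: Optional[bool] = None
    success: Optional[bool] = None
    error: Optional[str] = None
    extracted_content: Optional[str] = None


class ToyHistory:
    """Hypothetical history wrapper; summaries are derived from the last result."""

    def __init__(self, results: List[ToyResult]) -> None:
        self.results = results

    def is_done(self) -> bool:
        return bool(self.results) and self.results[-1].is_done is True

    def is_successful(self) -> bool:
        return self.is_done() and self.results[-1].success is True

    def final_result(self) -> Optional[str]:
        return self.results[-1].extracted_content if self.is_done() else None

    def errors(self) -> List[str]:
        return [r.error for r in self.results if r.error]


history = ToyHistory([
    ToyResult(error="element not found"),
    ToyResult(extracted_content="clicked search"),
    ToyResult(is_done=True, success=True, extracted_content="72 F"),
])
print(history.is_successful(), history.final_result(), history.errors())
```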
232
233
## Configuration and Error Handling
234
235
Global configuration management and exception classes for robust error handling in browser automation workflows.
236
237
```python { .api }
238
from browser_use.config import CONFIG
239
from browser_use.exceptions import LLMException
240
241
# Configuration properties
242
CONFIG.BROWSER_USE_LOGGING_LEVEL
243
CONFIG.ANONYMIZED_TELEMETRY
244
CONFIG.OPENAI_API_KEY
245
CONFIG.ANTHROPIC_API_KEY
246
```
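
Configuration objects of this kind are commonly backed by environment variables. The sketch below assumes that each property reads an environment variable of the same name at access time; the class `ToyConfig`, its default values, and the parsing rules are hypothetical and may not match the real `CONFIG` object.

```python
import os


class ToyConfig:
    """Hypothetical env-backed config; property names mirror those listed above."""

    @property
    def BROWSER_USE_LOGGING_LEVEL(self) -> str:
        # Assumed default of "info"; normalized to lowercase.
        return os.environ.get("BROWSER_USE_LOGGING_LEVEL", "info").lower()

    @property
    def ANONYMIZED_TELEMETRY(self) -> bool:
        # Assumed default of enabled; common truthy spellings are accepted.
        return os.environ.get("ANONYMIZED_TELEMETRY", "true").lower() in ("1", "true", "yes")

    @property
    def OPENAI_API_KEY(self) -> str:
        return os.environ.get("OPENAI_API_KEY", "")


toy_config = ToyConfig()

# Properties are evaluated lazily, so changes to the environment are picked up.
os.environ["ANONYMIZED_TELEMETRY"] = "false"
os.environ["BROWSER_USE_LOGGING_LEVEL"] = "DEBUG"
print(toy_config.ANONYMIZED_TELEMETRY)
print(toy_config.BROWSER_USE_LOGGING_LEVEL)
```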
247
248
## Type Definitions
249
250
```python { .api }
251
from typing import Protocol, TypeVar
252
from pydantic import BaseModel
253
254
T = TypeVar('T')
255
256
class BaseChatModel(Protocol):
257
model: str
258
provider: str
259
260
async def ainvoke(
261
self,
262
messages: list[BaseMessage],
263
output_format: type[T] = None
264
) -> ChatInvokeCompletion: ...
265
266
class AgentStructuredOutput(Protocol):
267
"""Base protocol for structured output models."""
268
pass
269
270
class TabInfo(BaseModel):
271
"""Browser tab information."""
272
url: str
273
title: str
274
target_id: str # Tab identifier
275
parent_target_id: str | None = None
276
277
class EnhancedDOMTreeNode(BaseModel):
278
"""Enhanced DOM tree node with interaction capabilities."""
279
tag: str
280
text: str | None = None
281
attributes: dict[str, str] = {}
282
index: int
283
284
class AgentState(BaseModel):
285
"""Agent state for advanced configuration."""
286
pass
287
288
class CloudSync(BaseModel):
289
"""Cloud synchronization service."""
290
pass
291
```
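
Since `BaseChatModel` is declared as a `Protocol`, any object with the right attributes and an `ainvoke` coroutine should satisfy it structurally, without inheritance. The sketch below demonstrates that idea with a trimmed-down, hypothetical `ChatModelLike` protocol and an `EchoModel` class; neither is part of the library, and the simplified `ainvoke` signature is an assumption.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class ChatModelLike(Protocol):
    """Hypothetical, simplified analogue of the BaseChatModel protocol."""
    model: str
    provider: str

    async def ainvoke(self, messages: list) -> str: ...


class EchoModel:
    """Satisfies ChatModelLike structurally; it does not inherit from it."""

    def __init__(self) -> None:
        self.model = "echo-1"
        self.provider = "local"

    async def ainvoke(self, messages: list) -> str:
        return f"echo: {messages[-1]}"


async def ask(llm: ChatModelLike, prompt: str) -> str:
    # Code written against the protocol works with any conforming model.
    return await llm.ainvoke([prompt])


echo = EchoModel()
conforms = isinstance(echo, ChatModelLike)  # checks member presence only, not signatures
reply = asyncio.run(ask(echo, "hello"))
print(conforms, reply)
```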