Tessl Tile for pypi/browser-use@0.7.0

tessl/pypi-browser-use

AI-powered browser automation library that enables language models to control web browsers for automated tasks

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/browser-use@0.7.x

To install, run

npx @tessl/cli install tessl/pypi-browser-use@0.7.0

# Browser-Use

Browser-use is a Python library that lets AI agents control web browsers to automate tasks. It combines browser automation with language model integration, supports multiple LLM providers, and offers DOM manipulation, real-time browser control, and task execution features.

## Package Information

- **Package Name**: browser-use
- **Language**: Python
- **Installation**: `pip install browser-use`
- **Repository**: https://github.com/browser-use/browser-use
- **Documentation**: https://docs.browser-use.com

## Core Imports

```python
import browser_use
```

Common patterns for agent-based automation:

```python
from browser_use import Agent, BrowserSession, ChatOpenAI
```

Individual component imports:

```python
from browser_use import (
    Agent, BrowserSession, BrowserProfile, Tools,
    SystemPrompt, ActionResult, AgentHistoryList,
    ChatOpenAI, ChatAnthropic, ChatGoogle
)
```

## Basic Usage

```python
from browser_use import Agent, ChatOpenAI

# Create an agent with a task
agent = Agent(
    task="Search for weather in New York and extract the temperature",
    llm=ChatOpenAI(model="gpt-4o")
)

# Run the agent task (async; must be awaited inside a coroutine)
result = await agent.run()

# Alternatively, run the agent task synchronously (use one or the other)
result = agent.run_sync()

# Check if task completed successfully
if result.is_successful():
    print(f"Task completed: {result.final_result()}")
else:
    print(f"Task failed: {result.errors()}")
```

```python
from browser_use import Agent, BrowserProfile, BrowserSession

# Custom browser configuration
profile = BrowserProfile(
    headless=True,
    allowed_domains=["*.google.com", "*.wikipedia.org"],
    downloads_path="/tmp/downloads"
)

# Create browser session with custom profile
session = BrowserSession(browser_profile=profile)

# Agent with custom browser
agent = Agent(
    task="Search Wikipedia for Python programming language",
    browser_session=session
)

result = agent.run_sync()
```

## Architecture

Browser-use implements a multi-layered architecture for AI-powered browser automation:

- **Agent Layer**: High-level task orchestration using language models for decision-making
- **Browser Session**: CDP-based browser control and state management
- **Tools Registry**: Extensible action system with built-in browser automation actions
- **DOM Service**: Intelligent DOM extraction, serialization, and element indexing
- **LLM Integration**: Multi-provider support (OpenAI, Anthropic, Google, Groq, Azure, Ollama)
- **Observation System**: Screenshot capture, text extraction, and state tracking

This design enables AI agents to understand web pages visually and semantically, make intelligent decisions about interactions, and execute complex multi-step browser workflows autonomously.

## Capabilities

### Agent Orchestration

Core agent functionality for task execution, including the main Agent class, execution control, history management, and task configuration options.

```python { .api }
class Agent:
    def __init__(
        self,
        task: str,
        llm: BaseChatModel = ChatOpenAI(model='gpt-4o-mini'),
        browser_session: BrowserSession = None,
        tools: Tools = None,
        use_vision: bool = True,
        max_failures: int = 3,
        **kwargs
    ): ...

    async def run(self, max_steps: int = 100) -> AgentHistoryList: ...
    def run_sync(self, max_steps: int = 100) -> AgentHistoryList: ...
    async def step(self, step_info: AgentStepInfo = None) -> None: ...
```

[Agent Orchestration](./agent-orchestration.md)

### Browser Session Management

Browser session creation, configuration, and control, including profile management, browser lifecycle, and basic navigation capabilities.

```python { .api }
class BrowserSession:
    async def get_browser_state_summary(self, **kwargs) -> BrowserStateSummary: ...
    async def get_tabs(self) -> list[TabInfo]: ...
    async def get_element_by_index(self, index: int) -> EnhancedDOMTreeNode | None: ...
    async def get_current_page_url(self) -> str: ...
    async def get_current_page_title(self) -> str: ...

class BrowserProfile:
    def __init__(
        self,
        headless: bool = False,
        user_data_dir: str = None,
        allowed_domains: list[str] = None,
        proxy: ProxySettings = None,
        **kwargs
    ): ...
```

[Browser Session Management](./browser-session.md)
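To make the `allowed_domains` patterns more concrete, here is a minimal sketch of glob-style host matching using only the standard library. It assumes the patterns behave like shell globs applied to the URL's hostname; `is_url_allowed` is a hypothetical helper, and the library's real matching rules may differ (for example, in how the bare domain is treated).

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def is_url_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Return True if the URL's host matches any glob pattern in the allow-list."""
    host = urlparse(url).hostname or ""
    return any(fnmatch(host, pattern) for pattern in allowed_domains)


patterns = ["*.google.com", "*.wikipedia.org"]
print(is_url_allowed("https://en.wikipedia.org/wiki/Python", patterns))
print(is_url_allowed("https://example.com/", patterns))
# Under plain glob semantics the bare domain does not match "*.google.com".
print(is_url_allowed("https://google.com/", patterns))
```

Under this interpretation a pattern such as `*.google.com` covers subdomains only, so a stricter or looser policy would need the bare domain listed explicitly.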

### Browser Actions and Tools

Extensible action system with built-in browser automation capabilities, including navigation, element interaction, form handling, and custom action registration.

```python { .api }
class Tools:
    def __init__(
        self,
        exclude_actions: list[str] = None,
        output_model: type = None
    ): ...

    async def act(
        self,
        action: ActionModel,
        browser_session: BrowserSession,
        **kwargs
    ) -> ActionResult: ...

# Built-in actions available
def search_google(query: str): ...
def go_to_url(url: str): ...
def click_element(index: int): ...
def input_text(index: int, text: str): ...
def scroll(down: bool, num_pages: float): ...
def done(text: str): ...
```

[Browser Actions and Tools](./browser-actions.md)

### LLM Integration

Multi-provider language model support with consistent interfaces for OpenAI, Anthropic, Google, Groq, Azure OpenAI, and Ollama models.

```python { .api }
class ChatOpenAI:
    def __init__(
        self,
        model: str = "gpt-4o-mini",
        temperature: float = 0.2,
        frequency_penalty: float = 0.3
    ): ...

class ChatAnthropic:
    def __init__(self, model: str = "claude-3-sonnet-20240229"): ...

class ChatGoogle:
    def __init__(self, model: str = "gemini-pro"): ...
```

[LLM Integration](./llm-integration.md)

### DOM Processing and Element Interaction

Advanced DOM extraction, serialization, element indexing, and interaction capabilities for intelligent web page understanding.

```python { .api }
class DomService:
    def __init__(
        self,
        browser_session: BrowserSession,
        cross_origin_iframes: bool = False
    ): ...
```

[DOM Processing](./dom-processing.md)

### Task Results and History

Comprehensive result tracking, history management, and execution analysis, including success/failure detection, error handling, and workflow replay capabilities.

```python { .api }
class ActionResult:
    is_done: bool = None
    success: bool = None
    error: str = None
    extracted_content: str = None
    attachments: list[str] = None

class AgentHistoryList:
    def is_done(self) -> bool: ...
    def is_successful(self) -> bool: ...
    def final_result(self) -> str: ...
    def errors(self) -> list[str]: ...
    def save_to_file(self, filepath: str) -> None: ...
```

[Task Results and History](./task-results.md)

## Configuration and Error Handling

Global configuration management and exception classes for robust error handling in browser automation workflows.

```python { .api }
from browser_use.config import CONFIG
from browser_use.exceptions import LLMException

# Configuration properties
CONFIG.BROWSER_USE_LOGGING_LEVEL
CONFIG.ANONYMIZED_TELEMETRY
CONFIG.OPENAI_API_KEY
CONFIG.ANTHROPIC_API_KEY
```

## Type Definitions

```python { .api }
from typing import Protocol, TypeVar
from pydantic import BaseModel

T = TypeVar('T')

class BaseChatModel(Protocol):
    model: str
    provider: str

    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] = None
    ) -> ChatInvokeCompletion: ...

class AgentStructuredOutput(Protocol):
    """Base protocol for structured output models."""
    pass

class TabInfo(BaseModel):
    """Browser tab information."""
    url: str
    title: str
    target_id: str  # Tab identifier
    parent_target_id: str | None = None

class EnhancedDOMTreeNode(BaseModel):
    """Enhanced DOM tree node with interaction capabilities."""
    tag: str
    text: str | None = None
    attributes: dict[str, str] = {}
    index: int

class AgentState(BaseModel):
    """Agent state for advanced configuration."""
    pass

class CloudSync(BaseModel):
    """Cloud synchronization service."""
    pass
```