
tessl/pypi-browser-use

AI-powered browser automation library that enables language models to control web browsers for automated tasks

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/browser-use@0.7.x

To install, run

npx @tessl/cli install tessl/pypi-browser-use@0.7.0

# Browser-Use

Browser-use is a Python library that lets AI agents control web browsers to carry out automated tasks. It combines browser automation with language model integration, supports multiple LLM providers, and provides DOM extraction, real-time browser control, and task execution.

## Package Information

- **Package Name**: browser-use
- **Language**: Python
- **Installation**: `pip install browser-use`
- **Repository**: https://github.com/browser-use/browser-use
- **Documentation**: https://docs.browser-use.com

## Core Imports

```python
import browser_use
```

Common patterns for agent-based automation:

```python
from browser_use import Agent, BrowserSession, ChatOpenAI
```

Individual component imports:

```python
from browser_use import (
    Agent, BrowserSession, BrowserProfile, Tools,
    SystemPrompt, ActionResult, AgentHistoryList,
    ChatOpenAI, ChatAnthropic, ChatGoogle
)
```

## Basic Usage

```python
import asyncio

from browser_use import Agent, ChatOpenAI

# Create an agent with a task
agent = Agent(
    task="Search for weather in New York and extract the temperature",
    llm=ChatOpenAI(model="gpt-4o")
)

# Option 1: run the agent task asynchronously.
# `await` is only valid inside a coroutine, so drive it with asyncio.run().
result = asyncio.run(agent.run())

# Option 2: run the agent task synchronously (use instead of Option 1,
# otherwise the task executes twice)
# result = agent.run_sync()

# Check if task completed successfully
if result.is_successful():
    print(f"Task completed: {result.final_result()}")
else:
    print(f"Task failed: {result.errors()}")
```

```python
from browser_use import Agent, BrowserProfile, BrowserSession

# Custom browser configuration
profile = BrowserProfile(
    headless=True,
    allowed_domains=["*.google.com", "*.wikipedia.org"],
    downloads_path="/tmp/downloads"
)

# Create browser session with custom profile
session = BrowserSession(browser_profile=profile)

# Agent with custom browser
agent = Agent(
    task="Search Wikipedia for Python programming language",
    browser_session=session
)

result = agent.run_sync()
```
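
The `allowed_domains` patterns above use a leading wildcard. As a rough illustration of how such patterns could be checked against a URL, here is a minimal standard-library sketch. The helper `is_allowed` is hypothetical and not part of browser-use, and the library's actual matching rules may differ.

```python
from __future__ import annotations

from fnmatch import fnmatch
from urllib.parse import urlparse


def is_allowed(url: str, patterns: list[str]) -> bool:
    """Return True if the URL's host matches any glob-style pattern (hypothetical helper)."""
    host = urlparse(url).hostname or ""
    for pattern in patterns:
        # Assumption: "*.example.com" should also cover the bare domain "example.com".
        bare = pattern[2:] if pattern.startswith("*.") else pattern
        if fnmatch(host, pattern) or host == bare:
            return True
    return False


patterns = ["*.google.com", "*.wikipedia.org"]
print(is_allowed("https://en.wikipedia.org/wiki/Python", patterns))
print(is_allowed("https://example.com/", patterns))
```

Matching on the parsed hostname rather than the raw URL string keeps a path or query such as `?next=google.com` from passing the check.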

## Architecture

Browser-use implements a multi-layered architecture for AI-powered browser automation:

- **Agent Layer**: High-level task orchestration using language models for decision-making
- **Browser Session**: CDP-based browser control and state management
- **Tools Registry**: Extensible action system with built-in browser automation actions
- **DOM Service**: Intelligent DOM extraction, serialization, and element indexing
- **LLM Integration**: Multi-provider support (OpenAI, Anthropic, Google, Groq, Azure, Ollama)
- **Observation System**: Screenshot capture, text extraction, and state tracking

This design enables AI agents to understand web pages visually and semantically, make intelligent decisions about interactions, and execute complex multi-step browser workflows autonomously.
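
One way to picture how these layers cooperate is an observe, decide, act loop. The toy sketch below uses only stand-ins (`FakeBrowser`, `scripted_llm`, and `run_agent` are all hypothetical names); it illustrates the control flow under that assumption and is not the library's internal implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Stand-in for the browser session layer: holds a URL and records actions."""
    url: str = "about:blank"
    log: list[str] = field(default_factory=list)

    def observe(self) -> dict:
        # Observation layer: summarize the current state for the model.
        return {"url": self.url}

    def act(self, action: dict) -> None:
        # Tools layer: apply one action to the browser.
        self.log.append(action["name"])
        if action["name"] == "go_to_url":
            self.url = action["url"]


def scripted_llm(state: dict) -> dict:
    """Stand-in for the LLM layer: picks the next action from the observed state."""
    if state["url"] == "about:blank":
        return {"name": "go_to_url", "url": "https://example.com"}
    return {"name": "done", "text": f"Reached {state['url']}"}


def run_agent(browser: FakeBrowser, max_steps: int = 5) -> str | None:
    """Agent layer: loop observe -> decide -> act until 'done' or the step budget ends."""
    for _ in range(max_steps):
        action = scripted_llm(browser.observe())
        browser.act(action)
        if action["name"] == "done":
            return action["text"]
    return None


browser = FakeBrowser()
print(run_agent(browser))
print(browser.log)
```

The step budget plays the same role as the `max_steps` argument of a real agent run: it bounds the loop when the model never emits a terminating action.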

## Capabilities

### Agent Orchestration

Core agent functionality for task execution, including the main Agent class, execution control, history management, and task configuration options.

```python { .api }
class Agent:
    def __init__(
        self,
        task: str,
        llm: BaseChatModel = ChatOpenAI(model='gpt-4o-mini'),
        browser_session: BrowserSession = None,
        tools: Tools = None,
        use_vision: bool = True,
        max_failures: int = 3,
        **kwargs
    ): ...

    async def run(self, max_steps: int = 100) -> AgentHistoryList: ...
    def run_sync(self, max_steps: int = 100) -> AgentHistoryList: ...
    async def step(self, step_info: AgentStepInfo = None) -> None: ...
```

[Agent Orchestration](./agent-orchestration.md)

### Browser Session Management

Browser session creation, configuration, and control including profile management, browser lifecycle, and basic navigation capabilities.

```python { .api }
class BrowserSession:
    async def get_browser_state_summary(self, **kwargs) -> BrowserStateSummary: ...
    async def get_tabs(self) -> list[TabInfo]: ...
    async def get_element_by_index(self, index: int) -> EnhancedDOMTreeNode | None: ...
    async def get_current_page_url(self) -> str: ...
    async def get_current_page_title(self) -> str: ...

class BrowserProfile:
    def __init__(
        self,
        headless: bool = False,
        user_data_dir: str = None,
        allowed_domains: list[str] = None,
        proxy: ProxySettings = None,
        **kwargs
    ): ...
```

[Browser Session Management](./browser-session.md)

### Browser Actions and Tools

Extensible action system with built-in browser automation capabilities including navigation, element interaction, form handling, and custom action registration.

```python { .api }
class Tools:
    def __init__(
        self,
        exclude_actions: list[str] = None,
        output_model: type = None
    ): ...

    async def act(
        self,
        action: ActionModel,
        browser_session: BrowserSession,
        **kwargs
    ) -> ActionResult: ...

# Built-in actions available
def search_google(query: str): ...
def go_to_url(url: str): ...
def click_element(index: int): ...
def input_text(index: int, text: str): ...
def scroll(down: bool, num_pages: float): ...
def done(text: str): ...
```

[Browser Actions and Tools](./browser-actions.md)
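
To make the idea of a named-action registry with an exclusion list concrete, here is a small self-contained sketch. `ToyRegistry` and its `action` decorator are hypothetical and only mirror the role of `exclude_actions`; the real `Tools` class may register and dispatch actions differently.

```python
from __future__ import annotations

from typing import Callable


class ToyRegistry:
    """Hypothetical registry: maps action names to plain functions."""

    def __init__(self, exclude_actions: list[str] | None = None):
        self._exclude = set(exclude_actions or [])
        self._actions: dict[str, Callable[..., str]] = {}

    def action(self, func: Callable[..., str]) -> Callable[..., str]:
        # Decorator: register the function under its own name unless excluded.
        if func.__name__ not in self._exclude:
            self._actions[func.__name__] = func
        return func

    def act(self, name: str, **params) -> str:
        # Dispatch by name; unknown or excluded actions raise KeyError.
        return self._actions[name](**params)


registry = ToyRegistry(exclude_actions=["search_google"])


@registry.action
def go_to_url(url: str) -> str:
    return f"navigated to {url}"


@registry.action
def search_google(query: str) -> str:
    return f"searched for {query}"


print(registry.act("go_to_url", url="https://example.com"))
```

Filtering at registration time means an excluded action never reaches the dispatcher, so a model cannot invoke it by name.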

### LLM Integration

Multi-provider language model support with consistent interfaces for OpenAI, Anthropic, Google, Groq, Azure OpenAI, and Ollama models.

```python { .api }
class ChatOpenAI:
    def __init__(
        self,
        model: str = "gpt-4o-mini",
        temperature: float = 0.2,
        frequency_penalty: float = 0.3
    ): ...

class ChatAnthropic:
    def __init__(self, model: str = "claude-3-sonnet-20240229"): ...

class ChatGoogle:
    def __init__(self, model: str = "gemini-pro"): ...
```

[LLM Integration](./llm-integration.md)

### DOM Processing and Element Interaction

Advanced DOM extraction, serialization, element indexing, and interaction capabilities for intelligent web page understanding.

```python { .api }
class DomService:
    def __init__(
        self,
        browser_session: BrowserSession,
        cross_origin_iframes: bool = False
    ): ...
```

[DOM Processing](./dom-processing.md)
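
Element indexing can be pictured as numbering the interactive elements of a page in document order, so that an action such as `click_element(index)` has something to refer to. The sketch below works on a toy nested-dict tree; the tag set, the zero-based numbering, and the function `index_interactive` are assumptions for illustration and not the behavior of `DomService`.

```python
from __future__ import annotations

# Assumption: these tags count as "interactive" in this toy model.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


def index_interactive(node: dict, found: list[dict] | None = None) -> list[dict]:
    """Depth-first walk that numbers interactive elements in document order."""
    if found is None:
        found = []
    if node["tag"] in INTERACTIVE_TAGS:
        found.append({"index": len(found), "tag": node["tag"], "text": node.get("text")})
    for child in node.get("children", []):
        index_interactive(child, found)
    return found


page = {
    "tag": "body",
    "children": [
        {"tag": "h1", "text": "Search"},
        {"tag": "input", "text": None},
        {"tag": "div", "children": [{"tag": "button", "text": "Go"}]},
    ],
}

for element in index_interactive(page):
    print(element)
```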

### Task Results and History

Comprehensive result tracking, history management, and execution analysis including success/failure detection, error handling, and workflow replay capabilities.

```python { .api }
class ActionResult:
    is_done: bool = None
    success: bool = None
    error: str = None
    extracted_content: str = None
    attachments: list[str] = None

class AgentHistoryList:
    def is_done(self) -> bool: ...
    def is_successful(self) -> bool: ...
    def final_result(self) -> str: ...
    def errors(self) -> list[str]: ...
    def save_to_file(self, filepath: str) -> None: ...
```

[Task Results and History](./task-results.md)
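
The relationship between per-step results and the history-level queries can be sketched as follows. `ToyResult` and `ToyHistory` are hypothetical, and the semantics shown (the last step decides completion and success, and errors are collected across all steps) are an assumption rather than a description of `AgentHistoryList`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyResult:
    """Hypothetical per-step result, loosely modeled on the ActionResult fields above."""
    is_done: bool = False
    success: bool | None = None
    error: str | None = None
    extracted_content: str | None = None


class ToyHistory:
    """Hypothetical history wrapper that answers questions about a whole run."""

    def __init__(self, results: list[ToyResult]):
        self.results = list(results)

    def is_done(self) -> bool:
        # Assumption: the run is done when its last step reports is_done.
        return bool(self.results) and self.results[-1].is_done

    def is_successful(self) -> bool:
        return self.is_done() and self.results[-1].success is True

    def final_result(self) -> str | None:
        return self.results[-1].extracted_content if self.is_done() else None

    def errors(self) -> list[str]:
        # Collect errors from every step, skipping steps without one.
        return [r.error for r in self.results if r.error]


history = ToyHistory([
    ToyResult(error="element not found"),
    ToyResult(is_done=True, success=True, extracted_content="72°F"),
])
print(history.is_successful(), history.final_result(), history.errors())
```

A run can therefore succeed overall while still reporting errors from intermediate steps that were retried.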

## Configuration and Error Handling

Global configuration management and exception classes for robust error handling in browser automation workflows.

```python { .api }
from browser_use.config import CONFIG
from browser_use.exceptions import LLMException

# Configuration properties
CONFIG.BROWSER_USE_LOGGING_LEVEL
CONFIG.ANONYMIZED_TELEMETRY
CONFIG.OPENAI_API_KEY
CONFIG.ANTHROPIC_API_KEY
```
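
A configuration object of this shape is often backed by environment variables that are read on each access. The sketch below shows that pattern with a hypothetical `ToyConfig`; the assumption that each property maps to an environment variable of the same name, and the default values used here, are illustrative only.

```python
import os


class ToyConfig:
    """Hypothetical config object that reads environment variables on each access."""

    @property
    def BROWSER_USE_LOGGING_LEVEL(self) -> str:
        # Assumed default: "info".
        return os.environ.get("BROWSER_USE_LOGGING_LEVEL", "info").lower()

    @property
    def ANONYMIZED_TELEMETRY(self) -> bool:
        # Assumed default: enabled; common truthy spellings are accepted.
        value = os.environ.get("ANONYMIZED_TELEMETRY", "true")
        return value.lower() in ("1", "true", "yes")


config = ToyConfig()
os.environ["BROWSER_USE_LOGGING_LEVEL"] = "DEBUG"
os.environ["ANONYMIZED_TELEMETRY"] = "false"
print(config.BROWSER_USE_LOGGING_LEVEL, config.ANONYMIZED_TELEMETRY)
```

Reading on access rather than at import time lets a change to the environment take effect without recreating the config object.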

## Type Definitions

```python { .api }
from typing import Protocol, TypeVar
from pydantic import BaseModel

T = TypeVar('T')

class BaseChatModel(Protocol):
    model: str
    provider: str

    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] = None
    ) -> ChatInvokeCompletion: ...

class AgentStructuredOutput(Protocol):
    """Base protocol for structured output models."""
    pass

class TabInfo(BaseModel):
    """Browser tab information."""
    url: str
    title: str
    target_id: str  # Tab identifier
    parent_target_id: str | None = None

class EnhancedDOMTreeNode(BaseModel):
    """Enhanced DOM tree node with interaction capabilities."""
    tag: str
    text: str | None = None
    attributes: dict[str, str] = {}
    index: int

class AgentState(BaseModel):
    """Agent state for advanced configuration."""
    pass

class CloudSync(BaseModel):
    """Cloud synchronization service."""
    pass
```
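
Because `BaseChatModel` is declared as a `Protocol`, any object with matching attributes and an async `ainvoke` method can stand in for a provider class without inheriting from it. The sketch below demonstrates this structural typing with a simplified, hypothetical protocol (`ChatModelLike`) and a fake model (`EchoModel`); the real protocol takes `BaseMessage` objects and returns a `ChatInvokeCompletion`, which are replaced by plain strings here.

```python
from __future__ import annotations

import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class ChatModelLike(Protocol):
    """Simplified, hypothetical variant of the BaseChatModel protocol above."""
    model: str
    provider: str

    async def ainvoke(self, messages: list[str]) -> str: ...


class EchoModel:
    """Fake model for offline tests: no inheritance, only matching members."""
    model = "echo-1"
    provider = "local"

    async def ainvoke(self, messages: list[str]) -> str:
        # Return the last message upper-cased instead of calling a real API.
        return messages[-1].upper()


llm = EchoModel()
print(isinstance(llm, ChatModelLike))
print(asyncio.run(llm.ainvoke(["hello"])))
```

A fake like this is useful for exercising agent logic in tests without network access or API keys.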