# Agent Orchestration

Core agent functionality for autonomous browser task execution. The Agent class serves as the main orchestrator, coordinating language models, browser sessions, and action execution to complete complex web automation tasks.

## Capabilities

### Agent Creation and Configuration

The Agent class provides comprehensive configuration options for task execution, browser control, and LLM integration.

```python { .api }
class Agent:
    def __init__(
        self,
        task: str,
        llm: BaseChatModel = ChatOpenAI(model='gpt-4o-mini'),
        # Optional browser parameters
        browser_profile: BrowserProfile = None,
        browser_session: BrowserSession = None,
        browser: BrowserSession = None,  # Alias for browser_session
        tools: Tools = None,
        controller: Tools = None,  # Alias for tools
        # Initial agent run parameters
        sensitive_data: dict[str, str | dict[str, str]] = None,
        initial_actions: list[dict[str, dict[str, Any]]] = None,
        # Cloud callbacks
        register_new_step_callback: Callable = None,
        register_done_callback: Callable = None,
        register_external_agent_status_raise_error_callback: Callable[[], Awaitable[bool]] = None,
        # Agent settings
        output_model_schema: type[AgentStructuredOutput] = None,
        use_vision: bool = True,
        save_conversation_path: str | Path = None,
        save_conversation_path_encoding: str = 'utf-8',
        max_failures: int = 3,
        override_system_message: str = None,
        extend_system_message: str = None,
        generate_gif: bool | str = False,
        available_file_paths: list[str] = None,
        include_attributes: list[str] = None,
        max_actions_per_step: int = 10,
        use_thinking: bool = True,
        flash_mode: bool = False,
        max_history_items: int = None,
        page_extraction_llm: BaseChatModel = None,
        # Advanced parameters
        injected_agent_state: AgentState = None,
        source: str = None,
        file_system_path: str = None,
        task_id: str = None,
        cloud_sync: CloudSync = None,
        calculate_cost: bool = False,
        display_files_in_done_text: bool = True,
        include_tool_call_examples: bool = False,
        vision_detail_level: Literal['auto', 'low', 'high'] = 'auto',
        llm_timeout: int = 90,
        step_timeout: int = 120,
        directly_open_url: bool = True,
        include_recent_events: bool = False,
        **kwargs
    ):
        """
        Create an AI agent for browser automation tasks.

        Parameters:
        - task: Description of the task to be performed
        - llm: Language model instance (defaults to ChatOpenAI(model='gpt-4o-mini'))
        - browser_profile: Browser configuration settings
        - browser_session: Existing browser session to use
        - browser: Alias for browser_session parameter
        - tools: Custom tools/actions registry
        - controller: Alias for tools parameter
        - sensitive_data: Credentials and sensitive information for the agent
        - initial_actions: Actions to execute before main task
        - register_new_step_callback: Callback for new step events
        - register_done_callback: Callback for task completion events
        - register_external_agent_status_raise_error_callback: Callback for external status checks
        - output_model_schema: Schema for structured output
        - use_vision: Enable vision capabilities for screenshot analysis
        - save_conversation_path: Path to save conversation history
        - save_conversation_path_encoding: Encoding for saved conversation files
        - max_failures: Maximum consecutive failures before stopping
        - override_system_message: Replace default system prompt
        - extend_system_message: Add to default system prompt
        - generate_gif: Generate GIF recording of agent actions
        - available_file_paths: Files available to the agent
        - include_attributes: DOM attributes to include in element descriptions
        - max_actions_per_step: Maximum actions per execution step
        - use_thinking: Enable internal reasoning mode
        - flash_mode: Enable faster execution mode with reduced prompting
        - max_history_items: Maximum history items to keep in memory
        - page_extraction_llm: Separate LLM for page content extraction
        - injected_agent_state: Pre-configured agent state for advanced usage
        - source: Source identifier for tracking
        - file_system_path: Path to agent file system
        - task_id: Unique identifier for the task
        - cloud_sync: Cloud synchronization service instance
        - calculate_cost: Calculate and track API costs
        - display_files_in_done_text: Show files in completion messages
        - include_tool_call_examples: Include examples in tool calls
        - vision_detail_level: Vision processing detail level ('auto', 'low', 'high')
        - llm_timeout: LLM request timeout in seconds
        - step_timeout: Step execution timeout in seconds
        - directly_open_url: Open URLs directly without confirmation
        - include_recent_events: Include recent browser events in context
        - **kwargs: Additional configuration parameters
        """
```
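
The `sensitive_data` annotation, `dict[str, str | dict[str, str]]`, allows two entry shapes in the same mapping: a plain string value, or a nested mapping of strings. The sketch below is not part of the library. `split_sensitive_data` is a hypothetical helper that only checks a mapping against that annotation, and the idea that the nested form is keyed by a URL or domain pattern is an assumption made for illustration.

```python
from typing import Union

# Mirrors the annotation in the Agent signature above.
SensitiveData = dict[str, Union[str, dict[str, str]]]


def split_sensitive_data(
    data: SensitiveData,
) -> tuple[dict[str, str], dict[str, dict[str, str]]]:
    """Hypothetical helper: separate flat entries from nested ones and
    reject anything that does not match the annotated type."""
    flat: dict[str, str] = {}
    nested: dict[str, dict[str, str]] = {}
    for key, value in data.items():
        if isinstance(value, str):
            flat[key] = value
        elif isinstance(value, dict) and all(
            isinstance(k, str) and isinstance(v, str) for k, v in value.items()
        ):
            nested[key] = dict(value)
        else:
            raise TypeError(f"sensitive_data[{key!r}] must be str or dict[str, str]")
    return flat, nested


# Illustrative values only; the URL key is an assumed convention.
example: SensitiveData = {
    "x_password": "hunter2",
    "https://example.com": {"x_user": "alice"},
}
flat, nested = split_sensitive_data(example)
```

Validating the mapping before constructing the agent keeps a malformed credentials dictionary from surfacing as a harder-to-read error in the middle of a run.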

### Task Execution

Primary methods for running agent tasks with both asynchronous and synchronous interfaces.

```python { .api }
async def run(self, max_steps: int = 100) -> AgentHistoryList:
    """
    Execute the agent task asynchronously.

    Parameters:
    - max_steps: Maximum number of execution steps

    Returns:
    AgentHistoryList: Complete execution history with results
    """

def run_sync(self, max_steps: int = 100) -> AgentHistoryList:
    """
    Execute the agent task synchronously.

    Parameters:
    - max_steps: Maximum number of execution steps

    Returns:
    AgentHistoryList: Complete execution history with results
    """
```
```
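
The relationship between an async method and its synchronous twin can be sketched without the library. `StubAgent` below is a hypothetical stand-in, and the assumption that the synchronous variant simply drives the coroutine with `asyncio.run` is ours; the real `run_sync` may be implemented differently.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in exposing the same run / run_sync pair."""

    def __init__(self, task: str) -> None:
        self.task = task

    async def run(self, max_steps: int = 100) -> list[str]:
        history: list[str] = []
        # The stub pretends the task needs at most three steps.
        for step in range(1, min(max_steps, 3) + 1):
            await asyncio.sleep(0)  # yield to the event loop, as real I/O would
            history.append(f"step {step}: {self.task}")
        return history

    def run_sync(self, max_steps: int = 100) -> list[str]:
        # asyncio.run starts a fresh event loop and raises RuntimeError
        # if it is called while another loop is already running.
        return asyncio.run(self.run(max_steps=max_steps))


history = StubAgent("demo").run_sync(max_steps=2)
```

Under this assumption, a synchronous wrapper suits scripts and command-line tools, while code that already runs inside an event loop (a web server handler, a notebook) should await `run()` directly.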

### Step-by-Step Execution

Fine-grained control over agent execution for debugging and custom workflows.

```python { .api }
async def step(self, step_info: AgentStepInfo = None) -> None:
    """
    Execute a single step of the agent task.

    Parameters:
    - step_info: Optional step information for context
    """

async def take_step(self, step_info: AgentStepInfo = None) -> tuple[bool, bool]:
    """
    Take a step and return completion status.

    Parameters:
    - step_info: Optional step information for context

    Returns:
    tuple[bool, bool]: (is_done, is_valid)
    """
```
```
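
A caller that drives the agent one step at a time needs its own step budget and a decision for each `(is_done, is_valid)` combination. The sketch below uses a hypothetical `ScriptedAgent` that replays canned tuples instead of touching a browser; reading `is_valid` as "the finished result passed validation" is an assumption, as are the outcome labels.

```python
import asyncio
from typing import Iterable


class ScriptedAgent:
    """Hypothetical stand-in: take_step() replays scripted (is_done, is_valid) tuples."""

    def __init__(self, script: Iterable[tuple[bool, bool]]) -> None:
        self._script = iter(script)
        self.steps_taken = 0

    async def take_step(self, step_info=None) -> tuple[bool, bool]:
        self.steps_taken += 1
        return next(self._script)


async def drive(agent: ScriptedAgent, max_steps: int) -> str:
    """Call take_step() until the agent reports completion or the budget runs out."""
    for _ in range(max_steps):
        is_done, is_valid = await agent.take_step()
        if is_done:
            return "done" if is_valid else "done-but-invalid"
    return "step-budget-exhausted"


script = [(False, True), (False, True), (True, True)]
outcome = asyncio.run(drive(ScriptedAgent(script), max_steps=10))
exhausted = asyncio.run(drive(ScriptedAgent(script), max_steps=2))
```

Keeping the loop in caller code makes it easy to add logging, breakpoints, or a custom stop condition between steps.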

### Task Management

Methods for dynamic task modification and execution control.

```python { .api }
def add_new_task(self, new_task: str) -> None:
    """
    Add a new task to the agent's task list.

    Parameters:
    - new_task: Additional task description
    """

def pause(self) -> None:
    """Pause agent execution."""

def resume(self) -> None:
    """Resume paused agent execution."""

def stop(self) -> None:
    """Stop agent execution immediately."""
```
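
One common way to build pause, resume, and stop controls around an async step loop is an `asyncio.Event` gate plus a stop flag. The sketch below illustrates that general pattern only; `ControllableLoop` is hypothetical and says nothing about how the Agent implements these methods internally.

```python
import asyncio


class ControllableLoop:
    """Hypothetical stand-in: an Event gates each step, a flag ends the loop."""

    def __init__(self) -> None:
        self._running = asyncio.Event()
        self._running.set()  # start in the "running" state
        self._stopped = False
        self.completed_steps = 0

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def stop(self) -> None:
        self._stopped = True
        self._running.set()  # wake a paused loop so it can see the stop flag

    async def run(self, max_steps: int) -> int:
        for _ in range(max_steps):
            await self._running.wait()  # blocks here while paused
            if self._stopped:
                break
            self.completed_steps += 1
            await asyncio.sleep(0)  # stand-in for real step work
        return self.completed_steps


async def demo() -> tuple[int, int, int]:
    first = ControllableLoop()
    first.pause()
    task = asyncio.create_task(first.run(max_steps=3))
    await asyncio.sleep(0)  # let the task start and block on the gate
    while_paused = first.completed_steps
    first.resume()
    after_resume = await task

    second = ControllableLoop()
    second.pause()
    task = asyncio.create_task(second.run(max_steps=3))
    await asyncio.sleep(0)
    second.stop()  # stopping while paused ends the loop without further steps
    after_stop = await task
    return while_paused, after_resume, after_stop


results = asyncio.run(demo())
```

The point of setting the event inside `stop()` is that a paused loop would otherwise wait forever and never observe the stop request.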

### History and State Management

Methods for saving, loading, and managing execution history.

```python { .api }
def save_history(self, file_path: str | Path = None) -> None:
    """
    Save execution history to file.

    Parameters:
    - file_path: Path to save history (optional)
    """

async def load_and_rerun(
    self,
    history_file: str | Path = None
) -> list[ActionResult]:
    """
    Load and replay execution history.

    Parameters:
    - history_file: Path to history file to replay

    Returns:
    list[ActionResult]: Results from replayed actions
    """

async def close(self) -> None:
    """Clean up resources and close connections."""
```
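
Because `close()` releases resources, it is worth making sure it runs even when a task fails partway. The following is a minimal sketch of that cleanup pattern using a hypothetical `FailingAgent` stub whose method names mirror the signatures above; the simulated failure and the `run_with_cleanup` helper are illustrative assumptions.

```python
import asyncio
from typing import Optional


class FailingAgent:
    """Hypothetical stand-in whose run() always fails, to exercise cleanup."""

    def __init__(self) -> None:
        self.saved_to: Optional[str] = None
        self.closed = False

    async def run(self, max_steps: int = 100) -> list[str]:
        raise RuntimeError("simulated failure")

    def save_history(self, file_path: Optional[str] = None) -> None:
        self.saved_to = file_path  # the stub records the path instead of writing

    async def close(self) -> None:
        self.closed = True


async def run_with_cleanup(agent: FailingAgent, history_path: str) -> str:
    try:
        await agent.run(max_steps=10)
        return "ok"
    except RuntimeError as exc:
        return f"failed: {exc}"
    finally:
        # Save whatever history exists, then always release resources.
        try:
            agent.save_history(history_path)
        finally:
            await agent.close()


agent = FailingAgent()
status = asyncio.run(run_with_cleanup(agent, "execution_log.json"))
```

Nesting the second `try/finally` keeps a failed save from skipping `close()`.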

### System Prompt Management

Advanced prompt engineering capabilities for customizing agent behavior.

```python { .api }
class SystemPrompt:
    def __init__(
        self,
        action_description: str,
        max_actions_per_step: int = 10,
        override_system_message: str = None,
        extend_system_message: str = None,
        use_thinking: bool = True,
        flash_mode: bool = False
    ):
        """
        Manage system prompts for agent behavior.

        Parameters:
        - action_description: Description of available actions
        - max_actions_per_step: Maximum actions per step
        - override_system_message: Replace default system message
        - extend_system_message: Add to default system message
        - use_thinking: Enable thinking mode
        - flash_mode: Enable flash mode
        """

    def get_system_message(self) -> SystemMessage:
        """Get formatted system prompt message."""
```
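
The parameter descriptions say that `override_system_message` replaces the default prompt and `extend_system_message` adds to it. A minimal sketch of how the two could combine follows; `build_system_text` and `DEFAULT_PROMPT` are hypothetical, and appending the extension after an override when both are given is an assumption rather than documented behavior.

```python
from typing import Optional

# Placeholder text, not the library's actual default prompt.
DEFAULT_PROMPT = "You are a browser automation agent."


def build_system_text(
    default: str,
    override: Optional[str] = None,
    extend: Optional[str] = None,
) -> str:
    """Hypothetical helper: pick the base prompt, then append any extension."""
    base = override if override else default
    return f"{base}\n{extend}" if extend else base


plain = build_system_text(DEFAULT_PROMPT)
extended = build_system_text(DEFAULT_PROMPT, extend="Be extra careful with form submissions.")
replaced = build_system_text(DEFAULT_PROMPT, override="Only read pages; never click.")
```

Extending is usually the safer choice, since an override discards whatever instructions the default prompt carries about the action format.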

## Usage Examples

### Basic Agent Usage

```python
from browser_use import Agent, ChatOpenAI

# Simple task execution
agent = Agent(
    task="Go to Google and search for 'Python programming'",
    llm=ChatOpenAI(model="gpt-4o")
)

result = agent.run_sync()
print(f"Task completed: {result.is_done()}")
print(f"Final result: {result.final_result()}")
```

### Advanced Configuration

```python
import asyncio

from browser_use import Agent, BrowserProfile, Tools, ChatAnthropic

# Custom browser profile
profile = BrowserProfile(
    headless=False,
    user_data_dir="/tmp/browser-data",
    allowed_domains=["*.github.com", "*.stackoverflow.com"]
)

# Custom tools with exclusions
tools = Tools(exclude_actions=["search_google"])

# Agent with advanced configuration
agent = Agent(
    task="Navigate to GitHub and find Python repositories",
    llm=ChatAnthropic(model="claude-3-sonnet-20240229"),
    browser_profile=profile,
    tools=tools,
    use_vision=True,
    max_failures=5,
    generate_gif=True,
    extend_system_message="Be extra careful with form submissions."
)


async def main():
    # run() is a coroutine, so it has to be awaited inside an event loop
    return await agent.run(max_steps=50)


result = asyncio.run(main())
```

### Structured Output

```python
from pydantic import BaseModel
from browser_use import Agent

class SearchResult(BaseModel):
    title: str
    url: str
    description: str

agent = Agent(
    task="Search for AI research papers and extract details",
    output_model_schema=SearchResult
)

result = agent.run_sync()

# Assumption: final_result() returns the final answer as JSON text that
# conforms to the schema, so it is parsed into a SearchResult here.
raw = result.final_result()
structured_data = SearchResult.model_validate_json(raw) if raw else None
```

### Step-by-Step Execution

```python
import asyncio

from browser_use import Agent


async def main():
    agent = Agent(task="Multi-step web scraping task")

    # Execute step by step for debugging, using only the methods documented above
    for step_number in range(1, 21):
        is_done, is_valid = await agent.take_step()
        print(f"Current step: {step_number}, done: {is_done}, valid: {is_valid}")
        if is_done:
            break

    # Save progress, then release browser resources
    agent.save_history("execution_log.json")
    await agent.close()


asyncio.run(main())
```

### History Replay

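```python
import asyncio

from browser_use import Agent


async def main():
    agent = Agent(task="Replay previous execution")
    # load_and_rerun() is a coroutine and must be awaited
    results = await agent.load_and_rerun("execution_log.json")

    for result in results:
        # Field names assume the ActionResult model described in task-results.md
        print(f"Success: {result.success}, Content: {result.extracted_content}")


asyncio.run(main())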
```

## Type Definitions

```python { .api }
from typing import Any, Optional
from pathlib import Path

class AgentStepInfo:
    """Information context for agent step execution."""
    pass

class SystemMessage:
    """Formatted system message for LLM prompting."""
    content: str
```