tessl/pypi-browser-use

AI-powered browser automation library that enables language models to control web browsers for automated tasks

Browser Actions and Tools

An extensible action system with built-in browser automation capabilities. The Tools class provides a registry of actions that agents can execute, covering navigation, element interaction, scrolling, tab management, form and dropdown handling, and task completion, and it supports registering custom actions.

Capabilities

Tools Registry and Execution

Core action registry and execution engine for browser automation actions.

class Tools:
    def __init__(
        self,
        exclude_actions: list[str] | None = None,
        output_model: type | None = None,
        display_files_in_done_text: bool = True
    ):
        """
        Create tools registry for browser actions.

        Parameters:
        - exclude_actions: List of action names to exclude from registry
        - output_model: Type for structured output formatting
        - display_files_in_done_text: Show files in completion messages
        """

    async def act(
        self,
        action: ActionModel,
        browser_session: BrowserSession,
        controller: Any = None
    ) -> ActionResult:
        """
        Execute a browser action.

        Parameters:
        - action: Action model with parameters
        - browser_session: Browser session to execute action on
        - controller: Optional controller context

        Returns:
        ActionResult: Execution result with success/failure status
        """

    @property
    def registry(self) -> ActionRegistry:
        """Access to action registry for custom action registration."""

Custom Action Registration

System for registering custom browser actions with the tools registry.

# Decorator for registering custom actions
def action(description: str, param_model: type[BaseModel] | None = None):
    """
    Decorator to register custom browser actions.

    Parameters:
    - description: Description of what the action does
    - param_model: Pydantic model for action parameters

    Usage:
    @tools.registry.action("Custom action description")
    async def custom_action(param: str) -> ActionResult:
        # Action implementation
        return ActionResult(success=True)
    """

Built-in Navigation Actions

Core navigation and URL management actions.

def search_google(query: str) -> ActionResult:
    """
    Search Google with the provided query.

    Parameters:
    - query: Search query string

    Returns:
    ActionResult: Search execution result
    """

def go_to_url(url: str) -> ActionResult:
    """
    Navigate browser to specified URL.

    Parameters:
    - url: Target URL to navigate to

    Returns:
    ActionResult: Navigation result
    """

Element Interaction Actions

Actions for interacting with DOM elements including clicking, text input, and form handling.

def click_element(index: int) -> ActionResult:
    """
    Click DOM element by its index.

    Parameters:
    - index: Element index from DOM serialization

    Returns:
    ActionResult: Click execution result
    """

def input_text(index: int, text: str) -> ActionResult:
    """
    Input text into form element.

    Parameters:
    - index: Element index of input field
    - text: Text to input into the field

    Returns:
    ActionResult: Text input result
    """

def send_keys(keys: str) -> ActionResult:
    """
    Send keyboard keys to the browser.

    Parameters:
    - keys: Key combination to send (e.g., "Ctrl+C", "Enter", "Tab")

    Returns:
    ActionResult: Key sending result
    """

def upload_file(index: int, file_path: str) -> ActionResult:
    """
    Upload file to file input element.

    Parameters:
    - index: Element index of file input
    - file_path: Path to file to upload

    Returns:
    ActionResult: File upload result
    """

Page Navigation Actions

Actions for page scrolling and viewport management.
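
For the scroll action documented in this section, a fractional `num_pages` presumably maps to a fraction of the viewport height. The following is a minimal sketch of that arithmetic, assuming a pixel-based scroll. `scroll_delta` and `viewport_height` are hypothetical names, not part of the documented API.

```python
def scroll_delta(down: bool, num_pages: float, viewport_height: int) -> int:
    """Convert a page count into a signed vertical pixel offset.

    Hypothetical helper: positive values scroll down, negative values scroll up.
    """
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative; use down=False to scroll up")
    pixels = round(num_pages * viewport_height)
    return pixels if down else -pixels
```

With an 800-pixel viewport, `scroll_delta(True, 0.5, 800)` gives half a page down, and `scroll_delta(False, 2, 600)` gives two pages up as a negative offset.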

def scroll(down: bool, num_pages: float) -> ActionResult:
    """
    Scroll page up or down.

    Parameters:
    - down: True to scroll down, False to scroll up
    - num_pages: Number of pages to scroll (can be fractional)

    Returns:
    ActionResult: Scroll execution result
    """

Tab Management Actions

Actions for switching between and closing browser tabs.

def switch_tab(tab_id: str) -> ActionResult:
    """
    Switch to different browser tab.

    Parameters:
    - tab_id: Identifier of target tab

    Returns:
    ActionResult: Tab switch result
    """

def close_tab(tab_id: str) -> ActionResult:
    """
    Close browser tab.

    Parameters:
    - tab_id: Identifier of tab to close

    Returns:
    ActionResult: Tab close result
    """

Form and Dropdown Actions

Specialized actions for form element interaction and dropdown handling.

def get_dropdown_options(index: int) -> ActionResult:
    """
    Get available options from dropdown element.

    Parameters:
    - index: Element index of dropdown/select element

    Returns:
    ActionResult: Dropdown options with extracted_content containing option list
    """

def select_dropdown_option(index: int, option_value: str) -> ActionResult:
    """
    Select option from dropdown element.

    Parameters:
    - index: Element index of dropdown/select element
    - option_value: Value of option to select

    Returns:
    ActionResult: Option selection result
    """

Task Completion Actions

Actions for marking tasks as complete and providing results.
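
A result with is_done=True is what lets a caller stop issuing further actions. The sketch below illustrates that control flow without the library. `SketchResult`, `run_until_done`, and the `attachments` field are hypothetical stand-ins, and the body of `done` is an assumption that only mirrors the documented signature.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for ActionResult (not the library class)."""
    is_done: bool = False
    extracted_content: Optional[str] = None
    attachments: list[str] = field(default_factory=list)


def done(text: str, files: Optional[list[str]] = None) -> SketchResult:
    # Mirrors the documented signature; the body is an assumption.
    return SketchResult(is_done=True, extracted_content=text, attachments=list(files or []))


def run_until_done(
    steps: list[Callable[[], SketchResult]], max_steps: int = 10
) -> Optional[SketchResult]:
    """Run steps in order and stop at the first result with is_done=True."""
    for step in steps[:max_steps]:
        result = step()
        if result.is_done:
            return result
    return None  # no step reported completion within the budget
```

Any step placed after the one that returns a done result is never executed, which is the property an agent loop relies on.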

def done(text: str, files: list[str] | None = None) -> ActionResult:
    """
    Mark task as completed with result text.

    Parameters:
    - text: Completion message or result description
    - files: Optional list of file paths to attach to result

    Returns:
    ActionResult: Task completion result with is_done=True
    """

Action Parameter Models

Pydantic models for structured action parameters and validation.

class SearchGoogleAction(BaseModel):
    """Parameters for Google search action."""
    query: str

class GoToUrlAction(BaseModel):
    """Parameters for URL navigation action."""
    url: str

class ClickElementAction(BaseModel):
    """Parameters for element clicking action."""
    index: int

class InputTextAction(BaseModel):
    """Parameters for text input action."""
    index: int
    text: str

class ScrollAction(BaseModel):
    """Parameters for page scrolling action."""
    down: bool
    num_pages: float

class SwitchTabAction(BaseModel):
    """Parameters for tab switching action."""
    tab_id: str

class CloseTabAction(BaseModel):
    """Parameters for tab closing action."""
    tab_id: str

class SendKeysAction(BaseModel):
    """Parameters for keyboard input action."""
    keys: str

class UploadFileAction(BaseModel):
    """Parameters for file upload action."""
    index: int
    file_path: str

class GetDropdownOptionsAction(BaseModel):
    """Parameters for dropdown inspection action."""
    index: int

class SelectDropdownOptionAction(BaseModel):
    """Parameters for dropdown selection action."""
    index: int
    option_value: str

class DoneAction(BaseModel):
    """Parameters for task completion action."""
    text: str
    files: list[str] | None = None

Action Model Base

Base class for all action models with common functionality.
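
To illustrate the index accessors documented in this section, here is a dependency-free sketch. `SketchAction` is a hypothetical stand-in built on a plain dataclass, and the choice to raise when an action has no index is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchAction:
    """Hypothetical stand-in for an action: a name plus its parameters."""
    name: str
    params: dict[str, Any] = field(default_factory=dict)

    def get_index(self) -> Optional[int]:
        # Actions without an "index" parameter target no specific element.
        return self.params.get("index")

    def set_index(self, index: int) -> None:
        # Assumption: only actions that already carry an index can be retargeted.
        if "index" not in self.params:
            raise ValueError(f"{self.name} does not target an element")
        self.params["index"] = index
```

A click action created with index 3 can be retargeted with `set_index(7)`, while a navigation action reports `None` from `get_index()`.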

class ActionModel(BaseModel):
    """Base model for browser actions."""
    
    def get_index(self) -> int | None:
        """
        Get element index from action parameters.

        Returns:
        int | None: Element index if action targets specific element
        """

    def set_index(self, index: int) -> None:
        """
        Set element index for action.

        Parameters:
        - index: Element index to set
        """

Usage Examples

Basic Action Execution

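import asyncio

from browser_use import Tools, BrowserSession
# GoToUrlAction is one of the parameter models listed under "Action Parameter
# Models"; import it from the module where your installed version defines it.

async def main() -> None:
    tools = Tools()
    session = BrowserSession()

    # Execute navigation action
    result = await tools.act(
        action=GoToUrlAction(url="https://example.com"),
        browser_session=session
    )

    if result.success:
        print("Navigation successful")
    else:
        print(f"Navigation failed: {result.error}")

asyncio.run(main())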

Custom Tools Configuration

from browser_use import Tools

# Exclude certain actions
tools = Tools(
    exclude_actions=["search_google", "upload_file"],
    display_files_in_done_text=False
)

# Tools now available: go_to_url, click_element, input_text, etc.
# But NOT: search_google, upload_file
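
One way to picture what exclude_actions does is a registry that never stores the excluded names, so dispatching them fails. The sketch below is dependency-free and purely illustrative. `SketchTools`, `SketchResult`, `register`, and `build_tools` are hypothetical stand-ins, not the browser-use implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for ActionResult (not the library class)."""
    success: bool
    extracted_content: Optional[str] = None
    error: Optional[str] = None


class SketchTools:
    """Hypothetical stand-in for Tools: maps action names to coroutines."""

    def __init__(self, exclude_actions: Optional[list[str]] = None) -> None:
        self._excluded = set(exclude_actions or [])
        self._actions: dict[str, Callable[..., Awaitable[SketchResult]]] = {}

    def register(self, name: str, func: Callable[..., Awaitable[SketchResult]]) -> None:
        # Excluded names are skipped, so they can never be dispatched.
        if name not in self._excluded:
            self._actions[name] = func

    async def act(self, name: str, **params: Any) -> SketchResult:
        handler = self._actions.get(name)
        if handler is None:
            return SketchResult(success=False, error=f"Unknown action: {name}")
        try:
            return await handler(**params)
        except Exception as exc:  # report failures as results instead of raising
            return SketchResult(success=False, error=str(exc))


async def go_to_url(url: str) -> SketchResult:
    return SketchResult(success=True, extracted_content=f"Navigated to {url}")


async def search_google(query: str) -> SketchResult:
    return SketchResult(success=True, extracted_content=f"Searched for {query}")


def build_tools(exclude_actions: Optional[list[str]] = None) -> SketchTools:
    tools = SketchTools(exclude_actions)
    tools.register("go_to_url", go_to_url)
    tools.register("search_google", search_google)
    return tools
```

Calling `asyncio.run(build_tools(["search_google"]).act("search_google", query="x"))` returns a failed result here, because the excluded action was never registered.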

Custom Action Registration

import asyncio

from browser_use import Tools, ActionResult, BrowserSession
from pydantic import BaseModel

class CustomActionParams(BaseModel):
    target: str
    options: dict = {}

tools = Tools()

@tools.registry.action("Perform custom browser operation", CustomActionParams)
async def custom_browser_action(target: str, options: dict | None = None) -> ActionResult:
    """Custom action implementation."""
    try:
        # Perform custom browser operation
        result = f"Custom action performed on {target}"
        return ActionResult(
            success=True,
            extracted_content=result
        )
    except Exception as e:
        return ActionResult(
            success=False,
            error=str(e)
        )

async def main() -> None:
    # Session the custom action runs against
    session = BrowserSession()

    # Use custom action
    result = await tools.act(
        action=CustomActionParams(target="special-element", options={"mode": "test"}),
        browser_session=session
    )

asyncio.run(main())
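
The registration decorator can be understood as a function that records metadata and returns the original coroutine unchanged. The following is a minimal sketch of that pattern. `SketchRegistry` and `shout` are hypothetical, and the real registry may store more than this.

```python
import asyncio
from typing import Callable, Optional


class SketchRegistry:
    """Hypothetical stand-in for ActionRegistry (not the library class)."""

    def __init__(self) -> None:
        self.actions: dict[str, dict] = {}

    def action(self, description: str, param_model: Optional[type] = None) -> Callable:
        def decorator(func: Callable) -> Callable:
            # Record metadata under the function's name, then return the
            # function unchanged so it stays directly callable.
            self.actions[func.__name__] = {
                "description": description,
                "param_model": param_model,
                "function": func,
            }
            return func
        return decorator


registry = SketchRegistry()


@registry.action("Return the text in upper case")
async def shout(text: str) -> str:
    return text.upper()
```

After decoration, `registry.actions["shout"]` holds the description, and `asyncio.run(shout("hi"))` still calls the undecorated coroutine.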

Form Interaction Workflow

import asyncio

from browser_use import Tools, BrowserSession
# The *Action classes are the parameter models listed under "Action Parameter
# Models"; import them from the module where your installed version defines them.

async def main() -> None:
    tools = Tools()
    session = BrowserSession()

    # Navigate to form page
    await tools.act(GoToUrlAction(url="https://example.com/form"), session)

    # Fill form fields
    await tools.act(InputTextAction(index=1, text="John Doe"), session)
    await tools.act(InputTextAction(index=2, text="john@example.com"), session)

    # Handle dropdown
    dropdown_result = await tools.act(GetDropdownOptionsAction(index=3), session)
    print(f"Available options: {dropdown_result.extracted_content}")

    await tools.act(SelectDropdownOptionAction(index=3, option_value="option1"), session)

    # Submit form
    await tools.act(ClickElementAction(index=4), session)

    # Mark task complete
    await tools.act(DoneAction(text="Form submitted successfully"), session)

asyncio.run(main())

File Upload Workflow

import asyncio

from browser_use import Tools, BrowserSession
# The *Action classes are the parameter models listed under "Action Parameter
# Models"; import them from the module where your installed version defines them.

async def main() -> None:
    tools = Tools()
    session = BrowserSession()

    # Navigate to upload page
    await tools.act(GoToUrlAction(url="https://example.com/upload"), session)

    # Upload file
    result = await tools.act(
        UploadFileAction(index=2, file_path="/path/to/document.pdf"),
        session
    )

    if result.success:
        # Continue with form if needed
        await tools.act(ClickElementAction(index=3), session)  # Submit button
        await tools.act(DoneAction(text="File uploaded successfully"), session)

asyncio.run(main())

Keyboard Shortcuts

import asyncio

from browser_use import Tools, BrowserSession
# The *Action classes are the parameter models listed under "Action Parameter
# Models"; import them from the module where your installed version defines them.

async def main() -> None:
    tools = Tools()
    session = BrowserSession()

    # Navigate to page
    await tools.act(GoToUrlAction(url="https://example.com"), session)

    # Use keyboard shortcuts
    await tools.act(SendKeysAction(keys="Ctrl+F"), session)  # Open find
    await tools.act(InputTextAction(index=1, text="search term"), session)
    await tools.act(SendKeysAction(keys="Enter"), session)  # Search
    await tools.act(SendKeysAction(keys="Escape"), session)  # Close find

asyncio.run(main())
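
The keys strings above join modifiers and a final key with "+". How browser-use parses them is not described in this document. As a rough sketch of one way to split such a string, with `parse_keys` and the `MODIFIERS` table as hypothetical names:

```python
# Hypothetical mapping from common modifier spellings to canonical names.
MODIFIERS = {
    "ctrl": "Control",
    "control": "Control",
    "shift": "Shift",
    "alt": "Alt",
    "meta": "Meta",
    "cmd": "Meta",
}


def parse_keys(keys: str) -> tuple[list[str], str]:
    """Split a combination such as "Ctrl+Shift+T" into (modifiers, key)."""
    parts = [part.strip() for part in keys.split("+")]
    # An empty part means input like "" or "Ctrl+"; this sketch rejects it,
    # which also means a literal "+" key is not supported here.
    if any(not part for part in parts):
        raise ValueError(f"Malformed key combination: {keys!r}")
    *modifier_names, key = parts
    modifiers = []
    for name in modifier_names:
        canonical = MODIFIERS.get(name.lower())
        if canonical is None:
            raise ValueError(f"Unknown modifier: {name!r}")
        modifiers.append(canonical)
    return modifiers, key
```

A single key such as "Enter" yields an empty modifier list, so the same function covers both plain keys and combinations.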

Type Definitions

from typing import Any
from pydantic import BaseModel

class ActionRegistry:
    """Registry for browser actions."""
    def action(self, description: str, param_model: type[BaseModel] | None = None): ...

Controller = Tools  # Type alias for backward compatibility

Install with Tessl CLI

npx tessl i tessl/pypi-browser-use
