Tessl Tile for pypi/browser-use@0.7.0

# DOM Processing

The DOM processing layer covers DOM extraction, serialization, element indexing, and interaction analysis. `DomService` exposes these features so that AI agents can inspect a web page and act on its elements.

## Capabilities

### DOM Service Core

Central service for DOM tree extraction and manipulation, with support for cross-origin iframe content and element indexing.

```python { .api }
class DomService:
    def __init__(
        self,
        browser_session: BrowserSession,
        logger: logging.Logger | None = None,
        cross_origin_iframes: bool = False,
        include_attributes: list[str] | None = None
    ):
        """
        Initialize DOM processing service.

        Parameters:
        - browser_session: Browser session for DOM access
        - logger: Optional custom logger instance
        - cross_origin_iframes: Include cross-origin iframe content
        - include_attributes: DOM attributes to include in serialization
        """

    async def get_dom_tree(self) -> DomTree:
        """
        Extract complete DOM tree from current page.

        Returns:
        DomTree: Structured representation of page DOM
        """

    async def get_clickable_elements(self) -> list[ElementInfo]:
        """
        Extract all clickable/interactable elements from page.

        Returns:
        list[ElementInfo]: List of elements that can be interacted with
        """

    async def serialize_dom(
        self,
        include_text: bool = True,
        include_attributes: bool = True,
        max_depth: int | None = None
    ) -> str:
        """
        Serialize DOM tree to text representation.

        Parameters:
        - include_text: Include text content of elements
        - include_attributes: Include element attributes
        - max_depth: Maximum tree depth to serialize

        Returns:
        str: Text representation of DOM structure
        """

    async def find_elements_by_text(self, text: str) -> list[ElementInfo]:
        """
        Find elements containing specific text.

        Parameters:
        - text: Text to search for in elements

        Returns:
        list[ElementInfo]: Elements containing the text
        """

    async def find_elements_by_selector(self, selector: str) -> list[ElementInfo]:
        """
        Find elements using CSS selector.

        Parameters:
        - selector: CSS selector string

        Returns:
        list[ElementInfo]: Elements matching the selector
        """

    async def get_element_screenshot(self, index: int) -> str:
        """
        Take screenshot of specific element.

        Parameters:
        - index: Element index

        Returns:
        str: Path to element screenshot image
        """
```
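To make the `serialize_dom` parameters more concrete, here is a minimal sketch of depth-limited serialization. It is not the library's implementation: `SketchNode` and `serialize` are hypothetical names, the output format is invented for illustration, and the sketch assumes that depth 0 is the root and that nodes deeper than `max_depth` are dropped.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a DOM node; not a browser-use class."""
    tag: str
    text: str = ""
    attributes: dict[str, str] = field(default_factory=dict)
    children: list["SketchNode"] = field(default_factory=list)


def serialize(
    node: SketchNode,
    include_text: bool = True,
    include_attributes: bool = True,
    max_depth: Optional[int] = None,
    depth: int = 0,
) -> str:
    """Render one indented line per node, pruning nodes deeper than max_depth."""
    if max_depth is not None and depth > max_depth:
        return ""
    line = "  " * depth + f"<{node.tag}"
    if include_attributes:
        for name, value in node.attributes.items():
            line += f' {name}="{value}"'
    line += ">"
    if include_text and node.text:
        line += f" {node.text}"
    lines = [line]
    for child in node.children:
        rendered = serialize(child, include_text, include_attributes, max_depth, depth + 1)
        if rendered:  # pruned subtrees come back as an empty string
            lines.append(rendered)
    return "\n".join(lines)


page = SketchNode("body", children=[
    SketchNode("a", "Home", {"href": "/"}),
    SketchNode("div", children=[SketchNode("button", "Buy", {"id": "buy"})]),
])
print(serialize(page, max_depth=1))
```

With `max_depth=1` the nested button is omitted, which is the kind of trimming that keeps serialized output small on large pages.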

### DOM Tree Structure

Hierarchical representation of web page DOM structure with element relationships and metadata.

```python { .api }
class DomTree:
    """Complete DOM tree representation."""
    root: DomNode
    total_elements: int
    clickable_elements: int
    form_elements: int
    interactive_elements: int

class DomNode:
    """Individual DOM node representation."""
    tag: str
    text: str
    attributes: dict[str, str]
    index: int
    children: list[DomNode]
    parent: DomNode
    bounding_box: BoundingBox
    is_clickable: bool
    is_visible: bool
    xpath: str
    css_selector: str

class BoundingBox:
    """Element positioning and dimensions."""
    x: float
    y: float
    width: float
    height: float
    top: float
    left: float
    bottom: float
    right: float
```
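The listing above does not say how the eight `BoundingBox` fields relate to each other. The sketch below assumes the browser `DOMRect` convention for non-negative sizes (`left == x`, `top == y`, `right == x + width`, `bottom == y + height`); `BoxSketch` and its helper methods are hypothetical and only illustrate that geometry.

```python
from dataclasses import dataclass


@dataclass
class BoxSketch:
    """Hypothetical stand-in that stores only origin and size."""
    x: float
    y: float
    width: float
    height: float

    # Assumption: edges follow the DOMRect convention for non-negative sizes.
    @property
    def left(self) -> float:
        return self.x

    @property
    def top(self) -> float:
        return self.y

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height

    def center(self) -> tuple[float, float]:
        """Midpoint of the box, a common default click target."""
        return (self.x + self.width / 2, self.y + self.height / 2)

    def contains(self, px: float, py: float) -> bool:
        """True if the point lies inside the box, edges included."""
        return self.left <= px <= self.right and self.top <= py <= self.bottom


box = BoxSketch(x=10, y=20, width=100, height=50)
print(box.right, box.bottom, box.center())
```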

### Element Information

Detailed information about individual DOM elements for interaction and analysis.

```python { .api }
class ElementInfo:
    """Comprehensive element information."""
    index: int
    tag: str
    text: str
    attributes: dict[str, str]
    bounding_box: BoundingBox
    is_clickable: bool
    is_visible: bool
    is_enabled: bool
    element_type: str  # 'button', 'input', 'link', 'text', etc.
    xpath: str
    css_selector: str
    parent_index: int
    children_indices: list[int]

class FormElementInfo(ElementInfo):
    """Form-specific element information."""
    input_type: str  # 'text', 'password', 'email', 'checkbox', etc.
    is_required: bool
    placeholder: str
    value: str
    min_value: str
    max_value: str
    pattern: str

class SelectElementInfo(ElementInfo):
    """Select/dropdown element information."""
    options: list[SelectOption]
    selected_value: str
    multiple: bool

class SelectOption:
    """Option within select element."""
    value: str
    text: str
    selected: bool
    disabled: bool
```
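One way the `FormElementInfo` fields could be combined is to find required fields that still need a value before a form is submitted. The following is a sketch only: `FieldSketch` is a hypothetical subset of the fields listed above, and treating a whitespace-only `value` as empty is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FieldSketch:
    """Hypothetical subset of the FormElementInfo fields listed above."""
    index: int
    input_type: str
    is_required: bool
    value: str = ""
    is_visible: bool = True
    is_enabled: bool = True


def missing_required(fields: list[FieldSketch]) -> list[int]:
    """Return indices of required, fillable fields that have no value yet."""
    return [
        f.index
        for f in fields
        if f.is_required and f.is_visible and f.is_enabled and not f.value.strip()
    ]


fields = [
    FieldSketch(index=3, input_type="email", is_required=True),
    FieldSketch(index=4, input_type="text", is_required=False),
    FieldSketch(index=5, input_type="password", is_required=True, value="hunter2"),
    FieldSketch(index=6, input_type="text", is_required=True, is_enabled=False),
]
print(missing_required(fields))
```

Disabled or hidden fields are skipped here because an agent could not type into them anyway; a real workflow might want to report them separately.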

### Element Interaction Analysis

Advanced analysis of element interactability and interaction patterns.

```python { .api }
class InteractionAnalyzer:
    """Analyze element interaction possibilities."""

    async def analyze_clickability(self, element: ElementInfo) -> ClickabilityAnalysis:
        """
        Analyze how clickable an element is.

        Parameters:
        - element: Element to analyze

        Returns:
        ClickabilityAnalysis: Detailed clickability assessment
        """

    async def analyze_form_structure(self, form_index: int) -> FormAnalysis:
        """
        Analyze form structure and required fields.

        Parameters:
        - form_index: Index of form element

        Returns:
        FormAnalysis: Complete form structure analysis
        """

    async def suggest_interaction_strategy(
        self,
        target_goal: str
    ) -> InteractionStrategy:
        """
        Suggest best interaction strategy for achieving goal.

        Parameters:
        - target_goal: Description of desired outcome

        Returns:
        InteractionStrategy: Recommended interaction sequence
        """

class ClickabilityAnalysis:
    """Analysis of element clickability."""
    is_clickable: bool
    confidence: float  # 0.0-1.0
    blocking_elements: list[ElementInfo]
    alternative_elements: list[ElementInfo]
    click_coordinates: tuple[float, float]

class FormAnalysis:
    """Complete form structure analysis."""
    form_element: ElementInfo
    required_fields: list[FormElementInfo]
    optional_fields: list[FormElementInfo]
    submit_buttons: list[ElementInfo]
    validation_rules: dict[str, str]

class InteractionStrategy:
    """Recommended interaction sequence."""
    steps: list[InteractionStep]
    confidence: float
    alternatives: list[InteractionStep]

class InteractionStep:
    """Individual interaction step."""
    action: str  # 'click', 'input', 'scroll', 'wait'
    element_index: int
    parameters: dict[str, Any]
    expected_outcome: str
```
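A caller still has to decide what to do with a `ClickabilityAnalysis`. The sketch below shows one possible policy: click the analysed element when it is confidently clickable and unblocked, otherwise fall back to the first alternative. `AnalysisSketch`, `choose_target`, and the 0.7 threshold are all hypothetical, and the sketch stores element indices rather than `ElementInfo` objects to stay self-contained.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnalysisSketch:
    """Hypothetical subset of ClickabilityAnalysis, using indices only."""
    element_index: int
    is_clickable: bool
    confidence: float  # 0.0-1.0
    blocking_indices: list[int] = field(default_factory=list)
    alternative_indices: list[int] = field(default_factory=list)


def choose_target(analysis: AnalysisSketch, min_confidence: float = 0.7) -> Optional[int]:
    """Return the index to click, or None if nothing suitable is available."""
    confident = analysis.is_clickable and analysis.confidence >= min_confidence
    if confident and not analysis.blocking_indices:
        return analysis.element_index
    if analysis.alternative_indices:
        # Assumption: alternatives are listed best-first.
        return analysis.alternative_indices[0]
    return None


print(choose_target(AnalysisSketch(12, True, 0.9)))
print(choose_target(AnalysisSketch(12, True, 0.4, alternative_indices=[15])))
```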

### Content Extraction

Advanced content extraction capabilities for text, images, and structured data.

```python { .api }
class ContentExtractor:
    """Extract various types of content from pages."""

    async def extract_text_content(
        self,
        clean: bool = True,
        include_hidden: bool = False
    ) -> str:
        """
        Extract text content from page.

        Parameters:
        - clean: Clean and normalize text
        - include_hidden: Include hidden element text

        Returns:
        str: Extracted text content
        """

    async def extract_links(
        self,
        internal_only: bool = False,
        include_anchors: bool = True
    ) -> list[LinkInfo]:
        """
        Extract all links from page.

        Parameters:
        - internal_only: Only include internal links
        - include_anchors: Include anchor links

        Returns:
        list[LinkInfo]: All links found on page
        """

    async def extract_images(
        self,
        include_data_urls: bool = False,
        min_size: tuple[int, int] | None = None
    ) -> list[ImageInfo]:
        """
        Extract image information from page.

        Parameters:
        - include_data_urls: Include base64 data URLs
        - min_size: Minimum image dimensions (width, height)

        Returns:
        list[ImageInfo]: All images found on page
        """

    async def extract_tables(self) -> list[TableInfo]:
        """
        Extract structured table data.

        Returns:
        list[TableInfo]: All tables with structured data
        """

class LinkInfo:
    """Link element information."""
    url: str
    text: str
    title: str
    element_index: int
    is_external: bool
    is_anchor: bool

class ImageInfo:
    """Image element information."""
    src: str
    alt: str
    title: str
    width: int
    height: int
    element_index: int
    is_data_url: bool

class TableInfo:
    """Table structure and data."""
    headers: list[str]
    rows: list[list[str]]
    element_index: int
    caption: str
```
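Since `TableInfo` keeps headers and rows as separate lists, downstream code will often want one record per row. The sketch below is a hedged illustration: `TableSketch` and `table_to_records` are hypothetical names, and padding short rows with empty strings (and dropping surplus cells) is an assumption about how ragged tables should be handled.

```python
from dataclasses import dataclass


@dataclass
class TableSketch:
    """Hypothetical subset of TableInfo: just headers and rows."""
    headers: list[str]
    rows: list[list[str]]


def table_to_records(table: TableSketch) -> list[dict[str, str]]:
    """Pair each cell with its column header, normalising row length."""
    width = len(table.headers)
    records = []
    for row in table.rows:
        # Pad short rows with "" and drop cells beyond the last header.
        padded = (row + [""] * width)[:width]
        records.append(dict(zip(table.headers, padded)))
    return records


table = TableSketch(
    headers=["Name", "Price"],
    rows=[["Widget", "9.99"], ["Gadget"], ["Gizmo", "4.50", "extra"]],
)
for record in table_to_records(table):
    print(record)
```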

## Usage Examples

### Basic DOM Analysis

```python
import asyncio

from browser_use import BrowserSession, DomService


async def main() -> None:
    session = BrowserSession()
    dom_service = DomService(session)

    # Navigate to page
    await session.navigate_to_url("https://example.com")

    # Get complete DOM tree
    dom_tree = await dom_service.get_dom_tree()
    print(f"Total elements: {dom_tree.total_elements}")
    print(f"Clickable elements: {dom_tree.clickable_elements}")

    # Get clickable elements
    clickable = await dom_service.get_clickable_elements()
    for element in clickable:
        print(f"Index {element.index}: {element.tag} - {element.text}")


asyncio.run(main())
```

### Element Search and Interaction

```python
import asyncio

from browser_use import DomService, BrowserSession


async def main() -> None:
    session = BrowserSession()
    dom_service = DomService(session)

    await session.navigate_to_url("https://example.com/search")

    # Find search box by text
    search_elements = await dom_service.find_elements_by_text("Search")
    if search_elements:
        search_box = search_elements[0]
        print(f"Found search box at index: {search_box.index}")

    # Find elements by CSS selector
    buttons = await dom_service.find_elements_by_selector("button.primary")
    for button in buttons:
        print(f"Button {button.index}: {button.text}")

    # Take screenshot of specific element
    if buttons:
        screenshot_path = await dom_service.get_element_screenshot(buttons[0].index)
        print(f"Button screenshot saved: {screenshot_path}")


asyncio.run(main())
```

### Advanced DOM Configuration

```python
import asyncio

from browser_use import DomService, BrowserSession


async def main() -> None:
    session = BrowserSession()

    # Configure DOM service with custom attributes
    dom_service = DomService(
        browser_session=session,
        cross_origin_iframes=True,  # Include iframe content
        include_attributes=[
            'id', 'class', 'name', 'data-testid',
            'aria-label', 'placeholder', 'href', 'src'
        ]
    )

    await session.navigate_to_url("https://complex-site.com")

    # Serialize DOM with custom options
    dom_text = await dom_service.serialize_dom(
        include_text=True,
        include_attributes=True,
        max_depth=5  # Limit depth for large pages
    )

    print("DOM Structure:")
    print(dom_text[:1000])  # First 1000 characters


asyncio.run(main())
```

### Form Analysis Workflow

```python
import asyncio

from browser_use import DomService, BrowserSession, InteractionAnalyzer


async def main() -> None:
    session = BrowserSession()
    dom_service = DomService(session)
    analyzer = InteractionAnalyzer()  # One analyzer is reused for every form

    await session.navigate_to_url("https://example.com/contact")

    # Find all form elements
    forms = await dom_service.find_elements_by_selector("form")

    for form in forms:
        print(f"Form {form.index}:")

        # Analyze form structure
        form_analysis = await analyzer.analyze_form_structure(form.index)

        print(f"  Required fields: {len(form_analysis.required_fields)}")
        for field in form_analysis.required_fields:
            print(f"    {field.tag}[{field.input_type}]: {field.placeholder}")

        print(f"  Submit buttons: {len(form_analysis.submit_buttons)}")
        for button in form_analysis.submit_buttons:
            print(f"    {button.text}")


asyncio.run(main())
```

### Content Extraction Workflow

```python
import asyncio

from browser_use import DomService, BrowserSession, ContentExtractor


async def main() -> None:
    session = BrowserSession()
    dom_service = DomService(session)
    extractor = ContentExtractor()

    await session.navigate_to_url("https://news-site.com/article")

    # Extract page text content
    text_content = await extractor.extract_text_content(clean=True)
    print(f"Article text ({len(text_content)} chars):")
    print(text_content[:500])

    # Extract all links
    links = await extractor.extract_links(internal_only=False)
    print(f"\nFound {len(links)} links:")
    for link in links[:5]:  # First 5 links
        print(f"  {link.text}: {link.url}")

    # Extract images
    images = await extractor.extract_images(min_size=(100, 100))
    print(f"\nFound {len(images)} images:")
    for image in images[:3]:  # First 3 images
        print(f"  {image.alt}: {image.src}")

    # Extract tables if any
    tables = await extractor.extract_tables()
    if tables:
        print(f"\nFound {len(tables)} tables:")
        for i, table in enumerate(tables):
            print(f"  Table {i}: {len(table.headers)} columns, {len(table.rows)} rows")


asyncio.run(main())
```

### Interaction Strategy Planning

```python
import asyncio

from browser_use import DomService, BrowserSession, InteractionAnalyzer


async def main() -> None:
    session = BrowserSession()
    dom_service = DomService(session)
    analyzer = InteractionAnalyzer()

    await session.navigate_to_url("https://ecommerce-site.com/product")

    # Plan interaction strategy for adding item to cart
    strategy = await analyzer.suggest_interaction_strategy(
        "Add this product to shopping cart"
    )

    print(f"Interaction strategy (confidence: {strategy.confidence}):")
    for i, step in enumerate(strategy.steps):
        print(f"  Step {i+1}: {step.action} on element {step.element_index}")
        print(f"    Expected: {step.expected_outcome}")

    # Execute strategy
    for step in strategy.steps:
        if step.action == "click":
            await session.click_element(step.element_index)
        elif step.action == "input":
            text = step.parameters.get("text", "")
            await session.input_text(step.element_index, text)
        elif step.action == "scroll":
            await session.scroll(step.parameters.get("down", True), 1)


asyncio.run(main())
```

### Element Clickability Analysis

```python
import asyncio

from browser_use import DomService, BrowserSession, InteractionAnalyzer


async def main() -> None:
    session = BrowserSession()
    dom_service = DomService(session)
    analyzer = InteractionAnalyzer()

    await session.navigate_to_url("https://complex-ui.com")

    # Find potential target element
    target_elements = await dom_service.find_elements_by_text("Subscribe")

    for element in target_elements:
        # Analyze clickability
        analysis = await analyzer.analyze_clickability(element)

        print(f"Element {element.index} clickability:")
        print(f"  Clickable: {analysis.is_clickable}")
        print(f"  Confidence: {analysis.confidence}")

        if analysis.blocking_elements:
            print(f"  Blocked by {len(analysis.blocking_elements)} elements")

        if analysis.alternative_elements:
            print(f"  {len(analysis.alternative_elements)} alternatives available")

        if analysis.is_clickable:
            x, y = analysis.click_coordinates
            print(f"  Best click point: ({x}, {y})")


asyncio.run(main())
```

## Configuration Constants

```python { .api }
# Default DOM attributes to include in serialization
DEFAULT_INCLUDE_ATTRIBUTES: list[str] = [
    'id', 'class', 'name', 'aria-label', 'data-testid',
    'placeholder', 'href', 'src', 'type', 'value'
]

# Element interaction priorities
CLICKABLE_ELEMENT_PRIORITIES: dict[str, int] = {
    'button': 10,
    'a': 9,
    'input[type="submit"]': 8,
    'input[type="button"]': 7,
    'select': 6,
    'input': 5
}

# Maximum DOM serialization limits
MAX_DOM_TEXT_LENGTH: int = 50000
MAX_ELEMENT_DEPTH: int = 20
MAX_CLICKABLE_ELEMENTS: int = 100
```
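The document does not describe how these constants are applied. As one plausible reading, the sketch below ranks candidate elements by `CLICKABLE_ELEMENT_PRIORITIES` and caps the result at `MAX_CLICKABLE_ELEMENTS`. The helper names `priority_for` and `rank_elements`, the `(index, tag, input_type)` tuple shape, and the rule that a `tag[type="..."]` key wins over the bare tag are all assumptions made for illustration.

```python
# Values copied from the constants listed above.
CLICKABLE_ELEMENT_PRIORITIES = {
    'button': 10,
    'a': 9,
    'input[type="submit"]': 8,
    'input[type="button"]': 7,
    'select': 6,
    'input': 5,
}
MAX_CLICKABLE_ELEMENTS = 100


def priority_for(tag: str, input_type: str = "") -> int:
    """Prefer the most specific key, e.g. input[type="submit"] over input."""
    if input_type:
        specific = f'{tag}[type="{input_type}"]'
        if specific in CLICKABLE_ELEMENT_PRIORITIES:
            return CLICKABLE_ELEMENT_PRIORITIES[specific]
    return CLICKABLE_ELEMENT_PRIORITIES.get(tag, 0)  # unknown tags rank last


def rank_elements(
    elements: list[tuple[int, str, str]],
    limit: int = MAX_CLICKABLE_ELEMENTS,
) -> list[int]:
    """Return element indices, highest priority first, capped at limit."""
    # sorted() is stable, so equal priorities keep their document order.
    ranked = sorted(elements, key=lambda e: -priority_for(e[1], e[2]))
    return [index for index, _, _ in ranked[:limit]]


candidates = [
    (0, "input", "text"),
    (1, "a", ""),
    (2, "input", "submit"),
    (3, "button", ""),
    (4, "div", ""),
]
print(rank_elements(candidates))
```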
