# Website Mapping

Discover and map website structures without extracting full content. Mapping is useful for understanding site architecture and finding relevant pages before detailed crawling or extraction operations.

## Capabilities

### Website Structure Mapping

Map website structure and discover pages without extracting full content, providing an efficient way to understand site architecture and identify relevant content areas.
```python { .api }
def map(
    url: str,
    max_depth: int = None,
    max_breadth: int = None,
    limit: int = None,
    instructions: str = None,
    select_paths: Sequence[str] = None,
    select_domains: Sequence[str] = None,
    exclude_paths: Sequence[str] = None,
    exclude_domains: Sequence[str] = None,
    allow_external: bool = None,
    include_images: bool = None,
    timeout: int = 60,
    **kwargs
) -> dict:
    """
    Map website structure and discover pages without full content extraction.

    Parameters:
    - url: Starting URL for mapping
    - max_depth: Maximum depth to explore from the starting URL
    - max_breadth: Maximum number of pages to discover per depth level
    - limit: Total maximum number of pages to map
    - instructions: Natural language instructions guiding mapping behavior
    - select_paths: List of path patterns to include in mapping
    - select_domains: List of domains to explore
    - exclude_paths: List of path patterns to exclude from mapping
    - exclude_domains: List of domains to avoid
    - allow_external: Allow mapping domains external to the starting domain
    - include_images: Include image URLs in mapping results
    - timeout: Request timeout in seconds (max 120)
    - **kwargs: Additional mapping parameters

    Returns:
    Dict containing the website structure map with discovered pages and hierarchy
    """
```
**Usage Examples:**

```python
# Basic website mapping
site_map = client.map(
    url="https://docs.python.org",
    max_depth=3,
    limit=100
)

# Focused documentation mapping
docs_map = client.map(
    url="https://api.example.com",
    instructions="Map API documentation structure, focus on reference sections",
    select_paths=["/docs/*", "/reference/*", "/api/*"],
    max_depth=4,
    limit=200
)

# Multi-domain site mapping
company_map = client.map(
    url="https://company.com",
    allow_external=True,
    select_domains=[
        "company.com",
        "docs.company.com",
        "support.company.com"
    ],
    exclude_paths=["/admin/*", "/private/*"],
    max_depth=2
)
```
## Mapping Use Cases

### Pre-Crawl Site Analysis

Use mapping to understand site structure before performing expensive crawling operations:

```python
# Map first to understand structure
site_structure = client.map(
    url="https://large-company.com",
    max_depth=2,
    limit=50
)

# Analyze the structure
print("Discovered pages:")
for page in site_structure.get('results', []):
    print(f"- {page['url']} (depth: {page.get('depth', 0)})")

# Then crawl specific areas based on mapping results
focused_crawl = client.crawl(
    url="https://large-company.com/products",
    select_paths=["/products/*", "/solutions/*"],
    max_depth=3,
    format="markdown"
)
```
### Content Discovery

Identify content-rich areas of websites before extraction:

```python
# Map to find content sections
content_map = client.map(
    url="https://news-site.com",
    instructions="Find main content sections like articles, reports, and analysis",
    exclude_paths=["/ads/*", "/widgets/*", "/social/*"],
    max_depth=2
)

# Extract content from discovered high-value pages
high_value_pages = [
    page['url'] for page in content_map.get('results', [])
    if 'article' in page['url'] or 'report' in page['url']
]

content_results = client.extract(
    urls=high_value_pages[:10],  # Extract from the top 10 pages
    format="markdown",
    extract_depth="advanced"
)
```
### Site Architecture Analysis

Understand website organization and navigation patterns:

```python
# Comprehensive site mapping
architecture_map = client.map(
    url="https://enterprise-site.com",
    instructions="Map the complete site structure to understand organization",
    max_depth=3,
    max_breadth=20,
    limit=500
)

# Group pages by their depth in the site hierarchy
pages_by_depth = {}
for page in architecture_map.get('results', []):
    depth = page.get('depth', 0)
    pages_by_depth.setdefault(depth, []).append(page['url'])

print("Site structure by depth:")
for depth in sorted(pages_by_depth):
    urls = pages_by_depth[depth]
    print(f"Depth {depth}: {len(urls)} pages")
    for url in urls[:5]:  # Show first 5 URLs per depth
        print(f"  - {url}")
```
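The same depth grouping can be turned into a nested path tree, which makes the site's hierarchy easier to scan at a glance. A minimal standard-library sketch; `build_path_tree` is an illustrative helper, and it assumes (as in the snippet above) that each result carries a `url`:

```python
from urllib.parse import urlparse

def build_path_tree(urls):
    """Nest discovered URLs into a dict tree keyed by path segment."""
    tree = {}
    for url in urls:
        node = tree
        # Walk each non-empty path segment, creating child dicts as needed
        for segment in urlparse(url).path.strip('/').split('/'):
            if segment:
                node = node.setdefault(segment, {})
    return tree

tree = build_path_tree([
    "https://enterprise-site.com/products/widgets",
    "https://enterprise-site.com/products/gadgets",
    "https://enterprise-site.com/about",
])
# 'products' ends up with two children; 'about' is a leaf
```

With real mapping output you would feed it `[page['url'] for page in architecture_map.get('results', [])]`.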
## Advanced Mapping Patterns

### Selective Domain Exploration

Map specific parts of multi-domain organizations:

```python
# Map an organization's web presence
org_map = client.map(
    url="https://university.edu",
    allow_external=True,
    select_domains=[
        "university.edu",           # Main site
        "research.university.edu",  # Research portal
        "library.university.edu",   # Library system
        "news.university.edu"       # News site
    ],
    exclude_domains=[
        "admin.university.edu",     # Admin systems
        "student.university.edu"    # Student portals
    ],
    instructions="Map public-facing educational content and research information",
    max_depth=2
)
```
### Topic-Focused Mapping

Discover content related to specific topics or themes:

```python
# Map AI/ML content across a tech site
ai_content_map = client.map(
    url="https://tech-company.com",
    instructions="Find pages related to artificial intelligence, machine learning, and data science",
    select_paths=[
        "/ai/*",
        "/machine-learning/*",
        "/data-science/*",
        "/blog/*ai*",
        "/research/*ml*"
    ],
    max_depth=3,
    limit=150
)

# Map specific product documentation
product_docs_map = client.map(
    url="https://company.com/products/api-gateway",
    instructions="Map all documentation related to the API Gateway product",
    select_paths=[
        "/products/api-gateway/*",
        "/docs/api-gateway/*",
        "/guides/api-gateway/*"
    ],
    max_depth=4
)
```
### Quality-Based Filtering

Map only high-quality content pages:

```python
# Map substantial content pages
quality_map = client.map(
    url="https://content-site.com",
    instructions="Focus on pages with substantial text content, skip navigation and utility pages",
    exclude_paths=[
        "/search*",      # Search pages
        "/tag/*",        # Tag pages
        "/category/*",   # Category pages
        "/author/*",     # Author pages
        "*/print*",      # Print versions
        "*/amp*"         # AMP versions
    ],
    max_depth=2,
    limit=200
)
```
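Because pattern-based exclusions and natural-language instructions are best-effort on the server side, it can be worth re-applying the same filters locally before further processing. A hedged sketch using `fnmatch` against the URL path; `filter_results` is an illustrative helper, and the result shape (dicts with a `url` key) follows the examples above:

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

EXCLUDE_PATTERNS = ["/search*", "/tag/*", "/category/*", "/author/*", "*/print*", "*/amp*"]

def filter_results(results, exclude_patterns=EXCLUDE_PATTERNS):
    """Drop results whose URL path matches any exclude pattern."""
    kept = []
    for page in results:
        path = urlparse(page['url']).path
        if not any(fnmatch(path, pattern) for pattern in exclude_patterns):
            kept.append(page)
    return kept

sample = [
    {'url': 'https://content-site.com/guides/setup'},
    {'url': 'https://content-site.com/tag/python'},          # excluded by /tag/*
    {'url': 'https://content-site.com/guides/setup/print'},  # excluded by */print*
]
kept = filter_results(sample)
```

With real output, `filter_results(quality_map.get('results', []))` acts as a local safety net over the server-side exclusions.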
## Mapping Results Analysis

Process and analyze mapping results effectively:

```python
# Comprehensive mapping analysis
site_map = client.map(
    url="https://target-site.com",
    max_depth=3,
    limit=300
)

# Analyze results
results = site_map.get('results', [])

# Group by URL patterns
url_patterns = {}
for page in results:
    url = page['url']
    path_parts = url.split('/')[3:]  # Skip protocol and domain
    if path_parts:
        pattern = '/' + path_parts[0] + '/*'
        url_patterns.setdefault(pattern, []).append(url)

print("Content organization:")
for pattern, urls in url_patterns.items():
    print(f"{pattern}: {len(urls)} pages")

# Find potential high-value targets for extraction
content_candidates = [
    page['url'] for page in results
    if any(keyword in page['url'].lower()
           for keyword in ['article', 'post', 'guide', 'tutorial', 'doc'])
]

print(f"\nFound {len(content_candidates)} potential content pages for extraction")
```
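Before handing candidate URLs to `extract`, it often pays to normalize and deduplicate them, since mapping can surface the same page under several URLs (fragments, trailing slashes, tracking parameters). A minimal sketch; `normalize_url` and `dedupe_urls` are illustrative helpers, and dropping query strings is aggressive — keep them if they distinguish pages on your site:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Canonicalize a URL for dedup: lowercase host, strip query/fragment/trailing slash."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    path = path.rstrip('/') or '/'
    return urlunsplit((scheme, netloc.lower(), path, '', ''))

def dedupe_urls(urls):
    """Return normalized URLs, first occurrence wins, order preserved."""
    seen = set()
    unique = []
    for url in urls:
        norm = normalize_url(url)
        if norm not in seen:
            seen.add(norm)
            unique.append(norm)
    return unique

unique = dedupe_urls([
    "https://target-site.com/guide/",
    "https://target-site.com/guide#intro",
    "https://Target-Site.com/guide?ref=nav",
])
# all three collapse to a single canonical URL
```

Applied to the analysis above, `dedupe_urls(content_candidates)` shrinks the list passed to `client.extract`.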
## Performance Considerations

Mapping is more efficient than crawling for site discovery:

```python
# Efficient large site exploration
efficient_map = client.map(
    url="https://large-site.com",
    max_depth=2,     # Shallow but broad exploration
    max_breadth=25,  # More pages per level
    limit=200,       # Reasonable total limit
    timeout=60       # Standard timeout
)

# Quick site overview
quick_overview = client.map(
    url="https://new-site.com",
    max_depth=1,  # Just immediate links
    limit=50,     # Small set for overview
    timeout=30    # Fast exploration
)
```
## Error Handling

Handle mapping errors and partial results:

```python
from tavily import TavilyClient, TimeoutError, BadRequestError

client = TavilyClient(api_key="your-api-key")

try:
    site_map = client.map("https://example.com", limit=100)

    # Process successful mapping
    discovered_pages = site_map.get('results', [])
    print(f"Successfully mapped {len(discovered_pages)} pages")

    # Handle any failed discoveries
    failed_mappings = site_map.get('failed_results', [])
    if failed_mappings:
        print(f"Failed to map {len(failed_mappings)} pages")

except TimeoutError:
    print("Mapping operation timed out - partial results may be available")
except BadRequestError as e:
    print(f"Invalid mapping parameters: {e}")
```
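One recovery strategy after a timeout is to retry with a progressively smaller `limit`, trading completeness for a result. A sketch under stated assumptions: `map_with_fallback` and the stand-in `flaky_map` are illustrative helpers, not part of the client; with the real client you would pass `client.map` and the `TimeoutError` imported above:

```python
def map_with_fallback(map_fn, url, limits=(300, 100, 50), timeout_exc=Exception):
    """Retry a mapping call with progressively smaller limits after timeouts."""
    last_err = None
    for limit in limits:
        try:
            return map_fn(url, limit=limit)
        except timeout_exc as err:
            last_err = err  # scope down and try again
    raise last_err

# Demo with a stand-in mapper that "times out" above limit 100
class MapTimeout(Exception):
    pass

attempted = []

def flaky_map(url, limit):
    attempted.append(limit)
    if limit > 100:
        raise MapTimeout("mapping took too long")
    return {"results": [], "limit": limit}

result = map_with_fallback(flaky_map, "https://example.com", timeout_exc=MapTimeout)
```

Against the real API this would read `map_with_fallback(client.map, "https://example.com", timeout_exc=TimeoutError)`.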