# Content Operations

Extract content from individual URLs or crawl entire websites with intelligent navigation, content filtering, and structured data extraction capabilities.

## Capabilities

### Content Extraction

Extract structured content from one or more URLs with options for different output formats and extraction depth levels.

```python { .api }
def extract(
    urls: Union[List[str], str],
    include_images: bool = None,
    extract_depth: Literal["basic", "advanced"] = None,
    format: Literal["markdown", "text"] = None,
    timeout: int = 60,
    include_favicon: bool = None,
    **kwargs
) -> dict:
    """
    Extract content from a single URL or a list of URLs.

    Parameters:
    - urls: Single URL string or list of URL strings to extract content from
    - include_images: Include image URLs in extracted content
    - extract_depth: Extraction thoroughness ("basic" for main content, "advanced" for comprehensive)
    - format: Output format ("markdown" for structured text, "text" for plain text)
    - timeout: Request timeout in seconds (max 120)
    - include_favicon: Include website favicon URLs
    - **kwargs: Additional extraction parameters

    Returns:
    Dict containing:
    - results: List of extraction result objects with:
      - url: Source URL
      - content: Extracted content
      - title: Page title
      - score: Content quality score
    - failed_results: List of URLs that failed extraction, with error details
    """
```

**Usage Examples:**

```python
# Extract from a single URL
result = client.extract("https://example.com/article")
print(result['results'][0]['content'])

# Extract from multiple URLs
urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
]
results = client.extract(
    urls=urls,
    format="markdown",
    extract_depth="advanced",
    include_images=True
)

# Process results and handle failures
for result in results['results']:
    print(f"URL: {result['url']}")
    print(f"Title: {result['title']}")
    print(f"Content: {result['content'][:200]}...")

for failed in results['failed_results']:
    print(f"Failed to extract: {failed['url']} - {failed['error']}")
```
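
The `failed_results` list makes it easy to re-submit URLs that failed transiently. A minimal retry helper might look like this (a sketch: the retry policy and the `extract_with_retry` name are illustrative, not part of the client API; `client` is any object exposing the `extract` method shown above):

```python
def extract_with_retry(client, urls, max_retries=2, **options):
    """Extract `urls`, re-submitting failed URLs up to `max_retries` times."""
    merged = {"results": [], "failed_results": []}
    pending = list(urls)
    for attempt in range(max_retries + 1):
        if not pending:
            break
        response = client.extract(pending, **options)
        merged["results"].extend(response.get("results", []))
        # Only the URLs that failed this round are retried next round
        pending = [f["url"] for f in response.get("failed_results", [])]
    merged["failed_results"] = [
        {"url": u, "error": "retries exhausted"} for u in pending
    ]
    return merged
```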

### Website Crawling

Intelligently crawl websites with custom navigation instructions, content filtering, and structured data extraction.

```python { .api }
def crawl(
    url: str,
    max_depth: int = None,
    max_breadth: int = None,
    limit: int = None,
    instructions: str = None,
    select_paths: Sequence[str] = None,
    select_domains: Sequence[str] = None,
    exclude_paths: Sequence[str] = None,
    exclude_domains: Sequence[str] = None,
    allow_external: bool = None,
    include_images: bool = None,
    extract_depth: Literal["basic", "advanced"] = None,
    format: Literal["markdown", "text"] = None,
    timeout: int = 60,
    include_favicon: bool = None,
    **kwargs
) -> dict:
    """
    Crawl a website with intelligent navigation and content extraction.

    Parameters:
    - url: Starting URL for crawling
    - max_depth: Maximum depth to crawl from the starting URL
    - max_breadth: Maximum number of pages to crawl per depth level
    - limit: Total maximum number of pages to crawl
    - instructions: Natural language instructions for crawling behavior
    - select_paths: List of path patterns to include (supports wildcards)
    - select_domains: List of domains to crawl
    - exclude_paths: List of path patterns to exclude
    - exclude_domains: List of domains to avoid
    - allow_external: Allow crawling domains external to the starting domain
    - include_images: Include image URLs in crawled content
    - extract_depth: Content extraction thoroughness
    - format: Output format for extracted content
    - timeout: Request timeout in seconds (max 120)
    - include_favicon: Include website favicon URLs
    - **kwargs: Additional crawl parameters

    Returns:
    Dict containing crawl results with pages and extracted content
    """
```

**Usage Examples:**

```python
# Basic website crawl
crawl_result = client.crawl(
    url="https://docs.python.org",
    max_depth=2,
    limit=20
)

# Advanced crawl with filtering
crawl_result = client.crawl(
    url="https://example.com",
    max_depth=3,
    max_breadth=10,
    instructions="Focus on documentation and tutorial pages",
    select_paths=["/docs/*", "/tutorials/*"],
    exclude_paths=["/admin/*", "/private/*"],
    format="markdown",
    extract_depth="advanced"
)

# Cross-domain crawl
crawl_result = client.crawl(
    url="https://company.com",
    allow_external=True,
    select_domains=["company.com", "docs.company.com"],
    limit=50
)
```
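
Crawl output can be persisted for later processing. The sketch below assumes the crawl response exposes a `results` list of objects with `url` and `content` fields, mirroring `extract()`; verify the actual response shape before relying on this:

```python
import hashlib
import pathlib

def save_pages(crawl_result, out_dir="crawl_output"):
    """Write each crawled page's content to a file named by a hash of its URL."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    for page in crawl_result.get("results", []):
        # Hash the URL so arbitrary paths become safe, unique filenames
        name = hashlib.sha256(page["url"].encode()).hexdigest()[:16] + ".md"
        (out / name).write_text(page.get("content", ""), encoding="utf-8")
    return sorted(p.name for p in out.iterdir())
```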

### Advanced Crawling Patterns

**Targeted Content Crawling:**

```python
# Crawl specific content types
blog_crawl = client.crawl(
    url="https://techblog.com",
    instructions="Only crawl blog posts and articles, skip navigation pages",
    select_paths=["/blog/*", "/articles/*", "/posts/*"],
    exclude_paths=["/tags/*", "/categories/*", "/authors/*"],
    max_depth=2,
    format="markdown"
)

# E-commerce product crawl
product_crawl = client.crawl(
    url="https://store.com",
    instructions="Focus on product pages with descriptions and specifications",
    select_paths=["/products/*", "/items/*"],
    exclude_paths=["/cart/*", "/checkout/*", "/account/*"],
    include_images=True,
    limit=100
)
```

**Research and Documentation Crawling:**

```python
# Academic paper crawl
research_crawl = client.crawl(
    url="https://university.edu/research",
    instructions="Crawl research papers and publications, skip administrative pages",
    select_paths=["/papers/*", "/publications/*", "/research/*"],
    extract_depth="advanced",
    max_depth=3
)

# API documentation crawl
docs_crawl = client.crawl(
    url="https://api.example.com/docs",
    instructions="Focus on API reference and tutorial content",
    format="markdown",
    max_depth=4,
    limit=200
)
```

## Crawling Instructions

The `instructions` parameter accepts natural language descriptions that guide the crawling behavior:

**Effective Instruction Examples:**

```python
# Content-focused instructions
instructions = "Focus on main content pages, skip navigation, sidebar, and footer links"

# Topic-specific instructions
instructions = "Only crawl pages related to machine learning and AI, ignore general company pages"

# Quality-focused instructions
instructions = "Prioritize pages with substantial text content, skip image galleries and empty pages"

# Structure-focused instructions
instructions = "Follow documentation hierarchy, crawl systematically through sections and subsections"
```

## Path and Domain Filtering

**Path Pattern Examples:**

```python
# Include patterns
select_paths = [
    "/docs/*",           # All documentation
    "/api/*/reference",  # API reference pages
    "/blog/2024/*",      # 2024 blog posts
    "*/tutorial*"        # Any tutorial pages
]

# Exclude patterns
exclude_paths = [
    "/admin/*",    # Admin pages
    "/private/*",  # Private content
    "*/download*", # Download pages
    "*.pdf",       # PDF files
    "*.jpg"        # Image files
]
```
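
The wildcard patterns above resemble shell-style globs. As a rough local approximation (an assumption; the service's exact matching rules may differ), Python's `fnmatch` can be used to pre-test which URL paths a pattern set would keep:

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def path_allowed(url, select_paths=None, exclude_paths=None):
    """Rough local check of a URL path against include/exclude glob patterns.

    Exclusions win over inclusions; with no select_paths, everything
    not excluded is allowed.
    """
    path = urlparse(url).path
    if exclude_paths and any(fnmatch(path, p) for p in exclude_paths):
        return False
    if select_paths:
        return any(fnmatch(path, p) for p in select_paths)
    return True
```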

**Domain Management:**

```python
# Multi-domain crawling
result = client.crawl(
    url="https://main-site.com",
    allow_external=True,
    select_domains=[
        "main-site.com",
        "docs.main-site.com",
        "blog.main-site.com",
        "support.main-site.com"
    ],
    exclude_domains=[
        "ads.main-site.com",
        "tracking.main-site.com"
    ]
)
```

## Performance and Limits

**Optimization Strategies:**

```python
# Balanced crawl for large sites
balanced_crawl = client.crawl(
    url="https://large-site.com",
    max_depth=2,     # Limit depth to avoid going too deep
    max_breadth=15,  # Limit breadth to focus on important pages
    limit=100,       # Overall page limit
    timeout=90       # Longer timeout for complex sites
)

# Fast shallow crawl
quick_crawl = client.crawl(
    url="https://site.com",
    max_depth=1,  # Only immediate links
    limit=20,     # Small page count
    timeout=30    # Quick timeout
)
```
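
For very large URL lists, splitting `extract` calls into batches keeps individual requests small and isolates failures. A sketch (the batch size of 20 is an illustrative choice, not a documented limit; `client` is any object exposing the `extract` method shown earlier):

```python
def extract_in_batches(client, urls, batch_size=20, **options):
    """Run extract over `urls` in fixed-size batches and merge the responses."""
    all_results, all_failed = [], []
    for i in range(0, len(urls), batch_size):
        resp = client.extract(urls[i:i + batch_size], **options)
        all_results.extend(resp.get("results", []))
        all_failed.extend(resp.get("failed_results", []))
    return {"results": all_results, "failed_results": all_failed}
```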

## Error Handling

Content operations include robust error handling for failed extractions and crawling issues:

```python
from tavily import TavilyClient, TimeoutError, BadRequestError

try:
    result = client.crawl("https://example.com", limit=50)

    # Process successful results
    for page in result.get('results', []):
        print(f"Crawled: {page['url']}")

    # Handle any failed pages
    for failure in result.get('failed_results', []):
        print(f"Failed: {failure['url']} - {failure.get('error', 'Unknown error')}")

except TimeoutError:
    print("Crawling operation timed out")
except BadRequestError as e:
    print(f"Invalid crawl parameters: {e}")
```
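
When timeouts are transient, retrying with exponential backoff is a common pattern. A sketch (the three-attempt policy and delays are illustrative assumptions; for self-containment it catches the built-in `TimeoutError`, so substitute the SDK's exception class in practice):

```python
import time

def crawl_with_backoff(client, url, attempts=3, base_delay=1.0, **options):
    """Retry a crawl on timeouts, doubling the delay between attempts."""
    for attempt in range(attempts):
        try:
            return client.crawl(url, **options)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
```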