# Googlesearch Python

A Python library for scraping Google search results. It uses requests for HTTP communication and BeautifulSoup4 for HTML parsing to extract result titles, URLs, and descriptions from Google's search pages.

## Package Information

- **Package Name**: googlesearch-python
- **Package Type**: Library
- **Language**: Python
- **Installation**: `pip install googlesearch-python`

## Core Imports

```python
from googlesearch import search
```

For advanced search results with structured data:

```python
from googlesearch import search, SearchResult
```

For user agent utilities:

```python
from googlesearch import get_useragent
```

## Basic Usage

```python
from googlesearch import search

# Simple search - returns URLs only
for url in search("Python programming", num_results=10):
    print(url)

# Advanced search - returns SearchResult objects with structured data
for result in search("Python programming", num_results=10, advanced=True):
    print(f"Title: {result.title}")
    print(f"URL: {result.url}")
    print(f"Description: {result.description}")
    print("---")

# Search with language and region settings
for url in search("Python programming", lang="en", region="us", num_results=5):
    print(url)
```
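
Results can repeat across result pages; the documented `unique` flag filters duplicate URLs. A minimal sketch (the query string is illustrative):

```python
from googlesearch import search

# Fetch a larger result set and let the library drop duplicate URLs
# (unique=True is the documented de-duplication flag).
urls = list(search("open source licensing", num_results=20, unique=True))
print(f"Fetched {len(urls)} unique URLs")
```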
## Capabilities

### Google Search

Performs Google search queries with extensive customization options including result count control, language and region specification, proxy support, and safe search toggles.

```python { .api }
def search(
    term: str,
    num_results: int = 10,
    lang: str = "en",
    proxy: str = None,
    advanced: bool = False,
    sleep_interval: int = 0,
    timeout: int = 5,
    safe: str = "active",
    ssl_verify: bool = None,
    region: str = None,
    start_num: int = 0,
    unique: bool = False
):
    """
    Search the Google search engine and yield results.

    Parameters:
    - term: Search query string
    - num_results: Number of results to return (default: 10)
    - lang: Language code for search results (default: "en")
    - proxy: HTTP/HTTPS proxy URL (optional)
    - advanced: Return SearchResult objects instead of URLs (default: False)
    - sleep_interval: Sleep time between requests in seconds (default: 0)
    - timeout: Request timeout in seconds (default: 5)
    - safe: Safe search setting - "active" or None (default: "active")
    - ssl_verify: SSL certificate verification (optional)
    - region: Country code for region-specific results (optional)
    - start_num: Starting result number for pagination (default: 0)
    - unique: Filter duplicate URLs (default: False)

    Yields:
    - str: URLs when advanced=False
    - SearchResult: Result objects when advanced=True

    Examples:
        Basic search returning URLs:
        >>> for url in search("machine learning", num_results=5):
        ...     print(url)

        Advanced search with structured results:
        >>> for result in search("AI research", advanced=True, num_results=3):
        ...     print(f"{result.title}: {result.url}")

        Search with language and region:
        >>> for url in search("café", lang="fr", region="fr", num_results=5):
        ...     print(url)

        Search with proxy and SSL settings:
        >>> proxy_url = "http://proxy.example.com:8080"
        >>> for url in search("secure search", proxy=proxy_url, ssl_verify=False):
        ...     print(url)

        Paginated search with rate limiting:
        >>> for url in search("large dataset", num_results=200, sleep_interval=2):
        ...     print(url)
    """
```
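
The `start_num` parameter offsets where results begin, which allows page-by-page fetching. A sketch of that pattern, assuming `start_num` maps directly onto Google's result offset as documented above (query and page size are illustrative):

```python
from googlesearch import search

page_size = 10
for page in range(3):
    # Offset each request by one full page and pause between requests
    # to reduce the chance of rate limiting.
    for url in search(
        "data engineering",
        num_results=page_size,
        start_num=page * page_size,
        sleep_interval=2,
    ):
        print(url)
```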

### User Agent Generation

Generates random user agent strings for HTTP requests to improve request diversity and reduce detection.

```python { .api }
def get_useragent() -> str:
    """
    Generate a random user agent string mimicking the Lynx browser format.

    The user agent string components:
    - Lynx version: Lynx/x.y.z where x is 2-3, y is 8-9, and z is 0-2
    - libwww version: libwww-FM/x.y where x is 2-3 and y is 13-15
    - SSL-MM version: SSL-MM/x.y where x is 1-2 and y is 3-5
    - OpenSSL version: OpenSSL/x.y.z where x is 1-3, y is 0-4, and z is 0-9

    Returns:
        str: A randomly generated user agent string in the format:
        "Lynx/x.y.z libwww-FM/x.y SSL-MM/x.y OpenSSL/x.y.z"

    Examples:
        >>> agent = get_useragent()
        >>> print(agent)
        "Lynx/2.8.1 libwww-FM/2.14 SSL-MM/1.4 OpenSSL/1.2.7"
    """
```
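
The helper can also be reused for your own HTTP calls. A minimal sketch; the target URL is a placeholder and not part of the library:

```python
import requests

from googlesearch import get_useragent

# Attach a randomly generated Lynx-style user agent to a requests session.
session = requests.Session()
session.headers["User-Agent"] = get_useragent()

# httpbin.org echoes the user agent back, which makes it a handy check.
resp = session.get("https://httpbin.org/user-agent", timeout=5)
print(resp.json())
```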

## Types

```python { .api }
class SearchResult:
    """
    Data structure for advanced search results. Each instance holds the
    structured information for one result: URL, title, and description.
    """

    def __init__(self, url: str, title: str, description: str):
        """
        Initialize a SearchResult object.

        Parameters:
        - url: The result URL
        - title: The result title
        - description: The result description/snippet
        """
        self.url = url
        self.title = title
        self.description = description

    def __repr__(self) -> str:
        """
        Return the string representation of the SearchResult.

        Returns:
            str: String representation in the format:
            "SearchResult(url={url}, title={title}, description={description})"
        """
```
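
Since `SearchResult` exposes plain `url`, `title`, and `description` attributes, advanced results are straightforward to serialize. A sketch (the query is illustrative):

```python
import json

from googlesearch import search

# Collect advanced results and dump them as JSON using the documented
# url/title/description attributes.
results = [
    {"url": r.url, "title": r.title, "description": r.description}
    for r in search("Python web scraping", num_results=5, advanced=True)
]
print(json.dumps(results, indent=2, ensure_ascii=False))
```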

## Configuration Options

### Language Codes

Use standard language codes like "en" (English), "fr" (French), "de" (German), "es" (Spanish), "ja" (Japanese), etc.

### Region Codes

Use [Country Codes](https://developers.google.com/custom-search/docs/json_api_reference#countryCodes) like "us" (United States), "uk" (United Kingdom), "ca" (Canada), "au" (Australia), etc.

### Safe Search Options

- `"active"`: Enable safe search filtering (default)
- `None`: Disable safe search filtering
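
For example, to turn the filter off (the query is illustrative):

```python
from googlesearch import search

# safe=None disables Google's safe search filtering for this query.
for url in search("forensic pathology research", safe=None, num_results=5):
    print(url)
```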

### Proxy Configuration

Supports both HTTP and HTTPS proxies:

```python
# HTTP proxy
proxy = "http://proxy.example.com:8080"

# HTTPS proxy
proxy = "https://proxy.example.com:8080"

# Proxy with authentication
proxy = "http://user:pass@proxy.example.com:8080"
```
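
The proxy URL is passed straight to `search()` via the `proxy` parameter. A sketch; `proxy.example.com` is a placeholder for your own proxy, and `ssl_verify=False` is only needed when the proxy intercepts TLS:

```python
from googlesearch import search

proxy = "http://user:pass@proxy.example.com:8080"

# Route the search through the proxy; disable certificate verification
# only if the proxy re-signs TLS traffic.
for url in search("proxied query", proxy=proxy, ssl_verify=False, num_results=5):
    print(url)
```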

## Error Handling

The library may raise the following exceptions:

- **requests.exceptions.RequestException**: Network-related errors (timeouts, connection errors)
- **requests.exceptions.HTTPError**: HTTP status errors from Google (via `resp.raise_for_status()`)
- **requests.exceptions.Timeout**: Raised when the `timeout` parameter is exceeded
- **requests.exceptions.ConnectionError**: Connection-related network errors
- **bs4.FeatureNotFound**: Raised by BeautifulSoup when the requested HTML parser is not installed
- **ValueError**: Invalid parameter values or parsing errors
- **AttributeError**: Raised during HTML parsing when expected elements are missing

Example error handling:

```python
import requests
from googlesearch import search

try:
    results = list(search("example query", num_results=10, timeout=10))
# Timeout is a subclass of RequestException, so catch it first.
except requests.exceptions.Timeout:
    print("Request timed out")
except requests.exceptions.RequestException as e:
    print(f"Network error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
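
For transient network failures, a retry wrapper with exponential backoff can be layered on top. A hypothetical helper, not part of the library:

```python
import time

import requests
from googlesearch import search

def search_with_retry(term, retries=3, **kwargs):
    """Hypothetical helper: retry a search with exponential backoff."""
    for attempt in range(retries):
        try:
            return list(search(term, **kwargs))
        except requests.exceptions.RequestException:
            if attempt == retries - 1:
                raise  # out of attempts, propagate the last error
            time.sleep(2 ** attempt)  # back off: 1s, then 2s, ...

urls = search_with_retry("resilient query", num_results=10)
print(len(urls))
```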

## Rate Limiting Best Practices

To avoid being blocked by Google:

1. **Use sleep intervals**: Set `sleep_interval` to 1-5 seconds for large result sets
2. **Limit concurrent requests**: Don't run multiple searches simultaneously
3. **Use reasonable result counts**: Avoid requesting excessive numbers of results
4. **Rotate user agents**: The library automatically uses random user agents
5. **Consider using proxies**: For high-volume usage, rotate through different proxies (see the rotation sketch below)

Example with rate limiting:

```python
from googlesearch import search

# Good practice for large result sets
for url in search("large query", num_results=100, sleep_interval=2):
    print(url)
```
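
For point 5 above, proxies can be rotated across queries. A sketch; the proxy URLs are placeholders for your own pool:

```python
import itertools

from googlesearch import search

# Cycle through a pool of proxies, using one per query.
proxies = itertools.cycle([
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
])

for query in ["topic one", "topic two", "topic three"]:
    for url in search(query, proxy=next(proxies), num_results=10, sleep_interval=2):
        print(url)
```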

## Dependencies

The package requires:

- **beautifulsoup4 >= 4.9**: HTML parsing for search result extraction
- **requests >= 2.20**: HTTP client for Google search requests