# Googlesearch Python

A Python library for scraping Google search results. It uses requests for HTTP communication and BeautifulSoup4 for HTML parsing to extract result titles, URLs, and descriptions from Google's search pages.

## Package Information

- **Package Name**: googlesearch-python
- **Package Type**: Library
- **Language**: Python
- **Installation**: `pip install googlesearch-python`

## Core Imports

```python
from googlesearch import search
```

For advanced search results with structured data:

```python
from googlesearch import search, SearchResult
```

For user agent utilities:

```python
from googlesearch import get_useragent
```

## Basic Usage

```python
from googlesearch import search

# Simple search - returns URLs only
for url in search("Python programming", num_results=10):
    print(url)

# Advanced search - returns SearchResult objects with structured data
for result in search("Python programming", num_results=10, advanced=True):
    print(f"Title: {result.title}")
    print(f"URL: {result.url}")
    print(f"Description: {result.description}")
    print("---")

# Search with language and region settings
for url in search("Python programming", lang="en", region="us", num_results=5):
    print(url)
```

## Capabilities

### Google Search

Performs Google search queries with extensive customization options, including result count control, language and region selection, proxy support, and safe search toggling.

```python { .api }
def search(
    term: str,
    num_results: int = 10,
    lang: str = "en",
    proxy: str = None,
    advanced: bool = False,
    sleep_interval: int = 0,
    timeout: int = 5,
    safe: str = "active",
    ssl_verify: bool = None,
    region: str = None,
    start_num: int = 0,
    unique: bool = False
):
    """
    Search the Google search engine and yield results.

    Parameters:
    - term: Search query string
    - num_results: Number of results to return (default: 10)
    - lang: Language code for search results (default: "en")
    - proxy: HTTP/HTTPS proxy URL (optional)
    - advanced: Return SearchResult objects instead of URLs (default: False)
    - sleep_interval: Sleep time between requests in seconds (default: 0)
    - timeout: Request timeout in seconds (default: 5)
    - safe: Safe search setting - "active" or None (default: "active")
    - ssl_verify: SSL certificate verification (optional)
    - region: Country code for region-specific results (optional)
    - start_num: Starting result number for pagination (default: 0)
    - unique: Filter duplicate URLs (default: False)

    Yields:
    - str: URLs when advanced=False
    - SearchResult: Result objects when advanced=True

    Examples:
        Basic search returning URLs:
        >>> for url in search("machine learning", num_results=5):
        ...     print(url)

        Advanced search with structured results:
        >>> for result in search("AI research", advanced=True, num_results=3):
        ...     print(f"{result.title}: {result.url}")

        Search with language and region:
        >>> for url in search("café", lang="fr", region="fr", num_results=5):
        ...     print(url)

        Search with proxy and SSL settings:
        >>> proxy_url = "http://proxy.example.com:8080"
        >>> for url in search("secure search", proxy=proxy_url, ssl_verify=False):
        ...     print(url)

        Paginated search with rate limiting:
        >>> for url in search("large dataset", num_results=200, sleep_interval=2):
        ...     print(url)
    """
```
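
The pagination and de-duplication parameters can be combined. Below is a minimal sketch that fetches a second "page" of results while skipping repeated URLs; the query and numeric values are illustrative only:

```python
from googlesearch import search

# Fetch results 11-20 for the query, skipping duplicate URLs and
# pausing between requests. Query and counts are illustrative values.
for url in search(
    "open source licenses",
    num_results=10,
    start_num=10,
    unique=True,
    sleep_interval=1,
):
    print(url)
```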

### User Agent Generation

Generates random user agent strings for HTTP requests to improve request diversity and reduce detection.

```python { .api }
def get_useragent() -> str:
    """
    Generate a random user agent string mimicking the Lynx browser format.

    The user agent string components:
    - Lynx version: Lynx/x.y.z where x is 2-3, y is 8-9, and z is 0-2
    - libwww version: libwww-FM/x.y where x is 2-3 and y is 13-15
    - SSL-MM version: SSL-MM/x.y where x is 1-2 and y is 3-5
    - OpenSSL version: OpenSSL/x.y.z where x is 1-3, y is 0-4, and z is 0-9

    Returns:
        str: A randomly generated user agent string in the format:
        "Lynx/x.y.z libwww-FM/x.y SSL-MM/x.y OpenSSL/x.y.z"

    Examples:
        >>> agent = get_useragent()
        >>> print(agent)
        "Lynx/2.8.1 libwww-FM/2.14 SSL-MM/1.4 OpenSSL/1.2.7"
    """
```
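
The helper can also be used on its own, for example to set the User-Agent header on a manual request. A short sketch, where the target URL is a placeholder:

```python
import requests
from googlesearch import get_useragent

# Reuse the library's random Lynx-style user agent for a plain request.
# The URL below is a placeholder for illustration only.
headers = {"User-Agent": get_useragent()}
resp = requests.get("https://example.com", headers=headers, timeout=5)
print(resp.status_code)
```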

## Types

```python { .api }
class SearchResult:
    """
    Data structure for advanced search results containing structured information
    about each search result including URL, title, and description.
    """

    def __init__(self, url: str, title: str, description: str):
        """
        Initialize a SearchResult object.

        Parameters:
        - url: The result URL
        - title: The result title
        - description: The result description/snippet
        """
        self.url = url
        self.title = title
        self.description = description

    def __repr__(self) -> str:
        """
        Return string representation of the SearchResult.

        Returns:
            str: String representation in format:
            "SearchResult(url={url}, title={title}, description={description})"
        """
```
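
As a sketch of how the structured results can be consumed, the snippet below gathers advanced results into plain dictionaries and applies a simple keyword filter; the query, the keyword, and the defensive `or ""` fallback are illustrative assumptions:

```python
from googlesearch import search

# Collect structured results and keep only those whose description
# mentions "tutorial" (keyword chosen purely for illustration).
results = [
    {"title": r.title, "url": r.url, "description": r.description}
    for r in search("Python web scraping", num_results=10, advanced=True)
    if "tutorial" in (r.description or "").lower()
]

for item in results:
    print(item["title"], "->", item["url"])
```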

## Configuration Options

### Language Codes

Use standard language codes such as "en" (English), "fr" (French), "de" (German), "es" (Spanish), and "ja" (Japanese).

### Region Codes

Use [Country Codes](https://developers.google.com/custom-search/docs/json_api_reference#countryCodes) such as "us" (United States), "uk" (United Kingdom), "ca" (Canada), and "au" (Australia).

### Safe Search Options

- `"active"`: Enable safe search filtering (default)
- `None`: Disable safe search filtering (see the sketch below)
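
A minimal sketch of both settings; the query is illustrative:

```python
from googlesearch import search

# Default behaviour: safe search filtering enabled.
filtered = list(search("surfing", num_results=5, safe="active"))

# Pass None to disable safe search filtering.
unfiltered = list(search("surfing", num_results=5, safe=None))
```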

### Proxy Configuration

Supports both HTTP and HTTPS proxies:

```python
# HTTP proxy
proxy = "http://proxy.example.com:8080"

# HTTPS proxy
proxy = "https://proxy.example.com:8080"

# Proxy with authentication
proxy = "http://user:pass@proxy.example.com:8080"
```
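
A sketch of passing one of these values to `search()`; the proxy address is a placeholder, and disabling `ssl_verify` is shown only as an option:

```python
from googlesearch import search

# Route requests through a proxy; the address is a placeholder.
proxy = "http://user:pass@proxy.example.com:8080"

for url in search("status page", num_results=5, proxy=proxy, ssl_verify=False):
    print(url)
```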

## Error Handling

The library may raise the following exceptions:

- **requests.exceptions.RequestException**: Base class for network-related errors (timeouts, connection errors)
- **requests.exceptions.HTTPError**: HTTP status errors from Google (raised via `raise_for_status()` on the response)
- **requests.exceptions.Timeout**: Request timeout errors when the `timeout` parameter is exceeded
- **requests.exceptions.ConnectionError**: Connection-related network errors
- **bs4.FeatureNotFound**: Raised by BeautifulSoup when the requested HTML parser is not available
- **ValueError**: Invalid parameter values or parsing errors
- **AttributeError**: HTML structure parsing errors when expected elements are missing

Example error handling:

```python
import requests
from googlesearch import search

try:
    results = list(search("example query", num_results=10, timeout=10))
except requests.exceptions.Timeout:
    print("Request timed out")
except requests.exceptions.RequestException as e:
    print(f"Network error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

## Rate Limiting Best Practices

To avoid being blocked by Google:

1. **Use sleep intervals**: Set `sleep_interval` to 1-5 seconds for large result sets
2. **Limit concurrent requests**: Don't run multiple searches simultaneously
3. **Use reasonable result counts**: Avoid requesting excessive numbers of results
4. **Rotate user agents**: The library automatically uses random user agents
5. **Consider using proxies**: For high-volume usage, rotate through different proxies

Example with rate limiting:

```python
from googlesearch import search

# Good practice for large result sets
for url in search("large query", num_results=100, sleep_interval=2):
    print(url)
```

## Dependencies

The package requires:

- **beautifulsoup4 >= 4.9**: HTML parsing for search result extraction
- **requests >= 2.20**: HTTP client for Google search requests