# Core Browser Operations

Low-level browser functionality for HTTP requests with automatic BeautifulSoup parsing. The `Browser` class wraps a `requests` session, handles requests and responses directly, and suits applications that need fine-grained control over HTTP interactions.

## Capabilities

### Browser Creation and Configuration

Create a browser instance with optional session, parsing, and adapter configuration.

```python { .api }
class Browser:
    def __init__(self, session=None, soup_config={'features': 'lxml'},
                 requests_adapters=None, raise_on_404=False, user_agent=None):
        """
        Create a Browser instance.

        Parameters:
        - session: Optional requests.Session instance; a new session is created if omitted
        - soup_config: Dict of keyword arguments passed to BeautifulSoup when parsing
        - requests_adapters: Optional dict mapping URL prefixes to requests adapters to mount on the session
        - raise_on_404: If True, raise LinkNotFoundError on 404 errors
        - user_agent: Custom User-Agent string
        """
```

**Usage Example:**

```python
import mechanicalsoup
import requests

# Basic browser
browser = mechanicalsoup.Browser()

# Browser with custom session
session = requests.Session()
browser = mechanicalsoup.Browser(session=session)

# Browser with custom BeautifulSoup parser
browser = mechanicalsoup.Browser(soup_config={'features': 'html.parser'})

# Browser that raises on 404 errors
browser = mechanicalsoup.Browser(raise_on_404=True)
```
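
### HTTP Request Methods

Standard HTTP methods with automatic BeautifulSoup parsing of HTML responses. Each method returns a `requests.Response` with an added `soup` attribute, which is `None` when the response is not HTML.
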
```python { .api }
def request(self, *args, **kwargs):
    """Low-level request method, forwards to session.request()"""

def get(self, *args, **kwargs):
    """HTTP GET request with soup parsing"""

def post(self, *args, **kwargs):
    """HTTP POST request with soup parsing"""

def put(self, *args, **kwargs):
    """HTTP PUT request with soup parsing"""
```

**Usage Example:**

```python
import mechanicalsoup

browser = mechanicalsoup.Browser()

# GET request for an HTML page; response.soup holds the parsed document
response = browser.get("https://httpbin.org/html")
print(response.soup.h1.string)

# POST request with form-encoded data
response = browser.post("https://httpbin.org/post",
                        data={"key": "value"})

# PUT request with a JSON body
response = browser.put("https://httpbin.org/put",
                       json={"updated_key": "updated_value"})

# Request with custom headers (the reply is JSON, so response.soup is None)
response = browser.get("https://httpbin.org/headers",
                       headers={"Custom-Header": "value"})
```

### Form Submission

Submit HTML forms with automatic data extraction and encoding.

```python { .api }
def submit(self, form, url=None, **kwargs):
    """
    Submit a form object.

    Parameters:
    - form: Form instance to submit
    - url: URL of the page the form is on; needed when the form's action is a relative path
    - **kwargs: Additional request parameters

    Returns:
    requests.Response with soup attribute
    """
```

**Usage Example:**

```python
from mechanicalsoup import Browser, Form

browser = Browser()
response = browser.get("https://httpbin.org/forms/post")

# Create and fill form
form = Form(response.soup.find("form"))
form["custname"] = "John Doe"

# Submit form; its action is a relative path, so pass the page URL to resolve it
result = browser.submit(form, url=response.url)
# httpbin replies with JSON, so result.soup is None; read the echoed fields instead
print(result.json()["form"])
```
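
Form actions are often relative paths, which is why the URL of the page hosting the form matters at submission time. A small standard-library sketch of how a relative action resolves against a page URL; the URLs are the httpbin ones used above, and `resolve_action` is a hypothetical helper, not part of MechanicalSoup:

```python
from urllib.parse import urljoin

def resolve_action(page_url, action):
    """Return the absolute URL a form with the given action would target."""
    # An empty or missing action means "submit to the page itself"
    return urljoin(page_url, action or "")

page_url = "https://httpbin.org/forms/post"
absolute_path = resolve_action(page_url, "/post")    # action starts at the site root
relative_path = resolve_action(page_url, "submit")   # action relative to the page's directory
no_action = resolve_action(page_url, None)           # form without an action attribute

print(absolute_path)
print(relative_path)
print(no_action)
```

Without a page URL there is nothing to resolve `/post` against, so a submission that omits it can only work for forms whose action is already an absolute URL.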

### Session and Cookie Management

Manage cookies and session state for authenticated or persistent interactions.

```python { .api }
def set_cookiejar(self, cookiejar):
    """Replace the current cookiejar in the requests session"""

def get_cookiejar(self):
    """Get the current cookiejar from the requests session"""
```

**Usage Example:**

```python
import mechanicalsoup
from http.cookiejar import CookieJar

browser = mechanicalsoup.Browser()

# Get current cookies
cookies = browser.get_cookiejar()

# Set new cookie jar
new_jar = CookieJar()
browser.set_cookiejar(new_jar)
```
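
Cookie jars from the standard library can also be saved to disk, which is one way to keep session state across runs. A sketch using `http.cookiejar.LWPCookieJar`, under the assumption that a file-backed jar is acceptable to the session; the cookie values, the file name, and the `make_cookie` helper are made up for illustration:

```python
import os
import tempfile
import time
from http.cookiejar import Cookie, LWPCookieJar

def make_cookie(name, value, domain):
    """Build a simple cookie valid for one hour (illustrative values)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=int(time.time()) + 3600,
        discard=False, comment=None, comment_url=None, rest={},
    )

cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# First run: store a cookie and write the jar to disk
jar = LWPCookieJar(cookie_file)
jar.set_cookie(make_cookie("sessionid", "abc123", "example.com"))
jar.save(ignore_discard=True)

# Later run: load the saved cookies into a fresh jar
restored = LWPCookieJar(cookie_file)
restored.load(ignore_discard=True)
restored_pairs = [(c.name, c.value) for c in restored]
print(restored_pairs)

# The restored jar could then be installed with browser.set_cookiejar(restored)
```

Passing `ignore_discard=True` on both save and load keeps session cookies that would otherwise be dropped when written to or read from the file.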

### User Agent Management

Set and manage the User-Agent header for requests.

```python { .api }
def set_user_agent(self, user_agent):
    """
    Set the User-Agent header for requests.

    Parameters:
    - user_agent: String to use as User-Agent, or None for default
    """
```

**Usage Example:**

```python
import mechanicalsoup

browser = mechanicalsoup.Browser()

# Set custom user agent
browser.set_user_agent("MyBot/1.0 (Contact: admin@example.com)")

# Reset to default
browser.set_user_agent(None)
```

### Debugging and Development

Tools for debugging and development workflow.

```python { .api }
def launch_browser(self, soup):
    """
    Launch external browser with page content for debugging.

    Parameters:
    - soup: BeautifulSoup object to display
    """
```
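
In headless environments where opening a browser window is not possible, a similar effect can be had by writing the page to a file and inspecting it later. A rough standard-library sketch of that idea; `dump_page` is a hypothetical helper and is not presented as how the library itself is implemented:

```python
import tempfile
import webbrowser

def dump_page(page, open_in_browser=False):
    """Write page contents to a temporary .html file and return its path.

    `page` may be a string or any object whose str() is HTML,
    such as a BeautifulSoup document.
    """
    with tempfile.NamedTemporaryFile(
            "w", suffix=".html", delete=False, encoding="utf-8") as handle:
        handle.write(str(page))
        path = handle.name
    if open_in_browser:
        # Opens the file in the system's default browser
        webbrowser.open("file://" + path)
    return path

saved = dump_page("<html><body><h1>Debug view</h1></body></html>")
print(saved)
```

The file is kept after the function returns (`delete=False`), so it can be opened manually or attached to a bug report.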

### Session Cleanup

Clean up browser resources and close connections.

```python { .api }
def close(self):
    """Close the session and clear cookies"""
```

**Usage Example:**

```python
import mechanicalsoup

browser = mechanicalsoup.Browser()
try:
    response = browser.get("https://example.com")
    # Use response...
finally:
    browser.close()
```

### Context Manager Support

`Browser` supports the context manager protocol for automatic resource cleanup.

```python { .api }
def __enter__(self):
    """Enter context manager, returns self"""

def __exit__(self, *args):
    """Exit context manager, calls close() automatically"""
```

**Usage Example:**

```python
import mechanicalsoup

# Recommended approach using context manager
with mechanicalsoup.Browser() as browser:
    response = browser.get("https://example.com")
    # Process response...
    response2 = browser.post("https://example.com/api", data={"key": "value"})
# Browser automatically closed when exiting with-block

# Placeholder inputs for the loop below; replace with real URLs and processing
urls = ["https://example.com/a", "https://example.com/b"]

def process_page(soup):
    # soup is None when the response was not HTML
    print(soup.title if soup is not None else "not an HTML page")

# For long-running applications
with mechanicalsoup.Browser(user_agent="MyApp/1.0") as browser:
    for url in urls:
        try:
            response = browser.get(url)
            process_page(response.soup)
        except Exception as e:
            print(f"Error processing {url}: {e}")
```

### Static Utility Methods

Helper methods for response processing and form data extraction.

```python { .api }
@staticmethod
def add_soup(response, soup_config):
    """
    Attach a BeautifulSoup object to a requests response.

    The response's soup attribute is set to None if the response is not HTML.

    Parameters:
    - response: requests.Response object
    - soup_config: BeautifulSoup configuration dict
    """

@staticmethod
def get_request_kwargs(form, url=None, **kwargs):
    """
    Extract form data for request submission.

    Parameters:
    - form: Form instance
    - url: Optional URL override
    - **kwargs: Additional parameters

    Returns:
    Dict with request parameters
    """
```

## Public Attributes

```python { .api }
# Browser instance attributes
session: requests.Session    # The underlying requests session
soup_config: Dict[str, Any]  # BeautifulSoup configuration
raise_on_404: bool           # Whether to raise on 404 errors
```

## Error Handling

When created with `raise_on_404=True`, the `Browser` raises `LinkNotFoundError` if a page request returns a 404 status:

```python
import mechanicalsoup

browser = mechanicalsoup.Browser(raise_on_404=True)
try:
    response = browser.get("https://httpbin.org/status/404")
except mechanicalsoup.LinkNotFoundError:
    print("Page not found!")
```
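
`raise_on_404` only concerns 404 responses. Other error statuses can be surfaced with the `raise_for_status()` method that every `requests.Response` provides. A sketch with hand-built responses standing in for real ones; `describe_failure` is a hypothetical helper, not part of MechanicalSoup:

```python
import requests

def describe_failure(response):
    """Return an error message for 4xx/5xx responses, or None on success."""
    try:
        response.raise_for_status()
    except requests.HTTPError as error:
        return str(error)
    return None

# Stand-in for a failed response returned by browser.get() or browser.post()
server_error = requests.Response()
server_error.status_code = 500
server_error.reason = "Internal Server Error"
server_error.url = "https://example.com/api"

# Stand-in for a successful response
ok = requests.Response()
ok.status_code = 200

failure_message = describe_failure(server_error)
success_message = describe_failure(ok)
print(failure_message)
print(success_message)
```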