# HTTP Features

Feedparser provides comprehensive HTTP client capabilities for fetching feeds from URLs, including conditional requests, custom headers, authentication support, and redirect handling.

## Capabilities

### Global Configuration Constants

Configure default HTTP behavior for all parsing operations.

```python { .api }
USER_AGENT: str = "feedparser/{version} +https://github.com/kurtmckee/feedparser/"
# Default HTTP User-Agent header sent with requests

RESOLVE_RELATIVE_URIS: int = 1
# Global setting: resolve relative URIs to absolute (1=enabled, 0=disabled)

SANITIZE_HTML: int = 1
# Global setting: sanitize HTML content (1=enabled, 0=disabled)
```

### HTTP Response Information

When parsing from URLs, the result contains comprehensive HTTP response data:

```python { .api }
# HTTP response fields in result
result = {
    'status': int,     # HTTP status code (200, 304, 404, etc.)
    'headers': dict,   # All HTTP response headers
    'etag': str,       # HTTP ETag header for caching
    'modified': str,   # HTTP Last-Modified header
    'href': str,       # Final URL after redirects
}
```
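
As a minimal illustration (assuming the example URL is reachable), these fields can be read with `.get()` lookups, since keys such as `etag` are present only when the server sent the corresponding header:

```python
import feedparser

result = feedparser.parse('https://example.com/feed.xml')

# These keys exist only for URL-based parses, so use .get()
print(f"Status: {result.get('status')}")
print(f"Final URL: {result.get('href')}")
print(f"ETag: {result.get('etag')}")
print(f"Last-Modified: {result.get('modified')}")
```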

## HTTP Client Features

### User-Agent Configuration

Set a custom User-Agent string to identify your client:

```python
import feedparser

# Set global User-Agent for all requests
feedparser.USER_AGENT = 'MyFeedReader/1.0 (+https://example.com/bot.html)'

# Or specify per-request
result = feedparser.parse(
    url,
    agent='MyBot/2.0 (contact@example.com)'
)
```

### Custom Request Headers

Add custom HTTP headers to requests:

```python
# Add authorization
result = feedparser.parse(
    url,
    request_headers={
        'Authorization': 'Bearer your-token-here',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate',
    }
)

# Override default headers
result = feedparser.parse(
    url,
    request_headers={
        'User-Agent': 'CustomBot/1.0',     # Overrides agent parameter
        'Referer': 'https://example.com',  # Custom referer
    }
)
```

### Conditional Requests (Caching)

Use ETags and Last-Modified headers for efficient feed polling:

```python
# Initial request - save caching headers
result = feedparser.parse('https://example.com/feed.xml')

# Store caching information
etag = result.get('etag')
modified = result.get('modified')

# Subsequent conditional request
result = feedparser.parse(
    'https://example.com/feed.xml',
    etag=etag,
    modified=modified
)

# Check if content was modified
if result.status == 304:
    print("Feed not modified - use cached version")
else:
    print(f"Feed updated - {len(result.entries)} entries")
```
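
For repeated polling, the caching headers need to survive between runs. The sketch below persists them in a JSON file; the `CACHE_FILE` name and the `poll` helper are illustrative, not part of feedparser's API:

```python
import json
import os

import feedparser

CACHE_FILE = 'feed_cache.json'  # hypothetical cache location

def poll(url):
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            cache = json.load(f)

    result = feedparser.parse(
        url,
        etag=cache.get('etag'),
        modified=cache.get('modified'),
    )

    if result.get('status') == 304:
        return []  # nothing new since the last poll

    # Save the new validators for the next run
    with open(CACHE_FILE, 'w') as f:
        json.dump(
            {'etag': result.get('etag'), 'modified': result.get('modified')},
            f,
        )
    return result.entries
```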

### HTTP Authentication

Feedparser supports various authentication methods through custom handlers:

```python
import urllib.request
import feedparser

# Basic authentication
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'https://example.com/', 'username', 'password')

auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

result = feedparser.parse(
    'https://example.com/protected-feed.xml',
    handlers=[auth_handler]
)

# Digest authentication
digest_handler = urllib.request.HTTPDigestAuthHandler(password_mgr)

result = feedparser.parse(
    url,
    handlers=[digest_handler]
)
```
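
For basic authentication, feedparser also accepts credentials embedded directly in the URL, which avoids the handler boilerplate. Bear in mind that credentials in URLs can end up in logs:

```python
import feedparser

# user:password embedded in the URL triggers basic auth
result = feedparser.parse('https://username:password@example.com/protected-feed.xml')
```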

### Proxy Support

Configure proxy settings using urllib handlers:

```python
import urllib.request
import feedparser

# HTTP proxy
proxy_handler = urllib.request.ProxyHandler({
    'http': 'http://proxy.example.com:8080',
    'https': 'https://proxy.example.com:8080'
})

result = feedparser.parse(
    url,
    handlers=[proxy_handler]
)

# Authenticated proxy
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'proxy.example.com', 'username', 'password')

result = feedparser.parse(
    url,
    handlers=[proxy_handler, proxy_auth_handler]
)
```
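
When constructed without arguments, `urllib.request.ProxyHandler` picks up proxy settings from the standard environment variables (`http_proxy`, `https_proxy`), which can be more convenient than hard-coding addresses:

```python
import urllib.request
import feedparser

# Uses http_proxy / https_proxy environment variables, if set
env_proxy_handler = urllib.request.ProxyHandler()

result = feedparser.parse(
    url,
    handlers=[env_proxy_handler]
)
```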

### Custom URL Handlers

Extend feedparser with custom protocol handlers:

```python
import urllib.request
import feedparser

class CustomHTTPHandler(urllib.request.HTTPHandler):
    def http_open(self, req):
        # Custom HTTP handling logic
        print(f"Fetching: {req.get_full_url()}")
        return super().http_open(req)

custom_handler = CustomHTTPHandler()

result = feedparser.parse(
    url,
    handlers=[custom_handler]
)
```

### SSL/TLS Configuration

Configure SSL settings for HTTPS requests. Disabling verification, as shown here, exposes you to man-in-the-middle attacks, so reserve it for testing:

```python
import ssl
import urllib.request
import feedparser

# Create SSL context with custom settings (testing only)
ssl_context = ssl.create_default_context()
ssl_context.check_hostname = False       # Disable hostname verification
ssl_context.verify_mode = ssl.CERT_NONE  # Disable certificate verification

# Create HTTPS handler with custom context
https_handler = urllib.request.HTTPSHandler(context=ssl_context)

result = feedparser.parse(
    'https://example.com/feed.xml',
    handlers=[https_handler]
)
```
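
If the server uses a certificate signed by a private CA, a safer alternative to disabling verification is loading that CA bundle into the context. A minimal sketch; the bundle path and internal hostname are hypothetical:

```python
import ssl
import urllib.request
import feedparser

# Trust a private CA instead of disabling verification
ssl_context = ssl.create_default_context(cafile='/etc/ssl/private-ca.pem')
https_handler = urllib.request.HTTPSHandler(context=ssl_context)

result = feedparser.parse(
    'https://internal.example.com/feed.xml',
    handlers=[https_handler]
)
```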

### Redirect Handling

Feedparser automatically follows redirects and reports the final URL:

```python
result = feedparser.parse('https://example.com/redirect-to-feed')

# Check if redirects occurred
original_url = 'https://example.com/redirect-to-feed'
final_url = result.get('href', '')

if final_url and final_url != original_url:
    print(f"Redirected from {original_url} to {final_url}")

# A 301 status marks a permanent redirect
if result.get('status') == 301:
    print(f"Feed moved permanently to {final_url}")
```
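
A feed reader should react to permanent redirects by updating its stored subscription URL so future polls skip the redirect hop. A minimal sketch, with a hypothetical `subscriptions` dict standing in for your storage layer:

```python
import feedparser

subscriptions = {'example': 'https://example.com/old-feed.xml'}  # hypothetical store

url = subscriptions['example']
result = feedparser.parse(url)

if result.get('status') == 301 and result.get('href'):
    # Permanent redirect: remember the new location
    subscriptions['example'] = result['href']
```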

## Response Header Handling

### Accessing Response Headers

```python
result = feedparser.parse(url)

# Access all headers (keys are lowercase)
headers = result.headers
print(f"Content-Type: {headers.get('content-type')}")
print(f"Content-Length: {headers.get('content-length')}")
print(f"Server: {headers.get('server')}")

# Check for specific caching headers
if 'etag' in headers:
    print(f"ETag: {headers['etag']}")

if 'last-modified' in headers:
    print(f"Last-Modified: {headers['last-modified']}")

# Check content encoding
if 'content-encoding' in headers:
    print(f"Compression: {headers['content-encoding']}")
```

### Overriding Response Headers

Useful for testing, or when parsing content that was not fetched over HTTP:

```python
# Override/supplement response headers
result = feedparser.parse(
    content_string,
    response_headers={
        'content-type': 'application/rss+xml; charset=utf-8',
        'content-location': 'https://example.com/feed.xml',
        'last-modified': 'Mon, 06 Sep 2021 12:00:00 GMT',
        'etag': '"abc123"'
    }
)

# Headers affect base URI resolution and caching behavior
print(f"Base URI: {result.get('href')}")
```

## Error Handling

### HTTP Status Codes

```python
result = feedparser.parse(url)

# Check HTTP status
status = result.get('status', 0)

if status == 200:
    print("Feed fetched successfully")
elif status == 304:
    print("Feed not modified (cached version is current)")
elif status == 404:
    print("Feed not found")
elif status == 403:
    print("Access forbidden")
elif status >= 500:
    print(f"Server error: {status}")
elif status >= 400:
    print(f"Client error: {status}")
else:
    print(f"Unexpected status: {status}")

# Process feed data regardless of minor HTTP issues
if result.entries:
    print(f"Found {len(result.entries)} entries despite HTTP status {status}")
```
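
One status deserves special handling in a polite feed reader: 410 Gone means the feed has been permanently removed, so the client should stop polling it. A sketch, with `unsubscribe` as a hypothetical callback into your own scheduling code:

```python
import feedparser

result = feedparser.parse(url)

if result.get('status') == 410:
    # 410 Gone: the publisher removed the feed permanently
    unsubscribe(url)  # hypothetical: drop it from the polling schedule
```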

### Network Error Handling

```python
import urllib.error
import feedparser

try:
    result = feedparser.parse(url)

    # Check for network-related bozo exceptions
    if result.bozo and isinstance(result.bozo_exception, urllib.error.URLError):
        print(f"Network error: {result.bozo_exception}")

        # Specific error types (HTTPError is a subclass of URLError)
        if isinstance(result.bozo_exception, urllib.error.HTTPError):
            print(f"HTTP Error {result.bozo_exception.code}: {result.bozo_exception.reason}")
        else:
            print(f"URL Error: {result.bozo_exception.reason}")

    # Process any data that was retrieved
    if result.entries:
        print("Some data was retrieved despite errors")

except Exception as e:
    print(f"Unexpected error: {e}")
```

### Timeout Configuration

`feedparser.parse()` does not accept a per-request timeout, so set a global socket timeout before fetching:

```python
import socket
import feedparser

# Set global socket timeout (applies to all subsequent network I/O)
socket.setdefaulttimeout(30)  # 30 seconds

result = feedparser.parse(url)
```

## Content-Type Handling

Feedparser handles various content types gracefully:

```python
result = feedparser.parse(url)

# Check detected content type
content_type = result.headers.get('content-type', '')

if 'xml' in content_type.lower():
    print("XML content detected")
elif 'html' in content_type.lower():
    print("HTML content - may use loose parser")

# Check for non-XML content type exception
if result.bozo and isinstance(result.bozo_exception, feedparser.NonXMLContentType):
    print(f"Non-XML content type: {content_type}")
    # Feedparser will still attempt to parse
```

## Compression Support

Feedparser automatically handles compressed responses:

```python
# Automatic gzip/deflate decompression
result = feedparser.parse(url)

# Check if content was compressed
content_encoding = result.headers.get('content-encoding', '')
if content_encoding:
    print(f"Content was compressed with: {content_encoding}")

# Request specific compression (feedparser decompresses gzip and deflate)
result = feedparser.parse(
    url,
    request_headers={
        'Accept-Encoding': 'gzip, deflate'
    }
)
```

## Global Configuration Examples

```python
import feedparser

# Configure global defaults
feedparser.USER_AGENT = 'MyFeedAggregator/1.0 (+https://example.com)'
feedparser.RESOLVE_RELATIVE_URIS = 1  # Enable URI resolution
feedparser.SANITIZE_HTML = 1          # Enable HTML sanitization

# All subsequent parse() calls use these defaults
result1 = feedparser.parse(url1)
result2 = feedparser.parse(url2)

# Override global settings per-request
result3 = feedparser.parse(
    url3,
    agent='SpecialBot/2.0',  # Override global USER_AGENT
    sanitize_html=False      # Override global SANITIZE_HTML
)
```
```