# HTTP Features

Feedparser provides comprehensive HTTP client capabilities for fetching feeds from URLs, including conditional requests, custom headers, authentication support, and redirect handling.

## Capabilities

### Global Configuration Constants

Configure default HTTP behavior for all parsing operations:

```python { .api }
USER_AGENT: str = "feedparser/{version} +https://github.com/kurtmckee/feedparser/"
# Default HTTP User-Agent header sent with requests

RESOLVE_RELATIVE_URIS: int = 1
# Global setting: resolve relative URIs to absolute (1=enabled, 0=disabled)

SANITIZE_HTML: int = 1
# Global setting: sanitize HTML content (1=enabled, 0=disabled)
```

### HTTP Response Information

When parsing from a URL, the result contains comprehensive HTTP response data:

```python { .api }
# HTTP response fields in the result
result = {
    'status': int,    # HTTP status code (200, 304, 404, etc.)
    'headers': dict,  # All HTTP response headers
    'etag': str,      # HTTP ETag header for caching
    'modified': str,  # HTTP Last-Modified header
    'href': str,      # Final URL after redirects
}
```
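
As a minimal illustration (the feed URL is a placeholder), these fields can be read either as dictionary keys or as attributes on the result:

```python
import feedparser

result = feedparser.parse('https://example.com/feed.xml')

# FeedParserDict supports both attribute and key access
print(result.status)               # e.g. 200
print(result.href)                 # final URL after any redirects
print(result.headers.get('etag'))  # None if the server sent no ETag
```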

## HTTP Client Features

### User-Agent Configuration

Set custom User-Agent strings for identification:

```python
import feedparser

# Set a global User-Agent for all requests
feedparser.USER_AGENT = 'MyFeedReader/1.0 (+https://example.com/bot.html)'

# Or specify one per request
result = feedparser.parse(
    url,
    agent='MyBot/2.0 (contact@example.com)',
)
```

### Custom Request Headers

Add custom HTTP headers to requests:

```python
# Add authorization and content-negotiation headers
result = feedparser.parse(
    url,
    request_headers={
        'Authorization': 'Bearer your-token-here',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate',
    },
)

# Override default headers
result = feedparser.parse(
    url,
    request_headers={
        'User-Agent': 'CustomBot/1.0',     # Overrides the agent parameter
        'Referer': 'https://example.com',  # Custom referer
    },
)
```

### Conditional Requests (Caching)

Use ETag and Last-Modified headers for efficient feed polling:

```python
# Initial request - save the caching headers
result = feedparser.parse('https://example.com/feed.xml')

# Store caching information
etag = result.get('etag')
modified = result.get('modified')

# Subsequent conditional request
result = feedparser.parse(
    'https://example.com/feed.xml',
    etag=etag,
    modified=modified,
)

# Check if the content was modified
if result.status == 304:
    print("Feed not modified - use cached version")
else:
    print(f"Feed updated - {len(result.entries)} entries")
```
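
Tying this together, here is a minimal polling-helper sketch; the `cache` dictionary is a hypothetical stand-in for whatever persistence layer you use:

```python
import feedparser

# Hypothetical in-memory cache: {url: {'etag': ..., 'modified': ...}}
cache = {}

def poll_feed(url):
    """Fetch a feed, reusing stored ETag/Last-Modified validators."""
    cached = cache.get(url, {})
    result = feedparser.parse(
        url,
        etag=cached.get('etag'),
        modified=cached.get('modified'),
    )
    if result.get('status') == 304:
        return None  # Unchanged since the last poll
    # Remember the new validators for the next request
    cache[url] = {
        'etag': result.get('etag'),
        'modified': result.get('modified'),
    }
    return result
```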

### HTTP Authentication

Feedparser supports various authentication methods through custom handlers:

```python
import urllib.request
import feedparser

# Basic authentication
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'https://example.com/', 'username', 'password')

auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

result = feedparser.parse(
    'https://example.com/protected-feed.xml',
    handlers=[auth_handler],
)

# Digest authentication
digest_handler = urllib.request.HTTPDigestAuthHandler(password_mgr)

result = feedparser.parse(
    url,
    handlers=[digest_handler],
)
```

### Proxy Support

Configure proxy settings using urllib handlers:

```python
import urllib.request
import feedparser

# HTTP proxy
proxy_handler = urllib.request.ProxyHandler({
    'http': 'http://proxy.example.com:8080',
    'https': 'https://proxy.example.com:8080',
})

result = feedparser.parse(
    url,
    handlers=[proxy_handler],
)

# Authenticated proxy
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'proxy.example.com', 'username', 'password')

result = feedparser.parse(
    url,
    handlers=[proxy_handler, proxy_auth_handler],
)
```

### Custom URL Handlers

Extend feedparser with custom protocol handlers:

```python
import urllib.request
import feedparser

class CustomHTTPHandler(urllib.request.HTTPHandler):
    def http_open(self, req):
        # Custom HTTP handling logic
        print(f"Fetching: {req.get_full_url()}")
        return super().http_open(req)

custom_handler = CustomHTTPHandler()

result = feedparser.parse(
    url,
    handlers=[custom_handler],
)
```

### SSL/TLS Configuration

Configure SSL settings for HTTPS requests:

```python
import ssl
import urllib.request
import feedparser

# Create an SSL context with custom settings
ssl_context = ssl.create_default_context()
ssl_context.check_hostname = False       # Disable hostname verification
ssl_context.verify_mode = ssl.CERT_NONE  # Disable certificate verification

# Create an HTTPS handler with the custom context
https_handler = urllib.request.HTTPSHandler(context=ssl_context)

result = feedparser.parse(
    'https://example.com/feed.xml',
    handlers=[https_handler],
)
```
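
Disabling verification as above exposes you to man-in-the-middle attacks and should be limited to testing. A safer pattern, sketched here under the assumption that you have a private CA bundle on disk (the `cafile` path is a placeholder), is to trust that CA instead:

```python
import ssl
import urllib.request
import feedparser

# Trust a private/internal CA rather than disabling verification;
# '/path/to/internal-ca.pem' is a placeholder for your CA bundle.
ssl_context = ssl.create_default_context(cafile='/path/to/internal-ca.pem')
https_handler = urllib.request.HTTPSHandler(context=ssl_context)

result = feedparser.parse(
    'https://intranet.example.com/feed.xml',
    handlers=[https_handler],
)
```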

### Redirect Handling

Feedparser automatically follows redirects and reports the final URL:

```python
result = feedparser.parse('https://example.com/redirect-to-feed')

# Check whether any redirects occurred
original_url = 'https://example.com/redirect-to-feed'
final_url = result.get('href', '')

if final_url and final_url != original_url:
    print(f"Redirected from {original_url} to {final_url}")

# Inspect the Location header, if the server sent one
if 'location' in result.headers:
    print(f"Redirect location: {result.headers['location']}")
```
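
When a feed moves permanently, feedparser reports status 301 and `href` holds the new address. A short sketch of updating a stored subscription URL (the `subscriptions` dict is hypothetical):

```python
import feedparser

# Hypothetical subscription store mapping feed IDs to URLs
subscriptions = {'example': 'https://example.com/old-feed.xml'}

result = feedparser.parse(subscriptions['example'])

# A 301 status means the feed moved permanently: update the
# stored URL so future polls go straight to the new location.
if result.get('status') == 301:
    subscriptions['example'] = result.href
```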

## Response Header Handling

### Accessing Response Headers

```python
result = feedparser.parse(url)

# Access all headers
headers = result.headers
print(f"Content-Type: {headers.get('content-type')}")
print(f"Content-Length: {headers.get('content-length')}")
print(f"Server: {headers.get('server')}")

# Check for specific caching headers
if 'etag' in headers:
    print(f"ETag: {headers['etag']}")

if 'last-modified' in headers:
    print(f"Last-Modified: {headers['last-modified']}")

# Check the content encoding
if 'content-encoding' in headers:
    print(f"Compression: {headers['content-encoding']}")
```

### Overriding Response Headers

This is useful for testing, or when parsing content that was not fetched over HTTP:

```python
# Override/supplement response headers
result = feedparser.parse(
    content_string,
    response_headers={
        'content-type': 'application/rss+xml; charset=utf-8',
        'content-location': 'https://example.com/feed.xml',
        'last-modified': 'Mon, 06 Sep 2021 12:00:00 GMT',
        'etag': '"abc123"',
    },
)

# Headers affect base URI resolution and caching behavior
print(f"Base URI: {result.href}")
```

## Error Handling

### HTTP Status Codes

```python
result = feedparser.parse(url)

# Check the HTTP status
status = result.get('status', 0)

if status == 200:
    print("Feed fetched successfully")
elif status == 304:
    print("Feed not modified (cached version is current)")
elif status == 404:
    print("Feed not found")
elif status == 403:
    print("Access forbidden")
elif status >= 500:
    print(f"Server error: {status}")
elif status >= 400:
    print(f"Client error: {status}")
else:
    print(f"Unexpected status: {status}")

# Process feed data regardless of minor HTTP issues
if result.entries:
    print(f"Found {len(result.entries)} entries despite HTTP status {status}")
```

### Network Error Handling

```python
import urllib.error
import feedparser

try:
    result = feedparser.parse(url)

    # Check for network-related bozo exceptions
    if result.bozo and isinstance(result.bozo_exception, urllib.error.URLError):
        print(f"Network error: {result.bozo_exception}")

        # HTTPError is a URLError subclass; distinguish the two
        if isinstance(result.bozo_exception, urllib.error.HTTPError):
            print(f"HTTP Error {result.bozo_exception.code}: {result.bozo_exception.reason}")
        else:
            print(f"URL Error: {result.bozo_exception.reason}")

    # Process any data that was retrieved
    if result.entries:
        print("Some data was retrieved despite errors")

except Exception as e:
    print(f"Unexpected error: {e}")
```

### Timeout Configuration

```python
import socket
import feedparser

# feedparser fetches URLs with urllib.request, which honors the
# global default socket timeout; parse() does not take a timeout
# argument, so set the timeout globally before parsing.
socket.setdefaulttimeout(30)  # 30 seconds

result = feedparser.parse(url)
```

## Content-Type Handling

Feedparser handles various content types gracefully:

```python
result = feedparser.parse(url)

# Check the detected content type
content_type = result.headers.get('content-type', '')

if 'xml' in content_type.lower():
    print("XML content detected")
elif 'html' in content_type.lower():
    print("HTML content - may use the loose parser")

# Check for a non-XML content type exception
if result.bozo and isinstance(result.bozo_exception, feedparser.NonXMLContentType):
    print(f"Non-XML content type: {content_type}")
    # Feedparser will still attempt to parse
```

## Compression Support

Feedparser automatically handles compressed responses:

```python
# Automatic gzip/deflate decompression
result = feedparser.parse(url)

# Check whether the content was served compressed
content_encoding = result.headers.get('content-encoding', '')
if content_encoding:
    print(f"Content was compressed with: {content_encoding}")

# Request only encodings feedparser can decode (gzip, deflate)
result = feedparser.parse(
    url,
    request_headers={
        'Accept-Encoding': 'gzip, deflate',
    },
)
```

## Global Configuration Examples

```python
import feedparser

# Configure global defaults
feedparser.USER_AGENT = 'MyFeedAggregator/1.0 (+https://example.com)'
feedparser.RESOLVE_RELATIVE_URIS = 1  # Enable URI resolution
feedparser.SANITIZE_HTML = 1          # Enable HTML sanitization

# All subsequent parse() calls use these defaults
result1 = feedparser.parse(url1)
result2 = feedparser.parse(url2)

# Override global settings per request
result3 = feedparser.parse(
    url3,
    agent='SpecialBot/2.0',  # Overrides the global USER_AGENT
    sanitize_html=False,     # Overrides the global SANITIZE_HTML
)
```