Universal feed parser for RSS, Atom, and CDF feeds with comprehensive format support and robust parsing capabilities
npx @tessl/cli install tessl/pypi-feedparser@6.0.00
# Feedparser

Feedparser is a Python library for parsing RSS, Atom, and CDF feeds. It handles RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0, and provides automatic encoding detection, HTML sanitization, and graceful handling of malformed feeds.

## Package Information

- **Package Name**: feedparser
- **Language**: Python
- **Installation**: `pip install feedparser`

## Core Imports

```python
import feedparser
```

## Basic Usage

```python
import feedparser

# Parse a feed from URL
result = feedparser.parse('https://example.com/feed.xml')

# Access feed metadata
print(result.feed.title)
print(result.feed.description)
print(result.feed.link)

# Access entries/items
for entry in result.entries:
    print(entry.title)
    print(entry.summary)
    print(entry.link)
    print(entry.published)

# Check for parsing errors
if result.bozo:
    print(f"Feed had parsing issues: {result.bozo_exception}")
```

## Architecture

Feedparser supports both strict XML parsing and lenient HTML-style parsing:

- **Dual Parser System**: Automatic selection between strict XML parsing and lenient HTML-style parsing, so feeds that are not well-formed XML can still be read
- **Format Detection**: Automatic detection of RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 formats
- **Enhanced Dictionary**: `FeedParserDict` provides attribute-style access and maps legacy key names for backward compatibility
- **Namespace Support**: Handles various XML namespaces (Dublin Core, iTunes, Media RSS, GeoRSS, etc.)
- **Character Encoding**: Automatic encoding detection with UTF-8 conversion and fallback handling

## Capabilities

### Core Parsing

Main feed parsing functionality with support for multiple input sources, HTTP features, and extensive configuration options.

```python { .api }
def parse(url_file_stream_or_string, etag=None, modified=None, agent=None, referrer=None, handlers=None, request_headers=None, response_headers=None, resolve_relative_uris=None, sanitize_html=None):
    """
    Parse a feed from URL, file, stream, or string.

    Args:
        url_file_stream_or_string: Feed source (URL, file path, file-like object, or string)
        etag (str, optional): HTTP ETag for conditional requests
        modified (str/datetime/tuple, optional): Last-Modified date for conditional requests
        agent (str, optional): HTTP User-Agent header
        referrer (str, optional): HTTP Referer header
        handlers (list, optional): Custom urllib handlers
        request_headers (dict, optional): Additional HTTP request headers
        response_headers (dict, optional): Override/supplement response headers
        resolve_relative_uris (bool, optional): Enable relative URI resolution
        sanitize_html (bool, optional): Enable HTML sanitization

    Returns:
        FeedParserDict: Parsed feed data with feed metadata and entries
    """
```

[Parsing](./parsing.md)

### Data Structures

Comprehensive feed data structures with normalized access to feed metadata, entries, and all feed elements across different formats.

```python { .api }
class FeedParserDict(dict):
    """Enhanced dictionary with attribute access and legacy key mapping."""

    def __getitem__(self, key): ...
    def __contains__(self, key): ...
    def get(self, key, default=None): ...
    def __getattr__(self, key): ...
```

[Data Structures](./data-structures.md)

### Date Handling

Date parsing system supporting multiple date formats with extensible custom date handler registration.

```python { .api }
def registerDateHandler(func):
    """
    Register a custom date handler function.

    Args:
        func: Function that takes a date string and returns a 9-tuple date in GMT
    """
```

[Date Handling](./date-handling.md)

### HTTP Features

HTTP client capabilities including conditional requests, authentication, custom headers, and redirect handling.

```python { .api }
# Configuration constants
USER_AGENT: str  # Default HTTP User-Agent header
RESOLVE_RELATIVE_URIS: int  # Global URI resolution setting
SANITIZE_HTML: int  # Global HTML sanitization setting

# Package metadata constants
__author__: str  # Package author information
__license__: str  # Package license type
__version__: str  # Package version string
```

[HTTP Features](./http-features.md)

### Error Handling

Exception system for parsing errors, encoding issues, and malformed content with graceful degradation.

```python { .api }
class ThingsNobodyCaresAboutButMe(Exception): ...
class CharacterEncodingOverride(ThingsNobodyCaresAboutButMe): ...
class CharacterEncodingUnknown(ThingsNobodyCaresAboutButMe): ...
class NonXMLContentType(ThingsNobodyCaresAboutButMe): ...
class UndeclaredNamespace(Exception): ...
```

[Error Handling](./error-handling.md)

## Types

```python { .api }
# Feed parsing result structure
FeedParserDict = {
    'bozo': bool,                 # True if feed had parsing issues
    'bozo_exception': Exception,  # Exception if parsing errors occurred (set only when bozo is True)
    'encoding': str,              # Character encoding used
    'etag': str,                  # HTTP ETag from response, if the server sent one
    'headers': dict,              # HTTP response headers
    'href': str,                  # Final URL after redirects
    'modified': str,              # HTTP Last-Modified header, if the server sent one
    'namespaces': dict,           # XML namespaces used
    'status': int,                # HTTP status code
    'version': str,               # Feed format version (e.g., 'rss20', 'atom10')
    'entries': list,              # List of entry/item dictionaries
    'feed': dict,                 # Feed-level metadata
}
```