# Data Structures

Feedparser provides comprehensive data structures for accessing feed content with normalized field names across different feed formats. The main result structure contains feed metadata, entries, and parsing information.

## Capabilities

### FeedParserDict Class

Enhanced dictionary providing attribute-style access and backward compatibility with legacy field names.

```python { .api }
class FeedParserDict(dict):
    """
    Enhanced dictionary with attribute access and legacy key mapping.

    Provides backward compatibility by mapping old RSS field names to
    modern equivalents and supports both dict-style and attribute-style access.
    """

    def __getitem__(self, key):
        """
        Get item with legacy key mapping support.

        Special handling for:
        - 'category': Returns first tag term
        - 'enclosures': Returns links with rel='enclosure'
        - 'license': Returns first license link href
        - 'updated'/'updated_parsed': Falls back to published if not present

        Returns:
            Value for the key, with legacy key mapping applied
        """

    def __contains__(self, key):
        """Check if key exists, with legacy mapping support."""

    def get(self, key, default=None):
        """Get item with default, using legacy key mapping."""

    def __getattr__(self, key):
        """Enable attribute-style access (result.feed.title)."""

    def __setitem__(self, key, value):
        """Set item with legacy key mapping."""

    def setdefault(self, k, default):
        """Set default value if key doesn't exist."""
```

### Legacy Key Mapping

FeedParserDict automatically maps legacy RSS field names to modern equivalents:

```python
# Legacy key mappings (automatically handled)
keymap = {
    'channel': 'feed',
    'items': 'entries',
    'guid': 'id',
    'date': 'updated',
    'date_parsed': 'updated_parsed',
    'description': ['summary', 'subtitle'],
    'description_detail': ['summary_detail', 'subtitle_detail'],
    'url': ['href'],
    'modified': 'updated',
    'modified_parsed': 'updated_parsed',
    'issued': 'published',
    'issued_parsed': 'published_parsed',
    'copyright': 'rights',
    'copyright_detail': 'rights_detail',
    'tagline': 'subtitle',
    'tagline_detail': 'subtitle_detail',
}
```
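
To make the mapping technique concrete, here is a minimal sketch of a dict subclass that resolves legacy keys on read and write. `LegacyKeyDict` is a hypothetical name used only for illustration, it covers a small subset of the table above, and it is not feedparser's actual implementation.

```python
class LegacyKeyDict(dict):
    """Illustrative stand-in for FeedParserDict (hypothetical, simplified)."""

    # Small subset of the mapping table, for demonstration only.
    keymap = {
        "guid": "id",
        "items": "entries",
        "description": ["summary", "subtitle"],
    }

    def __getitem__(self, key):
        # A key that is physically stored always wins.
        if dict.__contains__(self, key):
            return dict.__getitem__(self, key)
        target = self.keymap.get(key, key)
        if isinstance(target, list):
            # Try each modern candidate in order.
            for candidate in target:
                if dict.__contains__(self, candidate):
                    return dict.__getitem__(self, candidate)
            raise KeyError(key)
        return dict.__getitem__(self, target)

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __setitem__(self, key, value):
        # Writes through a legacy name land on the first modern name.
        target = self.keymap.get(key, key)
        if isinstance(target, list):
            target = target[0]
        dict.__setitem__(self, target, value)

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails, so dict
        # methods such as .items take precedence over mapped keys.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


entry = LegacyKeyDict()
entry["guid"] = "tag:example.org,2024:1"  # stored under 'id'
entry["subtitle"] = "A tagline"
print(entry.guid, entry["description"])
```

One consequence of this design is that attribute access cannot reach a key whose name collides with a dict method: in the sketch, `entry.items` is the built-in method, so a key named `items` has to be read with bracket syntax.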

## Top-Level Result Structure

The `parse()` function returns a FeedParserDict with these top-level properties:

### Parsing Information

```python { .api }
# Parsing status and metadata
result = {
    'bozo': bool,                 # True if feed had parsing issues
    'bozo_exception': Exception,  # Exception object (set only if errors occurred)
    'encoding': str,              # Character encoding used (e.g., 'utf-8')
    'version': str,               # Feed format version (e.g., 'rss20', 'atom10')
    'namespaces': dict,           # XML namespaces used in feed
}
```
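
As a sketch of how these fields might be checked together, the helper below turns a result mapping into a one-line status string. `parse_status` is a hypothetical name, and the code assumes that `bozo_exception` can be absent when `bozo` is false, so it reads every field with `.get()`.

```python
def parse_status(result):
    """Return a one-line status string for a parse()-style result mapping."""
    version = result.get("version") or "unknown format"
    encoding = result.get("encoding") or "unknown encoding"
    if result.get("bozo"):
        exc = result.get("bozo_exception")
        if exc is not None:
            reason = f"{type(exc).__name__}: {exc}"
        else:
            reason = "no exception recorded"
        return f"{version} ({encoding}), parsing issues: {reason}"
    return f"{version} ({encoding}), no parsing issues"


print(parse_status({"bozo": False, "version": "rss20", "encoding": "utf-8"}))
```

Because the helper only relies on `.get()`, it works with a plain dict in tests as well as with a real parse result.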

### HTTP Information

```python { .api }
# HTTP response data (when parsing from URL)
result = {
    'etag': str,      # HTTP ETag header
    'headers': dict,  # All HTTP response headers
    'href': str,      # Final URL after redirects
    'modified': str,  # HTTP Last-Modified header
    'status': int,    # HTTP status code
}
```
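
These fields are commonly used for conditional requests: `feedparser.parse()` accepts `etag` and `modified` keyword arguments, and a server that honors them answers with status 304 when nothing changed. The sketch below only builds those arguments from a previous result; `conditional_kwargs` and `is_unchanged` are hypothetical helper names, not part of feedparser.

```python
def conditional_kwargs(previous):
    """Build keyword arguments for a follow-up parse() call.

    `previous` is an earlier result mapping; missing or empty
    validators are simply left out.
    """
    kwargs = {}
    if previous.get("etag"):
        kwargs["etag"] = previous["etag"]
    if previous.get("modified"):
        kwargs["modified"] = previous["modified"]
    return kwargs


def is_unchanged(result):
    """True when the server reported 304 Not Modified."""
    return result.get("status") == 304


previous = {"etag": '"abc123"', "status": 200}
print(conditional_kwargs(previous))
# Intended use (requires network access, so not run here):
#   result = feedparser.parse(url, **conditional_kwargs(previous))
```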

### Feed Content

```python { .api }
# Feed content structure
result = {
    'feed': FeedParserDict,  # Feed-level metadata
    'entries': list,         # List of entry/item FeedParserDict objects
}
```

## Feed-Level Structure (result.feed)

Feed metadata contains comprehensive information about the feed itself:

### Identity and Basic Information

```python { .api }
feed = {
    'title': str,             # Feed title
    'title_detail': {         # Detailed title information
        'type': str,          # Content type ('text/plain', 'text/html', 'application/xhtml+xml')
        'language': str,      # Language code
        'base': str,          # Base URI
        'value': str,         # Title content
    },
    'link': str,              # Main feed/site URL
    'links': [                # All feed links
        {
            'rel': str,       # Relationship ('alternate', 'self', etc.)
            'type': str,      # MIME type
            'href': str,      # URL
            'title': str,     # Link title (optional)
        }
    ],
    'id': str,                # Unique feed identifier
    'description': str,       # Feed description (RSS)
    'subtitle': str,          # Feed subtitle (Atom)
    'subtitle_detail': dict,  # Detailed subtitle information
    'language': str,          # Feed language code
}
```
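
Since `links` is a list of dictionaries, picking one link by relationship usually means scanning it. A minimal sketch, where `find_link` is a hypothetical helper rather than a feedparser function:

```python
def find_link(links, rel, mime_type=None):
    """Return the href of the first link matching rel (and type, if given)."""
    for link in links:
        if link.get("rel") != rel:
            continue
        if mime_type is not None and link.get("type") != mime_type:
            continue
        return link.get("href")
    return None


links = [
    {"rel": "alternate", "type": "text/html", "href": "https://example.org/"},
    {"rel": "self", "type": "application/atom+xml", "href": "https://example.org/feed"},
]
print(find_link(links, "self"))
```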

### Authorship and Publication

```python { .api }
feed = {
    'author': str,          # Primary author name
    'author_detail': {      # Detailed author information
        'name': str,        # Author name
        'email': str,       # Author email
        'href': str,        # Author URL
    },
    'contributors': [       # List of contributor objects
        {
            'name': str,
            'email': str,
            'href': str,
        }
    ],
    'publisher': str,       # Publisher name
    'publisher_detail': {   # Detailed publisher information
        'name': str,
        'email': str,
        'href': str,
    },
    'generator': str,       # Feed generator software
    'generator_detail': {   # Detailed generator information
        'name': str,
        'version': str,
        'href': str,
    },
}
```
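
Any of the author fields may be missing in a given feed, so display code tends to fall back through them. The following sketch shows one possible fallback order; `format_author` is a hypothetical helper and the `"Unknown"` default is an arbitrary choice.

```python
def format_author(item):
    """Format an author line from a feed or entry mapping."""
    detail = item.get("author_detail") or {}
    name = detail.get("name") or item.get("author")
    email = detail.get("email")
    if name and email:
        return f"{name} <{email}>"
    return name or email or "Unknown"


print(format_author({"author_detail": {"name": "Ada", "email": "ada@example.org"}}))
```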

### Dates and Updates

```python { .api }
feed = {
    'updated': str,             # Last updated timestamp (string)
    'updated_parsed': tuple,    # Parsed time as a 9-tuple (time.struct_time) in UTC
    'published': str,           # Publication timestamp (string)
    'published_parsed': tuple,  # Parsed publication time as a 9-tuple in UTC
}
```
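
The `*_parsed` tuples are convenient to convert into timezone-aware `datetime` objects. A small sketch using only the standard library, assuming the tuple is expressed in UTC as described above; `to_datetime` is a hypothetical helper name.

```python
import calendar
from datetime import datetime, timezone


def to_datetime(parsed):
    """Convert a UTC 9-tuple (or time.struct_time) to an aware datetime."""
    if parsed is None:
        return None
    # calendar.timegm interprets the tuple as UTC, unlike time.mktime.
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)


example = (2024, 1, 15, 12, 30, 0, 0, 15, 0)
print(to_datetime(example).isoformat())
```

`calendar.timegm` is used instead of `time.mktime` because the latter would interpret the tuple in the local timezone.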

### Rights and Legal

```python { .api }
feed = {
    'rights': str,         # Copyright/rights statement
    'rights_detail': {     # Detailed rights information
        'type': str,
        'language': str,
        'base': str,
        'value': str,
    },
}
```

### Visual Elements

```python { .api }
feed = {
    'image': {               # Feed image/logo (RSS)
        'title': str,        # Image title
        'url': str,          # Image URL
        'link': str,         # Image link target
        'width': int,        # Image width
        'height': int,       # Image height
        'description': str,  # Image description
    },
    'icon': str,             # Feed icon URL (Atom)
    'logo': str,             # Feed logo URL (Atom)
}
```

### RSS-Specific Elements

```python { .api }
feed = {
    'ttl': int,                  # Time-to-live (cache duration in minutes)
    'cloud': {                   # RSS cloud notification
        'domain': str,
        'port': int,
        'path': str,
        'registerprocedure': str,
        'protocol': str,
    },
    'textinput': {               # RSS text input box
        'title': str,
        'description': str,
        'name': str,
        'link': str,
    },
    'docs': str,                 # Documentation URL
}
```
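
Because `ttl` is a cache duration in minutes, a polling client can turn it into a refresh interval. The sketch below is defensive about the value's type, since it may arrive as a string in practice; `poll_interval_seconds` and its 60-minute default are assumptions made for illustration.

```python
def poll_interval_seconds(feed, default_minutes=60):
    """Derive a polling interval in seconds from the feed's ttl, if usable."""
    raw = feed.get("ttl")
    try:
        minutes = int(raw)
    except (TypeError, ValueError):
        # Missing or non-numeric ttl: fall back to the default.
        minutes = default_minutes
    if minutes <= 0:
        minutes = default_minutes
    return minutes * 60


print(poll_interval_seconds({"ttl": "30"}))
```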

### Categories and Tags

```python { .api }
feed = {
    'tags': [              # List of categories/tags
        {
            'term': str,   # Category term
            'scheme': str, # Category scheme/domain
            'label': str,  # Human-readable label
        }
    ],
}
```

## Entry-Level Structure (result.entries[n])

Each entry/item in the feed contains detailed article information:

### Identity and Content

```python { .api }
entry = {
    'title': str,            # Entry title
    'title_detail': dict,    # Detailed title information
    'link': str,             # Main entry URL
    'links': list,           # All entry links
    'id': str,               # Unique entry identifier
    'summary': str,          # Entry summary/description
    'summary_detail': dict,  # Detailed summary information
    'content': [             # Entry content blocks
        {
            'type': str,      # Content type ('text/plain', 'text/html', 'application/xhtml+xml')
            'language': str,  # Content language
            'base': str,      # Base URI
            'value': str,     # Content text
        }
    ],
}
```
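
An entry may carry several content blocks, only a summary, or neither. The sketch below shows one way to pick a body to render, assuming that `type` holds a MIME type such as `text/html`; `best_body` and its preference order are hypothetical choices, not feedparser behavior.

```python
def best_body(entry):
    """Pick the richest available body text from an entry mapping."""
    blocks = entry.get("content") or []
    # Assumed preference order: HTML first, then XHTML, then plain text.
    for wanted in ("text/html", "application/xhtml+xml", "text/plain"):
        for block in blocks:
            if block.get("type") == wanted and block.get("value"):
                return block["value"]
    # No usable content block: fall back to the summary, if any.
    return entry.get("summary", "")


entry = {
    "summary": "Short teaser",
    "content": [
        {"type": "text/plain", "value": "Plain body"},
        {"type": "text/html", "value": "<p>HTML body</p>"},
    ],
}
print(best_body(entry))
```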

### Authorship

```python { .api }
entry = {
    'author': str,             # Primary author name
    'author_detail': dict,     # Detailed author information
    'contributors': list,      # List of contributor objects
    'publisher': str,          # Publisher name
    'publisher_detail': dict,  # Detailed publisher information
}
```

### Dates

```python { .api }
entry = {
    'updated': str,             # Last updated timestamp
    'updated_parsed': tuple,    # Parsed updated time as 9-tuple
    'published': str,           # Publication timestamp
    'published_parsed': tuple,  # Parsed publication time as 9-tuple
    'created': str,             # Creation timestamp (rare)
    'created_parsed': tuple,    # Parsed creation time as 9-tuple
    'expired': str,             # Expiration timestamp (rare)
    'expired_parsed': tuple,    # Parsed expiration time as 9-tuple
}
```

### Media and Attachments

```python { .api }
entry = {
    'enclosures': [         # Attached files (podcasts, etc.)
        {
            'href': str,    # File URL
            'type': str,    # MIME type
            'length': str,  # File size in bytes
        }
    ],
}
```
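
Note that `length` is a string, so it needs converting before any arithmetic, and feeds sometimes leave it empty or set it to zero. A defensive sketch, with `enclosure_size` as a hypothetical helper name:

```python
def enclosure_size(enclosure):
    """Return the enclosure size in bytes as an int, or None if unusable."""
    try:
        size = int(enclosure.get("length", ""))
    except (TypeError, ValueError):
        # Missing, empty, or non-numeric length.
        return None
    return size if size > 0 else None


print(enclosure_size({"href": "https://example.org/ep1.mp3", "length": "12345"}))
```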

### Categories and Classification

```python { .api }
entry = {
    'tags': [              # Entry categories/tags
        {
            'term': str,   # Tag term
            'scheme': str, # Tag scheme/domain
            'label': str,  # Human-readable label
        }
    ],
}
```
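
As an illustration of working with `tags` across a whole feed, the sketch below counts how many entries use each term. `tag_counts` is a hypothetical helper; it treats entries without tags as having none and counts a term at most once per entry.

```python
from collections import Counter


def tag_counts(entries):
    """Count how many entries carry each tag term."""
    counts = Counter()
    for entry in entries:
        # A set avoids counting a repeated term twice within one entry.
        terms = {tag.get("term") for tag in entry.get("tags", []) if tag.get("term")}
        counts.update(terms)
    return counts


entries = [
    {"tags": [{"term": "python"}, {"term": "rss"}]},
    {"tags": [{"term": "python"}, {"term": "python"}]},
    {},
]
print(tag_counts(entries).most_common(1))
```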

### Comments and Interaction

```python { .api }
entry = {
    'comments': str,  # Comments URL
    'license': str,   # Content license URL
}
```

### Source Attribution

```python { .api }
entry = {
    'source': {        # Original source information
        'title': str,  # Source feed title
        'href': str,   # Source feed URL
        'value': str,  # Source description
    },
}
```

## Usage Examples

### Basic Data Access

```python
import feedparser

url = "https://example.com/feed.xml"  # placeholder; substitute a real feed URL
result = feedparser.parse(url)

# Feed information
print(f"Feed: {result.feed.title}")
print(f"Description: {result.feed.description}")
print(f"Last updated: {result.feed.updated}")

# Entry information
for entry in result.entries:
    print(f"Title: {entry.title}")
    print(f"Link: {entry.link}")
    print(f"Published: {entry.published}")
    print(f"Summary: {entry.summary}")
```

### Attribute vs Dictionary Access

```python
# Both styles work identically
title1 = result.feed.title
title2 = result.feed['title']
title3 = result['feed']['title']

# All three methods return the same value
assert title1 == title2 == title3
```

### Legacy Key Compatibility

```python
# Legacy RSS keys automatically map to modern equivalents
description = result.feed.description  # RSS 'description'
subtitle = result.feed.subtitle        # Atom 'subtitle'
# Both may return the same content depending on feed format

# Legacy item access
# Use bracket syntax here: result.items is the built-in dict method,
# so attribute access never reaches the legacy key mapping.
items = result['items']  # Maps to result['entries']
guid = entry.guid        # Maps to entry.id
```

### Content Type Handling

```python
# Check content types for proper rendering
if entry.title_detail.type == 'text/html':
    # Contains HTML markup
    html_title = entry.title
elif entry.title_detail.type == 'text/plain':
    # Plain text only
    text_title = entry.title

# Handle multiple content blocks
for content_block in entry.content:
    if content_block.type == 'text/html':
        html_content = content_block.value
    elif content_block.type == 'text/plain':
        text_content = content_block.value
```

### Safe Content Access

```python
# Use .get() for optional fields
author = entry.get('author', 'Unknown')
published = entry.get('published', 'Date not available')

# Check for field existence
if 'enclosures' in entry:
    for enclosure in entry.enclosures:
        print(f"Attachment: {enclosure.href}")

# Handle missing nested fields
if hasattr(entry, 'author_detail') and entry.author_detail:
    email = entry.author_detail.get('email', 'No email')
```