Tessl Tile for pypi/feedparser@6.0.0

# Data Structures

Feedparser provides comprehensive data structures for accessing feed content with normalized field names across different feed formats. The main result structure contains feed metadata, entries, and parsing information.

## Capabilities

### FeedParserDict Class

Enhanced dictionary providing attribute-style access and backward compatibility with legacy field names.

```python { .api }
class FeedParserDict(dict):
    """
    Enhanced dictionary with attribute access and legacy key mapping.

    Provides backward compatibility by mapping old RSS field names to
    modern equivalents and supports both dict-style and attribute-style access.
    """

    def __getitem__(self, key):
        """
        Get item with legacy key mapping support.

        Special handling for:
        - 'category': Returns first tag term
        - 'enclosures': Returns links with rel='enclosure'
        - 'license': Returns first license link href
        - 'updated'/'updated_parsed': Falls back to published if not present

        Returns:
            Value for the key, with legacy key mapping applied
        """

    def __contains__(self, key):
        """Check if key exists, with legacy mapping support."""

    def get(self, key, default=None):
        """Get item with default, using legacy key mapping."""

    def __getattr__(self, key):
        """Enable attribute-style access (result.feed.title)."""

    def __setitem__(self, key, value):
        """Set item with legacy key mapping."""

    def setdefault(self, k, default):
        """Set default value if key doesn't exist."""
```

### Legacy Key Mapping

FeedParserDict automatically maps legacy RSS field names to modern equivalents. Where a legacy key maps to a list, lookup returns the value of the first listed key that is present:

```python
# Legacy key mappings (automatically handled)
keymap = {
    'channel': 'feed',
    'items': 'entries',
    'guid': 'id',
    'date': 'updated',
    'date_parsed': 'updated_parsed',
    'description': ['summary', 'subtitle'],
    'description_detail': ['summary_detail', 'subtitle_detail'],
    'url': ['href'],
    'modified': 'updated',
    'modified_parsed': 'updated_parsed',
    'issued': 'published',
    'issued_parsed': 'published_parsed',
    'copyright': 'rights',
    'copyright_detail': 'rights_detail',
    'tagline': 'subtitle',
    'tagline_detail': 'subtitle_detail',
}
```
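
To make the mapping behavior concrete, here is a minimal sketch of how key mapping and attribute access can be layered on a plain `dict`. `MiniFeedDict` is a hypothetical stand-in written for illustration, with only a two-entry subset of the keymap; it is not feedparser's actual implementation and omits the special cases such as `category` and `enclosures`.

```python
# Simplified stand-in for illustration; NOT feedparser's real implementation.
class MiniFeedDict(dict):
    # Small subset of the legacy mapping listed above.
    keymap = {
        'guid': 'id',
        'description': ['summary', 'subtitle'],
    }

    def __getitem__(self, key):
        target = self.keymap.get(key, key)
        candidates = target if isinstance(target, list) else [target]
        for candidate in candidates:
            # The first mapped key that is actually stored wins.
            if dict.__contains__(self, candidate):
                return dict.__getitem__(self, candidate)
        # Fall back to the literal key; raises KeyError if absent.
        return dict.__getitem__(self, key)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so real dict methods such as .items still take precedence.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


entry = MiniFeedDict(id='urn:example:1', subtitle='A tagline')
print(entry['guid'])          # value stored under 'id'
print(entry.description)      # 'summary' is absent, so 'subtitle' is used
print(callable(entry.items))  # the dict method, not a mapped key
```

Because `__getattr__` is only a fallback, attribute access cannot reach a key whose name collides with a `dict` method; bracket access is the safer choice for keys like `items`.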

## Top-Level Result Structure

The `parse()` function returns a FeedParserDict with these top-level properties:

### Parsing Information

```python { .api }
# Parsing status and metadata
result = {
    'bozo': bool,           # True if feed had parsing issues
    'bozo_exception': Exception,  # Exception object if errors occurred
    'encoding': str,        # Character encoding used (e.g., 'utf-8')
    'version': str,         # Feed format version (e.g., 'rss20', 'atom10')
    'namespaces': dict,     # XML namespaces used in feed
}
```
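
As a rough illustration of how these fields might be consumed, the sketch below summarizes the parsing status of a result. `describe_parse_status` is a hypothetical helper, and the sample dictionaries are hand-written stand-ins for a real `parse()` result; it assumes `bozo_exception` accompanies a truthy `bozo` flag.

```python
def describe_parse_status(result):
    """Summarize the parsing fields of a parse() result or similar mapping."""
    if result.get('bozo'):
        # Assumption: bozo_exception accompanies a truthy bozo flag.
        problem = result.get('bozo_exception')
        return f"malformed feed ({type(problem).__name__}: {problem})"
    version = result.get('version') or 'unknown format'
    encoding = result.get('encoding') or 'unknown encoding'
    return f"ok: {version}, {encoding}"


# Hand-written stand-ins for real parse() results.
sample_ok = {'bozo': False, 'version': 'rss20', 'encoding': 'utf-8'}
sample_bad = {'bozo': True, 'bozo_exception': ValueError('unclosed tag')}

print(describe_parse_status(sample_ok))
print(describe_parse_status(sample_bad))
```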

### HTTP Information

```python { .api }
# HTTP response data (when parsing from URL)
result = {
    'etag': str,           # HTTP ETag header
    'headers': dict,       # All HTTP response headers
    'href': str,           # Final URL after redirects
    'modified': str,       # HTTP Last-Modified header
    'status': int,         # HTTP status code
}
```
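
One common use of `etag` and `modified` is a conditional request on the next fetch, since `feedparser.parse()` accepts `etag` and `modified` keyword arguments. The sketch below shows one way to carry the values forward; `conditional_kwargs` and `is_unchanged` are hypothetical helper names, and `previous` is a hand-written stand-in for an earlier result.

```python
def conditional_kwargs(previous):
    """Build etag/modified keyword arguments from an earlier result."""
    kwargs = {}
    for field in ('etag', 'modified'):
        value = previous.get(field)
        if value:  # skip fields the server never sent
            kwargs[field] = value
    return kwargs


def is_unchanged(result):
    """True when the server answered 304 Not Modified."""
    return result.get('status') == 304


# Hand-written stand-in for the result of an earlier fetch.
previous = {
    'etag': '"abc123"',
    'modified': 'Sat, 07 Sep 2019 00:00:01 GMT',
    'status': 200,
}
print(conditional_kwargs(previous))

# Intended use (requires feedparser and network access):
#   result = feedparser.parse(url, **conditional_kwargs(previous))
#   if is_unchanged(result): reuse the cached entries
```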

### Feed Content

```python { .api }
# Feed content structure
result = {
    'feed': FeedParserDict,    # Feed-level metadata
    'entries': list,           # List of entry/item FeedParserDict objects
}
```

## Feed-Level Structure (result.feed)

Feed metadata contains comprehensive information about the feed itself:

### Identity and Basic Information

```python { .api }
feed = {
    'title': str,              # Feed title
    'title_detail': {          # Detailed title information
        'type': str,           # MIME content type ('text/plain', 'text/html', 'application/xhtml+xml')
        'language': str,       # Language code
        'base': str,           # Base URI
        'value': str,          # Title content
    },
    'link': str,               # Main feed/site URL
    'links': [                 # All feed links
        {
            'rel': str,        # Relationship ('alternate', 'self', etc.)
            'type': str,       # MIME type
            'href': str,       # URL
            'title': str,      # Link title (optional)
        }
    ],
    'id': str,                 # Unique feed identifier
    'description': str,        # Feed description (RSS)
    'subtitle': str,           # Feed subtitle (Atom)
    'subtitle_detail': dict,   # Detailed subtitle information
    'language': str,           # Feed language code
}
```

### Authorship and Publication

```python { .api }
feed = {
    'author': str,             # Primary author name
    'author_detail': {         # Detailed author information
        'name': str,           # Author name
        'email': str,          # Author email
        'href': str,           # Author URL
    },
    'contributors': [          # List of contributor objects
        {
            'name': str,
            'email': str,
            'href': str,
        }
    ],
    'publisher': str,          # Publisher name
    'publisher_detail': {      # Detailed publisher information
        'name': str,
        'email': str,
        'href': str,
    },
    'generator': str,          # Feed generator software
    'generator_detail': {      # Detailed generator information
        'name': str,
        'version': str,
        'href': str,
    },
}
```

### Dates and Updates

```python { .api }
feed = {
    'updated': str,            # Last updated timestamp (string)
    'updated_parsed': tuple,   # Parsed time as 9-tuple in GMT
    'published': str,          # Publication timestamp (string)
    'published_parsed': tuple, # Parsed publication time as 9-tuple
}
```
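
Since the `*_parsed` values are 9-tuples in GMT, they can be converted to timezone-aware `datetime` objects with the standard library. The following is a minimal sketch; `parsed_to_datetime` is a hypothetical helper name, and `sample` is a hand-built stand-in for a value such as `feed.updated_parsed`.

```python
import calendar
import time
from datetime import datetime, timezone


def parsed_to_datetime(parsed):
    """Convert a 9-tuple in GMT into an aware UTC datetime."""
    if parsed is None:
        return None
    # timegm() interprets the tuple as UTC, unlike time.mktime(),
    # which would apply the local timezone.
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)


# Hand-built stand-in for a value such as feed.updated_parsed.
sample = time.struct_time((2020, 9, 1, 12, 30, 0, 1, 245, 0))
print(parsed_to_datetime(sample).isoformat())
```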

### Rights and Legal

```python { .api }
feed = {
    'rights': str,             # Copyright/rights statement
    'rights_detail': {         # Detailed rights information
        'type': str,
        'language': str,
        'base': str,
        'value': str,
    },
}
```

### Visual Elements

```python { .api }
feed = {
    'image': {                 # Feed image/logo (RSS)
        'title': str,          # Image title
        'url': str,            # Image URL
        'link': str,           # Image link target
        'width': int,          # Image width
        'height': int,         # Image height
        'description': str,    # Image description
    },
    'icon': str,               # Feed icon URL (Atom)
    'logo': str,               # Feed logo URL (Atom)
}
```

### RSS-Specific Elements

```python { .api }
feed = {
    'ttl': str,                # Time-to-live (cache duration in minutes), returned as a string
    'cloud': {                 # RSS cloud notification
        'domain': str,
        'port': str,           # Port number, returned as a string
        'path': str,
        'registerprocedure': str,
        'protocol': str,
    },
    'textinput': {             # RSS text input box
        'title': str,
        'description': str,
        'name': str,
        'link': str,
    },
    'docs': str,               # Documentation URL
}
```

### Categories and Tags

```python { .api }
feed = {
    'tags': [                  # List of categories/tags
        {
            'term': str,       # Category term
            'scheme': str,     # Category scheme/domain
            'label': str,      # Human-readable label
        }
    ],
}
```

## Entry-Level Structure (result.entries[n])

Each entry/item in the feed contains detailed article information:

### Identity and Content

```python { .api }
entry = {
    'title': str,              # Entry title
    'title_detail': dict,      # Detailed title information
    'link': str,               # Main entry URL
    'links': list,             # All entry links
    'id': str,                 # Unique entry identifier
    'summary': str,            # Entry summary/description
    'summary_detail': dict,    # Detailed summary information
    'content': [               # Entry content blocks
        {
            'type': str,       # MIME content type ('text/plain', 'text/html', 'application/xhtml+xml')
            'language': str,   # Content language
            'base': str,       # Base URI
            'value': str,      # Content text
        }
    ],
}
```

### Authorship

```python { .api }
entry = {
    'author': str,             # Primary author name
    'author_detail': dict,     # Detailed author information
    'contributors': list,      # List of contributor objects
    'publisher': str,          # Publisher name
    'publisher_detail': dict,  # Detailed publisher information
}
```

### Dates

```python { .api }
entry = {
    'updated': str,            # Last updated timestamp
    'updated_parsed': tuple,   # Parsed updated time as 9-tuple
    'published': str,          # Publication timestamp
    'published_parsed': tuple, # Parsed publication time as 9-tuple
    'created': str,            # Creation timestamp (rare)
    'created_parsed': tuple,   # Parsed creation time as 9-tuple
    'expired': str,            # Expiration timestamp (rare)
    'expired_parsed': tuple,   # Parsed expiration time as 9-tuple
}
```

### Media and Attachments

```python { .api }
entry = {
    'enclosures': [            # Attached files (podcasts, etc.)
        {
            'href': str,       # File URL
            'type': str,       # MIME type
            'length': str,     # File size in bytes
        }
    ],
}
```
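
Because `length` is a string, and feeds may omit it or supply a non-numeric value, code that needs a byte count has to convert it defensively. A minimal sketch, where `enclosure_size` is a hypothetical helper and the sample enclosures are hand-written:

```python
def enclosure_size(enclosure):
    """Return the enclosure length as an int, or None if missing or invalid."""
    raw = enclosure.get('length')
    try:
        return int(raw)
    except (TypeError, ValueError):
        # TypeError: length is absent (None); ValueError: not a number.
        return None


# Hand-written stand-ins for items of entry.enclosures.
samples = [
    {'href': 'https://example.com/ep1.mp3', 'type': 'audio/mpeg', 'length': '12345'},
    {'href': 'https://example.com/ep2.mp3', 'type': 'audio/mpeg', 'length': ''},
    {'href': 'https://example.com/ep3.mp3', 'type': 'audio/mpeg'},
]
for enclosure in samples:
    print(enclosure['href'], enclosure_size(enclosure))
```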

### Categories and Classification

```python { .api }
entry = {
    'tags': [                  # Entry categories/tags
        {
            'term': str,       # Tag term
            'scheme': str,     # Tag scheme/domain
            'label': str,      # Human-readable label
        }
    ],
}
```

### Comments and Interaction

```python { .api }
entry = {
    'comments': str,           # Comments URL
    'license': str,            # Content license URL
}
```

### Source Attribution

```python { .api }
entry = {
    'source': {                # Original source information
        'title': str,          # Source feed title
        'href': str,           # Source feed URL
        'value': str,          # Source description
    },
}
```

## Usage Examples

### Basic Data Access

```python
import feedparser

url = "https://example.com/feed.xml"  # placeholder; substitute a real feed URL
result = feedparser.parse(url)

# Feed information
print(f"Feed: {result.feed.title}")
print(f"Description: {result.feed.description}")
print(f"Last updated: {result.feed.updated}")

# Entry information
for entry in result.entries:
    print(f"Title: {entry.title}")
    print(f"Link: {entry.link}")
    print(f"Published: {entry.published}")
    print(f"Summary: {entry.summary}")
```

### Attribute vs Dictionary Access

```python
# Both styles work identically
title1 = result.feed.title
title2 = result.feed['title']
title3 = result['feed']['title']

# All three methods return the same value
assert title1 == title2 == title3
```

### Legacy Key Compatibility

```python
# Legacy RSS keys automatically map to modern equivalents
description = result.feed.description  # RSS 'description'
subtitle = result.feed.subtitle        # Atom 'subtitle'
# Both may return the same content depending on feed format

# Legacy item access
# Use dict-style access here: result.items is the built-in dict.items method
items = result['items']  # Maps to result['entries']
entry = result.entries[0]
guid = entry.guid        # Maps to entry.id
```

### Content Type Handling

```python
# Check content types for proper rendering (types are MIME strings)
if entry.title_detail.type == 'text/html':
    # Contains HTML markup
    html_title = entry.title
elif entry.title_detail.type == 'text/plain':
    # Plain text only
    text_title = entry.title

# Handle multiple content blocks
for content_block in entry.content:
    if content_block.type == 'text/html':
        html_content = content_block.value
    elif content_block.type == 'text/plain':
        text_content = content_block.value
```
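
When an entry carries several content blocks, a small preference order can select one for display. The sketch below assumes each block's `type` holds a MIME type such as `text/html`; `pick_content` and `PREFERRED_TYPES` are hypothetical names, and the sample blocks are hand-written stand-ins for `entry.content`.

```python
# Assumed preference order: richest markup first, plain text last.
PREFERRED_TYPES = ('text/html', 'application/xhtml+xml', 'text/plain')


def pick_content(blocks):
    """Return the value of the most preferred content block, or None."""
    by_type = {}
    for block in blocks:
        # Keep the first block seen for each type.
        by_type.setdefault(block.get('type'), block.get('value'))
    for content_type in PREFERRED_TYPES:
        if content_type in by_type:
            return by_type[content_type]
    return None


# Hand-written stand-ins for entry.content.
blocks = [
    {'type': 'text/plain', 'value': 'Hello'},
    {'type': 'text/html', 'value': '<p>Hello</p>'},
]
print(pick_content(blocks))
```

Passing `entry.get('content', [])` instead of `entry.content` avoids an error for entries that have no content blocks.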

### Safe Content Access

```python
# Use .get() for optional fields
author = entry.get('author', 'Unknown')
published = entry.get('published', 'Date not available')

# Check for field existence
if 'enclosures' in entry:
    for enclosure in entry.enclosures:
        print(f"Attachment: {enclosure.href}")

# Handle missing nested fields
if hasattr(entry, 'author_detail') and entry.author_detail:
    email = entry.author_detail.get('email', 'No email')
```
