# Date Handling

Feedparser parses the date formats commonly found in RSS and Atom feeds. It includes built-in handlers for these formats and allows registration of custom date parsers.

## Capabilities

### Custom Date Handler Registration

Register custom date parsing functions to handle non-standard date formats.

```python { .api }
def registerDateHandler(func):
    """
    Register a date handler function.

    Date handlers are tried in reverse registration order (most recently
    registered first) until one successfully parses the date string.

    Args:
        func: Function that takes a date string and returns a 9-tuple date
              in GMT, or None if unable to parse. Should handle exceptions
              internally and return None rather than raising.

    Example:
        import time

        def my_date_handler(date_string):
            try:
                # Custom parsing logic here
                return time.strptime(date_string, '%Y-%m-%d %H:%M:%S')
            except ValueError:
                return None

        feedparser.registerDateHandler(my_date_handler)
    """
```

## Built-in Date Formats

Feedparser includes built-in support for numerous date formats commonly found in feeds:

### RFC 822 Format

Standard email/RSS date format:
```
Mon, 06 Sep 2021 12:00:00 GMT
Mon, 06 Sep 2021 12:00:00 +0000
06 Sep 2021 12:00:00 EST
```

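RFC 822 dates like the ones above can also be parsed with the Python standard library, which is useful when writing or testing a custom handler. The sketch below is not feedparser's own implementation: `rfc822_to_utc_tuple` is a hypothetical helper name, and treating values without a zone as UTC is an assumption.

```python
from email.utils import parsedate_to_datetime


def rfc822_to_utc_tuple(date_string):
    """Return a 9-tuple in UTC for an RFC 822 date string, or None."""
    try:
        dt = parsedate_to_datetime(date_string)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError
        return None
    # utctimetuple() converts aware datetimes to UTC and sets the DST flag
    # to 0; naive datetimes are passed through unchanged (assumed to be UTC)
    return dt.utctimetuple()


for sample in (
    "Mon, 06 Sep 2021 12:00:00 GMT",
    "Mon, 06 Sep 2021 12:00:00 +0000",
    "06 Sep 2021 12:00:00 EST",
):
    print(sample, "->", tuple(rfc822_to_utc_tuple(sample))[:6])
```

The `EST` sample is shifted by five hours, so its UTC tuple reads 17:00 rather than 12:00.
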
### ISO 8601 / W3C DateTime Format

Standard Atom and modern date format:
```
2021-09-06T12:00:00Z
2021-09-06T12:00:00+00:00
2021-09-06T12:00:00.123Z
2021-09-06T12:00:00
2021-09-06
```

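As an illustration of what normalizing these strings to a GMT 9-tuple involves, here is a minimal standard-library sketch. It is not how feedparser parses these dates internally: `w3dtf_to_utc_tuple` is a hypothetical name, and values without an offset are assumed to be UTC.

```python
from datetime import datetime


def w3dtf_to_utc_tuple(date_string):
    """Return a 9-tuple in UTC for an ISO 8601 / W3C date string, or None."""
    text = date_string.strip()
    # Older Python versions do not accept a trailing "Z" in fromisoformat()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        dt = datetime.fromisoformat(text)
    except ValueError:
        return None
    # Aware values are converted to UTC; naive values are assumed to be UTC
    return dt.utctimetuple()


print(tuple(w3dtf_to_utc_tuple("2021-09-06T12:00:00+02:00"))[:6])
# (2021, 9, 6, 10, 0, 0)
```
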
### Unix asctime() Format

Unix/C library date format:
```
Mon Sep 6 12:00:00 2021
```

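An asctime() string carries no time zone, so any parser has to pick one. The sketch below uses only the standard library and assumes the value is already in UTC; `asctime_to_utc_tuple` is a hypothetical helper, not part of feedparser. The `%a` and `%b` directives depend on the current locale, so this assumes English day and month names.

```python
import calendar
import time


def asctime_to_utc_tuple(date_string):
    """Return a 9-tuple for an asctime()-style string, or None."""
    try:
        parsed = time.strptime(date_string, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    # Round-trip through a timestamp so weekday, yearday, and the DST flag
    # are recomputed consistently for UTC
    return time.gmtime(calendar.timegm(parsed))


print(tuple(asctime_to_utc_tuple("Mon Sep 6 12:00:00 2021"))[:6])
# (2021, 9, 6, 12, 0, 0)
```
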
### Localized Date Formats

Support for various localized date formats:

**Korean Formats**:
- OnBlog format
- Nate portal format

**European Formats**:
- Greek date formats
- Hungarian date formats

### Perforce Format

Date format produced by the Perforce version control system, which appears in some feeds.

## Date Parsing Examples

### Basic Date Access

```python
import calendar
import datetime

import feedparser

result = feedparser.parse(url)  # url: address or path of the feed to parse

# Access parsed dates as tuples; .get() returns None when the field is absent
updated_tuple = result.feed.get('updated_parsed')
if updated_tuple:
    # updated_tuple is a 9-tuple in GMT:
    # (year, month, day, hour, minute, second, weekday, yearday, dst)

    # Convert to a datetime object. calendar.timegm() reads the tuple as
    # UTC; time.mktime() would misread it as local time.
    timestamp = calendar.timegm(updated_tuple)
    dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    print(f"Feed updated: {dt.isoformat()}")

# Entry dates
for entry in result.entries:
    published_tuple = entry.get('published_parsed')
    if published_tuple:
        pub_time = calendar.timegm(published_tuple)
        dt = datetime.datetime.fromtimestamp(pub_time, tz=datetime.timezone.utc)
        print(f"Published: {dt.strftime('%Y-%m-%d %H:%M:%S UTC')}")
```

### Custom Date Handler Example

```python
import datetime
import re

import feedparser

def parse_custom_date(date_string):
    """
    Parse a custom date format: "DD/MM/YYYY HH:MM"
    """
    if not date_string:
        return None

    # Match DD/MM/YYYY HH:MM format
    match = re.match(r'(\d{2})/(\d{2})/(\d{4}) (\d{2}):(\d{2})', date_string)
    if not match:
        return None

    try:
        day, month, year, hour, minute = map(int, match.groups())
        # datetime() rejects out-of-range fields (for example month 13);
        # utctimetuple() fills in weekday and yearday and sets dst to 0.
        # The value is assumed to already be in GMT.
        dt = datetime.datetime(year, month, day, hour, minute)
        return dt.utctimetuple()
    except (ValueError, OverflowError):
        return None

# Register the custom handler
feedparser.registerDateHandler(parse_custom_date)

# Now feeds with "DD/MM/YYYY HH:MM" dates will be parsed correctly
# (feed_with_custom_dates stands for the URL, path, or content of such a feed)
result = feedparser.parse(feed_with_custom_dates)
```

### Advanced Date Handler

```python
import dateutil.parser
import dateutil.tz
import feedparser

def parse_flexible_date(date_string):
    """
    Use dateutil for flexible date parsing.
    """
    if not date_string:
        return None

    try:
        # dateutil can parse many formats
        dt = dateutil.parser.parse(date_string)

        # Convert to GMT if timezone-aware; naive values are assumed to be GMT
        if dt.tzinfo:
            dt = dt.astimezone(dateutil.tz.UTC)

        # Return as 9-tuple; utctimetuple() always sets the DST flag to 0
        return dt.utctimetuple()
    except (ValueError, TypeError, OverflowError):
        return None

# The most recently registered handler is tried first (LIFO order), so this
# handler runs before the built-in handlers, not as a last-resort fallback
feedparser.registerDateHandler(parse_flexible_date)
```

### Using Parsed Dates

```python
import time

import feedparser

# Date parsing is handled automatically by feedparser.parse()
# You don't need to call date parsing functions directly

result = feedparser.parse("https://example.com/feed.xml")
for entry in result.entries:
    published = entry.get('published_parsed')
    if published:
        readable = time.strftime('%Y-%m-%d %H:%M:%S UTC', published)
        print(f"Published: {readable}")
    else:
        # Either the entry has no publication date or no handler could parse it
        print("No parsed publication date available")
```

## Date Handler Registration Order

Date handlers are tried in **reverse registration order** (LIFO - Last In, First Out):

```python
import feedparser

def handler1(date): return None  # Register first
def handler2(date): return None  # Register second
def handler3(date): return None  # Register third

feedparser.registerDateHandler(handler1)
feedparser.registerDateHandler(handler2)
feedparser.registerDateHandler(handler3)

# When parsing, handlers are tried in this order:
# 1. handler3 (most recently registered)
# 2. handler2
# 3. handler1 (least recently registered)
# 4. Built-in handlers (in their predefined order)
```

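The LIFO behavior can be reproduced with a small stand-alone model, which makes the call order observable without a real feed. Everything in this sketch (`_handlers`, `register`, `parse_date`, `make_handler`) is hypothetical and only mimics the behavior described above; it is not feedparser's code.

```python
_handlers = []  # hypothetical registry, newest handler at index 0


def register(func):
    """Insert at the front so the most recent handler is tried first."""
    _handlers.insert(0, func)


def parse_date(date_string):
    """Return the first 9-tuple produced by a handler, or None."""
    for handler in _handlers:
        try:
            result = handler(date_string)
        except (KeyError, OverflowError, ValueError, AttributeError):
            continue  # a failing handler is skipped, not fatal
        if result and len(result) == 9:
            return result
    return None


calls = []  # records the order in which handlers are invoked


def make_handler(name, result=None):
    def handler(date_string):
        calls.append(name)
        return result
    return handler


register(make_handler("handler1", (2021, 9, 6, 12, 0, 0, 0, 249, 0)))
register(make_handler("handler2"))
register(make_handler("handler3"))

parsed = parse_date("any string")
print(calls)  # ['handler3', 'handler2', 'handler1']
```

Because `handler2` and `handler3` return `None`, the loop falls through to `handler1`, whose tuple becomes the result.
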
## Built-in Handler Order

Built-in date handlers are tried in this order (and are therefore registered in the reverse order):

1. W3C Date and Time Format (_parse_date_w3dtf)
2. RFC 822 format (_parse_date_rfc822)
3. ISO 8601 format (_parse_date_iso8601)
4. Unix asctime format (_parse_date_asctime)
5. Perforce format (_parse_date_perforce)
6. Hungarian format (_parse_date_hungarian)
7. Greek format (_parse_date_greek)
8. Korean Nate format (_parse_date_nate)
9. Korean OnBlog format (_parse_date_onblog)

So the W3C format is tried first and the OnBlog format is tried last.

## Common Date Fields

Both feed-level and entry-level objects may contain these date fields:

### Feed-Level Dates

```python
feed = result.feed

# Publication dates
feed.published         # Publication date string
feed.published_parsed  # Parsed publication date tuple

# Update dates
feed.updated           # Last updated date string
feed.updated_parsed    # Parsed last updated date tuple
```

### Entry-Level Dates

```python
for entry in result.entries:
    # Publication dates
    entry.published         # Publication date string
    entry.published_parsed  # Parsed publication date tuple

    # Update dates
    entry.updated           # Last updated date string
    entry.updated_parsed    # Parsed last updated date tuple

    # Creation dates (rare)
    entry.created           # Creation date string
    entry.created_parsed    # Parsed creation date tuple

    # Expiration dates (rare)
    entry.expired           # Expiration date string
    entry.expired_parsed    # Parsed expiration date tuple
```

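Since any of these fields may be missing, a common pattern is to take the first parsed date that is available. The helper below is a sketch: `best_date` and the preference order in `DATE_FIELDS` are assumptions made for illustration, and plain dictionaries stand in for feedparser entries.

```python
# Assumed preference order; adjust to the needs of the application
DATE_FIELDS = ("published_parsed", "updated_parsed", "created_parsed")


def best_date(entry):
    """Return the first available parsed date tuple, or None."""
    for field in DATE_FIELDS:
        value = entry.get(field)
        if value:
            return value
    return None


# Plain dicts stand in for entries here
only_updated = {"updated_parsed": (2021, 9, 6, 12, 0, 0, 0, 249, 0)}
print(best_date(only_updated)[:3])  # (2021, 9, 6)
print(best_date({}))                # None
```
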
## Error Handling

Date parsing is designed to be fault-tolerant: a date that no handler can parse does not raise an exception, it just leaves no usable parsed value.

```python
result = feedparser.parse(url)

# Always check if dates were successfully parsed
for entry in result.entries:
    # .get() avoids an AttributeError when the field is missing entirely
    if entry.get('published_parsed'):
        # Date was successfully parsed
        print(f"Published: {entry.published}")
    else:
        # Date parsing failed or no date present
        print("No valid publication date found")
        if hasattr(entry, 'published'):
            print(f"Raw date string: {entry.published}")
```

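The same check matters when ordering entries, because an entry without a parsed date cannot be compared by time. One possible approach, sketched here with plain dictionaries in place of real entries, is to sort dated entries newest first and keep undated ones at the end. The names `newest_first_key` and `sort_entries` are hypothetical.

```python
import calendar


def newest_first_key(entry):
    """Sort key: dated entries first (newest to oldest), undated last."""
    parsed = entry.get("published_parsed")
    if parsed:
        return (0, -calendar.timegm(parsed))
    return (1, 0)


def sort_entries(entries):
    return sorted(entries, key=newest_first_key)


sample = [
    {"title": "undated"},
    {"title": "old", "published_parsed": (2020, 1, 1, 0, 0, 0, 2, 1, 0)},
    {"title": "new", "published_parsed": (2021, 9, 6, 12, 0, 0, 0, 249, 0)},
]
print([e["title"] for e in sort_entries(sample)])
# ['new', 'old', 'undated']
```
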
## Time Zone Handling

All parsed dates are normalized to GMT (UTC):

```python
import calendar
import time

# All *_parsed dates are in GMT regardless of original timezone
gmt_tuple = entry.get('published_parsed')
if gmt_tuple:
    # Convert to local time if needed. calendar.timegm() reads the tuple as
    # UTC; time.mktime() would misread it as local time.
    timestamp = calendar.timegm(gmt_tuple)
    local_time = time.localtime(timestamp)

    print(f"GMT: {time.strftime('%Y-%m-%d %H:%M:%S', gmt_tuple)}")
    print(f"Local: {time.strftime('%Y-%m-%d %H:%M:%S', local_time)}")
```

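When a specific target zone is needed instead of the machine's local zone, a timezone-aware `datetime` is usually easier to work with. This sketch uses only the standard library; `to_aware_utc` is a hypothetical helper, and the fixed UTC-5 offset is just an example value.

```python
import calendar
from datetime import datetime, timedelta, timezone


def to_aware_utc(parsed):
    """Turn a GMT 9-tuple into a timezone-aware datetime in UTC."""
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)


gmt_tuple = (2021, 9, 6, 12, 0, 0, 0, 249, 0)
utc_dt = to_aware_utc(gmt_tuple)
est = timezone(timedelta(hours=-5), "EST")  # example fixed offset

print(utc_dt.isoformat())                  # 2021-09-06T12:00:00+00:00
print(utc_dt.astimezone(est).isoformat())  # 2021-09-06T07:00:00-05:00
```
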
## Legacy Date Compatibility

FeedParserDict provides backward compatibility for legacy date field names:

```python
# These all refer to the same data:
entry.updated          # Modern name
entry.modified         # Legacy RSS name
entry.date             # Very old legacy name

entry.updated_parsed   # Modern name
entry.modified_parsed  # Legacy RSS name
entry.date_parsed      # Very old legacy name
```
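
To show how this kind of aliasing can work, the sketch below maps legacy key names onto modern ones in a small `dict` subclass. `AliasDict` and its `keymap` are hypothetical and cover only the names listed above; this is an illustration of the idea, not FeedParserDict's actual implementation.

```python
class AliasDict(dict):
    """Hypothetical dict that resolves legacy key names to modern ones."""

    keymap = {
        "modified": "updated",
        "date": "updated",
        "modified_parsed": "updated_parsed",
        "date_parsed": "updated_parsed",
    }

    def __getitem__(self, key):
        # Translate a legacy name before the normal lookup
        return super().__getitem__(self.keymap.get(key, key))

    def __contains__(self, key):
        return super().__contains__(self.keymap.get(key, key))

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __getattr__(self, key):
        # Attribute access falls back to key lookup
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


entry = AliasDict(updated="Mon, 06 Sep 2021 12:00:00 GMT")
print(entry.modified == entry.updated == entry["date"])  # True
```
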