# Date Handling

Feedparser provides comprehensive date parsing capabilities supporting multiple date formats commonly found in RSS and Atom feeds. The system includes built-in handlers for various date formats and allows registration of custom date parsers.

## Capabilities

### Custom Date Handler Registration

Register custom date parsing functions to handle non-standard date formats.

```python { .api }
def registerDateHandler(func):
    """
    Register a date handler function.

    Date handlers are tried in reverse registration order (most recently
    registered first) until one successfully parses the date string.

    Args:
        func: Function that takes a date string and returns a 9-tuple date
            in GMT, or None if unable to parse. Should handle exceptions
            internally and return None rather than raising.

    Example:
        def my_date_handler(date_string):
            try:
                # Custom parsing logic here
                return time.strptime(date_string, '%Y-%m-%d %H:%M:%S')
            except ValueError:
                return None

        feedparser.registerDateHandler(my_date_handler)
    """
```

## Built-in Date Formats

Feedparser includes built-in support for numerous date formats commonly found in feeds:

### RFC 822 Format

Standard email/RSS date format:

```
Mon, 06 Sep 2021 12:00:00 GMT
Mon, 06 Sep 2021 12:00:00 +0000
06 Sep 2021 12:00:00 EST
```
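
To see how such strings map onto the 9-tuple form, here is a minimal sketch that uses the standard library's `email.utils.parsedate_to_datetime` instead of feedparser's own RFC 822 handler. The helper name `rfc822_to_gmt_tuple` is made up for illustration and is not part of feedparser.

```python
from email.utils import parsedate_to_datetime

def rfc822_to_gmt_tuple(date_string):
    """Hypothetical helper: RFC 822 string -> 9-tuple in GMT, or None."""
    try:
        dt = parsedate_to_datetime(date_string)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError
        return None
    # utctimetuple() shifts timezone-aware values to GMT
    return dt.utctimetuple()

gmt = rfc822_to_gmt_tuple("Mon, 06 Sep 2021 12:00:00 GMT")
est = rfc822_to_gmt_tuple("06 Sep 2021 12:00:00 EST")
print(gmt[:6])  # noon GMT stays at hour 12
print(est[:6])  # noon EST is shifted to hour 17 in GMT
```

The `EST` case shows why normalization matters: the same wall-clock time yields a different GMT tuple once the zone offset is applied.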

### ISO 8601 / W3C DateTime Format

Standard Atom and modern date format:

```
2021-09-06T12:00:00Z
2021-09-06T12:00:00+00:00
2021-09-06T12:00:00.123Z
2021-09-06T12:00:00
2021-09-06
```
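
The variants above can be explored with the standard library alone. The sketch below relies on `datetime.fromisoformat` and a hypothetical helper named `iso8601_to_gmt_tuple`; it illustrates the format and is not how feedparser's built-in handler is implemented.

```python
from datetime import datetime

def iso8601_to_gmt_tuple(date_string):
    """Hypothetical helper: ISO 8601 string -> 9-tuple in GMT, or None."""
    text = date_string.strip()
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so spell it as an explicit offset first
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        dt = datetime.fromisoformat(text)
    except ValueError:
        return None
    # Aware values are shifted to GMT; naive values are returned as-is
    return dt.utctimetuple()

with_zulu = iso8601_to_gmt_tuple("2021-09-06T12:00:00Z")
with_offset = iso8601_to_gmt_tuple("2021-09-06T14:00:00+02:00")
date_only = iso8601_to_gmt_tuple("2021-09-06")
```

A date-only string carries no time of day, so this sketch reports it as midnight.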

### Unix asctime() Format

Unix/C library date format:

```
Mon Sep 6 12:00:00 2021
```
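
An asctime() string has no timezone field, so a parser can only take the value as written. As a rough sketch (using `time.strptime` under the default C locale, with a hypothetical helper name, not feedparser's internal handler):

```python
import time

def asctime_to_tuple(date_string):
    """Hypothetical helper: asctime() string -> 9-tuple, or None."""
    try:
        # %a and %b match English day and month names in the C locale
        return time.strptime(date_string, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None

parsed = asctime_to_tuple("Mon Sep 6 12:00:00 2021")
```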

### Localized Date Formats

Support for various localized date formats:

**Korean Formats**:

- OnBlog format
- Nate portal format

**European Formats**:

- Greek date formats
- Hungarian date formats

### Perforce Format

Version control system date format used by some feeds.

## Date Parsing Examples

### Basic Date Access

```python
import calendar
import datetime

import feedparser

url = "https://example.com/feed.xml"  # placeholder feed URL
result = feedparser.parse(url)

# Access parsed dates as tuples; .get() returns None when the field is absent
updated_tuple = result.feed.get("updated_parsed")
# updated_tuple is a 9-tuple in GMT:
# (year, month, day, hour, minute, second, weekday, yearday, dst)

# Convert to datetime objects
if updated_tuple:
    # calendar.timegm() reads the tuple as GMT; time.mktime() would
    # wrongly interpret it as local time
    timestamp = calendar.timegm(updated_tuple)
    dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    print(f"Feed updated: {dt.isoformat()}")

# Entry dates
for entry in result.entries:
    published = entry.get("published_parsed")
    if published:
        pub_time = calendar.timegm(published)
        dt = datetime.datetime.fromtimestamp(pub_time, tz=datetime.timezone.utc)
        print(f"Published: {dt.strftime('%Y-%m-%d %H:%M:%S UTC')}")
```

### Custom Date Handler Example

```python
import datetime
import re

import feedparser

def parse_custom_date(date_string):
    """
    Parse a custom date format: "DD/MM/YYYY HH:MM"
    """
    if not date_string:
        return None

    # Match DD/MM/YYYY HH:MM format
    match = re.match(r'(\d{2})/(\d{2})/(\d{4}) (\d{2}):(\d{2})', date_string)
    if not match:
        return None

    try:
        day, month, year, hour, minute = map(int, match.groups())
        # datetime() rejects impossible values such as month 13 and fills in
        # weekday and yearday; the input is assumed to already be in GMT
        return datetime.datetime(year, month, day, hour, minute).utctimetuple()
    except (ValueError, OverflowError):
        return None

# Register the custom handler
feedparser.registerDateHandler(parse_custom_date)

# Now feeds with "DD/MM/YYYY HH:MM" dates will be parsed correctly
feed_with_custom_dates = "https://example.com/feed.xml"  # placeholder feed URL
result = feedparser.parse(feed_with_custom_dates)
```

### Advanced Date Handler

```python
import dateutil.parser
import feedparser

def parse_flexible_date(date_string):
    """
    Use dateutil for flexible date parsing.
    """
    if not date_string:
        return None

    try:
        # dateutil can parse many formats
        dt = dateutil.parser.parse(date_string)

        # utctimetuple() converts timezone-aware values to GMT and
        # returns naive values unchanged, as a 9-tuple
        return dt.utctimetuple()
    except (ValueError, TypeError, OverflowError):
        return None

# Handlers are tried most-recent-first (LIFO), so this one runs before the
# built-in handlers rather than as a last-resort fallback
feedparser.registerDateHandler(parse_flexible_date)
```

### Using Parsed Dates

```python
import time

import feedparser

# Date parsing is handled automatically by feedparser.parse()
# You don't need to call date parsing functions directly

result = feedparser.parse("https://example.com/feed.xml")
for entry in result.entries:
    published = entry.get('published_parsed')
    if published:
        readable = time.strftime('%Y-%m-%d %H:%M:%S UTC', published)
        print(f"Published: {readable}")
    else:
        print("No date present, or no handler could parse it")
```

## Date Handler Registration Order

Date handlers are tried in **reverse registration order** (LIFO - Last In, First Out):

```python
import feedparser

def handler1(date): return None  # Register first
def handler2(date): return None  # Register second
def handler3(date): return None  # Register third

feedparser.registerDateHandler(handler1)
feedparser.registerDateHandler(handler2)
feedparser.registerDateHandler(handler3)

# When parsing, handlers are tried in this order:
# 1. handler3 (most recently registered)
# 2. handler2
# 3. handler1 (least recently registered)
# 4. Built-in handlers (in their predefined order)
```
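
The LIFO behaviour can be modelled in a few lines of plain Python. This is a toy model for illustration, not feedparser's source: the names `_handlers`, `register`, and `parse_date` are stand-ins, and the set of exceptions that get skipped is an assumption.

```python
_handlers = []  # stand-in for the library's internal handler list

def register(func):
    # Insert at the front so the newest handler is tried first
    _handlers.insert(0, func)

def parse_date(date_string):
    """Return the first valid 9-tuple produced by any handler, else None."""
    for handler in _handlers:
        try:
            result = handler(date_string)
        except (KeyError, OverflowError, ValueError, AttributeError):
            continue  # a handler that raises is skipped
        if result and len(result) == 9:
            return result
    return None

calls = []  # records the order in which handlers run

def builtin_like(s):
    calls.append("builtin_like")
    return (2021, 9, 6, 12, 0, 0, 0, 249, 0) if s == "known" else None

def custom(s):
    calls.append("custom")
    return None  # declines, so the next handler gets a turn

register(builtin_like)  # registered first, tried last
register(custom)        # registered last, tried first
parsed = parse_date("known")
```

Because every registered handler runs ahead of the older ones, a custom handler should return `None` quickly for strings it does not recognize.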

## Built-in Handler Order

Built-in date handlers are tried in the order listed below, which is the reverse of the order in which they are registered:

1. W3C Date and Time Format (`_parse_date_w3dtf`)
2. RFC 822 format (`_parse_date_rfc822`)
3. ISO 8601 format (`_parse_date_iso8601`)
4. Unix asctime format (`_parse_date_asctime`)
5. Perforce format (`_parse_date_perforce`)
6. Hungarian format (`_parse_date_hungarian`)
7. Greek format (`_parse_date_greek`)
8. Korean Nate format (`_parse_date_nate`)
9. Korean OnBlog format (`_parse_date_onblog`)

So W3C format is tried first, OnBlog format is tried last.

## Common Date Fields

Both feed-level and entry-level objects may contain these date fields:

### Feed-Level Dates

```python
feed = result.feed

# Publication dates
feed.published         # Publication date string
feed.published_parsed  # Parsed publication date tuple

# Update dates
feed.updated           # Last updated date string
feed.updated_parsed    # Parsed last updated date tuple
```

### Entry-Level Dates

```python
for entry in result.entries:
    # Publication dates
    entry.published         # Publication date string
    entry.published_parsed  # Parsed publication date tuple

    # Update dates
    entry.updated           # Last updated date string
    entry.updated_parsed    # Parsed last updated date tuple

    # Creation dates (rare)
    entry.created           # Creation date string
    entry.created_parsed    # Parsed creation date tuple

    # Expiration dates (rare)
    entry.expired           # Expiration date string
    entry.expired_parsed    # Parsed expiration date tuple
```
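
Since any of these fields may be missing on a given entry, code that orders entries by date usually needs a fallback chain. The sketch below uses plain dictionaries as stand-ins for real entries, and `entry_timestamp` is a hypothetical helper; the choice of fallback order is an assumption, not something feedparser prescribes.

```python
import calendar

# Stand-in entries; real ones come from feedparser.parse(...).entries
entries = [
    {"title": "old", "published_parsed": (2021, 9, 5, 8, 0, 0, 6, 248, 0)},
    {"title": "undated"},
    {"title": "new", "updated_parsed": (2021, 9, 6, 12, 0, 0, 0, 249, 0)},
]

def entry_timestamp(entry):
    """Hypothetical helper: best available date as a UTC epoch, or None."""
    for key in ("published_parsed", "updated_parsed", "created_parsed"):
        value = entry.get(key)
        if value:
            return calendar.timegm(value)
    return None

# Newest first; entries without any parsed date sort last
ordered = sorted(
    entries,
    key=lambda e: (entry_timestamp(e) is not None, entry_timestamp(e) or 0),
    reverse=True,
)
titles = [e["title"] for e in ordered]
```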

## Error Handling

Date parsing is designed to be fault-tolerant:

```python
result = feedparser.parse(url)

# Always check if dates were successfully parsed
for entry in result.entries:
    # .get() avoids an AttributeError when the field is absent
    if entry.get('published_parsed'):
        # Date was successfully parsed
        print(f"Published: {entry.published}")
    else:
        # Date parsing failed or no date present
        print("No valid publication date found")
        if hasattr(entry, 'published'):
            print(f"Raw date string: {entry.published}")
```

## Time Zone Handling

All parsed dates are normalized to GMT (UTC):

```python
import calendar
import time

# All *_parsed dates are in GMT regardless of original timezone
gmt_tuple = entry.get('published_parsed')
if gmt_tuple:
    # Convert to local time if needed. calendar.timegm() reads the tuple
    # as GMT; time.mktime() would misread it as local time.
    timestamp = calendar.timegm(gmt_tuple)
    local_time = time.localtime(timestamp)

    print(f"GMT: {time.strftime('%Y-%m-%d %H:%M:%S', gmt_tuple)}")
    print(f"Local: {time.strftime('%Y-%m-%d %H:%M:%S', local_time)}")
```
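
As a worked check of the GMT-to-epoch step, the following sketch uses a hand-written sample tuple rather than a live feed. It relies only on the standard library; the sample value is an assumption chosen for illustration.

```python
import calendar
import datetime

gmt_tuple = (2021, 9, 6, 12, 0, 0, 0, 249, 0)  # sample *_parsed value

# timegm() is the inverse of time.gmtime(): it reads the tuple as GMT
timestamp = calendar.timegm(gmt_tuple)

utc_dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
# astimezone() with no argument converts to the machine's local zone
local_dt = utc_dt.astimezone()

print(timestamp)           # 1630929600
print(utc_dt.isoformat())  # 2021-09-06T12:00:00+00:00
```

Working with timezone-aware `datetime` objects keeps the instant unambiguous: `local_dt` and `utc_dt` compare equal even though they display differently.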

## Legacy Date Compatibility

FeedParserDict provides backward compatibility for legacy date field names:

```python
# These all refer to the same data:
entry.updated          # Modern name
entry.modified         # Legacy RSS name
entry.date             # Very old legacy name

entry.updated_parsed   # Modern name
entry.modified_parsed  # Legacy RSS name
entry.date_parsed      # Very old legacy name
```