# Data Structures

Feedparser exposes parsed feeds through dictionary-based data structures whose field names are normalized across feed formats, so the same keys work for RSS and Atom. The main result structure contains feed metadata, the list of entries, and information about the parse itself.

## Capabilities

### FeedParserDict Class

A `dict` subclass that adds attribute-style access and backward compatibility with legacy field names.

```python { .api }
class FeedParserDict(dict):
    """
    Enhanced dictionary with attribute access and legacy key mapping.

    Provides backward compatibility by mapping old RSS field names to
    modern equivalents and supports both dict-style and attribute-style access.
    """

    def __getitem__(self, key):
        """
        Get item with legacy key mapping support.

        Special handling for:
        - 'category': Returns the first tag's term
        - 'enclosures': Returns links with rel='enclosure'
        - 'license': Returns the href of the first link with rel='license'
        - 'updated'/'updated_parsed': Fall back to 'published'/'published_parsed' if not present

        Returns:
            Value for the key, with legacy key mapping applied
        """

    def __contains__(self, key):
        """Check if key exists, with legacy mapping support."""

    def get(self, key, default=None):
        """Get item with default, using legacy key mapping."""

    def __getattr__(self, key):
        """Enable attribute-style access (result.feed.title)."""

    def __setitem__(self, key, value):
        """Set item with legacy key mapping."""

    def setdefault(self, k, default):
        """Set default value if key doesn't exist."""
```
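
The attribute-access part of this behavior can be illustrated with a simplified, hypothetical stand-in. `AttrDict` is not part of feedparser, and it omits legacy key mapping entirely. It assumes attribute lookup delegates to item lookup and converts `KeyError` to `AttributeError`. That conversion is what makes `hasattr()` and `getattr(obj, name, default)` behave sensibly on missing fields.

```python
class AttrDict(dict):
    """Hypothetical, simplified stand-in for FeedParserDict: attribute access only."""

    def __getattr__(self, key):
        # Python calls __getattr__ only after normal attribute lookup fails,
        # so genuine dict methods such as .items and .get are found first.
        try:
            return self[key]
        except KeyError:
            # Translate KeyError so hasattr()/getattr(obj, name, default) work.
            raise AttributeError(key) from None


feed = AttrDict(title="Example Feed", items="not reachable as an attribute")
print(feed.title)                           # Example Feed
print(getattr(feed, "subtitle", "(none)"))  # (none)
print(feed["items"])                        # not reachable as an attribute
print(callable(feed.items))                 # True: dict.items wins over the key
```

The same lookup order explains why a key that shares its name with a `dict` method, such as the legacy `items` key, must be read with `result['items']` rather than `result.items`.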

### Legacy Key Mapping

FeedParserDict automatically maps legacy RSS field names to modern equivalents. When a mapping's target is a list, the first listed key present in the dictionary is used.

```python
# Legacy key mappings (handled automatically)
keymap = {
    'channel': 'feed',
    'items': 'entries',
    'guid': 'id',
    'date': 'updated',
    'date_parsed': 'updated_parsed',
    'description': ['summary', 'subtitle'],
    'description_detail': ['summary_detail', 'subtitle_detail'],
    'url': ['href'],
    'modified': 'updated',
    'modified_parsed': 'updated_parsed',
    'issued': 'published',
    'issued_parsed': 'published_parsed',
    'copyright': 'rights',
    'copyright_detail': 'rights_detail',
    'tagline': 'subtitle',
    'tagline_detail': 'subtitle_detail',
}
```
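
The sketch below shows how a lookup might consult this table. It assumes a list target means "use the first listed key that is present" and that an unmatched key falls back to the literal key. `resolve_legacy` and the trimmed `LEGACY_KEYMAP` are illustrative names, not feedparser APIs.

```python
LEGACY_KEYMAP = {
    "channel": "feed",
    "items": "entries",
    "guid": "id",
    "description": ["summary", "subtitle"],
    "url": ["href"],
}


def resolve_legacy(d, key):
    """Look up `key` in dict `d`, honoring the (hypothetical, trimmed) legacy key map."""
    target = LEGACY_KEYMAP.get(key, key)
    # Normalize single-string targets to a one-element candidate list.
    candidates = target if isinstance(target, list) else [target]
    for candidate in candidates:
        if candidate in d:
            return d[candidate]
    # Nothing mapped was present: fall back to the literal key (may raise KeyError).
    return d[key]


entry = {"id": "urn:example:1", "subtitle": "Tagline"}
print(resolve_legacy(entry, "guid"))         # urn:example:1
print(resolve_legacy(entry, "description"))  # Tagline
```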

## Top-Level Result Structure

The `parse()` function returns a FeedParserDict with these top-level properties:

### Parsing Information

```python { .api }
# Parsing status and metadata
result = {
    'bozo': bool,                 # True if the feed was malformed or parsing hit problems
    'bozo_exception': Exception,  # Exception describing the problem (present only when bozo is True)
    'encoding': str,              # Character encoding used (e.g., 'utf-8')
    'version': str,               # Feed format version (e.g., 'rss20', 'atom10')
    'namespaces': dict,           # XML namespaces used in the feed (prefix -> URI)
}
```
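
`bozo` only flags that something went wrong, so callers typically check it before trusting the rest of the result. Here is a sketch of such a check. It uses plain dict access, so it works on a real result (a `dict` subclass) or on a hand-built dict. `describe_parse` is a hypothetical helper.

```python
def describe_parse(result):
    """Return a one-line summary of a feedparser-style result's parse status."""
    if result.get("bozo"):
        exc = result.get("bozo_exception")
        # bozo_exception carries the underlying error; report its type and message.
        return f"malformed ({type(exc).__name__}): {exc}"
    version = result.get("version") or "unknown format"
    return f"ok: {version}, encoding {result.get('encoding', 'unknown')}"


print(describe_parse({"bozo": False, "version": "rss20", "encoding": "utf-8"}))
# ok: rss20, encoding utf-8
print(describe_parse({"bozo": True, "bozo_exception": ValueError("mismatched tag")}))
# malformed (ValueError): mismatched tag
```

A bozo result may still contain usable entries. Whether to discard it is therefore a policy decision rather than something the flag decides for you.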

### HTTP Information

These keys are populated only when the feed is fetched over HTTP. `etag` and `modified` appear only when the server sends the corresponding headers.

```python { .api }
# HTTP response data (when parsing from a URL)
result = {
    'etag': str,      # HTTP ETag header
    'headers': dict,  # All HTTP response headers
    'href': str,      # Final URL after redirects
    'modified': str,  # HTTP Last-Modified header
    'status': int,    # HTTP status code
}
```
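
The `etag` and `modified` values support conditional requests. Feedparser's `parse()` accepts `etag` and `modified` keyword arguments, and a server that honors them answers `304 Not Modified` when nothing has changed. The hypothetical helpers below show the bookkeeping on plain dicts. The network call is left out so the sketch runs offline.

```python
def conditional_args(previous):
    """Build etag/modified keyword arguments from a previous result, if available."""
    args = {}
    for key in ("etag", "modified"):
        value = previous.get(key)
        if value:
            args[key] = value
    return args


def not_modified(result):
    """True when the server reported that the feed has not changed."""
    return result.get("status") == 304


previous = {"etag": '"abc123"', "modified": "Mon, 01 Jan 2024 00:00:00 GMT", "status": 200}
print(conditional_args(previous))
print(not_modified({"status": 304}))  # True
```

With a live feed, the arguments would be passed as `feedparser.parse(url, **conditional_args(previous))`.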

### Feed Content

```python { .api }
# Feed content structure
result = {
    'feed': FeedParserDict,  # Feed-level metadata
    'entries': list,         # List of entry/item FeedParserDict objects
}
```

## Feed-Level Structure (result.feed)

Feed metadata describes the feed itself:

### Identity and Basic Information

```python { .api }
feed = {
    'title': str,             # Feed title
    'title_detail': {         # Detailed title information
        'type': str,          # Content type ('text/plain', 'text/html', 'application/xhtml+xml')
        'language': str,      # Language code
        'base': str,          # Base URI
        'value': str,         # Title content
    },
    'link': str,              # Main feed/site URL
    'links': [                # All feed links
        {
            'rel': str,       # Relationship ('alternate', 'self', etc.)
            'type': str,      # MIME type
            'href': str,      # URL
            'title': str,     # Link title (optional)
        }
    ],
    'id': str,                # Unique feed identifier
    'description': str,       # Feed description (RSS); legacy alias resolved via the key map
    'subtitle': str,          # Feed subtitle (Atom)
    'subtitle_detail': dict,  # Detailed subtitle information
    'language': str,          # Feed language code
}
```
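
`link` holds only the main URL. Any other link, such as the feed's own `rel='self'` URL, has to be picked out of `links`. Below is a minimal sketch that assumes each link dict has the keys shown above. `find_link` is a hypothetical helper.

```python
def find_link(links, rel, mime_type=None):
    """Return the href of the first link with the given rel (and MIME type, if given)."""
    for link in links:
        if link.get("rel") != rel:
            continue
        if mime_type is not None and link.get("type") != mime_type:
            continue
        return link.get("href")
    return None


links = [
    {"rel": "alternate", "type": "text/html", "href": "https://example.com/"},
    {"rel": "self", "type": "application/atom+xml", "href": "https://example.com/feed.xml"},
]
print(find_link(links, "self"))                    # https://example.com/feed.xml
print(find_link(links, "alternate", "text/html"))  # https://example.com/
print(find_link(links, "hub"))                     # None
```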

### Authorship and Publication

```python { .api }
feed = {
    'author': str,           # Primary author name
    'author_detail': {       # Detailed author information
        'name': str,         # Author name
        'email': str,        # Author email
        'href': str,         # Author URL
    },
    'contributors': [        # List of contributor objects
        {
            'name': str,
            'email': str,
            'href': str,
        }
    ],
    'publisher': str,        # Publisher name
    'publisher_detail': {    # Detailed publisher information
        'name': str,
        'email': str,
        'href': str,
    },
    'generator': str,        # Feed generator software
    'generator_detail': {    # Detailed generator information
        'name': str,
        'version': str,
        'href': str,
    },
}
```
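
`author` is a display string, while `author_detail` splits out the parts. The sketch below formats such a dict as `Name <email>` and tolerates missing or empty keys. `format_person` is a hypothetical helper. The same name/email/href shape applies to items in `contributors` and to `publisher_detail`.

```python
def format_person(person):
    """Format a name/email/href dict as 'Name <email>', skipping missing parts."""
    name = (person.get("name") or "").strip()
    email = (person.get("email") or "").strip()
    if name and email:
        return f"{name} <{email}>"
    # Fall back to whichever single part is available.
    return name or email or person.get("href", "")


print(format_person({"name": "Jane Doe", "email": "jane@example.com"}))  # Jane Doe <jane@example.com>
print(format_person({"email": "team@example.com"}))                      # team@example.com
print(format_person({"href": "https://example.com/about"}))              # https://example.com/about
```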

### Dates and Updates

```python { .api }
feed = {
    'updated': str,             # Last-updated timestamp (string as it appears in the feed)
    'updated_parsed': tuple,    # Parsed update time as a 9-tuple in GMT
    'published': str,           # Publication timestamp (string)
    'published_parsed': tuple,  # Parsed publication time as a 9-tuple in GMT
}
```
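
The `*_parsed` values are 9-tuples in GMT, so they convert to timezone-aware datetimes with `calendar.timegm`. Do not use `time.mktime` here, because it interprets the tuple as local time. Here is a sketch. `to_datetime` is a hypothetical helper that returns `None` when the field is missing.

```python
import calendar
import time
from datetime import datetime, timezone


def to_datetime(parsed):
    """Convert a GMT 9-tuple (or time.struct_time) to an aware UTC datetime."""
    if not parsed:
        return None
    # timegm interprets the tuple as UTC and returns seconds since the epoch.
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)


parsed = time.strptime("2024-03-01 12:30:00", "%Y-%m-%d %H:%M:%S")
print(to_datetime(parsed).isoformat())  # 2024-03-01T12:30:00+00:00
print(to_datetime(None))                # None
```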

### Rights and Legal

```python { .api }
feed = {
    'rights': str,         # Copyright/rights statement
    'rights_detail': {     # Detailed rights information
        'type': str,
        'language': str,
        'base': str,
        'value': str,
    },
}
```

### Visual Elements

```python { .api }
feed = {
    'image': {               # Feed image/logo (RSS)
        'title': str,        # Image title
        'url': str,          # Image URL
        'link': str,         # Image link target
        'width': int,        # Image width
        'height': int,       # Image height
        'description': str,  # Image description
    },
    'icon': str,             # Feed icon URL (Atom)
    'logo': str,             # Feed logo URL (Atom)
}
```

### RSS-Specific Elements

```python { .api }
feed = {
    'ttl': int,                     # Time-to-live (cache duration in minutes)
    'cloud': {                      # RSS cloud notification
        'domain': str,
        'port': int,
        'path': str,
        'registerprocedure': str,
        'protocol': str,
    },
    'textinput': {                  # RSS text input box
        'title': str,
        'description': str,
        'name': str,
        'link': str,
    },
    'docs': str,                    # Documentation URL
}
```

### Categories and Tags

```python { .api }
feed = {
    'tags': [               # List of categories/tags
        {
            'term': str,    # Category term
            'scheme': str,  # Category scheme/domain
            'label': str,   # Human-readable label
        }
    ],
}
```

## Entry-Level Structure (result.entries[n])

Each entry (item) in the feed carries the article-level data:

### Identity and Content

```python { .api }
entry = {
    'title': str,             # Entry title
    'title_detail': dict,     # Detailed title information
    'link': str,              # Main entry URL
    'links': list,            # All entry links
    'id': str,                # Unique entry identifier
    'summary': str,           # Entry summary/description
    'summary_detail': dict,   # Detailed summary information
    'content': [              # Entry content blocks
        {
            'type': str,      # Content type ('text/plain', 'text/html', 'application/xhtml+xml')
            'language': str,  # Content language
            'base': str,      # Base URI
            'value': str,     # Content text
        }
    ],
}
```
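
An entry may carry both a short `summary` and one or more full `content` blocks. The sketch below implements one reasonable selection policy:

1. Prefer an HTML content block.
2. Otherwise take any content block.
3. Otherwise fall back to the summary.

It assumes HTML content is labeled `'text/html'`. `entry_body` is a hypothetical helper.

```python
def entry_body(entry):
    """Return (mime_type, text) for the richest body available in an entry dict."""
    blocks = entry.get("content") or []
    # Prefer an HTML content block when one exists.
    for block in blocks:
        if block.get("type") == "text/html":
            return block["type"], block.get("value", "")
    # Otherwise take the first content block of any type.
    if blocks:
        return blocks[0].get("type", "text/plain"), blocks[0].get("value", "")
    # Fall back to the summary, using summary_detail for its type if present.
    detail = entry.get("summary_detail") or {}
    return detail.get("type", "text/plain"), entry.get("summary", "")


entry = {
    "summary": "Short teaser",
    "content": [
        {"type": "text/plain", "value": "Plain body"},
        {"type": "text/html", "value": "<p>Full body</p>"},
    ],
}
print(entry_body(entry))                          # ('text/html', '<p>Full body</p>')
print(entry_body({"summary": "Only a summary"}))  # ('text/plain', 'Only a summary')
```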

### Authorship

```python { .api }
entry = {
    'author': str,             # Primary author name
    'author_detail': dict,     # Detailed author information
    'contributors': list,      # List of contributor objects
    'publisher': str,          # Publisher name
    'publisher_detail': dict,  # Detailed publisher information
}
```

### Dates

```python { .api }
entry = {
    'updated': str,             # Last updated timestamp
    'updated_parsed': tuple,    # Parsed updated time as 9-tuple
    'published': str,           # Publication timestamp
    'published_parsed': tuple,  # Parsed publication time as 9-tuple
    'created': str,             # Creation timestamp (rare)
    'created_parsed': tuple,    # Parsed creation time as 9-tuple
    'expired': str,             # Expiration timestamp (rare)
    'expired_parsed': tuple,    # Parsed expiration time as 9-tuple
}
```
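
The parsed dates are tuples ordered year-first, so they compare chronologically. Not every entry has `published_parsed`, so the sketch falls back to `updated_parsed` and sorts undated entries last. `newest_first` is a hypothetical helper.

```python
def newest_first(entries):
    """Sort entry dicts newest first by published_parsed, else updated_parsed."""

    def sort_key(entry):
        stamp = entry.get("published_parsed") or entry.get("updated_parsed")
        has_date = bool(stamp)
        # (True, date) sorts above (False, ()) when reversed, so undated entries go last.
        return (has_date, tuple(stamp) if has_date else ())

    return sorted(entries, key=sort_key, reverse=True)


entries = [
    {"title": "old", "published_parsed": (2023, 5, 1, 0, 0, 0, 0, 121, 0)},
    {"title": "undated"},
    {"title": "new", "updated_parsed": (2024, 1, 2, 0, 0, 0, 1, 2, 0)},
]
print([e["title"] for e in newest_first(entries)])  # ['new', 'old', 'undated']
```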

### Media and Attachments

```python { .api }
entry = {
    'enclosures': [         # Attached files (podcasts, etc.)
        {
            'href': str,    # File URL
            'type': str,    # MIME type
            'length': str,  # File size in bytes
        }
    ],
}
```
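
`length` is a string, and feeds sometimes leave it empty or malformed, so it should be converted defensively. The sketch below picks out audio enclosures, for example podcast episodes. `audio_enclosures` is a hypothetical helper.

```python
def audio_enclosures(entry):
    """Return (href, size_in_bytes_or_None) for each audio/* enclosure in an entry dict."""
    results = []
    for enclosure in entry.get("enclosures", []):
        if not (enclosure.get("type") or "").startswith("audio/"):
            continue
        try:
            size = int(enclosure.get("length"))
        except (TypeError, ValueError):
            size = None  # missing, empty, or non-numeric length
        results.append((enclosure.get("href"), size))
    return results


entry = {
    "enclosures": [
        {"href": "https://example.com/ep1.mp3", "type": "audio/mpeg", "length": "12345678"},
        {"href": "https://example.com/cover.jpg", "type": "image/jpeg", "length": "2048"},
        {"href": "https://example.com/ep2.mp3", "type": "audio/mpeg", "length": ""},
    ]
}
print(audio_enclosures(entry))
# [('https://example.com/ep1.mp3', 12345678), ('https://example.com/ep2.mp3', None)]
```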

### Categories and Classification

```python { .api }
entry = {
    'tags': [               # Entry categories/tags
        {
            'term': str,    # Tag term
            'scheme': str,  # Tag scheme/domain
            'label': str,   # Human-readable label
        }
    ],
}
```
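
To see which categories a feed uses most, tag terms can be tallied across entries with `collections.Counter`. The sketch below works on plain entry dicts. `count_tags` is a hypothetical helper, and passing `[result.feed]` would count feed-level tags the same way.

```python
from collections import Counter


def count_tags(entries):
    """Count tag terms across entries, ignoring case and entries without tags."""
    counts = Counter()
    for entry in entries:
        for tag in entry.get("tags", []):
            term = (tag.get("term") or "").strip().lower()
            if term:
                counts[term] += 1
    return counts


entries = [
    {"tags": [{"term": "Python"}, {"term": "feeds"}]},
    {"tags": [{"term": "python", "scheme": None, "label": None}]},
    {"title": "untagged"},
]
print(count_tags(entries).most_common(1))  # [('python', 2)]
```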

### Comments and Interaction

```python { .api }
entry = {
    'comments': str,  # Comments URL
    'license': str,   # Content license URL
}
```

### Source Attribution

```python { .api }
entry = {
    'source': {          # Original source information
        'title': str,    # Source feed title
        'href': str,     # Source feed URL
        'value': str,    # Source description
    },
}
```

## Usage Examples

### Basic Data Access

```python
import feedparser

url = "https://example.com/feed.xml"  # placeholder feed URL
result = feedparser.parse(url)

# Feed information (attribute access raises AttributeError if a field is missing)
print(f"Feed: {result.feed.title}")
print(f"Description: {result.feed.description}")
print(f"Last updated: {result.feed.updated}")

# Entry information
for entry in result.entries:
    print(f"Title: {entry.title}")
    print(f"Link: {entry.link}")
    print(f"Published: {entry.published}")
    print(f"Summary: {entry.summary}")
```

### Attribute vs Dictionary Access

```python
# Both styles work identically
title1 = result.feed.title
title2 = result.feed['title']
title3 = result['feed']['title']

# All three methods return the same value
assert title1 == title2 == title3
```

### Legacy Key Compatibility

```python
# Legacy RSS keys automatically map to modern equivalents
description = result.feed.description  # resolves to 'summary' or 'subtitle', whichever is present
subtitle = result.feed.subtitle        # Atom 'subtitle'
# Both may return the same content depending on feed format

# Legacy item access
items = result['items']  # maps to result['entries']; result.items is the dict.items method
entry = items[0]         # assumes the feed has at least one entry
guid = entry.guid        # maps to entry.id
```

### Content Type Handling

```python
# Check content types for proper rendering
if entry.title_detail.type == 'text/html':
    # Contains HTML markup
    html_title = entry.title
elif entry.title_detail.type == 'text/plain':
    # Plain text only
    text_title = entry.title

# Handle multiple content blocks (entries without full content have no 'content' key)
for content_block in entry.get('content', []):
    if content_block.type == 'text/html':
        html_content = content_block.value
    elif content_block.type == 'text/plain':
        text_content = content_block.value
```

### Safe Content Access

```python
# Use .get() for optional fields
author = entry.get('author', 'Unknown')
published = entry.get('published', 'Date not available')

# Check for field existence
if 'enclosures' in entry:
    for enclosure in entry.enclosures:
        print(f"Attachment: {enclosure.href}")

# Handle missing nested fields
if hasattr(entry, 'author_detail') and entry.author_detail:
    email = entry.author_detail.get('email', 'No email')
```