tessl/pypi-feedparser

Universal feed parser for RSS, Atom, and CDF feeds with comprehensive format support and robust parsing capabilities

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/feedparser@6.0.x

To install, run

npx @tessl/cli install tessl/pypi-feedparser@6.0.0

# Feedparser

A universal Python library for parsing RSS, Atom, and CDF feeds with comprehensive format support. Feedparser handles multiple feed formats (RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, Atom 1.0) and provides robust parsing capabilities with automatic encoding detection, HTML sanitization, and graceful error handling.

## Package Information

- **Package Name**: feedparser
- **Language**: Python
- **Installation**: `pip install feedparser`

## Core Imports

```python
import feedparser
```

## Basic Usage

```python
import feedparser

# Parse a feed from URL
result = feedparser.parse('https://example.com/feed.xml')

# Access feed metadata
print(result.feed.title)
print(result.feed.description)
print(result.feed.link)

# Access entries/items
for entry in result.entries:
    print(entry.title)
    print(entry.summary)
    print(entry.link)
    print(entry.published)

# Check for parsing errors
if result.bozo:
    print(f"Feed had parsing issues: {result.bozo_exception}")
```

## Architecture

Feedparser uses a flexible parsing architecture that supports both strict XML parsing and lenient HTML-style parsing:

- **Dual Parser System**: Tries strict XML parsing first and falls back to lenient HTML-style parsing when the feed is not well-formed
- **Format Detection**: Automatic detection of RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 formats
- **Enhanced Dictionary**: `FeedParserDict` provides attribute-style access and backward-compatible legacy keys
- **Namespace Support**: Handles various XML namespaces (Dublin Core, iTunes, Media RSS, GeoRSS, etc.)
- **Character Encoding**: Automatic encoding detection with UTF-8 conversion and fallback handling

## Capabilities

### Core Parsing

Main feed parsing functionality with support for multiple input sources, HTTP features, and extensive configuration options.

```python { .api }
def parse(url_file_stream_or_string, etag=None, modified=None, agent=None, referrer=None, handlers=None, request_headers=None, response_headers=None, resolve_relative_uris=None, sanitize_html=None):
    """
    Parse a feed from URL, file, stream, or string.

    Args:
        url_file_stream_or_string: Feed source (URL, file path, file-like object, or string)
        etag (str, optional): HTTP ETag for conditional requests
        modified (str/datetime/tuple, optional): Last-Modified date for conditional requests
        agent (str, optional): HTTP User-Agent header
        referrer (str, optional): HTTP Referer header
        handlers (list, optional): Custom urllib handlers
        request_headers (dict, optional): Additional HTTP request headers
        response_headers (dict, optional): Override/supplement response headers
        resolve_relative_uris (bool, optional): Enable relative URI resolution
        sanitize_html (bool, optional): Enable HTML sanitization

    Returns:
        FeedParserDict: Parsed feed data with feed metadata and entries
    """
```

[Parsing](./parsing.md)

### Data Structures

Comprehensive feed data structures with normalized access to feed metadata, entries, and all feed elements across different formats.

```python { .api }
class FeedParserDict(dict):
    """Enhanced dictionary with attribute access and legacy key mapping."""

    def __getitem__(self, key): ...
    def __contains__(self, key): ...
    def get(self, key, default=None): ...
    def __getattr__(self, key): ...
```

[Data Structures](./data-structures.md)
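
To illustrate the idea behind the four overridden methods, here is a minimal sketch of a dictionary with attribute access and key aliasing. It is not feedparser's implementation: the class name `AttrDict` and the two-entry `keymap` are assumptions chosen for illustration.

```python
class AttrDict(dict):
    """Illustrative stand-in for an attribute-access dict with legacy keys."""

    # Hypothetical alias table: legacy key -> current key.
    keymap = {"channel": "feed", "items": "entries"}

    def __getitem__(self, key):
        # Translate a legacy key before the normal dict lookup.
        return dict.__getitem__(self, self.keymap.get(key, key))

    def __contains__(self, key):
        return dict.__contains__(self, self.keymap.get(key, key))

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so real
        # attributes and methods keep working as usual.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(f"object has no attribute '{key}'") from None


data = AttrDict(feed={"title": "Example Feed"}, entries=[])
print(data.feed["title"])               # attribute-style access
print(data["channel"] is data["feed"])  # legacy alias resolves to the same object
print("items" in data)                  # membership also honors aliases
```

Routing `get` and `__getattr__` through `__getitem__` keeps the alias logic in one place, so all three access styles stay consistent.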

### Date Handling

Date parsing system supporting multiple date formats with extensible custom date handler registration.

```python { .api }
def registerDateHandler(func):
    """
    Register a custom date handler function.

    Args:
        func: Function that takes date string, returns 9-tuple date in GMT
    """
```

[Date Handling](./date-handling.md)

### HTTP Features

HTTP client capabilities including conditional requests, authentication, custom headers, and redirect handling.

```python { .api }
# Configuration constants
USER_AGENT: str  # Default HTTP User-Agent header
RESOLVE_RELATIVE_URIS: int  # Global URI resolution setting
SANITIZE_HTML: int  # Global HTML sanitization setting

# Package metadata constants
__author__: str  # Package author information
__license__: str  # Package license type
__version__: str  # Package version string
```

[HTTP Features](./http-features.md)

### Error Handling

Exception system for parsing errors, encoding issues, and malformed content with graceful degradation.

```python { .api }
class ThingsNobodyCaresAboutButMe(Exception): ...
class CharacterEncodingOverride(ThingsNobodyCaresAboutButMe): ...
class CharacterEncodingUnknown(ThingsNobodyCaresAboutButMe): ...
class NonXMLContentType(ThingsNobodyCaresAboutButMe): ...
class UndeclaredNamespace(Exception): ...
```

[Error Handling](./error-handling.md)

## Types

```python { .api }
# Feed parsing result structure
FeedParserDict = {
    'bozo': bool,  # True if feed had parsing issues
    'bozo_exception': Exception,  # Exception if parsing errors occurred
    'encoding': str,  # Character encoding used
    'etag': str,  # HTTP ETag from response
    'headers': dict,  # HTTP response headers
    'href': str,  # Final URL after redirects
    'modified': str,  # HTTP Last-Modified header
    'namespaces': dict,  # XML namespaces used
    'status': int,  # HTTP status code
    'version': str,  # Feed format version (e.g., 'rss20', 'atom10')
    'entries': list,  # List of entry/item dictionaries
    'feed': dict,  # Feed-level metadata
}
```
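
Not every key is present in every result. `bozo_exception` only appears when a problem occurred, and HTTP-related keys such as `status` and `etag` only appear when the feed was fetched over HTTP. A defensive reader can therefore use `get` with defaults. The helper name `summarize` and the shape of its return value are assumptions for this sketch; it works on any mapping with the structure above.

```python
def summarize(result):
    """Build a small summary dict from a parse result, tolerating missing keys."""
    return {
        # An empty or missing version is reported as 'unknown' here.
        "version": result.get("version") or "unknown",
        "title": result.get("feed", {}).get("title", "(untitled)"),
        "entry_count": len(result.get("entries", [])),
        # None when the feed did not come from an HTTP request.
        "status": result.get("status"),
        "ok": not result.get("bozo", False),
    }


# A plain dict with the same structure stands in for a real parse result.
sample = {
    "bozo": False,
    "version": "atom10",
    "feed": {"title": "Example Feed"},
    "entries": [{"title": "First post"}, {"title": "Second post"}],
}
print(summarize(sample))
```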