# Wikipedia-API

A comprehensive Python wrapper for Wikipedia's API that provides easy access to page content, sections, links, categories, and translations. This library enables developers to extract structured information from Wikipedia articles across all language editions, with support for various content formats, automatic redirect handling, and robust error management.

## Package Information

- **Package Name**: Wikipedia-API
- **Language**: Python
- **Installation**: `pip install wikipedia-api`

## Core Imports

```python
import wikipediaapi
```

## Basic Usage

```python
import wikipediaapi

# Initialize Wikipedia object with required user agent and language
wiki = wikipediaapi.Wikipedia(
    user_agent='MyProject/1.0 (contact@example.com)',
    language='en'
)

# Get a Wikipedia page
page = wiki.page('Python_(programming_language)')

# Check if page exists and get basic information
if page.exists():
    print(f"Title: {page.title}")
    print(f"Summary: {page.summary[:100]}...")
    print(f"URL: {page.fullurl}")

    # Access page content
    print(f"Full text length: {len(page.text)}")

    # Get page sections
    for section in page.sections:
        print(f"Section: {section.title} (level {section.level})")

    # Get related pages
    print(f"Categories: {len(page.categories)}")
    print(f"Links: {len(page.links)}")
    print(f"Language versions: {len(page.langlinks)}")
```

## Architecture

Wikipedia-API uses a lazy-loading design with three main components:

- **Wikipedia**: Main API wrapper that manages sessions and configuration and makes API calls to Wikipedia's servers
- **WikipediaPage**: Represents an individual Wikipedia page, with lazy-loaded properties for content, links, and metadata
- **WikipediaPageSection**: Hierarchical representation of page sections, with nested subsections and text content

The library handles API pagination and redirects automatically and provides both WIKI and HTML extraction formats. All content is fetched on demand when properties are accessed, so a page object triggers requests only for the data that is actually read.
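
The lazy-loading idea can be illustrated without touching the network. The sketch below is not the library's implementation: `LazyPage` and `fake_fetch` are hypothetical names, and caching the value after the first fetch is an assumption made for the example.

```python
from typing import Callable, Optional


class LazyPage:
    """Hypothetical stand-in for a page object; not the library's class."""

    def __init__(self, title: str, fetch: Callable[[str], str]) -> None:
        self.title = title
        self._fetch = fetch  # invoked only when text is first needed
        self._text: Optional[str] = None

    @property
    def text(self) -> str:
        # Fetch on first access, then reuse the cached value (assumed behavior).
        if self._text is None:
            self._text = self._fetch(self.title)
        return self._text


calls: list[str] = []


def fake_fetch(title: str) -> str:
    """Pretend network call that records each invocation."""
    calls.append(title)
    return f"Content of {title}"


page = LazyPage("Python_(programming_language)", fake_fetch)
print(len(calls))  # nothing has been fetched at construction time
print(page.text)   # first access triggers the fetch
print(page.text)   # second access reuses the stored text
print(len(calls))
```

The point of the pattern is that constructing many page objects stays cheap; only the properties a program reads cost a request.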

## Capabilities

### Wikipedia API Wrapper

Core functionality for initializing Wikipedia API connections, configuring extraction formats, language settings, and creating page objects. Provides the foundation for all Wikipedia data access.

```python { .api }
class Wikipedia:
    def __init__(
        self,
        user_agent: str,
        language: str = "en",
        variant: Optional[str] = None,
        extract_format: ExtractFormat = ExtractFormat.WIKI,
        headers: Optional[dict[str, Any]] = None,
        extra_api_params: Optional[dict[str, Any]] = None,
        **request_kwargs
    ): ...

    def page(
        self,
        title: str,
        ns: WikiNamespace = Namespace.MAIN,
        unquote: bool = False
    ) -> WikipediaPage: ...

    def article(
        self,
        title: str,
        ns: WikiNamespace = Namespace.MAIN,
        unquote: bool = False
    ) -> WikipediaPage: ...  # Alias for page()

    def extracts(self, page: WikipediaPage, **kwargs) -> str: ...

    def info(self, page: WikipediaPage) -> WikipediaPage: ...

    def langlinks(self, page: WikipediaPage, **kwargs) -> dict[str, WikipediaPage]: ...

    def links(self, page: WikipediaPage, **kwargs) -> dict[str, WikipediaPage]: ...

    def backlinks(self, page: WikipediaPage, **kwargs) -> dict[str, WikipediaPage]: ...

    def categories(self, page: WikipediaPage, **kwargs) -> dict[str, WikipediaPage]: ...

    def categorymembers(self, page: WikipediaPage, **kwargs) -> dict[str, WikipediaPage]: ...
```

[Wikipedia API Wrapper](./wikipedia-wrapper.md)

### Content Extraction

Extract and access Wikipedia page content including summaries, full text, sections, and hierarchical page structure. Supports both WIKI and HTML formats with automatic section parsing.

```python { .api }
class WikipediaPage:
    @property
    def title(self) -> str: ...

    @property
    def language(self) -> str: ...

    @property
    def variant(self) -> Optional[str]: ...

    @property
    def namespace(self) -> int: ...

    @property
    def pageid(self) -> int: ...  # -1 if page doesn't exist

    @property
    def fullurl(self) -> str: ...

    @property
    def canonicalurl(self) -> str: ...

    @property
    def displaytitle(self) -> str: ...

    def exists(self) -> bool: ...

    @property
    def summary(self) -> str: ...

    @property
    def text(self) -> str: ...

    @property
    def sections(self) -> list[WikipediaPageSection]: ...

    def section_by_title(self, title: str) -> Optional[WikipediaPageSection]: ...

    def sections_by_title(self, title: str) -> list[WikipediaPageSection]: ...

class WikipediaPageSection:
    @property
    def title(self) -> str: ...

    @property
    def text(self) -> str: ...

    @property
    def level(self) -> int: ...

    @property
    def sections(self) -> list[WikipediaPageSection]: ...

    def section_by_title(self, title: str) -> Optional[WikipediaPageSection]: ...

    def full_text(self, level: int = 1) -> str: ...
```

[Content Extraction](./content-extraction.md)
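
Because sections nest through their own `sections` property, a page outline can be built with a short recursive walk. The following is a minimal sketch under stated assumptions: `StubSection` is a hypothetical stand-in that mirrors only the `title`, `level`, and `sections` attributes listed above, and `outline` is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass, field


@dataclass
class StubSection:
    """Hypothetical stand-in with the attributes a section object exposes."""
    title: str
    level: int
    sections: list["StubSection"] = field(default_factory=list)


def outline(sections, indent: int = 0) -> list[str]:
    """Return one indented line per section, visiting subsections depth-first."""
    lines = []
    for section in sections:
        lines.append("  " * indent + section.title)
        lines.extend(outline(section.sections, indent + 1))
    return lines


toc = [
    StubSection("History", 1, [StubSection("Origins", 2)]),
    StubSection("Syntax", 1),
]
print("\n".join(outline(toc)))
```

Since the helper relies only on `title` and `sections`, passing `page.sections` from a real page object would presumably work the same way.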

### Page Navigation

Access Wikipedia's link structure including internal page links, backlinks, and language translations. Enables navigation between related pages and discovery of page relationships.

```python { .api }
class WikipediaPage:
    @property
    def links(self) -> dict[str, WikipediaPage]: ...

    @property
    def backlinks(self) -> dict[str, WikipediaPage]: ...

    @property
    def langlinks(self) -> dict[str, WikipediaPage]: ...
```

[Page Navigation](./page-navigation.md)
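
Each of these properties returns a dictionary keyed by title, so ordinary dictionary operations are enough to narrow the results. The sketch below filters such a mapping by namespace; `titles_in_namespace` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins carrying only a `namespace` attribute rather than real page objects.

```python
from types import SimpleNamespace


def titles_in_namespace(pages: dict, ns: int) -> list[str]:
    """Return the sorted titles of pages whose namespace equals ns."""
    return sorted(title for title, page in pages.items() if page.namespace == ns)


# Stand-in for a links dictionary: title -> object with a namespace attribute.
links = {
    "Guido van Rossum": SimpleNamespace(namespace=0),
    "Category:Programming languages": SimpleNamespace(namespace=14),
    "ABC (programming language)": SimpleNamespace(namespace=0),
}
print(titles_in_namespace(links, 0))
```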

### Categories

Work with Wikipedia's category system including page categories and category membership. Enables discovery of related content and hierarchical organization navigation.

```python { .api }
class WikipediaPage:
    @property
    def categories(self) -> dict[str, WikipediaPage]: ...

    @property
    def categorymembers(self) -> dict[str, WikipediaPage]: ...
```

[Categories](./categories.md)
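
Category members can themselves be categories, so walking a category tree is naturally recursive. The sketch below shows one way to do it with a depth limit. `collect_members` is a hypothetical helper, the objects are stand-ins with only `namespace` and `categorymembers` attributes, and the value 14 is the `CATEGORY` namespace number listed under Types and Constants.

```python
from types import SimpleNamespace

CATEGORY_NS = 14  # Namespace.CATEGORY


def collect_members(members: dict, max_depth: int, depth: int = 0) -> list[str]:
    """List member titles depth-first, descending into subcategories up to max_depth."""
    titles = []
    for title, page in members.items():
        titles.append(title)
        if page.namespace == CATEGORY_NS and depth < max_depth:
            titles.extend(collect_members(page.categorymembers, max_depth, depth + 1))
    return titles


# Stand-in data: one article and one subcategory containing another article.
subcategory = SimpleNamespace(
    namespace=CATEGORY_NS,
    categorymembers={"Ada Lovelace": SimpleNamespace(namespace=0, categorymembers={})},
)
root_members = {
    "Alan Turing": SimpleNamespace(namespace=0, categorymembers={}),
    "Category:Pioneers": subcategory,
}
print(collect_members(root_members, max_depth=1))
```

The depth limit matters in practice: category graphs can be deep or cyclic, and every level a real traversal descends into means further API requests.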

## Types and Constants

```python { .api }
class ExtractFormat(IntEnum):
    WIKI = 1  # Wiki format (allows recognizing subsections)
    HTML = 2  # HTML format (allows retrieval of HTML tags)

class Namespace(IntEnum):
    MAIN = 0
    TALK = 1
    USER = 2
    USER_TALK = 3
    WIKIPEDIA = 4
    WIKIPEDIA_TALK = 5
    FILE = 6
    FILE_TALK = 7
    MEDIAWIKI = 8
    MEDIAWIKI_TALK = 9
    TEMPLATE = 10
    TEMPLATE_TALK = 11
    HELP = 12
    HELP_TALK = 13
    CATEGORY = 14
    CATEGORY_TALK = 15
    PORTAL = 100
    PORTAL_TALK = 101
    PROJECT = 102
    PROJECT_TALK = 103
    REFERENCE = 104
    REFERENCE_TALK = 105
    BOOK = 108
    BOOK_TALK = 109
    DRAFT = 118
    DRAFT_TALK = 119
    EDUCATION_PROGRAM = 446
    EDUCATION_PROGRAM_TALK = 447
    TIMED_TEXT = 710
    TIMED_TEXT_TALK = 711
    MODULE = 828
    MODULE_TALK = 829
    GADGET = 2300
    GADGET_TALK = 2301
    GADGET_DEFINITION = 2302
    GADGET_DEFINITION_TALK = 2303

# Type aliases
PagesDict = dict[str, WikipediaPage]
WikiNamespace = Union[Namespace, int]

# Utility function
def namespace2int(namespace: WikiNamespace) -> int: ...
```
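
The `WikiNamespace` alias means a namespace may be given either as an enum member or as a plain integer. The sketch below shows how a function with the `namespace2int` signature could normalize both forms. It is an assumption about the behavior, not the library's source, and `Namespace` is redefined here with only two members so the example runs on its own.

```python
from enum import IntEnum
from typing import Union


class Namespace(IntEnum):
    """Two-member subset of the enum, redefined to keep the sketch self-contained."""
    MAIN = 0
    CATEGORY = 14


WikiNamespace = Union[Namespace, int]


def namespace2int(namespace: WikiNamespace) -> int:
    """Assumed behavior: unwrap enum members and pass plain integers through."""
    if isinstance(namespace, Namespace):
        return namespace.value
    return namespace


print(namespace2int(Namespace.CATEGORY))
print(namespace2int(828))
```

Accepting plain integers is useful for namespaces that have no enum member, such as custom namespaces on other MediaWiki installations.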