# Wikipedia-API

A comprehensive Python wrapper for Wikipedia's API that provides easy access to page content, sections, links, categories, and translations. This library enables developers to extract structured information from Wikipedia articles across all language editions, with support for various content formats, automatic redirect handling, and robust error management.

## Package Information

- **Package Name**: Wikipedia-API
- **Language**: Python
- **Installation**: `pip install wikipedia-api`

## Core Imports

```python
import wikipediaapi
```

## Basic Usage

```python
import wikipediaapi

# Initialize Wikipedia object with required user agent and language
wiki = wikipediaapi.Wikipedia(
    user_agent='MyProject/1.0 (contact@example.com)',
    language='en'
)

# Get a Wikipedia page
page = wiki.page('Python_(programming_language)')

# Check if page exists and get basic information
if page.exists():
    print(f"Title: {page.title}")
    print(f"Summary: {page.summary[:100]}...")
    print(f"URL: {page.fullurl}")

    # Access page content
    print(f"Full text length: {len(page.text)}")

    # Get page sections
    for section in page.sections:
        print(f"Section: {section.title} (level {section.level})")

    # Get related pages
    print(f"Categories: {len(page.categories)}")
    print(f"Links: {len(page.links)}")
    print(f"Language versions: {len(page.langlinks)}")
```

## Architecture

Wikipedia-API uses a lazy-loading design with three main components:

- **Wikipedia**: Main API wrapper that manages sessions and configuration and makes API calls to Wikipedia's servers
- **WikipediaPage**: Represents an individual Wikipedia page with lazy-loaded properties for content, links, and metadata
- **WikipediaPageSection**: Hierarchical representation of page sections with nested subsections and text content

The library automatically handles API pagination and redirects, and it provides both WIKI and HTML extraction formats. All content is fetched on demand when properties are accessed, so data that is never read is never requested.
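
The on-demand behaviour described above can be pictured with a small, self-contained sketch. It is not the library's implementation: `LazyPage` and `fake_fetch` are hypothetical names, and the "API call" is simulated locally. The sketch only illustrates the general pattern of deferring a fetch until a property is first read and caching the result afterwards.

```python
from typing import Callable, List, Optional


class LazyPage:
    """Hypothetical stand-in for a lazily loaded page (not the library's class)."""

    def __init__(self, title: str, fetch: Callable[[str], str]) -> None:
        self.title = title
        self._fetch = fetch  # invoked at most once, on first access
        self._summary: Optional[str] = None

    @property
    def summary(self) -> str:
        # Fetch only when the property is first read, then reuse the cached value.
        if self._summary is None:
            self._summary = self._fetch(self.title)
        return self._summary


calls: List[str] = []  # records every simulated API call


def fake_fetch(title: str) -> str:
    """Simulated network call; a real client would query the API here."""
    calls.append(title)
    return f"Summary of {title}"


page = LazyPage("Python_(programming_language)", fake_fetch)
calls_before_access = len(calls)  # constructing the page fetched nothing
first = page.summary              # triggers the single simulated fetch
second = page.summary             # served from the cache
```

Reading `summary` twice leaves exactly one entry in `calls`, which is the property the lazy design relies on.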
58
59
## Capabilities
60
61
### Wikipedia API Wrapper
62
63
Core functionality for initializing Wikipedia API connections, configuring extraction formats, language settings, and creating page objects. Provides the foundation for all Wikipedia data access.
64
65
```python { .api }
66
class Wikipedia:
67
def __init__(
68
self,
69
user_agent: str,
70
language: str = "en",
71
variant: Optional[str] = None,
72
extract_format: ExtractFormat = ExtractFormat.WIKI,
73
headers: Optional[dict[str, Any]] = None,
74
extra_api_params: Optional[dict[str, Any]] = None,
75
**request_kwargs
76
): ...
77
78
def page(
79
self,
80
title: str,
81
ns: WikiNamespace = Namespace.MAIN,
82
unquote: bool = False
83
) -> WikipediaPage: ...
84
85
def article(
86
self,
87
title: str,
88
ns: WikiNamespace = Namespace.MAIN,
89
unquote: bool = False
90
) -> WikipediaPage: ... # Alias for page()
91
92
def extracts(self, page: WikipediaPage, **kwargs) -> str: ...
93
94
def info(self, page: WikipediaPage) -> WikipediaPage: ...
95
96
def langlinks(self, page: WikipediaPage, **kwargs) -> dict[str, WikipediaPage]: ...
97
98
def links(self, page: WikipediaPage, **kwargs) -> dict[str, WikipediaPage]: ...
99
100
def backlinks(self, page: WikipediaPage, **kwargs) -> dict[str, WikipediaPage]: ...
101
102
def categories(self, page: WikipediaPage, **kwargs) -> dict[str, WikipediaPage]: ...
103
104
def categorymembers(self, page: WikipediaPage, **kwargs) -> dict[str, WikipediaPage]: ...
105
```
106
107
[Wikipedia API Wrapper](./wikipedia-wrapper.md)
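
As a rough illustration of how `page()` and `exists()` from the signatures above might be combined, the helper below returns a truncated summary or `None` for a missing page. The name `short_summary` and its `max_chars` parameter are illustrative and not part of the library. The `wiki` argument is assumed to be a `wikipediaapi.Wikipedia` instance, though any object with the same `page()` method would work.

```python
from typing import Any, Optional


def short_summary(wiki: Any, title: str, max_chars: int = 100) -> Optional[str]:
    """Return the first max_chars characters of a page summary, or None.

    `wiki` is assumed to expose page(title), returning an object with
    exists() and summary, as in the signatures listed above.
    """
    page = wiki.page(title)
    if not page.exists():
        return None  # missing page: nothing to summarize
    return page.summary[:max_chars]
```

With a client created as in Basic Usage, this could be called as `short_summary(wiki, 'Python_(programming_language)')`. That call goes to the network, so it is left out of the block.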

### Content Extraction

Extract and access Wikipedia page content including summaries, full text, sections, and hierarchical page structure. Supports both WIKI and HTML formats with automatic section parsing.

```python { .api }
class WikipediaPage:
    @property
    def title(self) -> str: ...

    @property
    def language(self) -> str: ...

    @property
    def variant(self) -> Optional[str]: ...

    @property
    def namespace(self) -> int: ...

    @property
    def pageid(self) -> int: ...  # -1 if page doesn't exist

    @property
    def fullurl(self) -> str: ...

    @property
    def canonicalurl(self) -> str: ...

    @property
    def displaytitle(self) -> str: ...

    def exists(self) -> bool: ...

    @property
    def summary(self) -> str: ...

    @property
    def text(self) -> str: ...

    @property
    def sections(self) -> list[WikipediaPageSection]: ...

    def section_by_title(self, title: str) -> Optional[WikipediaPageSection]: ...

    def sections_by_title(self, title: str) -> list[WikipediaPageSection]: ...

class WikipediaPageSection:
    @property
    def title(self) -> str: ...

    @property
    def text(self) -> str: ...

    @property
    def level(self) -> int: ...

    @property
    def sections(self) -> list[WikipediaPageSection]: ...

    def section_by_title(self, title: str) -> Optional[WikipediaPageSection]: ...

    def full_text(self, level: int = 1) -> str: ...
```

[Content Extraction](./content-extraction.md)
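
Because each section exposes its own `sections` list, the hierarchy can be walked recursively. The sketch below flattens a section tree into indented title lines. The function name `outline` is hypothetical, and it relies only on the `title` and `sections` properties shown above, so it accepts any objects with those two attributes.

```python
from typing import Any, Iterable, List


def outline(sections: Iterable[Any], depth: int = 0) -> List[str]:
    """Flatten a section tree into indented title lines.

    Each item is assumed to expose `title` and `sections`, matching the
    WikipediaPageSection properties listed above.
    """
    lines: List[str] = []
    for section in sections:
        lines.append("  " * depth + section.title)
        # Recurse into nested subsections, one indent level deeper.
        lines.extend(outline(section.sections, depth + 1))
    return lines
```

Given a loaded page, `print("\n".join(outline(page.sections)))` would print a table-of-contents style view.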

### Page Navigation

Access Wikipedia's link structure including internal page links, backlinks, and language translations. Enables navigation between related pages and discovery of page relationships.

```python { .api }
class WikipediaPage:
    @property
    def links(self) -> dict[str, WikipediaPage]: ...

    @property
    def backlinks(self) -> dict[str, WikipediaPage]: ...

    @property
    def langlinks(self) -> dict[str, WikipediaPage]: ...
```

[Page Navigation](./page-navigation.md)
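
The navigation properties all return dictionaries of pages, so ordinary dictionary operations apply. The sketch below maps each `langlinks` key to the title of the linked page. The helper name `translated_titles` is hypothetical. The signature only says the keys are strings, so treating them as language codes is an assumption here.

```python
from typing import Any, Dict


def translated_titles(page: Any) -> Dict[str, str]:
    """Map each langlinks key to the linked page's title, sorted by key.

    `page` is assumed to expose `langlinks` as a dict whose values have a
    `title`; the keys are assumed (not confirmed above) to be language codes.
    """
    return {code: linked.title for code, linked in sorted(page.langlinks.items())}
```

The same comprehension works for `links` and `backlinks`, since they share the `dict[str, WikipediaPage]` shape.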

### Categories

Work with Wikipedia's category system including page categories and category membership. Enables discovery of related content and navigation of the category hierarchy.

```python { .api }
class WikipediaPage:
    @property
    def categories(self) -> dict[str, WikipediaPage]: ...

    @property
    def categorymembers(self) -> dict[str, WikipediaPage]: ...
```

[Categories](./categories.md)
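
Category membership can be traversed recursively, since a member may itself be a category. The sketch below collects the titles of non-category members down to a depth limit. The name `collect_members` and its `max_depth` parameter are hypothetical. The code assumes that members expose the `namespace`, `title`, and `categorymembers` properties listed in this document, and that subcategories carry the `CATEGORY` namespace value of 14 from Types and Constants.

```python
from typing import Any, List

CATEGORY_NS = 14  # value of Namespace.CATEGORY as listed under Types and Constants


def collect_members(category: Any, max_depth: int = 1, depth: int = 0) -> List[str]:
    """Collect titles of non-category members, descending into subcategories.

    `category` is assumed to expose `categorymembers` (a dict of pages), and
    each member `namespace` and `title`, as in the signatures above.
    """
    titles: List[str] = []
    for member in category.categorymembers.values():
        if member.namespace == CATEGORY_NS:
            # Subcategory: recurse only while the depth limit allows it.
            if depth < max_depth:
                titles.extend(collect_members(member, max_depth, depth + 1))
        else:
            titles.append(member.title)
    return titles
```

The depth limit keeps the traversal bounded even if subcategories are deeply nested, and it caps the number of lazy loads a real client would trigger.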

## Types and Constants

```python { .api }
class ExtractFormat(IntEnum):
    WIKI = 1  # Wiki format (allows recognizing subsections)
    HTML = 2  # HTML format (allows retrieval of HTML tags)

class Namespace(IntEnum):
    MAIN = 0
    TALK = 1
    USER = 2
    USER_TALK = 3
    WIKIPEDIA = 4
    WIKIPEDIA_TALK = 5
    FILE = 6
    FILE_TALK = 7
    MEDIAWIKI = 8
    MEDIAWIKI_TALK = 9
    TEMPLATE = 10
    TEMPLATE_TALK = 11
    HELP = 12
    HELP_TALK = 13
    CATEGORY = 14
    CATEGORY_TALK = 15
    PORTAL = 100
    PORTAL_TALK = 101
    PROJECT = 102
    PROJECT_TALK = 103
    REFERENCE = 104
    REFERENCE_TALK = 105
    BOOK = 108
    BOOK_TALK = 109
    DRAFT = 118
    DRAFT_TALK = 119
    EDUCATION_PROGRAM = 446
    EDUCATION_PROGRAM_TALK = 447
    TIMED_TEXT = 710
    TIMED_TEXT_TALK = 711
    MODULE = 828
    MODULE_TALK = 829
    GADGET = 2300
    GADGET_TALK = 2301
    GADGET_DEFINITION = 2302
    GADGET_DEFINITION_TALK = 2303

# Type aliases
PagesDict = dict[str, WikipediaPage]
WikiNamespace = Union[Namespace, int]

# Utility function
def namespace2int(namespace: WikiNamespace) -> int: ...
```
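
Since `WikiNamespace` accepts either an enum member or a plain integer, a conversion such as `namespace2int` presumably normalizes both forms to an `int`. The sketch below re-creates that idea under the hypothetical name `namespace_to_int`, using a three-member excerpt of the enum. It is an assumption about the behaviour, not the library's code.

```python
from enum import IntEnum
from typing import Union


class Namespace(IntEnum):
    """Three-member excerpt of the enum above, enough for the sketch."""
    MAIN = 0
    TALK = 1
    CATEGORY = 14


WikiNamespace = Union[Namespace, int]


def namespace_to_int(namespace: WikiNamespace) -> int:
    """Hypothetical re-creation of the conversion, not the library's code."""
    # IntEnum members are ints already; .value makes the plain int explicit.
    if isinstance(namespace, Namespace):
        return namespace.value
    return int(namespace)


as_enum = namespace_to_int(Namespace.CATEGORY)  # enum member in, plain int out
as_plain = namespace_to_int(14)                 # plain int passes through
```

Both calls yield the same integer, which is why the signatures above can accept either form for the `ns` parameter.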