
# Internet Archive Python Library

A comprehensive Python interface to archive.org for programmatic access to the Internet Archive's digital library. This library enables developers to search, download, upload, and manage items in the Internet Archive through both a Python API and command-line tools.

## Package Information

- **Package Name**: internetarchive
- **Language**: Python
- **Installation**: `pip install internetarchive`
- **Version**: 5.5.0
- **License**: AGPL-3.0

## Core Imports

```python
import internetarchive
```

Common imports for specific functionality:

```python
from internetarchive import get_item, search_items, get_session
from internetarchive import Item, Search, ArchiveSession
```

## Basic Usage

```python
import internetarchive

# Get an item from the Internet Archive
item = internetarchive.get_item('govlawgacode20071')
print(f"Item exists: {item.exists}")
print(f"Item title: {item.metadata.get('title')}")

# Download files from an item
item.download()

# Search for items
search = internetarchive.search_items('collection:nasa')
for result in search:
    print(f"Found: {result['identifier']} - {result.get('title', 'No title')}")

# Upload files to create or update an item
internetarchive.upload('my-item-identifier',
                       files=['local-file.txt'],
                       metadata={'title': 'My Item', 'creator': 'Your Name'})
```

## Architecture

The Internet Archive Python library follows a layered architecture:

- **ArchiveSession**: Core session management with persistent configuration and authentication
- **Item/Collection**: Object-oriented representation of Archive.org items and collections
- **File**: Individual file objects within items with download and management capabilities
- **Search**: Powerful search interface with result iteration and filtering
- **Catalog**: Task management system for Archive.org operations
- **CLI Tools**: Comprehensive command-line interface via the `ia` command

This design enables both high-level convenience functions and low-level session-based access patterns, supporting everything from simple file downloads to complex metadata operations and bulk processing workflows.
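
As a rough illustration of how the layers stack, the sketch below walks from a session to an item to its files. `list_file_names` is a hypothetical helper, not part of the library. It assumes the session object exposes `get_item()` and that items expose `get_files()` yielding objects with a `name` attribute, as `ArchiveSession`, `Item`, and `File` do.

```python
def list_file_names(session, identifier):
    """Return the sorted file names of one item, fetched through `session`.

    `session` is expected to behave like an ArchiveSession. This helper is
    illustrative only and is not part of the internetarchive package.
    """
    item = session.get_item(identifier)               # session layer -> item layer
    return sorted(f.name for f in item.get_files())   # item layer -> file layer
```

With the package installed, `list_file_names(internetarchive.get_session(), 'govlawgacode20071')` would fetch the item's metadata over the network and return its file names.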


## Capabilities

### Session Management

Create and manage persistent sessions with configuration, authentication, and HTTP adapter customization for efficient bulk operations.

```python { .api }
def get_session(config=None, config_file=None, debug=False, http_adapter_kwargs=None):
    """
    Return a new ArchiveSession object for persistent configuration across tasks.

    Args:
        config (dict, optional): Configuration dictionary
        config_file (str, optional): Path to configuration file
        debug (bool): Enable debug logging
        http_adapter_kwargs (dict, optional): HTTP adapter keyword arguments

    Returns:
        ArchiveSession: Session object for API interactions
    """
```

[Session Management](./session-management.md)
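
A minimal sketch of creating one session and reusing it, assuming credentials are passed in a config dictionary with an `s3` section (`access` and `secret` keys). `build_config` and `open_session` are hypothetical helper names used only for this example.

```python
def build_config(access_key, secret_key):
    """Build a config dict with an `s3` section holding an IA-S3 key pair."""
    return {"s3": {"access": access_key, "secret": secret_key}}


def open_session(access_key, secret_key, debug=False):
    """Create a single session intended to be reused across many requests."""
    # Imported lazily so build_config() stays usable without the package.
    from internetarchive import get_session

    return get_session(config=build_config(access_key, secret_key), debug=debug)
```

Passing the returned session as `archive_session=` to functions such as `get_item` or `search_items` avoids rebuilding configuration for every call.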


### Item Operations

Access, download, upload, and manage Archive.org items with comprehensive metadata support and file filtering capabilities.

```python { .api }
def get_item(identifier, config=None, config_file=None, archive_session=None, debug=False, http_adapter_kwargs=None, request_kwargs=None):
    """
    Get an Item object by Archive.org identifier.

    Args:
        identifier (str): The globally unique Archive.org item identifier
        config (dict, optional): Configuration dictionary
        config_file (str, optional): Path to configuration file
        archive_session (ArchiveSession, optional): Existing session object
        debug (bool): Enable debug logging
        http_adapter_kwargs (dict, optional): HTTP adapter kwargs
        request_kwargs (dict, optional): Request kwargs

    Returns:
        Item: Item object for the specified identifier
    """

def upload(identifier, files, metadata=None, headers=None, access_key=None, secret_key=None, queue_derive=None, verbose=False, verify=False, checksum=False, delete=False, retries=None, retries_sleep=None, debug=False, validate_identifier=False, request_kwargs=None, **get_item_kwargs):
    """
    Upload files to an Archive.org item (creates item if it doesn't exist).

    Args:
        identifier (str): Item identifier to upload to
        files (list): List of file paths or file-like objects to upload
        metadata (dict, optional): Item metadata
        headers (dict, optional): HTTP headers
        Various authentication and upload options...

    Returns:
        list: List of Request/Response objects from upload operations
    """

def download(identifier, files=None, formats=None, glob_pattern=None, dry_run=False, verbose=False, ignore_existing=False, checksum=False, checksum_archive=False, destdir=None, no_directory=False, retries=None, item_index=None, ignore_errors=False, on_the_fly=False, return_responses=False, no_change_timestamp=False, timeout=None, **get_item_kwargs):
    """
    Download files from an Archive.org item with extensive filtering options.

    Args:
        identifier (str): Item identifier to download from
        files (list, optional): Specific files to download
        formats (list, optional): File formats to download
        glob_pattern (str, optional): Glob pattern for file selection
        Various download configuration options...

    Returns:
        list: List of Request/Response objects from download operations
    """
```

[Item Operations](./item-operations.md)
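
The filtering options can be combined to preview a download before committing to it. The sketch below is one possible arrangement, using only keyword arguments from the `download` signature above. `preview_kwargs` and `preview_download` are hypothetical names, and the default `*.pdf` pattern is an arbitrary choice.

```python
def preview_kwargs(destdir, pattern="*.pdf"):
    """Keyword arguments for a dry-run download limited by a glob pattern."""
    return {
        "glob_pattern": pattern,   # only consider files matching this pattern
        "destdir": destdir,        # directory the files would be written to
        "ignore_existing": True,   # skip files that are already on disk
        "dry_run": True,           # report what would be fetched, download nothing
    }


def preview_download(identifier, destdir):
    """Run download() in dry-run mode for one item (needs network access)."""
    # Imported lazily so preview_kwargs() stays usable without the package.
    from internetarchive import download

    return download(identifier, **preview_kwargs(destdir))
```

Dropping `"dry_run": True` from the dictionary turns the same call into a real download.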


### Search Operations

Search the Internet Archive with advanced query syntax, field selection, sorting, and full-text search capabilities.

```python { .api }
def search_items(query, fields=None, sorts=None, params=None, full_text_search=False, dsl_fts=False, archive_session=None, config=None, config_file=None, http_adapter_kwargs=None, request_kwargs=None, max_retries=None):
    """
    Search for items on Archive.org with advanced filtering options.

    Args:
        query (str): Search query string
        fields (list, optional): Fields to return in results
        sorts (list, optional): Sort criteria
        params (dict, optional): Additional search parameters
        full_text_search (bool): Enable full-text search
        dsl_fts (bool): Enable DSL full-text search
        Various session and request options...

    Returns:
        Search: Search object for iterating over results
    """
```

[Search Operations](./search-operations.md)
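
Queries are plain strings such as `collection:nasa`, so building them from field/value pairs can be sketched as below. `build_query` and `find_identifiers` are hypothetical helpers, and joining clauses with `AND` is an assumption about how the filters should combine.

```python
from itertools import islice


def build_query(**filters):
    """Join field:value pairs with AND, quoting values that contain spaces."""
    parts = []
    for field, value in sorted(filters.items()):
        value = str(value)
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{field}:{value}")
    return " AND ".join(parts)


def find_identifiers(query, limit=10):
    """Return up to `limit` identifiers matching `query` (needs network access)."""
    # Imported lazily so build_query() stays usable without the package.
    from internetarchive import search_items

    results = search_items(query, fields=["identifier"])
    return [result["identifier"] for result in islice(results, limit)]
```

For example, `build_query(collection="nasa", mediatype="image")` produces `collection:nasa AND mediatype:image`, which can then be passed to `find_identifiers`.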


### File Management

Access and manage individual files within Archive.org items, including download, deletion, and metadata access.

```python { .api }
def get_files(identifier, files=None, formats=None, glob_pattern=None, exclude_pattern=None, on_the_fly=False, **get_item_kwargs):
    """
    Get File objects from an item with optional filtering.

    Args:
        identifier (str): Item identifier
        files (list, optional): Specific files to retrieve
        formats (list, optional): File formats to filter by
        glob_pattern (str, optional): Glob pattern for file selection
        exclude_pattern (str, optional): Glob pattern for exclusion
        on_the_fly (bool): Include on-the-fly files

    Returns:
        list: List of File objects
    """

def delete(identifier, files=None, formats=None, glob_pattern=None, cascade_delete=False, access_key=None, secret_key=None, verbose=False, debug=False, **kwargs):
    """
    Delete files from an Archive.org item.

    Args:
        identifier (str): Item identifier
        files (list, optional): Specific files to delete
        formats (list, optional): File formats to delete
        glob_pattern (str, optional): Glob pattern for file selection
        cascade_delete (bool): Delete derived files
        Various authentication and request options...

    Returns:
        list: List of Request/Response objects from delete operations
    """
```

[File Management](./file-management.md)
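
One use of `get_files` is to inspect files before downloading them. The sketch below totals file sizes, assuming each returned object carries a `size` attribute that is either a byte count or `None`. `total_size` and `pdf_bytes` are hypothetical helper names.

```python
def total_size(files):
    """Sum the sizes of File-like objects, skipping entries with no recorded size."""
    return sum(int(f.size) for f in files if f.size is not None)


def pdf_bytes(identifier):
    """Total size in bytes of an item's PDF files (needs network access)."""
    # Imported lazily so total_size() stays usable without the package.
    from internetarchive import get_files

    return total_size(get_files(identifier, glob_pattern="*.pdf"))
```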


### Metadata Operations

View and modify item metadata with support for appending, targeting specific metadata sections, and batch operations.

```python { .api }
def modify_metadata(identifier, metadata, target=None, append=False, append_list=False, priority=0, access_key=None, secret_key=None, debug=False, request_kwargs=None, **get_item_kwargs):
    """
    Modify metadata of an existing Archive.org item.

    Args:
        identifier (str): Item identifier
        metadata (dict): Metadata changes to apply
        target (str, optional): Target metadata section
        append (bool): Append to existing metadata
        append_list (bool): Append to metadata lists
        priority (int): Task priority
        Various authentication and request options...

    Returns:
        Request or Response: Metadata modification result
    """
```

[Metadata Operations](./metadata-operations.md)
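
A small sketch of sending only the fields that should change. `metadata_patch` and `retitle` are hypothetical helpers. Treating `None` as "leave this field alone" is a convention of this example, not a rule of the library.

```python
def metadata_patch(**fields):
    """Drop unset (None) fields so only the intended keys are sent."""
    return {key: value for key, value in fields.items() if value is not None}


def retitle(identifier, title, creator=None):
    """Change an item's title, and optionally its creator (needs credentials)."""
    # Imported lazily so metadata_patch() stays usable without the package.
    from internetarchive import modify_metadata

    response = modify_metadata(
        identifier,
        metadata=metadata_patch(title=title, creator=creator),
    )
    return response.status_code
```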


### Task Management

Manage Archive.org catalog tasks including derive operations, item processing, and task monitoring.

```python { .api }
def get_tasks(identifier="", params=None, config=None, config_file=None, archive_session=None, http_adapter_kwargs=None, request_kwargs=None):
    """
    Get tasks from the Archive.org catalog system.

    Args:
        identifier (str, optional): Filter tasks by item identifier
        params (dict, optional): Additional task query parameters
        Various session and request options...

    Returns:
        set: Set of CatalogTask objects
    """
```

[Task Management](./task-management.md)
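
Because `get_tasks` returns a set of task objects, a quick overview can be built by grouping them. The sketch below assumes each task exposes its command name as a `cmd` attribute and falls back to `"unknown"` when it does not. `count_by_command` and `task_summary` are hypothetical helper names.

```python
from collections import Counter


def count_by_command(tasks):
    """Count tasks per command name, tolerating tasks without a `cmd` attribute."""
    return Counter(getattr(task, "cmd", "unknown") for task in tasks)


def task_summary(identifier):
    """Summarize the tasks recorded for one item (needs network access)."""
    # Imported lazily so count_by_command() stays usable without the package.
    from internetarchive import get_tasks

    return count_by_command(get_tasks(identifier))
```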


### Configuration and Authentication

Configure the library with Archive.org credentials and retrieve user information.

```python { .api }
def configure(username="", password="", config_file="", host="archive.org"):
    """
    Configure internetarchive with Archive.org credentials.

    Args:
        username (str): Archive.org username
        password (str): Archive.org password
        config_file (str): Path to config file
        host (str): Archive.org host

    Returns:
        str: Path to configuration file
    """

def get_username(access_key, secret_key):
    """
    Get Archive.org username from IA-S3 key pair.

    Args:
        access_key (str): IA-S3 access key
        secret_key (str): IA-S3 secret key

    Returns:
        str: Archive.org username
    """

def get_user_info(access_key, secret_key):
    """
    Get detailed user information from IA-S3 key pair.

    Args:
        access_key (str): IA-S3 access key
        secret_key (str): IA-S3 secret key

    Returns:
        dict: User information dictionary
    """
```

[Configuration and Authentication](./configuration-auth.md)
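
A sketch of checking which account a key pair belongs to without hard-coding the keys. The environment variable names `IA_ACCESS_KEY` and `IA_SECRET_KEY` are hypothetical and chosen for this example only, as are the helper names `credentials_from_env` and `whoami`.

```python
import os


def credentials_from_env(env=None):
    """Read an IA-S3 key pair from a mapping (defaults to os.environ).

    The variable names IA_ACCESS_KEY and IA_SECRET_KEY are hypothetical,
    chosen for this example only.
    """
    env = os.environ if env is None else env
    try:
        return env["IA_ACCESS_KEY"], env["IA_SECRET_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing.args[0]}") from None


def whoami(env=None):
    """Look up the username behind the configured key pair (needs network access)."""
    # Imported lazily so credentials_from_env() stays usable without the package.
    from internetarchive import get_username

    access_key, secret_key = credentials_from_env(env)
    return get_username(access_key, secret_key)
```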


### Account Management

Administrative functions for managing Archive.org user accounts. Requires administrative privileges.

**Note:** The Account class is not part of the main public API but can be imported directly from `internetarchive.account`.

```python { .api }
# Import required for Account class
from internetarchive.account import Account

class Account:
    """
    Administrative interface for managing Archive.org user accounts.

    Note: Requires administrative privileges.
    """

    @classmethod
    def from_account_lookup(cls, identifier_type: str, identifier: str, session=None):
        """
        Factory method to get Account by identifier type and value.

        Args:
            identifier_type (str): Type of identifier ('email', 'screenname', 'itemname')
            identifier (str): The identifier value (e.g., 'user@example.com')
            session (ArchiveSession, optional): Session object to use

        Returns:
            Account: Account object with user information

        Raises:
            AccountAPIError: If account lookup fails or access denied
        """

    def lock(self, comment: str):
        """Lock the account with a comment."""

    def unlock(self, comment: str):
        """Unlock the account with a comment."""

    def to_dict(self):
        """Convert account data to dictionary."""
```

[Account Management](./account-management.md)

### Command Line Interface

Comprehensive command-line tools accessible through the `ia` command for all major Archive.org operations.

```python { .api }
# CLI Commands (accessed via command line):
# ia configure - Configure credentials
# ia upload - Upload files to items
# ia download - Download files from items
# ia delete - Delete files from items
# ia metadata - View/modify item metadata
# ia search - Search Archive.org
# ia list - List item files
# ia tasks - Manage catalog tasks
# ia copy - Copy files between items
# ia move - Move files between items
# ia account - Account management
# ia reviews - Manage item reviews
# ia flag - Flag items for review
```

[Command Line Interface](./cli-interface.md)

## Types

```python { .api }
class ArchiveSession:
    """Main session class for Internet Archive operations."""

class Item:
    """Represents an Archive.org item."""

class Collection:
    """Represents an Archive.org collection (extends Item)."""

class File:
    """Represents a file within an Archive.org item."""

class Search:
    """Represents a search query and results."""

class Catalog:
    """Interface to Archive.org catalog/tasks system."""

class CatalogTask:
    """Represents a catalog task."""

class Account:
    """Account management interface (requires admin privileges)."""

# Package metadata
__version__: str
"""Current version of the internetarchive package (5.5.0)."""

# Exceptions
class AuthenticationError(Exception):
    """Authentication failed."""

class ItemLocateError(Exception):
    """Item cannot be located (dark or non-existent)."""

class InvalidChecksumError(Exception):
    """File corrupt, checksums don't match."""

class AccountAPIError(Exception):
    """Account API-related errors."""
```