A Python interface to archive.org for programmatic access to the Internet Archive's digital library
The Internet Archive Python library provides a comprehensive command-line interface through the ia command, offering access to all major Archive.org operations directly from the terminal.
Set up and manage Archive.org credentials.
# Configure credentials interactively
ia configure
# Configure with specific credentials
ia configure --username your-email@example.com --password your-password
# Configure with specific config file
ia configure --config-file /path/to/config.ini

Manage Archive.org items and their metadata.
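Uploads often carry several --metadata flags, so a small shell helper can assemble them from key:value pairs. The sketch below is an illustration, not part of the ia tool: the function name ia_upload_with_meta is hypothetical, and it relies only on the ia upload syntax used in the examples that follow.

```shell
# Hypothetical helper (not provided by ia): upload one file and turn each
# KEY:VALUE argument into its own --metadata=KEY:VALUE flag.
# Usage: ia_upload_with_meta ITEM FILE KEY:VALUE [KEY:VALUE ...]
ia_upload_with_meta() {
    item=$1
    file=$2
    shift 2
    # Rotate through the remaining arguments: drop each pair from the front
    # and append it again as a --metadata flag.
    for pair in "$@"; do
        shift
        set -- "$@" "--metadata=$pair"
    done
    ia upload "$item" "$file" "$@"
}
```

Called as ia_upload_with_meta my-item-id file1.pdf 'title:My Document Collection' 'creator:Your Name', it would issue the same kind of command as the first upload example, for a single file.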
# Upload files to create or update an item
ia upload my-item-id file1.pdf file2.txt --metadata='title:My Document Collection' --metadata='creator:Your Name'
# Upload with metadata file
ia upload my-item-id files/ --metadata=metadata.json
# Upload with specific options
ia upload my-item-id file.pdf --verify --checksum --queue-derive

Download files from Archive.org items with extensive filtering options.
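One way to combine these filters safely is to preview a download before running it. The following is a minimal sketch under stated assumptions: ia_download_checked is a hypothetical function name, and the flags (--format, --destdir, --checksum, --dry-run) are used exactly as they appear in the examples that follow, which may differ between ia versions.

```shell
# Hypothetical helper (not provided by ia): run a dry run first and only
# start the real download if the preview succeeds.
# Usage: ia_download_checked ITEM DESTDIR FORMAT [FORMAT ...]
ia_download_checked() {
    item=$1
    destdir=$2
    shift 2
    # Convert each format name into a --format=NAME flag.
    for fmt in "$@"; do
        shift
        set -- "$@" "--format=$fmt"
    done
    # Preview; stop here if ia reports a problem.
    ia download "$item" "$@" --dry-run || return 1
    # Real download into the destination directory with checksum checking.
    ia download "$item" "$@" --destdir="$destdir" --checksum
}
```

For example, ia_download_checked example-item ./downloads pdf epub would first list what matches and then fetch the PDF and EPUB files.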
# Download all files from an item
ia download example-item
# Download specific files
ia download example-item file1.pdf file2.txt
# Download by format
ia download example-item --format=pdf --format=epub
# Download with pattern matching
ia download example-item --glob='*.txt'
# Download to specific directory
ia download example-item --destdir=./downloads
# Download with verification
ia download example-item --checksum --verify
# Dry run (show what would be downloaded)
ia download example-item --dry-run

Search Archive.org with advanced query options.
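Queries that combine several field:value clauses, such as 'collection:nasa AND mediatype:texts', can be built programmatically. The sketch below assumes a hypothetical helper named ia_build_query; it only joins clauses with AND and does not validate field names or escape special characters.

```shell
# Hypothetical helper (not provided by ia): join field:value clauses with AND.
# Usage: ia_build_query CLAUSE [CLAUSE ...]
ia_build_query() {
    query=""
    for clause in "$@"; do
        if [ -z "$query" ]; then
            query=$clause
        else
            query="$query AND $clause"
        fi
    done
    printf '%s\n' "$query"
}

# Possible use with the search command shown in this section:
#   ia search "$(ia_build_query collection:nasa mediatype:texts)" --fields=identifier,title
```

Keeping the query construction in one place makes it easier to reuse the same filter across search, reporting, and batch download scripts.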
# Basic search
ia search 'collection:nasa'
# Search with field selection
ia search 'collection:movies' --fields=identifier,title,creator
# Search with sorting
ia search 'collection:books' --sort='downloads desc'
# Search with output formats
ia search 'mediatype:texts' --output-format=json
ia search 'mediatype:texts' --output-format=csv
# Full-text search
ia search 'artificial intelligence' --full-text
# Advanced search with parameters
ia search 'collection:opensource' --rows=100 --page=2

List and delete files from items.
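Because deletion is destructive, it can help to list what a pattern matches before deleting. The following is a sketch, not an ia feature: ia_delete_matching is a hypothetical name, and it assumes that ia list and ia delete accept --glob as in the examples that follow.

```shell
# Hypothetical helper (not provided by ia): show the files a glob matches,
# then delete them; do nothing when the pattern matches no files.
# Usage: ia_delete_matching ITEM PATTERN
ia_delete_matching() {
    item=$1
    pattern=$2
    matches=$(ia list "$item" --glob="$pattern") || return 1
    if [ -z "$matches" ]; then
        echo "No files in $item match $pattern" >&2
        return 0
    fi
    # Print the matches so the caller can see what is about to be removed.
    printf '%s\n' "$matches"
    ia delete "$item" --glob="$pattern"
}
```

A call such as ia_delete_matching example-item '*_backup.*' prints the matching file names first and then issues the delete.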
# List all files in an item
ia list example-item
# List with specific formats
ia list example-item --format=pdf
# List with glob pattern
ia list example-item --glob='*.txt'
# List with detailed information
ia list example-item --columns=name,size,format,md5
# Delete files
ia delete example-item file1.pdf file2.txt
# Delete by format
ia delete example-item --format=tmp
# Delete with pattern
ia delete example-item --glob='*_backup.*'
# Delete with cascade (derived files)
ia delete example-item file.pdf --cascade

View and modify item metadata.
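When the same change has to be applied to many items, a loop that keeps going after a failure and reports the result at the end is usually more practical than stopping at the first error. This is a minimal sketch: ia_bulk_modify is a hypothetical name, and the only ia call it makes is ia metadata ITEM --modify=KEY:VALUE as in the examples that follow.

```shell
# Hypothetical helper (not provided by ia): apply one KEY:VALUE change to
# every identifier read from standard input, one identifier per line.
# Returns non-zero if any item could not be updated.
# Usage: printf '%s\n' item1 item2 | ia_bulk_modify 'subject:updated-2024'
ia_bulk_modify() {
    change=$1
    failed=0
    while IFS= read -r item; do
        [ -n "$item" ] || continue          # skip blank lines
        # </dev/null keeps ia from consuming the identifier list on stdin.
        if ! ia metadata "$item" --modify="$change" </dev/null; then
            echo "failed: $item" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

The failed identifiers are written to standard error, so they can be captured separately and retried.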
# View item metadata
ia metadata example-item
# View specific metadata fields
ia metadata example-item --fields=title,creator,date
# Modify metadata
ia metadata example-item --modify='title:New Title' --modify='creator:New Author'
# Append to metadata
ia metadata example-item --append='subject:new-keyword'
# Modify metadata from file
ia metadata example-item --modify=metadata.json
# Target specific metadata section
ia metadata example-item --target=files/document.pdf --modify='title:Chapter Title'

Manage Archive.org catalog tasks.
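Task state changes over time, so scripts typically poll until a condition holds. The sketch below is a generic bounded-retry loop rather than anything built into ia: wait_until is a hypothetical name, and the check command it runs is supplied by the caller.

```shell
# Hypothetical helper (not provided by ia): run a command until it succeeds,
# giving up after MAX_TRIES attempts and sleeping DELAY seconds in between.
# Usage: wait_until MAX_TRIES DELAY COMMAND [ARG ...]
wait_until() {
    tries=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$tries" ]; do
        "$@" && return 0
        # Only sleep if another attempt is still to come.
        if [ "$n" -lt "$tries" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Possible use, with the check written as its own function. The grep pattern
# is an assumption about the summary output format:
#   tasks_done() { ia tasks my-item --summary | grep -q "queued.*0.*running.*0"; }
#   wait_until 20 30 tasks_done
```

Unlike an unbounded while true loop, this returns a failure status once the attempts are used up, which lets a calling script decide what to do next.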
# View tasks for an item
ia tasks example-item
# View all user tasks
ia tasks --submitter=username
# View task summary
ia tasks example-item --summary
# View completed tasks
ia tasks example-item --history
# View queued/running tasks
ia tasks example-item --catalog
# Get task log
ia tasks --task-id=12345 --log

Copy or move files between items.
# Copy files between items
ia copy source-item target-item file1.pdf file2.txt
# Copy all files
ia copy source-item target-item
# Copy with metadata
ia copy source-item target-item --metadata='title:Copied Item'
# Move files between items
ia move source-item target-item file1.pdf
# Move with metadata update
ia move source-item target-item --metadata='collection:new-collection'

Manage Archive.org user accounts (requires admin privileges).
# View account information
ia account info username
# Lock account
ia account lock username --comment='Policy violation'
# Unlock account
ia account unlock username --comment='Issue resolved'

Manage item reviews.
# View item reviews
ia reviews example-item
# Submit item review
ia reviews example-item --review='Excellent content' --stars=5
# Moderate reviews (requires privileges)
ia reviews example-item --moderate --approve=review-id

Flag items for administrative review and content moderation.
# Flag an item with reason
ia flag example-item --reason='Copyright concern'
ia flag example-item --reason='Inappropriate content'
ia flag example-item --reason='Spam'
# View flags for an item
ia flag example-item --list
# Remove a flag (requires privileges)
ia flag example-item --unflag --reason='Issue resolved'
# Flag with detailed comment
ia flag example-item --reason='Copyright violation' --comment='DMCA request received'

Manage simple lists within collections for organizing items.
# Add items to a simple list
ia simplelists collection-name/list-name --add=item1,item2,item3
# Remove items from a simple list
ia simplelists collection-name/list-name --remove=item1,item2
# View items in a simple list
ia simplelists collection-name/list-name --view
# Create a new simple list
ia simplelists collection-name/new-list --create --add=initial-item
# List all simple lists in a collection
ia simplelists collection-name --list-all
# Clear all items from a simple list
ia simplelists collection-name/list-name --clear
# Copy items from one list to another
ia simplelists source-collection/source-list --copy-to=target-collection/target-list

Options available for most commands:
# Configuration
--config-file PATH # Use specific config file
--access-key KEY # Override access key
--secret-key KEY # Override secret key
# Output control
--verbose # Enable verbose output
--quiet # Suppress output
--debug # Enable debug logging
--no-color # Disable colored output
# Format options
--output-format FORMAT # Output format (json, csv, yaml)
--columns COLUMNS # Specify output columns

# Upload options
--metadata KEY:VALUE # Set metadata field
--header KEY:VALUE # Set HTTP header
--verify # Verify upload integrity
--checksum # Calculate checksums
--queue-derive # Queue derive task after upload
--delete # Delete local files after upload
--retries N # Number of retry attempts
--size-hint N # Expected upload size
--no-derive # Skip derive task
--spreadsheet # Upload spreadsheet data
--file-metadata # Include file-level metadata
--status-check # Check upload status

# Download options
--format FORMAT # Download specific formats
--glob PATTERN # Download files matching pattern
--exclude PATTERN # Exclude files matching pattern
--destdir PATH # Destination directory
--no-directory # Don't create item directory
--dry-run # Show what would be downloaded
--checksum # Verify checksums
--ignore-existing # Re-download existing files
--on-the-fly # Include on-the-fly files
--timeout N # Request timeout

# Search options
--fields FIELDS # Comma-separated field list
--sort CRITERIA # Sort criteria
--rows N # Number of results per page
--page N # Page number
--full-text # Enable full-text search
--dsl-fts # Enable DSL full-text search
--output-format FORMAT # Output format
--itemlist # Output as item list

# Flag options
--reason REASON # Reason for flagging
--comment COMMENT # Additional comment for flag
--list # List existing flags
--unflag # Remove a flag

# Simple list options
--add ITEMS # Add comma-separated items to list
--remove ITEMS # Remove comma-separated items from list
--view # View items in list
--create # Create new list
--list-all # List all lists in collection
--clear # Clear all items from list
--copy-to TARGET # Copy items to target list

# Create comprehensive item upload
ia upload my-research-2024 \
paper.pdf slides.pptx data.csv \
--metadata='title:My Research Project 2024' \
--metadata='creator:Dr. Jane Smith' \
--metadata='description:Research findings on climate change' \
--metadata='subject:climate change' \
--metadata='subject:research' \
--metadata='date:2024-01-15' \
--metadata='collection:opensource' \
--verify --checksum --queue-derive

# Download all PDFs from NASA collection items
ia search 'collection:nasa AND mediatype:texts' --itemlist |
while read -r identifier; do
    echo "Downloading PDFs from $identifier"
    ia download "$identifier" --format=pdf --destdir=./nasa-pdfs/
done

# Bulk metadata update
printf '%s\n' item1 item2 item3 | while read -r item; do
    ia metadata "$item" \
        --modify='subject:updated-2024' \
        --modify='contributor:Metadata Team' \
        --append
done

# Generate CSV report of collection items
ia search 'collection:mydata' \
--fields=identifier,title,creator,date,downloads \
--sort='downloads desc' \
--rows=1000 \
--output-format=csv > collection_report.csv

# Monitor derive tasks for an item
while true; do
    echo "$(date): Checking tasks for my-item"
    ia tasks my-item --summary
    # Check if no tasks pending
    if ia tasks my-item --summary | grep -q "queued.*0.*running.*0"; then
        echo "All tasks completed!"
        break
    fi
    sleep 30
done

#!/bin/bash
# Content moderation workflow
# Flag suspicious items
for item in $(ia search 'uploader:suspicious-user' --itemlist); do
    echo "Reviewing item: $item"
    # Check item metadata
    ia metadata "$item" --fields=title,description,creator
    # Flag for review if needed
    read -r -p "Flag this item? (y/n): " flag_choice
    if [ "$flag_choice" = "y" ]; then
        read -r -p "Enter reason: " reason
        ia flag "$item" --reason="$reason"
        echo "Flagged $item for review"
    fi
done

#!/bin/bash
# Manage featured items in a collection
COLLECTION="my-collection"
FEATURED_LIST="featured-items"
# Add new featured items
ia simplelists "$COLLECTION/$FEATURED_LIST" --add=item1,item2,item3
# View current featured items
echo "Current featured items:"
ia simplelists "$COLLECTION/$FEATURED_LIST" --view
# Remove outdated items
ia simplelists "$COLLECTION/$FEATURED_LIST" --remove=old-item1,old-item2
# Create a backup list
ia simplelists "$COLLECTION/featured-backup" --create
ia simplelists "$COLLECTION/$FEATURED_LIST" --copy-to="$COLLECTION/featured-backup"

The ia command looks for configuration in these locations:
~/.config/internetarchive/ia.ini (Linux/macOS)
~/.ia (legacy location)
A path passed with --config-file

Supported environment variables:
export IA_CONFIG_FILE=/path/to/config.ini
export IA_ACCESS_KEY=your-access-key
export IA_SECRET_KEY=your-secret-key

# Add tab completion (bash)
eval "$(ia --bash-completion)"
# Add to .bashrc for permanent completion
echo 'eval "$(ia --bash-completion)"' >> ~/.bashrc

# Enable debug output
ia --debug download example-item
# Check authentication
ia configure --check
# Test connection
ia metadata archive.org
# Validate identifiers
ia upload test-item file.txt --dry-run

# Enable detailed logging
ia --verbose --debug upload my-item files/ 2>&1 | tee upload.log
# Monitor long-running operations
ia download large-item --verbose | while IFS= read -r line; do
    echo "$(date): $line"
done