tessl/pypi-internetarchive

A Python interface to archive.org for programmatic access to the Internet Archive's digital library

Command Line Interface

The Internet Archive Python library ships a command-line interface, the ia command, which exposes the major Archive.org operations (configuration, upload, download, search, metadata editing, and task management) directly from the terminal.

Core Commands

Configuration

Set up and manage Archive.org credentials.

# Configure credentials interactively
ia configure

# Configure with specific credentials
ia configure --username your-email@example.com --password your-password

# Configure with specific config file
ia configure --config-file /path/to/config.ini

Item Operations

Manage Archive.org items and their metadata.

# Upload files to create or update an item
ia upload my-item-id file1.pdf file2.txt --metadata='title:My Document Collection' --metadata='creator:Your Name'

# Upload with metadata file
ia upload my-item-id files/ --metadata=metadata.json

# Upload with specific options
ia upload my-item-id file.pdf --verify --checksum --queue-derive

Download Operations

Download files from Archive.org items with extensive filtering options.

# Download all files from an item
ia download example-item

# Download specific files
ia download example-item file1.pdf file2.txt

# Download by format
ia download example-item --format=pdf --format=epub

# Download with pattern matching
ia download example-item --glob='*.txt'

# Download to specific directory
ia download example-item --destdir=./downloads

# Download with checksum verification
ia download example-item --checksum

# Dry run (show what would be downloaded)
ia download example-item --dry-run

Search Operations

Search Archive.org with advanced query options.

# Basic search
ia search 'collection:nasa'

# Search with field selection
ia search 'collection:movies' --fields=identifier,title,creator

# Search with sorting
ia search 'collection:books' --sort='downloads desc'

# Search with output formats
ia search 'mediatype:texts' --output-format=json
ia search 'mediatype:texts' --output-format=csv

# Full-text search
ia search 'artificial intelligence' --full-text

# Advanced search with parameters
ia search 'collection:opensource' --rows=100 --page=2

File Management

List and delete files from items.

# List all files in an item
ia list example-item

# List with specific formats
ia list example-item --format=pdf

# List with glob pattern
ia list example-item --glob='*.txt'

# List with detailed information
ia list example-item --columns=name,size,format,md5

# Delete files
ia delete example-item file1.pdf file2.txt

# Delete by format
ia delete example-item --format=tmp

# Delete with pattern
ia delete example-item --glob='*_backup.*'

# Delete with cascade (derived files)
ia delete example-item file.pdf --cascade

Metadata Operations

View and modify item metadata.

# View item metadata
ia metadata example-item

# View specific metadata fields
ia metadata example-item --fields=title,creator,date

# Modify metadata
ia metadata example-item --modify='title:New Title' --modify='creator:New Author'

# Append to metadata
ia metadata example-item --append='subject:new-keyword'

# Modify metadata from file
ia metadata example-item --modify=metadata.json

# Target specific metadata section
ia metadata example-item --target=files/document.pdf --modify='title:Chapter Title'

Task Management

Manage Archive.org catalog tasks.

# View tasks for an item
ia tasks example-item

# View all user tasks
ia tasks --submitter=username

# View task summary
ia tasks example-item --summary

# View completed tasks
ia tasks example-item --history

# View queued/running tasks
ia tasks example-item --catalog

# Get task log
ia tasks --task-id=12345 --log

Copy and Move Operations

Copy or move files between items.

# Copy files between items
ia copy source-item target-item file1.pdf file2.txt

# Copy all files
ia copy source-item target-item

# Copy with metadata
ia copy source-item target-item --metadata='title:Copied Item'

# Move files between items
ia move source-item target-item file1.pdf

# Move with metadata update
ia move source-item target-item --metadata='collection:new-collection'

Account Management

Manage Archive.org user accounts (requires admin privileges).

# View account information
ia account info username

# Lock account
ia account lock username --comment='Policy violation'

# Unlock account
ia account unlock username --comment='Issue resolved'

Review Management

Manage item reviews.

# View item reviews
ia reviews example-item

# Submit item review
ia reviews example-item --review='Excellent content' --stars=5

# Moderate reviews (requires privileges)
ia reviews example-item --moderate --approve=review-id

Flag Management

Flag items for administrative review and content moderation.

# Flag an item with reason
ia flag example-item --reason='Copyright concern'
ia flag example-item --reason='Inappropriate content'
ia flag example-item --reason='Spam'

# View flags for an item
ia flag example-item --list

# Remove a flag (requires privileges)
ia flag example-item --unflag --reason='Issue resolved'

# Flag with detailed comment
ia flag example-item --reason='Copyright violation' --comment='DMCA request received'

Simple Lists Management

Manage simple lists within collections for organizing items.

# Add items to a simple list
ia simplelists collection-name/list-name --add=item1,item2,item3

# Remove items from a simple list
ia simplelists collection-name/list-name --remove=item1,item2

# View items in a simple list
ia simplelists collection-name/list-name --view

# Create a new simple list
ia simplelists collection-name/new-list --create --add=initial-item

# List all simple lists in a collection
ia simplelists collection-name --list-all

# Clear all items from a simple list
ia simplelists collection-name/list-name --clear

# Copy items from one list to another
ia simplelists source-collection/source-list --copy-to=target-collection/target-list

Command Options and Parameters

Global Options

Options available for most commands:

# Configuration
--config-file PATH          # Use specific config file
--access-key KEY            # Override access key
--secret-key KEY            # Override secret key

# Output control
--verbose                   # Enable verbose output
--quiet                     # Suppress output
--debug                     # Enable debug logging
--no-color                  # Disable colored output

# Format options
--output-format FORMAT      # Output format (json, csv, yaml)
--columns COLUMNS           # Specify output columns

Upload-Specific Options

--metadata KEY:VALUE        # Set metadata field
--header KEY:VALUE          # Set HTTP header
--verify                    # Verify upload integrity
--checksum                  # Calculate checksums
--queue-derive              # Queue derive task after upload
--delete                    # Delete local files after upload
--retries N                 # Number of retry attempts
--size-hint N               # Expected upload size
--no-derive                 # Skip derive task
--spreadsheet               # Upload spreadsheet data
--file-metadata             # Include file-level metadata
--status-check              # Check upload status

Download-Specific Options

--format FORMAT             # Download specific formats
--glob PATTERN              # Download files matching pattern
--exclude PATTERN           # Exclude files matching pattern
--destdir PATH              # Destination directory
--no-directory              # Don't create item directory
--dry-run                   # Show what would be downloaded
--checksum                  # Verify checksums
--ignore-existing           # Re-download existing files
--on-the-fly                # Include on-the-fly files
--timeout N                 # Request timeout

Search-Specific Options

--fields FIELDS             # Comma-separated field list
--sort CRITERIA             # Sort criteria
--rows N                    # Number of results per page
--page N                    # Page number
--full-text                 # Enable full-text search
--dsl-fts                   # Enable DSL full-text search
--output-format FORMAT      # Output format
--itemlist                  # Output as item list

Flag-Specific Options

--reason REASON             # Reason for flagging
--comment COMMENT           # Additional comment for flag
--list                      # List existing flags
--unflag                    # Remove a flag

Simplelists-Specific Options

--add ITEMS                 # Add comma-separated items to list
--remove ITEMS              # Remove comma-separated items from list
--view                      # View items in list
--create                    # Create new list
--list-all                  # List all lists in collection
--clear                     # Clear all items from list
--copy-to TARGET            # Copy items to target list

Usage Examples

Complete Upload Workflow

# Create comprehensive item upload
ia upload my-research-2024 \
    paper.pdf slides.pptx data.csv \
    --metadata='title:My Research Project 2024' \
    --metadata='creator:Dr. Jane Smith' \
    --metadata='description:Research findings on climate change' \
    --metadata='subject:climate change' \
    --metadata='subject:research' \
    --metadata='date:2024-01-15' \
    --metadata='collection:opensource' \
    --verify --checksum --queue-derive
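
Before a multi-file upload like the one above, it can help to confirm that every local path exists, so a typo is caught before the upload starts. The following is a minimal sketch in POSIX sh; missing_files is a hypothetical helper name and is not part of the ia tool. It checks regular files only, so it does not cover directory arguments such as files/.

```shell
# missing_files: print each argument that is not a readable regular file.
# Returns 0 only when every argument passed the check.
# (Hypothetical helper for illustration; not provided by the ia command.)
missing_files() {
    missing=0
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            printf '%s\n' "$path"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

One possible use is to gate the upload on the check: missing_files paper.pdf slides.pptx data.csv && ia upload my-research-2024 paper.pdf slides.pptx data.csv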

Batch Download Operations

# Download all PDFs from NASA collection items
# --itemlist prints one bare identifier per line, which suits a read loop
ia search 'collection:nasa AND mediatype:texts' --itemlist |
while read -r identifier; do
    echo "Downloading PDFs from $identifier"
    ia download "$identifier" --format=pdf --destdir=./nasa-pdfs/
done

Metadata Management

# Bulk metadata update
# printf emits one identifier per line (plain echo does not expand \n in bash)
printf '%s\n' item1 item2 item3 | while read -r item; do
    ia metadata "$item" \
        --modify='subject:updated-2024' \
        --modify='contributor:Metadata Team'
done
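
For longer identifier lists, it may be useful to record which items failed instead of stopping silently. The sketch below is one way to do that, under stated assumptions: bulk_modify is a hypothetical helper, and IA_BIN is a hypothetical variable (not read by ia itself) that lets the loop be rehearsed with echo in place of the real command.

```shell
# bulk_modify: run "ia metadata ITEM ARGS..." for every identifier on stdin.
# Blank lines and lines starting with '#' are skipped.
# IA_BIN is a hypothetical override used only by this sketch.
bulk_modify() {
    failed=0
    while IFS= read -r item; do
        case "$item" in
            ''|'#'*) continue ;;
        esac
        # Redirect stdin so the command cannot consume the identifier list.
        if ! ${IA_BIN:-ia} metadata "$item" "$@" </dev/null; then
            echo "failed: $item" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every item succeeded.
    [ "$failed" -eq 0 ]
}
```

A rehearsal that prints the commands without running them could look like: printf '%s\n' item1 item2 | IA_BIN=echo bulk_modify --modify='subject:updated-2024'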

Search and Analysis

# Generate CSV report of collection items
ia search 'collection:mydata' \
    --fields=identifier,title,creator,date,downloads \
    --sort='downloads desc' \
    --rows=1000 \
    --output-format=csv > collection_report.csv

Task Monitoring

# Monitor derive tasks for an item
while true; do
    echo "$(date): Checking tasks for my-item"
    # Fetch the summary once so the printed and tested output are the same
    summary=$(ia tasks my-item --summary)
    echo "$summary"

    # Stop once the summary shows no queued or running tasks
    if echo "$summary" | grep -q "queued.*0.*running.*0"; then
        echo "All tasks completed!"
        break
    fi

    sleep 30
done
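
The loop above runs until the summary matches, so it never exits if a task stalls. A bounded variant is sketched below. Both wait_until and tasks_idle are hypothetical helper names; the grep pattern is copied from the loop above and depends on the exact format of the summary output, which is an assumption here.

```shell
# wait_until: retry a check command until it succeeds or attempts run out.
# Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Sleep only when another attempt will follow.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# tasks_idle: succeeds when the summary reports no queued or running tasks.
# Assumes the summary format matched by the pattern in the loop above.
tasks_idle() {
    ia tasks "$1" --summary | grep -q "queued.*0.*running.*0"
}
```

For example, wait_until 20 30 tasks_idle my-item gives up after 20 checks spaced 30 seconds apart and returns a non-zero status, which a calling script can use to report a timeout.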

Content Moderation Workflow

#!/bin/bash
# Content moderation workflow

# Flag suspicious items
for item in $(ia search 'uploader:suspicious-user' --itemlist); do
    echo "Reviewing item: $item"
    
    # Check item metadata
    ia metadata "$item" --fields=title,description,creator
    
    # Flag for review if needed
    read -p "Flag this item? (y/n): " flag_choice
    if [ "$flag_choice" = "y" ]; then
        read -p "Enter reason: " reason
        ia flag "$item" --reason="$reason"
        echo "Flagged $item for review"
    fi
done

Collection Management

#!/bin/bash
# Manage featured items in a collection

COLLECTION="my-collection"
FEATURED_LIST="featured-items"

# Add new featured items
ia simplelists "$COLLECTION/$FEATURED_LIST" --add=item1,item2,item3

# View current featured items
echo "Current featured items:"
ia simplelists "$COLLECTION/$FEATURED_LIST" --view

# Remove outdated items
ia simplelists "$COLLECTION/$FEATURED_LIST" --remove=old-item1,old-item2

# Create a backup list
ia simplelists "$COLLECTION/featured-backup" --create
ia simplelists "$COLLECTION/$FEATURED_LIST" --copy-to="$COLLECTION/featured-backup"

Configuration and Environment

Configuration File Locations

The ia command looks for configuration in these locations:

  1. ~/.config/internetarchive/ia.ini (Linux/macOS)
  2. ~/.ia (legacy location)
  3. Path specified with --config-file
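
To see which of these files is present on a given machine, a small lookup like the one below may help. It is a sketch only: find_ia_config is a hypothetical helper, and the precedence it uses (an explicit path first, then the two default locations in the order listed) is an assumption, since the list above does not state how ia itself resolves conflicts.

```shell
# find_ia_config: print the first existing config file, or return 1 if none.
# Assumed order (not confirmed by the ia documentation quoted here):
#   1. explicit path passed as $1, 2. ~/.config/internetarchive/ia.ini, 3. ~/.ia
find_ia_config() {
    explicit=${1:-}
    for candidate in "$explicit" \
        "$HOME/.config/internetarchive/ia.ini" \
        "$HOME/.ia"; do
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

Calling find_ia_config with no arguments checks only the two default locations; passing a path checks that path first.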

Environment Variables

Supported environment variables:

export IA_CONFIG_FILE=/path/to/config.ini
export IA_ACCESS_KEY=your-access-key
export IA_SECRET_KEY=your-secret-key
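
When debugging credentials, it can be handy to check which of these variables are set without echoing the keys themselves. The helper below is a hypothetical sketch (report_ia_env is not part of ia); it prints the config file path in full and masks the two key values by showing only their length.

```shell
# report_ia_env: show which of the three variables are set.
# Key values are never printed, only their character count.
report_ia_env() {
    for name in IA_CONFIG_FILE IA_ACCESS_KEY IA_SECRET_KEY; do
        # Indirect lookup of a fixed, known variable name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s: not set\n' "$name"
        elif [ "$name" = IA_CONFIG_FILE ]; then
            printf '%s: %s\n' "$name" "$value"
        else
            printf '%s: set (%s characters)\n' "$name" "${#value}"
        fi
    done
}
```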

Shell Integration

# Add tab completion (bash)
eval "$(ia --bash-completion)"

# Add to .bashrc for permanent completion
echo 'eval "$(ia --bash-completion)"' >> ~/.bashrc

Error Handling and Debugging

Common Error Resolution

# Enable debug output
ia --debug download example-item

# Check authentication
ia configure --check

# Test connection
ia metadata archive.org

# Validate identifiers
ia upload test-item file.txt --dry-run

Logging and Monitoring

# Enable detailed logging
ia --verbose --debug upload my-item files/ 2>&1 | tee upload.log

# Monitor long-running operations
ia download large-item --verbose | while IFS= read -r line; do
    echo "$(date): $line"
done

Install with Tessl CLI

npx tessl i tessl/pypi-internetarchive
