# Task Management

Task management operations provide access to Archive.org's catalog system for monitoring and managing background tasks such as derive operations, item processing, and maintenance tasks.

## Capabilities

### Task Retrieval

Retrieve tasks from the Archive.org catalog system with filtering options.

```python { .api }
def get_tasks(identifier="", params=None, config=None, config_file=None, archive_session=None, http_adapter_kwargs=None, request_kwargs=None):
    """
    Get tasks from the Archive.org catalog system.

    Args:
        identifier (str, optional): Filter tasks by item identifier
        params (dict, optional): Additional task query parameters:
            - 'catalog': bool, include queued/running tasks
            - 'history': bool, include completed tasks
            - 'summary': bool, return task count summary
            - 'submitter': str, filter by task submitter
            - 'cmd': str, filter by command type
        config (dict, optional): Configuration for new session
        config_file (str, optional): Config file for new session
        archive_session (ArchiveSession, optional): Existing session to use
        http_adapter_kwargs (dict, optional): HTTP adapter arguments
        request_kwargs (dict, optional): Additional request arguments

    Returns:
        set: Set of CatalogTask objects

    Raises:
        AuthenticationError: If authentication is required but not provided
        requests.RequestException: If API request fails
    """

class CatalogTask:
    """
    Represents a catalog task in the Archive.org system.
    """

    def __init__(self, archive_session, task_dict):
        """
        Initialize CatalogTask object.

        Args:
            archive_session (ArchiveSession): Session object
            task_dict (dict): Task data from API
        """
```

### Task Properties

Access task information and status.

```python { .api }
class CatalogTask:
    @property
    def task_id(self):
        """int: Unique task identifier."""

    @property
    def identifier(self):
        """str: Item identifier associated with this task."""

    @property
    def server(self):
        """str: Server processing this task."""

    @property
    def cmd(self):
        """str: Task command (e.g., 'derive.php', 'fixer.php')."""

    @property
    def args(self):
        """dict: Command arguments and parameters."""

    @property
    def submitter(self):
        """str: Username of task submitter."""

    @property
    def priority(self):
        """int: Task priority (-5 to 10, higher = more priority)."""

    @property
    def submittime(self):
        """str: ISO timestamp when task was submitted."""

    @property
    def color(self):
        """str: Task status color (green=completed, red=failed, blue=running, etc.)."""

    @property
    def category(self):
        """str: Task category."""
```

### Task Log Access

Retrieve detailed logs for specific tasks.

```python { .api }
class CatalogTask:
    @staticmethod
    def get_task_log(task_id, archive_session, request_kwargs=None):
        """
        Get log output for a specific task.

        Args:
            task_id (int): Task ID to get log for
            archive_session (ArchiveSession): Session object for authentication
            request_kwargs (dict, optional): Additional request arguments

        Returns:
            str: Task log content

        Raises:
            AuthenticationError: If authentication fails
            ValueError: If task_id is invalid
        """
```

### Catalog Interface

Direct interface to the catalog system for advanced operations.

```python { .api }
class Catalog:
    """
    Interface to Archive.org catalog/tasks system.
    """

    def __init__(self, archive_session, request_kwargs=None):
        """
        Initialize Catalog interface.

        Args:
            archive_session (ArchiveSession): Session object
            request_kwargs (dict, optional): Default request arguments
        """

    @property
    def session(self):
        """ArchiveSession: Session object used by this catalog."""

    @property
    def auth(self):
        """S3Auth: Authentication object if available."""

    @property
    def url(self):
        """str: Tasks API base URL."""
```

### Catalog Operations

Perform various catalog operations through the Catalog interface.

```python { .api }
class Catalog:
    def get_summary(self, identifier="", params=None):
        """
        Get task count summary by status.

        Args:
            identifier (str, optional): Filter by item identifier
            params (dict, optional): Additional query parameters

        Returns:
            dict: Task counts by status with keys:
                - 'queued': Number of queued tasks
                - 'running': Number of running tasks
                - 'finished': Number of finished tasks
                - 'failed': Number of failed tasks
        """

    def make_tasks_request(self, params):
        """
        Make a request to the Tasks API.

        Args:
            params (dict): Query parameters for the request

        Returns:
            Response: HTTP response from Tasks API
        """

    def iter_tasks(self, params=None):
        """
        Iterate over tasks with optional filtering.

        Args:
            params (dict, optional): Query parameters:
                - 'identifier': str, filter by item
                - 'submitter': str, filter by submitter
                - 'cmd': str, filter by command
                - 'catalog': bool, include queued/running
                - 'history': bool, include completed

        Yields:
            CatalogTask: Task objects matching criteria
        """

    def get_tasks(self, identifier="", params=None):
        """
        Get list of tasks with filtering.

        Args:
            identifier (str, optional): Filter by item identifier
            params (dict, optional): Query parameters

        Returns:
            list: List of CatalogTask objects
        """

    def submit_task(self, identifier, cmd, comment="", priority=0, data=None, headers=None):
        """
        Submit a new task to the catalog system.

        Args:
            identifier (str): Item identifier for the task
            cmd (str): Task command to execute:
                - 'derive.php': Generate derived files
                - 'fixer.php': Fix item issues
                - 'make_dark.php': Make item dark
                - 'make_undark.php': Make item public
            comment (str): Task comment/description
            priority (int): Task priority (-5 to 10)
            data (dict, optional): Additional task data
            headers (dict, optional): Additional HTTP headers

        Returns:
            Response: Task submission response

        Raises:
            AuthenticationError: If authentication fails
        """

    def get_rate_limit(self, cmd="derive.php"):
        """
        Get rate limit information for task commands.

        Args:
            cmd (str): Command to check rate limit for

        Returns:
            dict: Rate limit information with available quota
        """
```

### Session Task Operations

Task operations available through ArchiveSession.

```python { .api }
class ArchiveSession:
    def submit_task(self, identifier, cmd, comment="", priority=0, data=None, headers=None, reduced_priority=False, request_kwargs=None):
        """
        Submit a task to Archive.org catalog system.

        Args:
            identifier (str): Item identifier
            cmd (str): Task command
            comment (str): Task comment
            priority (int): Task priority
            data (dict, optional): Additional task data
            headers (dict, optional): HTTP headers
            reduced_priority (bool): Use reduced priority queue
            request_kwargs (dict, optional): Request arguments

        Returns:
            Response: Task submission response
        """

    def get_tasks(self, identifier="", params=None, request_kwargs=None):
        """
        Get tasks using this session.

        Returns:
            set: Set of CatalogTask objects
        """

    def get_my_catalog(self, params=None, request_kwargs=None):
        """
        Get current user's queued and running tasks.

        Returns:
            set: Set of user's CatalogTask objects
        """

    def get_task_log(self, task_id, request_kwargs=None):
        """
        Get log for specific task.

        Returns:
            str: Task log content
        """

    def iter_history(self, identifier=None, params=None, request_kwargs=None):
        """
        Iterate over completed tasks.

        Yields:
            CatalogTask: Completed task objects
        """

    def iter_catalog(self, identifier=None, params=None, request_kwargs=None):
        """
        Iterate over queued/running tasks.

        Yields:
            CatalogTask: Queued/running task objects
        """

    def get_tasks_summary(self, identifier="", params=None, request_kwargs=None):
        """
        Get task count summary.

        Returns:
            dict: Task counts by status
        """
```

## Usage Examples

### Basic Task Retrieval

```python
import internetarchive

# Get all tasks for an item
tasks = internetarchive.get_tasks('example-item')

print(f"Found {len(tasks)} tasks")
for task in tasks:
    print(f"Task {task.task_id}: {task.cmd} ({task.color})")
    print(f"  Submitted: {task.submittime}")
    print(f"  Priority: {task.priority}")
    print(f"  Server: {task.server}")
    print("---")
```

### Task Filtering

```python
import internetarchive

# Get only completed tasks
completed_tasks = internetarchive.get_tasks(
    'example-item',
    params={'history': True}
)

# Get only running/queued tasks
active_tasks = internetarchive.get_tasks(
    'example-item',
    params={'catalog': True}
)

# Get tasks by specific submitter
user_tasks = internetarchive.get_tasks(
    params={'submitter': 'username'}
)
```

### Task Status Monitoring

```python
import internetarchive
import time

# Monitor task progress
item = internetarchive.get_item('example-item')

# Submit a derive task
response = item.derive(priority=5)
print("Submitted derive task")

# Monitor until completion
while True:
    # Check if any tasks are pending
    if item.no_tasks_pending():
        print("All tasks completed!")
        break

    # Get task summary
    summary = item.get_task_summary()
    print(f"Tasks - Queued: {summary.get('queued', 0)}, Running: {summary.get('running', 0)}")

    time.sleep(30)  # Wait 30 seconds
```

### Task Log Analysis

```python
import internetarchive

# Get session for authentication
session = internetarchive.get_session()

# Get recent tasks
tasks = session.get_tasks('example-item', params={'history': True})

# Analyze failed tasks
failed_tasks = [task for task in tasks if task.color == 'red']

print(f"Found {len(failed_tasks)} failed tasks")

for task in failed_tasks[:5]:  # Check first 5 failed tasks
    print(f"\nFailed Task {task.task_id} ({task.cmd}):")

    # Get task log
    try:
        log = session.get_task_log(task.task_id)
        print(f"Log excerpt: {log[-500:]}")  # Last 500 characters
    except Exception as e:
        print(f"Could not retrieve log: {e}")
```

### Bulk Task Operations

```python
import internetarchive

# Submit derive tasks for multiple items
items_to_derive = ['item1', 'item2', 'item3']
session = internetarchive.get_session()

for identifier in items_to_derive:
    try:
        response = session.submit_task(
            identifier,
            'derive.php',
            comment='Bulk derive operation',
            priority=2
        )
        print(f"Submitted derive task for {identifier}")
    except Exception as e:
        print(f"Failed to submit task for {identifier}: {e}")
```

### Task Statistics

```python
import internetarchive
from collections import Counter

# Analyze task patterns
session = internetarchive.get_session()
tasks = session.get_tasks(params={'history': True})

# Count by command type
cmd_counts = Counter(task.cmd for task in tasks)
print("Tasks by command:")
for cmd, count in cmd_counts.most_common():
    print(f"  {cmd}: {count}")

# Count by status
status_counts = Counter(task.color for task in tasks)
print("\nTasks by status:")
for status, count in status_counts.most_common():
    print(f"  {status}: {count}")

# Average priority
if tasks:
    avg_priority = sum(task.priority for task in tasks) / len(tasks)
    print(f"\nAverage task priority: {avg_priority:.2f}")
```

### Rate Limit Management

```python
import internetarchive

# Check rate limits before submitting tasks
session = internetarchive.get_session()
catalog = internetarchive.Catalog(session)

# Check derive task rate limit
rate_limit = catalog.get_rate_limit('derive.php')
print(f"Derive tasks available: {rate_limit}")

if rate_limit.get('available', 0) > 0:
    # Submit task if quota available
    session.submit_task('my-item', 'derive.php')
    print("Task submitted")
else:
    print("Rate limit exceeded, waiting...")
```
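
### Polling with a Timeout

The loop in Task Status Monitoring runs until the item has no pending tasks, and the Rate Limit Management example reports that it is waiting without actually doing so. Both cases could share a small bounded polling helper. The sketch below is not part of the `internetarchive` library: `wait_for` and its parameters are hypothetical names introduced here for illustration. It polls any zero-argument callable until it returns `True` or a timeout elapses.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    `wait_for` is a hypothetical helper, not an internetarchive API.
    `sleep` and `clock` are injectable so the loop can be tested
    without real delays.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            # Give up instead of looping forever on a stuck task.
            return False
        sleep(interval)
```

Assuming `item` was obtained as in the monitoring example, a call such as `wait_for(item.no_tasks_pending, interval=30, timeout=3600)` would check every 30 seconds and stop after roughly an hour. Because the condition is an arbitrary callable, the same helper could wrap a rate limit check before a submission.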