# Metadata Operations

Metadata operations enable viewing and modifying Archive.org item metadata with support for various update strategies including appending, targeting specific sections, and batch operations.

## Capabilities

### Metadata Modification

Modify item metadata with flexible update strategies and priority control.

```python { .api }
def modify_metadata(identifier, metadata, target=None, append=False, append_list=False, priority=0, access_key=None, secret_key=None, debug=False, request_kwargs=None, **get_item_kwargs):
    """
    Modify metadata of an existing Archive.org item.

    Args:
        identifier (str): Item identifier
        metadata (dict): Metadata changes to apply with keys:
            - Standard fields: 'title', 'creator', 'description', 'date', 'subject'
            - Collection: 'collection' (str or list of collection identifiers)
            - Custom fields: Any valid metadata field name
        target (str, optional): Target specific metadata section:
            - 'metadata' (default): Modify main metadata
            - 'collection': Modify collection membership only
            - 'files/<filename>': Modify specific file metadata
        append (bool): Append values to existing metadata fields instead of replacing
        append_list (bool): Append to metadata list fields (like 'subject')
        priority (int): Task priority for metadata update (-5 to 10, higher = more priority)
        access_key (str, optional): IA-S3 access key (overrides config)
        secret_key (str, optional): IA-S3 secret key (overrides config)
        debug (bool): Enable debug logging
        request_kwargs (dict, optional): Additional request arguments
        **get_item_kwargs: Additional arguments passed to get_item

    Returns:
        Request or Response: Metadata modification result

    Raises:
        AuthenticationError: If authentication fails
        ItemLocateError: If item cannot be located
        ValueError: If metadata format is invalid
    """
```
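
To make the difference between the three update strategies concrete, here is a minimal local sketch. `preview_update` is a hypothetical helper, not part of the `internetarchive` library, and it never contacts Archive.org. It only models the intended semantics under stated assumptions: a plain update replaces the field, `append` joins the new text onto the existing string with a space, and `append_list` adds values that are not already present. The result produced by the real service may differ, so treat this as an illustration rather than a specification.

```python
def preview_update(current, changes, append=False, append_list=False):
    """Return a copy of `current` with `changes` applied locally.

    Hypothetical helper for illustration only; the merge rules below are
    assumptions about the intended semantics, not the library's code.
    """
    result = dict(current)
    for key, new_value in changes.items():
        old_value = result.get(key)
        if append_list and old_value is not None:
            # Assumed: treat the field as a list and add values not yet present.
            merged = list(old_value) if isinstance(old_value, list) else [old_value]
            additions = new_value if isinstance(new_value, list) else [new_value]
            for value in additions:
                if value not in merged:
                    merged.append(value)
            result[key] = merged
        elif append and old_value is not None:
            # Assumed: join the existing and new text with a single space.
            result[key] = f"{old_value} {new_value}"
        else:
            # Default strategy: replace the field outright.
            result[key] = new_value
    return result


current = {'title': 'Old Title', 'subject': ['history']}

replaced = preview_update(current, {'subject': ['maps']})
extended = preview_update(current, {'subject': ['maps', 'history']}, append_list=True)
appended = preview_update(current, {'title': '(revised)'}, append=True)
```

Previewing a change set this way can help catch an accidental replace of a list field such as `'subject'` before the real call is made.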

### Item Metadata Access

Access and refresh item metadata through Item objects.

```python { .api }
class Item:
    def modify_metadata(self, metadata, target=None, append=False, append_list=False, priority=0, access_key=None, secret_key=None, debug=False, request_kwargs=None):
        """
        Modify metadata of this item using the same parameters as the module function.

        Returns:
            Request or Response: Metadata modification result
        """

    def refresh(self, item_metadata=None, **kwargs):
        """
        Refresh item metadata from Archive.org.

        Args:
            item_metadata (dict, optional): Use specific metadata instead of fetching
            **kwargs: Additional arguments passed to get_metadata

        Note:
            Updates the item's metadata property with fresh data from Archive.org
        """

    @property
    def metadata(self):
        """
        dict: Complete item metadata dictionary containing:
            - 'identifier': Item identifier
            - 'title': Item title
            - 'creator': Creator/author information
            - 'description': Item description
            - 'date': Creation/publication date
            - 'subject': Subject tags/keywords
            - 'collection': Collections this item belongs to
            - 'mediatype': Media type (texts, movies, audio, etc.)
            - 'uploader': User who uploaded the item
            - 'addeddate': When item was added to Archive.org
            - 'publicdate': When item became publicly available
            - And many other Archive.org specific fields
        """
```
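
Fields in the metadata dictionary such as `'subject'` or `'collection'` can hold either a single string or a list, depending on how many values the item has. A small normalizing helper keeps the reading code uniform. In the sketch below, `as_list` is a hypothetical name and the sample dictionary is made up for illustration; it only mimics the shape of `Item.metadata`.

```python
def as_list(value):
    """Normalize a metadata value to a list of strings (hypothetical helper)."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return [str(v) for v in value]
    return [str(value)]


# Made-up metadata shaped like Item.metadata, for illustration only.
sample_metadata = {
    'identifier': 'example-item',
    'subject': 'maps',
    'collection': ['opensource', 'community'],
}

subjects = as_list(sample_metadata.get('subject'))
collections = as_list(sample_metadata.get('collection'))
creators = as_list(sample_metadata.get('creator'))  # field is absent here
```

With a real item, the same helper could be applied to `item.metadata.get('subject')` before iterating over the values.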

### Session Metadata Operations

Retrieve metadata through ArchiveSession objects.

```python { .api }
class ArchiveSession:
    def get_metadata(self, identifier, request_kwargs=None):
        """
        Get item metadata from Archive.org API.

        Args:
            identifier (str): Archive.org item identifier
            request_kwargs (dict, optional): Additional request arguments:
                - 'timeout': Request timeout in seconds
                - 'headers': Additional HTTP headers

        Returns:
            dict: Item metadata dictionary from API

        Raises:
            ItemLocateError: If item cannot be located
            requests.RequestException: If API request fails
        """
```
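
As a rough sketch of how `request_kwargs` and the returned dictionary fit together, the example below builds the keyword arguments and reads a few values from a response. It assumes the returned dictionary follows the Archive.org Metadata API layout, with top-level `'metadata'` and `'files'` keys. `summarize_item` is a hypothetical helper, and the response shown is made up so that the example runs without network access.

```python
def summarize_item(api_metadata):
    """Pull a few values out of a Metadata API style dictionary.

    Hypothetical helper; assumes top-level 'metadata' and 'files' keys.
    """
    item_md = api_metadata.get('metadata', {})
    files = api_metadata.get('files', [])
    return {
        'identifier': item_md.get('identifier'),
        'title': item_md.get('title'),
        'file_count': len(files),
    }


# Keyword arguments as described above; the values are illustrative.
request_kwargs = {'timeout': 30, 'headers': {'User-Agent': 'metadata-docs-example'}}

# With network access this could be:
#   session = internetarchive.get_session()
#   api_metadata = session.get_metadata('example-item', request_kwargs=request_kwargs)
# A made-up response is used here so the sketch runs offline.
api_metadata = {
    'metadata': {'identifier': 'example-item', 'title': 'Example Title'},
    'files': [{'name': 'chapter1.pdf'}, {'name': 'example-item_meta.xml'}],
}

summary = summarize_item(api_metadata)
```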

## Metadata Field Reference

### Standard Metadata Fields

Common metadata fields supported by Archive.org:

```python { .api }
# Core descriptive fields
metadata_fields = {
    'title': str,        # Item title
    'creator': str,      # Creator/author name
    'description': str,  # Item description
    'date': str,         # Creation/publication date (YYYY-MM-DD)
    'subject': list,     # Subject tags/keywords
    'collection': list,  # Collection identifiers
    'mediatype': str,    # Media type (texts, movies, audio, etc.)
    'language': str,     # Language code (eng, fra, etc.)
    'publisher': str,    # Publisher name
    'contributor': str,  # Contributors
    'coverage': str,     # Geographic/temporal coverage
    'rights': str,       # Rights/license information
    'source': str,       # Source information
    'relation': str,     # Related items
    'format': str,       # Physical format
    'type': str,         # Resource type
}

# Archive.org specific fields
archive_fields = {
    'identifier': str,   # Unique item identifier (read-only)
    'uploader': str,     # Username of uploader (read-only)
    'addeddate': str,    # Date added to archive (read-only)
    'publicdate': str,   # Date made public (read-only)
    'updatedate': str,   # Last update date (read-only)
    'scanner': str,      # Scanning equipment used
    'sponsor': str,      # Digitization sponsor
    'contributor': str,  # Additional contributors
    'call_number': str,  # Library call number
    'isbn': str,         # ISBN for books
    'oclc': str,         # OCLC number
    'lccn': str,         # Library of Congress Control Number
}
```
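
Because several of the Archive.org specific fields are marked read-only, it can help to drop them from a change set before submitting it. The sketch below uses a hypothetical `strip_read_only` helper and takes the read-only names from the listing above. It is a client-side convenience under that assumption, not a check performed by the library itself.

```python
# Field names marked read-only in the reference listing above.
READ_ONLY_FIELDS = frozenset(
    {'identifier', 'uploader', 'addeddate', 'publicdate', 'updatedate'}
)


def strip_read_only(changes):
    """Split a change set into writable changes and skipped read-only keys.

    Hypothetical helper for illustration; returns (writable_dict, skipped_keys).
    """
    writable = {k: v for k, v in changes.items() if k not in READ_ONLY_FIELDS}
    skipped = sorted(k for k in changes if k in READ_ONLY_FIELDS)
    return writable, skipped


# Made-up change set mixing writable and read-only fields.
changes = {
    'title': 'Updated Title',
    'identifier': 'other-id',
    'addeddate': '2020-01-01 00:00:00',
}

writable, skipped = strip_read_only(changes)
```

The `writable` dictionary could then be passed as the `metadata` argument of `modify_metadata`, while `skipped` can be logged for review.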

## Usage Examples

### Basic Metadata Modification

```python
import internetarchive

# Update basic metadata
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'title': 'Updated Title',
        'creator': 'New Author Name',
        'description': 'Updated description of the item',
        'subject': ['keyword1', 'keyword2', 'new-topic']
    }
)
```

### Append to Existing Metadata

```python
import internetarchive

# Append to existing subjects without replacing
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'subject': ['additional-keyword', 'another-topic']
    },
    append_list=True
)

# Append to description
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'description': '\n\nAdditional information appended to existing description.'
    },
    append=True
)
```

### Collection Management

```python
import internetarchive

# Add item to collections
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'collection': ['opensource', 'community']
    }
)

# Add to existing collections (append)
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'collection': ['new-collection']
    },
    append_list=True
)
```

### File-Specific Metadata

```python
import internetarchive

# Modify metadata for a specific file
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'title': 'Chapter 1: Introduction',
        'creator': 'Specific Author'
    },
    target='files/chapter1.pdf'
)
```

### Priority and Authentication

```python
import internetarchive

# High-priority metadata update with specific credentials
internetarchive.modify_metadata(
    'important-item',
    metadata={
        'title': 'Critical Update',
        'description': 'Updated with high priority'
    },
    priority=5,  # Higher priority
    access_key='your-access-key',
    secret_key='your-secret-key'
)
```

### Working with Item Objects

```python
import internetarchive

# Get item and modify metadata
item = internetarchive.get_item('my-item-identifier')

# Check current metadata
print(f"Current title: {item.metadata.get('title')}")
print(f"Current creator: {item.metadata.get('creator')}")

# Update metadata
item.modify_metadata({
    'title': 'New Title',
    'description': 'Updated description'
})

# Refresh to get updated metadata
item.refresh()
print(f"Updated title: {item.metadata.get('title')}")
```

### Batch Metadata Operations

```python
import internetarchive

# Update multiple items with similar metadata
items_to_update = ['item1', 'item2', 'item3']
common_metadata = {
    'creator': 'Updated Author',
    'subject': ['batch-update', 'corrected-metadata']
}

for identifier in items_to_update:
    try:
        internetarchive.modify_metadata(
            identifier,
            metadata=common_metadata,
            priority=1
        )
        print(f"Updated {identifier}")
    except Exception as e:
        print(f"Failed to update {identifier}: {e}")
```

### Metadata Validation and Cleanup

```python
import internetarchive

# Get item metadata for analysis
item = internetarchive.get_item('example-item')
metadata = item.metadata

# Clean up and standardize metadata
cleanup_metadata = {}

# Standardize date format
if 'date' in metadata:
    date_str = metadata['date']
    # Convert various date formats to YYYY-MM-DD
    if len(date_str) == 4:  # Year only
        cleanup_metadata['date'] = f"{date_str}-01-01"

# Ensure subjects are properly formatted
if 'subject' in metadata:
    subjects = metadata['subject']
    if isinstance(subjects, str):
        # Convert single string to list
        cleanup_metadata['subject'] = [subjects]
    else:
        # Clean up subject list
        cleanup_metadata['subject'] = [s.strip().lower() for s in subjects if s.strip()]

# Apply cleanup if needed
if cleanup_metadata:
    item.modify_metadata(cleanup_metadata)
    print(f"Applied metadata cleanup: {cleanup_metadata}")
```

### Metadata Field Analysis

```python
import internetarchive
from collections import Counter

# Analyze metadata across multiple items
search = internetarchive.search_items(
    'collection:opensource',
    fields=['identifier', 'title', 'creator', 'subject']
)

# Collect metadata statistics
all_subjects = []
creator_count = Counter()

for result in search:
    # Count creators; the field may be a single string or a list of names
    if 'creator' in result:
        creators = result['creator']
        if isinstance(creators, str):
            creators = [creators]
        creator_count.update(creators)

    # Collect all subjects
    if 'subject' in result:
        subjects = result['subject']
        if isinstance(subjects, list):
            all_subjects.extend(subjects)
        else:
            all_subjects.append(subjects)

# Analysis results
print("Top 10 creators:")
for creator, count in creator_count.most_common(10):
    print(f"  {creator}: {count} items")

print("\nTop 10 subjects:")
subject_count = Counter(all_subjects)
for subject, count in subject_count.most_common(10):
    print(f"  {subject}: {count} items")
```
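
For larger batches it can be useful to collect outcomes instead of only printing them, and to rehearse a run without touching Archive.org. The sketch below wraps the batch loop from the Batch Metadata Operations example in a hypothetical `batch_update` helper that takes the update function as a parameter. Passing `internetarchive.modify_metadata` would perform real updates; the `fake_update` stand-in used here is made up and only simulates them.

```python
def batch_update(identifiers, metadata, update_func, **update_kwargs):
    """Apply the same metadata to several items and collect the outcomes.

    Hypothetical helper; `update_func` is called as
    update_func(identifier, metadata=metadata, **update_kwargs).
    """
    succeeded = []
    failed = {}
    for identifier in identifiers:
        try:
            update_func(identifier, metadata=metadata, **update_kwargs)
            succeeded.append(identifier)
        except Exception as exc:
            # Record the error message and continue with the next item.
            failed[identifier] = str(exc)
    return succeeded, failed


def fake_update(identifier, metadata, priority=0):
    """Stand-in for internetarchive.modify_metadata; fails for one made-up item."""
    if identifier == 'item2':
        raise ValueError('simulated failure')


succeeded, failed = batch_update(
    ['item1', 'item2', 'item3'],
    {'creator': 'Updated Author'},
    update_func=fake_update,
    priority=1,
)
```

Injecting the update function keeps the loop testable and lets the same code serve as a dry run before a real batch.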