# Metadata Operations

Metadata operations enable viewing and modifying Archive.org item metadata with support for various update strategies including appending, targeting specific sections, and batch operations.

## Capabilities

### Metadata Modification

Modify item metadata with flexible update strategies and priority control.

```python { .api }
def modify_metadata(identifier, metadata, target=None, append=False, append_list=False, priority=0, access_key=None, secret_key=None, debug=False, request_kwargs=None, **get_item_kwargs):
    """
    Modify metadata of an existing Archive.org item.

    Args:
        identifier (str): Item identifier
        metadata (dict): Metadata changes to apply with keys:
            - Standard fields: 'title', 'creator', 'description', 'date', 'subject'
            - Collection: 'collection' (str or list of collection identifiers)
            - Custom fields: Any valid metadata field name
        target (str, optional): Target specific metadata section:
            - 'metadata' (default): Modify main metadata
            - 'collection': Modify collection membership only
            - 'files/<filename>': Modify specific file metadata
        append (bool): Append values to existing metadata fields instead of replacing
        append_list (bool): Append to metadata list fields (like 'subject')
        priority (int): Task priority for metadata update (-5 to 10, higher = more priority)
        access_key (str, optional): IA-S3 access key (overrides config)
        secret_key (str, optional): IA-S3 secret key (overrides config)
        debug (bool): Enable debug logging
        request_kwargs (dict, optional): Additional request arguments
        **get_item_kwargs: Additional arguments passed to get_item

    Returns:
        Request or Response: Metadata modification result

    Raises:
        AuthenticationError: If authentication fails
        ItemLocateError: If item cannot be located
        ValueError: If metadata format is invalid
    """
```

### Item Metadata Access

Access and refresh item metadata through Item objects.

```python { .api }
class Item:
    def modify_metadata(self, metadata, target=None, append=False, append_list=False, priority=0, access_key=None, secret_key=None, debug=False, request_kwargs=None):
        """
        Modify metadata of this item using the same parameters as the module function.

        Returns:
            Request or Response: Metadata modification result
        """

    def refresh(self, item_metadata=None, **kwargs):
        """
        Refresh item metadata from Archive.org.

        Args:
            item_metadata (dict, optional): Use specific metadata instead of fetching
            **kwargs: Additional arguments passed to get_metadata

        Note:
            Updates the item's metadata property with fresh data from Archive.org
        """

    @property
    def metadata(self):
        """
        dict: Complete item metadata dictionary containing:
            - 'identifier': Item identifier
            - 'title': Item title
            - 'creator': Creator/author information
            - 'description': Item description
            - 'date': Creation/publication date
            - 'subject': Subject tags/keywords
            - 'collection': Collections this item belongs to
            - 'mediatype': Media type (texts, movies, audio, etc.)
            - 'uploader': User who uploaded the item
            - 'addeddate': When item was added to Archive.org
            - 'publicdate': When item became publicly available
            - And many other Archive.org specific fields
        """
```

### Session Metadata Operations

Retrieve metadata through ArchiveSession objects.

```python { .api }
class ArchiveSession:
    def get_metadata(self, identifier, request_kwargs=None):
        """
        Get item metadata from Archive.org API.

        Args:
            identifier (str): Archive.org item identifier
            request_kwargs (dict, optional): Additional request arguments:
                - 'timeout': Request timeout in seconds
                - 'headers': Additional HTTP headers

        Returns:
            dict: Item metadata dictionary from API

        Raises:
            ItemLocateError: If item cannot be located
            requests.RequestException: If API request fails
        """
```
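
As a rough sketch of how `get_metadata` and its `request_kwargs` might be used together, the helper below forwards a timeout to the session call. `fetch_metadata` and `StubSession` are hypothetical names introduced only for illustration; the stub stands in for a real `ArchiveSession` so the snippet runs without network access, and the payload it returns is made up.

```python
def fetch_metadata(session, identifier, timeout=10):
    """Fetch item metadata, passing a request timeout (in seconds) via request_kwargs."""
    return session.get_metadata(identifier, request_kwargs={'timeout': timeout})


class StubSession:
    """Hypothetical stand-in for ArchiveSession; records the arguments it receives."""

    def __init__(self):
        self.calls = []

    def get_metadata(self, identifier, request_kwargs=None):
        self.calls.append((identifier, request_kwargs))
        # Made-up payload; a real session returns whatever the API sends back.
        return {'identifier': identifier}


stub = StubSession()
result = fetch_metadata(stub, 'my-item-identifier', timeout=5)
print(result)
```

With the real library, the session would presumably come from `internetarchive.get_session()` rather than the stub.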

## Metadata Field Reference

### Standard Metadata Fields

Common metadata fields supported by Archive.org:

```python { .api }
# Core descriptive fields
metadata_fields = {
    'title': str,          # Item title
    'creator': str,        # Creator/author name
    'description': str,    # Item description
    'date': str,           # Creation/publication date (YYYY-MM-DD)
    'subject': list,       # Subject tags/keywords
    'collection': list,    # Collection identifiers
    'mediatype': str,      # Media type (texts, movies, audio, etc.)
    'language': str,       # Language code (eng, fra, etc.)
    'publisher': str,      # Publisher name
    'contributor': str,    # Contributors
    'coverage': str,       # Geographic/temporal coverage
    'rights': str,         # Rights/license information
    'source': str,         # Source information
    'relation': str,       # Related items
    'format': str,         # Physical format
    'type': str,           # Resource type
}

# Archive.org specific fields
archive_fields = {
    'identifier': str,     # Unique item identifier (read-only)
    'uploader': str,       # Username of uploader (read-only)
    'addeddate': str,      # Date added to archive (read-only)
    'publicdate': str,     # Date made public (read-only)
    'updatedate': str,     # Last update date (read-only)
    'scanner': str,        # Scanning equipment used
    'sponsor': str,        # Digitization sponsor
    'contributor': str,    # Additional contributors
    'call_number': str,    # Library call number
    'isbn': str,           # ISBN for books
    'oclc': str,           # OCLC number
    'lccn': str,           # Library of Congress Control Number
}
```
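
Because several of the fields above are marked read-only, it can help to filter them out before sending an update. The helper below is a minimal sketch of that idea; `READ_ONLY_FIELDS` and `split_read_only` are hypothetical names, not part of the library, and the set simply mirrors the fields flagged read-only in the reference above.

```python
# Fields flagged "(read-only)" in the reference above.
READ_ONLY_FIELDS = frozenset(
    {'identifier', 'uploader', 'addeddate', 'publicdate', 'updatedate'}
)


def split_read_only(metadata):
    """Return (writable, rejected): fields that can be sent, and read-only field names dropped."""
    writable = {k: v for k, v in metadata.items() if k not in READ_ONLY_FIELDS}
    rejected = sorted(k for k in metadata if k in READ_ONLY_FIELDS)
    return writable, rejected


changes = {
    'title': 'Updated Title',
    'uploader': 'someone@example.com',
    'addeddate': '2020-01-01',
}
writable, rejected = split_read_only(changes)
print(writable)   # only the fields that are not read-only
print(rejected)   # names of the read-only fields that were dropped
```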

## Usage Examples

### Basic Metadata Modification

```python
import internetarchive

# Update basic metadata
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'title': 'Updated Title',
        'creator': 'New Author Name',
        'description': 'Updated description of the item',
        'subject': ['keyword1', 'keyword2', 'new-topic']
    }
)
```

### Append to Existing Metadata

```python
import internetarchive

# Append to existing subjects without replacing
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'subject': ['additional-keyword', 'another-topic']
    },
    append_list=True
)

# Append to description
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'description': '\n\nAdditional information appended to existing description.'
    },
    append=True
)
```
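
One way to picture the difference between replacing, `append`, and `append_list` is a purely local model that merges dictionaries. The sketch below does not call or reproduce the library; `preview_update` is a hypothetical helper, and both the plain string concatenation and the duplicate-skipping behaviour are assumptions made for illustration.

```python
def preview_update(existing, changes, append=False, append_list=False):
    """Return a copy of `existing` with `changes` applied under the chosen strategy (local model only)."""
    result = dict(existing)
    for key, new in changes.items():
        old = existing.get(key)
        if append_list and old is not None:
            old_list = list(old) if isinstance(old, list) else [old]
            new_list = list(new) if isinstance(new, list) else [new]
            # Assumption: values already present are not added a second time.
            result[key] = old_list + [v for v in new_list if v not in old_list]
        elif append and isinstance(old, str) and isinstance(new, str):
            # Assumption: plain concatenation; the real separator may differ.
            result[key] = old + new
        else:
            # Default strategy: the new value replaces the old one.
            result[key] = new
    return result


current = {'subject': ['history'], 'description': 'Original.'}
replaced = preview_update(current, {'subject': ['maps']})
extended = preview_update(current, {'subject': ['maps', 'history']}, append_list=True)
appended = preview_update(current, {'description': ' More.'}, append=True)
print(replaced['subject'], extended['subject'], appended['description'])
```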

### Collection Management

```python
import internetarchive

# Add item to collections
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'collection': ['opensource', 'community']
    }
)

# Add to existing collections (append)
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'collection': ['new-collection']
    },
    append_list=True
)
```

### File-Specific Metadata

```python
import internetarchive

# Modify metadata for a specific file
internetarchive.modify_metadata(
    'my-item-identifier',
    metadata={
        'title': 'Chapter 1: Introduction',
        'creator': 'Specific Author'
    },
    target='files/chapter1.pdf'
)
```
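
When several files in one item need their own metadata, the `files/<filename>` targets can be generated rather than typed by hand. The following is a small sketch under that assumption; `file_target` and `plan_file_updates` are hypothetical helpers, not library functions, and the file names are placeholders.

```python
def file_target(filename):
    """Build a target string of the form 'files/<filename>'."""
    if not filename or filename.startswith('/'):
        raise ValueError(f"expected a relative file name, got {filename!r}")
    return f"files/{filename}"


def plan_file_updates(per_file_metadata):
    """Turn {filename: metadata} into a list of (target, metadata) pairs."""
    return [(file_target(name), md) for name, md in per_file_metadata.items()]


plan = plan_file_updates({
    'chapter1.pdf': {'title': 'Chapter 1: Introduction'},
    'chapter2.pdf': {'title': 'Chapter 2: Methods'},
})
for target, md in plan:
    print(target, md)
```

Each pair could then be passed along as `modify_metadata(identifier, metadata=md, target=target)`.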

### Priority and Authentication

```python
import internetarchive

# High-priority metadata update with specific credentials
internetarchive.modify_metadata(
    'important-item',
    metadata={
        'title': 'Critical Update',
        'description': 'Updated with high priority'
    },
    priority=5,  # Higher priority
    access_key='your-access-key',
    secret_key='your-secret-key'
)
```
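
Since `priority` is documented above as ranging from -5 to 10, a caller may want to check a value before submitting an update. The guard below is only a sketch: `check_priority` is a hypothetical helper, and the bounds are taken from the parameter description in this document rather than from the library's code.

```python
# Range as stated in the priority parameter description above.
PRIORITY_MIN, PRIORITY_MAX = -5, 10


def check_priority(priority):
    """Return `priority` unchanged if it is an int within the documented range."""
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise TypeError(f"priority must be an int, got {type(priority).__name__}")
    if not PRIORITY_MIN <= priority <= PRIORITY_MAX:
        raise ValueError(
            f"priority must be between {PRIORITY_MIN} and {PRIORITY_MAX}, got {priority}"
        )
    return priority


print(check_priority(5))
```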

### Working with Item Objects

```python
import internetarchive

# Get item and modify metadata
item = internetarchive.get_item('my-item-identifier')

# Check current metadata
print(f"Current title: {item.metadata.get('title')}")
print(f"Current creator: {item.metadata.get('creator')}")

# Update metadata
item.modify_metadata({
    'title': 'New Title',
    'description': 'Updated description'
})

# Refresh to get updated metadata
item.refresh()
print(f"Updated title: {item.metadata.get('title')}")
```

### Batch Metadata Operations

```python
import internetarchive

# Update multiple items with similar metadata
items_to_update = ['item1', 'item2', 'item3']
common_metadata = {
    'creator': 'Updated Author',
    'subject': ['batch-update', 'corrected-metadata']
}

for identifier in items_to_update:
    try:
        internetarchive.modify_metadata(
            identifier,
            metadata=common_metadata,
            priority=1
        )
        print(f"Updated {identifier}")
    except Exception as e:
        # Report the failure and continue with the remaining items
        print(f"Failed to update {identifier}: {e}")
```

### Metadata Validation and Cleanup

```python
import internetarchive

# Get item metadata for analysis
item = internetarchive.get_item('example-item')
metadata = item.metadata

# Clean up and standardize metadata
cleanup_metadata = {}

# Standardize date format
if 'date' in metadata:
    date_str = metadata['date']
    # Convert various date formats to YYYY-MM-DD
    if len(date_str) == 4 and date_str.isdigit():  # Year only
        cleanup_metadata['date'] = f"{date_str}-01-01"

# Ensure subjects are properly formatted
if 'subject' in metadata:
    subjects = metadata['subject']
    if isinstance(subjects, str):
        # Convert single string to list
        cleanup_metadata['subject'] = [subjects]
    else:
        # Clean up subject list; only queue a change if something differs
        cleaned = [s.strip().lower() for s in subjects if s.strip()]
        if cleaned != subjects:
            cleanup_metadata['subject'] = cleaned

# Apply cleanup if needed
if cleanup_metadata:
    item.modify_metadata(cleanup_metadata)
    print(f"Applied metadata cleanup: {cleanup_metadata}")
```

### Metadata Field Analysis

```python
import internetarchive
from collections import Counter

# Analyze metadata across multiple items
search = internetarchive.search_items(
    'collection:opensource',
    fields=['identifier', 'title', 'creator', 'subject']
)

# Collect metadata statistics
all_subjects = []
creator_count = Counter()

for result in search:
    # Count creators (a field may hold a single string or a list of values)
    creators = result.get('creator', [])
    if isinstance(creators, str):
        creators = [creators]
    creator_count.update(creators)

    # Collect all subjects
    subjects = result.get('subject', [])
    if isinstance(subjects, str):
        subjects = [subjects]
    all_subjects.extend(subjects)

# Analysis results
print("Top 10 creators:")
for creator, count in creator_count.most_common(10):
    print(f"  {creator}: {count} items")

print("\nTop 10 subjects:")
subject_count = Counter(all_subjects)
for subject, count in subject_count.most_common(10):
    print(f"  {subject}: {count} items")
```