# Hash Operations and Verification

Data integrity operations including hash generation, validation, and file comparison, with support for multiple hash algorithms and checksum verification workflows.

## Capabilities

### Hash Generation and Validation

Generate hashes for files or validate existing checksums using the hash algorithms supported by each storage backend.

```python { .api }
def hash(hash: Union[str, HashTypes], path: str, download: bool = False,
         checkfile: Optional[str] = None, output_file: Optional[str] = None,
         args: Optional[List[str]] = None) -> Union[None, str, bool, Dict[str, str], Dict[str, bool]]:
    """
    Generates or validates file hashes using the specified algorithm.

    Parameters:
    - hash (Union[str, HashTypes]): Hash algorithm (e.g., 'sha1', HashTypes.md5)
    - path (str): File or directory path to hash
    - download (bool): Download files and hash them locally (useful when the backend does not support the algorithm)
    - checkfile (str, optional): Validate against an existing checksum file
    - output_file (str, optional): Write hashes to this file instead of returning them
    - args (List[str], optional): Additional rclone hashsum flags

    Returns:
    - None: When output_file is specified
    - str: Hash value for a single file (when no checkfile/output_file)
    - bool: Validation result for a single file (when checkfile is specified)
    - Dict[str, str]: Filename -> hash mapping for multiple files
    - Dict[str, bool]: Filename -> validation result for multiple files

    Raises:
        RcloneException: If the hash operation fails
    """
```
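
Because the return type depends on the arguments, callers typically dispatch on it with `isinstance` checks. The helper below is only an illustrative sketch of that dispatch (the function name and messages are not part of the library):

```python
from typing import Dict, Union

def describe_hash_result(result: Union[None, str, bool, Dict[str, str], Dict[str, bool]]) -> str:
    """Illustrative dispatch over the union returned by rclone.hash()."""
    if result is None:
        # output_file was given; nothing is returned
        return "hashes written to output_file"
    if isinstance(result, bool):
        # single file validated against a checkfile
        return "valid" if result else "invalid"
    if isinstance(result, str):
        # single file hashed
        return f"hash: {result}"
    # dict case: filename -> hash or filename -> validation result
    return f"{len(result)} files"
```

Checking `bool` before `str` does not matter here (a `bool` is never a `str`), but the `dict` case must come last since both dict variants share the same runtime type.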

### File Comparison

Compare files between source and destination to identify differences, missing files, and data integrity issues.

```python { .api }
def check(source: str, dest: str, combined: Optional[str] = None, size_only: bool = False,
          download: bool = False, one_way: bool = False, args: Optional[List[str]] = None
          ) -> Tuple[bool, List[Tuple[str, str]]]:
    """
    Compares files between source and destination locations.

    Parameters:
    - source (str): Source path to compare from
    - dest (str): Destination path to compare to
    - combined (str, optional): Output file path for detailed results
    - size_only (bool): Compare only file sizes, not hashes (faster)
    - download (bool): Download files for local comparison
    - one_way (bool): Only check source -> dest; ignore extra files in dest
    - args (List[str], optional): Additional rclone check flags

    Returns:
    Tuple[bool, List[Tuple[str, str]]]:
    - bool: True if all files match
    - List of (symbol, filepath) tuples, where the symbols mean:
        "=" - File identical in source and destination
        "-" - File missing from source (only in destination)
        "+" - File missing from destination (only in source)
        "*" - File exists in both but differs
        "!" - Error reading or hashing file

    Raises:
        RcloneException: If the comparison operation fails
    """
```

## Usage Examples

### Basic Hash Generation

```python
from rclone_python import rclone
from rclone_python.hash_types import HashTypes

# Generate a SHA1 hash for a single file
sha1_hash = rclone.hash(HashTypes.sha1, 'onedrive:document.pdf')
print(f"SHA1: {sha1_hash}")

# Generate MD5 hashes for a directory (returns a filename -> hash dict)
md5_hashes = rclone.hash('md5', 'dropbox:photos')
for filename, hash_value in md5_hashes.items():
    print(f"{filename}: {hash_value}")

# Hash algorithms can also be passed as plain strings
sha256_hash = rclone.hash('sha256', 'box:archive.zip')
```

### Hash Validation

```python
from rclone_python import rclone
from rclone_python.hash_types import HashTypes

# Create a checksum file first
rclone.hash(HashTypes.sha1, 'backup:files', output_file='checksums.sha1')

# Later, validate against the checksum file
validation_results = rclone.hash(
    HashTypes.sha1,
    'backup:files',
    checkfile='checksums.sha1'
)

# Check the results
if isinstance(validation_results, dict):
    # Multiple files: filename -> bool mapping
    for filename, is_valid in validation_results.items():
        status = "✓ Valid" if is_valid else "✗ Invalid"
        print(f"{filename}: {status}")
else:
    # Single file: a bare bool
    print(f"File validation: {'✓ Valid' if validation_results else '✗ Invalid'}")
```

### File Comparison

```python
from rclone_python import rclone

# Compare directories
matches, file_list = rclone.check('local:source', 'onedrive:backup')

print(f"Directories match: {matches}")
print("\nFile comparison results:")

# Map rclone's result symbols to human-readable labels
status_map = {
    '=': '✓ Identical',
    '+': '→ Missing from destination',
    '-': '← Extra in destination',
    '*': '≠ Different',
    '!': '⚠ Error'
}

for symbol, filepath in file_list:
    status = status_map.get(symbol, symbol)
    print(f"{filepath}: {status}")
```

### Comprehensive Data Integrity Check

```python
from rclone_python import rclone

def verify_backup_integrity(source_path, backup_path):
    """Comprehensive backup integrity verification."""
    print("Verifying backup integrity...")
    print(f"Source: {source_path}")
    print(f"Backup: {backup_path}")

    # First, do a quick size-only check
    print("\n1. Quick size comparison...")
    size_match, size_results = rclone.check(
        source_path, backup_path,
        size_only=True
    )

    if not size_match:
        print("⚠ Size differences detected:")
        for symbol, filepath in size_results:
            if symbol != '=':
                print(f"  {symbol} {filepath}")

        # Ask whether to continue with the slower hash check
        response = input("\nContinue with hash verification? (y/n): ")
        if response.lower() != 'y':
            return False

    # Full hash-based comparison
    print("\n2. Hash-based verification...")
    hash_match, hash_results = rclone.check(source_path, backup_path)

    # Categorize results by symbol
    identical = [f for s, f in hash_results if s == '=']
    different = [f for s, f in hash_results if s == '*']
    missing_dest = [f for s, f in hash_results if s == '+']
    extra_dest = [f for s, f in hash_results if s == '-']
    errors = [f for s, f in hash_results if s == '!']

    print("\nResults:")
    print(f"  ✓ Identical files: {len(identical)}")
    print(f"  ≠ Different files: {len(different)}")
    print(f"  → Missing from backup: {len(missing_dest)}")
    print(f"  ← Extra in backup: {len(extra_dest)}")
    print(f"  ⚠ Errors: {len(errors)}")

    # Show the first few problem files
    if different:
        print("\nFiles with differences:")
        for filepath in different[:10]:  # Show first 10
            print(f"  {filepath}")

    if missing_dest:
        print("\nFiles missing from backup:")
        for filepath in missing_dest[:10]:
            print(f"  {filepath}")

    return hash_match

# Run the integrity check
verify_backup_integrity('important_data', 'onedrive:backup/important_data')
```

### Hash-based File Deduplication

```python
from collections import defaultdict

from rclone_python import rclone
from rclone_python.hash_types import HashTypes

def find_duplicate_files(path):
    """Find duplicate files based on their hash values."""
    print(f"Scanning for duplicates in: {path}")

    # Generate hashes for all files
    hashes = rclone.hash(HashTypes.md5, path)

    if isinstance(hashes, str):
        print("Only one file found, no duplicates possible")
        return

    # Group files by hash
    hash_groups = defaultdict(list)
    for filename, hash_value in hashes.items():
        hash_groups[hash_value].append(filename)

    # Keep only hashes shared by more than one file
    duplicates = {h: files for h, files in hash_groups.items() if len(files) > 1}

    if not duplicates:
        print("No duplicate files found")
        return

    print(f"\nFound {len(duplicates)} sets of duplicate files:")

    # List the directory once, then look up sizes per file
    file_list = rclone.ls(path, files_only=True)
    sizes_by_name = {f['Name']: f['Size'] for f in file_list}

    total_wasted_space = 0
    for hash_value, files in duplicates.items():
        print(f"\nDuplicate set (hash: {hash_value[:8]}...):")

        file_sizes = []
        for filename in files:
            size = sizes_by_name.get(filename)
            if size is not None:
                file_sizes.append(size)
                print(f"  {filename} ({size} bytes)")

        if file_sizes:
            # All but the first copy are wasted space
            total_wasted_space += sum(file_sizes[1:])

    print(f"\nTotal wasted space: {total_wasted_space / (1024**2):.2f} MB")

# Find duplicates in a directory
find_duplicate_files('downloads:')
```

### Periodic Integrity Monitoring

```python
import json
import os
from datetime import datetime

from rclone_python import rclone
from rclone_python.hash_types import HashTypes

def create_integrity_baseline(path, baseline_file):
    """Create an integrity baseline for later monitoring."""
    print(f"Creating integrity baseline for: {path}")

    # Generate hashes for all files
    hashes = rclone.hash(HashTypes.sha256, path)

    # Get file metadata
    files = rclone.ls(path, files_only=True)
    file_metadata = {f['Name']: f for f in files}

    # Create the baseline data
    baseline = {
        'created': datetime.now().isoformat(),
        'path': path,
        'files': {}
    }

    if isinstance(hashes, dict):
        for filename, hash_value in hashes.items():
            metadata = file_metadata.get(filename, {})
            baseline['files'][filename] = {
                'hash': hash_value,
                'size': metadata.get('Size', 0),
                'modified': metadata.get('ModTime', '')
            }

    # Save the baseline
    with open(baseline_file, 'w') as f:
        json.dump(baseline, f, indent=2)

    print(f"Baseline saved to: {baseline_file}")
    print(f"Tracked files: {len(baseline['files'])}")

def check_against_baseline(path, baseline_file):
    """Check the current state against an integrity baseline."""
    if not os.path.exists(baseline_file):
        print(f"Baseline file not found: {baseline_file}")
        return False

    # Load the baseline
    with open(baseline_file, 'r') as f:
        baseline = json.load(f)

    print(f"Checking integrity against baseline from {baseline['created']}")

    # Get the current hashes
    current_hashes = rclone.hash(HashTypes.sha256, path)

    if isinstance(current_hashes, str):
        # Single file: key it by its filename
        current_hashes = {path.split('/')[-1]: current_hashes}

    issues = []

    # Check for modified files
    for filename, current_hash in current_hashes.items():
        if filename in baseline['files']:
            baseline_hash = baseline['files'][filename]['hash']
            if current_hash != baseline_hash:
                issues.append(f"MODIFIED: {filename}")

    # Check for missing files
    for filename in baseline['files']:
        if filename not in current_hashes:
            issues.append(f"MISSING: {filename}")

    # Check for new files
    for filename in current_hashes:
        if filename not in baseline['files']:
            issues.append(f"NEW: {filename}")

    if issues:
        print("\n⚠ Integrity issues detected:")
        for issue in issues:
            print(f"  {issue}")
    else:
        print("✓ All files match baseline - integrity verified")

    return len(issues) == 0

# Create the baseline
create_integrity_baseline('critical_data:', 'critical_data_baseline.json')

# Later, check integrity
check_against_baseline('critical_data:', 'critical_data_baseline.json')
```

## Supported Hash Algorithms

The HashTypes enum provides access to the hash algorithms supported by rclone:

### Common Hash Types

- **md5**: MD5 hash (widely supported)
- **sha1**: SHA-1 hash (common for Git and legacy systems)
- **sha256**: SHA-256 hash (recommended for security)
- **sha512**: SHA-512 hash (largest digest)

### Specialized Hash Types

- **crc32**: CRC32 checksum (fast error detection)
- **dropbox**: Dropbox content hash
- **quickxor**: Microsoft OneDrive QuickXOR
- **whirlpool**: Whirlpool hash algorithm

### Backend-Specific Hashes

- **hidrive**: HiDrive-specific hash
- **mailru**: Mail.ru Cloud hash
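
For reference, HashTypes behaves like a standard Python Enum whose members carry the rclone algorithm name in `.value`. The stand-in enum below mirrors a few of those members purely for illustration; the real enum ships with `rclone_python.hash_types`, so this copy is only a sketch:

```python
from enum import Enum

# Illustrative stand-in for rclone_python.hash_types.HashTypes;
# member values are the algorithm names passed to `rclone hashsum`.
class HashTypes(Enum):
    md5 = "md5"
    sha1 = "sha1"
    sha256 = "sha256"
    crc32 = "crc32"

# Either the enum member or its string value can name an algorithm,
# which is why rclone.hash() accepts Union[str, HashTypes]
algorithm_names = [h.value for h in HashTypes]
```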

## Hash Algorithm Selection

Choose a hash algorithm based on your needs and on what the backend supports:

```python
from rclone_python import rclone
from rclone_python.hash_types import HashTypes

def check_backend_hashes(remote_path):
    """Test which hash algorithms work with a backend."""
    test_algorithms = [HashTypes.md5, HashTypes.sha1, HashTypes.sha256, HashTypes.crc32]
    supported = []

    for algorithm in test_algorithms:
        try:
            rclone.hash(algorithm, remote_path)
            supported.append(algorithm.value)
        except Exception:
            pass  # Algorithm not supported by this backend

    print(f"Supported hashes for {remote_path}: {supported}")
    return supported

# Test backend support
check_backend_hashes('onedrive:test.txt')
```

## Advanced Verification Workflows

### Multi-Algorithm Verification

```python
from rclone_python import rclone
from rclone_python.hash_types import HashTypes

def multi_hash_verification(path):
    """Verify files using multiple hash algorithms."""
    algorithms = [HashTypes.md5, HashTypes.sha1, HashTypes.sha256]
    results = {}

    for algorithm in algorithms:
        try:
            hashes = rclone.hash(algorithm, path)
            results[algorithm.value] = hashes
            count = len(hashes) if isinstance(hashes, dict) else 1
            print(f"✓ {algorithm.value}: Generated hashes for {count} files")
        except Exception as e:
            print(f"✗ {algorithm.value}: Failed - {e}")

    return results

# Generate multiple hash types
multi_hashes = multi_hash_verification('important:files/')
```