# Hash Operations and Verification

Data integrity operations including hash generation, validation, and file comparison, with support for multiple hash algorithms and comprehensive checksum verification workflows.

## Capabilities

### Hash Generation and Validation

Generate hashes for files or validate existing checksums using various hash algorithms supported by different storage backends.

```python { .api }
def hash(hash: Union[str, HashTypes], path: str, download=False,
         checkfile: Optional[str] = None, output_file: Optional[str] = None,
         args: List[str] = None) -> Union[None, str, bool, Dict[str, str], Dict[str, bool]]:
    """
    Generates or validates file hashes using the specified algorithm.

    Parameters:
    - hash (Union[str, HashTypes]): Hash algorithm (e.g., 'sha1', HashTypes.md5)
    - path (str): File or directory path to hash
    - download (bool): Download files to hash locally (useful for unsupported algorithms)
    - checkfile (str, optional): Validate against an existing checksum file
    - output_file (str, optional): Write hashes to a file instead of returning them
    - args (List[str]): Additional rclone hashsum flags

    Returns:
    - None: When output_file is specified
    - str: Hash value for a single file (when no checkfile/output_file)
    - bool: Validation result for a single file (when checkfile is specified)
    - Dict[str, str]: Filename -> hash mapping for multiple files
    - Dict[str, bool]: Filename -> validation result for multiple files

    Raises:
    RcloneException: If the hash operation fails
    """
```
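Passing `download=True` makes rclone fetch the data and hash it locally when a backend cannot compute the requested algorithm server-side. For a file that is already on disk, the same digest can be reproduced with Python's standard `hashlib`; a minimal sketch (the helper name and file path are illustrative, not part of rclone_python):

```python
import hashlib

def local_file_hash(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a local file in chunks so large files never need to fit in memory."""
    h = hashlib.new(algorithm)  # accepts 'md5', 'sha1', 'sha256', ...
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Cross-check a locally downloaded copy against a digest from rclone.hash():
# local_file_hash("document.pdf", "sha1") == sha1_hash_from_rclone
```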

### File Comparison

Compare files between source and destination to identify differences, missing files, and data integrity issues.

```python { .api }
def check(source: str, dest: str, combined: str = None, size_only: bool = False,
          download: bool = False, one_way: bool = False, args: List[str] = None
          ) -> Tuple[bool, List[Tuple[str, str]]]:
    """
    Compares files between source and destination locations.

    Parameters:
    - source (str): Source path to compare from
    - dest (str): Destination path to compare to
    - combined (str, optional): Output file path for detailed results
    - size_only (bool): Compare only file sizes, not hashes (faster)
    - download (bool): Download files for local comparison
    - one_way (bool): Only check source -> dest, ignore extra dest files
    - args (List[str]): Additional rclone check flags

    Returns:
    Tuple[bool, List[Tuple[str, str]]]:
    - bool: True if all files match
    - List of (symbol, filepath) tuples where the symbols mean:
        "=" - File identical in source and destination
        "-" - File missing from source (only in destination)
        "+" - File missing from destination (only in source)
        "*" - File exists in both but differs
        "!" - Error reading or hashing file

    Raises:
    RcloneException: If the comparison operation fails
    """
```
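The `(symbol, filepath)` tuples returned by `check` are straightforward to group by status. A small pure-Python helper sketch (the function name is our own; no rclone call is needed):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def bucket_check_results(results: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group check() result tuples into lists of paths keyed by status symbol."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for symbol, filepath in results:
        buckets[symbol].append(filepath)
    return dict(buckets)
```

For example, `bucket_check_results([('=', 'a.txt'), ('+', 'b.txt')])` yields `{'=': ['a.txt'], '+': ['b.txt']}`.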

## Usage Examples

### Basic Hash Generation

```python
from rclone_python import rclone
from rclone_python.hash_types import HashTypes

# Generate a SHA1 hash for a single file
sha1_hash = rclone.hash(HashTypes.sha1, 'onedrive:document.pdf')
print(f"SHA1: {sha1_hash}")

# Generate MD5 hashes for a directory
md5_hashes = rclone.hash('md5', 'dropbox:photos')
for filename, hash_value in md5_hashes.items():
    print(f"{filename}: {hash_value}")

# Use a string hash type
sha256_hash = rclone.hash('sha256', 'box:archive.zip')
```

### Hash Validation

```python
from rclone_python import rclone
from rclone_python.hash_types import HashTypes

# Create a checksum file first
rclone.hash(HashTypes.sha1, 'backup:files', output_file='checksums.sha1')

# Later, validate against the checksum file
validation_results = rclone.hash(
    HashTypes.sha1,
    'backup:files',
    checkfile='checksums.sha1'
)

# Check the results
if isinstance(validation_results, dict):
    for filename, is_valid in validation_results.items():
        status = "✓ Valid" if is_valid else "✗ Invalid"
        print(f"{filename}: {status}")
else:
    # Single-file result
    print(f"File validation: {'✓ Valid' if validation_results else '✗ Invalid'}")
```
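A file produced via `output_file` uses the conventional `<hash>  <filename>` line layout written by rclone's `hashsum` (the same format that `sha1sum`-style tools emit). A minimal parser sketch for inspecting such a file without rclone (the helper name is our own):

```python
from typing import Dict

def parse_checksum_file(path: str) -> Dict[str, str]:
    """Parse 'hash  filename' lines into a filename -> hash mapping."""
    hashes: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue
            # Hash and filename are separated by two spaces; the filename
            # itself may contain spaces, so split only at the first separator.
            digest, _, filename = line.partition("  ")
            hashes[filename] = digest
    return hashes
```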

### File Comparison

```python
from rclone_python import rclone

# Compare directories
matches, file_list = rclone.check('local:source', 'onedrive:backup')

print(f"Directories match: {matches}")
print("\nFile comparison results:")

status_map = {
    '=': '✓ Identical',
    '+': '→ Missing from destination',
    '-': '← Extra in destination',
    '*': '≠ Different',
    '!': '⚠ Error',
}

for symbol, filepath in file_list:
    status = status_map.get(symbol, symbol)
    print(f"{filepath}: {status}")
```

### Comprehensive Data Integrity Check

```python
from rclone_python import rclone

def verify_backup_integrity(source_path, backup_path):
    """Comprehensive backup integrity verification"""

    print("Verifying backup integrity...")
    print(f"Source: {source_path}")
    print(f"Backup: {backup_path}")

    # First, do a quick size-only check
    print("\n1. Quick size comparison...")
    size_match, size_results = rclone.check(
        source_path, backup_path,
        size_only=True
    )

    if not size_match:
        print("⚠ Size differences detected:")
        for symbol, filepath in size_results:
            if symbol != '=':
                print(f"  {symbol} {filepath}")

        # Ask whether to continue with the hash check
        response = input("\nContinue with hash verification? (y/n): ")
        if response.lower() != 'y':
            return

    # Full hash-based comparison
    print("\n2. Hash-based verification...")
    hash_match, hash_results = rclone.check(source_path, backup_path)

    # Categorize the results
    identical = [f for s, f in hash_results if s == '=']
    different = [f for s, f in hash_results if s == '*']
    missing_dest = [f for s, f in hash_results if s == '+']
    extra_dest = [f for s, f in hash_results if s == '-']
    errors = [f for s, f in hash_results if s == '!']

    print("\nResults:")
    print(f"  ✓ Identical files: {len(identical)}")
    print(f"  ≠ Different files: {len(different)}")
    print(f"  → Missing from backup: {len(missing_dest)}")
    print(f"  ← Extra in backup: {len(extra_dest)}")
    print(f"  ⚠ Errors: {len(errors)}")

    # Show the problems
    if different:
        print("\nFiles with differences:")
        for filepath in different[:10]:  # Show the first 10
            print(f"  {filepath}")

    if missing_dest:
        print("\nFiles missing from backup:")
        for filepath in missing_dest[:10]:
            print(f"  {filepath}")

    return hash_match

# Run the integrity check
verify_backup_integrity('important_data', 'onedrive:backup/important_data')
```

### Hash-based File Deduplication

```python
from collections import defaultdict

from rclone_python import rclone
from rclone_python.hash_types import HashTypes

def find_duplicate_files(path):
    """Find duplicate files based on hash values"""

    print(f"Scanning for duplicates in: {path}")

    # Generate hashes for all files
    hashes = rclone.hash(HashTypes.md5, path)

    if isinstance(hashes, str):
        print("Only one file found, no duplicates possible")
        return

    # Group files by hash
    hash_groups = defaultdict(list)
    for filename, hash_value in hashes.items():
        hash_groups[hash_value].append(filename)

    # Find hashes shared by more than one file
    duplicates = {h: files for h, files in hash_groups.items() if len(files) > 1}

    if duplicates:
        print(f"\nFound {len(duplicates)} sets of duplicate files:")

        # List the directory once, not once per file
        file_list = rclone.ls(path, files_only=True)

        total_wasted_space = 0
        for hash_value, files in duplicates.items():
            print(f"\nDuplicate set (hash: {hash_value[:8]}...):")

            # Get the file sizes
            file_sizes = []
            for filename in files:
                file_info = next((f for f in file_list if f['Name'] == filename), None)
                if file_info:
                    file_sizes.append(file_info['Size'])
                    print(f"  {filename} ({file_info['Size']} bytes)")

            if file_sizes:
                wasted = sum(file_sizes[1:])  # All but the first copy are redundant
                total_wasted_space += wasted

        print(f"\nTotal wasted space: {total_wasted_space / (1024**2):.2f} MB")
    else:
        print("No duplicate files found")

# Find duplicates in a directory
find_duplicate_files('downloads:')
```

### Periodic Integrity Monitoring

```python
import json
import os
from datetime import datetime

from rclone_python import rclone
from rclone_python.hash_types import HashTypes

def create_integrity_baseline(path, baseline_file):
    """Create an integrity baseline for monitoring"""

    print(f"Creating integrity baseline for: {path}")

    # Generate hashes for all files
    hashes = rclone.hash(HashTypes.sha256, path)

    # Get file metadata
    files = rclone.ls(path, files_only=True)
    file_metadata = {f['Name']: f for f in files}

    # Create the baseline data
    baseline = {
        'created': datetime.now().isoformat(),
        'path': path,
        'files': {}
    }

    if isinstance(hashes, dict):
        for filename, hash_value in hashes.items():
            metadata = file_metadata.get(filename, {})
            baseline['files'][filename] = {
                'hash': hash_value,
                'size': metadata.get('Size', 0),
                'modified': metadata.get('ModTime', '')
            }

    # Save the baseline
    with open(baseline_file, 'w') as f:
        json.dump(baseline, f, indent=2)

    print(f"Baseline saved to: {baseline_file}")
    print(f"Tracked files: {len(baseline['files'])}")

def check_against_baseline(path, baseline_file):
    """Check the current state against an integrity baseline"""

    if not os.path.exists(baseline_file):
        print(f"Baseline file not found: {baseline_file}")
        return

    # Load the baseline
    with open(baseline_file, 'r') as f:
        baseline = json.load(f)

    print(f"Checking integrity against baseline from {baseline['created']}")

    # Get the current hashes
    current_hashes = rclone.hash(HashTypes.sha256, path)

    if isinstance(current_hashes, str):
        current_hashes = {path.split('/')[-1]: current_hashes}

    # Compare with the baseline
    issues = []

    # Check for modified files
    for filename, current_hash in current_hashes.items():
        if filename in baseline['files']:
            baseline_hash = baseline['files'][filename]['hash']
            if current_hash != baseline_hash:
                issues.append(f"MODIFIED: {filename}")

    # Check for missing files
    for filename in baseline['files']:
        if filename not in current_hashes:
            issues.append(f"MISSING: {filename}")

    # Check for new files
    for filename in current_hashes:
        if filename not in baseline['files']:
            issues.append(f"NEW: {filename}")

    if issues:
        print("\n⚠ Integrity issues detected:")
        for issue in issues:
            print(f"  {issue}")
    else:
        print("✓ All files match baseline - integrity verified")

    return len(issues) == 0

# Create and use integrity monitoring
create_integrity_baseline('critical_data:', 'critical_data_baseline.json')

# Later, check integrity
check_against_baseline('critical_data:', 'critical_data_baseline.json')
```

## Supported Hash Algorithms

The HashTypes enum provides access to all hash algorithms supported by rclone:

### Common Hash Types

- **md5**: MD5 hash (widely supported)
- **sha1**: SHA-1 hash (common for Git and legacy systems)
- **sha256**: SHA-256 hash (recommended for security)
- **sha512**: SHA-512 hash (highest security)

### Specialized Hash Types

- **crc32**: CRC32 checksum (fast error detection)
- **dropbox**: Dropbox content hash
- **quickxor**: Microsoft OneDrive QuickXOR
- **whirlpool**: Whirlpool hash algorithm

### Backend-Specific Hashes

- **hidrive**: HiDrive-specific hash
- **mailru**: Mail.ru Cloud hash

## Hash Algorithm Selection

Choose hash algorithms based on your needs and backend support:

```python
from rclone_python import rclone
from rclone_python.hash_types import HashTypes

def check_backend_hashes(remote_path):
    """Test which hash algorithms work with a backend"""

    test_algorithms = [HashTypes.md5, HashTypes.sha1, HashTypes.sha256, HashTypes.crc32]
    supported = []

    for algorithm in test_algorithms:
        try:
            rclone.hash(algorithm, remote_path)
            supported.append(algorithm.value)
        except Exception:
            pass  # Algorithm not supported

    print(f"Supported hashes for {remote_path}: {supported}")
    return supported

# Test backend support
check_backend_hashes('onedrive:test.txt')
```

## Advanced Verification Workflows

### Multi-Algorithm Verification

```python
from rclone_python import rclone
from rclone_python.hash_types import HashTypes

def multi_hash_verification(path):
    """Verify files using multiple hash algorithms"""

    algorithms = [HashTypes.md5, HashTypes.sha1, HashTypes.sha256]
    results = {}

    for algorithm in algorithms:
        try:
            hashes = rclone.hash(algorithm, path)
            results[algorithm.value] = hashes
            count = len(hashes) if isinstance(hashes, dict) else 1
            print(f"✓ {algorithm.value}: Generated hashes for {count} files")
        except Exception as e:
            print(f"✗ {algorithm.value}: Failed - {e}")

    return results

# Generate multiple hash types
multi_hashes = multi_hash_verification('important:files/')
```