
tessl/pypi-xxhash

Python binding for xxHash library providing fast non-cryptographic hash algorithms



File Deduplication Tool

Build a command-line file deduplication tool that identifies duplicate files in a directory by computing and comparing their hashes, with a choice of hash algorithms.

Requirements

The tool should:

  1. Accept a directory path and compute hashes for all files in that directory
  2. Support multiple hash algorithm options (32-bit, 64-bit, and 128-bit)
  3. Group files by their hash values to identify duplicates
  4. Output a report showing which files are duplicates and their hash values
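The first requirement leaves open whether subdirectories are scanned. The sketch below shows one way to enumerate candidate files; the helper name `iter_files` and its `recursive` flag are hypothetical, and the non-recursive default and the choice to skip symlinks are assumptions rather than part of the task.

```python
from pathlib import Path
from typing import Iterator


def iter_files(directory: str, recursive: bool = False) -> Iterator[Path]:
    """Yield regular files under `directory` in sorted order (hypothetical helper)."""
    root = Path(directory)
    if not root.is_dir():
        # Raised when the generator is first advanced, not at call time.
        raise NotADirectoryError(directory)
    # Assumption: only the top level is scanned unless recursive=True.
    candidates = root.rglob("*") if recursive else root.iterdir()
    for entry in sorted(candidates):
        # Skip symlinks so a link and its target are not reported as duplicates.
        if entry.is_file() and not entry.is_symlink():
            yield entry
```

Sorting the entries keeps the resulting report stable across runs, which makes the output easier to compare in tests.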

Specifications

Command-line Interface

The tool should be invokable from the command line with the following arguments:

  • directory: Path to the directory to scan (required)
  • --algorithm: Hash algorithm to use - choices: 32, 64, 128 (required)
  • --output: Output file path for the report (optional, defaults to stdout)
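The arguments above map naturally onto the standard-library `argparse` module. This is a minimal sketch; the function name `build_parser` and the help strings are illustrative, not prescribed by the task.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI parser described above (hypothetical helper name)."""
    parser = argparse.ArgumentParser(description="Find duplicate files by hash.")
    # Positional, therefore required.
    parser.add_argument("directory", help="Path to the directory to scan")
    # Values are kept as strings so they match the 'algorithm' field of the report.
    parser.add_argument(
        "--algorithm",
        required=True,
        choices=["32", "64", "128"],
        help="Hash width in bits",
    )
    # None is used here to mean "write the report to stdout".
    parser.add_argument("--output", default=None, help="Report file path")
    return parser
```

For example, `build_parser().parse_args(["./data", "--algorithm", "64"])` yields a namespace with `directory="./data"`, `algorithm="64"`, and `output=None`.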

Hash Algorithm Mapping

  • 32: Use 32-bit hash algorithm
  • 64: Use 64-bit hash algorithm
  • 128: Use 128-bit hash algorithm

Output Format

The tool should output JSON with the following structure:

{
  "algorithm": "64",
  "total_files": 10,
  "unique_files": 7,
  "duplicate_groups": [
    {
      "hash": "a1b2c3d4e5f60718",
      "files": [
        "/path/to/file1.txt",
        "/path/to/file2.txt"
      ]
    }
  ]
}
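A report of this shape can be assembled by grouping paths by digest. The sketch below assumes `unique_files` means the number of distinct hash values, which the structure above does not state explicitly; `build_report` and the sample digests are hypothetical.

```python
import json
from collections import defaultdict


def build_report(algorithm: str, hashes_by_path: dict[str, str]) -> dict:
    """Group paths by digest and build the report dictionary (hypothetical helper)."""
    groups = defaultdict(list)
    for path, digest in hashes_by_path.items():
        groups[digest].append(path)
    # Only digests shared by two or more files count as duplicate groups.
    duplicate_groups = [
        {"hash": digest, "files": sorted(paths)}
        for digest, paths in sorted(groups.items())
        if len(paths) > 1
    ]
    return {
        "algorithm": algorithm,
        "total_files": len(hashes_by_path),
        # Assumption: unique_files counts distinct digests.
        "unique_files": len(groups),
        "duplicate_groups": duplicate_groups,
    }


# Illustrative input with made-up digests.
sample = {
    "/path/to/file1.txt": "00000000000000aa",
    "/path/to/file2.txt": "00000000000000aa",
    "/path/to/file3.txt": "00000000000000bb",
}
print(json.dumps(build_report("64", sample), indent=2))
```

Sorting both the groups and the file lists makes the JSON output deterministic.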

Test Cases

  • Given a directory with 3 files where 2 are identical, using the 64-bit algorithm identifies 1 duplicate group @test
  • Given a directory with 4 unique files, using the 32-bit algorithm reports 0 duplicate groups @test
  • Given a directory with 5 files where 3 are identical and 2 are unique, using the 128-bit algorithm identifies 1 duplicate group with 3 files @test
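The scenarios above can be exercised against temporary directories. This is a rough sketch of a reusable check, not a prescribed test harness; `write_fixture` and `check_scenario` are hypothetical names, and `find_duplicates` is passed in as any callable with the signature given in the API section.

```python
import tempfile
from pathlib import Path


def write_fixture(root: Path, contents: dict[str, bytes]) -> None:
    """Create one file per entry in `contents` under `root`."""
    for name, data in contents.items():
        (root / name).write_bytes(data)


def check_scenario(find_duplicates, algorithm, contents, expected_group_sizes):
    """Run `find_duplicates` on a temporary directory and check the group sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        write_fixture(Path(tmp), contents)
        report = find_duplicates(tmp, algorithm)
    sizes = sorted(len(group["files"]) for group in report["duplicate_groups"])
    assert report["total_files"] == len(contents)
    assert sizes == sorted(expected_group_sizes)
```

The first scenario would then read `check_scenario(find_duplicates, "64", {"a.txt": b"same", "b.txt": b"same", "c.txt": b"other"}, [2])`, and the second would pass four distinct contents with an empty list of expected group sizes.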

Implementation

@generates

API

def compute_file_hash(file_path: str, algorithm: str) -> str:
    """
    Compute hash of a file using the specified algorithm.

    Args:
        file_path: Path to the file to hash
        algorithm: Hash algorithm to use ('32', '64', or '128')

    Returns:
        Hexadecimal hash string
    """
    pass

def find_duplicates(directory: str, algorithm: str) -> dict:
    """
    Find duplicate files in a directory using the specified hash algorithm.

    Args:
        directory: Path to directory to scan
        algorithm: Hash algorithm to use ('32', '64', or '128')

    Returns:
        Dictionary with keys: algorithm, total_files, unique_files, duplicate_groups
    """
    pass

def main():
    """Main entry point for the command-line tool."""
    pass

Dependencies { .dependencies }

xxhash { .dependency }

Provides fast non-cryptographic hash functions with multiple algorithm variants.

@satisfied-by
