Workspace: tessl
Visibility: Public
Describes: pkg:pypi/python-levenshtein@0.27.x

tessl/pypi-python-levenshtein

tessl install tessl/pypi-python-levenshtein@0.27.0

Python compatibility wrapper for computing string edit distances and similarities using fast Levenshtein algorithms.

  • Agent success: 88% (agent success rate when using this tile)
  • Improvement: 1.38x (agent success rate improvement when using this tile compared to baseline)
  • Baseline: 64% (agent success rate without this tile)
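The three figures are consistent with Improvement being the ratio of the two success rates rather than their difference (which would be 24 percentage points). The page does not state the formula, so this reading is inferred from the numbers alone:

```latex
\text{Improvement} = \frac{\text{Agent success}}{\text{Baseline}} = \frac{0.88}{0.64} = 1.375 \approx 1.38\times
```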

evals/scenario-1/task.md

Name Matching System

Build a simple name matching system that identifies potential duplicate or similar names in a database.

Background

You're building a system to help identify potential duplicate person records in a database. The system should use string similarity metrics to find names that are likely referring to the same person, even when there are typos, spelling variations, or slight differences.

Requirements

Implement a name matching module that provides the following functionality:

  1. Calculate Similarity: Compute a similarity score between two names. The score should range from 0.0 (completely different) to 1.0 (identical).

  2. Find Matches Above Threshold: Given a target name and a list of candidate names, return all names whose similarity score is at or above a specified threshold, in their original order.

  3. Find Best Match: Given a target name and a list of candidate names, return the single best matching name (the one with the highest similarity score).

Specifications

Your implementation should handle:

  • Case-insensitive comparison (e.g., "John" should match "john")
  • Similarity scoring that is tolerant of minor character differences
  • Empty string handling (should return appropriate similarity scores)
  • Finding matches efficiently across multiple candidates

The similarity algorithm should be particularly suited for proper nouns and names, giving appropriate weight to character matching and transpositions.
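The spec does not name a metric, but this description fits Jaro-Winkler similarity, which the Levenshtein package exposes as `jaro_winkler`; treat that identification as an assumption. Under the standard definitions, m is the number of characters that match within a window of ⌊max(|s1|, |s2|)/2⌋ − 1 positions, t is half the number of matched characters that appear in a different order, ℓ is the length of the common prefix (capped at 4), and p is the prefix weight (conventionally 0.1). Many implementations add the prefix bonus only when sim_J exceeds 0.7. Worked through for the first test case:

```latex
\begin{aligned}
\mathrm{sim}_J &= \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right) \quad (\mathrm{sim}_J = 0 \text{ if } m = 0)\\
\mathrm{sim}_W &= \mathrm{sim}_J + \ell\, p\,(1 - \mathrm{sim}_J)\\[4pt]
\text{martha vs. marhta:}\quad \mathrm{sim}_J &= \frac{1}{3}\left(\frac{6}{6} + \frac{6}{6} + \frac{6 - 1}{6}\right) = \frac{17}{18} \approx 0.944\\
\mathrm{sim}_W &= \frac{17}{18} + 3 \cdot 0.1 \cdot \left(1 - \frac{17}{18}\right) \approx 0.961
\end{aligned}
```

Here all six letters match, the swapped "th" gives t = 1, and the shared prefix "mar" gives ℓ = 3, so the score clears the 0.9 bar in the first test case.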

Test Cases

  • Comparing "Martha" with "Marhta" case-insensitively should return a similarity score greater than 0.9 @test
  • Finding matches for "Dwayne" in ["Duane", "Wayne", "Diana", "Dwayne"] with threshold 0.85 should return a list containing "Duane" and "Dwayne" @test
  • Finding the best match for "Jon Smith" in ["John Smith", "Jane Smith", "Jon Smyth"] should return "John Smith" @test

Implementation

@generates

API

def calculate_similarity(name1: str, name2: str) -> float:
    """
    Calculate similarity score between two names.

    Args:
        name1: First name to compare (case-insensitive)
        name2: Second name to compare (case-insensitive)

    Returns:
        Float between 0.0 and 1.0 representing similarity
    """
    pass

def find_matches(target: str, candidates: list[str], threshold: float) -> list[str]:
    """
    Find all names in candidates that match target above the threshold.

    Args:
        target: Name to match against (case-insensitive)
        candidates: List of candidate names to search
        threshold: Minimum similarity score (0.0 to 1.0)

    Returns:
        List of names with similarity >= threshold, in original order
    """
    pass

def find_best_match(target: str, candidates: list[str]) -> str:
    """
    Find the single best matching name from candidates.

    Args:
        target: Name to match against (case-insensitive)
        candidates: List of candidate names to search

    Returns:
        The candidate name with the highest similarity score
    """
    pass
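One way the three stubs above could be filled in is sketched below. It assumes Jaro-Winkler is the intended metric and uses `jaro_winkler` from the `Levenshtein` module when it is installed. The pure-Python fallback, the use of `str.casefold` for case-insensitivity, and raising `ValueError` on an empty candidate list are all illustrative assumptions, not requirements stated in the spec.

```python
"""Minimal name-matching sketch built on Jaro-Winkler similarity (assumed metric)."""

try:
    # Levenshtein is the module installed by python-Levenshtein 0.27.x.
    from Levenshtein import jaro_winkler
except ImportError:
    # Hypothetical pure-Python fallback so the sketch runs without the dependency.
    def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
        if s1 == s2:
            return 1.0
        if not s1 or not s2:
            return 0.0
        window = max(max(len(s1), len(s2)) // 2 - 1, 0)
        used1 = [False] * len(s1)
        used2 = [False] * len(s2)
        matches = 0
        for i, ch in enumerate(s1):
            # A character matches only within the sliding window around position i.
            for j in range(max(0, i - window), min(len(s2), i + window + 1)):
                if not used2[j] and s2[j] == ch:
                    used1[i] = used2[j] = True
                    matches += 1
                    break
        if matches == 0:
            return 0.0
        # Matched characters that appear in a different order count as transpositions.
        seq1 = [c for c, u in zip(s1, used1) if u]
        seq2 = [c for c, u in zip(s2, used2) if u]
        transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
        jaro = (matches / len(s1) + matches / len(s2)
                + (matches - transpositions) / matches) / 3
        if jaro <= 0.7:  # Winkler's boost threshold: no prefix bonus below it
            return jaro
        # Reward a shared prefix of up to four characters.
        prefix = 0
        for a, b in zip(s1[:4], s2[:4]):
            if a != b:
                break
            prefix += 1
        return jaro + prefix * prefix_weight * (1.0 - jaro)


def calculate_similarity(name1: str, name2: str) -> float:
    """Jaro-Winkler similarity of the case-folded names, in [0.0, 1.0]."""
    return jaro_winkler(name1.casefold(), name2.casefold())


def find_matches(target: str, candidates: list[str], threshold: float) -> list[str]:
    """Candidates scoring >= threshold, kept in their original order."""
    return [c for c in candidates if calculate_similarity(target, c) >= threshold]


def find_best_match(target: str, candidates: list[str]) -> str:
    """Highest-scoring candidate; max() keeps the earliest one on ties."""
    if not candidates:
        # Assumption: the spec does not say what to return for an empty list.
        raise ValueError("candidates must not be empty")
    return max(candidates, key=lambda c: calculate_similarity(target, c))


if __name__ == "__main__":
    print(round(calculate_similarity("Martha", "Marhta"), 3))  # 0.961
    print(find_best_match("Jon Smith", ["John Smith", "Jane Smith", "Jon Smyth"]))  # John Smith
```

Scoring each candidate in a single pass keeps `find_matches` linear in the number of candidates. Under standard Jaro-Winkler with the default prefix weight of 0.1, "dwayne" vs "duane" works out to 0.84 (Jaro 0.822, one-character common prefix), just under the 0.85 threshold in the second test case, while "wayne" scores about 0.944. That case is therefore worth checking against whichever scorer is chosen.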

Dependencies { .dependencies }

Levenshtein { .dependency }

Provides string similarity and distance computation functions.