tessl/pypi-deeplake

tessl install tessl/pypi-deeplake@4.3.0

Database for AI powered by a storage format optimized for deep-learning applications.

Agent Success: 75% (agent success rate when using this tile)
Improvement: 1.6x (agent success rate improvement when using this tile compared to baseline)
Baseline: 47% (agent success rate without this tile)

Image Dataset Query Interface

Build a query interface for an image dataset that supports filtering, aggregation, and similarity search operations.

Background

You are building a system to query a dataset containing images with metadata (labels, descriptions, embeddings). The dataset supports advanced query capabilities that you need to expose through a simple Python interface.

Requirements

Create a Python module query_interface.py that implements the following functionality:

Filter by Label

Implement a function filter_by_label(dataset_path: str, label: str) -> dict that:

Queries the dataset to find all images with a specific label
Returns a dictionary with keys: count (number of matching images) and sample_ids (list of IDs)
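
To make the expected return shape concrete, here is a minimal reference sketch that applies the same filter to a plain list of dicts. It is not Deep Lake code: the `rows` argument and the `filter_rows_by_label` name are hypothetical stand-ins for the dataset at dataset_path. It also assumes that sample_ids lists every matching ID, which the task does not state explicitly.

```python
def filter_rows_by_label(rows: list[dict], label: str) -> dict:
    """Reference filter over in-memory rows (hypothetical stand-in for the dataset)."""
    # Collect the IDs of rows whose label matches exactly.
    sample_ids = [row["id"] for row in rows if row["label"] == label]
    return {"count": len(sample_ids), "sample_ids": sample_ids}


# Made-up rows using the id and label columns named in the task.
LABEL_ROWS = [
    {"id": 1, "label": "cat"},
    {"id": 2, "label": "dog"},
    {"id": 3, "label": "cat"},
]
label_result = filter_rows_by_label(LABEL_ROWS, "cat")
```

A reference like this is mainly useful for cross-checking the output of the real dataset-backed function on a small fixture.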

Aggregate Label Statistics

Implement a function get_label_statistics(dataset_path: str) -> list[dict] that:

Computes statistics grouped by label
Returns a list of dictionaries, each containing label and count keys
Results should be sorted by count in descending order
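
The grouping and ordering described above can be sketched in plain Python with `collections.Counter`. This is an illustrative reference over in-memory rows, not the Deep Lake implementation; `label_statistics` and `STATS_ROWS` are hypothetical names. The task does not define an order for labels with equal counts, so this sketch simply keeps first-seen order for ties.

```python
from collections import Counter


def label_statistics(rows: list[dict]) -> list[dict]:
    """Reference group-by-label count over in-memory rows (hypothetical stand-in)."""
    counts = Counter(row["label"] for row in rows)
    # most_common() yields (label, count) pairs sorted by count, descending.
    return [{"label": label, "count": count} for label, count in counts.most_common()]


# Made-up rows: three dogs, two cats, one bird.
STATS_ROWS = [
    {"id": 1, "label": "cat"},
    {"id": 2, "label": "dog"},
    {"id": 3, "label": "dog"},
    {"id": 4, "label": "bird"},
    {"id": 5, "label": "cat"},
    {"id": 6, "label": "dog"},
]
stats = label_statistics(STATS_ROWS)
```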

Find Similar Images

Implement a function find_similar_images(dataset_path: str, query_embedding: list[float], top_k: int = 5) -> list[int] that:

Finds the top K most similar images based on embedding similarity
Uses cosine similarity as the similarity metric (higher values mean more similar)
Returns a list of image IDs ordered by similarity (most similar first)
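
For reference, cosine similarity between vectors a and b is their dot product divided by the product of their Euclidean norms. The sketch below ranks in-memory rows by that score; it is a brute-force illustration rather than the dataset's own similarity search, and `cosine_similarity`, `top_k_similar`, and `EMBEDDING_ROWS` are hypothetical names. Treating a zero-length vector as similarity 0 is an assumption, since the task does not cover that case.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Assumption: a zero vector is treated as similarity 0.
    return dot / (norm_a * norm_b)


def top_k_similar(rows: list[dict], query_embedding: list[float], top_k: int = 5) -> list[int]:
    """Return the IDs of the top_k rows, most similar first (brute-force reference)."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(row["embedding"], query_embedding),
        reverse=True,
    )
    return [row["id"] for row in ranked[:top_k]]


# Made-up two-dimensional embeddings for illustration.
EMBEDDING_ROWS = [
    {"id": 10, "embedding": [1.0, 0.0]},
    {"id": 11, "embedding": [0.0, 1.0]},
    {"id": 12, "embedding": [1.0, 1.0]},
    {"id": 13, "embedding": [-1.0, 0.0]},
]
similar_ids = top_k_similar(EMBEDDING_ROWS, [1.0, 0.0], top_k=3)
```

With the query [1.0, 0.0], the scores are 1 for ID 10, about 0.707 for ID 12, 0 for ID 11, and -1 for ID 13, so the opposite-facing vector is ranked last and falls outside the top 3.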

Combined Filter Query

Implement a function filter_by_multiple_conditions(dataset_path: str, min_id: int, label: str) -> int that:

Counts images where ID is greater than min_id AND label equals the specified label
Returns the count as an integer
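
The two conditions combine with a strict comparison on ID, so a row whose ID equals min_id does not match. A minimal in-memory sketch, with `count_matching` and `COMBINED_ROWS` as hypothetical stand-ins for the real dataset query:

```python
def count_matching(rows: list[dict], min_id: int, label: str) -> int:
    """Count rows with id strictly greater than min_id AND a matching label."""
    return sum(1 for row in rows if row["id"] > min_id and row["label"] == label)


# Made-up rows; ID 100 sits exactly on the boundary and must be excluded.
COMBINED_ROWS = [
    {"id": 100, "label": "dog"},
    {"id": 101, "label": "dog"},
    {"id": 150, "label": "cat"},
    {"id": 200, "label": "dog"},
]
dog_count = count_matching(COMBINED_ROWS, min_id=100, label="dog")
```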

Implementation Notes

The dataset at dataset_path contains columns: id, label, description, embedding
The embedding column contains vector embeddings for similarity search
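All functions should perform their dataset operations efficiently, for example by using the dataset's query capabilities rather than iterating over every row in Python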
Error handling for invalid paths or missing data is not required for this exercise

@generates

API

def filter_by_label(dataset_path: str, label: str) -> dict:
    """
    Filter images by a specific label.

    Args:
        dataset_path: Path to the dataset
        label: Label to filter by

    Returns:
        Dictionary with 'count' and 'sample_ids' keys
    """
    pass

def get_label_statistics(dataset_path: str) -> list[dict]:
    """
    Get aggregated statistics grouped by label.

    Args:
        dataset_path: Path to the dataset

    Returns:
        List of dicts with 'label' and 'count', sorted by count descending
    """
    pass

def find_similar_images(dataset_path: str, query_embedding: list[float], top_k: int = 5) -> list[int]:
    """
    Find top K most similar images using cosine similarity.

    Args:
        dataset_path: Path to the dataset
        query_embedding: Query vector for similarity search
        top_k: Number of similar images to return

    Returns:
        List of image IDs ordered by similarity (most similar first)
    """
    pass

def filter_by_multiple_conditions(dataset_path: str, min_id: int, label: str) -> int:
    """
    Count images matching multiple conditions.

    Args:
        dataset_path: Path to the dataset
        min_id: Exclusive lower bound on ID (only IDs greater than min_id match)
        label: Label to filter by

    Returns:
        Count of matching images
    """
    pass

Tests

filter_by_label returns correct count and IDs for images with label "cat" @test
get_label_statistics returns aggregated counts grouped by label in descending order @test
find_similar_images returns the 3 most similar image IDs using cosine similarity @test
filter_by_multiple_conditions correctly counts images where id > 100 AND label = "dog" @test

Dependencies { .dependencies }

deeplake { .dependency }

Provides dataset query capabilities.

Version

tessl/pypi-deeplake

task.md (evals/scenario-4/)
