tessl install tessl/pypi-deeplake@4.3.0
Database for AI powered by a storage format optimized for deep-learning applications.
Agent Success
Agent success rate when using this tile
75%
Improvement
Agent success rate improvement when using this tile compared to baseline
1.6x
Baseline
Agent success rate without this tile
47%
Build a query interface for an image dataset that supports filtering, aggregation, and similarity search operations.
You are building a system to query a dataset containing images with metadata (labels, descriptions, embeddings). The dataset supports advanced query capabilities that you need to expose through a simple Python interface.
Create a Python module query_interface.py that implements the following functionality:
Implement a function filter_by_label(dataset_path: str, label: str) -> dict that:
- Returns a dictionary with count (number of matching images) and sample_ids (list of IDs)
Implement a function get_label_statistics(dataset_path: str) -> list[dict] that:
- Returns a list of dictionaries with label and count keys, sorted by count in descending order
Implement a function find_similar_images(dataset_path: str, query_embedding: list[float], top_k: int = 5) -> list[int] that:
- Returns the IDs of the top_k most similar images, ranked by cosine similarity
Implement a function filter_by_multiple_conditions(dataset_path: str, min_id: int, label: str) -> int that:
- Returns the number of images where id is greater than min_id AND label equals the specified label
The dataset at dataset_path contains columns: id, label, description, embedding
The embedding column contains vector embeddings for similarity search
@generates
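To pin down the expected behaviour before touching a real dataset, the four requirements can be written as a plain-Python reference over in-memory rows. This is only a sketch of the intended semantics, not Deep Lake code: the ROWS fixture and the ref_* helper names are hypothetical and exist purely for illustration.

```python
import math
from collections import Counter

# Hypothetical fixture mirroring the columns id, label, description, embedding.
ROWS = [
    {"id": 1, "label": "cat", "description": "tabby", "embedding": [1.0, 0.0]},
    {"id": 2, "label": "dog", "description": "beagle", "embedding": [0.0, 1.0]},
    {"id": 150, "label": "dog", "description": "husky", "embedding": [1.0, 1.0]},
    {"id": 200, "label": "cat", "description": "siamese", "embedding": [0.9, 0.1]},
    {"id": 300, "label": "dog", "description": "poodle", "embedding": [-1.0, 0.0]},
]


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ref_filter_by_label(rows, label):
    """Count and collect the IDs of rows whose label matches exactly."""
    ids = [row["id"] for row in rows if row["label"] == label]
    return {"count": len(ids), "sample_ids": ids}


def ref_label_statistics(rows):
    """Group by label and sort by count, largest first."""
    counts = Counter(row["label"] for row in rows)
    return [{"label": label, "count": count} for label, count in counts.most_common()]


def ref_find_similar_images(rows, query_embedding, top_k=5):
    """Rank rows by cosine similarity to the query and keep the top_k IDs."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(row["embedding"], query_embedding),
        reverse=True,
    )
    return [row["id"] for row in ranked[:top_k]]


def ref_filter_by_multiple_conditions(rows, min_id, label):
    """Count rows with id strictly greater than min_id and a matching label."""
    return sum(1 for row in rows if row["id"] > min_id and row["label"] == label)
```

A reference like this can serve as an oracle in tests: load the same rows into the dataset, then compare what query_interface.py returns against the ref_* results.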
def filter_by_label(dataset_path: str, label: str) -> dict:
    """
    Filter images by a specific label.

    Args:
        dataset_path: Path to the dataset
        label: Label to filter by

    Returns:
        Dictionary with 'count' and 'sample_ids' keys
    """
    pass

def get_label_statistics(dataset_path: str) -> list[dict]:
    """
    Get aggregated statistics grouped by label.

    Args:
        dataset_path: Path to the dataset

    Returns:
        List of dicts with 'label' and 'count', sorted by count descending
    """
    pass

def find_similar_images(dataset_path: str, query_embedding: list[float], top_k: int = 5) -> list[int]:
    """
    Find top K most similar images using cosine similarity.

    Args:
        dataset_path: Path to the dataset
        query_embedding: Query vector for similarity search
        top_k: Number of similar images to return

    Returns:
        List of image IDs ordered by similarity
    """
    pass

def filter_by_multiple_conditions(dataset_path: str, min_id: int, label: str) -> int:
    """
    Count images matching multiple conditions.

    Args:
        dataset_path: Path to the dataset
        min_id: Minimum ID threshold
        label: Label to filter by

    Returns:
        Count of matching images
    """
    pass

- filter_by_label returns correct count and IDs for images with label "cat" @test
- get_label_statistics returns aggregated counts grouped by label in descending order @test
- find_similar_images returns the 3 most similar image IDs using cosine similarity @test
- filter_by_multiple_conditions correctly counts images where id > 100 AND label = "dog" @test
Provides dataset query capabilities.
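One way the four functions might lean on Deep Lake's query support is to build a TQL string per requirement and run it against a read-only dataset. The sketch below assumes the Deep Lake 4.x calls deeplake.open_read_only, Dataset.query, and column slicing such as view["id"][:], and assumes that label is stored as text and id as an integer. The helper names (_literal, label_query, similarity_query, multi_condition_query, run_query) are hypothetical. The label aggregation is done client-side with a Counter rather than in TQL, which is a simplification chosen for this sketch, not a statement about what TQL supports.

```python
from collections import Counter


def _literal(label: str) -> str:
    """Wrap a label in single quotes; quote escaping is out of scope for this sketch."""
    if "'" in label:
        raise ValueError("labels containing single quotes are not handled in this sketch")
    return f"'{label}'"


def label_query(label: str) -> str:
    return f"SELECT * WHERE label = {_literal(label)}"


def multi_condition_query(min_id: int, label: str) -> str:
    return f"SELECT * WHERE id > {int(min_id)} AND label = {_literal(label)}"


def similarity_query(query_embedding: list[float], top_k: int) -> str:
    vector = ", ".join(repr(float(x)) for x in query_embedding)
    return (
        f"SELECT * ORDER BY COSINE_SIMILARITY(embedding, ARRAY[{vector}]) "
        f"DESC LIMIT {int(top_k)}"
    )


def run_query(dataset_path: str, tql: str):
    """Open the dataset read-only and run one TQL query (assumed Deep Lake 4.x API)."""
    import deeplake  # imported lazily so the query builders work without it

    return deeplake.open_read_only(dataset_path).query(tql)


def filter_by_label(dataset_path: str, label: str) -> dict:
    view = run_query(dataset_path, label_query(label))
    ids = [int(i) for i in view["id"][:]]
    return {"count": len(ids), "sample_ids": ids}


def get_label_statistics(dataset_path: str) -> list[dict]:
    import deeplake

    ds = deeplake.open_read_only(dataset_path)
    # Client-side aggregation: read the label column and count occurrences.
    counts = Counter(str(label) for label in ds["label"][:])
    return [{"label": label, "count": count} for label, count in counts.most_common()]


def find_similar_images(dataset_path: str, query_embedding: list[float], top_k: int = 5) -> list[int]:
    view = run_query(dataset_path, similarity_query(query_embedding, top_k))
    return [int(i) for i in view["id"][:]]


def filter_by_multiple_conditions(dataset_path: str, min_id: int, label: str) -> int:
    return len(run_query(dataset_path, multi_condition_query(min_id, label)))
```

Keeping the query strings in small builder functions lets them be checked without a dataset; for example, similarity_query([0.1, 0.2], 3) yields a query that orders by COSINE_SIMILARITY descending and limits the result to 3 rows.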