tessl install tessl/pypi-deeplake@4.3.0

Database for AI powered by a storage format optimized for deep-learning applications.
Agent Success: 75% (agent success rate when using this tile)
Improvement: 1.6x (agent success rate improvement compared to baseline)
Baseline: 47% (agent success rate without this tile)
Build a dataset management system that can work seamlessly across different storage environments including local filesystem and cloud storage.
Your system must support the following operations:
Dataset Initialization: Create datasets in different storage locations based on URL patterns:
- Local paths (e.g., ./local_data, /tmp/datasets)
- Cloud URLs (e.g., s3://bucket/path, gcs://bucket/path, azure://container/path)

Dataset Existence Check: Before creating a dataset, verify whether one already exists at the given location to avoid overwriting existing data.
Dataset Operations: Implement basic operations that work consistently regardless of storage backend.
Cross-Storage Migration: Copy an existing dataset from one storage location to another (e.g., from local to cloud or vice versa).
Dataset Cleanup: Remove datasets from storage when no longer needed.
Given a local path ./test_dataset, create a dataset with columns text (text type) and value (integer type), add 3 sample rows with data, and verify the dataset exists at that location @test
Given a dataset exists at ./source_dataset, copy it to ./destination_dataset and verify both datasets exist and contain the same data @test
Given a dataset path ./cleanup_test, create a dataset, verify it exists, then delete it and verify it no longer exists @test
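The three `@test` scenarios above can be expressed as plain assertion-based test functions. The sketch below uses a hypothetical in-memory stand-in (`InMemoryDatasetManager`, not part of the tile) so the scenarios are runnable without Deep Lake installed; a real implementation of the `DatasetManager` skeleton can be dropped in its place.

```python
class InMemoryDatasetManager:
    """Hypothetical stand-in for DatasetManager: keeps datasets as
    lists of row dicts keyed by path, so the test scenarios below
    run without any storage backend installed."""

    def __init__(self):
        self._store = {}

    def dataset_exists(self, path: str) -> bool:
        return path in self._store

    def create_dataset(self, path: str) -> None:
        self._store[path] = []

    def add_sample_data(self, path: str) -> None:
        # 3 sample rows matching the text/value column spec
        self._store[path].extend(
            {"text": f"sample {i}", "value": i} for i in range(3)
        )

    def copy_dataset(self, source_path: str, destination_path: str) -> None:
        self._store[destination_path] = list(self._store[source_path])

    def delete_dataset(self, path: str) -> None:
        del self._store[path]


def test_create_and_verify():
    mgr = InMemoryDatasetManager()
    mgr.create_dataset("./test_dataset")
    mgr.add_sample_data("./test_dataset")
    assert mgr.dataset_exists("./test_dataset")


def test_copy():
    mgr = InMemoryDatasetManager()
    mgr.create_dataset("./source_dataset")
    mgr.add_sample_data("./source_dataset")
    mgr.copy_dataset("./source_dataset", "./destination_dataset")
    assert mgr.dataset_exists("./source_dataset")
    assert mgr.dataset_exists("./destination_dataset")
    assert mgr._store["./source_dataset"] == mgr._store["./destination_dataset"]


def test_cleanup():
    mgr = InMemoryDatasetManager()
    mgr.create_dataset("./cleanup_test")
    assert mgr.dataset_exists("./cleanup_test")
    mgr.delete_dataset("./cleanup_test")
    assert not mgr.dataset_exists("./cleanup_test")
```

The functions are plain assertions, so they can be run directly or collected by pytest as-is.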
@generates
class DatasetManager:
    """Manages datasets across different storage backends."""

    def dataset_exists(self, path: str) -> bool:
        """Check if a dataset exists at the given path."""
        pass

    def create_dataset(self, path: str) -> None:
        """Create a new dataset with text and value columns at the specified path."""
        pass

    def add_sample_data(self, path: str) -> None:
        """Add 3 sample rows to the dataset."""
        pass

    def copy_dataset(self, source_path: str, destination_path: str) -> None:
        """Copy dataset from source to destination."""
        pass

    def delete_dataset(self, path: str) -> None:
        """Delete the dataset at the given path."""
        pass

Provides dataset storage and management with multi-cloud support.
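One way the skeleton might be filled in with Deep Lake 4.x is sketched below. The API names used (`deeplake.create`, `deeplake.open`, `deeplake.exists`, `deeplake.copy`, `deeplake.delete`, `deeplake.types.Text`, `deeplake.types.Int64`, `ds.add_column`, `ds.append`, `ds.commit`) are assumptions based on the 4.x API surface and should be verified against the installed version; this is a sketch, not the tile's generated implementation.

```python
class DatasetManager:
    """Manages datasets across different storage backends.

    Deep Lake routes to the backend from the path itself
    (./local, s3://..., gcs://..., azure://...), so every method
    below works unchanged across storage locations. deeplake is
    imported lazily so this module loads even without it installed.
    """

    def dataset_exists(self, path: str) -> bool:
        """Check if a dataset exists at the given path."""
        import deeplake
        return deeplake.exists(path)  # assumed 4.x helper; verify locally

    def create_dataset(self, path: str) -> None:
        """Create a new dataset with text and value columns."""
        import deeplake
        ds = deeplake.create(path)
        ds.add_column("text", deeplake.types.Text())
        ds.add_column("value", deeplake.types.Int64())
        ds.commit()

    def add_sample_data(self, path: str) -> None:
        """Add 3 sample rows to the dataset."""
        import deeplake
        ds = deeplake.open(path)
        ds.append({"text": ["a", "b", "c"], "value": [1, 2, 3]})
        ds.commit()

    def copy_dataset(self, source_path: str, destination_path: str) -> None:
        """Copy dataset from source to destination (local or cloud)."""
        import deeplake
        deeplake.copy(source_path, destination_path)

    def delete_dataset(self, path: str) -> None:
        """Delete the dataset at the given path."""
        import deeplake
        deeplake.delete(path)
```

Because each method takes the path as an argument, the same manager instance can serve local and cloud datasets interchangeably, matching the cross-storage migration scenario above.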