Database for AI powered by a storage format optimized for deep-learning applications.
Evaluation — 75%
↑ 1.59x agent success when using this tile
Create a utility for generating test datasets with random data for unit testing machine learning pipelines.
Your task is to implement a test data generator that creates Deep Lake datasets populated with random data. The generator should support creating datasets with multiple columns of different types and configurable numbers of samples.
Create a function generate_test_dataset(path, schema, num_samples) that:
The generator should support the following column types:
Create a function clear_test_cache() that clears any cached data to ensure clean test runs.
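As a rough illustration of the random-data half of this task, the sketch below builds one list of random values per column from a schema of type strings. It uses only the Python standard library and deliberately does not call Deep Lake, since the task text does not show which dataset calls to use. The helper names random_value and random_columns, the seed parameter, the 12-character text length, the integer range, and the embedding width EMBEDDING_DIM are all assumptions made for the example, not part of the specification.

```python
import random
import string

# Assumed embedding width; the task does not specify one.
EMBEDDING_DIM = 8


def random_value(type_name: str, rng: random.Random):
    """Return one random value for a supported type string."""
    if type_name == "text":
        # Assumed format: 12 random lowercase letters.
        return "".join(rng.choices(string.ascii_lowercase, k=12))
    if type_name == "int":
        # Assumed range; any integer range would do for test data.
        return rng.randint(0, 1_000_000)
    if type_name == "float":
        return rng.random()
    if type_name == "embedding":
        return [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    raise ValueError(f"Unsupported column type: {type_name!r}")


def random_columns(schema: dict, num_samples: int, seed=None) -> dict:
    """Build {column name: list of num_samples random values}."""
    rng = random.Random(seed)
    return {
        name: [random_value(type_name, rng) for _ in range(num_samples)]
        for name, type_name in schema.items()
    }
```

A generate_test_dataset implementation could call a helper like this and then write each column into the dataset it creates at path. Passing a fixed seed keeps the generated values reproducible between test runs.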
@generates
def generate_test_dataset(path: str, schema: dict, num_samples: int):
    """
    Generate a test dataset with random data.

    Args:
        path: Path where the dataset will be created
        schema: Dictionary mapping column names to type strings
            Supported types: 'text', 'int', 'float', 'embedding'
        num_samples: Number of random samples to generate

    Returns:
        The created dataset object
    """
    pass

def clear_test_cache():
    """
    Clear cached data for clean test runs.
    """
    pass

Provides dataset creation and management capabilities including random data generation and cache management.
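The task does not say what the cached data is or where it lives. Purely as an assumption, the sketch below supposes the test utility memoizes generated vectors with functools.lru_cache from the standard library, in which case clear_test_cache only needs to reset that memo. The helper cached_embedding and its parameters are hypothetical and stand in for whatever cache the real dependency maintains.

```python
import functools
import random


@functools.lru_cache(maxsize=None)
def cached_embedding(seed: int, dim: int = 8) -> tuple:
    """Hypothetical helper: memoize one random vector per (seed, dim)."""
    rng = random.Random(seed)
    return tuple(rng.uniform(-1.0, 1.0) for _ in range(dim))


def clear_test_cache():
    """Drop every memoized vector so the next test starts clean."""
    cached_embedding.cache_clear()
```

Calling clear_test_cache in a test fixture's setup or teardown keeps one test's memoized values from leaking into the next.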
@satisfied-by
Install with Tessl CLI
npx tessl i tessl/pypi-deeplake