
tessl/pypi-deeplake

Database for AI powered by a storage format optimized for deep-learning applications.

Evaluation: 75%

Agent success when using this tile: 1.59x


evals/scenario-1/task.md

Dataset Test Data Generator

Create a utility for generating test datasets with random data for unit testing machine learning pipelines.

Requirements

Your task is to implement a test data generator that creates Deep Lake datasets populated with random data. The generator should support creating datasets with multiple columns of different types and configurable numbers of samples.

Core Functionality

  1. Create a function generate_test_dataset(path, schema, num_samples) that:

    • Creates a new dataset at the specified path
    • Adds columns based on the schema dictionary (keys are column names, values are column types)
    • Populates the dataset with the specified number of random samples
    • Returns the created dataset
  2. The generator should support the following column types:

    • Text columns: Generate random strings of 5-15 characters
    • Integer columns: Generate random integers between 0 and 100
    • Float columns: Generate random floats between 0.0 and 1.0
    • Embedding columns: Generate random float arrays of dimension 128
  3. Create a function clear_test_cache() that clears any cached data to ensure clean test runs.
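The per-type rules above can be sketched with the standard library alone. This is a minimal illustration, not the reference solution: the helper names (`random_text`, `random_row`, `GENERATORS`), the use of ASCII letters for text, and the inclusive integer bounds are all assumptions, and writing the rows into a Deep Lake dataset is deliberately left out.

```python
import random
import string

EMBEDDING_DIM = 128  # dimension stated in the requirements


def random_text(rng: random.Random) -> str:
    """Random string of 5-15 characters (ASCII letters are an assumption)."""
    length = rng.randint(5, 15)  # randint includes both endpoints
    return "".join(rng.choices(string.ascii_letters, k=length))


def random_int(rng: random.Random) -> int:
    """Random integer from 0 to 100 (inclusive bounds are an assumption)."""
    return rng.randint(0, 100)


def random_float(rng: random.Random) -> float:
    """Random float in the half-open interval [0.0, 1.0)."""
    return rng.random()


def random_embedding(rng: random.Random, dim: int = EMBEDDING_DIM) -> list:
    """List of `dim` random floats, each in [0.0, 1.0)."""
    return [rng.random() for _ in range(dim)]


# Hypothetical lookup table from schema type string to generator.
GENERATORS = {
    "text": random_text,
    "int": random_int,
    "float": random_float,
    "embedding": random_embedding,
}


def random_row(schema: dict, rng: random.Random) -> dict:
    """Build one row as a plain dict keyed by column name."""
    return {name: GENERATORS[type_name](rng) for name, type_name in schema.items()}
```

Passing an explicit `random.Random` instance keeps runs reproducible when a seed is supplied, which tends to matter for unit tests.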

@generates

Test Cases

  • Given a path "./test_ds", schema {"name": "text", "score": "int"}, and 10 samples, the function creates a dataset with 10 rows containing random text strings and integers @test
  • Given a schema {"embedding": "embedding"} and 5 samples, the function creates a dataset with 5 rows, each containing a random embedding vector of length 128 @test
  • Calling clear_test_cache() successfully clears cached data without raising errors @test
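The properties these test cases assert can be expressed as a small checker over plain Python rows. This is a sketch under assumptions: `row_matches_schema` is a hypothetical helper, rows are assumed to be dicts of built-in Python values, and values read back from a real dataset may be NumPy scalars or arrays, in which case the `isinstance` checks would need adjusting.

```python
def row_matches_schema(row: dict, schema: dict, embedding_dim: int = 128) -> bool:
    """Return True if every value in `row` fits the rule for its column type."""
    if set(row) != set(schema):
        return False  # missing or unexpected columns
    for name, type_name in schema.items():
        value = row[name]
        if type_name == "text":
            ok = isinstance(value, str) and 5 <= len(value) <= 15
        elif type_name == "int":
            # bool is a subclass of int, so exclude it explicitly
            ok = isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 100
        elif type_name == "float":
            ok = isinstance(value, float) and 0.0 <= value <= 1.0
        elif type_name == "embedding":
            ok = (
                isinstance(value, (list, tuple))
                and len(value) == embedding_dim
                and all(isinstance(x, float) for x in value)
            )
        else:
            ok = False  # unknown type string
        if not ok:
            return False
    return True
```

A test for the first case would then check that the dataset has 10 rows and that each row passes this check against the schema.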

API

def generate_test_dataset(path: str, schema: dict, num_samples: int):
    """
    Generate a test dataset with random data.

    Args:
        path: Path where the dataset will be created
        schema: Dictionary mapping column names to type strings
                Supported types: 'text', 'int', 'float', 'embedding'
        num_samples: Number of random samples to generate

    Returns:
        The created dataset object
    """
    pass

def clear_test_cache():
    """
    Clear cached data for clean test runs.
    """
    pass
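Since the docstring lists exactly four supported type strings, an implementation might validate the schema before creating anything on disk. The sketch below is one possible approach; `validate_schema` and `SUPPORTED_TYPES` are hypothetical names, and raising `ValueError` on an unknown type is an assumption, as the task does not say how unsupported types should be handled.

```python
# Type strings listed in the generate_test_dataset docstring.
SUPPORTED_TYPES = ("text", "int", "float", "embedding")


def validate_schema(schema: dict) -> None:
    """Raise ValueError if any column uses an unsupported type string."""
    unsupported = {
        name: type_name
        for name, type_name in schema.items()
        if type_name not in SUPPORTED_TYPES
    }
    if unsupported:
        raise ValueError(f"Unsupported column types: {unsupported}")
```

Calling this at the top of `generate_test_dataset` would surface a bad schema before any dataset is written.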

Dependencies { .dependencies }

deeplake { .dependency }

Provides dataset creation and management capabilities including random data generation and cache management.

@satisfied-by

Install with Tessl CLI

npx tessl i tessl/pypi-deeplake
