
tessl/pypi-azure-storage-file-datalake

Microsoft Azure File DataLake Storage Client Library for Python

Overall score: 92%


evals/scenario-9/task.md

Data Lake Query Processor

Build a utility that processes data files stored in Azure Data Lake Storage Gen2 by executing SQL-like queries and converting between different data formats.

Requirements

Your solution should implement a function that:

  1. Queries CSV data files stored in Azure Data Lake Storage Gen2 using SQL-like SELECT statements
  2. Supports filtering and projection operations on the data
  3. Returns query results in JSON format for easy parsing
  4. Handles both delimited text (CSV) and JSON input formats
  5. Provides appropriate error handling for query operations

The query processor should work with data files containing structured records, such as customer information, order details, or log entries.
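The filtering and projection requirements can be pinned down with a small local reference that applies the same two operations to CSV text using only the standard library. This is a sketch for building expected results in tests, not a description of how the service evaluates queries. The helper name `reference_select`, its parameters, and the sample customer rows are all hypothetical.

```python
import csv
import io
from typing import Any, Callable, Dict, Iterator, List, Optional


def reference_select(
    csv_text: str,
    columns: Optional[List[str]] = None,
    where: Optional[Callable[[Dict[str, str]], bool]] = None,
    delimiter: str = ",",
) -> Iterator[Dict[str, Any]]:
    """Hypothetical local stand-in for `SELECT <columns> ... WHERE <predicate>`.

    Assumes the CSV text has a header row. Every value stays a string,
    exactly as csv.DictReader produces it.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    for row in reader:
        # Filtering: skip rows the predicate rejects (the WHERE clause).
        if where is not None and not where(row):
            continue
        # Projection: keep only the requested columns (the SELECT list).
        if columns is None:
            yield dict(row)
        else:
            yield {name: row[name] for name in columns}


# Hypothetical sample data with customer and order fields.
SAMPLE = "customer,country,total\nalice,US,120\nbob,DE,80\ncarol,US,45\n"

rows = list(
    reference_select(
        SAMPLE,
        columns=["customer", "total"],
        where=lambda r: r["country"] == "US",
    )
)
print(rows)  # [{'customer': 'alice', 'total': '120'}, {'customer': 'carol', 'total': '45'}]
```

Values are left as strings here. If the service output carries typed values, normalize both sides before comparing.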

Test Cases

  • Query a CSV file with a SELECT statement to filter rows and return the results as JSON @test
  • Query a JSON file with a SELECT statement to project specific columns @test
  • Handle query errors gracefully when invalid SQL syntax is provided @test
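These test cases can run without a storage account if the file client is replaced by a test double. The sketch below assumes the code under test calls only `query_file(...)` and then iterates the returned reader's `records()` generator, with each record expected to be one JSON object when JSON output is requested. `FakeQueryReader`, `FakeFileClient`, and `collect_json` are hypothetical names, and the fake does not evaluate SQL. It simply returns canned records or raises a configured exception.

```python
import json
from typing import Any, Dict, Iterator, List, Optional


class FakeQueryReader:
    """Hypothetical stand-in for the reader returned by query_file()."""

    def __init__(self, records: List[bytes]):
        self._records = records

    def records(self) -> Iterator[bytes]:
        # A fresh iterator per call, yielding one record per item.
        return iter(self._records)


class FakeFileClient:
    """Hypothetical test double that returns canned records or raises on demand."""

    def __init__(self, records: List[bytes], error: Optional[Exception] = None):
        self._records = records
        self._error = error
        self.calls: List[Dict[str, Any]] = []

    def query_file(self, query_expression: str, **kwargs: Any) -> FakeQueryReader:
        # Record the call so a test can assert on the query and dialect options.
        self.calls.append({"query": query_expression, **kwargs})
        if self._error is not None:
            raise self._error
        return FakeQueryReader(self._records)


def collect_json(reader: FakeQueryReader) -> List[Dict[str, Any]]:
    """Parse each non-empty record as one JSON object."""
    out = []
    for record in reader.records():
        text = record.decode("utf-8").strip()
        if text:
            out.append(json.loads(text))
    return out


client = FakeFileClient([b'{"customer":"alice","total":120}', b""])
reader = client.query_file("SELECT * FROM DataLakeStorage WHERE total > 100")
print(collect_json(reader))      # [{'customer': 'alice', 'total': 120}]
print(client.calls[0]["query"])  # SELECT * FROM DataLakeStorage WHERE total > 100
```

For the invalid-syntax case, construct the fake with `error=` set to whatever exception the real client is expected to raise, and assert that the code under test converts it into a `ValueError`.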

Implementation

@generates

API

```python
from typing import Iterator, Dict, Any, Optional
from azure.storage.filedatalake import DataLakeFileClient

def query_file_to_json(
    file_client: DataLakeFileClient,
    query_expression: str,
    input_format: str = "csv",
    csv_delimiter: str = ",",
    csv_has_header: bool = True
) -> Iterator[Dict[str, Any]]:
    """
    Execute a SQL-like query on a data file in Azure Data Lake Storage.

    Args:
        file_client: The DataLakeFileClient pointing to the file to query
        query_expression: SQL SELECT statement to execute on the file
        input_format: Input format of the file ("csv" or "json")
        csv_delimiter: Delimiter character for CSV files
        csv_has_header: Whether CSV file has a header row

    Returns:
        Iterator of dictionaries containing the query results

    Raises:
        ValueError: If query_expression is invalid or input_format is unsupported
    """
    pass
```
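Here is one possible way to fill in the stub, sketched under stated assumptions. It uses `DataLakeFileClient.query_file` with `DelimitedTextDialect` / `DelimitedJsonDialect` from `azure.storage.filedatalake`, and it reads results through the reader's `records()` generator. The quote, escape, and line-terminator settings are guesses about the input files. The `SELECT`-prefix check is only a cheap client-side guard. Actual syntax errors are assumed to surface either as `azure.core.exceptions.HttpResponseError` or as fatal errors passed to `on_error`, and both cases are re-raised as `ValueError`. The SDK import is deferred so that argument validation runs, and can be tested, without the package installed.

```python
import json
from typing import TYPE_CHECKING, Any, Dict, Iterator, List

if TYPE_CHECKING:  # for type checkers only; no runtime dependency
    from azure.storage.filedatalake import DataLakeFileClient

SUPPORTED_FORMATS = ("csv", "json")


def query_file_to_json(
    file_client: "DataLakeFileClient",
    query_expression: str,
    input_format: str = "csv",
    csv_delimiter: str = ",",
    csv_has_header: bool = True,
) -> Iterator[Dict[str, Any]]:
    # Plain function, not a generator: argument errors raise at call time.
    fmt = input_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported input_format: {input_format!r}")
    # Assumed client-side guard only; the service performs the real SQL parsing.
    if not query_expression.strip().upper().startswith("SELECT"):
        raise ValueError("query_expression must be a SELECT statement")
    if len(csv_delimiter) != 1:
        raise ValueError("csv_delimiter must be a single character")

    # Deferred import so the validation above runs without the SDK installed.
    from azure.core.exceptions import HttpResponseError
    from azure.storage.filedatalake import DelimitedJsonDialect, DelimitedTextDialect

    if fmt == "csv":
        # quotechar/lineterminator/escapechar are assumptions about the input files.
        file_format = DelimitedTextDialect(
            delimiter=csv_delimiter,
            quotechar='"',
            lineterminator="\n",
            escapechar="",
            has_header=csv_has_header,
        )
    else:
        file_format = DelimitedJsonDialect(delimiter="\n")

    errors: List[Any] = []  # collects DataLakeFileQueryError objects via on_error
    try:
        reader = file_client.query_file(
            query_expression,
            file_format=file_format,
            output_format=DelimitedJsonDialect(delimiter="\n"),  # one JSON object per line
            on_error=errors.append,
        )
    except HttpResponseError as exc:
        raise ValueError(f"query rejected by the service: {exc}") from exc

    def raise_on_fatal() -> None:
        fatal = [e for e in errors if e.is_fatal]
        if fatal:
            raise ValueError(f"query failed: {fatal[0].description}")

    def rows() -> Iterator[Dict[str, Any]]:
        try:
            for record in reader.records():
                raise_on_fatal()
                text = record.decode("utf-8") if isinstance(record, bytes) else record
                if text.strip():
                    yield json.loads(text)
            raise_on_fatal()  # errors can also arrive after the last record
        except HttpResponseError as exc:
            raise ValueError(f"query failed while streaming: {exc}") from exc

    return rows()
```

A caller would obtain `file_client` from, for example, `DataLakeFileClient.from_connection_string(conn_str, file_system_name, file_path)` and then iterate `query_file_to_json(file_client, "SELECT * FROM DataLakeStorage")`. Returning an inner generator keeps validation eager while still streaming results lazily.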

Dependencies { .dependencies }

azure-storage-file-datalake { .dependency }

Provides Azure Data Lake Storage Gen2 client functionality, including server-side file queries through DataLakeFileClient.query_file.

@satisfied-by
