
tessl/pypi-azure-storage-file-datalake

Microsoft Azure File DataLake Storage Client Library for Python

DataLake Path Analyzer

A utility for analyzing and reporting on path structures in Azure Data Lake Storage Gen2 file systems. This tool helps users understand their storage organization by providing insights into directory structures, file distributions, and path hierarchies.

Capabilities

Recursive Path Listing

List all paths (files and directories) within a file system recursively, capturing complete hierarchy information.

  • Given a file system with nested directories and files, listing paths recursively returns all paths including subdirectories @test
  • When listing recursively from the root with no path filter, all paths in the entire file system are returned @test

Non-Recursive Path Listing

List only top-level paths within a specified directory without traversing subdirectories.

  • Given a directory with files and subdirectories, listing paths non-recursively returns only immediate children @test
  • When a directory has no contents, non-recursive listing returns an empty result @test
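
The recursive and non-recursive listing modes described above can be sketched against a small in-memory stand-in, so the example runs without an Azure account. `FakeFileSystemClient` and `list_names` are hypothetical names introduced here for illustration; the fake only mimics the `get_paths(path=..., recursive=...)` call shape of the SDK's `FileSystemClient` and is not part of the library.

```python
from types import SimpleNamespace


class FakeFileSystemClient:
    """Hypothetical in-memory stand-in for FileSystemClient (not part of the SDK)."""

    def __init__(self, entries):
        # entries maps each path name to True (directory) or False (file).
        self._entries = dict(entries)

    def get_paths(self, path=None, recursive=True, **kwargs):
        # Mimics the SDK call shape: optional starting directory plus a recursive flag.
        prefix = (path.strip("/") + "/") if path else ""
        for name in sorted(self._entries):
            if not name.startswith(prefix):
                continue
            is_nested = "/" in name[len(prefix):]
            if recursive or not is_nested:
                yield SimpleNamespace(name=name, is_directory=self._entries[name])


def list_names(fs_client, directory=None, recursive=True):
    """Return the names yielded by get_paths for the given directory."""
    return [p.name for p in fs_client.get_paths(path=directory, recursive=recursive)]


fs = FakeFileSystemClient({
    "raw": True,
    "raw/2024": True,
    "raw/2024/a.csv": False,
    "raw/b.csv": False,
    "empty": True,
})

everything = list_names(fs)                          # recursive from the root
children = list_names(fs, "raw", recursive=False)    # immediate children only
nothing = list_names(fs, "empty", recursive=False)   # directory with no contents
```

With a real client, the same `list_names` helper should work unchanged, since it relies only on `get_paths` and the `name` attribute of each returned item.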

Path Filtering by Prefix

Filter paths by a specific prefix to narrow down results to a particular subdirectory or path pattern.

  • Given a file system with multiple directories, listing paths with a specific path prefix returns only matching paths @test
  • When the path prefix doesn't match any existing paths, listing returns an empty result @test
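
One point worth checking against the SDK documentation: the `path` argument of `get_paths` names a directory to start from, and a directory that does not exist may surface as a `ResourceNotFoundError` while iterating rather than as an empty listing. The sketch below assumes that behavior and converts it into the empty result this spec asks for. `list_paths_under` and `MissingDirectoryClient` are hypothetical names, and the fallback exception class exists only so the example runs without the SDK installed.

```python
try:
    from azure.core.exceptions import ResourceNotFoundError
except ImportError:
    class ResourceNotFoundError(Exception):
        """Fallback so the sketch runs without the SDK installed."""


def list_paths_under(fs_client, path_prefix=None, recursive=True):
    """Return path names under path_prefix, or [] if the listing reports not-found."""
    try:
        # Listing is lazy, so any error appears during iteration; the
        # comprehension therefore has to sit inside the try block.
        return [p.name for p in fs_client.get_paths(path=path_prefix, recursive=recursive)]
    except ResourceNotFoundError:
        return []


class MissingDirectoryClient:
    """Hypothetical stub whose listing fails the way a missing directory might."""

    def get_paths(self, path=None, recursive=True):
        raise ResourceNotFoundError("PathNotFound")
        yield  # unreachable; makes this method a generator so the error is lazy


result = list_paths_under(MissingDirectoryClient(), "no/such/dir")
```

Swallowing the error is a design choice: it matches the "empty result" behavior listed above, but callers who need to tell "empty directory" apart from "missing directory" would want to let the exception propagate instead.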

Path Type Identification

Distinguish between directories and files in the listed results.

  • Listed paths correctly identify directories using the is_directory property @test
  • Listed paths are identified as files when the is_directory property is false or unset @test
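
A minimal sketch of the type check, assuming each listed item exposes `name` and (usually) `is_directory` attributes as the SDK's path objects do. `split_by_type` is a hypothetical helper, and the sample objects are plain stand-ins rather than real SDK instances.

```python
from types import SimpleNamespace


def split_by_type(paths):
    """Partition path objects into (directories, files) by their is_directory flag."""
    directories, files = [], []
    for p in paths:
        # A missing attribute falls back to False, and None is falsy,
        # so both cases are treated as files.
        if getattr(p, "is_directory", False):
            directories.append(p.name)
        else:
            files.append(p.name)
    return directories, files


sample = [
    SimpleNamespace(name="raw", is_directory=True),
    SimpleNamespace(name="raw/a.csv", is_directory=False),
    SimpleNamespace(name="notes.txt"),  # flag absent entirely
]
dirs, files = split_by_type(sample)
```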

Implementation

@generates

API

from typing import Any, Dict, List, Optional

def analyze_file_system(
    fs_client,
    path_prefix: Optional[str] = None,
    recursive: bool = True
) -> Dict[str, Any]:
    """
    Analyze path structure in a file system and return statistics.

    Args:
        fs_client: An initialized FileSystemClient instance
        path_prefix: Optional path prefix to filter results
        recursive: Whether to list paths recursively (default: True)

    Returns:
        dict: Dictionary containing:
            - total_paths: Total number of paths found
            - total_files: Number of files
            - total_directories: Number of directories
            - paths: List of path names
    """

def list_directory_contents(
    directory_client,
    recursive: bool = False,
    max_results: Optional[int] = None
) -> List[str]:
    """
    List contents of a specific directory.

    Args:
        directory_client: An initialized DataLakeDirectoryClient instance
        recursive: Whether to list recursively through subdirectories
        max_results: Optional maximum number of results per page

    Returns:
        list: List of path names within the directory
    """

def filter_paths_by_type(
    fs_client,
    path_type: str,
    path_prefix: Optional[str] = None
) -> List[str]:
    """
    Filter and return paths by their type (file or directory).

    Args:
        fs_client: An initialized FileSystemClient instance
        path_type: Type to filter by - either "file" or "directory"
        path_prefix: Optional path prefix to narrow results

    Returns:
        list: List of path names matching the specified type
    """
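
As an illustration of the return shape documented for `analyze_file_system`, here is one possible implementation. It is a sketch under the assumption that `fs_client.get_paths(path=..., recursive=...)` yields objects with `name` and `is_directory` attributes, not the reference solution for this task. `StubClient` is a hypothetical stand-in so the example runs without the SDK.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def analyze_file_system(
    fs_client,
    path_prefix: Optional[str] = None,
    recursive: bool = True,
) -> Dict[str, Any]:
    """Walk the listing once and tally files and directories."""
    names = []
    directory_count = 0
    for p in fs_client.get_paths(path=path_prefix, recursive=recursive):
        names.append(p.name)
        if p.is_directory:
            directory_count += 1
    return {
        "total_paths": len(names),
        "total_files": len(names) - directory_count,
        "total_directories": directory_count,
        "paths": names,
    }


class StubClient:
    """Hypothetical stub returning a fixed listing; it ignores its arguments."""

    def get_paths(self, path=None, recursive=True):
        yield SimpleNamespace(name="raw", is_directory=True)
        yield SimpleNamespace(name="raw/a.csv", is_directory=False)
        yield SimpleNamespace(name="raw/b.csv", is_directory=False)


report = analyze_file_system(StubClient())
```

Counting in a single pass keeps the function to one service listing per call, which matters when a file system holds many paths.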

Dependencies { .dependencies }

azure-storage-file-datalake { .dependency }

Provides Azure Data Lake Storage Gen2 client library for Python, including path listing and enumeration capabilities.

Install with Tessl CLI

npx tessl i tessl/pypi-azure-storage-file-datalake