CtrlK
BlogDocsLog inGet started
Tessl Logo

tessl/pypi-fastcore

Python supercharged for fastai development

56

1.36x
Overview
Eval results
Files

task.mdevals/scenario-6/

Code Search and Analysis Tool

Overview

Build a command-line tool that searches through a codebase directory to find and analyze Python files based on various criteria. The tool should support filtering by file patterns, searching for specific content within files, and generating reports about the files found.

Requirements

1. File Discovery

Create a function find_python_files(root_dir, file_pattern=None, exclude_dirs=None) that:

  • Searches recursively through root_dir to find Python files
  • Optionally filters files using glob patterns (e.g., "test_*.py", "*_helper.py")
  • Excludes specified directories (e.g., [".git", "__pycache__", "venv"])
  • Returns a list of file paths

2. Content Search

Create a function search_in_files(file_paths, pattern, use_regex=False) that:

  • Searches for a text pattern in the given list of file paths
  • Supports both plain text and regex pattern matching based on the use_regex flag
  • Returns a dictionary mapping file paths to lists of matching line numbers
  • Handles files that cannot be read (e.g., binary files, encoding issues) gracefully

3. File Statistics

Create a function generate_file_stats(file_paths) that:

  • Calculates the total number of files
  • Calculates the total size in bytes across all files
  • Finds the largest file (by size)
  • Returns a dictionary with keys: "total_files", "total_size", "largest_file"

4. Report Generation

Create a function create_report(stats, matches) that:

  • Takes file statistics and search matches as input
  • Returns a formatted string report containing:
    • Summary of files found
    • Total size information
    • List of files containing matches (if any)
    • Total count of matches

Implementation Notes

  • Use appropriate file system operations to handle paths across different operating systems
  • Ensure the tool works efficiently even with large directory structures
  • Handle edge cases like empty directories, permission errors, and symbolic links

Dependencies { .dependencies }

fastcore { .dependency }

Provides enhanced file and path operations for efficient directory traversal and file searching.

Test Cases

Test 1: Basic File Discovery @test

Setup: Create a test directory structure:

test_project/
├── main.py
├── utils/
│   ├── helper.py
│   └── test_helper.py
└── __pycache__/
    └── main.cpython-39.pyc

Test:

# test_file_search.py
from file_search_tool import find_python_files

def test_basic_file_discovery():
    files = find_python_files("test_project", exclude_dirs=["__pycache__"])
    assert len(files) == 3
    assert any("main.py" in str(f) for f in files)
    assert any("helper.py" in str(f) for f in files)
    assert any("test_helper.py" in str(f) for f in files)

Test 2: Pattern Filtering @test

Test:

# test_file_search.py
from file_search_tool import find_python_files

def test_pattern_filtering():
    files = find_python_files("test_project", file_pattern="test_*.py", exclude_dirs=["__pycache__"])
    assert len(files) == 1
    assert any("test_helper.py" in str(f) for f in files)

Test 3: Content Search @test

Setup: Assume test_project/main.py contains:

def calculate(x, y):
    return x + y

result = calculate(5, 3)

Test:

# test_file_search.py
from file_search_tool import search_in_files, find_python_files

def test_content_search():
    files = find_python_files("test_project", exclude_dirs=["__pycache__"])
    matches = search_in_files(files, "calculate")
    assert len(matches) > 0
    # At least one file should contain "calculate"
    assert any(len(line_nums) > 0 for line_nums in matches.values())

Test 4: File Statistics @test

Test:

# test_file_search.py
from file_search_tool import find_python_files, generate_file_stats

def test_file_statistics():
    files = find_python_files("test_project", exclude_dirs=["__pycache__"])
    stats = generate_file_stats(files)
    assert stats["total_files"] == 3
    assert stats["total_size"] > 0
    assert "largest_file" in stats

Deliverables

  1. A Python module file_search_tool.py containing all required functions
  2. A test file test_file_search.py with the test cases above
  3. All tests should pass when run with pytest

Install with Tessl CLI

npx tessl i tessl/pypi-fastcore

tile.json