Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/connectorx@0.4.x (tile.json)

tessl/pypi-connectorx

```
tessl install tessl/pypi-connectorx@0.4.0
```

Load data from databases to dataframes, the fastest way.

| Metric | Description | Value |
| --- | --- | --- |
| Agent Success | Agent success rate when using this tile | 86% |
| Improvement | Agent success rate improvement when using this tile compared to baseline | 1.05x |
| Baseline | Agent success rate without this tile | 82% |

evals/scenario-7/task.md

Large Dataset Batch Processor

Build a Python module that processes large database query results in batches to compute aggregate statistics without loading all data into memory at once.

Requirements

Create a function process_large_query that takes a database connection string and a SQL query, processes the results in batches, and returns aggregated statistics.

Function Signature

```python
def process_large_query(connection_string: str, query: str, batch_size: int = 5000) -> dict
```

Inputs

  • connection_string: Database connection string (e.g., postgresql://user:pass@host:port/db)
  • query: SQL query to execute (e.g., SELECT order_id, amount, region FROM orders)
  • batch_size: Number of rows to process in each batch (default: 5000)

Processing Logic

The function must:

  1. Execute the SQL query against the database
  2. Process results in batches using a streaming/iterative approach (see the sketch after this list)
  3. For each batch, calculate subtotals by region (sum of amount field, count of rows)
  4. Combine all batch results into final aggregates
  5. Return a dictionary with region as key and tuple of (total_amount, order_count) as value
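
A minimal sketch of this flow is below. It assumes connectorx 0.4.x's Arrow streaming mode (`return_type="arrow_stream"` with a `batch_size` argument) returning an iterable of `pyarrow.RecordBatch` objects; verify that against the connectorx docs for your exact version, since everything beyond the plain `read_sql` call is an assumption here.

```python
# Sketch only: assumes connectorx 0.4.x supports return_type="arrow_stream"
# with a batch_size argument, yielding pyarrow.RecordBatch objects lazily.
import connectorx as cx


def process_large_query(connection_string: str, query: str,
                        batch_size: int = 5000) -> dict:
    totals: dict[str, tuple[float, int]] = {}

    # With an Arrow stream, only one batch of rows is materialized at a time.
    reader = cx.read_sql(connection_string, query,
                         return_type="arrow_stream", batch_size=batch_size)

    for batch in reader:
        regions = batch.column("region").to_pylist()
        amounts = batch.column("amount").to_pylist()

        # Fold this batch's subtotals straight into the running aggregates.
        for region, amount in zip(regions, amounts):
            total, count = totals.get(region, (0.0, 0))
            totals[region] = (total + float(amount), count + 1)

    return totals
```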

Output Format

```python
{
    "North": (1500000.50, 15000),
    "South": (2300000.75, 23000),
    "East": (1800000.25, 18000),
    "West": (2100000.00, 21000)
}
```

Constraints

  • Must use batch/streaming processing; the entire result set cannot be loaded into memory (see the fallback sketch after this list)
  • Must work with result sets of arbitrary size (millions of rows)
  • Batch size must be configurable
  • Must handle iterating through all batches until completion
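
If the installed connectorx build turns out not to expose streaming, one hypothetical fallback that still keeps at most `batch_size` rows resident is page-wise fetching. The sketch below uses LIMIT/OFFSET for brevity; OFFSET pagination slows down on very large tables, so keyset pagination on an indexed column would be the sturdier choice, and a stable ORDER BY inside the query is required for the pages to be consistent.

```python
# Hypothetical fallback: page through the result set so at most
# batch_size rows are in memory at once. Requires a deterministic
# ORDER BY inside `query` for the pages to be consistent.
import connectorx as cx


def iter_pages(connection_string: str, query: str, batch_size: int):
    offset = 0
    while True:
        page = cx.read_sql(
            connection_string,
            f"SELECT * FROM ({query}) AS sub LIMIT {batch_size} OFFSET {offset}",
        )
        if page.empty:
            break
        yield page  # a pandas DataFrame of at most batch_size rows
        offset += batch_size
```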

Implementation

@generates

Tests

  • Given a query returning 10,000 rows across 4 regions with batch_size=2000, the function returns correct aggregated totals for each region @test
  • Given a query returning 5,500 rows with batch_size=2000, the function processes 3 batches (2000 + 2000 + 1500 rows) and returns correct aggregates (see the test sketch after this list) @test
  • The function processes results in batches without loading the entire result set into memory @test
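
As a hedged illustration of the 5,500-row case, the test could run against a throwaway SQLite file, since connectorx accepts `sqlite://` connection URLs; the module name in the import below is hypothetical.

```python
# Test sketch: 5,500 rows with batch_size=2000 exercises a partial
# final batch (2000 + 2000 + 1500).
import sqlite3

from processor import process_large_query  # hypothetical import path


def test_partial_final_batch(tmp_path):
    db = tmp_path / "orders.db"
    conn = sqlite3.connect(db)
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, amount REAL, region TEXT)"
    )
    rows = [(i, 10.0, "North" if i % 2 else "South") for i in range(5500)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()

    result = process_large_query(
        f"sqlite://{db}",  # absolute tmp_path yields sqlite:///...
        "SELECT order_id, amount, region FROM orders",
        batch_size=2000,
    )
    # 2,750 odd order_ids -> North, 2,750 even -> South, 10.0 each.
    assert result == {"North": (27500.0, 2750), "South": (27500.0, 2750)}
```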

Dependencies { .dependencies }

connectorx { .dependency }

A fast database connector for loading data into Python with support for streaming and batch processing.
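
For orientation, connectorx's core entry point is `read_sql`, which loads a query result directly into a dataframe and can split the load across parallel partitions of a numeric column; the connection string below is a placeholder.

```python
import connectorx as cx

conn = "postgresql://user:pass@host:5432/db"  # placeholder credentials

# Parallel load: connectorx splits the query into 4 partitions on order_id
# and fetches them concurrently before assembling one DataFrame.
df = cx.read_sql(
    conn,
    "SELECT order_id, amount, region FROM orders",
    partition_on="order_id",
    partition_num=4,
)
```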