tessl/pypi-chdb

chDB is an in-process SQL OLAP Engine powered by ClickHouse

Core Query Functions

Direct SQL execution functions that form the foundation of chDB's query capabilities. Each call runs a SQL statement in-process and returns the result in the requested output format, with no server or session setup required.

Capabilities

Main Query Function

Executes SQL queries with configurable output formats, supporting both file-based queries and in-memory operations.

def query(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""):
    """
    Execute SQL query with specified output format.
    
    Parameters:
    - sql: SQL query string to execute
    - output_format: Output format ("CSV", "JSON", "DataFrame", "ArrowTable", "Arrow", "Pretty", etc.)
    - path: Optional database path for stateful operations
    - udf_path: Optional path to user-defined function configurations
    
    Returns:
    Query result in specified format, or formatted string for text formats
    
    Raises:
    ChdbError: If query execution fails or has syntax errors
    """
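
As a rough illustration of the path parameter, the sketch below runs several statements against one database directory so that a table created by an earlier call can be read by a later one. The names run_statements and query_fn are hypothetical and introduced only for this example; they are not part of chDB. By default the helper calls chdb.query, and it assumes that passing the same path on every call persists state between calls, as the parameter description above suggests.

```python
def run_statements(statements, db_path, output_format="CSV", query_fn=None):
    """Run SQL statements in order against one database path.

    Hypothetical helper, not part of chDB. Returns the result of the
    last statement, or None if `statements` is empty.
    """
    if query_fn is None:
        import chdb  # imported lazily so the helper can be exercised with a stub
        query_fn = chdb.query

    result = None
    for statement in statements:
        # Every call receives the same `path`, so (by assumption) tables
        # created by one statement are visible to the following ones.
        result = query_fn(statement, output_format, path=db_path)
    return result
```

A call might pass a CREATE TABLE statement, an INSERT, and a final SELECT together with a directory of your choice; only the result of the final statement is returned.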

SQL Alias Function

Convenience alias for the main query function with identical functionality.

def sql(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""):
    """
    Alias for query() function with identical parameters and behavior.
    """

Result Conversion Functions

Convert query results between different data formats for flexible data processing workflows.

def to_df(result):
    """
    Convert query result to pandas DataFrame.
    
    Parameters:
    - result: Query result object from chdb.query()
    
    Returns:
    pandas.DataFrame: Converted DataFrame
    
    Raises:
    ImportError: If pandas or pyarrow not installed
    """

def to_arrowTable(result):
    """
    Convert query result to PyArrow Table.
    
    Parameters:
    - result: Query result object from chdb.query()
    
    Returns:
    pyarrow.Table: Converted Arrow Table
    
    Raises:
    ImportError: If pyarrow not installed
    """
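
Because DataFrame and ArrowTable results depend on optional packages, a caller may prefer to check for those packages before choosing an output format instead of catching ImportError afterwards. The sketch below shows one way to do that with the standard library. Both module_available and pick_output_format are hypothetical helpers, and the requirements table only restates the dependencies listed in this document.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


# Optional packages each output format relies on, per this document.
_REQUIREMENTS = {
    "DataFrame": ("pandas", "pyarrow"),
    "ArrowTable": ("pyarrow",),
}


def pick_output_format(preferred="DataFrame", fallback="CSV",
                       is_available=module_available):
    """Return `preferred` if its optional packages are present, else `fallback`."""
    needed = _REQUIREMENTS.get(preferred, ())
    if all(is_available(module) for module in needed):
        return preferred
    return fallback
```

The chosen name can then be passed as the output_format argument of chdb.query().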

Usage Examples

Basic Queries

import chdb

# Simple query with default CSV output
result = chdb.query("SELECT 1 as id, 'hello' as message")
print(result)  # Outputs CSV format

# JSON output
json_result = chdb.query("SELECT 1 as id, 'hello' as message", "JSON")
print(json_result)

# Pretty formatted output
pretty_result = chdb.query("SELECT version()", "Pretty")
print(pretty_result)
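
When the CSV text needs to become Python values without pandas, the standard csv module is usually enough. The following is a minimal sketch; parse_csv_result is a hypothetical helper, and it assumes that str(result) yields the CSV text, as the print(result) call above implies. The plain CSV format carries no header row, so every parsed row is data.

```python
import csv
import io


def parse_csv_result(result):
    """Parse CSV text (or an object whose str() is CSV text) into rows of strings."""
    text = str(result)
    # csv.reader handles quoted fields, including commas inside quotes.
    return list(csv.reader(io.StringIO(text)))
```

All values come back as strings, so numeric columns still need an explicit conversion such as int(row[0]).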

File-based Queries

import chdb

# Query different file formats
parquet_data = chdb.query('SELECT * FROM file("data.parquet", Parquet)', 'DataFrame')
csv_data = chdb.query('SELECT * FROM file("data.csv", CSV)', 'JSON')
json_data = chdb.query('SELECT * FROM file("data.json", JSONEachRow)', 'DataFrame')

# Complex analytical queries
result = chdb.query('''
    SELECT 
        category,
        COUNT(*) as count,
        AVG(price) as avg_price
    FROM file("sales.parquet", Parquet)
    GROUP BY category
    ORDER BY count DESC
''', 'DataFrame')

Working with DataFrames

import chdb  # the 'DataFrame' format also needs pandas and pyarrow installed

# Get a DataFrame directly
df_result = chdb.query('SELECT * FROM file("data.parquet", Parquet)', 'DataFrame')

# Fetch the result as Arrow bytes, then convert it to a DataFrame manually
arrow_bytes = chdb.query('SELECT * FROM file("data.csv", CSV)', 'Arrow')
df = chdb.to_df(arrow_bytes)

# Get a PyArrow Table
arrow_table = chdb.query('SELECT * FROM file("data.parquet", Parquet)', 'ArrowTable')

Using SQL Alias

# sql() function works identically to query()
result = chdb.sql("SELECT COUNT(*) FROM file('data.parquet', Parquet)")
df_result = chdb.sql("SELECT * FROM file('data.csv', CSV)", "DataFrame")

Error Handling

import chdb
from chdb import ChdbError

try:
    result = chdb.query("SELECT * FROM nonexistent_table")
except ChdbError as e:
    print(f"Query failed: {e}")
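
In scripts that run many queries, it can be convenient to turn the exception into a return value so that one failure does not stop the rest. The sketch below wraps the try/except pattern above. The names try_query, query_fn, and error_type are hypothetical and exist only for this example; when no query function is supplied, the helper falls back to chdb.query and chdb.ChdbError.

```python
def try_query(sql, output_format="CSV", query_fn=None, error_type=Exception):
    """Return (result, None) on success or (None, error message) on failure.

    Hypothetical helper, not part of chDB.
    """
    if query_fn is None:
        import chdb  # imported lazily so the helper can be exercised with a stub
        query_fn, error_type = chdb.query, chdb.ChdbError

    try:
        return query_fn(sql, output_format), None
    except error_type as exc:
        # Only the expected error type is caught; anything else propagates.
        return None, str(exc)
```

Callers can then branch on the second element of the tuple instead of writing a try block around every query.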

Supported Output Formats

  • CSV: Comma-separated values (default)
  • JSON: A single JSON document containing column metadata (meta) and the result rows (data)
  • JSONEachRow: JSON with one object per line
  • DataFrame: pandas DataFrame (requires pandas and pyarrow)
  • ArrowTable: PyArrow Table (requires pyarrow)
  • Arrow: Arrow format bytes
  • Pretty: Human-readable formatted output
  • TabSeparated: Tab-separated values
  • Parquet: Parquet format bytes
  • ORC: ORC format bytes
  • And 50+ more formats supported by ClickHouse
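
Of the text formats listed above, JSONEachRow is often the easiest to consume from plain Python because each line is an independent JSON object. The sketch below is a minimal parser; parse_json_each_row is a hypothetical helper, and it assumes that str(result) yields the raw text of the result.

```python
import json


def parse_json_each_row(result):
    """Parse JSONEachRow text into a list of dicts, one per non-empty line."""
    text = str(result)
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Unlike the CSV format, this keeps column names and basic JSON types (numbers, strings, null) without extra conversion.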

Supported Input Formats

Files can be queried using the file() function with format specification:

  • Parquet: file("data.parquet", Parquet)
  • CSV: file("data.csv", CSV)
  • JSON (one object per line): file("data.json", JSONEachRow)
  • Arrow: file("data.arrow", Arrow)
  • ORC: file("data.orc", ORC)
  • TSV: file("data.tsv", TabSeparated)
  • And 60+ more formats supported by ClickHouse
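
When file names come from variables, building the file() expression by hand invites quoting mistakes. The sketch below assembles the expression with a single-quoted, backslash-escaped path and a validated format name. The helper file_source is hypothetical, and the restriction of format names to letters, digits, and underscores is an assumption made for safety rather than a documented chDB rule.

```python
import re

# Assumed shape of a format name such as Parquet, CSV, or JSONEachRow.
_FORMAT_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def file_source(path, fmt):
    """Build a file('<path>', <Format>) expression for use in a FROM clause."""
    if not _FORMAT_NAME.fullmatch(fmt):
        raise ValueError(f"unexpected format name: {fmt!r}")
    # Escape backslashes first, then single quotes, for a SQL string literal.
    escaped = path.replace("\\", "\\\\").replace("'", "\\'")
    return f"file('{escaped}', {fmt})"
```

The returned string can be interpolated into a query, for example f"SELECT count() FROM {file_source('data.parquet', 'Parquet')}".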
