chDB is an in-process SQL OLAP Engine powered by ClickHouse
chDB is an in-process SQL OLAP (Online Analytical Processing) engine that brings the power of ClickHouse directly into Python applications without requiring a separate ClickHouse installation. It runs high-performance analytical queries over many data formats, including Parquet, CSV, JSON, Arrow, and ORC, and supports 60+ input/output formats.
pip install chdb
import chdb
For DB API 2.0 compatibility:
import chdb.dbapi as dbapi
For stateful sessions:
from chdb import session
For DataFrame integration:
import chdb.dataframe as cdf
For user-defined functions:
from chdb.udf import chdb_udf
import chdb
# Execute SQL query with default CSV output
result = chdb.query('SELECT version()')
print(result)
# Query with different output formats
# ClickHouse string literals use single quotes; double quotes denote identifiers
df_result = chdb.query("SELECT 1 AS id, 'hello' AS message", 'DataFrame')
json_result = chdb.query("SELECT 1 AS id, 'hello' AS message", 'JSON')
# Query data from files
parquet_data = chdb.query('SELECT * FROM file("data.parquet", Parquet)', 'DataFrame')
csv_data = chdb.query('SELECT * FROM file("data.csv", CSV)', 'CSV')
# Using the alias function chdb.sql()
result = chdb.sql('SELECT COUNT(*) FROM file("data.parquet", Parquet)')
# Execute SQL from command line with default CSV output
python -m chdb "SELECT version()"
# Specify output format
python -m chdb "SELECT 1 as id, 'hello' as message" Pretty
python -m chdb "SELECT 1 as id, 'hello' as message" JSON
# Query files
python -m chdb "SELECT COUNT(*) FROM file('data.parquet', Parquet)" CSV
chDB provides multiple interfaces for different use cases:
chdb.query() and chdb.sql(): direct SQL execution with various output formats, including CSV, JSON, DataFrame, and Arrow. Supports querying files, in-memory data, and complex analytical operations.
def query(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""): ...
def sql(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""): ...
Python Database API 2.0 compliant interface providing connection objects, cursors, and standard database operations for integration with existing database workflows.
def connect(*args, **kwargs): ...
class Connection:
    def cursor(self, cursor=None): ...
    def commit(self): ...
    def rollback(self): ...
Persistent database sessions that maintain state across queries, enabling DDL operations, temporary tables, views, and multi-query workflows.
class Session:
    def __init__(self, path: str = None): ...
    def query(self, sql: str, fmt: str = "CSV", udf_path: str = ""): ...
Integration with pandas DataFrames and PyArrow Tables, enabling SQL queries on DataFrames and seamless data conversion between formats.
class Table:
    def __init__(self, dataframe=None, arrow_table=None, parquet_path=None, ...): ...
    def to_pandas(self) -> pd.DataFrame: ...
Python UDF support using decorators to register custom functions for use in SQL queries, with automatic type handling and execution management.
def chdb_udf(return_type: str = "String"): ...
def generate_udf(func_name: str, args: list, return_type: str, udf_body: str): ...
Data type utilities for converting between formats, inferring data types, and handling nested data structures in analytical workflows.
def flatten_dict(d: dict, parent_key: str = "", sep: str = "_") -> dict: ...
def convert_to_columnar(items: list) -> dict: ...
def infer_data_type(values: list) -> str: ...
class ChdbError(Exception):
    """Base class for exceptions in chDB."""
# Version information
chdb_version: tuple # Version tuple (major, minor, patch)
engine_version: str # ClickHouse engine version string
__version__: str # Package version string
# Binary extension objects
PyReader: type  # C++ binding reader class