tessl/pypi-chdb

chDB is an in-process SQL OLAP Engine powered by ClickHouse

chDB

Name: tessl/pypi-chdb
Author: tessl

chDB is an in-process SQL OLAP (Online Analytical Processing) engine that brings the power of ClickHouse directly into Python applications without requiring a separate ClickHouse installation. It provides high-performance analytical queries on various data formats including Parquet, CSV, JSON, Arrow, and ORC, with support for 60+ input/output formats.

Package Information

Package Name: chdb
Package Type: Python package
Language: Python with C++ extensions
Installation: pip install chdb

Core Imports

import chdb

For DB API 2.0 compatibility:

import chdb.dbapi as dbapi

For stateful sessions:

from chdb import session

For DataFrame integration:

import chdb.dataframe as cdf

For user-defined functions:

from chdb.udf import chdb_udf

Basic Usage

Python API

import chdb

# Execute SQL query with default CSV output
result = chdb.query('SELECT version()')
print(result)

# Query with different output formats
# SQL string literals take single quotes; ClickHouse reads double quotes as identifiers
df_result = chdb.query("SELECT 1 AS id, 'hello' AS message", 'DataFrame')
json_result = chdb.query("SELECT 1 AS id, 'hello' AS message", 'JSON')

# Query data from files
parquet_data = chdb.query("SELECT * FROM file('data.parquet', Parquet)", 'DataFrame')
csv_data = chdb.query("SELECT * FROM file('data.csv', CSV)", 'CSV')

# Using the sql() alias of query()
result = chdb.sql("SELECT COUNT(*) FROM file('data.parquet', Parquet)")

Command Line Interface

# Execute SQL from command line with default CSV output
python -m chdb "SELECT version()"

# Specify output format
python -m chdb "SELECT 1 as id, 'hello' as message" Pretty
python -m chdb "SELECT 1 as id, 'hello' as message" JSON

# Query files
python -m chdb "SELECT COUNT(*) FROM file('data.parquet', Parquet)" CSV

Architecture

chDB provides multiple interfaces for different use cases:

Core Query Engine: Direct SQL execution with chdb.query() and chdb.sql()
DB API 2.0: Standard Python database interface for connection-based workflows
Stateful Sessions: Persistent database state for DDL operations and temporary tables
DataFrame Integration: Seamless integration with pandas DataFrames and PyArrow Tables
User-Defined Functions: Python UDF support with decorator-based registration
Memory Optimization: Minimal data copying between C++ and Python using memory views

Capabilities

Core Query Functions

Direct SQL execution with output in CSV, JSON, DataFrame, Arrow, and other formats. Supports querying files and in-memory data, and running complex analytical operations.

def query(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""): ...
def sql(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""): ...
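
As an illustration of the query signature above, the following sketch runs a query with the default CSV output and parses the returned text with Python's standard csv module. The helper name parse_csv_output is hypothetical and not part of chDB, and the live query is skipped when chdb is not installed.

```python
import csv
import io


def parse_csv_output(text):
    """Split CSV text, such as str(chdb.query(...)), into rows of strings."""
    # Hypothetical helper for this example; chDB does not ship it.
    return [row for row in csv.reader(io.StringIO(text)) if row]


if __name__ == "__main__":
    try:
        import chdb
    except ImportError:
        print("chdb is not installed; skipping the live query")
    else:
        # output_format defaults to "CSV", so the second argument is optional.
        result = chdb.query("SELECT number, toString(number) FROM numbers(3)", "CSV")
        print(parse_csv_output(str(result)))
```

Every value comes back as a string here; when typed columns matter, requesting the 'DataFrame' output format is likely the simpler route.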

DB API 2.0 Interface

Python Database API 2.0 compliant interface providing connection objects, cursors, and standard database operations for integration with existing database workflows.

def connect(*args, **kwargs): ...
class Connection:
    def cursor(self, cursor=None): ...
    def commit(self): ...
    def rollback(self): ...
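
Because the interface follows DB API 2.0, a helper written against the standard cursor methods should work with a chDB connection as with any other compliant driver. The sketch below assumes the usual execute and fetchall cursor methods and a close method on both cursor and connection; fetch_rows is a hypothetical helper name, and the live query is skipped when chdb is not installed.

```python
from contextlib import closing


def fetch_rows(conn, sql):
    """Run one statement on a DB API 2.0 connection and return all rows."""
    # Hypothetical helper; it relies only on standard DB API 2.0 methods.
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()


if __name__ == "__main__":
    try:
        import chdb.dbapi as dbapi
    except ImportError:
        print("chdb is not installed; skipping the live query")
    else:
        with closing(dbapi.connect()) as conn:
            print(fetch_rows(conn, "SELECT 1 AS id, 'hello' AS message"))
```

Keeping the helper driver-agnostic means it can be exercised against another DB API 2.0 module, such as the standard library's sqlite3, without chDB present.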

Stateful Sessions

Persistent database sessions that maintain state across queries, enabling DDL operations, temporary tables, views, and multi-query workflows.

class Session:
    def __init__(self, path: str = None): ...
    def query(self, sql: str, fmt: str = "CSV", udf_path: str = ""): ...
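
A short sketch of a multi-query workflow on one session: each statement depends on state created by the previous one, which is what a stateless chdb.query() call would not provide. The database and table names, and the run_statements helper, are made up for this example; the live run is skipped when chdb is not installed.

```python
def run_statements(run, statements):
    """Send each SQL statement to run() in order and return the last result."""
    # Hypothetical helper; run is any callable taking a SQL string.
    result = None
    for statement in statements:
        result = run(statement)
    return result


# Illustrative names only; nothing here is predefined by chDB.
STATEMENTS = [
    "CREATE DATABASE IF NOT EXISTS demo_db ENGINE = Atomic",
    "CREATE TABLE IF NOT EXISTS demo_db.events (name String, hits Int32) ENGINE = Log",
    "INSERT INTO demo_db.events VALUES ('a', 1), ('b', 3), ('c', 2)",
    "SELECT name, hits FROM demo_db.events ORDER BY hits DESC",
]

if __name__ == "__main__":
    try:
        from chdb import session
    except ImportError:
        print("chdb is not installed; skipping the live session")
    else:
        sess = session.Session()  # path omitted, as the signature allows
        # The final SELECT sees the table created by the earlier statements.
        print(run_statements(sess.query, STATEMENTS))
```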

DataFrame Integration

Integration with pandas DataFrames and PyArrow Tables, enabling SQL queries on DataFrames and seamless data conversion between formats.

class Table:
    def __init__(self, dataframe=None, arrow_table=None, parquet_path=None, ...): ...
    def to_pandas(self) -> pd.DataFrame: ...

User-Defined Functions

Python UDF support using decorators to register custom functions for use in SQL queries, with automatic type handling and execution management.

def chdb_udf(return_type: str = "String"): ...
def generate_udf(func_name: str, args: list, return_type: str, udf_body: str): ...

Utility Functions

Data type utilities for converting between formats, inferring data types, and handling nested data structures in analytical workflows.

def flatten_dict(d: dict, parent_key: str = "", sep: str = "_") -> dict: ...
def convert_to_columnar(items: list) -> dict: ...
def infer_data_type(values: list) -> str: ...
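
To make the flattening idea concrete, here is a minimal pure-Python sketch that mirrors the flatten_dict signature above. It is not chDB's implementation: the name flatten_dict_sketch is hypothetical, it handles nested dicts only, and the library's treatment of lists or other nested values may differ.

```python
def flatten_dict_sketch(d, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining key paths with sep."""
    # Illustrative stand-in, not the function shipped with chDB.
    flat = {}
    for key, value in d.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse, carrying the accumulated key path as the prefix.
            flat.update(flatten_dict_sketch(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


print(flatten_dict_sketch({"a": 1, "b": {"c": 2, "d": {"e": 3}}}))
# prints {'a': 1, 'b_c': 2, 'b_d_e': 3}
```

Flattened keys of this shape map directly onto column names, which is why such a step typically precedes converting records to a columnar layout.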

Exception Types

class ChdbError(Exception):
    """Base class for exceptions in chDB."""

Core Types

# Version information
chdb_version: tuple  # Version tuple (major, minor, patch)
engine_version: str  # ClickHouse engine version string
__version__: str     # Package version string

# Binary extension objects
PyReader: type  # C++ binding reader class