tessl/pypi-chdb

chDB is an in-process SQL OLAP Engine powered by ClickHouse

chDB

chDB is an in-process SQL OLAP (Online Analytical Processing) engine that embeds ClickHouse in a Python process, so no separate ClickHouse installation is required. It runs analytical queries directly on data in Parquet, CSV, JSON, Arrow, ORC, and other formats, with 60+ input/output formats supported.

Package Information

  • Package Name: chdb
  • Package Type: Python package
  • Language: Python with C++ extensions
  • Installation: pip install chdb

Core Imports

import chdb

For DB API 2.0 compatibility:

import chdb.dbapi as dbapi

For stateful sessions:

from chdb import session

For DataFrame integration:

import chdb.dataframe as cdf

For user-defined functions:

from chdb.udf import chdb_udf

Basic Usage

Python API

import chdb

# Execute SQL query with default CSV output
result = chdb.query('SELECT version()')
print(result)

# Query with different output formats
# (string literals take single quotes; ClickHouse reads double-quoted names as identifiers)
df_result = chdb.query("SELECT 1 AS id, 'hello' AS message", 'DataFrame')
json_result = chdb.query("SELECT 1 AS id, 'hello' AS message", 'JSON')

# Query data from files
parquet_data = chdb.query('SELECT * FROM file("data.parquet", Parquet)', 'DataFrame')
csv_data = chdb.query('SELECT * FROM file("data.csv", CSV)', 'CSV')

# Using alias function
result = chdb.sql('SELECT COUNT(*) FROM file("data.parquet", Parquet)')

Command Line Interface

# Execute SQL from command line with default CSV output
python -m chdb "SELECT version()"

# Specify output format
python -m chdb "SELECT 1 as id, 'hello' as message" Pretty
python -m chdb "SELECT 1 as id, 'hello' as message" JSON

# Query files
python -m chdb "SELECT COUNT(*) FROM file('data.parquet', Parquet)" CSV

Architecture

chDB provides multiple interfaces for different use cases:

  • Core Query Engine: Direct SQL execution with chdb.query() and chdb.sql()
  • DB API 2.0: Standard Python database interface for connection-based workflows
  • Stateful Sessions: Persistent database state for DDL operations and temporary tables
  • DataFrame Integration: SQL queries over pandas DataFrames and PyArrow Tables, with conversion between the two
  • User-Defined Functions: Python UDF support with decorator-based registration
  • Memory Optimization: Minimal data copying between C++ and Python using memory views

Capabilities

Core Query Functions

Direct SQL execution with various output formats including CSV, JSON, DataFrame, and Arrow formats. Supports querying files, in-memory data, and complex analytical operations.

def query(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""): ...
def sql(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""): ...
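
As a minimal sketch of how these functions might be wrapped, the helper below builds a query over ClickHouse's file() table function and passes it to chdb.query(). The names file_query and run are hypothetical and not part of chDB; chdb is imported lazily so the string helper can be used on its own.

```python
def file_query(path, fmt, columns="*"):
    """Build a SELECT over the file() table function (hypothetical helper)."""
    # Escape backslashes first, then single quotes, so the path stays
    # a single SQL string literal.
    safe_path = path.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT {columns} FROM file('{safe_path}', '{fmt}')"


def run(sql, output_format="CSV"):
    """Execute sql through chdb.query() and return its result object."""
    import chdb  # lazy import: only needed when a query is actually executed

    return chdb.query(sql, output_format)


# Example (assumes chdb is installed and data.parquet exists):
#   print(run(file_query("data.parquet", "Parquet", "count()")))
```

Keeping query construction separate from execution makes the SQL easy to inspect or log before it reaches the engine.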

DB API 2.0 Interface

Python Database API 2.0 compliant interface providing connection objects, cursors, and standard database operations for integration with existing database workflows.

def connect(*args, **kwargs): ...
class Connection:
    def cursor(self, cursor=None): ...
    def commit(self): ...
    def rollback(self): ...
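
Because the interface follows DB API 2.0 (PEP 249), code written against cursor(), execute(), and fetchall() should not need to know which driver is behind it. The sketch below illustrates that idea; fetch_rows and chdb_version_rows are hypothetical names, and the only chDB call assumed is dbapi.connect() with no arguments.

```python
def fetch_rows(conn, sql):
    """Run sql on any DB API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        # Release the cursor even if execute() or fetchall() raises.
        cur.close()


def chdb_version_rows():
    """Fetch the engine version through chdb's DB API module."""
    import chdb.dbapi as dbapi  # lazy import: requires chdb to be installed

    conn = dbapi.connect()
    try:
        return fetch_rows(conn, "SELECT version()")
    finally:
        conn.close()
```

The same fetch_rows helper would work with any other PEP 249 driver, which is the main reason to choose this interface over chdb.query() in existing database code.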

Stateful Sessions

Persistent database sessions that maintain state across queries, enabling DDL operations, temporary tables, views, and multi-query workflows.

class Session:
    def __init__(self, path: str = None): ...
    def query(self, sql: str, fmt: str = "CSV", udf_path: str = ""): ...
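
A session matters when later statements depend on earlier ones. The following sketch, under the assumption that Session() without a path uses temporary storage, creates a database and table and then reads from it on the same session. The helper run_all, the demo database, and the events table are illustrative names only.

```python
# Statements that only make sense when state persists between calls.
SETUP_STATEMENTS = [
    "CREATE DATABASE IF NOT EXISTS demo ENGINE = Atomic",
    "CREATE TABLE IF NOT EXISTS demo.events (name String, n Int32) ENGINE = Log",
    "INSERT INTO demo.events VALUES ('a', 1), ('b', 3), ('c', 2)",
]


def run_all(sess, statements, fmt="CSV"):
    """Run statements in order on one session; return the last result (or None)."""
    result = None
    for sql in statements:
        result = sess.query(sql, fmt)
    return result


def demo():
    """Create a table and query it back on the same session."""
    from chdb import session  # lazy import: requires chdb to be installed

    sess = session.Session()  # assumption: no path means temporary storage
    run_all(sess, SETUP_STATEMENTS)
    return sess.query("SELECT name, n FROM demo.events ORDER BY n", "CSV")
```

Running the same statements through separate chdb.query() calls would not be expected to work, since each stateless call starts without the table created by the previous one.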

DataFrame Integration

Integration with pandas DataFrames and PyArrow Tables, enabling SQL queries on DataFrames and seamless data conversion between formats.

class Table:
    def __init__(self, dataframe=None, arrow_table=None, parquet_path=None, ...): ...
    def to_pandas(self) -> pd.DataFrame: ...

User-Defined Functions

Python UDF support using decorators to register custom functions for use in SQL queries, with automatic type handling and execution management.

def chdb_udf(return_type: str = "String"): ...
def generate_udf(func_name: str, args: list, return_type: str, udf_body: str): ...

Utility Functions

Data type utilities for converting between formats, inferring data types, and handling nested data structures in analytical workflows.

def flatten_dict(d: dict, parent_key: str = "", sep: str = "_") -> dict: ...
def convert_to_columnar(items: list) -> dict: ...
def infer_data_type(values: list) -> str: ...
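
To illustrate what flattening and columnar conversion mean here, the sketch below reimplements the general idea with the standard library only. It is not chDB's implementation: the function names carry a _sketch suffix to make that explicit, and details such as how lists or missing keys are handled in the real utilities are assumptions.

```python
def flatten_dict_sketch(d, parent_key="", sep="_"):
    """Flatten nested dicts by joining key paths with sep."""
    flat = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the key path along.
            flat.update(flatten_dict_sketch(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


def to_columnar_sketch(items):
    """Turn a list of (possibly nested) dicts into {column: [values]}."""
    rows = [flatten_dict_sketch(item) for item in items]
    columns = {}
    # First pass: collect every column name in first-seen order.
    for row in rows:
        for key in row:
            columns.setdefault(key, [])
    # Second pass: fill each column, using None where a row lacks the key.
    for row in rows:
        for key, values in columns.items():
            values.append(row.get(key))
    return columns
```

For example, a record such as {"id": 1, "user": {"name": "x"}} becomes the flat keys id and user_name, and a list of such records becomes one list of values per key, which is the layout a columnar engine works with.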

Exception Types

class ChdbError(Exception):
    """Base class for exceptions in chDB."""

Core Types

# Version information
chdb_version: tuple  # Version tuple (major, minor, patch)
engine_version: str  # ClickHouse engine version string
__version__: str     # Package version string

# Binary extension objects
PyReader: type  # C++ binding reader class

Install with Tessl CLI

npx tessl i tessl/pypi-chdb
Describes
pypipkg:pypi/chdb@2.1.x