# chDB

chDB is an in-process SQL OLAP (Online Analytical Processing) engine that embeds ClickHouse directly in Python applications, with no separate ClickHouse installation required. It runs high-performance analytical queries over data in formats such as Parquet, CSV, JSON, Arrow, and ORC, and supports 60+ input/output formats.

## Package Information

- **Package Name**: chdb
- **Package Type**: Python package
- **Language**: Python with C++ extensions
- **Installation**: `pip install chdb`

## Core Imports

```python
import chdb
```

For DB API 2.0 compatibility:

```python
import chdb.dbapi as dbapi
```

For stateful sessions:

```python
from chdb import session
```

For DataFrame integration:

```python
import chdb.dataframe as cdf
```

For user-defined functions:

```python
from chdb.udf import chdb_udf
```

## Basic Usage

### Python API

```python
import chdb

# Execute a SQL query with the default CSV output
result = chdb.query('SELECT version()')
print(result)

# Query with different output formats
# (string literals use single quotes; ClickHouse reads double quotes as identifiers)
df_result = chdb.query("SELECT 1 AS id, 'hello' AS message", 'DataFrame')
json_result = chdb.query("SELECT 1 AS id, 'hello' AS message", 'JSON')

# Query data from files
parquet_data = chdb.query("SELECT * FROM file('data.parquet', Parquet)", 'DataFrame')
csv_data = chdb.query("SELECT * FROM file('data.csv', CSV)", 'CSV')

# chdb.sql is an alias for chdb.query
result = chdb.sql("SELECT COUNT(*) FROM file('data.parquet', Parquet)")
```

### Command Line Interface

```bash
# Execute SQL from command line with default CSV output
python -m chdb "SELECT version()"

# Specify output format
python -m chdb "SELECT 1 as id, 'hello' as message" Pretty
python -m chdb "SELECT 1 as id, 'hello' as message" JSON

# Query files
python -m chdb "SELECT COUNT(*) FROM file('data.parquet', Parquet)" CSV
```

## Architecture

chDB provides multiple interfaces for different use cases:

- **Core Query Engine**: Direct SQL execution with `chdb.query()` and `chdb.sql()`
- **DB API 2.0**: Standard Python database interface for connection-based workflows
- **Stateful Sessions**: Persistent database state for DDL operations and temporary tables
- **DataFrame Integration**: SQL queries over pandas DataFrames and PyArrow Tables
- **User-Defined Functions**: Python UDF support with decorator-based registration
- **Memory Optimization**: Minimal data copying between C++ and Python using memory views

## Capabilities

### Core Query Functions

Direct SQL execution with output formats including CSV, JSON, DataFrame, and Arrow. Supports querying files, in-memory data, and complex analytical operations.

```python { .api }
def query(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""): ...
def sql(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""): ...
```

[Core Query Functions](./query-functions.md)
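
As a minimal sketch of working with the `JSON` output format: the helper below assumes that `str(result)` yields the formatted text (as the `print(result)` call under Basic Usage suggests) and that the text follows ClickHouse's `JSON` layout, with rows under a top-level `data` key. `rows_from_json_output` and `query_rows` are hypothetical names, not part of chDB.

```python
import json


def rows_from_json_output(text: str) -> list:
    """Extract the row dicts from ClickHouse JSON-format output text."""
    payload = json.loads(text)
    # The JSON format wraps rows in a top-level "data" array,
    # next to "meta" (column names and types) and "rows" (row count).
    return payload["data"]


def query_rows(sql: str) -> list:
    """Run a query through chDB and return rows as a list of dicts."""
    import chdb  # imported lazily so the parser above works without chDB

    return rows_from_json_output(str(chdb.query(sql, "JSON")))


# Parsing a hand-written sample of JSON-format output:
sample = '{"meta": [{"name": "id", "type": "UInt8"}], "data": [{"id": 1}], "rows": 1}'
print(rows_from_json_output(sample))
```

Keeping the parsing step separate from the query call makes it easy to check against saved output before running it on live results.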

### DB API 2.0 Interface

Python Database API 2.0 compliant interface providing connection objects, cursors, and standard database operations for integration with existing database workflows.

```python { .api }
def connect(*args, **kwargs): ...
class Connection:
    def cursor(self, cursor=None): ...
    def commit(self): ...
    def rollback(self): ...
```

[DB API 2.0 Interface](./dbapi.md)
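
Since the connection and cursor objects follow the usual DB API 2.0 pattern, generic helpers written against that interface should carry over. The sketch below assumes `chdb.dbapi.connect()` can be called with no arguments and that cursors expose `execute`, `fetchall`, and `close`. `fetch_all` and `fetch_all_chdb` are hypothetical helpers, and the standard library's `sqlite3` driver stands in so the snippet runs without chDB installed.

```python
import sqlite3
from contextlib import closing


def fetch_all(connect, sql):
    """Run one statement via any DB API 2.0 `connect` callable; return all rows."""
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql)
            return cur.fetchall()


def fetch_all_chdb(sql):
    """Same helper pointed at chDB (assumes a no-argument connect())."""
    import chdb.dbapi as dbapi  # lazy import: only needed for this path

    return fetch_all(dbapi.connect, sql)


# Stand-in driver: sqlite3 exposes the same connect/cursor/execute/fetchall shape.
rows = fetch_all(lambda: sqlite3.connect(":memory:"), "SELECT 1 AS id, 'hello' AS message")
print(rows)
```

Passing the `connect` callable in, rather than importing a driver inside the helper, keeps the code independent of which DB API 2.0 backend is in use.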

### Stateful Sessions

Persistent database sessions that maintain state across queries, enabling DDL operations, temporary tables, views, and multi-query workflows.

```python { .api }
class Session:
    def __init__(self, path: str = None): ...
    def query(self, sql: str, fmt: str = "CSV", udf_path: str = ""): ...
```

[Stateful Sessions](./sessions.md)
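
Because a session keeps state between calls, a multi-step workflow is a sequence of `query` calls on the same object. The sketch below relies only on the `query(sql, fmt)` method from the signature above. `run_statements` and `demo` are hypothetical helpers, and the database and table names are made up for illustration.

```python
def run_statements(sess, statements, fmt="CSV"):
    """Execute statements in order on one session; return the last result."""
    result = None
    for sql in statements:
        # Each call sees the tables and views created by the earlier ones.
        result = sess.query(sql, fmt)
    return result


# Illustrative DDL + DML script (all names are hypothetical).
SETUP_AND_READ = [
    "CREATE DATABASE IF NOT EXISTS demo_db ENGINE = Atomic",
    "CREATE TABLE IF NOT EXISTS demo_db.events (name String, n Int32) ENGINE = Log",
    "INSERT INTO demo_db.events VALUES ('a', 1), ('b', 3)",
    "SELECT name, n FROM demo_db.events ORDER BY n",
]


def demo():
    """Run the script against a chDB session (requires chDB)."""
    from chdb import session  # lazy import so run_statements stays chDB-free

    sess = session.Session()  # no path given; chDB picks the storage location
    return run_statements(sess, SETUP_AND_READ)
```

The same statements sent through the stateless `chdb.query()` would not be expected to see each other's tables, which is the reason to hold one `Session` for the whole script.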

### DataFrame Integration

Integration with pandas DataFrames and PyArrow Tables, enabling SQL queries on DataFrames and data conversion between formats.

```python { .api }
class Table:
    def __init__(self, dataframe=None, arrow_table=None, parquet_path=None, ...): ...
    def to_pandas(self) -> pd.DataFrame: ...
```

[DataFrame Integration](./dataframe.md)

### User-Defined Functions

Python UDF support using decorators to register custom functions for use in SQL queries, with automatic type handling and execution management.

```python { .api }
def chdb_udf(return_type: str = "String"): ...
def generate_udf(func_name: str, args: list, return_type: str, udf_body: str): ...
```

[User-Defined Functions](./udf.md)
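
A rough sketch of the decorator in use. It assumes that UDF arguments arrive as strings (hence the explicit `int()` conversions) and that the default return type is `String`, per the signature above. The `try`/`except` fallback is not part of chDB: it is a stand-in no-op decorator so the snippet can be imported where chDB is absent. `sum_udf` and `run_in_sql` are hypothetical names.

```python
try:
    from chdb.udf import chdb_udf
except ImportError:
    # Fallback stand-in (not chDB code): a no-op decorator with the same call shape.
    def chdb_udf(return_type="String"):
        def decorator(func):
            return func
        return decorator


@chdb_udf()
def sum_udf(lhs, rhs):
    # Arguments are assumed to arrive as strings, so convert explicitly.
    return int(lhs) + int(rhs)


def run_in_sql():
    """Call the UDF from SQL (requires chDB)."""
    import chdb

    return str(chdb.query("SELECT sum_udf(12, 22)"))
```

Keeping the function body self-contained, with no references to outer variables or module-level imports, is a cautious choice here, since the registered function may run outside the defining module's scope.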

### Utility Functions

Data type utilities for converting between formats, inferring data types, and handling nested data structures in analytical workflows.

```python { .api }
def flatten_dict(d: dict, parent_key: str = "", sep: str = "_") -> dict: ...
def convert_to_columnar(items: list) -> dict: ...
def infer_data_type(values: list) -> str: ...
```

[Utility Functions](./utils.md)
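
To make the signatures concrete, here is a plain-Python sketch of the kind of transformation the names suggest. It is an assumption about behavior, not chDB's implementation: `flatten_nested` joins nested dict keys with a separator, and `to_columns` turns a list of row dicts into a dict of column lists, filling missing keys with `None`. Both names are hypothetical, chosen so they are not mistaken for the library functions.

```python
def flatten_nested(d, parent_key="", sep="_"):
    """Flatten nested dicts: {"a": {"b": 1}} -> {"a_b": 1}."""
    flat = {}
    for key, value in d.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_nested(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_columns(items):
    """Convert a list of row dicts into {column: [values...]}."""
    rows = [flatten_nested(item) for item in items]
    # Preserve first-seen column order across all rows.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    return {col: [row.get(col) for row in rows] for col in columns}


records = [
    {"id": 1, "user": {"name": "ann", "age": 30}},
    {"id": 2, "user": {"name": "bob"}},
]
print(to_columns(records))
```

A columnar layout like this maps naturally onto how an OLAP engine stores data, which is presumably why such helpers sit alongside the query functions.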

## Exception Types

```python { .api }
class ChdbError(Exception):
    """Base class for exceptions in chDB."""
```

## Core Types

```python { .api }
# Version information
chdb_version: tuple  # Version tuple (major, minor, patch)
engine_version: str  # ClickHouse engine version string
__version__: str     # Package version string

# Binary extension objects
PyReader: type  # C++ binding reader class
```