# chDB

chDB is an in-process SQL OLAP (Online Analytical Processing) engine that brings the power of ClickHouse directly into Python applications without requiring a separate ClickHouse installation. It provides high-performance analytical queries on various data formats including Parquet, CSV, JSON, Arrow, and ORC, with support for 60+ input/output formats.

## Package Information

- **Package Name**: chdb
- **Package Type**: Python package
- **Language**: Python with C++ extensions
- **Installation**: `pip install chdb`

## Core Imports

```python
import chdb
```

For DB API 2.0 compatibility:

```python
import chdb.dbapi as dbapi
```

For stateful sessions:

```python
from chdb import session
```

For DataFrame integration:

```python
import chdb.dataframe as cdf
```

For user-defined functions:

```python
from chdb.udf import chdb_udf
```

## Basic Usage

### Python API

```python
import chdb

# Execute SQL query with default CSV output
result = chdb.query('SELECT version()')
print(result)

# Query with different output formats
# (ClickHouse SQL string literals use single quotes; double quotes denote identifiers)
df_result = chdb.query("SELECT 1 AS id, 'hello' AS message", 'DataFrame')
json_result = chdb.query("SELECT 1 AS id, 'hello' AS message", 'JSON')

# Query data from files
parquet_data = chdb.query('SELECT * FROM file("data.parquet", Parquet)', 'DataFrame')
csv_data = chdb.query('SELECT * FROM file("data.csv", CSV)', 'CSV')

# Using alias function
result = chdb.sql('SELECT COUNT(*) FROM file("data.parquet", Parquet)')
```

### Command Line Interface

```bash
# Execute SQL from command line with default CSV output
python -m chdb "SELECT version()"

# Specify output format
python -m chdb "SELECT 1 as id, 'hello' as message" Pretty
python -m chdb "SELECT 1 as id, 'hello' as message" JSON

# Query files
python -m chdb "SELECT COUNT(*) FROM file('data.parquet', Parquet)" CSV
```

## Architecture

chDB provides multiple interfaces for different use cases:

- **Core Query Engine**: Direct SQL execution with `chdb.query()` and `chdb.sql()`
- **DB API 2.0**: Standard Python database interface for connection-based workflows
- **Stateful Sessions**: Persistent database state for DDL operations and temporary tables
- **DataFrame Integration**: Seamless integration with pandas DataFrames and PyArrow Tables
- **User-Defined Functions**: Python UDF support with decorator-based registration
- **Memory Optimization**: Minimal data copying between C++ and Python using memory views

## Capabilities

### Core Query Functions

Direct SQL execution with output in CSV, JSON, DataFrame, Arrow, and other formats. Supports querying files, in-memory data, and complex analytical operations.

```python { .api }
def query(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""): ...
def sql(sql: str, output_format: str = "CSV", path: str = "", udf_path: str = ""): ...
```

[Core Query Functions](./query-functions.md)

### DB API 2.0 Interface

Python Database API 2.0 compliant interface providing connection objects, cursors, and standard database operations for integration with existing database workflows.

```python { .api }
def connect(*args, **kwargs): ...

class Connection:
    def cursor(self, cursor=None): ...
    def commit(self): ...
    def rollback(self): ...
```

[DB API 2.0 Interface](./dbapi.md)
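
As a minimal sketch of the connection and cursor workflow described above, the snippet below opens a connection, runs one query, and reads the first row. It assumes `connect()` can be called without arguments for an in-process database and that cursors expose the usual DB API 2.0 members (`execute`, `fetchone`, `description`, `close`). `show_engine_version` is a hypothetical helper name, and the guard at the bottom only skips execution where chdb is not installed.

```python
import importlib.util


def show_engine_version() -> None:
    """Run one query through a DB API 2.0 connection/cursor pair."""
    # Imported lazily so this module still loads where chdb is missing.
    import chdb.dbapi as dbapi

    conn = dbapi.connect()  # assumption: no arguments needed in-process
    cur = conn.cursor()
    try:
        cur.execute("SELECT version()")
        print("description:", cur.description)
        print("row:", cur.fetchone())
    finally:
        # Release the cursor and the connection even if the query fails.
        cur.close()
        conn.close()


# Skip execution where chdb is not installed.
if importlib.util.find_spec("chdb") is not None:
    show_engine_version()
```

Closing the cursor and connection in a `finally` block is a general DB API habit rather than something this page requires.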

### Stateful Sessions

Persistent database sessions that maintain state across queries, enabling DDL operations, temporary tables, views, and multi-query workflows.

```python { .api }
class Session:
    def __init__(self, path: str = None): ...
    def query(self, sql: str, fmt: str = "CSV", udf_path: str = ""): ...
```

[Stateful Sessions](./sessions.md)
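
To illustrate how state carries over between queries, here is a small sketch that creates a database and a table, inserts rows, and reads them back within one session. The database name `demo`, the table `demo.events`, the `Atomic` and `Log` engines, and the helper `run_session_demo` are illustrative choices, not part of the documented API. Where the session stores its data when `path` is omitted is covered in the sessions page rather than assumed here.

```python
import importlib.util


def run_session_demo() -> None:
    """Create a table in one query and read it back in a later one."""
    # Imported lazily so this module still loads where chdb is missing.
    from chdb import session

    sess = session.Session()  # path omitted; see sessions.md for storage details
    sess.query("CREATE DATABASE IF NOT EXISTS demo ENGINE = Atomic")
    sess.query(
        "CREATE TABLE IF NOT EXISTS demo.events "
        "(id UInt32, name String) ENGINE = Log"
    )
    sess.query("INSERT INTO demo.events VALUES (1, 'start'), (2, 'stop')")

    # The table created above is still visible to this later query.
    print(sess.query("SELECT id, name FROM demo.events ORDER BY id", "CSV"))


# Skip execution where chdb is not installed.
if importlib.util.find_spec("chdb") is not None:
    run_session_demo()
```

The same statements issued through separate `chdb.query()` calls would not be expected to share the table, which is the reason to use a session for DDL-heavy workflows.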

### DataFrame Integration

Integration with pandas DataFrames and PyArrow Tables, enabling SQL queries on DataFrames and seamless data conversion between formats.

```python { .api }
class Table:
    def __init__(self, dataframe=None, arrow_table=None, parquet_path=None, ...): ...
    def to_pandas(self) -> pd.DataFrame: ...
```

[DataFrame Integration](./dataframe.md)

139

140

### User-Defined Functions

141

142

Python UDF support using decorators to register custom functions for use in SQL queries, with automatic type handling and execution management.

143

144

```python { .api }

145

def chdb_udf(return_type: str = "String"): ...

146

def generate_udf(func_name: str, args: list, return_type: str, udf_body: str): ...

147

```

148

149

[User-Defined Functions](./udf.md)

150

151

### Utility Functions

152

153

Data type utilities for converting between formats, inferring data types, and handling nested data structures in analytical workflows.

154

155

```python { .api }

156

def flatten_dict(d: dict, parent_key: str = "", sep: str = "_") -> dict: ...

157

def convert_to_columnar(items: list) -> dict: ...

158

def infer_data_type(values: list) -> str: ...

159

```

160

161

[Utility Functions](./utils.md)
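
The idea behind flattening nested records and turning them into columns can be sketched in plain Python. The functions below, `flatten_nested` and `to_columns`, are hypothetical stand-ins written for illustration; they mirror the signatures listed above but are not chdb's implementation, and details such as list handling or missing-value treatment may differ in the library.

```python
from typing import Any


def flatten_nested(d: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Collapse nested dicts into one level, joining keys with `sep`."""
    flat: dict[str, Any] = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the joined key prefix.
            flat.update(flatten_nested(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


def to_columns(items: list[dict]) -> dict[str, list]:
    """Turn a list of (possibly nested) records into a dict of columns."""
    rows = [flatten_nested(item) for item in items]
    # Collect column names in first-seen order.
    names: list[str] = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    # Rows lacking a column contribute None for that position.
    return {name: [row.get(name) for row in rows] for name in names}


records = [
    {"id": 1, "user": {"name": "ada", "tags": ["x"]}},
    {"id": 2, "user": {"name": "bob"}},
]
columns = to_columns(records)
print(columns)
```

A columnar dict of this shape is convenient to hand to a DataFrame constructor before querying it with SQL.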

## Exception Types

```python { .api }
class ChdbError(Exception):
    """Base class for exceptions in chDB."""
```
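
A caller might wrap queries so that engine errors are handled in one place. The sketch below assumes that `ChdbError` is importable from the top-level `chdb` package and that failed queries surface as `ChdbError`; `safe_query` is a hypothetical helper, not part of chdb.

```python
import importlib.util


def safe_query(sql: str, output_format: str = "CSV"):
    """Return the query result, or None if chDB raises ChdbError."""
    # Imported lazily so this module still loads where chdb is missing.
    import chdb

    try:
        return chdb.query(sql, output_format)
    except chdb.ChdbError as exc:
        # Assumption: query failures are reported through ChdbError.
        print(f"query failed: {exc}")
        return None


# Skip execution where chdb is not installed.
if importlib.util.find_spec("chdb") is not None:
    print(safe_query("SELECT 1 AS ok"))
```

Under that assumption, passing a malformed statement would take the `except` branch and return `None` instead of propagating the error.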

## Core Types

```python { .api }
# Version information
chdb_version: tuple  # Version tuple (major, minor, patch)
engine_version: str  # ClickHouse engine version string
__version__: str     # Package version string

# Binary extension objects
PyReader: type  # C++ binding reader class
```