0
# ClickHouse Connect
1
2
A high-performance Python database driver for connecting ClickHouse databases to Python applications, with specialized support for data science workflows through Pandas DataFrames, NumPy arrays, and PyArrow tables. Features comprehensive integration with Apache Superset for business intelligence and includes a minimal SQLAlchemy implementation for compatibility with SQL-based tools and frameworks.
3
4
## Package Information
5
6
- **Package Name**: clickhouse-connect
7
- **Language**: Python
8
- **Installation**: `pip install clickhouse-connect`
9
- **Optional Dependencies**:
10
- `pip install clickhouse-connect[sqlalchemy]` for SQLAlchemy support
11
- `pip install clickhouse-connect[numpy]` for NumPy support
12
- `pip install clickhouse-connect[pandas]` for Pandas support
13
- `pip install clickhouse-connect[arrow]` for PyArrow support
14
- `pip install clickhouse-connect[orjson]` for faster JSON processing
15
16
## Core Imports
17
18
```python
19
import clickhouse_connect
20
```
21
22
Standard client creation:
23
24
```python
25
from clickhouse_connect import create_client, create_async_client
26
```
27
28
Legacy aliases (deprecated):
29
30
```python
31
from clickhouse_connect import get_client, get_async_client
32
```
33
34
## Basic Usage
35
36
```python
37
import clickhouse_connect
38
39
# Create a client
40
client = clickhouse_connect.create_client(
41
host='localhost',
42
port=8123,
43
username='default',
44
password='',
45
database='default'
46
)
47
48
# Execute a simple query
49
result = client.query('SELECT version()')
50
print(result.result_set[0][0])
51
52
# Query with parameters
53
result = client.query(
54
'SELECT * FROM system.tables WHERE database = {db:String}',
55
parameters={'db': 'system'}
56
)
57
58
# Iterate through results
59
for row in result.result_set:
60
print(row)
61
62
# Insert data
63
data = [
64
['John', 25, 'Engineer'],
65
['Jane', 30, 'Manager'],
66
['Bob', 35, 'Developer']
67
]
68
69
client.insert(
70
'users',
71
data,
72
column_names=['name', 'age', 'position']
73
)
74
75
# Work with pandas DataFrames (requires pandas)
76
import pandas as pd
77
df = pd.DataFrame({
78
'name': ['Alice', 'Bob'],
79
'age': [25, 30],
80
'city': ['NYC', 'LA']
81
})
82
83
# Insert DataFrame
84
client.insert_df('users_df', df)
85
86
# Query to DataFrame
87
df_result = client.query_df('SELECT * FROM users_df')
88
print(df_result)
89
90
# Close the connection
91
client.close()
92
```
93
94
## Architecture
95
96
ClickHouse Connect provides multiple API layers for different use cases:
97
98
- **High-level Client API**: Primary interface with automatic connection management, query optimization, and data format conversion
99
- **DBAPI 2.0 Interface**: Standard Python database API for compatibility with existing tools
100
- **SQLAlchemy Integration**: Dialect for ORM and SQL toolkit support
101
- **Async Support**: Full async/await compatibility using ThreadPoolExecutor
102
- **Data Format Support**: Native integration with NumPy, Pandas, and PyArrow for scientific computing workflows
103
104
The driver utilizes ClickHouse's HTTP interface for maximum compatibility and offers both synchronous and asynchronous operation modes, built-in support for various data compression formats (zstandard, lz4), and optional Cython-compiled extensions for enhanced performance.
105
106
## Capabilities
107
108
### Client Creation and Connection
109
110
Factory functions for creating synchronous and asynchronous client instances with comprehensive connection configuration options including authentication, security, compression, and connection pooling.
111
112
```python { .api }
113
def create_client(
114
host: str | None = None,
115
username: str | None = None,
116
password: str = '',
117
access_token: str | None = None,
118
database: str = '__default__',
119
interface: str | None = None,
120
port: int = 0,
121
secure: bool | str = False,
122
dsn: str | None = None,
123
settings: dict[str, Any] | None = None,
124
generic_args: dict[str, Any] | None = None,
125
compress: bool | str = False,
126
query_limit: int = 0,
127
connect_timeout: int = 10,
128
send_receive_timeout: int = 300,
129
client_name: str | None = None,
130
verify: bool | str = True,
131
ca_cert: str | None = None,
132
client_cert: str | None = None,
133
client_cert_key: str | None = None,
134
session_id: str | None = None,
135
pool_mgr: Any | None = None,
136
http_proxy: str | None = None,
137
https_proxy: str | None = None,
138
server_host_name: str | None = None,
139
apply_server_timezone: str | bool | None = None,
140
show_clickhouse_errors: bool | None = None,
141
autogenerate_session_id: bool | None = None,
142
**kwargs
143
) -> Client: ...
144
145
def create_async_client(
146
host: str | None = None,
147
username: str | None = None,
148
password: str = '',
149
access_token: str | None = None,
150
database: str = '__default__',
151
interface: str | None = None,
152
port: int = 0,
153
secure: bool | str = False,
154
dsn: str | None = None,
155
settings: dict[str, Any] | None = None,
156
generic_args: dict[str, Any] | None = None,
157
executor_threads: int | None = None,
158
compress: bool | str = False,
159
query_limit: int = 0,
160
connect_timeout: int = 10,
161
send_receive_timeout: int = 300,
162
client_name: str | None = None,
163
verify: bool | str = True,
164
ca_cert: str | None = None,
165
client_cert: str | None = None,
166
client_cert_key: str | None = None,
167
session_id: str | None = None,
168
pool_mgr: Any | None = None,
169
http_proxy: str | None = None,
170
https_proxy: str | None = None,
171
server_host_name: str | None = None,
172
apply_server_timezone: str | bool | None = None,
173
show_clickhouse_errors: bool | None = None,
174
autogenerate_session_id: bool | None = None,
175
**kwargs
176
) -> AsyncClient: ...
177
178
# Legacy aliases (deprecated but still available)
179
get_client = create_client
180
get_async_client = create_async_client
181
```
182
183
[Client API](./client-api.md)
184
185
### Data Format Integration
186
187
Seamless integration with scientific Python ecosystem including NumPy arrays, Pandas DataFrames, and PyArrow tables for high-performance data processing and analysis workflows.
188
189
```python { .api }
190
def query_df(self, query: str, **kwargs) -> pd.DataFrame: ...
191
def query_np(self, query: str, **kwargs) -> np.ndarray: ...
192
def query_arrow(self, query: str, **kwargs) -> pa.Table: ...
193
def insert_df(self, table: str, df: pd.DataFrame, **kwargs): ...
194
def insert_arrow(self, table: str, arrow_table: pa.Table, **kwargs): ...
195
```
196
197
[Data Formats](./data-formats.md)
198
199
### DB-API 2.0 Interface
200
201
Standard Python Database API 2.0 implementation providing Connection and Cursor objects for compatibility with existing database tools and frameworks.
202
203
```python { .api }
204
def connect(
205
host: str | None = None,
206
database: str | None = None,
207
username: str | None = '',
208
password: str | None = '',
209
port: int | None = None,
210
**kwargs
211
) -> Connection: ...
212
213
class Connection:
214
def cursor(self) -> Cursor: ...
215
def close(self): ...
216
def commit(self): ...
217
def rollback(self): ...
218
219
class Cursor:
220
def execute(self, operation: str, parameters=None): ...
221
def fetchone(self) -> Sequence | None: ...
222
def fetchmany(self, size: int = None) -> Sequence[Sequence]: ...
223
def fetchall(self) -> Sequence[Sequence]: ...
224
```
225
226
[DB-API Interface](./dbapi.md)
227
228
### SQLAlchemy Integration
229
230
Complete SQLAlchemy dialect implementation enabling ORM capabilities, query building, and integration with SQL-based tools and frameworks.
231
232
```python { .api }
233
class ClickHouseDialect:
234
name = 'clickhousedb'
235
# SQLAlchemy dialect implementation
236
```
237
238
[SQLAlchemy Integration](./sqlalchemy.md)
239
240
### Exception Handling
241
242
Comprehensive exception hierarchy providing detailed error information and compatibility with DB-API 2.0 exception model for robust error handling.
243
244
```python { .api }
245
class ClickHouseError(Exception): ...
246
class DatabaseError(Error): ...
247
class OperationalError(DatabaseError): ...
248
class ProgrammingError(DatabaseError): ...
249
class StreamClosedError(ClickHouseError): ...
250
```
251
252
[Exception Handling](./exceptions.md)
253
254
### Utilities and Tools
255
256
Development and testing utilities including data generation tools, external data support, and configuration management for enhanced developer experience.
257
258
```python { .api }
259
def random_col_data(ch_type: str, size: int, nullable: bool = False) -> list: ...
260
def version() -> str: ...
261
def get_setting(name: str) -> Any: ...
262
def set_setting(name: str, value: Any) -> None: ...
263
class ExternalData: ...
264
class TableContext: ...
265
```
266
267
[Utilities](./utilities.md)
268
269
### Data Models and Return Types
270
271
Core data structures returned by query and insert operations, providing structured access to results, metadata, and execution summaries.
272
273
```python { .api }
274
class QueryResult:
275
result_set: List[List[Any]]
276
column_names: Tuple[str, ...]
277
column_types: Tuple[ClickHouseType, ...]
278
summary: Dict[str, Any]
279
def named_results(self) -> List[Dict[str, Any]]: ...
280
@property
281
def first_row(self) -> List[Any] | None: ...
282
@property
283
def first_item(self) -> Any | None: ...
284
285
class QuerySummary:
286
query_id: str
287
summary: Dict[str, Any]
288
def as_query_result(self) -> QueryResult: ...
289
290
class StreamContext:
291
def column_block_stream(self) -> Iterator[List[List[Any]]]: ...
292
def row_block_stream(self) -> Iterator[List[List[Any]]]: ...
293
def rows_stream(self) -> Iterator[List[Any]]: ...
294
def np_stream(self) -> Iterator[np.ndarray]: ...
295
def df_stream(self) -> Iterator[pd.DataFrame]: ...
296
```
297
298
## Common Configuration
299
300
### Connection Settings
301
302
```python
303
# Basic connection
304
client = create_client(host='clickhouse.example.com')
305
306
# With authentication
307
client = create_client(
308
host='clickhouse.example.com',
309
username='myuser',
310
password='mypassword',
311
database='analytics'
312
)
313
314
# Secure connection
315
client = create_client(
316
host='clickhouse.example.com',
317
secure=True, # Use HTTPS
318
verify=True, # Verify SSL certificate
319
ca_cert='/path/to/ca.pem'
320
)
321
322
# With compression
323
client = create_client(
324
host='clickhouse.example.com',
325
compress='lz4' # or 'zstd', 'gzip', 'brotli'
326
)
327
```
328
329
### ClickHouse Settings
330
331
```python
332
# Set global settings
333
client = create_client(
334
host='clickhouse.example.com',
335
settings={'max_threads': 4, 'max_memory_usage': '4G'}
336
)
337
338
# Runtime settings
339
client.set_client_setting('max_block_size', 65536)
340
setting_value = client.get_client_setting('max_block_size')
341
```
342
343
### Session Management
344
345
```python
346
# Auto-generate session IDs
347
client = create_client(
348
host='clickhouse.example.com',
349
autogenerate_session_id=True
350
)
351
352
# Manual session ID
353
client = create_client(
354
host='clickhouse.example.com',
355
session_id='my-session-123'
356
)
357
```