or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

client-api.mddata-formats.mddbapi.mdexceptions.mdindex.mdsqlalchemy.mdutilities.md

index.mddocs/

0

# ClickHouse Connect

1

2

A high-performance Python database driver for connecting ClickHouse databases to Python applications, with specialized support for data science workflows through Pandas DataFrames, NumPy arrays, and PyArrow tables. Features comprehensive integration with Apache Superset for business intelligence and includes a minimal SQLAlchemy implementation for compatibility with SQL-based tools and frameworks.

3

4

## Package Information

5

6

- **Package Name**: clickhouse-connect

7

- **Language**: Python

8

- **Installation**: `pip install clickhouse-connect`

9

- **Optional Dependencies**:

10

- `pip install clickhouse-connect[sqlalchemy]` for SQLAlchemy support

11

- `pip install clickhouse-connect[numpy]` for NumPy support

12

- `pip install clickhouse-connect[pandas]` for Pandas support

13

- `pip install clickhouse-connect[arrow]` for PyArrow support

14

- `pip install clickhouse-connect[orjson]` for faster JSON processing

15

16

## Core Imports

17

18

```python

19

import clickhouse_connect

20

```

21

22

Standard client creation:

23

24

```python

25

from clickhouse_connect import create_client, create_async_client

26

```

27

28

Legacy aliases (deprecated):

29

30

```python

31

from clickhouse_connect import get_client, get_async_client

32

```

33

34

## Basic Usage

35

36

```python

37

import clickhouse_connect

38

39

# Create a client

40

client = clickhouse_connect.create_client(

41

host='localhost',

42

port=8123,

43

username='default',

44

password='',

45

database='default'

46

)

47

48

# Execute a simple query

49

result = client.query('SELECT version()')

50

print(result.result_set[0][0])

51

52

# Query with parameters

53

result = client.query(

54

'SELECT * FROM system.tables WHERE database = {db:String}',

55

parameters={'db': 'system'}

56

)

57

58

# Iterate through results

59

for row in result.result_set:

60

print(row)

61

62

# Insert data

63

data = [

64

['John', 25, 'Engineer'],

65

['Jane', 30, 'Manager'],

66

['Bob', 35, 'Developer']

67

]

68

69

client.insert(

70

'users',

71

data,

72

column_names=['name', 'age', 'position']

73

)

74

75

# Work with pandas DataFrames (requires pandas)

76

import pandas as pd

77

df = pd.DataFrame({

78

'name': ['Alice', 'Bob'],

79

'age': [25, 30],

80

'city': ['NYC', 'LA']

81

})

82

83

# Insert DataFrame

84

client.insert_df('users_df', df)

85

86

# Query to DataFrame

87

df_result = client.query_df('SELECT * FROM users_df')

88

print(df_result)

89

90

# Close the connection

91

client.close()

92

```

93

94

## Architecture

95

96

ClickHouse Connect provides multiple API layers for different use cases:

97

98

- **High-level Client API**: Primary interface with automatic connection management, query optimization, and data format conversion

99

- **DBAPI 2.0 Interface**: Standard Python database API for compatibility with existing tools

100

- **SQLAlchemy Integration**: Dialect for ORM and SQL toolkit support

101

- **Async Support**: Full async/await compatibility using ThreadPoolExecutor

102

- **Data Format Support**: Native integration with NumPy, Pandas, and PyArrow for scientific computing workflows

103

104

The driver utilizes ClickHouse's HTTP interface for maximum compatibility and offers both synchronous and asynchronous operation modes, built-in support for various data compression formats (zstandard, lz4), and optional Cython-compiled extensions for enhanced performance.

105

106

## Capabilities

107

108

### Client Creation and Connection

109

110

Factory functions for creating synchronous and asynchronous client instances with comprehensive connection configuration options including authentication, security, compression, and connection pooling.

111

112

```python { .api }

113

def create_client(

114

host: str | None = None,

115

username: str | None = None,

116

password: str = '',

117

access_token: str | None = None,

118

database: str = '__default__',

119

interface: str | None = None,

120

port: int = 0,

121

secure: bool | str = False,

122

dsn: str | None = None,

123

settings: dict[str, Any] | None = None,

124

generic_args: dict[str, Any] | None = None,

125

compress: bool | str = False,

126

query_limit: int = 0,

127

connect_timeout: int = 10,

128

send_receive_timeout: int = 300,

129

client_name: str | None = None,

130

verify: bool | str = True,

131

ca_cert: str | None = None,

132

client_cert: str | None = None,

133

client_cert_key: str | None = None,

134

session_id: str | None = None,

135

pool_mgr: Any | None = None,

136

http_proxy: str | None = None,

137

https_proxy: str | None = None,

138

server_host_name: str | None = None,

139

apply_server_timezone: str | bool | None = None,

140

show_clickhouse_errors: bool | None = None,

141

autogenerate_session_id: bool | None = None,

142

**kwargs

143

) -> Client: ...

144

145

def create_async_client(

146

host: str | None = None,

147

username: str | None = None,

148

password: str = '',

149

access_token: str | None = None,

150

database: str = '__default__',

151

interface: str | None = None,

152

port: int = 0,

153

secure: bool | str = False,

154

dsn: str | None = None,

155

settings: dict[str, Any] | None = None,

156

generic_args: dict[str, Any] | None = None,

157

executor_threads: int | None = None,

158

compress: bool | str = False,

159

query_limit: int = 0,

160

connect_timeout: int = 10,

161

send_receive_timeout: int = 300,

162

client_name: str | None = None,

163

verify: bool | str = True,

164

ca_cert: str | None = None,

165

client_cert: str | None = None,

166

client_cert_key: str | None = None,

167

session_id: str | None = None,

168

pool_mgr: Any | None = None,

169

http_proxy: str | None = None,

170

https_proxy: str | None = None,

171

server_host_name: str | None = None,

172

apply_server_timezone: str | bool | None = None,

173

show_clickhouse_errors: bool | None = None,

174

autogenerate_session_id: bool | None = None,

175

**kwargs

176

) -> AsyncClient: ...

177

178

# Legacy aliases (deprecated but still available)

179

get_client = create_client

180

get_async_client = create_async_client

181

```

182

183

[Client API](./client-api.md)

184

185

### Data Format Integration

186

187

Seamless integration with scientific Python ecosystem including NumPy arrays, Pandas DataFrames, and PyArrow tables for high-performance data processing and analysis workflows.

188

189

```python { .api }

190

def query_df(self, query: str, **kwargs) -> pd.DataFrame: ...

191

def query_np(self, query: str, **kwargs) -> np.ndarray: ...

192

def query_arrow(self, query: str, **kwargs) -> pa.Table: ...

193

def insert_df(self, table: str, df: pd.DataFrame, **kwargs): ...

194

def insert_arrow(self, table: str, arrow_table: pa.Table, **kwargs): ...

195

```

196

197

[Data Formats](./data-formats.md)

198

199

### DB-API 2.0 Interface

200

201

Standard Python Database API 2.0 implementation providing Connection and Cursor objects for compatibility with existing database tools and frameworks.

202

203

```python { .api }

204

def connect(

205

host: str | None = None,

206

database: str | None = None,

207

username: str | None = '',

208

password: str | None = '',

209

port: int | None = None,

210

**kwargs

211

) -> Connection: ...

212

213

class Connection:

214

def cursor(self) -> Cursor: ...

215

def close(self): ...

216

def commit(self): ...

217

def rollback(self): ...

218

219

class Cursor:

220

def execute(self, operation: str, parameters=None): ...

221

def fetchone(self) -> Sequence | None: ...

222

def fetchmany(self, size: int = None) -> Sequence[Sequence]: ...

223

def fetchall(self) -> Sequence[Sequence]: ...

224

```

225

226

[DB-API Interface](./dbapi.md)

227

228

### SQLAlchemy Integration

229

230

Complete SQLAlchemy dialect implementation enabling ORM capabilities, query building, and integration with SQL-based tools and frameworks.

231

232

```python { .api }

233

class ClickHouseDialect:

234

name = 'clickhousedb'

235

# SQLAlchemy dialect implementation

236

```

237

238

[SQLAlchemy Integration](./sqlalchemy.md)

239

240

### Exception Handling

241

242

Comprehensive exception hierarchy providing detailed error information and compatibility with DB-API 2.0 exception model for robust error handling.

243

244

```python { .api }

245

class ClickHouseError(Exception): ...

246

class DatabaseError(Error): ...

247

class OperationalError(DatabaseError): ...

248

class ProgrammingError(DatabaseError): ...

249

class StreamClosedError(ClickHouseError): ...

250

```

251

252

[Exception Handling](./exceptions.md)

253

254

### Utilities and Tools

255

256

Development and testing utilities including data generation tools, external data support, and configuration management for enhanced developer experience.

257

258

```python { .api }

259

def random_col_data(ch_type: str, size: int, nullable: bool = False) -> list: ...

260

def version() -> str: ...

261

def get_setting(name: str) -> Any: ...

262

def set_setting(name: str, value: Any) -> None: ...

263

class ExternalData: ...

264

class TableContext: ...

265

```

266

267

[Utilities](./utilities.md)

268

269

### Data Models and Return Types

270

271

Core data structures returned by query and insert operations, providing structured access to results, metadata, and execution summaries.

272

273

```python { .api }

274

class QueryResult:

275

result_set: List[List[Any]]

276

column_names: Tuple[str, ...]

277

column_types: Tuple[ClickHouseType, ...]

278

summary: Dict[str, Any]

279

def named_results(self) -> List[Dict[str, Any]]: ...

280

@property

281

def first_row(self) -> List[Any] | None: ...

282

@property

283

def first_item(self) -> Any | None: ...

284

285

class QuerySummary:

286

query_id: str

287

summary: Dict[str, Any]

288

def as_query_result(self) -> QueryResult: ...

289

290

class StreamContext:

291

def column_block_stream(self) -> Iterator[List[List[Any]]]: ...

292

def row_block_stream(self) -> Iterator[List[List[Any]]]: ...

293

def rows_stream(self) -> Iterator[List[Any]]: ...

294

def np_stream(self) -> Iterator[np.ndarray]: ...

295

def df_stream(self) -> Iterator[pd.DataFrame]: ...

296

```

297

298

## Common Configuration

299

300

### Connection Settings

301

302

```python

303

# Basic connection

304

client = create_client(host='clickhouse.example.com')

305

306

# With authentication

307

client = create_client(

308

host='clickhouse.example.com',

309

username='myuser',

310

password='mypassword',

311

database='analytics'

312

)

313

314

# Secure connection

315

client = create_client(

316

host='clickhouse.example.com',

317

secure=True, # Use HTTPS

318

verify=True, # Verify SSL certificate

319

ca_cert='/path/to/ca.pem'

320

)

321

322

# With compression

323

client = create_client(

324

host='clickhouse.example.com',

325

compress='lz4' # or 'zstd', 'gzip', 'brotli'

326

)

327

```

328

329

### ClickHouse Settings

330

331

```python

332

# Set global settings

333

client = create_client(

334

host='clickhouse.example.com',

335

settings={'max_threads': 4, 'max_memory_usage': '4G'}

336

)

337

338

# Runtime settings

339

client.set_client_setting('max_block_size', 65536)

340

setting_value = client.get_client_setting('max_block_size')

341

```

342

343

### Session Management

344

345

```python

346

# Auto-generate session IDs

347

client = create_client(

348

host='clickhouse.example.com',

349

autogenerate_session_id=True

350

)

351

352

# Manual session ID

353

client = create_client(

354

host='clickhouse.example.com',

355

session_id='my-session-123'

356

)

357

```