# Impyla

A Python client library for HiveServer2 implementations, designed for the Impala and Hive distributed query engines. Impyla is DB API 2.0 (PEP 249) compliant, so it is used much like other Python database clients such as sqlite3 or MySQL drivers. It also supports Kerberos and LDAP authentication, SSL connections, and SQLAlchemy integration.

## Package Information

- **Package Name**: impyla
- **Language**: Python
- **Installation**: `pip install impyla`
- **License**: Apache 2.0
- **Requirements**: Python 2.7+ or 3.5+

## Core Imports

Standard DB API import:

```python
import impala.dbapi as dbapi
```

Direct connection import:

```python
from impala.dbapi import connect
```

Error handling:

```python
from impala.error import Error, DatabaseError, OperationalError
```

## Basic Usage

```python
from impala.dbapi import connect

# Connect to Impala
conn = connect(host='your-impala-host', port=21050)
cursor = conn.cursor()

# Execute a query
cursor.execute('SELECT * FROM my_table LIMIT 10')
results = cursor.fetchall()

# Print results
for row in results:
    print(row)

# Clean up
cursor.close()
conn.close()
```

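If a query raises, the explicit `close()` calls in the example above are never reached. A minimal sketch of a more defensive variant uses `contextlib.closing`, which relies only on the `close()` methods already shown. The helper name `fetch_all` is hypothetical, and the standard-library `sqlite3` module stands in for an Impala connection only so that the example runs without a cluster; with Impyla the connection would come from `impala.dbapi.connect`.

```python
import sqlite3
from contextlib import closing


def fetch_all(conn, sql):
    """Run `sql` on a DB API 2.0 connection and return all rows.

    The cursor is closed even if execute() or fetchall() raises.
    """
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql)
        return cursor.fetchall()


# sqlite3 is only a stand-in here; with Impyla this would be
# connect(host='your-impala-host', port=21050).
with closing(sqlite3.connect(":memory:")) as conn:
    rows = fetch_all(conn, "SELECT 1 AS one, 'a' AS letter")
    for row in rows:
        print(row)
```

Because `closing` needs nothing beyond a `close()` method, the same helper should work with any DB API 2.0 connection.
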
## Architecture

Impyla follows the DB API 2.0 specification with these key components:

- **Connection Objects**: Manage database connections with authentication and transport options
- **Cursor Objects**: Execute queries and fetch results through the DB API 2.0 cursor interface
- **Error Hierarchy**: Exception classes following DB API 2.0 standards
- **Transport Layer**: Support for both binary Thrift and HTTP transport protocols
- **Authentication**: Multiple mechanisms including NOSASL, PLAIN, GSSAPI (Kerberos), LDAP, and JWT

The library bridges Python applications and HiveServer2-based big data platforms. Query results can be converted to pandas DataFrames for use with the Python data science ecosystem.

## Capabilities

### Core Database API

Full DB API 2.0 compliant interface providing connection management, query execution, and result fetching. Supports all standard database operations with comprehensive parameter binding and transaction control.

```python { .api }
def connect(host='localhost', port=21050, database=None, timeout=None,
            use_ssl=False, ca_cert=None, auth_mechanism='NOSASL', user=None,
            password=None, kerberos_service_name='impala', use_ldap=None,
            ldap_user=None, ldap_password=None, use_kerberos=None,
            protocol=None, krb_host=None, use_http_transport=False,
            http_path='', auth_cookie_names=None, http_cookie_names=None,
            retries=3, jwt=None, user_agent=None,
            get_user_custom_headers_func=None):
    """
    Get a connection to HiveServer2 (HS2).

    Returns:
        HiveServer2Connection: Connection object implementing DB API 2.0
    """
```

[Core Database API](./core-database-api.md)

### Error Handling

Comprehensive exception hierarchy following DB API 2.0 standards, with specialized exceptions for RPC communication, HiveServer2-specific errors, and transport issues.

```python { .api }
class Error(Exception):
    """Base exception class for all database errors."""

class DatabaseError(Error):
    """Exception for database-related errors."""

class OperationalError(DatabaseError):
    """Exception for operational errors (connection issues, etc.)."""

class ProgrammingError(DatabaseError):
    """Exception for programming errors (SQL syntax, etc.)."""
```

[Error Handling](./error-handling.md)

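Because these classes form an inheritance chain, the order of `except` clauses matters: the most specific class has to come first, or a broader clause will catch the error before it. The sketch below illustrates this with local stand-in classes that mirror the listing above, so it runs without Impyla installed; real code would import the classes from `impala.error` instead. The `classify` helper and its return labels are hypothetical.

```python
# Local stand-ins mirroring the hierarchy listed above (assumption for
# illustration); real code would use: from impala.error import ...
class Error(Exception):
    pass


class DatabaseError(Error):
    pass


class OperationalError(DatabaseError):
    pass


class ProgrammingError(DatabaseError):
    pass


def classify(exc):
    """Map an exception to a coarse handling decision (labels are illustrative)."""
    try:
        raise exc
    except OperationalError:
        return "retry"        # e.g. connection issues
    except ProgrammingError:
        return "fix-query"    # e.g. SQL syntax errors
    except DatabaseError:
        return "database"     # any other database-related error
    except Error:
        return "other"        # base class, checked last


print(classify(OperationalError("connection reset")))
print(classify(ProgrammingError("syntax error")))
```

If `except Error` were listed first, every branch below it would be unreachable.
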
### Data Utilities

Utility functions for data conversion and integration with the Python data science ecosystem, including pandas DataFrame conversion and database management functions.

```python { .api }
def as_pandas(cursor, coerce_float=False):
    """
    Convert cursor results to pandas DataFrame.

    Parameters:
        cursor: Active cursor with executed query
        coerce_float (bool): Coerce numeric columns to float

    Returns:
        pandas.DataFrame: Results as DataFrame
    """
```

[Data Utilities](./data-utilities.md)

### SQLAlchemy Integration

SQLAlchemy dialect support for Impala, enabling ORM and core SQLAlchemy functionality with Impala and Hive backends.

```python { .api }
class ImpalaDialect:
    """SQLAlchemy dialect for Impala."""

class Impala4Dialect:
    """SQLAlchemy dialect for Impala 4.x."""
```

[SQLAlchemy Integration](./sqlalchemy-integration.md)

## Authentication Mechanisms

- **NOSASL**: No authentication (unsecured Impala)
- **PLAIN**: Plain text authentication (unsecured Hive)
- **GSSAPI**: Kerberos authentication
- **LDAP**: LDAP authentication
- **JWT**: JSON Web Token authentication (HTTP transport only)

## Transport Protocols

- **Binary Transport**: Default Thrift binary protocol over TCP
- **HTTP Transport**: HTTP-based transport with cookie and authentication support

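The mechanisms and transports listed above map onto keyword arguments of `connect`. The sketch below builds those arguments for each mechanism without opening a connection. The helper `connect_kwargs` is hypothetical; the keyword names come from the `connect` signature in this document, while the `http_path` value `'cliservice'` is an assumption that depends on how the server is configured.

```python
def connect_kwargs(mechanism, host, port=21050, user=None, password=None, token=None):
    """Build keyword arguments for impala.dbapi.connect for one auth mechanism."""
    kwargs = {"host": host, "port": port, "auth_mechanism": mechanism}
    if mechanism in ("PLAIN", "LDAP"):
        # Username/password based mechanisms.
        kwargs.update(user=user, password=password)
    elif mechanism == "GSSAPI":
        # Kerberos; 'impala' is the default service name in the signature.
        kwargs["kerberos_service_name"] = "impala"
    elif mechanism == "JWT":
        # JWT is HTTP transport only; the http_path value is an assumption.
        kwargs.update(jwt=token, use_http_transport=True, http_path="cliservice")
    elif mechanism != "NOSASL":
        raise ValueError("unknown auth mechanism: %s" % mechanism)
    return kwargs


print(connect_kwargs("LDAP", "your-impala-host", user="alice", password="secret"))
print(connect_kwargs("JWT", "your-impala-host", token="<token>"))
```

The resulting dictionary would be passed on as `connect(**connect_kwargs(...))`. Keeping the construction separate from the call makes the settings easy to inspect or log (minus secrets) before connecting.
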
## DB API 2.0 Compliance

```python { .api }
# Module-level constants
apilevel = '2.0'
threadsafety = 1
paramstyle = 'pyformat'

# Type objects for DB API 2.0 compliance
STRING: _DBAPITypeObject    # Matches 'STRING'
BINARY: _DBAPITypeObject    # Matches 'BINARY'
NUMBER: _DBAPITypeObject    # Matches 'BOOLEAN', 'TINYINT', 'SMALLINT', 'INT', 'BIGINT', 'FLOAT', 'DOUBLE', 'DECIMAL'
DATETIME: _DBAPITypeObject  # Matches 'TIMESTAMP'
DATE: _DBAPITypeObject      # Matches 'DATE'
ROWID: _DBAPITypeObject     # Matches no type names (empty value set)

# Date/time constructors
def Date(year, month, day): ...
def Time(hour, minute, second): ...
def Timestamp(year, month, day, hour, minute, second): ...
def DateFromTicks(ticks): ...
def TimeFromTicks(ticks): ...
def TimestampFromTicks(ticks): ...

# Binary data constructor
def Binary(data): ...
```
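
With `paramstyle = 'pyformat'`, query values are passed separately from the SQL text as a mapping and referenced by `%(name)s` markers. A minimal sketch of that calling convention follows. `RecordingCursor` is a hypothetical stand-in that only records the call so the example runs without a server, `select_by_id_and_name` is a hypothetical helper, and `my_table` with its columns is illustrative.

```python
class RecordingCursor:
    """Hypothetical stand-in that records execute() calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def execute(self, operation, parameters=None):
        self.calls.append((operation, parameters))


QUERY = "SELECT * FROM my_table WHERE id = %(id)s AND name = %(name)s"


def select_by_id_and_name(cursor, row_id, name):
    # Values travel as a dict; the %(id)s and %(name)s markers are the
    # 'pyformat' placeholders named by paramstyle.
    cursor.execute(QUERY, {"id": row_id, "name": name})


cursor = RecordingCursor()
select_by_id_and_name(cursor, 42, "alice")
print(cursor.calls[0])
```

Passing values this way, rather than formatting them into the SQL string by hand, leaves quoting and escaping to the driver.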