tessl/pypi-impyla

Python client for the Impala distributed query engine and HiveServer2 implementations

Workspace: tessl · Visibility: Public · Describes: `pkg:pypi/impyla@0.22.x`

To install, run:

`npx @tessl/cli install tessl/pypi-impyla@0.22.0`

# Impyla

A Python client library for HiveServer2 (HS2) implementations, designed primarily for the Impala and Hive distributed query engines. Impyla implements the Python DB API 2.0 specification (PEP 249), so it is used much like other DB API clients such as `sqlite3` or MySQL drivers. It also supports Kerberos and LDAP authentication, SSL connections, and SQLAlchemy integration.

## Package Information

- **Package Name**: impyla
- **Language**: Python
- **Installation**: `pip install impyla`
- **License**: Apache 2.0
- **Requirements**: Python 2.7+ or 3.5+

## Core Imports

Standard DB API import:

```python
import impala.dbapi as dbapi
```

Direct connection import:

```python
from impala.dbapi import connect
```

Error handling:

```python
from impala.error import Error, DatabaseError, OperationalError
```

## Basic Usage

```python
from impala.dbapi import connect

# Connect to Impala (21050 is Impala's default HiveServer2 port)
conn = connect(host='your-impala-host', port=21050)
cursor = conn.cursor()

# Execute a query
cursor.execute('SELECT * FROM my_table LIMIT 10')
results = cursor.fetchall()

# Print results
for row in results:
    print(row)

# Clean up
cursor.close()
conn.close()
```
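The cleanup at the end of the basic example is skipped if the query raises an exception. A minimal sketch of one way to guarantee it, relying only on the DB API 2.0 `close()` methods: `contextlib.closing` closes the cursor and the connection even on error. The helper name `fetch_all` is hypothetical, not part of Impyla, and it takes the `connect` function as an argument so the same code works with any DB API 2.0 driver.

```python
from contextlib import closing


def fetch_all(connect, sql, **connect_kwargs):
    """Run one query and return all rows, always closing cursor and connection.

    `connect` is any DB API 2.0 connect function, e.g. impala.dbapi.connect.
    fetch_all itself is a hypothetical helper, not an Impyla API.
    """
    with closing(connect(**connect_kwargs)) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(sql)
            # The rows are fetched before either close() runs.
            return cursor.fetchall()
```

With Impyla this would be called as `fetch_all(connect, 'SELECT * FROM my_table LIMIT 10', host='your-impala-host', port=21050)`.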

## Architecture

Impyla follows the DB API 2.0 specification with these key components:

- **Connection Objects**: Manage database connections with authentication and transport options
- **Cursor Objects**: Execute queries and fetch results through the DB API 2.0 cursor interface
- **Error Hierarchy**: Exception classes that follow the DB API 2.0 hierarchy
- **Transport Layer**: Support for both binary Thrift and HTTP transport protocols
- **Authentication**: Multiple mechanisms including NOSASL, PLAIN, GSSAPI (Kerberos), LDAP, and JWT

Query results can also be converted to pandas DataFrames, which connects data in Impala and Hive to the wider Python data science ecosystem.

## Capabilities

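### Core Database API

A DB API 2.0 compliant interface for connection management, query execution, and result fetching, including parameter binding in the `pyformat` style. Connections also provide the standard `commit()` and `rollback()` methods, but because Impala does not support transactions, `commit()` does nothing and `rollback()` raises `NotSupportedError`.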

```python { .api }
def connect(host='localhost', port=21050, database=None, timeout=None,
            use_ssl=False, ca_cert=None, auth_mechanism='NOSASL', user=None,
            password=None, kerberos_service_name='impala', use_ldap=None,
            ldap_user=None, ldap_password=None, use_kerberos=None,
            protocol=None, krb_host=None, use_http_transport=False,
            http_path='', auth_cookie_names=None, http_cookie_names=None,
            retries=3, jwt=None, user_agent=None,
            get_user_custom_headers_func=None):
    """
    Get a connection to HiveServer2 (HS2).

    Returns:
        HiveServer2Connection: Connection object implementing DB API 2.0
    """
```

[Core Database API](./core-database-api.md)
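Result sets from Impala and Hive are often large, and `fetchall()` loads every row into memory at once. A sketch of batched fetching with the standard DB API 2.0 `fetchmany()` method; `iter_batches` and its default batch size are illustrative assumptions, not Impyla APIs.

```python
def iter_batches(cursor, size=1000):
    """Yield lists of at most `size` rows from an executed DB API 2.0 cursor.

    Hypothetical helper: relies only on the standard cursor.fetchmany() method.
    """
    while True:
        rows = cursor.fetchmany(size)
        if not rows:  # an empty sequence means the result set is exhausted
            return
        yield rows
```

Call it after `cursor.execute(...)` and loop over the batches, for example `for batch in iter_batches(cursor, 5000): ...`, so that only one batch is held in memory at a time.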

### Error Handling

Comprehensive exception hierarchy following DB API 2.0 standards, with specialized exceptions for RPC communication, HiveServer2-specific errors, and transport issues.

```python { .api }
class Error(Exception):
    """Base exception class for all database errors."""

class DatabaseError(Error):
    """Exception for database-related errors."""

class OperationalError(DatabaseError):
    """Exception for operational errors (connection issues, etc.)."""

class ProgrammingError(DatabaseError):
    """Exception for programming errors (SQL syntax, etc.)."""
```

[Error Handling](./error-handling.md)
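Because `OperationalError` and `ProgrammingError` both derive from `DatabaseError`, which derives from `Error`, `except` clauses have to run from most to least specific. A minimal sketch, assuming the exception classes are passed in as a module (for example `impala.error`); `run_query` is a hypothetical helper. Taking the module as a parameter keeps the sketch driver-agnostic, since PEP 249 requires these exception names in every compliant driver.

```python
def run_query(cursor, sql, errors):
    """Execute `sql`; return (rows, None) on success or (None, message) on failure.

    `errors` is any module exposing the DB API 2.0 exception classes, for
    example impala.error. run_query is a hypothetical helper, not an Impyla API.
    """
    # Most specific classes first: a DatabaseError clause placed earlier would
    # also catch OperationalError and ProgrammingError, hiding the distinction.
    try:
        cursor.execute(sql)
        return cursor.fetchall(), None
    except errors.OperationalError as exc:
        return None, "operational error: %s" % exc
    except errors.ProgrammingError as exc:
        return None, "programming error: %s" % exc
    except errors.DatabaseError as exc:
        return None, "database error: %s" % exc
    except errors.Error as exc:  # base class of every driver exception
        return None, "driver error: %s" % exc
```

With Impyla, the call would look like `run_query(cursor, 'SELECT 1', impala.error)` after `import impala.error`.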

### Data Utilities

Utility functions for data conversion and integration with the Python data science ecosystem, including pandas DataFrame conversion and database management functions.

```python { .api }
# Importable as: from impala.util import as_pandas
def as_pandas(cursor, coerce_float=False):
    """
    Convert cursor results to a pandas DataFrame.

    Parameters:
        cursor: Active cursor with an executed query
        coerce_float (bool): Attempt to convert non-string, non-numeric
            values (such as decimal.Decimal) to floating point

    Returns:
        pandas.DataFrame: Results as DataFrame
    """
```

[Data Utilities](./data-utilities.md)
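When pandas is not installed, the column names needed to label results are still available from the standard DB API 2.0 `cursor.description`, whose entries start with the column name. A stdlib-only sketch that turns a fetched result into a list of dicts; `rows_as_dicts` is a hypothetical helper, and when pandas is available, `as_pandas(cursor)` is the direct route.

```python
def rows_as_dicts(cursor):
    """Return the remaining rows of an executed cursor as a list of dicts.

    Hypothetical helper; uses only cursor.description and cursor.fetchall().
    """
    # description holds one 7-item sequence per column; item 0 is the name.
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]
```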

### SQLAlchemy Integration

SQLAlchemy dialect support for Impala, enabling ORM and core SQLAlchemy functionality with Impala and Hive backends.

```python { .api }
class ImpalaDialect:
    """SQLAlchemy dialect for Impala."""

class Impala4Dialect:
    """SQLAlchemy dialect for Impala 4.x."""
```

[SQLAlchemy Integration](./sqlalchemy-integration.md)
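SQLAlchemy picks a dialect from the scheme of the database URL. A sketch of building such a URL, assuming the two dialects are registered under the names `impala` and `impala4` (check the package's SQLAlchemy entry points if this does not match). `impala_url` is a hypothetical helper, and `default` is only an example database name.

```python
from urllib.parse import quote


def impala_url(host, port=21050, database="default", dialect="impala"):
    """Build a SQLAlchemy URL such as impala://host:21050/default.

    Assumption: "impala" (ImpalaDialect) and "impala4" (Impala4Dialect) are
    the registered dialect names. This helper is hypothetical.
    """
    if dialect not in ("impala", "impala4"):
        raise ValueError("unsupported dialect: %r" % dialect)
    # Percent-encode the database name in case it contains URL-special characters.
    return "%s://%s:%d/%s" % (dialect, host, port, quote(database, safe=""))
```

The resulting string would then be passed to `sqlalchemy.create_engine(...)`.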

## Authentication Mechanisms

Pass one of these names as the `auth_mechanism` argument of `connect()`:

- **NOSASL**: No authentication (unsecured Impala)
- **PLAIN**: Plain text authentication (unsecured Hive)
- **GSSAPI**: Kerberos authentication
- **LDAP**: LDAP authentication
- **JWT**: JSON Web Token authentication (HTTP transport only)
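Each mechanism needs a different set of `connect()` keyword arguments from the signature in the Core Database API section. A sketch that collects them in one place; `auth_kwargs` is a hypothetical helper, and which arguments a particular server actually requires depends on its configuration.

```python
def auth_kwargs(mechanism, user=None, password=None, jwt=None,
                kerberos_service_name="impala"):
    """Return connect() keyword arguments for one authentication mechanism.

    Hypothetical helper; every key it emits is a parameter of
    impala.dbapi.connect.
    """
    mechanism = mechanism.upper()
    if mechanism == "NOSASL":
        return {"auth_mechanism": "NOSASL"}
    if mechanism in ("PLAIN", "LDAP"):
        # Both mechanisms send a user name and password.
        return {"auth_mechanism": mechanism, "user": user, "password": password}
    if mechanism == "GSSAPI":
        # Kerberos credentials typically come from the ticket cache (e.g. after kinit).
        return {"auth_mechanism": "GSSAPI",
                "kerberos_service_name": kerberos_service_name}
    if mechanism == "JWT":
        if not jwt:
            raise ValueError("JWT authentication needs a token")
        # JWT is only supported over the HTTP transport.
        return {"auth_mechanism": "JWT", "jwt": jwt, "use_http_transport": True}
    raise ValueError("unknown auth mechanism: %r" % mechanism)
```

Usage: `connect(host='your-impala-host', port=21050, **auth_kwargs('LDAP', user='alice', password='secret'))`.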

## Transport Protocols

- **Binary Transport**: Default Thrift binary protocol over TCP
- **HTTP Transport**: HTTP-based transport with cookie and authentication support, enabled with `use_http_transport=True` (set `http_path` to the endpoint path the server expects)
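Switching from the binary transport to HTTP is a matter of `connect()` arguments. A sketch, assuming a server that exposes HiveServer2 over HTTP on port 28000 at the path `cliservice`; these are common settings but are assumptions here, so verify them against the server configuration. `http_connect_kwargs` is a hypothetical helper.

```python
def http_connect_kwargs(host, port=28000, http_path="cliservice",
                        use_ssl=True, ca_cert=None):
    """Return connect() keyword arguments for the HTTP transport.

    Hypothetical helper. The port and path defaults are assumptions about
    the server, not Impyla defaults (connect() itself uses the binary
    transport on port 21050 with http_path='').
    """
    kwargs = {
        "host": host,
        "port": port,
        "use_http_transport": True,
        "http_path": http_path,
        "use_ssl": use_ssl,
    }
    if ca_cert is not None:
        kwargs["ca_cert"] = ca_cert  # CA certificate used to verify the server
    return kwargs
```

Usage: `connect(**http_connect_kwargs('your-impala-host'))`; authentication arguments can be merged into the same dict.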

## DB API 2.0 Compliance

```python { .api }
# Module-level constants
apilevel = '2.0'
threadsafety = 1         # threads may share the module, but not connections
paramstyle = 'pyformat'  # named placeholders such as %(name)s

# Type objects for DB API 2.0 compliance
STRING: _DBAPITypeObject    # Matches 'STRING'
BINARY: _DBAPITypeObject    # Matches 'BINARY'
NUMBER: _DBAPITypeObject    # Matches 'BOOLEAN', 'TINYINT', 'SMALLINT', 'INT', 'BIGINT', 'FLOAT', 'DOUBLE', 'DECIMAL'
DATETIME: _DBAPITypeObject  # Matches 'TIMESTAMP'
DATE: _DBAPITypeObject      # Matches 'DATE'
ROWID: _DBAPITypeObject     # Empty values set: matches no type name

# Date/time constructors
def Date(year, month, day): ...
def Time(hour, minute, second): ...
def Timestamp(year, month, day, hour, minute, second): ...
def DateFromTicks(ticks): ...
def TimeFromTicks(ticks): ...
def TimestampFromTicks(ticks): ...

# Binary data constructor
def Binary(data): ...
```
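With `paramstyle = 'pyformat'`, placeholders are written as `%(name)s` and the values are passed to `cursor.execute` as a dict. The type objects are documented as matching type names, so, assuming `cursor.description` reports names such as `'INT'` as each entry's second item, they can select columns by kind. A sketch of both; `where_clause` and `columns_of_type` are hypothetical helpers. Only values are parameterized, so the column names given to `where_clause` must come from trusted code.

```python
def where_clause(filters):
    """Build a pyformat WHERE fragment and its parameter dict.

    Hypothetical helper. Values are bound as %(name)s parameters; the
    column names in `filters` are inserted verbatim and must be trusted.
    """
    if not filters:
        return "1 = 1", {}  # always-true condition when there is nothing to filter
    conditions = ["%s = %%(%s)s" % (column, column) for column in filters]
    return " AND ".join(conditions), dict(filters)


def columns_of_type(description, type_object):
    """Return names of columns whose type code compares equal to `type_object`.

    Hypothetical helper; with Impyla, `type_object` would be e.g.
    impala.dbapi.NUMBER, which is documented to match 'INT', 'BIGINT', etc.
    """
    return [column[0] for column in description if column[1] == type_object]
```

Usage: `fragment, params = where_clause({'id': 5})`, then `cursor.execute('SELECT * FROM my_table WHERE ' + fragment, params)`; afterwards `columns_of_type(cursor.description, dbapi.NUMBER)` lists the numeric columns.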