tessl/pypi-pydruid

A Python connector for Apache Druid with synchronous and asynchronous clients, database API support, and comprehensive query building utilities.

—

Pending

Overview

Eval results

Files

Command Line Interface

Name: tessl/pypi-pydruid
Author: tessl

Interactive command-line tool for executing SQL queries against Druid with syntax highlighting, autocompletion, and tabular result display. The CLI provides a convenient way to explore Druid data and test queries interactively.

Capabilities

CLI Entry Point

The command-line interface is available through the pydruid command after installation.

def main() -> None:
    """
    Main entry point for the PyDruid command-line interface.
    
    Provides an interactive SQL shell for executing queries against Druid
    with syntax highlighting, autocompletion, and formatted output.
    """

Command Execution

# Basic usage
pydruid http://localhost:8082/druid/v2/sql/

# With HTTPS
pydruid https://localhost:8082/druid/v2/sql/

# With authentication (will prompt for credentials)
pydruid https://user@localhost:8082/druid/v2/sql/

Interactive Features

The CLI provides several interactive features:

Syntax Highlighting: SQL keywords, functions, and strings are highlighted
Autocompletion: Tab completion for SQL keywords, functions, and table names
Command History: Persistent command history across sessions
Formatted Output: Results displayed in formatted tables
Multi-line Support: Queries can span multiple lines

SQL Keywords and Functions

The CLI provides autocompletion for comprehensive SQL syntax.

# SQL Keywords
keywords: list = [
    "EXPLAIN PLAN FOR", "WITH", "SELECT", "ALL", "DISTINCT", 
    "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", 
    "ASC", "DESC", "LIMIT", "FILTER", "UNION ALL"
]

# Aggregate Functions
aggregate_functions: list = [
    "COUNT", "SUM", "MIN", "MAX", "AVG", 
    "APPROX_COUNT_DISTINCT", "APPROX_QUANTILE"
]

# Numeric Functions
numeric_functions: list = [
    "ABS", "CEIL", "EXP", "FLOOR", "LN", "LOG10", 
    "POWER", "SQRT", "TRUNCATE", "TRUNC", "MOD"
]

# String Functions
string_functions: list = [
    "LENGTH", "CHAR_LENGTH", "CHARACTER_LENGTH", "STRLEN", "LOOKUP",
    "LOWER", "REGEXP_EXTRACT", "REPLACE", "STRPOS", "SUBSTRING",
    "SUBSTR", "TRIM", "BTRIM", "RTRIM", "LTRIM", "UPPER"
]

# Time Functions
time_functions: list = [
    "CURRENT_TIMESTAMP", "CURRENT_DATE", "DATE_TRUNC", "TIME_FLOOR",
    "TIME_SHIFT", "TIME_EXTRACT", "TIME_PARSE", "TIME_FORMAT",
    "MILLIS_TO_TIMESTAMP", "TIMESTAMP_TO_MILLIS", "EXTRACT",
    "FLOOR", "CEIL", "TIMESTAMPADD"
]

# Other Functions
other_functions: list = [
    "CASE", "COALESCE", "NULLIF", "CAST", "SAFE_DIVIDE"
]

Usage Examples

Basic Usage

$ pydruid http://localhost:8082/druid/v2/sql/
PyDruid CLI - Interactive Druid SQL Client
Connected to: http://localhost:8082/druid/v2/sql/
Type 'help' for help, 'quit' or 'exit' to quit.

> SELECT COUNT(*) AS cnt FROM places
  cnt
-----
12345

> SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES;
TABLE_NAME
----------
test_table
twitterstream
places
COLUMNS
SCHEMATA
TABLES

> quit
GoodBye!

Multi-line Queries

> SELECT place,
    CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
    CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
  FROM places
  WHERE place IS NOT NULL
  LIMIT 5;

place                    lat      lon
---------              -----    -----
"40.7128,-74.0060"     40.71    -74.01
"34.0522,-118.2437"    34.05   -118.24
"41.8781,-87.6298"     41.88    -87.63
"29.7604,-95.3698"     29.76    -95.37
"33.4484,-112.0740"    33.45   -112.07

Complex Analytical Queries

> SELECT 
    user_lang,
    COUNT(*) as tweet_count,
    AVG(LENGTH(tweet_text)) as avg_length
  FROM twitterstream 
  WHERE __time >= '2014-03-01' 
    AND __time < '2014-04-01'
  GROUP BY user_lang
  ORDER BY tweet_count DESC
  LIMIT 10;

user_lang  tweet_count  avg_length
---------  -----------  ----------
en         1,234,567    142.5
es         234,567      156.2
pt         123,456      148.9
fr         98,765       151.3
ja         87,654       67.8
...

Schema Exploration

> DESCRIBE twitterstream;

COLUMN_NAME       DATA_TYPE    IS_NULLABLE
-----------       ---------    -----------
__time           TIMESTAMP    NO
tweet_id         BIGINT       YES
user_name        VARCHAR      YES
tweet_text       VARCHAR      YES
user_lang        VARCHAR      YES
retweet_count    BIGINT       YES
favorite_count   BIGINT       YES

> SELECT * FROM INFORMATION_SCHEMA.COLUMNS 
  WHERE TABLE_NAME = 'twitterstream' 
  LIMIT 5;

TABLE_SCHEMA  TABLE_NAME    COLUMN_NAME    DATA_TYPE
------------  ----------    -----------    ---------
druid         twitterstream __time         TIMESTAMP
druid         twitterstream tweet_id       BIGINT
druid         twitterstream user_name      VARCHAR
druid         twitterstream tweet_text     VARCHAR
druid         twitterstream user_lang      VARCHAR

Time-based Analysis

> SELECT 
    FLOOR(__time TO HOUR) as hour,
    COUNT(*) as tweets_per_hour
  FROM twitterstream
  WHERE __time >= '2014-03-02'
    AND __time < '2014-03-03'
  GROUP BY FLOOR(__time TO HOUR)
  ORDER BY hour;

hour                  tweets_per_hour
----                  ---------------
2014-03-02T00:00:00Z  15,432
2014-03-02T01:00:00Z  12,876
2014-03-02T02:00:00Z  8,543
2014-03-02T03:00:00Z  6,234
...

Query Performance Analysis

> EXPLAIN PLAN FOR
  SELECT user_name, COUNT(*) as tweet_count
  FROM twitterstream
  WHERE user_lang = 'en'
  GROUP BY user_name
  ORDER BY tweet_count DESC
  LIMIT 10;

[
  {
    "query": {
      "queryType": "groupBy",
      "dataSource": "twitterstream",
      "intervals": ["1000-01-01/3000-01-01"],
      "granularity": "all",
      "dimensions": ["user_name"],
      "filter": {
        "type": "selector",
        "dimension": "user_lang",
        "value": "en"
      },
      "aggregations": [
        {
          "type": "count",
          "name": "tweet_count"
        }
      ],
      "limitSpec": {
        "type": "default",
        "limit": 10,
        "orderBy": [
          {
            "dimension": "tweet_count",
            "direction": "descending",
            "dimensionOrder": "numeric"
          }
        ]
      }
    }
  }
]

Special Commands

> help
Available commands:
- help: Show this help message
- quit, exit, bye: Exit the CLI
- \d: List all tables
- \d <table>: Describe table structure

> \d
Tables:
- twitterstream
- places  
- user_profiles

> \d twitterstream
Table: twitterstream
Columns:
- __time (TIMESTAMP)
- tweet_id (BIGINT)
- user_name (VARCHAR)
- tweet_text (VARCHAR)
- user_lang (VARCHAR)
- retweet_count (BIGINT)
- favorite_count (BIGINT)

Installation and Dependencies

The CLI requires additional dependencies for full functionality:

# Install with CLI dependencies
pip install pydruid[cli]

# Or install dependencies manually
pip install pygments prompt_toolkit tabulate

Dependencies:

pygments: Syntax highlighting for SQL
prompt_toolkit: Interactive input with autocompletion and history
tabulate: Formatted table output for query results

Configuration

The CLI uses several configuration options:

History

Command history is automatically saved to ~/.pydruid_history and persists across sessions.

Syntax Highlighting

SQL syntax highlighting is enabled by default and includes:

Keywords (SELECT, FROM, WHERE, etc.)
Functions (COUNT, SUM, CONCAT, etc.)
Strings and numeric literals
Comments

Output Formatting

Query results are automatically formatted as tables using the tabulate library, with:

Column headers
Aligned columns
Configurable table styles
Support for wide tables with proper wrapping

Connection Options

The CLI supports the same connection options as the database API:

HTTP and HTTPS protocols
Basic authentication
SSL certificate verification
Custom headers and context parameters

Example with authentication:

pydruid https://username:password@druid.example.com:8082/druid/v2/sql/

The CLI provides a convenient and user-friendly way to interact with Druid for data exploration, query development, and ad-hoc analysis tasks.

Install with Tessl CLI

npx tessl i tessl/pypi-pydruid

docs

asynchronous-client.md

command-line-interface.md

database-api.md

index.md

query-utilities.md

sqlalchemy-integration.md

synchronous-client.md

tile.json

tessl/pypi-pydruid