A Python connector for Apache Druid with synchronous and asynchronous clients, database API support, and comprehensive query building utilities.
—
Interactive command-line tool for executing SQL queries against Druid with syntax highlighting, autocompletion, and tabular result display. The CLI provides a convenient way to explore Druid data and test queries interactively.
The command-line interface is available through the pydruid command after installation.
def main() -> None:
"""
Main entry point for the PyDruid command-line interface.
Provides an interactive SQL shell for executing queries against Druid
with syntax highlighting, autocompletion, and formatted output.
"""# Basic usage
pydruid http://localhost:8082/druid/v2/sql/
# With HTTPS
pydruid https://localhost:8082/druid/v2/sql/
# With authentication (will prompt for credentials)
pydruid https://user@localhost:8082/druid/v2/sql/The CLI provides several interactive features:
The CLI provides autocompletion for comprehensive SQL syntax.
# SQL Keywords
keywords: list = [
"EXPLAIN PLAN FOR", "WITH", "SELECT", "ALL", "DISTINCT",
"FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY",
"ASC", "DESC", "LIMIT", "FILTER", "UNION ALL"
]
# Aggregate Functions
aggregate_functions: list = [
"COUNT", "SUM", "MIN", "MAX", "AVG",
"APPROX_COUNT_DISTINCT", "APPROX_QUANTILE"
]
# Numeric Functions
numeric_functions: list = [
"ABS", "CEIL", "EXP", "FLOOR", "LN", "LOG10",
"POWER", "SQRT", "TRUNCATE", "TRUNC", "MOD"
]
# String Functions
string_functions: list = [
"LENGTH", "CHAR_LENGTH", "CHARACTER_LENGTH", "STRLEN", "LOOKUP",
"LOWER", "REGEXP_EXTRACT", "REPLACE", "STRPOS", "SUBSTRING",
"SUBSTR", "TRIM", "BTRIM", "RTRIM", "LTRIM", "UPPER"
]
# Time Functions
time_functions: list = [
"CURRENT_TIMESTAMP", "CURRENT_DATE", "DATE_TRUNC", "TIME_FLOOR",
"TIME_SHIFT", "TIME_EXTRACT", "TIME_PARSE", "TIME_FORMAT",
"MILLIS_TO_TIMESTAMP", "TIMESTAMP_TO_MILLIS", "EXTRACT",
"FLOOR", "CEIL", "TIMESTAMPADD"
]
# Other Functions
other_functions: list = [
"CASE", "COALESCE", "NULLIF", "CAST", "SAFE_DIVIDE"
]$ pydruid http://localhost:8082/druid/v2/sql/
PyDruid CLI - Interactive Druid SQL Client
Connected to: http://localhost:8082/druid/v2/sql/
Type 'help' for help, 'quit' or 'exit' to quit.
> SELECT COUNT(*) AS cnt FROM places
cnt
-----
12345
> SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES;
TABLE_NAME
----------
test_table
twitterstream
places
COLUMNS
SCHEMATA
TABLES
> quit
GoodBye!> SELECT place,
CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
FROM places
WHERE place IS NOT NULL
LIMIT 5;
place lat lon
--------- ----- -----
"40.7128,-74.0060" 40.71 -74.01
"34.0522,-118.2437" 34.05 -118.24
"41.8781,-87.6298" 41.88 -87.63
"29.7604,-95.3698" 29.76 -95.37
"33.4484,-112.0740" 33.45 -112.07> SELECT
user_lang,
COUNT(*) as tweet_count,
AVG(LENGTH(tweet_text)) as avg_length
FROM twitterstream
WHERE __time >= '2014-03-01'
AND __time < '2014-04-01'
GROUP BY user_lang
ORDER BY tweet_count DESC
LIMIT 10;
user_lang tweet_count avg_length
--------- ----------- ----------
en 1,234,567 142.5
es 234,567 156.2
pt 123,456 148.9
fr 98,765 151.3
ja 87,654 67.8
...> DESCRIBE twitterstream;
COLUMN_NAME DATA_TYPE IS_NULLABLE
----------- --------- -----------
__time TIMESTAMP NO
tweet_id BIGINT YES
user_name VARCHAR YES
tweet_text VARCHAR YES
user_lang VARCHAR YES
retweet_count BIGINT YES
favorite_count BIGINT YES
> SELECT * FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'twitterstream'
LIMIT 5;
TABLE_SCHEMA TABLE_NAME COLUMN_NAME DATA_TYPE
------------ ---------- ----------- ---------
druid twitterstream __time TIMESTAMP
druid twitterstream tweet_id BIGINT
druid twitterstream user_name VARCHAR
druid twitterstream tweet_text VARCHAR
druid twitterstream user_lang VARCHAR> SELECT
FLOOR(__time TO HOUR) as hour,
COUNT(*) as tweets_per_hour
FROM twitterstream
WHERE __time >= '2014-03-02'
AND __time < '2014-03-03'
GROUP BY FLOOR(__time TO HOUR)
ORDER BY hour;
hour tweets_per_hour
---- ---------------
2014-03-02T00:00:00Z 15,432
2014-03-02T01:00:00Z 12,876
2014-03-02T02:00:00Z 8,543
2014-03-02T03:00:00Z 6,234
...> EXPLAIN PLAN FOR
SELECT user_name, COUNT(*) as tweet_count
FROM twitterstream
WHERE user_lang = 'en'
GROUP BY user_name
ORDER BY tweet_count DESC
LIMIT 10;
[
{
"query": {
"queryType": "groupBy",
"dataSource": "twitterstream",
"intervals": ["1000-01-01/3000-01-01"],
"granularity": "all",
"dimensions": ["user_name"],
"filter": {
"type": "selector",
"dimension": "user_lang",
"value": "en"
},
"aggregations": [
{
"type": "count",
"name": "tweet_count"
}
],
"limitSpec": {
"type": "default",
"limit": 10,
"orderBy": [
{
"dimension": "tweet_count",
"direction": "descending",
"dimensionOrder": "numeric"
}
]
}
}
}
]> help
Available commands:
- help: Show this help message
- quit, exit, bye: Exit the CLI
- \d: List all tables
- \d <table>: Describe table structure
> \d
Tables:
- twitterstream
- places
- user_profiles
> \d twitterstream
Table: twitterstream
Columns:
- __time (TIMESTAMP)
- tweet_id (BIGINT)
- user_name (VARCHAR)
- tweet_text (VARCHAR)
- user_lang (VARCHAR)
- retweet_count (BIGINT)
- favorite_count (BIGINT)The CLI requires additional dependencies for full functionality:
# Install with CLI dependencies
pip install pydruid[cli]
# Or install dependencies manually
pip install pygments prompt_toolkit tabulateDependencies:
The CLI uses several configuration options:
Command history is automatically saved to ~/.pydruid_history and persists across sessions.
SQL syntax highlighting is enabled by default and includes:
Query results are automatically formatted as tables using the tabulate library, with:
The CLI supports the same connection options as the database API:
Example with authentication:
pydruid https://username:password@druid.example.com:8082/druid/v2/sql/The CLI provides a convenient and user-friendly way to interact with Druid for data exploration, query development, and ad-hoc analysis tasks.
Install with Tessl CLI
npx tessl i tessl/pypi-pydruid