# Druid Data Sources

Panoramix integrates with Apache Druid for real-time analytics and OLAP querying. Druid datasources provide high-performance analytics on streaming and batch data with pre-aggregated metrics and fast drill-down capabilities.

## Capabilities

### Druid Datasource Management

Manage Druid datasources with automatic metadata synchronization, dimension and metric discovery, and query optimization.

```python { .api }
class Datasource(Model, AuditMixin, Queryable):
    """
    Druid datasource model for real-time analytics.

    Attributes:
        id (int): Primary key
        datasource_name (str): Unique datasource identifier
        is_featured (bool): Whether datasource appears in featured list
        is_hidden (bool): Whether datasource is hidden from UI
        description (str): Datasource description
        default_endpoint (str): Default visualization endpoint
        user_id (int): Foreign key to User
        owner (User): Datasource owner reference
        cluster_name (str): Name of the Druid cluster
        cluster (Cluster): Reference to Druid cluster
    """

    def query(self, groupby, metrics, granularity, from_dttm, to_dttm,
              limit_spec=None, filter=None, is_timeseries=True,
              timeseries_limit=15, row_limit=None):
        """
        Execute Druid query with aggregations and filters.

        Args:
            groupby (list): List of dimensions to group by
            metrics (list): List of metrics to calculate
            granularity (str): Time granularity ('second', 'minute', 'hour', 'day', 'week', 'month')
            from_dttm (datetime): Start datetime for time-based queries
            to_dttm (datetime): End datetime for time-based queries
            limit_spec (dict, optional): Limit specification
            filter (list, optional): List of filter conditions
            is_timeseries (bool): Whether query is time-based (default True)
            timeseries_limit (int): Limit for timeseries results (default 15)
            row_limit (int, optional): Maximum number of rows to return

        Returns:
            QueryResult: Named tuple with df, query, and duration
        """

    def get_metric_obj(self, metric_name):
        """
        Get metric configuration object by name.

        Args:
            metric_name (str): Name of the metric to retrieve

        Returns:
            Metric: Metric configuration object
        """

    @classmethod
    def sync_to_db(cls, name, cluster):
        """
        Synchronize datasource metadata from Druid cluster.

        Args:
            name (str): Datasource name in Druid
            cluster (Cluster): Druid cluster instance

        Returns:
            Datasource: Created or updated datasource instance
        """

    def latest_metadata(self):
        """
        Get latest metadata from Druid cluster.

        Returns:
            dict: Column metadata from segment information
        """

    def generate_metrics(self):
        """Generate default metrics for all columns."""

    @property
    def name(self):
        """Get the datasource name."""
        return self.datasource_name

    @property
    def datasource_link(self):
        """Get HTML link to the datasource view."""
        url = "/panoramix/datasource/{}/".format(self.datasource_name)
        return '<a href="{url}">{self.datasource_name}</a>'.format(**locals())

    @property
    def metrics_combo(self):
        """Get list of metric name/verbose name tuples for forms."""
        return sorted(
            [(m.metric_name, m.verbose_name) for m in self.metrics],
            key=lambda x: x[1])

    def __repr__(self):
        """String representation of the datasource."""
        return self.datasource_name
```
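The metadata helpers above are typically used together when wiring up a datasource. The sketch below is illustrative only; it assumes a `datasource` instance obtained as in the usage examples further down, a reachable Druid cluster, and that the keys of `latest_metadata()` are column names.

```python
# Illustrative sketch of the metadata helpers (assumes `datasource` exists and
# the cluster is reachable; key/column naming is an assumption).
columns = datasource.latest_metadata()   # column metadata from the latest segment
print(list(columns.keys()))              # dimension/metric names reported by Druid

# Create default aggregation metrics for every column
datasource.generate_metrics()

# Metric choices as (name, verbose name) tuples, e.g. for building form fields
for metric_name, verbose_name in datasource.metrics_combo:
    print(metric_name, verbose_name)
```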

### Druid Dimensions

Manage Druid dimensions (groupable columns) with data types and filtering capabilities.

```python { .api }
class Column(Model, AuditMixin):
    """
    Druid datasource dimension metadata.

    Attributes:
        id (int): Primary key
        column_name (str): Dimension name in Druid
        verbose_name (str): Human-readable dimension name
        is_active (bool): Whether dimension is active for queries
        type (str): Dimension data type ('STRING', 'LONG', 'FLOAT', etc.)
        groupby (bool): Whether dimension can be used for grouping
        filterable (bool): Whether dimension can be filtered
        description (str): Dimension description
        datasource_id (int): Foreign key to Datasource
        datasource (Datasource): Reference to parent datasource
        is_dttm (bool): Whether dimension contains datetime data
        expression (str): Custom expression for computed dimensions
    """

    @property
    def isnum(self):
        """Check if dimension is numeric type."""
        return self.type in ('LONG', 'DOUBLE', 'FLOAT')

    def generate_metrics(self):
        """Generate default metrics for this dimension."""

    def __repr__(self):
        """String representation of the column."""
        return self.column_name
```
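For example, the `groupby`, `filterable`, and `isnum` flags can be used to build the dimension choices for a datasource. The snippet below is a sketch; it assumes a `datasource.columns` collection as the reverse side of the `Column.datasource` relationship.

```python
# Sketch: derive groupable / filterable dimension lists from Column metadata.
# Assumes `datasource.columns` is the back-reference of Column.datasource.
groupable = [c.column_name for c in datasource.columns if c.groupby]
filterable = [c.column_name for c in datasource.columns if c.filterable]
numeric_dims = [c.column_name for c in datasource.columns if c.isnum]

print("Group by:", groupable)
print("Filter on:", filterable)
print("Numeric:", numeric_dims)
```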

### Druid Metrics

Define and manage Druid metrics including aggregations, post-aggregations, and custom expressions.

```python { .api }
class Metric(Model, AuditMixin):
    """
    Druid-based metric definition for datasources.

    Attributes:
        id (int): Primary key
        metric_name (str): Unique metric identifier
        verbose_name (str): Human-readable metric name
        metric_type (str): Type of metric ('longSum', 'doubleSum', 'count', etc.)
        json (str): JSON configuration for complex metrics
        description (str): Metric description
        is_restricted (bool): Whether metric has access restrictions
        datasource_id (int): Foreign key to Datasource
        datasource (Datasource): Reference to parent datasource
    """

    @property
    def json_obj(self):
        """
        Get parsed JSON configuration for the metric.

        Returns:
            dict: Parsed JSON configuration object
        """
```
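Metric rows map onto Druid aggregator definitions. The snippet below sketches how a sum metric could be defined by hand; the column names are placeholders and the JSON payload follows Druid's aggregator spec, so treat it as illustrative rather than a canonical recipe.

```python
import json

from panoramix import db
from panoramix.models import Metric

# Sketch: define a doubleSum metric over a hypothetical `revenue` column.
# The JSON payload mirrors Druid's aggregator spec.
revenue_metric = Metric(
    metric_name='sum__revenue',
    verbose_name='Total revenue',
    metric_type='doubleSum',
    json=json.dumps({
        'type': 'doubleSum',
        'name': 'sum__revenue',
        'fieldName': 'revenue',
    }),
    datasource_id=datasource.id,
)
db.session.add(revenue_metric)
db.session.commit()

print(revenue_metric.json_obj)  # parsed aggregator definition
```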

## Usage Examples

### Basic Druid Querying

```python
from datetime import datetime, timedelta

from panoramix import db
from panoramix.models import Cluster, Datasource

# Get Druid cluster and datasource
cluster = db.session.query(Cluster).filter_by(cluster_name='production').first()
datasource = db.session.query(Datasource).filter_by(
    datasource_name='events',
    cluster=cluster,
).first()

# Time series query over the last 24 hours
now = datetime.now()
result = datasource.query(
    groupby=['country'],
    metrics=['count', 'sum__revenue'],
    granularity='hour',
    from_dttm=now - timedelta(hours=24),
    to_dttm=now,
)

print(result.df)
```

### Real-time Analytics

```python
# High-frequency real-time query over the last hour
# (reuses `now` and the imports from the example above)
result = datasource.query(
    groupby=['event_type', 'platform'],
    metrics=['count', 'unique__user_id'],
    granularity='minute',
    from_dttm=now - timedelta(hours=1),
    to_dttm=now,
    filter=[('country', '==', 'US')],  # filter conditions as (dimension, operator, value)
    row_limit=100,
)

# Access real-time event data
events_df = result.df
print("Query executed in {} seconds".format(result.duration))
```

### Custom Metrics and Post-Aggregations

```python
# Top campaigns by click-through rate over the last week
result = datasource.query(
    groupby=['campaign_id'],
    metrics=['sum__impressions', 'sum__clicks', 'click_through_rate'],
    granularity='day',
    from_dttm=now - timedelta(days=7),
    to_dttm=now,
    is_timeseries=False,
    # Order and limit the groupBy results; the dict below follows Druid's
    # limitSpec format
    limit_spec={
        'type': 'default',
        'limit': 10,
        'columns': [
            {'dimension': 'click_through_rate', 'direction': 'descending'},
        ],
    },
    row_limit=10,
)
```

### Datasource Synchronization

```python
# Sync datasource metadata from Druid
new_datasource = Datasource.sync_to_db('new_events', cluster)

# Refresh all datasources in a cluster
cluster.refresh_datasources()

# Get metric configuration
metric_config = datasource.get_metric_obj('conversion_rate')
print(metric_config.json)  # Metric definition JSON
```

## Properties and Helpers

```python { .api }
class Datasource:

    @property
    def datasource_link(self):
        """HTML link to datasource visualization view"""

    @property
    def metrics_combo(self):
        """List of available metrics as form choices"""

    @property
    def column_names(self):
        """List of all dimension names"""

    @property
    def groupby_column_names(self):
        """List of dimensions available for grouping"""

    @property
    def filterable_column_names(self):
        """List of dimensions available for filtering"""
```
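These helpers are convenient when building explore forms or validating user input. A small sketch, assuming a `datasource` instance obtained as in the earlier examples:

```python
# Populate form choices from the datasource's dimension and metric helpers
print(datasource.column_names)             # all dimensions
print(datasource.groupby_column_names)     # dimensions usable for grouping
print(datasource.filterable_column_names)  # dimensions usable in filters
print(datasource.metrics_combo)            # (metric_name, verbose_name) pairs

# Example: reject a requested groupby column the datasource does not support
requested = 'country'
if requested not in datasource.groupby_column_names:
    raise ValueError("{} is not groupable on this datasource".format(requested))
```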

## Druid-Specific Features

### Time Granularity

Druid supports fine-grained time granularities for real-time analytics:

- `second` - Second-level aggregation
- `minute` - Minute-level aggregation
- `hour` - Hourly aggregation
- `day` - Daily aggregation
- `week` - Weekly aggregation
- `month` - Monthly aggregation

### High Performance

Druid datasources provide:

- Sub-second query response times
- Real-time data ingestion
- Pre-aggregated rollups
- Columnar storage optimization
- Distributed query processing

### Integration with PyDruid

Panoramix communicates with Druid through the PyDruid client library, which exposes Druid's native query types to Python.
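For cases not covered by `Datasource.query`, the PyDruid client can also be used on its own, following the client pattern from the PyDruid README. The snippet below is a standalone sketch; the broker URL, datasource name, and column names are placeholders, and it bypasses the Panoramix models entirely.

```python
from pydruid.client import PyDruid
from pydruid.utils.aggregators import longsum
from pydruid.utils.filters import Dimension

# Placeholder broker URL and REST endpoint; adjust for your cluster
client = PyDruid('http://druid-broker:8082', 'druid/v2')

# Top 10 countries by event count on the web platform (placeholder names)
client.topn(
    datasource='events',
    granularity='all',
    intervals='2015-01-01/2016-01-01',
    dimension='country',
    metric='event_count',
    threshold=10,
    aggregations={'event_count': longsum('count')},
    filter=Dimension('platform') == 'web',
)

df = client.export_pandas()  # results of the last query as a pandas DataFrame
print(df)
```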