# Druid Data Sources

Panoramix integrates with Apache Druid for real-time analytics and OLAP querying. Druid datasources provide high-performance analytics on streaming and batch data with pre-aggregated metrics and fast drill-down capabilities.

## Capabilities

### Druid Datasource Management

Manage Druid datasources with automatic metadata synchronization, dimension and metric discovery, and query optimization.

```python { .api }
class Datasource(Model, AuditMixin, Queryable):
    """
    Druid datasource model for real-time analytics.

    Attributes:
        id (int): Primary key
        datasource_name (str): Unique datasource identifier
        is_featured (bool): Whether datasource appears in featured list
        is_hidden (bool): Whether datasource is hidden from UI
        description (str): Datasource description
        default_endpoint (str): Default visualization endpoint
        user_id (int): Foreign key to User
        owner (User): Datasource owner reference
        cluster_name (str): Name of the Druid cluster
        cluster (Cluster): Reference to Druid cluster
    """

    def query(self, groupby, metrics, granularity, from_dttm, to_dttm,
              limit_spec=None, filter=None, is_timeseries=True,
              timeseries_limit=15, row_limit=None):
        """
        Execute Druid query with aggregations and filters.

        Args:
            groupby (list): List of dimensions to group by
            metrics (list): List of metrics to calculate
            granularity (str): Time granularity ('second', 'minute', 'hour', 'day', 'week', 'month')
            from_dttm (datetime): Start datetime for time-based queries
            to_dttm (datetime): End datetime for time-based queries
            limit_spec (dict, optional): Limit specification
            filter (list, optional): List of filter conditions
            is_timeseries (bool): Whether query is time-based (default True)
            timeseries_limit (int): Limit for timeseries results (default 15)
            row_limit (int, optional): Maximum number of rows to return

        Returns:
            QueryResult: Named tuple with df, query, and duration
        """

    def get_metric_obj(self, metric_name):
        """
        Get metric configuration object by name.

        Args:
            metric_name (str): Name of the metric to retrieve

        Returns:
            Metric: Metric configuration object
        """

    @classmethod
    def sync_to_db(cls, name, cluster):
        """
        Synchronize datasource metadata from Druid cluster.

        Args:
            name (str): Datasource name in Druid
            cluster (Cluster): Druid cluster instance

        Returns:
            Datasource: Created or updated datasource instance
        """

    def latest_metadata(self):
        """
        Get latest metadata from Druid cluster.

        Returns:
            dict: Column metadata from segment information
        """

    def generate_metrics(self):
        """Generate default metrics for all columns."""

    @property
    def name(self):
        """Get the datasource name."""
        return self.datasource_name

    @property
    def datasource_link(self):
        """Get HTML link to the datasource view."""
        url = "/panoramix/datasource/{}/".format(self.datasource_name)
        return '<a href="{url}">{self.datasource_name}</a>'.format(**locals())

    @property
    def metrics_combo(self):
        """Get list of metric name/verbose name tuples for forms."""
        return sorted(
            [(m.metric_name, m.verbose_name) for m in self.metrics],
            key=lambda x: x[1])

    def __repr__(self):
        """String representation of the datasource."""
        return self.datasource_name
```
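The metadata helpers above are typically used together when wiring up a datasource. The sketch below is illustrative only; it assumes a `datasource` instance obtained as in the usage examples further down, a reachable Druid cluster, and that the keys of `latest_metadata()` are column names.

```python
# Illustrative sketch of the metadata helpers (assumes `datasource` exists and
# the cluster is reachable; key/column naming is an assumption).
columns = datasource.latest_metadata()   # column metadata from the latest segment
print(list(columns.keys()))              # dimension/metric names reported by Druid

# Create default aggregation metrics for every column
datasource.generate_metrics()

# Metric choices as (name, verbose name) tuples, e.g. for building form fields
for metric_name, verbose_name in datasource.metrics_combo:
    print(metric_name, verbose_name)
```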

### Druid Dimensions

Manage Druid dimensions (groupable columns) with data types and filtering capabilities.

```python { .api }
class Column(Model, AuditMixin):
    """
    Druid datasource dimension metadata.

    Attributes:
        id (int): Primary key
        column_name (str): Dimension name in Druid
        verbose_name (str): Human-readable dimension name
        is_active (bool): Whether dimension is active for queries
        type (str): Dimension data type ('STRING', 'LONG', 'FLOAT', etc.)
        groupby (bool): Whether dimension can be used for grouping
        filterable (bool): Whether dimension can be filtered
        description (str): Dimension description
        datasource_id (int): Foreign key to Datasource
        datasource (Datasource): Reference to parent datasource
        is_dttm (bool): Whether dimension contains datetime data
        expression (str): Custom expression for computed dimensions
    """

    @property
    def isnum(self):
        """Check if dimension is numeric type."""
        return self.type in ('LONG', 'DOUBLE', 'FLOAT')

    def generate_metrics(self):
        """Generate default metrics for this dimension."""

    def __repr__(self):
        """String representation of the column."""
        return self.column_name
```
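For example, the `groupby`, `filterable`, and `isnum` flags can be used to build the dimension choices for a datasource. The snippet below is a sketch; it assumes a `datasource.columns` collection as the reverse side of the `Column.datasource` relationship.

```python
# Sketch: derive groupable / filterable dimension lists from Column metadata.
# Assumes `datasource.columns` is the back-reference of Column.datasource.
groupable = [c.column_name for c in datasource.columns if c.groupby]
filterable = [c.column_name for c in datasource.columns if c.filterable]
numeric_dims = [c.column_name for c in datasource.columns if c.isnum]

print("Group by:", groupable)
print("Filter on:", filterable)
print("Numeric:", numeric_dims)
```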

### Druid Metrics

Define and manage Druid metrics including aggregations, post-aggregations, and custom expressions.

```python { .api }
class Metric(Model, AuditMixin):
    """
    Druid-based metric definition for datasources.

    Attributes:
        id (int): Primary key
        metric_name (str): Unique metric identifier
        verbose_name (str): Human-readable metric name
        metric_type (str): Type of metric ('longSum', 'doubleSum', 'count', etc.)
        json (str): JSON configuration for complex metrics
        description (str): Metric description
        is_restricted (bool): Whether metric has access restrictions
        datasource_id (int): Foreign key to Datasource
        datasource (Datasource): Reference to parent datasource
    """

    @property
    def json_obj(self):
        """
        Get parsed JSON configuration for the metric.

        Returns:
            dict: Parsed JSON configuration object
        """
```
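Metric rows map onto Druid aggregator definitions. The snippet below sketches how a sum metric could be defined by hand; the column names are placeholders and the JSON payload follows Druid's aggregator spec, so treat it as illustrative rather than a canonical recipe.

```python
import json

from panoramix import db
from panoramix.models import Metric

# Sketch: define a doubleSum metric over a hypothetical `revenue` column.
# The JSON payload mirrors Druid's aggregator spec.
revenue_metric = Metric(
    metric_name='sum__revenue',
    verbose_name='Total revenue',
    metric_type='doubleSum',
    json=json.dumps({
        'type': 'doubleSum',
        'name': 'sum__revenue',
        'fieldName': 'revenue',
    }),
    datasource_id=datasource.id,
)
db.session.add(revenue_metric)
db.session.commit()

print(revenue_metric.json_obj)  # parsed aggregator definition
```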

## Usage Examples

### Basic Druid Querying

```python
from datetime import datetime, timedelta

from panoramix import db
from panoramix.models import Cluster, Datasource

# Get Druid cluster and datasource
cluster = db.session.query(Cluster).filter_by(cluster_name='production').first()
datasource = db.session.query(Datasource).filter_by(
    datasource_name='events',
    cluster=cluster,
).first()

# Time series query over the last 24 hours
now = datetime.now()
result = datasource.query(
    groupby=['country'],
    metrics=['count', 'sum__revenue'],
    granularity='hour',
    from_dttm=now - timedelta(hours=24),
    to_dttm=now,
)

print(result.df)
```

### Real-time Analytics

```python
# High-frequency real-time query over the last hour
# (reuses `now` and the imports from the example above)
result = datasource.query(
    groupby=['event_type', 'platform'],
    metrics=['count', 'unique__user_id'],
    granularity='minute',
    from_dttm=now - timedelta(hours=1),
    to_dttm=now,
    filter=[('country', '==', 'US')],  # filter conditions as (dimension, operator, value)
    row_limit=100,
)

# Access real-time event data
events_df = result.df
print("Query executed in {} seconds".format(result.duration))
```

### Custom Metrics and Post-Aggregations

```python
# Top campaigns by click-through rate over the last week
result = datasource.query(
    groupby=['campaign_id'],
    metrics=['sum__impressions', 'sum__clicks', 'click_through_rate'],
    granularity='day',
    from_dttm=now - timedelta(days=7),
    to_dttm=now,
    is_timeseries=False,
    # Order and limit the groupBy results; the dict below follows Druid's
    # limitSpec format
    limit_spec={
        'type': 'default',
        'limit': 10,
        'columns': [
            {'dimension': 'click_through_rate', 'direction': 'descending'},
        ],
    },
    row_limit=10,
)
```

### Datasource Synchronization

```python
# Sync datasource metadata from Druid
new_datasource = Datasource.sync_to_db('new_events', cluster)

# Refresh all datasources in a cluster
cluster.refresh_datasources()

# Get metric configuration
metric_config = datasource.get_metric_obj('conversion_rate')
print(metric_config.json)  # Metric definition JSON
```

## Properties and Helpers

```python { .api }
class Datasource:

    @property
    def datasource_link(self):
        """HTML link to datasource visualization view"""

    @property
    def metrics_combo(self):
        """List of available metrics as form choices"""

    @property
    def column_names(self):
        """List of all dimension names"""

    @property
    def groupby_column_names(self):
        """List of dimensions available for grouping"""

    @property
    def filterable_column_names(self):
        """List of dimensions available for filtering"""
```
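These helpers are convenient when building explore forms or validating user input. A small sketch, assuming a `datasource` instance obtained as in the earlier examples:

```python
# Populate form choices from the datasource's dimension and metric helpers
print(datasource.column_names)             # all dimensions
print(datasource.groupby_column_names)     # dimensions usable for grouping
print(datasource.filterable_column_names)  # dimensions usable in filters
print(datasource.metrics_combo)            # (metric_name, verbose_name) pairs

# Example: reject a requested groupby column the datasource does not support
requested = 'country'
if requested not in datasource.groupby_column_names:
    raise ValueError("{} is not groupable on this datasource".format(requested))
```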

## Druid-Specific Features

### Time Granularity

Druid supports fine-grained time granularities for real-time analytics:

- `second` - Second-level aggregation
- `minute` - Minute-level aggregation
- `hour` - Hourly aggregation
- `day` - Daily aggregation
- `week` - Weekly aggregation
- `month` - Monthly aggregation

### High Performance

Druid datasources provide:

- Sub-second query response times
- Real-time data ingestion
- Pre-aggregated rollups
- Columnar storage optimization
- Distributed query processing

### Integration with PyDruid

Panoramix communicates with Druid through the PyDruid client library, which exposes Druid's native query types to Python.
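For cases not covered by `Datasource.query`, the PyDruid client can also be used on its own, following the client pattern from the PyDruid README. The snippet below is a standalone sketch; the broker URL, datasource name, and column names are placeholders, and it bypasses the Panoramix models entirely.

```python
from pydruid.client import PyDruid
from pydruid.utils.aggregators import longsum
from pydruid.utils.filters import Dimension

# Placeholder broker URL and REST endpoint; adjust for your cluster
client = PyDruid('http://druid-broker:8082', 'druid/v2')

# Top 10 countries by event count on the web platform (placeholder names)
client.topn(
    datasource='events',
    granularity='all',
    intervals='2015-01-01/2016-01-01',
    dimension='country',
    metric='event_count',
    threshold=10,
    aggregations={'event_count': longsum('count')},
    filter=Dimension('platform') == 'web',
)

df = client.export_pandas()  # results of the last query as a pandas DataFrame
print(df)
```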