# Druid Data Sources

Panoramix integrates with Apache Druid for real-time analytics and OLAP querying. Druid datasources provide high-performance analytics on streaming and batch data with pre-aggregated metrics and fast drill-down capabilities.

## Capabilities

### Druid Datasource Management

Manage Druid datasources with automatic metadata synchronization, dimension and metric discovery, and query optimization.

```python { .api }
class Datasource(Model, AuditMixin, Queryable):
    """
    Druid datasource model for real-time analytics.

    Attributes:
        id (int): Primary key
        datasource_name (str): Unique datasource identifier
        is_featured (bool): Whether datasource appears in featured list
        is_hidden (bool): Whether datasource is hidden from UI
        description (str): Datasource description
        default_endpoint (str): Default visualization endpoint
        user_id (int): Foreign key to User
        owner (User): Datasource owner reference
        cluster_name (str): Name of the Druid cluster
        cluster (Cluster): Reference to Druid cluster
    """

    def query(self, groupby, metrics, granularity, from_dttm, to_dttm,
              limit_spec=None, filter=None, is_timeseries=True,
              timeseries_limit=15, row_limit=None):
        """
        Execute Druid query with aggregations and filters.

        Args:
            groupby (list): List of dimensions to group by
            metrics (list): List of metrics to calculate
            granularity (str): Time granularity ('second', 'minute', 'hour', 'day', 'week', 'month')
            from_dttm (datetime): Start datetime for time-based queries
            to_dttm (datetime): End datetime for time-based queries
            limit_spec (dict, optional): Limit specification
            filter (list, optional): List of filter conditions
            is_timeseries (bool): Whether query is time-based (default True)
            timeseries_limit (int): Limit for timeseries results (default 15)
            row_limit (int, optional): Maximum number of rows to return

        Returns:
            QueryResult: Named tuple with df, query, and duration
        """

    def get_metric_obj(self, metric_name):
        """
        Get metric configuration object by name.

        Args:
            metric_name (str): Name of the metric to retrieve

        Returns:
            Metric: Metric configuration object
        """

    @classmethod
    def sync_to_db(cls, name, cluster):
        """
        Synchronize datasource metadata from Druid cluster.

        Args:
            name (str): Datasource name in Druid
            cluster (Cluster): Druid cluster instance

        Returns:
            Datasource: Created or updated datasource instance
        """

    def latest_metadata(self):
        """
        Get latest metadata from Druid cluster.

        Returns:
            dict: Column metadata from segment information
        """

    def generate_metrics(self):
        """Generate default metrics for all columns."""

    @property
    def name(self):
        """Get the datasource name."""
        return self.datasource_name

    @property
    def datasource_link(self):
        """Get HTML link to the datasource view."""
        url = "/panoramix/datasource/{}/".format(self.datasource_name)
        return '<a href="{url}">{self.datasource_name}</a>'.format(**locals())

    @property
    def metrics_combo(self):
        """Get list of metric name/verbose name tuples for forms."""
        return sorted([
            (m.metric_name, m.verbose_name) for m in self.metrics
        ], key=lambda x: x[1])

    def __repr__(self):
        """String representation of the datasource."""
        return self.datasource_name
```
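
`query()` takes its time window as two `datetime` objects, while Druid native queries express a window as an ISO-8601 interval string of the form `start/end` (start inclusive, end exclusive). A minimal sketch of that conversion, assuming naive datetimes already in UTC; `to_druid_interval` is a hypothetical helper, not part of Panoramix:

```python
from datetime import datetime


def to_druid_interval(from_dttm: datetime, to_dttm: datetime) -> str:
    """Format a [from_dttm, to_dttm) window as an ISO-8601 interval string.

    Hypothetical helper: assumes both datetimes are naive UTC values.
    """
    if to_dttm <= from_dttm:
        raise ValueError("to_dttm must be later than from_dttm")
    return "{}/{}".format(from_dttm.isoformat(), to_dttm.isoformat())


print(to_druid_interval(datetime(2015, 8, 1), datetime(2015, 8, 2, 12, 30)))
# 2015-08-01T00:00:00/2015-08-02T12:30:00
```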

### Druid Dimensions

Manage Druid dimensions (groupable columns) with data types and filtering capabilities.

```python { .api }
class Column(Model, AuditMixin):
    """
    Druid datasource dimension metadata.

    Attributes:
        id (int): Primary key
        column_name (str): Dimension name in Druid
        verbose_name (str): Human-readable dimension name
        is_active (bool): Whether dimension is active for queries
        type (str): Dimension data type ('STRING', 'LONG', 'FLOAT', etc.)
        groupby (bool): Whether dimension can be used for grouping
        filterable (bool): Whether dimension can be filtered
        description (str): Dimension description
        datasource_id (int): Foreign key to Datasource
        datasource (Datasource): Reference to parent datasource
        is_dttm (bool): Whether dimension contains datetime data
        expression (str): Custom expression for computed dimensions
    """

    @property
    def isnum(self):
        """Check if dimension is numeric type."""
        return self.type in ('LONG', 'DOUBLE', 'FLOAT')

    def generate_metrics(self):
        """Generate default metrics for this dimension."""

    def __repr__(self):
        """String representation of the column."""
        return self.column_name
```
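
The `sum__revenue` and `sum__impressions` names used in the examples suggest an `<aggregation>__<column>` naming convention for generated metrics. The sketch below shows one way default Druid aggregator specs could be derived from a dimension's name and type; `default_metric_specs` and the exact set of metrics it produces are assumptions, not the behavior of `Column.generate_metrics` itself:

```python
import json

# Numeric Druid column types, mirroring Column.isnum
NUMERIC_TYPES = ('LONG', 'DOUBLE', 'FLOAT')


def default_metric_specs(column_name, column_type):
    """Return hypothetical default Druid aggregator specs for one column."""
    if column_type not in NUMERIC_TYPES:
        return []  # no numeric aggregations for STRING dimensions
    # Druid aggregators are typed: LONG columns use long*, FLOAT/DOUBLE use double*.
    prefix = 'long' if column_type == 'LONG' else 'double'
    specs = []
    for agg in ('sum', 'min', 'max'):
        specs.append({
            'type': prefix + agg.capitalize(),  # e.g. 'longSum', 'doubleMax'
            'name': '{}__{}'.format(agg, column_name),
            'fieldName': column_name,
        })
    return specs


for spec in default_metric_specs('revenue', 'DOUBLE'):
    print(json.dumps(spec))
```

Keeping the aggregator type tied to the column type matters because Druid's `longSum` and `doubleSum` read the column differently; a single generic "sum" would not map onto a native aggregator.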

### Druid Metrics

Define and manage Druid metrics including aggregations, post-aggregations, and custom expressions.

```python { .api }
class Metric(Model, AuditMixin):
    """
    Druid-based metric definition for datasources.

    Attributes:
        id (int): Primary key
        metric_name (str): Unique metric identifier
        verbose_name (str): Human-readable metric name
        metric_type (str): Type of metric ('longSum', 'doubleSum', 'count', etc.)
        json (str): JSON configuration for complex metrics
        description (str): Metric description
        is_restricted (bool): Whether metric has access restrictions
        datasource_id (int): Foreign key to Datasource
        datasource (Datasource): Reference to parent datasource
    """

    @property
    def json_obj(self):
        """
        Get parsed JSON configuration for the metric.

        Returns:
            dict: Parsed JSON configuration object
        """
```
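
A metric such as `click_through_rate` is typically a Druid post-aggregation: it is computed from other aggregated values rather than from raw rows. The sketch below assumes the metric's `json` field holds a Druid `arithmetic` post-aggregator spec (the metric names are illustrative) and evaluates it against one aggregated row in plain Python to show what Druid computes. Druid applies the function to the fields left to right, and its `/` function returns 0 when dividing by zero; `evaluate_post_agg` is a hypothetical helper.

```python
import json
import operator

# Hypothetical contents of Metric.json for a click-through-rate metric
CTR_JSON = json.dumps({
    "type": "arithmetic",
    "name": "click_through_rate",
    "fn": "/",
    "fields": [
        {"type": "fieldAccess", "fieldName": "sum__clicks"},
        {"type": "fieldAccess", "fieldName": "sum__impressions"},
    ],
})


def safe_div(a, b):
    # Druid's arithmetic "/" yields 0 instead of failing on division by zero.
    return a / b if b != 0 else 0


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": safe_div}


def evaluate_post_agg(spec, row):
    """Evaluate a (possibly nested) arithmetic post-aggregator against a row dict."""
    if spec["type"] == "fieldAccess":
        return row[spec["fieldName"]]
    if spec["type"] == "constant":
        return spec["value"]
    if spec["type"] == "arithmetic":
        values = [evaluate_post_agg(f, row) for f in spec["fields"]]
        result = values[0]
        for value in values[1:]:  # applied left to right
            result = OPS[spec["fn"]](result, value)
        return result
    raise ValueError("unsupported post-aggregator type: " + spec["type"])


spec = json.loads(CTR_JSON)  # roughly what Metric.json_obj would return
print(evaluate_post_agg(spec, {"sum__clicks": 25, "sum__impressions": 1000}))
# 0.025
```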

## Usage Examples

### Basic Druid Querying

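```python
from datetime import datetime, timedelta

# Assumes the app's SQLAlchemy handle is importable as panoramix.db.
from panoramix import db
from panoramix.models import Cluster, Datasource

# Get Druid cluster and datasource. Datasource defines its own query()
# method, so look rows up through the session instead of Datasource.query.
cluster = db.session.query(Cluster).filter_by(cluster_name='production').first()
datasource = db.session.query(Datasource).filter_by(
    datasource_name='events',
    cluster=cluster
).first()

# Time series query over the last 24 hours
to_dttm = datetime.utcnow()
result = datasource.query(
    groupby=['country'],
    metrics=['count', 'sum__revenue'],
    granularity='hour',
    from_dttm=to_dttm - timedelta(hours=24),
    to_dttm=to_dttm,
)

print(result.df)
```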

### Real-time Analytics

```python
from datetime import datetime, timedelta

# High-frequency real-time query over the last hour
to_dttm = datetime.utcnow()
result = datasource.query(
    groupby=['event_type', 'platform'],
    metrics=['count', 'unique__user_id'],
    granularity='minute',
    from_dttm=to_dttm - timedelta(hours=1),
    to_dttm=to_dttm,
    # Restrict to US traffic; the (column, operator, value) tuple shape
    # is an assumption about the filter list format.
    filter=[('country', '==', 'US')],
    row_limit=100,
)

# Access real-time event data
events_df = result.df
print(f"Query executed in {result.duration}")
```

### Custom Metrics and Post-Aggregations

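```python
from datetime import datetime, timedelta

# Custom metrics over the last 7 days; 'all' collapses time into one bucket
to_dttm = datetime.utcnow()
result = datasource.query(
    groupby=['campaign_id'],
    metrics=['sum__impressions', 'sum__clicks', 'click_through_rate'],
    granularity='all',
    from_dttm=to_dttm - timedelta(days=7),
    to_dttm=to_dttm,
    is_timeseries=False,
)

# query() has no HAVING/ORDER BY arguments, so filter and rank the DataFrame
df = result.df
top = df[df['sum__impressions'] > 1000].sort_values(
    'click_through_rate', ascending=False).head(10)
```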

### Datasource Synchronization

```python
# Sync datasource metadata from Druid
new_datasource = Datasource.sync_to_db('new_events', cluster)

# Refresh all datasources in a cluster
cluster.refresh_datasources()

# Get metric configuration
metric_config = datasource.get_metric_obj('conversion_rate')
print(metric_config.json)  # Metric definition JSON
```
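
`sync_to_db` and `latest_metadata` depend on Druid's column metadata. Druid exposes this through the native `segmentMetadata` query, whose response lists each column with its type. The sketch below builds such a query body and extracts column types from a response; `segment_metadata_query` and `column_types` are hypothetical helpers, and the sample response is hand-written and abbreviated to the fields used here rather than captured from a real cluster.

```python
def segment_metadata_query(datasource_name, interval):
    """Build a Druid native segmentMetadata query body (hypothetical helper)."""
    return {
        "queryType": "segmentMetadata",
        "dataSource": datasource_name,
        "intervals": [interval],
    }


def column_types(response):
    """Map column name -> Druid type, skipping the built-in __time column."""
    types = {}
    for segment in response:
        for name, info in segment["columns"].items():
            if name != "__time":
                types[name] = info["type"]
    return types


# Hand-written, abbreviated example of a segmentMetadata response
sample_response = [{
    "id": "events_2015-08-01",
    "columns": {
        "__time": {"type": "LONG"},
        "country": {"type": "STRING"},
        "revenue": {"type": "FLOAT"},
    },
}]

print(segment_metadata_query("new_events", "2015-08-01/2015-08-02"))
print(column_types(sample_response))
# {'country': 'STRING', 'revenue': 'FLOAT'}
```

A mapping like this is the kind of input from which `Column` rows (name and `type`) and their default metrics can be created or updated.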

## Properties and Helpers

```python { .api }
class Datasource:
    @property
    def datasource_link(self):
        """HTML link to datasource visualization view"""

    @property
    def metrics_combo(self):
        """List of available metrics as form choices"""

    @property
    def column_names(self):
        """List of all dimension names"""

    @property
    def groupby_column_names(self):
        """List of dimensions available for grouping"""

    @property
    def filterable_column_names(self):
        """List of dimensions available for filtering"""
```
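
The three column-name properties can be read as filters over the datasource's `Column` rows using the `groupby` and `filterable` flags documented above. A minimal sketch, assuming the lists are returned sorted and that only those two flags matter; `ColumnStub` is a hypothetical stand-in for the ORM model so the logic runs without a database:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnStub:
    """Stand-in for the Column model with only the fields used here."""
    column_name: str
    groupby: bool = False
    filterable: bool = False


def column_names(columns: List[ColumnStub]) -> List[str]:
    return sorted(c.column_name for c in columns)


def groupby_column_names(columns: List[ColumnStub]) -> List[str]:
    return sorted(c.column_name for c in columns if c.groupby)


def filterable_column_names(columns: List[ColumnStub]) -> List[str]:
    return sorted(c.column_name for c in columns if c.filterable)


cols = [
    ColumnStub("platform", groupby=True, filterable=True),
    ColumnStub("country", groupby=True, filterable=True),
    ColumnStub("user_id", filterable=True),
]
print(column_names(cols))             # ['country', 'platform', 'user_id']
print(groupby_column_names(cols))     # ['country', 'platform']
print(filterable_column_names(cols))  # ['country', 'platform', 'user_id']
```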

## Druid-Specific Features

### Time Granularity

Druid supports fine-grained time granularities for real-time analytics:

- `second` - Second-level aggregation
- `minute` - Minute-level aggregation
- `hour` - Hourly aggregation
- `day` - Daily aggregation
- `week` - Weekly aggregation
- `month` - Monthly aggregation
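
Each granularity assigns a timestamp to the start of its bucket. A sketch of that truncation in plain Python, assuming naive UTC timestamps and weeks that start on Monday (the ISO convention); `floor_to_granularity` is a hypothetical helper used only to illustrate the bucket boundaries:

```python
from datetime import datetime, timedelta


def floor_to_granularity(ts: datetime, granularity: str) -> datetime:
    """Truncate a naive UTC datetime to the start of its granularity bucket."""
    if granularity == 'second':
        return ts.replace(microsecond=0)
    if granularity == 'minute':
        return ts.replace(second=0, microsecond=0)
    if granularity == 'hour':
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == 'day':
        return day
    if granularity == 'week':
        # datetime.weekday() is 0 for Monday, so this steps back to Monday.
        return day - timedelta(days=day.weekday())
    if granularity == 'month':
        return day.replace(day=1)
    raise ValueError("unsupported granularity: " + granularity)


ts = datetime(2015, 8, 13, 17, 42, 9, 500)  # a Thursday
for g in ('second', 'minute', 'hour', 'day', 'week', 'month'):
    print(g, floor_to_granularity(ts, g).isoformat())
```

Every event whose timestamp floors to the same value lands in the same result row, which is why finer granularities return more rows for the same time window.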

### High Performance

Druid datasources provide:

- Sub-second query response times
- Real-time data ingestion
- Pre-aggregated rollups
- Columnar storage optimization
- Distributed query processing
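
Rollup means rows that share a truncated timestamp and identical dimension values can be combined into a single stored row at ingestion time, keeping only aggregated metrics. The sketch below imitates an hourly rollup over in-memory events; `rollup_hourly`, the event fields, and the choice of a count plus a revenue sum are illustrative assumptions rather than Druid's ingestion code:

```python
from collections import defaultdict
from datetime import datetime


def rollup_hourly(events):
    """Pre-aggregate raw events into one row per (hour, country, platform)."""
    rows = defaultdict(lambda: {"count": 0, "sum__revenue": 0.0})
    for event in events:
        hour = event["timestamp"].replace(minute=0, second=0, microsecond=0)
        key = (hour, event["country"], event["platform"])
        rows[key]["count"] += 1
        rows[key]["sum__revenue"] += event["revenue"]
    return dict(rows)


events = [
    {"timestamp": datetime(2015, 8, 1, 9, 5), "country": "US", "platform": "ios", "revenue": 2.0},
    {"timestamp": datetime(2015, 8, 1, 9, 40), "country": "US", "platform": "ios", "revenue": 3.5},
    {"timestamp": datetime(2015, 8, 1, 10, 1), "country": "FR", "platform": "web", "revenue": 1.0},
]
for key, metrics in sorted(rollup_hourly(events).items()):
    print(key, metrics)
```

Three raw events become two stored rows here; queries then scan the smaller, pre-aggregated set, at the cost of losing detail finer than the rollup granularity.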

### Integration with PyDruid

Panoramix uses the PyDruid client library to communicate with Druid, so queries are issued from Python as native Druid JSON queries.
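
As a rough illustration of what a grouped `query()` call corresponds to on the wire, the sketch below builds a Druid native `groupBy` query body by hand with the standard library. The helper name `build_groupby_query`, its parameters, and the equality-only filter handling are assumptions for illustration; Panoramix builds its queries through PyDruid, not through this code.

```python
import json


def build_groupby_query(datasource, dimensions, aggregations, granularity,
                        interval, equals_filters=None, limit=None):
    """Assemble a Druid native groupBy query body (hypothetical helper)."""
    query = {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": granularity,
        "dimensions": dimensions,
        "aggregations": aggregations,
        "intervals": [interval],
    }
    if equals_filters:
        # One "selector" filter per (dimension, value) pair, ANDed together.
        selectors = [{"type": "selector", "dimension": d, "value": v}
                     for d, v in equals_filters.items()]
        query["filter"] = (selectors[0] if len(selectors) == 1
                           else {"type": "and", "fields": selectors})
    if limit is not None:
        query["limitSpec"] = {"type": "default", "limit": limit}
    return query


body = build_groupby_query(
    datasource="events",
    dimensions=["event_type", "platform"],
    aggregations=[{"type": "count", "name": "count"}],
    granularity="minute",
    interval="2015-08-01T09:00:00/2015-08-01T10:00:00",
    equals_filters={"country": "US"},
    limit=100,
)
print(json.dumps(body, indent=2))
```

A body like this is what gets POSTed to the Druid broker; PyDruid's value is assembling it (aggregators, filters, post-aggregations) from Python objects and turning the JSON response into rows or a DataFrame.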