# Synchronous Client

The PyDruid synchronous client executes Druid queries against a Broker node. It covers the query types documented below (TopN, timeseries, groupBy, segment metadata, time boundary, select, scan, and search) and supports HTTP Basic authentication, proxy configuration, and export of results to pandas or TSV.

## Capabilities

### Client Initialization

Creates a new PyDruid client instance for connecting to a Druid broker.

```python { .api }
class PyDruid:
    def __init__(self, url: str, endpoint: str, cafile: str = None) -> None:
        """
        Initialize PyDruid client.

        Parameters:
        - url: URL of Broker node in the Druid cluster
        - endpoint: Endpoint that Broker listens for queries on (typically 'druid/v2/')
        - cafile: Path to CA certificate file for SSL verification (optional)
        """
```

### Authentication and Configuration

Configure client authentication and proxy settings.

```python { .api }
def set_basic_auth_credentials(self, username: str, password: str) -> None:
    """
    Set HTTP Basic Authentication credentials.

    Parameters:
    - username: Username for authentication
    - password: Password for authentication
    """

def set_proxies(self, proxies: dict) -> None:
    """
    Configure proxy settings for HTTP requests.

    Parameters:
    - proxies: Dictionary mapping protocol names to proxy URLs
      Example: {'http': 'http://proxy.example.com:8080'}
    """
```
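
For orientation, HTTP Basic authentication (RFC 7617) sends the credentials as a base64-encoded `username:password` pair in an `Authorization` header. The sketch below builds that header with the standard library only. The helper name `basic_auth_header` and the credentials are hypothetical, and the code illustrates the standard scheme rather than PyDruid's internal implementation. The `proxies` mapping follows the format described in the docstring above.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build a standard HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical credentials and proxy URLs, for illustration only.
headers = basic_auth_header("druid_user", "s3cret")
proxies = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}
```

Because Basic credentials are only encoded, not encrypted, they are normally paired with an HTTPS Broker URL.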

### TopN Queries

Execute TopN queries to retrieve the top values for a dimension sorted by a metric.

```python { .api }
def topn(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    aggregations: dict,
    dimension: str,
    metric: str,
    threshold: int,
    filter: 'Filter' = None,
    post_aggregations: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a TopN query.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity ('all', 'day', 'hour', 'minute', etc.)
    - intervals: ISO-8601 intervals ('2014-02-02/p4w' or list of intervals)
    - aggregations: Dict mapping aggregator names to aggregator specifications
    - dimension: Dimension to run the query against
    - metric: Metric to sort the dimension values by
    - threshold: Number of top items to return
    - filter: Filter to apply to the data (optional)
    - post_aggregations: Dict of post-aggregations to compute (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing results and metadata
    """
```

Example usage:

```python
from pydruid.client import PyDruid
from pydruid.utils.aggregators import doublesum
from pydruid.utils.filters import Dimension

client = PyDruid('http://localhost:8082', 'druid/v2/')

top = client.topn(
    datasource='twitterstream',
    granularity='all',
    intervals='2014-03-03/p1d',
    aggregations={'count': doublesum('count')},
    dimension='user_mention_name',
    filter=(Dimension('user_lang') == 'en') & (Dimension('first_hashtag') == 'oscars'),
    metric='count',
    threshold=10
)

# Export from the returned Query object (requires pandas)
df = top.export_pandas()
```
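
It can help to see the request such a call describes. The sketch below hand-writes the Druid native TopN query that corresponds to the example, using only the standard library. It is an approximation of the JSON body a client would POST to the Broker, not output captured from PyDruid, and it assumes the `&` combination of two `Dimension` filters maps to a native `and` filter.

```python
import json

# Hand-written approximation of the native TopN query for the example above.
topn_query = {
    "queryType": "topN",
    "dataSource": "twitterstream",
    "granularity": "all",
    "intervals": ["2014-03-03/p1d"],
    "aggregations": [
        {"type": "doubleSum", "name": "count", "fieldName": "count"}
    ],
    "dimension": "user_mention_name",
    "metric": "count",
    "threshold": 10,
    # Assumption: the two equality filters joined with & become an "and" filter.
    "filter": {
        "type": "and",
        "fields": [
            {"type": "selector", "dimension": "user_lang", "value": "en"},
            {"type": "selector", "dimension": "first_hashtag", "value": "oscars"},
        ],
    },
}

# Serialize the query the way it would travel over HTTP.
payload = json.dumps(topn_query, indent=2)
print(payload)
```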

### Timeseries Queries

Execute timeseries queries to retrieve aggregated data over time intervals.

```python { .api }
def timeseries(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    aggregations: dict,
    filter: 'Filter' = None,
    post_aggregations: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a timeseries query.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity for aggregation
    - intervals: ISO-8601 intervals to query
    - aggregations: Dict mapping aggregator names to aggregator specifications
    - filter: Filter to apply to the data (optional)
    - post_aggregations: Dict of post-aggregations to compute (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing time-series results
    """
```
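
The `intervals` argument takes ISO-8601 interval strings, either one string or a list. As a minimal sketch, an interval can be assembled from `datetime` values in `start/end` form. The helper `iso_interval` is a hypothetical name and not part of PyDruid.

```python
from datetime import datetime, timedelta


def iso_interval(start: datetime, end: datetime) -> str:
    """Format a start/end pair as an ISO-8601 interval string."""
    if end <= start:
        raise ValueError("interval end must be after its start")
    return f"{start.isoformat()}/{end.isoformat()}"


# Four consecutive one-week intervals starting on 2014-02-02, as a list.
first_day = datetime(2014, 2, 2)
intervals = [
    iso_interval(first_day + timedelta(weeks=i), first_day + timedelta(weeks=i + 1))
    for i in range(4)
]
print(intervals[0])  # 2014-02-02T00:00:00/2014-02-09T00:00:00
```

The resulting list could then be passed as the `intervals` argument of a query method.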

### GroupBy Queries

Execute groupBy queries to group data by one or more dimensions with aggregations.

```python { .api }
def groupby(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    dimensions: list,
    aggregations: dict,
    filter: 'Filter' = None,
    having: 'Having' = None,
    post_aggregations: dict = None,
    limit_spec: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a groupBy query.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity for grouping
    - intervals: ISO-8601 intervals to query
    - dimensions: List of dimensions to group by
    - aggregations: Dict mapping aggregator names to aggregator specifications
    - filter: Filter to apply to the data (optional)
    - having: Having clause for filtering grouped results (optional)
    - post_aggregations: Dict of post-aggregations to compute (optional)
    - limit_spec: Specification for limiting and ordering results (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing grouped results
    """
```
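
The `limit_spec` dict is described above only as a specification for limiting and ordering results. The sketch below builds one in the shape of Druid's native `default` limitSpec, ordering by a single output column. `default_limit_spec` is a hypothetical helper, and passing its result unchanged as `limit_spec` is an assumption worth checking against the PyDruid version in use.

```python
def default_limit_spec(column: str, limit: int, descending: bool = True) -> dict:
    """Build a Druid 'default' limitSpec ordering by a single output column."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return {
        "type": "default",
        "limit": limit,
        "columns": [
            {
                "dimension": column,
                "direction": "descending" if descending else "ascending",
            }
        ],
    }


# Keep the 5 groups with the highest 'count' aggregate.
limit_spec = default_limit_spec("count", 5)
```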

### Metadata Queries

Query metadata about datasources and segments.

```python { .api }
def segment_metadata(
    self,
    datasource: str,
    intervals: str | list = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a segment metadata query.

    Parameters:
    - datasource: Data source to analyze
    - intervals: ISO-8601 intervals to analyze (optional, defaults to all)
    - context: Query context parameters (optional)

    Returns:
    Query object containing segment metadata
    """

def time_boundary(
    self,
    datasource: str,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a time boundary query.

    Parameters:
    - datasource: Data source to query
    - context: Query context parameters (optional)

    Returns:
    Query object containing time boundary information
    """
```
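
Druid's native time boundary response is a one-element list whose `result` holds `minTime` and `maxTime`. The sketch below parses a hand-written sample of that shape to compute the covered time span. The timestamps are invented for illustration, and `parse_druid_time` is a hypothetical helper.

```python
from datetime import datetime

# Hand-written sample shaped like a timeBoundary response (values are illustrative).
sample_result = [
    {
        "timestamp": "2014-02-01T00:00:00.000Z",
        "result": {
            "minTime": "2014-02-01T00:00:00.000Z",
            "maxTime": "2014-03-03T23:59:59.000Z",
        },
    }
]


def parse_druid_time(value: str) -> datetime:
    """Parse a UTC timestamp such as 2014-02-01T00:00:00.000Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


# Difference between the newest and oldest event timestamps.
bounds = sample_result[0]["result"]
span = parse_druid_time(bounds["maxTime"]) - parse_druid_time(bounds["minTime"])
print(span.days)
```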

### Advanced Query Types

Build sub-queries for query composition, and run select, scan, and search queries for raw data access.

```python { .api }
def subquery(
    self,
    **kwargs
) -> dict:
    """
    Create a sub-query for use in nested queries.

    Parameters:
    - **kwargs: Query parameters (datasource, granularity, intervals, etc.)

    Returns:
    Dictionary representation of query (not executed)

    Note:
    This method returns a query dictionary without executing it,
    allowing it to be used as a datasource in other queries.
    """
```
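
In Druid's native JSON, a nested query is expressed as a data source of type `query`. The sketch below wraps a hand-written inner groupBy dict that way, standing in for the dictionary `subquery` would return. How PyDruid itself embeds that dictionary is not shown in this section, so the wrapping step illustrates the native format only; the `user_name` dimension and the aggregator names are hypothetical.

```python
# Inner query as a plain dict, standing in for what subquery() would return.
inner_query = {
    "queryType": "groupBy",
    "dataSource": "twitterstream",
    "granularity": "all",
    "intervals": ["2014-03-03/p1d"],
    "dimensions": ["user_name"],
    "aggregations": [
        {"type": "doubleSum", "name": "tweets", "fieldName": "count"}
    ],
}

# Outer query that aggregates over the rows produced by the inner query.
outer_query = {
    "queryType": "groupBy",
    "dataSource": {"type": "query", "query": inner_query},
    "granularity": "all",
    "intervals": ["2014-03-03/p1d"],
    "dimensions": [],
    "aggregations": [
        {"type": "doubleMax", "name": "max_tweets", "fieldName": "tweets"}
    ],
}
```

The outer aggregator reads `tweets`, the output column named by the inner aggregation, which is what ties the two levels together.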

Execute select, scan, and search queries. Select and scan return raw rows; search returns dimension values that match a search specification.

```python { .api }
def select(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    dimensions: list = None,
    metrics: list = None,
    filter: 'Filter' = None,
    paging_spec: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a select query for raw data access.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity
    - intervals: ISO-8601 intervals to query
    - dimensions: List of dimensions to include (optional)
    - metrics: List of metrics to include (optional)
    - filter: Filter to apply (optional)
    - paging_spec: Paging specification for large result sets (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing raw data
    """

def scan(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    limit: int,
    columns: list = None,
    metrics: list = None,
    filter: 'Filter' = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a scan query for raw data access.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity
    - intervals: ISO-8601 intervals to query
    - limit: Maximum number of rows to return
    - columns: List of columns to select (optional, all columns if empty)
    - metrics: List of metrics to select (optional, all metrics if empty)
    - filter: Filter to apply (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing scan results
    """

def search(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    searchDimensions: list,
    query: dict,
    limit: int = None,
    filter: 'Filter' = None,
    sort: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a search query to find dimension values matching search specifications.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity
    - intervals: ISO-8601 intervals to query
    - searchDimensions: List of dimensions to search within
    - query: Search query specification (e.g., {"type": "insensitive_contains", "value": "text"})
    - limit: Maximum number of results to return (optional)
    - filter: Filter to apply (optional)
    - sort: Sort specification (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing search results
    """
```
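
The `query` argument of `search` is a plain dict such as the `insensitive_contains` example in the docstring. The sketch below builds that spec and emulates its matching rule locally as a case-insensitive substring test, which is how Druid describes this spec type. Both helper names are hypothetical, and the matching runs in Python rather than on a Broker.

```python
def insensitive_contains(value: str) -> dict:
    """Build the search query spec shown in the search() parameter list."""
    return {"type": "insensitive_contains", "value": value}


def matches(spec: dict, dimension_value: str) -> bool:
    """Emulate the match locally: case-insensitive substring test."""
    if spec["type"] != "insensitive_contains":
        raise ValueError(f"unsupported spec type: {spec['type']}")
    return spec["value"].lower() in dimension_value.lower()


# Illustrative dimension values; only the first two contain "oscar".
spec = insensitive_contains("oscar")
hits = [v for v in ["Oscars2014", "OSCAR", "sochi2014"] if matches(spec, v)]
```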

## Result Export

All query methods return Query objects that provide export capabilities:

```python
# 'query' is the Query object returned by a query method, e.g. client.topn(...)

# Export to pandas DataFrame (requires pandas)
df = query.export_pandas()

# Export to TSV file
query.export_tsv('results.tsv')

# Access raw results
results = query.result
query_dict = query.query_dict
```
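
To make the export step concrete, the sketch below flattens a hand-written, timeseries-shaped result (a list of `timestamp`/`result` entries, as Druid returns for timeseries queries) into tab-separated text with the standard `csv` module. It illustrates the idea only: the sample values are invented, `to_tsv` is a hypothetical helper, and the column order and layout of PyDruid's own `export_tsv` output may differ.

```python
import csv
import io

# Hand-written sample in the shape of a timeseries result (values are illustrative).
result = [
    {"timestamp": "2014-03-03T00:00:00.000Z", "result": {"count": 10.0, "length": 420.0}},
    {"timestamp": "2014-03-04T00:00:00.000Z", "result": {"count": 7.0, "length": 315.0}},
]


def to_tsv(rows: list) -> str:
    """Flatten timestamp/result entries into TSV text with a header row."""
    # Timestamp first, then the aggregate names in sorted order.
    columns = ["timestamp"] + sorted(rows[0]["result"])
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        flat = {"timestamp": row["timestamp"], **row["result"]}
        writer.writerow([flat[name] for name in columns])
    return buffer.getvalue()


tsv_text = to_tsv(result)
print(tsv_text)
```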