Tessl Tile for pypi/pydruid@0.6.0

# Synchronous Client

The PyDruid synchronous client provides a comprehensive interface for executing Druid queries with support for all query types, authentication, proxy configuration, and flexible result export capabilities.

## Capabilities

### Client Initialization

Creates a new PyDruid client instance for connecting to a Druid broker.

```python { .api }
class PyDruid:
    def __init__(self, url: str, endpoint: str, cafile: str = None) -> None:
        """
        Initialize PyDruid client.

        Parameters:
        - url: URL of Broker node in the Druid cluster
        - endpoint: Endpoint that Broker listens for queries on (typically 'druid/v2/')
        - cafile: Path to CA certificate file for SSL verification (optional)
        """
```
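
As a rough illustration of how the two positional arguments relate, the sketch below joins a broker URL and an endpoint into the address a query would be posted to. The `query_url` helper is hypothetical, and the single-slash joining rule is an assumption; pydruid's internal handling may differ.

```python
def query_url(url: str, endpoint: str) -> str:
    """Join a broker URL and a query endpoint with exactly one slash (hypothetical helper)."""
    return url.rstrip('/') + '/' + endpoint.lstrip('/')


# Both spellings of the broker URL yield the same target address.
print(query_url('http://localhost:8082', 'druid/v2/'))
print(query_url('http://localhost:8082/', 'druid/v2/'))
```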

### Authentication and Configuration

Configure client authentication and proxy settings.

```python { .api }
def set_basic_auth_credentials(self, username: str, password: str) -> None:
    """
    Set HTTP Basic Authentication credentials.

    Parameters:
    - username: Username for authentication
    - password: Password for authentication
    """

def set_proxies(self, proxies: dict) -> None:
    """
    Configure proxy settings for HTTP requests.

    Parameters:
    - proxies: Dictionary mapping protocol names to proxy URLs
               Example: {'http': 'http://proxy.example.com:8080'}
    """
```
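
To make the effect of Basic Authentication concrete, the following stdlib-only sketch builds the `Authorization` header value that HTTP Basic credentials generally translate to, alongside a proxy mapping in the shape described above. The `basic_auth_header` helper is hypothetical and shows the general HTTP mechanism, not necessarily how pydruid builds its requests.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header (hypothetical helper)."""
    token = base64.b64encode(f'{username}:{password}'.encode('utf-8')).decode('ascii')
    return f'Basic {token}'


# Proxy mapping in the shape described for set_proxies: protocol name -> proxy URL.
proxies = {'http': 'http://proxy.example.com:8080'}

print(basic_auth_header('user', 'pass'))
```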

### TopN Queries

Execute TopN queries to retrieve the top values for a dimension sorted by a metric.

```python { .api }
def topn(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    aggregations: dict,
    dimension: str,
    metric: str,
    threshold: int,
    filter: 'Filter' = None,
    post_aggregations: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a TopN query.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity ('all', 'day', 'hour', 'minute', etc.)
    - intervals: ISO-8601 intervals ('2014-02-02/p4w' or list of intervals)
    - aggregations: Dict mapping aggregator names to aggregator specifications
    - dimension: Dimension to run the query against
    - metric: Metric to sort the dimension values by
    - threshold: Number of top items to return
    - filter: Filter to apply to the data (optional)
    - post_aggregations: Dict of post-aggregations to compute (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing results and metadata
    """
```

Example usage:

```python
from pydruid.client import PyDruid
from pydruid.utils.aggregators import doublesum
from pydruid.utils.filters import Dimension

client = PyDruid('http://localhost:8082', 'druid/v2/')

# Top 10 mentioned users among English-language tweets tagged "oscars"
top = client.topn(
    datasource='twitterstream',
    granularity='all',
    intervals='2014-03-03/p1d',
    aggregations={'count': doublesum('count')},
    dimension='user_mention_name',
    filter=(Dimension('user_lang') == 'en') & (Dimension('first_hashtag') == 'oscars'),
    metric='count',
    threshold=10
)

# Export from the returned Query object (requires pandas)
df = top.export_pandas()
```
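
The `intervals` argument takes ISO-8601 interval strings, which can also be written as an explicit `start/end` pair. As a small sketch, the hypothetical `day_interval` helper below builds such a string from Python dates; it is a convenience for illustration and not part of pydruid.

```python
from datetime import date, timedelta


def day_interval(start: date, days: int = 1) -> str:
    """Return an ISO-8601 'start/end' interval covering `days` whole days (hypothetical helper)."""
    end = start + timedelta(days=days)
    return f'{start.isoformat()}/{end.isoformat()}'


# One day starting 2014-03-03, the same span as '2014-03-03/p1d'.
print(day_interval(date(2014, 3, 3)))
```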

### Timeseries Queries

Execute timeseries queries to retrieve aggregated data over time intervals.

```python { .api }
def timeseries(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    aggregations: dict,
    filter: 'Filter' = None,
    post_aggregations: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a timeseries query.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity for aggregation
    - intervals: ISO-8601 intervals to query
    - aggregations: Dict mapping aggregator names to aggregator specifications
    - filter: Filter to apply to the data (optional)
    - post_aggregations: Dict of post-aggregations to compute (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing time-series results
    """
```
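
For orientation, the sketch below assembles a JSON body in the shape of Druid's native timeseries query from the same kinds of arguments. It is a stdlib-only illustration: `timeseries_body` is a hypothetical helper, the datasource and metric names are illustrative, and the exact payload pydruid sends is not confirmed here.

```python
import json


def timeseries_body(datasource, granularity, intervals, aggregations):
    """Assemble a dict shaped like a Druid native timeseries query (hypothetical helper)."""
    if isinstance(intervals, str):
        intervals = [intervals]  # normalize a single interval to a list of strings
    return {
        'queryType': 'timeseries',
        'dataSource': datasource,
        'granularity': granularity,
        'intervals': intervals,
        # Each named aggregator becomes one entry carrying its output name.
        'aggregations': [dict(name=name, **spec) for name, spec in aggregations.items()],
    }


body = timeseries_body(
    datasource='twitterstream',
    granularity='hour',
    intervals='2014-03-03/p1d',
    aggregations={'count': {'type': 'doubleSum', 'fieldName': 'count'}},
)
print(json.dumps(body, indent=2))
```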

### GroupBy Queries

Execute groupBy queries to group data by one or more dimensions with aggregations.

```python { .api }
def groupby(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    dimensions: list,
    aggregations: dict,
    filter: 'Filter' = None,
    having: 'Having' = None,
    post_aggregations: dict = None,
    limit_spec: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a groupBy query.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity for grouping
    - intervals: ISO-8601 intervals to query
    - dimensions: List of dimensions to group by
    - aggregations: Dict mapping aggregator names to aggregator specifications
    - filter: Filter to apply to the data (optional)
    - having: Having clause for filtering grouped results (optional)
    - post_aggregations: Dict of post-aggregations to compute (optional)
    - limit_spec: Specification for limiting and ordering results (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing grouped results
    """
```
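
The `limit_spec` argument is a plain dict, so it can be built ahead of the call. The sketch below assumes it follows the shape of Druid's default limitSpec; `top_n_limit_spec` and `tweets_by_language` are hypothetical helpers, the datasource and column names are illustrative, and the aggregator is written as a plain dict on the assumption that it matches what the `doublesum` helper produces.

```python
def top_n_limit_spec(column: str, limit: int, descending: bool = True) -> dict:
    """Build a dict shaped like Druid's default limitSpec (hypothetical helper)."""
    return {
        'type': 'default',
        'limit': limit,
        'columns': [
            {'dimension': column,
             'direction': 'descending' if descending else 'ascending'}
        ],
    }


def tweets_by_language(client):
    """Group by language and keep the five largest groups.

    `client` is assumed to be a PyDruid instance; all names are illustrative.
    """
    return client.groupby(
        datasource='twitterstream',
        granularity='all',
        intervals='2014-03-03/p1d',
        dimensions=['user_lang'],
        aggregations={'count': {'type': 'doubleSum', 'fieldName': 'count'}},
        limit_spec=top_n_limit_spec('count', 5),
    )
```

Keeping the spec in its own function makes the ordering rule easy to reuse across queries and to check without a running broker.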

### Metadata Queries

Query metadata about datasources and segments.

```python { .api }
def segment_metadata(
    self,
    datasource: str,
    intervals: str | list = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a segment metadata query.

    Parameters:
    - datasource: Data source to analyze
    - intervals: ISO-8601 intervals to analyze (optional, defaults to all)
    - context: Query context parameters (optional)

    Returns:
    Query object containing segment metadata
    """

def time_boundary(
    self,
    datasource: str,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a time boundary query.

    Parameters:
    - datasource: Data source to query
    - context: Query context parameters (optional)

    Returns:
    Query object containing time boundary information
    """
```
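
A time boundary result is often used to derive the interval for a follow-up query. The sketch below assumes the result follows the shape Druid documents for timeBoundary responses (a list with one entry holding `minTime` and `maxTime`); the helper name and the sample payload are illustrative only.

```python
def interval_from_time_boundary(result: list) -> str:
    """Turn a timeBoundary-style result into an ISO-8601 'min/max' interval string."""
    bounds = result[0]['result']
    return f"{bounds['minTime']}/{bounds['maxTime']}"


# Illustrative payload in the shape of a timeBoundary response (not real data).
sample = [{
    'timestamp': '2014-03-03T00:00:00.000Z',
    'result': {
        'minTime': '2014-03-03T00:00:00.000Z',
        'maxTime': '2014-03-03T23:59:00.000Z',
    },
}]
print(interval_from_time_boundary(sample))
```

Druid intervals are generally treated as excluding their end instant, so a real follow-up query would likely extend the upper bound slightly to include the last event.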

### Advanced Query Types

Execute select, scan, and sub-query operations for raw data access and query composition.

```python { .api }
def subquery(
    self,
    **kwargs
) -> dict:
    """
    Create a sub-query for use in nested queries.

    Parameters:
    - **kwargs: Query parameters (datasource, granularity, intervals, etc.)

    Returns:
    Dictionary representation of query (not executed)

    Note:
    This method returns a query dictionary without executing it,
    allowing it to be used as a datasource in other queries.
    """
```
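
To show what "a query used as a datasource" looks like at the Druid level, the sketch below nests one query dict inside another using Druid's native `query` datasource form. The `as_query_datasource` helper and all names are hypothetical, and whether pydruid performs this wrapping itself when given a sub-query dict is an assumption not confirmed here.

```python
def as_query_datasource(inner_query: dict) -> dict:
    """Wrap a query dict in Druid's native 'query' datasource form (hypothetical helper)."""
    return {'type': 'query', 'query': inner_query}


# Inner query: tweet counts per user (illustrative names).
inner = {
    'queryType': 'groupBy',
    'dataSource': 'twitterstream',
    'granularity': 'all',
    'intervals': ['2014-03-03/p1d'],
    'dimensions': ['user_name'],
    'aggregations': [{'type': 'doubleSum', 'name': 'tweets', 'fieldName': 'count'}],
}

# Outer query: aggregate over the inner query's rows instead of a table.
outer = {
    'queryType': 'groupBy',
    'dataSource': as_query_datasource(inner),
    'granularity': 'all',
    'intervals': ['2014-03-03/p1d'],
    'dimensions': [],
    'aggregations': [{'type': 'doubleMax', 'name': 'max_tweets', 'fieldName': 'tweets'}],
}
```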

Execute select and scan queries for raw data access, and search queries to find matching dimension values.

```python { .api }
def select(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    dimensions: list = None,
    metrics: list = None,
    filter: 'Filter' = None,
    paging_spec: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a select query for raw data access.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity
    - intervals: ISO-8601 intervals to query
    - dimensions: List of dimensions to include (optional)
    - metrics: List of metrics to include (optional)
    - filter: Filter to apply (optional)
    - paging_spec: Paging specification for large result sets (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing raw data
    """

def scan(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    limit: int,
    columns: list = None,
    metrics: list = None,
    filter: 'Filter' = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a scan query for raw data access.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity
    - intervals: ISO-8601 intervals to query
    - limit: Maximum number of rows to return
    - columns: List of columns to select (optional, all columns if empty)
    - metrics: List of metrics to select (optional, all metrics if empty)
    - filter: Filter to apply (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing scan results
    """

def search(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    searchDimensions: list,
    query: dict,
    limit: int = None,
    filter: 'Filter' = None,
    sort: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a search query to find dimension values matching search specifications.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity
    - intervals: ISO-8601 intervals to query
    - searchDimensions: List of dimensions to search within
    - query: Search query specification (e.g., {"type": "insensitive_contains", "value": "text"})
    - limit: Maximum number of results to return (optional)
    - filter: Filter to apply (optional)
    - sort: Sort specification (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing search results
    """
```
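
The `query` argument of `search` is a plain dict, so a small builder can keep call sites readable. In the sketch below, `contains_spec` and `find_hashtags` are hypothetical helpers, `client` is assumed to be a PyDruid instance, and the datasource and dimension names are illustrative. The case-sensitive branch assumes Druid's `contains` search spec.

```python
def contains_spec(value: str, case_sensitive: bool = False) -> dict:
    """Build a search query specification dict (hypothetical helper)."""
    if case_sensitive:
        return {'type': 'contains', 'case_sensitive': True, 'value': value}
    return {'type': 'insensitive_contains', 'value': value}


def find_hashtags(client, text: str):
    """Search one dimension for values containing `text`, ignoring case."""
    return client.search(
        datasource='twitterstream',
        granularity='all',
        intervals='2014-03-03/p1d',
        searchDimensions=['first_hashtag'],
        query=contains_spec(text),
        limit=20,
    )
```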

## Result Export

Query methods that execute a query return Query objects that provide export capabilities:

```python
# `query` is the Query object returned by a query method, e.g. client.topn(...)

# Export to pandas DataFrame (requires pandas)
df = query.export_pandas()

# Export to TSV file
query.export_tsv('results.tsv')

# Access raw results
results = query.result
query_dict = query.query_dict
```
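
To give a feel for what a tabular export involves, the stdlib-only sketch below flattens a topN-shaped result into rows and serializes them as tab-separated text. It is a stand-in for illustration, not pydruid's implementation; `topn_rows`, `to_tsv`, and the sample payload are hypothetical.

```python
import csv
import io


def topn_rows(result: list) -> list:
    """Flatten topN-style results into one flat dict per row, tagged with its timestamp."""
    rows = []
    for bucket in result:
        for item in bucket['result']:
            rows.append({'timestamp': bucket['timestamp'], **item})
    return rows


def to_tsv(rows: list) -> str:
    """Serialize flat dict rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter='\t', lineterminator='\n'
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative payload in the shape of a Druid topN response (not real data).
sample = [{
    'timestamp': '2014-03-03T00:00:00.000Z',
    'result': [
        {'user_mention_name': 'alice', 'count': 12.0},
        {'user_mention_name': 'bob', 'count': 7.0},
    ],
}]
print(to_tsv(topn_rows(sample)))
```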
