# Synchronous Client

The PyDruid synchronous client provides a comprehensive interface for executing Druid queries, with support for all query types, authentication, proxy configuration, and flexible result export.

## Capabilities

### Client Initialization

Creates a new PyDruid client instance for connecting to a Druid broker.

```python { .api }
class PyDruid:
    def __init__(self, url: str, endpoint: str, cafile: str = None) -> None:
        """
        Initialize PyDruid client.

        Parameters:
        - url: URL of the Broker node in the Druid cluster
        - endpoint: Endpoint that the Broker listens for queries on (typically 'druid/v2/')
        - cafile: Path to CA certificate file for SSL verification (optional)
        """
```
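The `url` and `endpoint` arguments together determine the address that queries are posted to. A minimal sketch of how they combine (the host and port are placeholders; 8082 is a common default Broker port):

```python
from urllib.parse import urljoin

# Placeholder broker address and the typical query endpoint.
url = 'http://localhost:8082'
endpoint = 'druid/v2/'

# Joining with a trailing slash yields the full query URL.
full_url = urljoin(url + '/', endpoint)
print(full_url)  # http://localhost:8082/druid/v2/
```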

### Authentication and Configuration

Configure client authentication and proxy settings.

```python { .api }
def set_basic_auth_credentials(self, username: str, password: str) -> None:
    """
    Set HTTP Basic Authentication credentials.

    Parameters:
    - username: Username for authentication
    - password: Password for authentication
    """

def set_proxies(self, proxies: dict) -> None:
    """
    Configure proxy settings for HTTP requests.

    Parameters:
    - proxies: Dictionary mapping protocol names to proxy URLs
      Example: {'http': 'http://proxy.example.com:8080'}
    """
```
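A short sketch of configuring both. The proxy host and credentials are placeholders; the client calls are shown commented out because they require pydruid to be installed:

```python
# Hypothetical proxy mapping; the host is a placeholder.
proxies = {
    'http': 'http://proxy.example.com:8080',
    'https': 'http://proxy.example.com:8080',
}

# With a client created as above:
# client = PyDruid('http://localhost:8082', 'druid/v2/')
# client.set_basic_auth_credentials('druid_user', 's3cret')
# client.set_proxies(proxies)
print(sorted(proxies))  # ['http', 'https']
```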

### TopN Queries

Execute TopN queries to retrieve the top values for a dimension, sorted by a metric.

```python { .api }
def topn(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    aggregations: dict,
    dimension: str,
    metric: str,
    threshold: int,
    filter: 'Filter' = None,
    post_aggregations: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a TopN query.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity ('all', 'day', 'hour', 'minute', etc.)
    - intervals: ISO-8601 intervals ('2014-02-02/p4w' or a list of intervals)
    - aggregations: Dict mapping aggregator names to aggregator specifications
    - dimension: Dimension to run the query against
    - metric: Metric to sort the dimension values by
    - threshold: Number of top items to return
    - filter: Filter to apply to the data (optional)
    - post_aggregations: Dict of post-aggregations to compute (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing results and metadata
    """
```

Example usage:

```python
from pydruid.client import PyDruid
from pydruid.utils.aggregators import doublesum
from pydruid.utils.filters import Dimension

client = PyDruid('http://localhost:8082', 'druid/v2/')

top = client.topn(
    datasource='twitterstream',
    granularity='all',
    intervals='2014-03-03/p1d',
    aggregations={'count': doublesum('count')},
    dimension='user_mention_name',
    filter=(Dimension('user_lang') == 'en') & (Dimension('first_hashtag') == 'oscars'),
    metric='count',
    threshold=10
)

df = top.export_pandas()
```

### Timeseries Queries

Execute timeseries queries to retrieve aggregated data over time intervals.

```python { .api }
def timeseries(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    aggregations: dict,
    filter: 'Filter' = None,
    post_aggregations: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a timeseries query.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity for aggregation
    - intervals: ISO-8601 intervals to query
    - aggregations: Dict mapping aggregator names to aggregator specifications
    - filter: Filter to apply to the data (optional)
    - post_aggregations: Dict of post-aggregations to compute (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing time-series results
    """
```
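For reference, a `timeseries()` call maps onto Druid's native JSON query format. Below is a hand-built sketch of roughly what gets posted to the broker; the datasource, interval, and aggregator values are illustrative only:

```python
import json

# Native Druid timeseries query body (illustrative values).
query = {
    'queryType': 'timeseries',
    'dataSource': 'twitterstream',
    'granularity': 'hour',
    'intervals': ['2014-03-03/p1d'],
    'aggregations': [
        {'type': 'doubleSum', 'name': 'count', 'fieldName': 'count'}
    ],
}

body = json.dumps(query, indent=2)
print(body)
```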

### GroupBy Queries

Execute groupBy queries to group data by one or more dimensions with aggregations.

```python { .api }
def groupby(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    dimensions: list,
    aggregations: dict,
    filter: 'Filter' = None,
    having: 'Having' = None,
    post_aggregations: dict = None,
    limit_spec: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a groupBy query.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity for grouping
    - intervals: ISO-8601 intervals to query
    - dimensions: List of dimensions to group by
    - aggregations: Dict mapping aggregator names to aggregator specifications
    - filter: Filter to apply to the data (optional)
    - having: Having clause for filtering grouped results (optional)
    - post_aggregations: Dict of post-aggregations to compute (optional)
    - limit_spec: Specification for limiting and ordering results (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing grouped results
    """
```
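The `limit_spec` parameter takes Druid's native `limitSpec` structure. A hand-built example that orders grouped rows by the `count` aggregate and keeps the top five (the column name is illustrative):

```python
# Native Druid limitSpec: order grouped rows by the 'count' aggregate
# descending and return at most five of them.
limit_spec = {
    'type': 'default',
    'limit': 5,
    'columns': [
        {'dimension': 'count', 'direction': 'descending'}
    ],
}
```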

### Metadata Queries

Query metadata about datasources and segments.

```python { .api }
def segment_metadata(
    self,
    datasource: str,
    intervals: str | list = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a segment metadata query.

    Parameters:
    - datasource: Data source to analyze
    - intervals: ISO-8601 intervals to analyze (optional, defaults to all)
    - context: Query context parameters (optional)

    Returns:
    Query object containing segment metadata
    """

def time_boundary(
    self,
    datasource: str,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a time boundary query.

    Parameters:
    - datasource: Data source to query
    - context: Query context parameters (optional)

    Returns:
    Query object containing time boundary information
    """
```
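A time boundary query reports the earliest and latest ingested timestamps for a datasource. The snippet below parses a response in Druid's native shape; the timestamps are placeholders, not real data:

```python
# Hypothetical timeBoundary response; timestamps are placeholders.
response = [
    {
        'timestamp': '2014-03-03T00:00:00.000Z',
        'result': {
            'minTime': '2014-03-03T00:00:00.000Z',
            'maxTime': '2014-03-09T23:00:00.000Z',
        },
    }
]

bounds = response[0]['result']
print(bounds['minTime'], bounds['maxTime'])
```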

### Advanced Query Types

Execute select, scan, and sub-query operations for raw data access and query composition.

```python { .api }
def subquery(
    self,
    **kwargs
) -> dict:
    """
    Create a sub-query for use in nested queries.

    Parameters:
    - **kwargs: Query parameters (datasource, granularity, intervals, etc.)

    Returns:
    Dictionary representation of the query (not executed)

    Note:
    This method returns a query dictionary without executing it,
    allowing it to be used as a datasource in other queries.
    """
```
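The returned dictionary nests into another query as a `query`-type datasource. A hand-built sketch of the nesting in Druid's native format, where the inner dict stands in for what `subquery()` would return (all values are illustrative):

```python
# Inner query dict, standing in for the output of client.subquery(...).
inner = {
    'queryType': 'groupBy',
    'dataSource': 'twitterstream',
    'granularity': 'hour',
    'dimensions': ['user_lang'],
}

# Nested as the datasource of an outer query.
outer_datasource = {'type': 'query', 'query': inner}
```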

Execute select, scan, and search queries for raw data access.

```python { .api }
def select(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    dimensions: list = None,
    metrics: list = None,
    filter: 'Filter' = None,
    paging_spec: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a select query for raw data access.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity
    - intervals: ISO-8601 intervals to query
    - dimensions: List of dimensions to include (optional)
    - metrics: List of metrics to include (optional)
    - filter: Filter to apply (optional)
    - paging_spec: Paging specification for large result sets (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing raw data
    """

def scan(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    limit: int,
    columns: list = None,
    metrics: list = None,
    filter: 'Filter' = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a scan query for raw data access.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity
    - intervals: ISO-8601 intervals to query
    - limit: Maximum number of rows to return
    - columns: List of columns to select (optional, all columns if empty)
    - metrics: List of metrics to select (optional, all metrics if empty)
    - filter: Filter to apply (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing scan results
    """

def search(
    self,
    datasource: str,
    granularity: str,
    intervals: str | list,
    searchDimensions: list,
    query: dict,
    limit: int = None,
    filter: 'Filter' = None,
    sort: dict = None,
    context: dict = None,
    **kwargs
) -> Query:
    """
    Execute a search query to find dimension values matching search specifications.

    Parameters:
    - datasource: Data source to query
    - granularity: Time granularity
    - intervals: ISO-8601 intervals to query
    - searchDimensions: List of dimensions to search within
    - query: Search query specification (e.g., {"type": "insensitive_contains", "value": "text"})
    - limit: Maximum number of results to return (optional)
    - filter: Filter to apply (optional)
    - sort: Sort specification (optional)
    - context: Query context parameters (optional)

    Returns:
    Query object containing search results
    """
```
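The `query` parameter of `search()` takes a Druid searchQuerySpec. Two common hand-built forms (the search values are illustrative):

```python
# Case-insensitive substring match.
contains_spec = {'type': 'insensitive_contains', 'value': 'oscar'}

# Match rows containing any of several fragments.
fragment_spec = {'type': 'fragment', 'values': ['osc', 'ars']}
```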

## Result Export

All query methods return Query objects that provide export capabilities:

```python
query = client.topn(...)  # any query method returns a Query object

# Export to a pandas DataFrame (requires pandas)
df = query.export_pandas()

# Export to a TSV file
query.export_tsv('results.tsv')

# Access raw results
results = query.result
query_dict = query.query_dict
```