# Search Service

Metadata indexing and search for research data discovery, with support for custom schemas, faceted search, and real-time indexing. The Search service enables data discovery across distributed research collections through rich metadata search and filtering.

## Capabilities

### Search Client

Core client for search index management, data ingestion, and query operations, covering both simple text searches and advanced structured queries with metadata management.

```python { .api }
class SearchClient(BaseClient):
    """
    Client for Globus Search service operations.

    Provides methods for index management, data ingestion, and search queries
    with support for both simple and advanced search capabilities including
    filters, facets, and complex query structures.
    """

    def __init__(
        self,
        *,
        app: GlobusApp | None = None,
        authorizer: GlobusAuthorizer | None = None,
        environment: str | None = None,
        base_url: str | None = None,
        **kwargs
    ) -> None: ...
```

### Index Management

Create, configure, and manage search indices for organizing and discovering research data with custom schemas and policies.

```python { .api }
def create_index(
    self,
    display_name: str,
    description: str
) -> GlobusHTTPResponse:
    """
    Create a new search index.

    Creates a new index for storing and searching metadata documents.
    New indices default to trial status and may have usage limitations.

    Parameters:
    - display_name: Human-readable name for the index
    - description: Detailed description of the index purpose and content

    Returns:
        GlobusHTTPResponse with created index details including ID
    """

def get_index(
    self,
    index_id: str | UUID,
    *,
    query_params: dict[str, Any] | None = None
) -> GlobusHTTPResponse:
    """
    Get index configuration and metadata.

    Returns complete index information including schema, statistics,
    access policies, and configuration settings.

    Parameters:
    - index_id: UUID of the index to retrieve
    - query_params: Additional query parameters

    Returns:
        GlobusHTTPResponse with index configuration and statistics
    """

def get_index_list(
    self,
    *,
    query_params: dict[str, Any] | None = None
) -> IndexListResponse:
    """
    List accessible search indices.

    Returns all indices the user can access including owned indices
    and indices shared with appropriate permissions.

    Parameters:
    - query_params: Additional query parameters for filtering

    Returns:
        IndexListResponse with paginated index listings
    """

def delete_index(self, index_id: str | UUID) -> GlobusHTTPResponse:
    """
    Mark an index for deletion.

    Sets index status to "delete-pending". Actual deletion happens
    asynchronously and may take time to complete fully.

    Parameters:
    - index_id: UUID of index to delete

    Returns:
        GlobusHTTPResponse confirming deletion request
    """
```

### Data Ingestion and Management

Add, update, and remove data from search indices with support for batch operations and real-time indexing.

```python { .api }
def ingest(
    self,
    index_id: str | UUID,
    data: dict[str, Any]
) -> GlobusHTTPResponse:
    """
    Ingest data into a search index.

    Adds or updates documents in the index as an asynchronous task.
    Data can be provided as a single document or list of documents
    with flexible schema support for metadata organization.

    Parameters:
    - index_id: UUID of the target index
    - data: Document(s) to ingest, can be single dict or list of dicts

    Returns:
        GlobusHTTPResponse with ingestion task information
    """

def delete_by_query(
    self,
    index_id: str | UUID,
    data: dict[str, Any]
) -> GlobusHTTPResponse:
    """
    Delete documents matching a query.

    Removes all documents that match the specified query criteria
    as an asynchronous task, enabling bulk deletion operations.

    Parameters:
    - index_id: UUID of the index
    - data: Query specification for documents to delete

    Returns:
        GlobusHTTPResponse with deletion task information
    """

def get_task(
    self,
    task_id: str | UUID,
    *,
    query_params: dict[str, Any] | None = None
) -> GlobusHTTPResponse:
    """
    Get status of an indexing task.

    Returns current status and results of ingestion or deletion tasks,
    useful for monitoring asynchronous operations.

    Parameters:
    - task_id: UUID of the task to check
    - query_params: Additional query parameters

    Returns:
        GlobusHTTPResponse with task status and results
    """

def get_task_list(
    self,
    *,
    query_params: dict[str, Any] | None = None
) -> GlobusHTTPResponse:
    """
    List indexing tasks with optional filtering.

    Returns tasks for monitoring ingestion and deletion operations
    across all accessible indices.

    Parameters:
    - query_params: Query parameters for filtering tasks

    Returns:
        GlobusHTTPResponse with task listings
    """
```
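The `data` argument to `ingest` is a GMetaList envelope rather than the bare documents. A minimal helper for assembling one is sketched below; the envelope keys follow the usage patterns later in this document, while the per-entry `subject`/`visible_to`/`content` fields are assumptions based on the Globus Search GMetaEntry format:

```python
def make_gmeta_payload(docs: list[dict]) -> dict:
    """Wrap metadata documents in the GMetaList envelope expected by ingest().

    Each entry pairs a unique "subject" (the document's identity in the index)
    with its "content"; "visible_to" controls read access (assumed fields,
    per the Globus Search GMetaEntry format).
    """
    entries = [
        {"subject": doc["subject"], "visible_to": ["public"], "content": doc}
        for doc in docs
    ]
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": entries}}
```

The resulting dict can then be passed directly as `data` to `ingest`.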

### Search and Query Operations

Perform powerful searches with support for simple queries, advanced syntax, filters, facets, and result pagination.

```python { .api }
def search(
    self,
    index_id: str | UUID,
    q: str,
    *,
    offset: int = 0,
    limit: int = 10,
    advanced: bool = True,
    query_params: dict[str, Any] | None = None
) -> GlobusHTTPResponse:
    """
    Perform a simple search query.

    Executes a text search against the index with basic parameters
    for straightforward search operations.

    Parameters:
    - index_id: UUID of the index to search
    - q: Query string for search
    - offset: Starting position for results (pagination)
    - limit: Maximum number of results to return
    - advanced: Enable advanced query syntax (default: True)
    - query_params: Additional query parameters

    Returns:
        GlobusHTTPResponse with search results and metadata
    """

def post_search(
    self,
    index_id: str | UUID,
    data: dict[str, Any] | SearchQuery,
    *,
    offset: int | None = None,
    limit: int | None = None,
    query_params: dict[str, Any] | None = None
) -> GlobusHTTPResponse:
    """
    Perform an advanced search with complex query structure.

    Supports sophisticated queries including filters, facets,
    sorting, and other advanced search features using POST body.

    Parameters:
    - index_id: UUID of the index to search
    - data: Complex search query specification or SearchQuery object
    - offset: Starting position for results
    - limit: Maximum results to return
    - query_params: Additional parameters

    Returns:
        GlobusHTTPResponse with comprehensive search results
    """

def scroll_search(
    self,
    index_id: str | UUID,
    data: SearchScrollQuery | dict[str, Any],
    *,
    query_params: dict[str, Any] | None = None
) -> GlobusHTTPResponse:
    """
    Perform a scroll search for large result sets.

    Uses scroll API for efficiently retrieving large numbers of
    search results without traditional pagination limitations.

    Parameters:
    - index_id: UUID of the index to search
    - data: Scroll query specification
    - query_params: Additional parameters

    Returns:
        GlobusHTTPResponse with scroll results and continuation token
    """
```
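For moderate result sets, `offset`/`limit` pagination can be wrapped in a small generator that keeps fetching until a page comes back short. A sketch; the `fetch_page` callable is a hypothetical stand-in for something like `lambda offset, limit: search_client.search(index_id, q, offset=offset, limit=limit)`:

```python
from typing import Any, Callable, Iterator

def iter_hits(
    fetch_page: Callable[[int, int], dict[str, Any]],
    limit: int = 10,
) -> Iterator[dict[str, Any]]:
    """Yield every hit by advancing offset until a short (or empty) page."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        hits = page.get("gmeta", [])
        yield from hits
        if len(hits) < limit:  # a short page means there are no more results
            return
        offset += limit
```

For very large result sets, prefer `scroll_search`, which avoids the cost of deep offsets.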

### Query Builder Classes

Type-safe query construction with chainable API for building complex search queries with filters, facets, and advanced options.

```python { .api }
class SearchQuery(PayloadWrapper):
    """
    Modern search query builder for constructing complex search requests.

    Provides a fluent API for building queries with filters, facets,
    sorting, and other advanced search features with type safety.
    """

    def __init__(
        self,
        q: str | None = None,
        *,
        offset: int | None = None,
        limit: int | None = None,
        advanced: bool | None = None,
        **kwargs
    ) -> None: ...

    def set_query(self, query: str) -> SearchQuery:
        """
        Set the main query string.

        Parameters:
        - query: Text query to search for

        Returns:
            Self for method chaining
        """

    def set_limit(self, limit: int) -> SearchQuery:
        """
        Set maximum number of results to return.

        Parameters:
        - limit: Maximum results per page

        Returns:
            Self for method chaining
        """

    def set_offset(self, offset: int) -> SearchQuery:
        """
        Set starting position for results (pagination).

        Parameters:
        - offset: Starting result position

        Returns:
            Self for method chaining
        """

    def set_advanced(self, advanced: bool) -> SearchQuery:
        """
        Enable or disable advanced query syntax.

        Parameters:
        - advanced: Whether to use advanced query parsing

        Returns:
            Self for method chaining
        """

    def add_filter(
        self,
        field_name: str,
        values: list[str],
        *,
        type: str = "match_all",
        additional_fields: dict[str, Any] | None = None
    ) -> SearchQuery:
        """
        Add a filter to constrain search results.

        Parameters:
        - field_name: Field to filter on
        - values: Values to match in the filter
        - type: Filter type (match_all, match_any, range, etc.)
        - additional_fields: Additional filter configuration

        Returns:
            Self for method chaining
        """

    def add_facet(
        self,
        name: str,
        field_name: str,
        *,
        type: str = "terms",
        size: int | None = None,
        additional_fields: dict[str, Any] | None = None
    ) -> SearchQuery:
        """
        Add a facet for result aggregation.

        Parameters:
        - name: Name for the facet in results
        - field_name: Field to facet on
        - type: Facet type (terms, date_histogram, etc.)
        - size: Maximum facet values to return
        - additional_fields: Additional facet configuration

        Returns:
            Self for method chaining
        """

    def add_sort(
        self,
        field_name: str,
        order: str = "asc"
    ) -> SearchQuery:
        """
        Add sort criteria to results.

        Parameters:
        - field_name: Field to sort by
        - order: Sort order (asc, desc)

        Returns:
            Self for method chaining
        """

    def set_field_list(self, fields: list[str]) -> SearchQuery:
        """
        Specify which fields to return in results.

        Parameters:
        - fields: List of field names to include

        Returns:
            Self for method chaining
        """

class SearchQueryV1(SearchQuery):
    """
    Legacy search query builder for API v1 compatibility.

    Maintains compatibility with older search API versions
    while providing similar functionality to modern SearchQuery.
    """

class SearchScrollQuery(PayloadWrapper):
    """
    Query builder for scroll-based search operations.

    Designed for efficiently retrieving large result sets using
    the scroll API pattern for deep pagination.
    """

    def __init__(
        self,
        q: str | None = None,
        *,
        limit: int | None = None,
        advanced: bool | None = None,
        scroll: str | None = None,
        scroll_id: str | None = None,
        **kwargs
    ) -> None: ...

    def set_scroll_size(self, size: int) -> SearchScrollQuery:
        """Set the scroll window size for batch retrieval."""

    def set_scroll_id(self, scroll_id: str) -> SearchScrollQuery:
        """Set scroll ID for continuing a scroll operation."""
```
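Chained builder calls accumulate into a single request body. The payload below is a sketch of what a built query plausibly serializes to, using the parameter names documented above; the exact wire-format keys are an assumption, not part of this spec:

```python
# Roughly what this chain might produce:
#   SearchQuery("climate temperature").set_limit(20).set_advanced(True)
#       .add_filter("data_type", ["time-series"], type="match_any")
#       .add_facet("creator", "creator", size=10)
#       .add_sort("date_created", "desc")
query_payload = {
    "q": "climate temperature",
    "advanced": True,
    "limit": 20,
    "filters": [
        {"type": "match_any", "field_name": "data_type", "values": ["time-series"]}
    ],
    "facets": [
        {"name": "creator", "type": "terms", "field_name": "creator", "size": 10}
    ],
    "sort": [{"field_name": "date_created", "order": "desc"}],
}
```

Because `SearchQuery` subclasses `PayloadWrapper`, a dict of this shape can generally be passed to `post_search` in place of a builder object.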

### Response Objects

Specialized response classes providing enhanced access to search results and index listings with iteration support.

```python { .api }
class IndexListResponse(GlobusHTTPResponse):
    """
    Response class for index listing operations.

    Provides enhanced access to index listings with metadata
    and convenient iteration over available indices.
    """

    def __iter__(self) -> Iterator[dict[str, Any]]:
        """Iterate over index records."""
```

### Error Handling

Search-specific error handling for indexing operations and query processing.

```python { .api }
class SearchAPIError(GlobusAPIError):
    """
    Error class for Search service API errors.

    Provides enhanced error handling for search-specific error
    conditions including indexing failures and query syntax errors.
    """
```
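Transient failures (rate limiting, service hiccups) are worth retrying, while query-syntax and permission errors are not. The sketch below shows that split; the stand-in exception keeps it self-contained, but in real code you would catch `SearchAPIError` and inspect its `http_status` attribute (assumed here to follow the `GlobusAPIError` convention):

```python
import time

class TransientSearchError(Exception):
    """Stand-in for SearchAPIError so the sketch runs without a live service."""
    def __init__(self, http_status: int):
        super().__init__(f"HTTP {http_status}")
        self.http_status = http_status

def with_retries(operation, retries: int = 3, delay: float = 0.0):
    """Run operation(), retrying only rate-limit (429) and server (5xx) errors."""
    for attempt in range(retries):
        try:
            return operation()
        except TransientSearchError as err:
            retryable = err.http_status == 429 or err.http_status >= 500
            if not retryable or attempt == retries - 1:
                raise  # bad query syntax, permission errors, or retries exhausted
            time.sleep(delay)  # back off before the next attempt
```

Usage would look like `with_retries(lambda: search_client.search(index_id, q="climate"))`, with `TransientSearchError` replaced by `SearchAPIError`.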

## Common Usage Patterns

### Basic Index Setup and Data Ingestion

```python
from globus_sdk import SearchClient

# Initialize the search client
search_client = SearchClient(authorizer=authorizer)

# Create a new index for research data
index_response = search_client.create_index(
    display_name="Climate Research Data",
    description="Searchable metadata for climate research datasets"
)
index_id = index_response["id"]

# Sample metadata documents to index
documents = [
    {
        "subject": "Temperature Measurements - Station A",
        "description": "Daily temperature recordings from weather station A",
        "creator": "Climate Research Lab",
        "date_created": "2024-01-15",
        "keywords": ["temperature", "climate", "weather"],
        "data_type": "time-series",
        "location": {"lat": 40.7128, "lon": -74.0060},
        "file_format": "CSV",
        "size_mb": 15.2
    },
    {
        "subject": "Precipitation Data - Regional Survey",
        "description": "Monthly precipitation measurements across the region",
        "creator": "Weather Monitoring Network",
        "date_created": "2024-02-01",
        "keywords": ["precipitation", "rainfall", "climate", "regional"],
        "data_type": "geospatial",
        "location": {"lat": 41.8781, "lon": -87.6298},
        "file_format": "NetCDF",
        "size_mb": 45.7
    }
]

# Each GMetaList entry pairs a unique subject with its content document
gmeta_entries = [
    {"subject": doc["subject"], "visible_to": ["public"], "content": doc}
    for doc in documents
]

# Ingest documents into the index
ingest_response = search_client.ingest(
    index_id,
    {"ingest_type": "GMetaList", "ingest_data": {"gmeta": gmeta_entries}}
)
task_id = ingest_response["task_id"]

print(f"Ingestion task started: {task_id}")
```

### Advanced Search with Filters and Facets

```python
from globus_sdk import SearchQuery

# Build a complex search query
query = (
    SearchQuery("climate temperature")
    .set_limit(20)
    .set_advanced(True)
    .add_filter("data_type", ["time-series", "geospatial"], type="match_any")
    .add_filter("size_mb", ["0", "50"], type="range")
    .add_facet("creator", "creator", size=10)
    .add_facet("keywords", "keywords", size=20)
    .add_facet("location_facet", "location.country", size=15)
    .add_sort("date_created", "desc")
    .set_field_list(["subject", "description", "creator", "date_created", "keywords"])
)

# Execute the search
results = search_client.post_search(index_id, query)

print(f"Found {results['total']} results")

# Process search results (each gmeta hit wraps its document in entries/content)
for hit in results["gmeta"]:
    content = hit["entries"][0]["content"]
    print(f"Title: {content['subject']}")
    print(f"Creator: {content['creator']}")
    print(f"Keywords: {', '.join(content.get('keywords', []))}")
    print("---")

# Process facets for building a user interface
facets = results.get("facet_results", [])
for facet in facets:
    print(f"\nFacet: {facet['name']}")
    for bucket in facet["buckets"]:
        print(f"  {bucket['value']}: {bucket['count']}")
```

### Simple Text Search

```python
# Simple keyword search
simple_results = search_client.search(
    index_id,
    q="temperature climate data",
    limit=10,
    offset=0,
    advanced=True
)

print(f"Simple search found {simple_results['total']} results")
for hit in simple_results["gmeta"]:
    content = hit["entries"][0]["content"]
    print(f"- {content['subject']}")
```

### Scroll Search for Large Result Sets

```python
from globus_sdk import SearchScrollQuery

# Create a scroll query for large datasets
scroll_query = SearchScrollQuery("*")  # Match all documents
scroll_query.set_scroll_size(100)      # Retrieve 100 results per batch

# Start scrolling
scroll_response = search_client.scroll_search(index_id, scroll_query)

all_results = []
scroll_id = scroll_response.get("scroll_id")

# Continue scrolling until no more results
while scroll_response.get("gmeta"):
    all_results.extend(scroll_response["gmeta"])
    print(f"Retrieved {len(scroll_response['gmeta'])} more results")

    if not scroll_id:
        break

    # Continue with the next batch
    continue_query = SearchScrollQuery().set_scroll_id(scroll_id)
    scroll_response = search_client.scroll_search(index_id, continue_query)
    scroll_id = scroll_response.get("scroll_id")

print(f"Total results retrieved: {len(all_results)}")
```

### Data Management and Updates

```python
# Updated version of an existing document; documents are keyed by subject,
# so re-ingesting under the same subject replaces the existing entry
updated_doc = {
    "subject": "Temperature Measurements - Station A",
    "description": "Updated daily temperature recordings with quality control",
    "creator": "Climate Research Lab",
    "date_created": "2024-01-15",
    "date_modified": "2024-03-01",
    "keywords": ["temperature", "climate", "weather", "quality-controlled"],
    "data_type": "time-series",
    "version": "2.0"
}

# Ingest the updated document (same subject replaces the prior entry)
update_response = search_client.ingest(
    index_id,
    {
        "ingest_type": "GMetaList",
        "ingest_data": {
            "gmeta": [
                {
                    "subject": updated_doc["subject"],
                    "visible_to": ["public"],
                    "content": updated_doc,
                }
            ]
        },
    },
)

# Delete documents matching criteria
delete_query = {
    "q": "*",
    "filters": [
        {
            "field_name": "data_type",
            "values": ["obsolete"],
            "type": "match_any"
        }
    ]
}

delete_response = search_client.delete_by_query(index_id, delete_query)
print(f"Deletion task: {delete_response['task_id']}")
```

### Task Monitoring

```python
import time

# Monitor ingestion/deletion tasks
def wait_for_task(search_client, task_id, timeout=300):
    start_time = time.time()
    while time.time() - start_time < timeout:
        task_status = search_client.get_task(task_id)

        status = task_status.get("state", "PENDING")
        print(f"Task {task_id}: {status}")

        if status in ["SUCCESS", "FAILED"]:
            return task_status

        time.sleep(5)

    raise TimeoutError(f"Task {task_id} did not complete within {timeout} seconds")

# Wait for ingestion to complete
try:
    final_status = wait_for_task(search_client, task_id)
    if final_status["state"] == "SUCCESS":
        print("Ingestion completed successfully")
    else:
        print(f"Ingestion failed: {final_status.get('message')}")
except TimeoutError as e:
    print(e)
```

### Index Management and Administration

```python
# List all accessible indices
indices = search_client.get_index_list()
for index in indices:
    print(f"Index: {index['display_name']} ({index['id']})")
    print(f"  Description: {index['description']}")
    print(f"  Status: {index.get('status', 'active')}")
    print(f"  Document count: {index.get('size', 'unknown')}")

# Get detailed index information
index_details = search_client.get_index(index_id)
print(f"Index created: {index_details['creation_date']}")
print(f"Index permissions: {index_details.get('permissions', [])}")

# List recent tasks for monitoring
tasks = search_client.get_task_list()
for task in tasks.get("tasks", []):
    print(f"Task {task['task_id']}: {task['state']} - {task.get('task_type', 'unknown')}")
```

### Advanced Query Patterns

```python
# Geographic search with location filters
geo_query = (
    SearchQuery("research station")
    .add_filter("location.country", ["United States", "Canada"])
    .add_filter("coordinates", ["-90,-180", "90,180"], type="geo_bounding_box")
    .add_facet("country_facet", "location.country")
    .add_sort("date_created", "desc")
)

# Date range search
date_query = (
    SearchQuery("climate data")
    .add_filter("date_created", ["2024-01-01", "2024-12-31"], type="date_range")
    .add_facet(
        "monthly",
        "date_created",
        type="date_histogram",
        additional_fields={"interval": "month"},
    )
)

# Full-text search with boosting
boosted_query = (
    SearchQuery("temperature precipitation climate")
    .set_advanced(True)
    .add_sort("_score", "desc")  # Sort by relevance score
    .set_field_list(["subject", "description", "creator", "keywords", "_score"])
)

# Execute queries
geo_results = search_client.post_search(index_id, geo_query)
date_results = search_client.post_search(index_id, date_query)
relevance_results = search_client.post_search(index_id, boosted_query)
```