
# Multi-Server Search


Search multiple ERDDAP servers in a single call, with optional parallel processing. These functions let you discover datasets across many ERDDAP servers at once rather than querying each server individually.


**Note:** These functions must be imported directly from `erddapy.multiple_server_search` as they are not included in the main package exports.


## Capabilities


### Simple Multi-Server Search


Search multiple ERDDAP servers for datasets matching a query string using Google-like search syntax.


```python { .api }
def search_servers(
    query: str,
    *,
    servers_list: list[str] = None,
    parallel: bool = False,
    protocol: str = "tabledap"
) -> DataFrame:
    """
    Search all servers for a query string.

    Parameters:
    - query: Search terms with Google-like syntax:
      * Words separated by spaces (searches separately)
      * "quoted phrases" for exact matches
      * -excludedWord to exclude terms
      * -"excluded phrase" to exclude phrases
      * Partial word matching (e.g., "spee" matches "speed")
    - servers_list: Optional list of server URLs. If None, searches all servers
    - parallel: If True, uses joblib for parallel processing
    - protocol: 'tabledap' or 'griddap'

    Returns:
    - pandas.DataFrame with columns: Title, Institution, Dataset ID, Server url
    """
```


**Usage Examples:**


```python
from erddapy.multiple_server_search import search_servers

# Basic search across all servers
results = search_servers("temperature salinity")
print(f"Found {len(results)} datasets")
print(results[['Title', 'Institution', 'Dataset ID']].head())

# Search for exact phrase
buoy_data = search_servers('"sea surface temperature"')

# Exclude certain terms
ocean_not_air = search_servers('temperature -air -atmospheric')

# Search specific servers only
coastal_servers = [
    "http://erddap.secoora.org/erddap",
    "http://www.neracoos.org/erddap"
]
coastal_results = search_servers(
    "glider",
    servers_list=coastal_servers
)

# Parallel search for faster results
large_search = search_servers(
    "chlorophyll",
    parallel=True
)
```
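
The query syntax listed in the docstring above can be illustrated offline. The following is a minimal sketch, not part of erddapy: `split_query` is a hypothetical helper that only shows how a query string divides into included and excluded terms. The actual matching (including partial-word matching) is performed by each ERDDAP server.

```python
import re

# Hypothetical helper, not part of erddapy. ERDDAP parses the query on the
# server; this only illustrates how the syntax divides into terms.
# Each token is an optional leading "-", then a quoted phrase or a bare word.
_TOKEN = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


def split_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into (included, excluded) words and phrases."""
    included, excluded = [], []
    for minus, phrase, word in _TOKEN.findall(query):
        term = phrase or word
        if not term:
            continue  # skip empty quoted phrases such as ""
        (excluded if minus else included).append(term)
    return included, excluded


included, excluded = split_query('temperature "sea surface" -air -"wind speed"')
print(included)  # ['temperature', 'sea surface']
print(excluded)  # ['air', 'wind speed']
```

A hyphen inside a word (for example `sea-surface`) is treated as part of the word in this sketch; only a leading `-` marks an exclusion.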


### Advanced Multi-Server Search


Search with explicit constraints (metadata, variable attributes, geographic and temporal bounds) to narrow dataset discovery.


```python { .api }
def advanced_search_servers(
    servers_list: list[str] = None,
    *,
    parallel: bool = False,
    protocol: str = "tabledap",
    **kwargs
) -> DataFrame:
    """
    Advanced search across multiple ERDDAP servers with constraints.

    Parameters:
    - servers_list: Optional list of server URLs. If None, searches all servers
    - parallel: If True, uses joblib for parallel processing
    - protocol: 'tabledap' or 'griddap'
    - **kwargs: Search constraints including:
      * search_for: Query string (same as search_servers)
      * cdm_data_type, institution, ioos_category: Metadata filters
      * keywords, long_name, standard_name, variableName: Variable filters
      * minLon, maxLon, minLat, maxLat: Geographic bounds
      * minTime, maxTime: Temporal bounds
      * items_per_page, page: Pagination controls

    Returns:
    - pandas.DataFrame with matching datasets
    """
```


**Usage Examples:**


```python
from erddapy.multiple_server_search import advanced_search_servers

# Geographic and temporal constraints
gulf_data = advanced_search_servers(
    search_for="temperature",
    minLat=25.0,
    maxLat=31.0,
    minLon=-98.0,
    maxLon=-80.0,
    minTime="2020-01-01T00:00:00Z",
    maxTime="2020-12-31T23:59:59Z",
    parallel=True
)

# Filter by data type and institution
mooring_data = advanced_search_servers(
    cdm_data_type="TimeSeries",
    institution="NOAA",
    ioos_category="Temperature"
)

# Search by variable characteristics
salinity_vars = advanced_search_servers(
    standard_name="sea_water_salinity",
    protocol="tabledap"
)

# GridDAP satellite data
satellite_sst = advanced_search_servers(
    search_for="sea surface temperature satellite",
    protocol="griddap",
    cdm_data_type="Grid"
)
```
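
To make the constraint keywords more concrete, the sketch below shows how they might translate into a single server's advanced-search URL. `build_advanced_url` is a hypothetical helper, and both the `/search/advanced.csv` endpoint layout and the renaming of `search_for` and `items_per_page` are assumptions made for illustration; the URL that erddapy actually builds may differ.

```python
from urllib.parse import urlencode

# Assumed mapping from the snake_case keyword names used above to ERDDAP's
# query-string names; all other constraints are passed through unchanged.
_RENAMES = {"search_for": "searchFor", "items_per_page": "itemsPerPage"}


def build_advanced_url(server: str, protocol: str = "tabledap", **constraints) -> str:
    """Hypothetical helper: assemble one server's advanced-search CSV URL."""
    params = {"protocol": protocol}
    for name, value in constraints.items():
        if value is None:
            continue  # unset constraints are left out of the query string
        params[_RENAMES.get(name, name)] = value
    return f"{server.rstrip('/')}/search/advanced.csv?{urlencode(params)}"


url = build_advanced_url(
    "https://gliders.ioos.us/erddap",
    search_for="temperature",
    minLat=25.0,
    maxLat=31.0,
)
print(url)
```

Building one such URL per server is what makes a multi-server search possible: the same constraints are applied to every server in the list, and only the base URL changes.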


### Result Processing Helper


Internal function for processing search results from individual servers.


```python { .api }
def fetch_results(
    url: str,
    key: str,
    protocol: str
) -> dict[str, DataFrame] | None:
    """
    Fetch search results from a single server.

    Parameters:
    - url: ERDDAP search URL
    - key: Server identifier key
    - protocol: 'tabledap' or 'griddap'

    Returns:
    - Dictionary with server key mapped to DataFrame, or None if server fails
    """
```
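
Because a failed server yields `None` rather than a result mapping, whatever combines the per-server results has to skip those entries. The following is a minimal sketch of that step under stated assumptions: it uses plain lists of row dictionaries in place of DataFrames so that it runs without pandas or network access, and `combine`, the server keys, and the dataset IDs are all hypothetical.

```python
from __future__ import annotations

Row = dict[str, str]

# Stand-ins for what each per-server fetch might return: a one-entry mapping
# of server key to rows, or None when the server could not be reached.
per_server = [
    {"secoora": [{"Dataset ID": "ds1", "Server url": "http://erddap.secoora.org/erddap/"}]},
    None,  # a failed server
    {"ioos_gliders": [{"Dataset ID": "ds2", "Server url": "https://gliders.ioos.us/erddap/"}]},
]


def combine(results: list[dict[str, list[Row]] | None]) -> list[Row]:
    """Drop failed servers and flatten the remaining rows into one list."""
    merged: list[Row] = []
    for result in results:
        if result is None:
            continue  # server failed; contribute nothing
        for rows in result.values():
            merged.extend(rows)
    return merged


combined = combine(per_server)
print([row["Dataset ID"] for row in combined])  # ['ds1', 'ds2']
```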


## Search Result Analysis


The search functions return pandas DataFrames with standardized columns for easy analysis:


```python
from erddapy.multiple_server_search import search_servers
import pandas as pd

# Perform search
results = search_servers("glider temperature", parallel=True)

# Analyze results
print("Search Results Summary:")
print(f"Total datasets found: {len(results)}")
print(f"Unique institutions: {results['Institution'].nunique()}")
print(f"Servers with data: {results['Server url'].nunique()}")

# Group by institution
by_institution = results.groupby('Institution').size().sort_values(ascending=False)
print("\nDatasets by Institution:")
print(by_institution.head(10))

# Find datasets from specific regions
secoora_data = results[results['Server url'].str.contains('secoora')]
print(f"\nSECOORA datasets: {len(secoora_data)}")

# Export results
results.to_csv('erddap_search_results.csv', index=False)
```


## Parallel Processing Setup


For faster searches across many servers, install joblib and use parallel processing:


```bash
pip install joblib
```


```python
import time

from erddapy.multiple_server_search import search_servers

# Enable parallel processing
results = search_servers(
    "ocean color chlorophyll",
    parallel=True  # Uses all CPU cores
)

# Check performance difference
start = time.time()
serial_results = search_servers("temperature", parallel=False)
serial_time = time.time() - start

start = time.time()
parallel_results = search_servers("temperature", parallel=True)
parallel_time = time.time() - start

print(f"Serial search: {serial_time:.2f} seconds")
print(f"Parallel search: {parallel_time:.2f} seconds")
print(f"Speedup: {serial_time/parallel_time:.1f}x")
```
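
The fan-out pattern behind `parallel=True` can be sketched without joblib or network access. The example below swaps in the standard library's `ThreadPoolExecutor` purely for illustration (the library itself uses joblib), and `fake_fetch`, the server keys, and the URLs are hypothetical stand-ins for real per-server requests.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(item: tuple[str, str]) -> tuple[str, int]:
    """Stand-in for one network request; returns (server key, placeholder value)."""
    key, url = item
    return key, len(url)  # placeholder "result" so the sketch runs offline


urls = {
    "server_a": "https://a.example.com/erddap/search/index.csv",
    "server_b": "https://b.example.com/erddap/search/index.csv",
}

# Serial: one request after another.
serial = [fake_fetch(item) for item in urls.items()]

# Concurrent: requests overlap; map() still yields results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(fake_fetch, urls.items()))

print(serial == threaded)  # True
```

Since each per-server request is independent, the concurrent and serial paths return the same results; only the elapsed time is expected to differ.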


## Error Handling and Server Failures


The multi-server search functions skip servers that fail to respond, so a single unreachable server does not abort the whole search:


```python
from erddapy.multiple_server_search import search_servers

# Some servers may be offline or return errors
results = search_servers("salinity")

# Results automatically exclude failed servers
print(f"Collected results from available servers: {len(results)}")

# Check server availability by testing with small search
test_servers = [
    "http://erddap.secoora.org/erddap",
    "http://invalid-server.example.com/erddap",  # This will fail
    "https://gliders.ioos.us/erddap"
]

test_results = search_servers(
    "test",
    servers_list=test_servers
)
# Only includes results from working servers
```
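
Because failed servers are dropped silently, it can be useful to check which requested servers are missing from the results. The following is a minimal sketch, assuming the `Server url` column holds each server's base URL (possibly with a trailing slash); `unresponsive` and `_norm` are hypothetical helpers, and the responding URLs are hard-coded so the example runs offline.

```python
def _norm(url: str) -> str:
    """Normalize a server URL for comparison (no scheme, no trailing slash)."""
    return url.split("://", 1)[-1].rstrip("/").lower()


def unresponsive(requested: list[str], responded: list[str]) -> list[str]:
    """Hypothetical helper: requested servers absent from the result URLs."""
    seen = {_norm(u) for u in responded}
    return [u for u in requested if _norm(u) not in seen]


requested = [
    "http://erddap.secoora.org/erddap",
    "http://invalid-server.example.com/erddap",
    "https://gliders.ioos.us/erddap",
]
# In practice this would come from results["Server url"].unique();
# hard-coded here so the sketch does not need a network connection.
responded = ["http://erddap.secoora.org/erddap/", "https://gliders.ioos.us/erddap/"]

print(unresponsive(requested, responded))
# ['http://invalid-server.example.com/erddap']
```

A server that responded but had no matching datasets would also be absent from the results, so a missing server indicates either a failure or an empty match.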


## Integration with ERDDAP Client


Use search results to configure ERDDAP instances for data download:


```python
from erddapy.multiple_server_search import search_servers
from erddapy import ERDDAP

# Search for specific datasets
results = search_servers("glider ru29")

if len(results) > 0:
    # Use first result
    dataset = results.iloc[0]

    # Create ERDDAP instance for the server
    e = ERDDAP(
        server=dataset['Server url'],
        protocol="tabledap"
    )

    # Set the dataset ID
    e.dataset_id = dataset['Dataset ID']

    # Download the data
    df = e.to_pandas(response="csv")
    print(f"Downloaded {len(df)} records from {dataset['Title']}")
```