# Multi-Server Search

Search multiple ERDDAP servers at once, optionally in parallel. These functions let you discover datasets across the ERDDAP ecosystem instead of querying servers one by one.

**Note:** These functions must be imported directly from `erddapy.multiple_server_search`; they are not included in the main package exports.

## Capabilities

### Simple Multi-Server Search

Search multiple ERDDAP servers for datasets matching a query string using Google-like search syntax.

```python { .api }
def search_servers(
    query: str,
    *,
    servers_list: list[str] | None = None,
    parallel: bool = False,
    protocol: str = "tabledap"
) -> DataFrame:
    """
    Search all servers for a query string.

    Parameters:
    - query: Search terms with Google-like syntax:
      * Words separated by spaces (each word is matched separately)
      * "quoted phrases" for exact matches
      * -excludedWord to exclude terms
      * -"excluded phrase" to exclude phrases
      * Partial word matching (e.g., "spee" matches "speed")
    - servers_list: Optional list of server URLs. If None, searches all servers
    - parallel: If True, uses joblib for parallel processing
    - protocol: 'tabledap' or 'griddap'

    Returns:
    - pandas.DataFrame with columns: Title, Institution, Dataset ID, Server url
    """
```

**Usage Examples:**

```python
from erddapy.multiple_server_search import search_servers

# Basic search across all servers
results = search_servers("temperature salinity")
print(f"Found {len(results)} datasets")
print(results[['Title', 'Institution', 'Dataset ID']].head())

# Search for exact phrase
sst_data = search_servers('"sea surface temperature"')

# Exclude certain terms
ocean_not_air = search_servers('temperature -air -atmospheric')

# Search specific servers only
coastal_servers = [
    "http://erddap.secoora.org/erddap",
    "http://www.neracoos.org/erddap"
]
coastal_results = search_servers(
    "glider",
    servers_list=coastal_servers
)

# Parallel search for faster results
large_search = search_servers(
    "chlorophyll",
    parallel=True
)
```
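
The query syntax described above can also be assembled programmatically. The following is a minimal sketch, not part of erddapy: `build_query` is a hypothetical helper that only joins strings according to the rules listed in the `search_servers` docstring (plain words, quoted phrases, `-` exclusions).

```python
def build_query(words=(), phrases=(), exclude_words=(), exclude_phrases=()):
    """Assemble a search string (hypothetical helper, not an erddapy function)."""
    parts = list(words)
    # Exact phrases are wrapped in double quotes.
    parts += [f'"{phrase}"' for phrase in phrases]
    # Excluded words and phrases are prefixed with a minus sign.
    parts += [f"-{word}" for word in exclude_words]
    parts += [f'-"{phrase}"' for phrase in exclude_phrases]
    return " ".join(parts)


query = build_query(
    words=["temperature"],
    phrases=["sea surface"],
    exclude_words=["air", "atmospheric"],
)
print(query)  # temperature "sea surface" -air -atmospheric
```

The resulting string can be passed as the `query` argument of `search_servers`.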

### Advanced Multi-Server Search

Search with detailed constraint parameters (metadata, variable, geographic, and temporal filters) for precise dataset discovery.

```python { .api }
def advanced_search_servers(
    servers_list: list[str] | None = None,
    *,
    parallel: bool = False,
    protocol: str = "tabledap",
    **kwargs
) -> DataFrame:
    """
    Advanced search across multiple ERDDAP servers with constraints.

    Parameters:
    - servers_list: Optional list of server URLs. If None, searches all servers
    - parallel: If True, uses joblib for parallel processing
    - protocol: 'tabledap' or 'griddap'
    - **kwargs: Search constraints including:
      * search_for: Query string (same as search_servers)
      * cdm_data_type, institution, ioos_category: Metadata filters
      * keywords, long_name, standard_name, variableName: Variable filters
      * minLon, maxLon, minLat, maxLat: Geographic bounds
      * minTime, maxTime: Temporal bounds
      * items_per_page, page: Pagination controls

    Returns:
    - pandas.DataFrame with matching datasets
    """
```

**Usage Examples:**

```python
from erddapy.multiple_server_search import advanced_search_servers

# Geographic and temporal constraints
gulf_data = advanced_search_servers(
    search_for="temperature",
    minLat=25.0,
    maxLat=31.0,
    minLon=-98.0,
    maxLon=-80.0,
    minTime="2020-01-01T00:00:00Z",
    maxTime="2020-12-31T23:59:59Z",
    parallel=True
)

# Filter by data type and institution
mooring_data = advanced_search_servers(
    cdm_data_type="TimeSeries",
    institution="NOAA",
    ioos_category="Temperature"
)

# Search by variable characteristics
salinity_vars = advanced_search_servers(
    standard_name="sea_water_salinity",
    protocol="tabledap"
)

# GridDAP satellite data
satellite_sst = advanced_search_servers(
    search_for="sea surface temperature satellite",
    protocol="griddap",
    cdm_data_type="Grid"
)
```
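
Geographic bounds are easy to mistype, so it can help to validate them before searching. The sketch below is an illustration only: `bbox_constraints` is a hypothetical helper, and the keyword names it returns (`minLon`, `maxLon`, `minLat`, `maxLat`) are copied from the examples in this document, so check that they match the erddapy version you have installed.

```python
def bbox_constraints(min_lon, max_lon, min_lat, max_lat):
    """Return bounding-box keyword arguments after basic sanity checks.

    Hypothetical helper; the key names follow this document's examples.
    """
    if not -90 <= min_lat <= max_lat <= 90:
        raise ValueError("latitudes must satisfy -90 <= min_lat <= max_lat <= 90")
    if min_lon > max_lon:
        raise ValueError("min_lon must not exceed max_lon")
    return {
        "minLon": min_lon,
        "maxLon": max_lon,
        "minLat": min_lat,
        "maxLat": max_lat,
    }


gulf_of_mexico = bbox_constraints(-98.0, -80.0, 25.0, 31.0)
print(gulf_of_mexico)
```

The dictionary can then be unpacked into a call such as `advanced_search_servers(search_for="temperature", **gulf_of_mexico)`.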

### Result Processing Helper

Internal helper that fetches and processes the search results from a single server.

```python { .api }
def fetch_results(
    url: str,
    key: str,
    protocol: str
) -> dict[str, DataFrame] | None:
    """
    Fetch search results from a single server.

    Parameters:
    - url: ERDDAP search URL
    - key: Server identifier key
    - protocol: 'tabledap' or 'griddap'

    Returns:
    - Dictionary with server key mapped to DataFrame, or None if server fails
    """
```

## Search Result Analysis

The search functions return pandas DataFrames with standardized columns for easy analysis:

```python
from erddapy.multiple_server_search import search_servers
import pandas as pd

# Perform search
results = search_servers("glider temperature", parallel=True)

# Analyze results
print("Search Results Summary:")
print(f"Total datasets found: {len(results)}")
print(f"Unique institutions: {results['Institution'].nunique()}")
print(f"Servers with data: {results['Server url'].nunique()}")

# Group by institution
by_institution = results.groupby('Institution').size().sort_values(ascending=False)
print("\nDatasets by Institution:")
print(by_institution.head(10))

# Find datasets from specific regions
secoora_data = results[results['Server url'].str.contains('secoora')]
print(f"\nSECOORA datasets: {len(secoora_data)}")

# Export results
results.to_csv('erddap_search_results.csv', index=False)
```
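
Because results from several servers are combined into one DataFrame, the same dataset ID may appear more than once. The sketch below shows one possible way to spot and collapse such repeats. It runs offline on made-up rows that only mimic the column layout described above; the titles, IDs, and URLs are placeholders, not real search output.

```python
import pandas as pd

# Made-up rows that imitate the documented columns (not real search output).
sample = pd.DataFrame(
    {
        "Title": ["Glider A", "Glider A", "Buoy B"],
        "Institution": ["Inst 1", "Inst 1", "Inst 2"],
        "Dataset ID": ["glider_a", "glider_a", "buoy_b"],
        "Server url": [
            "https://one.example.com/erddap/",
            "https://two.example.com/erddap/",
            "https://one.example.com/erddap/",
        ],
    }
)

# Count how many distinct servers list each dataset ID.
servers_per_dataset = sample.groupby("Dataset ID")["Server url"].nunique()
print(servers_per_dataset[servers_per_dataset > 1])

# Keep one row per dataset ID (the first one encountered).
unique_datasets = sample.drop_duplicates(subset="Dataset ID")
print(len(unique_datasets))  # 2
```

Whether two rows with the same dataset ID really describe the same data depends on the servers involved, so treat this as a starting point for inspection rather than a guaranteed deduplication.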

## Parallel Processing Setup

For faster searches across many servers, install joblib and use parallel processing:

```bash
pip install joblib
```

```python
import time

from erddapy.multiple_server_search import search_servers

# Enable parallel processing
results = search_servers(
    "ocean color chlorophyll",
    parallel=True  # Uses all CPU cores
)

# Compare serial and parallel timings (perf_counter is meant for measuring intervals)
start = time.perf_counter()
serial_results = search_servers("temperature", parallel=False)
serial_time = time.perf_counter() - start

start = time.perf_counter()
parallel_results = search_servers("temperature", parallel=True)
parallel_time = time.perf_counter() - start

print(f"Serial search: {serial_time:.2f} seconds")
print(f"Parallel search: {parallel_time:.2f} seconds")
print(f"Speedup: {serial_time/parallel_time:.1f}x")
```

## Error Handling and Server Failures

The multi-server search functions tolerate individual server failures: a server that is offline or returns an error is skipped, and the results from the remaining servers are still returned:

```python
from erddapy.multiple_server_search import search_servers

# Some servers may be offline or return errors
results = search_servers("salinity")

# Results automatically exclude failed servers
print(f"Collected results from available servers: {len(results)}")

# Check server availability by testing with small search
test_servers = [
    "http://erddap.secoora.org/erddap",
    "http://invalid-server.example.com/erddap",  # This will fail
    "https://gliders.ioos.us/erddap"
]

test_results = search_servers(
    "test",
    servers_list=test_servers
)
# Only includes results from working servers
```
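
Since failed servers are dropped without an error, it can be useful to check which of the requested servers contributed no rows. The sketch below is one possible approach, not an erddapy feature: `servers_without_results` is a hypothetical helper, and it assumes the `Server url` values match the requested URLs apart from a trailing slash.

```python
def servers_without_results(requested, returned_urls):
    """List requested servers that contributed no rows (hypothetical helper).

    Trailing slashes are ignored when comparing URLs.
    """
    seen = {url.rstrip("/") for url in returned_urls}
    return [server for server in requested if server.rstrip("/") not in seen]


requested = [
    "http://erddap.secoora.org/erddap",
    "http://invalid-server.example.com/erddap",
]
# In practice this would be results["Server url"]; a plain list stands in here.
returned = ["http://erddap.secoora.org/erddap/"]

print(servers_without_results(requested, returned))
# ['http://invalid-server.example.com/erddap']
```

A server also lands in this list when it responded but had no matching datasets, so an entry here is a hint to investigate, not proof of an outage.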

## Integration with ERDDAP Client

Use search results to configure ERDDAP instances for data download:

```python
from erddapy.multiple_server_search import search_servers
from erddapy import ERDDAP

# Search for specific datasets
results = search_servers("glider ru29")

if len(results) > 0:
    # Use first result
    dataset = results.iloc[0]

    # Create ERDDAP instance for the server
    e = ERDDAP(
        server=dataset['Server url'],
        protocol="tabledap"
    )

    # Set the dataset ID
    e.dataset_id = dataset['Dataset ID']

    # Download the data
    df = e.to_pandas(response="csv")
    print(f"Downloaded {len(df)} records from {dataset['Title']}")
```