Tessl Tile for pypi/erddapy@2.2.0

# Multi-Server Search

Search multiple ERDDAP servers in a single call, with optional parallel processing. These functions let you discover datasets across the whole ERDDAP ecosystem instead of querying servers one by one.

**Note:** These functions must be imported directly from `erddapy.multiple_server_search`, as they are not included in the main package exports.

## Capabilities

### Simple Multi-Server Search

Search multiple ERDDAP servers for datasets matching a query string using Google-like search syntax.

```python { .api }
def search_servers(
    query: str,
    *,
    servers_list: list[str] = None,
    parallel: bool = False,
    protocol: str = "tabledap"
) -> DataFrame:
    """
    Search all servers for a query string.

    Parameters:
    - query: Search terms with Google-like syntax:
        * Words separated by spaces (searches separately)
        * "quoted phrases" for exact matches
        * -excludedWord to exclude terms
        * -"excluded phrase" to exclude phrases
        * Partial word matching (e.g., "spee" matches "speed")
    - servers_list: Optional list of server URLs. If None, searches all servers
    - parallel: If True, uses joblib for parallel processing
    - protocol: 'tabledap' or 'griddap'

    Returns:
    - pandas.DataFrame with columns: Title, Institution, Dataset ID, Server url
    """
```

**Usage Examples:**

```python
from erddapy.multiple_server_search import search_servers

# Basic search across all servers
results = search_servers("temperature salinity")
print(f"Found {len(results)} datasets")
print(results[['Title', 'Institution', 'Dataset ID']].head())

# Search for exact phrase
sst_data = search_servers('"sea surface temperature"')

# Exclude certain terms
ocean_not_air = search_servers('temperature -air -atmospheric')

# Search specific servers only
coastal_servers = [
    "http://erddap.secoora.org/erddap",
    "http://www.neracoos.org/erddap"
]
coastal_results = search_servers(
    "glider",
    servers_list=coastal_servers
)

# Parallel search for faster results
large_search = search_servers(
    "chlorophyll",
    parallel=True
)
```
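The query syntax described above composes mechanically, so it can be convenient to assemble the string from its parts. The helper below is only a minimal sketch: `build_query` is a hypothetical name, not part of erddapy, and it does nothing more than join terms using the rules listed in the `search_servers` docstring.

```python
def build_query(words=(), phrases=(), exclude_words=(), exclude_phrases=()):
    """Assemble a search string (hypothetical helper, not part of erddapy)."""
    parts = list(words)                              # plain words, searched separately
    parts += [f'"{p}"' for p in phrases]             # exact phrases in double quotes
    parts += [f"-{w}" for w in exclude_words]        # excluded words get a leading minus
    parts += [f'-"{p}"' for p in exclude_phrases]    # excluded phrases: minus plus quotes
    return " ".join(parts)


query = build_query(
    words=["temperature"],
    phrases=["sea surface"],
    exclude_words=["air"],
)
print(query)
```

The resulting string could then be passed as the `query` argument of `search_servers`.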

### Advanced Multi-Server Search

Advanced search with detailed constraint parameters for precise dataset discovery.

```python { .api }
def advanced_search_servers(
    servers_list: list[str] = None,
    *,
    parallel: bool = False,
    protocol: str = "tabledap",
    **kwargs
) -> DataFrame:
    """
    Advanced search across multiple ERDDAP servers with constraints.

    Parameters:
    - servers_list: Optional list of server URLs. If None, searches all servers
    - parallel: If True, uses joblib for parallel processing
    - protocol: 'tabledap' or 'griddap'
    - **kwargs: Search constraints including:
        * search_for: Query string (same as search_servers)
        * cdm_data_type, institution, ioos_category: Metadata filters
        * keywords, long_name, standard_name, variableName: Variable filters
        * minLon, maxLon, minLat, maxLat: Geographic bounds
        * minTime, maxTime: Temporal bounds
        * items_per_page, page: Pagination controls

    Returns:
    - pandas.DataFrame with matching datasets
    """
```

**Usage Examples:**

```python
from erddapy.multiple_server_search import advanced_search_servers

# Geographic and temporal constraints
gulf_data = advanced_search_servers(
    search_for="temperature",
    minLat=25.0,
    maxLat=31.0,
    minLon=-98.0,
    maxLon=-80.0,
    minTime="2020-01-01T00:00:00Z",
    maxTime="2020-12-31T23:59:59Z",
    parallel=True
)

# Filter by data type and institution
mooring_data = advanced_search_servers(
    cdm_data_type="TimeSeries",
    institution="NOAA",
    ioos_category="Temperature"
)

# Search by variable characteristics
salinity_vars = advanced_search_servers(
    standard_name="sea_water_salinity",
    protocol="tabledap"
)

# GridDAP satellite data
satellite_sst = advanced_search_servers(
    search_for="sea surface temperature satellite",
    protocol="griddap",
    cdm_data_type="Grid"
)
```
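Because the geographic and temporal constraints are plain keyword arguments, they can be prepared and validated once and then reused across searches. The sketch below assumes a bounding box that does not cross the antimeridian; `region_constraints` is a hypothetical helper, not part of erddapy, and it only builds a dictionary using the keyword names listed in the docstring above.

```python
def region_constraints(min_lon, max_lon, min_lat, max_lat,
                       min_time=None, max_time=None):
    """Build search keyword arguments (hypothetical helper, not part of erddapy)."""
    if not -90.0 <= min_lat <= max_lat <= 90.0:
        raise ValueError("latitudes must satisfy -90 <= min_lat <= max_lat <= 90")
    if min_lon > max_lon:
        # Assumption: the box does not cross the antimeridian.
        raise ValueError("min_lon must not exceed max_lon")
    constraints = {
        "minLon": min_lon,
        "maxLon": max_lon,
        "minLat": min_lat,
        "maxLat": max_lat,
    }
    # Only include time bounds that were actually given.
    if min_time is not None:
        constraints["minTime"] = min_time
    if max_time is not None:
        constraints["maxTime"] = max_time
    return constraints


gulf = region_constraints(-98.0, -80.0, 25.0, 31.0,
                          min_time="2020-01-01T00:00:00Z")
print(gulf)
```

The dictionary could then be unpacked into a call such as `advanced_search_servers(search_for="temperature", **gulf)`.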

### Result Processing Helper

Internal function for processing search results from individual servers.

```python { .api }
def fetch_results(
    url: str,
    key: str,
    protocol: str
) -> dict[str, DataFrame] | None:
    """
    Fetch search results from a single server.

    Parameters:
    - url: ERDDAP search URL
    - key: Server identifier key
    - protocol: 'tabledap' or 'griddap'

    Returns:
    - Dictionary with server key mapped to DataFrame, or None if server fails
    """
```

## Search Result Analysis

The search functions return pandas DataFrames with standardized columns for easy analysis:

```python
from erddapy.multiple_server_search import search_servers

# Perform search
results = search_servers("glider temperature", parallel=True)

# Analyze results
print("Search Results Summary:")
print(f"Total datasets found: {len(results)}")
print(f"Unique institutions: {results['Institution'].nunique()}")
print(f"Servers with data: {results['Server url'].nunique()}")

# Group by institution
by_institution = results.groupby('Institution').size().sort_values(ascending=False)
print("\nDatasets by Institution:")
print(by_institution.head(10))

# Find datasets hosted on a specific server
secoora_data = results[results['Server url'].str.contains('secoora')]
print(f"\nSECOORA datasets: {len(secoora_data)}")

# Export results
results.to_csv('erddap_search_results.csv', index=False)
```

## Parallel Processing Setup

For faster searches across many servers, install joblib and use parallel processing:

```bash
pip install joblib
```

```python
import time

from erddapy.multiple_server_search import search_servers

# Enable parallel processing
results = search_servers(
    "ocean color chlorophyll",
    parallel=True  # Uses all CPU cores
)

# Compare serial and parallel timings for the same query
start = time.perf_counter()
serial_results = search_servers("temperature", parallel=False)
serial_time = time.perf_counter() - start

start = time.perf_counter()
parallel_results = search_servers("temperature", parallel=True)
parallel_time = time.perf_counter() - start

print(f"Serial search: {serial_time:.2f} seconds")
print(f"Parallel search: {parallel_time:.2f} seconds")
print(f"Speedup: {serial_time/parallel_time:.1f}x")
```
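Since `parallel=True` relies on joblib, a script that may run in environments without it could check for the package first and fall back to a serial search. The following is a minimal sketch using only the standard library; `joblib_available` is a hypothetical helper name, and how erddapy itself behaves when joblib is missing is not covered here.

```python
import importlib.util


def joblib_available():
    """Return True if the joblib package can be found in this environment."""
    return importlib.util.find_spec("joblib") is not None


# Decide once, then reuse the flag for every search call.
use_parallel = joblib_available()
print(f"parallel={use_parallel}")
```

The flag could then be passed along, for example `search_servers("temperature", parallel=use_parallel)`.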

## Error Handling and Server Failures

The multi-server search functions skip servers that are offline or return errors, so a single failing server does not abort the whole search:

```python
from erddapy.multiple_server_search import search_servers

# Some servers may be offline or return errors
results = search_servers("salinity")

# Results automatically exclude failed servers
print(f"Collected results from available servers: {len(results)}")

# Check server availability by testing with a small search
test_servers = [
    "http://erddap.secoora.org/erddap",
    "http://invalid-server.example.com/erddap",  # This will fail
    "https://gliders.ioos.us/erddap"
]

test_results = search_servers(
    "test",
    servers_list=test_servers
)
# Only includes results from working servers
```
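The underlying pattern, where each per-server fetch returns `None` on failure and the caller keeps only the successful responses, can be illustrated without any network access. The sketch below is not erddapy's implementation: it swaps in a thread pool from the standard library instead of joblib, and `fake_fetch` and `search_all` are hypothetical stand-ins that simulate one unreachable server.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(server):
    """Stand-in for a per-server request; returns None on failure."""
    if "invalid" in server:
        return None  # simulate an unreachable server
    return {server: [f"dataset-from-{server}"]}


def search_all(servers, fetch=fake_fetch):
    """Query every server concurrently and keep only successful responses."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves the input order of the servers.
        responses = list(pool.map(fetch, servers))
    return [r for r in responses if r is not None]


servers = ["server-a", "invalid-server", "server-b"]
results = search_all(servers)
print(f"{len(results)} of {len(servers)} servers responded")
```

Comparing the number of responses with the number of servers queried is one simple way to notice that some servers were skipped.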

## Integration with ERDDAP Client

Use search results to configure ERDDAP instances for data download:

```python
from erddapy.multiple_server_search import search_servers
from erddapy import ERDDAP

# Search for specific datasets
results = search_servers("glider ru29")

if len(results) > 0:
    # Use the first result
    dataset = results.iloc[0]

    # Create an ERDDAP instance for the server
    e = ERDDAP(
        server=dataset['Server url'],
        protocol="tabledap"
    )

    # Set the dataset ID
    e.dataset_id = dataset['Dataset ID']

    # Download the data
    df = e.to_pandas(response="csv")
    print(f"Downloaded {len(df)} records from {dataset['Title']}")
```
