# Sample Datasets

Built-in sample datasets for testing, examples, and teaching. The module ships a satellite image and a table of spectral reflectance samples, so users can try spectral index computations and explore the library without downloading external data.

## Capabilities

### Dataset Loading Function

Opens a built-in sample dataset and returns it in the container suited to its data type: an `xarray.DataArray` for the image and a `pandas.DataFrame` for the tabular reflectance samples.

```python { .api }
def open(dataset: str) -> Any:
    """
    Opens a built-in sample dataset.

    Parameters:
    - dataset: Dataset name ("sentinel" or "spectral")

    Returns:
    Dataset in appropriate format:
    - "sentinel": xarray.DataArray with Sentinel-2 sample image (10m bands)
    - "spectral": pandas.DataFrame with Landsat 8 reflectance samples

    Raises:
    Exception: If dataset name is not valid
    """
```

**Usage Examples:**

```python
import spyndex.datasets

# Load Sentinel-2 sample dataset
sentinel_data = spyndex.datasets.open("sentinel")
print(type(sentinel_data))  # <class 'xarray.core.dataarray.DataArray'>
print(sentinel_data.shape)  # (4, 300, 300)

# Load spectral reflectance samples
spectral_data = spyndex.datasets.open("spectral")
print(type(spectral_data))  # <class 'pandas.core.frame.DataFrame'>
print(spectral_data.shape)  # (120, 9)
```

## Available Datasets

### Sentinel Dataset

A multi-band Sentinel-2 image containing the four 10-meter bands (blue, green, red, and near-infrared), suitable for vegetation analysis and for computing indices that combine several bands.

```python
import spyndex.datasets
import spyndex

# Load Sentinel-2 sample
sentinel = spyndex.datasets.open("sentinel")

# Explore dataset structure
print(sentinel)
# Output: <xarray.DataArray (band: 4, x: 300, y: 300)>
# Coordinates:
#   * band     (band) <U3 'B02' 'B03' 'B04' 'B08'
# Dimensions without coordinates: x, y

# tolist() converts the NumPy strings to plain Python strings for printing
print(f"Bands available: {sentinel.coords['band'].values.tolist()}")
# Output: ['B02', 'B03', 'B04', 'B08']

print(f"Spatial dimensions: {sentinel.sizes['x']} x {sentinel.sizes['y']}")
# Output: 300 x 300

# Compute spectral indices using Sentinel-2 data
ndvi = spyndex.computeIndex(
    "NDVI",
    params={
        "N": sentinel.sel(band="B08"),  # NIR band
        "R": sentinel.sel(band="B04")   # Red band
    }
)

print(f"NDVI result shape: {ndvi.shape}")  # (300, 300)
print(f"NDVI range: {float(ndvi.min()):.3f} to {float(ndvi.max()):.3f}")

# Compute multiple indices
indices = spyndex.computeIndex(
    ["NDVI", "GNDVI"],
    params={
        "N": sentinel.sel(band="B08"),  # NIR
        "R": sentinel.sel(band="B04"),  # Red
        "G": sentinel.sel(band="B03")   # Green
    }
)

print(f"Multiple indices shape: {indices.shape}")  # (2, 300, 300)
print(f"Index names: {indices.coords['index'].values.tolist()}")
```
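
For reference, NDVI and GNDVI are both normalized differences: the NIR band against the red band and against the green band, respectively. The sketch below evaluates the two formulas on plain floats. The function names are illustrative and not part of spyndex, and the reflectance values are made up.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), the form shared by NDVI and GNDVI."""
    total = a + b
    if total == 0:
        # The ratio is undefined for a zero denominator; signal it with NaN
        return float("nan")
    return (a - b) / total


def ndvi(nir: float, red: float) -> float:
    """NDVI = (N - R) / (N + R)."""
    return normalized_difference(nir, red)


def gndvi(nir: float, green: float) -> float:
    """GNDVI = (N - G) / (N + G)."""
    return normalized_difference(nir, green)


# Made-up reflectances for a vegetated pixel (not taken from the dataset)
nir, red, green = 0.45, 0.05, 0.09
print(f"NDVI:  {ndvi(nir, red):.3f}")     # 0.800
print(f"GNDVI: {gndvi(nir, green):.3f}")  # 0.667
```

Because both indices are ratios of sums and differences of the same bands, they do not change when all bands are multiplied by a common scale factor.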

### Spectral Dataset

Landsat 8 surface reflectance samples representing three different land cover types, ideal for exploring spectral signatures and testing classification-oriented indices.

```python
import spyndex.datasets
import spyndex

# Load spectral reflectance samples
spectral = spyndex.datasets.open("spectral")

# Explore dataset structure
print(spectral.dtypes)
# Output:
# SR_B1     float64   # Coastal Aerosol
# SR_B2     float64   # Blue
# SR_B3     float64   # Green
# SR_B4     float64   # Red
# SR_B5     float64   # NIR
# SR_B6     float64   # SWIR1
# SR_B7     float64   # SWIR2
# ST_B10    float64   # Thermal
# class      object   # Land cover class
# dtype: object

print(f"Dataset shape: {spectral.shape}")  # (120, 9)
print(f"Land cover classes: {spectral['class'].unique()}")
# Output: ['Water' 'Vegetation' 'Urban']

# Analyze spectral signatures by class
for land_class in spectral['class'].unique():
    class_data = spectral[spectral['class'] == land_class]
    print(f"\n{land_class} samples: {len(class_data)}")
    print(f"Average NIR reflectance: {class_data['SR_B5'].mean():.3f}")
    print(f"Average Red reflectance: {class_data['SR_B4'].mean():.3f}")

# Compute indices for all samples
ndvi_all = spyndex.computeIndex(
    "NDVI",
    params={
        "N": spectral["SR_B5"],  # NIR
        "R": spectral["SR_B4"]   # Red
    }
)

# Add NDVI to dataframe for analysis
spectral_with_ndvi = spectral.copy()
spectral_with_ndvi["NDVI"] = ndvi_all

# Analyze NDVI by land cover class
for land_class in spectral_with_ndvi['class'].unique():
    class_ndvi = spectral_with_ndvi[spectral_with_ndvi['class'] == land_class]['NDVI']
    print(f"{land_class} NDVI: {class_ndvi.mean():.3f} ± {class_ndvi.std():.3f}")
```
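
The per-class loop at the end of the example is a group-by summary. The sketch below shows the same logic on a few hand-written rows using only the standard library. The rows and values are invented for illustration and are not taken from the dataset, and `summarize_by_class` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean, stdev

# Invented (class, NDVI) pairs standing in for rows of the DataFrame
samples = [
    ("Water", -0.20), ("Water", -0.10),
    ("Vegetation", 0.70), ("Vegetation", 0.80),
    ("Urban", 0.10), ("Urban", 0.20),
]


def summarize_by_class(rows):
    """Group values by class label and return {label: (mean, sample std)}."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    # stdev() is the sample standard deviation, matching pandas' default .std()
    return {label: (mean(vals), stdev(vals)) for label, vals in groups.items()}


summary = summarize_by_class(samples)
for label, (avg, std) in summary.items():
    print(f"{label} NDVI: {avg:.3f} ± {std:.3f}")
```

On the real DataFrame, `spectral_with_ndvi.groupby("class")["NDVI"].agg(["mean", "std"])` produces the equivalent table in a single call.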

## Dataset Integration Examples

### Complete Workflow Examples

The function below computes a list of indices on either sample dataset. It plots one map per index for the image and prints per-class statistics for the tabular samples:

```python
import spyndex.datasets
import spyndex
import matplotlib.pyplot as plt

def analyze_dataset_indices(dataset_name, indices_list):
    """Analyze multiple spectral indices on a sample dataset."""

    # Constants needed by some indices (EVI requires g, C1, C2 and L).
    # They assume reflectance in [0, 1]; rescale the bands first if the
    # data are stored as scaled integers.
    constants = {
        "g": spyndex.constants.g.default,
        "C1": spyndex.constants.C1.default,
        "C2": spyndex.constants.C2.default,
        "L": spyndex.constants.L.default,
    }

    if dataset_name == "sentinel":
        data = spyndex.datasets.open("sentinel")

        # Compute indices on spatial data
        results = spyndex.computeIndex(
            indices_list,
            params={
                "N": data.sel(band="B08"),  # NIR
                "R": data.sel(band="B04"),  # Red
                "G": data.sel(band="B03"),  # Green
                "B": data.sel(band="B02"),  # Blue
                **constants,
            }
        )

        # Visualize results (the 2 x 2 grid shows at most four indices)
        fig, axes = plt.subplots(2, 2, figsize=(12, 10))
        axes = axes.flatten()

        for i, idx_name in enumerate(indices_list):
            if i < len(axes):
                im = axes[i].imshow(results.sel(index=idx_name), cmap='RdYlGn')
                axes[i].set_title(f"{idx_name}")
                axes[i].axis('off')
                plt.colorbar(im, ax=axes[i], shrink=0.8)

        plt.tight_layout()
        plt.show()

    elif dataset_name == "spectral":
        data = spyndex.datasets.open("spectral")

        # Bands available in the table (NBR needs SWIR2, passed as "S2")
        params = {
            "N": data["SR_B5"],
            "R": data["SR_B4"],
            "G": data["SR_B3"],
            "B": data["SR_B2"],
            "S1": data["SR_B6"],
            "S2": data["SR_B7"],
            **constants,
        }

        # Compute indices on tabular data
        results = {}
        for idx_name in indices_list:
            try:
                results[idx_name] = spyndex.computeIndex(idx_name, params=params)
            except Exception as e:
                print(f"Could not compute {idx_name}: {e}")

        # Analyze by land cover class
        for land_class in data['class'].unique():
            class_mask = data['class'] == land_class
            print(f"\n{land_class} class:")

            for idx_name, values in results.items():
                class_values = values[class_mask]
                print(f"  {idx_name}: {class_values.mean():.3f} ± {class_values.std():.3f}")

# Example usage
analyze_dataset_indices("sentinel", ["NDVI", "GNDVI", "EVI", "CI"])
analyze_dataset_indices("spectral", ["NDVI", "NDWI", "NBR"])
```
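
Some indices take constants in addition to bands, which is why a call that only passes bands can fail for them. As an illustration, here is a minimal sketch of the EVI formula on plain floats. The default coefficients (g = 2.5, C1 = 6, C2 = 7.5, L = 1) are the commonly published EVI values and are stated here as an assumption; check them against `spyndex.constants` before relying on them. The function and argument names are illustrative.

```python
def evi(nir, red, blue, g=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """EVI = g * (N - R) / (N + C1 * R - C2 * B + L)."""
    return g * (nir - red) / (nir + c1 * red - c2 * blue + canopy_l)


# Made-up reflectances in [0, 1]
value = evi(nir=0.5, red=0.1, blue=0.05)
print(f"EVI: {value:.3f}")  # 0.580
```

Unlike a pure band ratio, the additive constant `L` makes EVI sensitive to the scale of its inputs, so the bands should be expressed as reflectance in [0, 1].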

### Machine Learning Integration

Using the spectral dataset to build a feature matrix from indices and raw bands, then training a land cover classifier:

```python
import spyndex.datasets
import spyndex
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def create_spectral_features():
    """Create feature matrix using spectral indices."""

    # Load dataset
    data = spyndex.datasets.open("spectral")

    # Define vegetation-related indices
    vegetation_indices = ["NDVI", "GNDVI", "SAVI", "EVI", "CI", "RDVI"]

    # Bands and constants shared by all index computations
    params = {
        "N": data["SR_B5"],  # NIR
        "R": data["SR_B4"],  # Red
        "G": data["SR_B3"],  # Green
        "B": data["SR_B2"],  # Blue
        "L": spyndex.constants.L.default,    # For SAVI and EVI
        "g": spyndex.constants.g.default,    # For EVI
        "C1": spyndex.constants.C1.default,  # For EVI
        "C2": spyndex.constants.C2.default,  # For EVI
    }

    # Compute all indices
    features = pd.DataFrame(index=data.index)

    for idx_name in vegetation_indices:
        try:
            features[idx_name] = spyndex.computeIndex(idx_name, params=params)
        except Exception as e:
            # Report the actual reason instead of silently swallowing it
            print(f"Skipping {idx_name}: {e}")

    # Add original bands as features
    band_features = ["SR_B2", "SR_B3", "SR_B4", "SR_B5", "SR_B6", "SR_B7"]
    for band in band_features:
        features[band] = data[band]

    # Target classes
    y = data["class"]

    return features, y

# Create feature matrix and train classifier
X, y = create_spectral_features()

print(f"Feature matrix shape: {X.shape}")
print(f"Features: {list(X.columns)}")
print(f"Classes: {y.unique()}")

# Train classifier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Evaluate
y_pred = rf.predict(X_test)
print("\nClassification Results:")
print(classification_report(y_test, y_pred))

# Feature importance
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': rf.feature_importances_
}).sort_values('importance', ascending=False)

print("\nTop 10 Most Important Features:")
print(feature_importance.head(10))
```

## Dataset Specifications

### Sentinel Dataset Details
- **Source**: Sentinel-2 MSI Level-2A
- **Spatial Resolution**: 10 meters
- **Bands**: B02 (Blue), B03 (Green), B04 (Red), B08 (NIR)
- **Array Size**: 300 × 300 pixels
- **Data Type**: xarray.DataArray
- **Coordinate System**: `x` and `y` are pixel indices (dimensions without coordinate labels)
- **Value Range**: Surface reflectance (0-1 typically)
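
Optical products are often distributed as integers scaled by 10000 rather than as floats in [0, 1]. If the sample array turns out to hold such scaled values (an assumption to verify by inspecting `sentinel.min()` and `sentinel.max()`), a conversion along the lines of the hypothetical helper below would be needed before computing indices that include additive constants.

```python
def to_reflectance(value, scale=10000.0):
    """Convert a scaled value to reflectance, clipped to [0, 1].

    The scale factor of 10000 is an assumed default, not read from the data.
    """
    return min(max(value / scale, 0.0), 1.0)


print(to_reflectance(2500))   # 0.25
print(to_reflectance(12000))  # 1.0 (clipped)
print(to_reflectance(-50))    # 0.0 (clipped)
```

On an `xarray.DataArray`, the same operation can be written as `(sentinel / 10000).clip(0, 1)`.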

### Spectral Dataset Details
- **Source**: Landsat 8 OLI/TIRS Level-2
- **Samples**: 120 total (40 per land cover class)
- **Classes**: Water, Vegetation, Urban
- **Bands**: SR_B1 to SR_B7 (surface reflectance), ST_B10 (thermal)
- **Data Type**: pandas.DataFrame
- **Value Range**: Surface reflectance for the SR bands and surface temperature for ST_B10

## Error Handling

```python
import spyndex.datasets

# Invalid dataset name
try:
    invalid_data = spyndex.datasets.open("nonexistent")
except Exception as e:
    print(f"Error: {e}")
    # Output: Error: nonexistent is not a valid dataset. Please use one of ['sentinel','spectral']

# Valid dataset names only
valid_datasets = ["sentinel", "spectral"]
for dataset in valid_datasets:
    data = spyndex.datasets.open(dataset)
    print(f"Successfully loaded {dataset} dataset")
```
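
When dataset names come from user input or configuration, it can be cleaner to validate them up front than to catch a generic `Exception`. The sketch below uses a hypothetical helper, `check_dataset_name`, which is not part of spyndex. The list of valid names is taken from the error message above.

```python
VALID_DATASETS = ("sentinel", "spectral")


def check_dataset_name(name: str) -> str:
    """Return the name unchanged if it is a known sample dataset, else raise ValueError."""
    if name not in VALID_DATASETS:
        raise ValueError(
            f"{name!r} is not a valid dataset. Use one of {list(VALID_DATASETS)}"
        )
    return name


for candidate in ["sentinel", "nonexistent"]:
    try:
        check_dataset_name(candidate)
        print(f"{candidate}: ok")
    except ValueError as err:
        print(f"{candidate}: {err}")
```

Raising `ValueError` here is a design choice for the helper only: it lets callers distinguish a bad name from other failures.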

The sample datasets provide immediately usable data for testing spectral index computations, developing analysis workflows, and learning about remote sensing applications without requiring external data acquisition or preprocessing.