# Data Processing and Analysis

Meteostat provides comprehensive data processing capabilities for time series analysis, including normalization, interpolation, aggregation, unit conversion, and data quality assessment. These methods are available on all time series classes (Hourly, Daily, Monthly).

## Capabilities

### Data Retrieval

Core methods for accessing and examining time series data.

```python { .api }
def fetch(self) -> pd.DataFrame:
    """
    Fetch the processed time series data as a pandas DataFrame.

    Returns:
        pandas.DataFrame with meteorological time series data
    """

def count(self) -> int:
    """
    Count the number of non-null observations in the time series.

    Returns:
        int, total count of non-null data points across all parameters
    """

def stations(self) -> pd.Index:
    """
    Get the station IDs associated with the time series.

    Returns:
        pandas.Index of station identifiers used in the time series
    """
```

### Data Quality Assessment

Evaluate data completeness and coverage across the time series.

```python { .api }
def coverage(self, parameter: str = None) -> float:
    """
    Calculate data coverage as the ratio of available to expected observations.

    Parameters:
    - parameter: str, optional - specific parameter to calculate coverage for.
      If None, returns overall coverage across all parameters.

    Returns:
        float, coverage ratio between 0.0 and 1.0 (or slightly above 1.0
        if model data is included)
    """
```

### Time Series Normalization

Ensure a complete time series with regular intervals and filled gaps.

```python { .api }
def normalize(self):
    """
    Normalize the time series to ensure regular time intervals.
    Fills missing time steps with NaN values for a complete series.

    Returns:
        Time series object with normalized temporal coverage
    """
```
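
The effect of `normalize()` can be sketched in plain pandas (a sketch of the behavior, not Meteostat's actual implementation): the data is reindexed against the full expected date range, so missing time steps appear as NaN rows.

```python
import pandas as pd

# Hourly series with one missing time step (02:00 is absent)
idx = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 03:00"])
df = pd.DataFrame({"temp": [1.0, 1.5, 2.5]}, index=idx)

# Reindex against the complete expected range; gaps become NaN rows
full_range = pd.date_range(df.index.min(), df.index.max(), freq="h")
normalized = df.reindex(full_range)

print(len(normalized))                       # 4 rows (00:00 through 03:00)
print(int(normalized["temp"].isna().sum()))  # 1 missing value, at 02:00
```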

### Missing Value Interpolation

Fill gaps in time series data by interpolating between neighboring observations.

```python { .api }
def interpolate(self, limit: int = 3):
    """
    Interpolate missing values in the time series.

    Parameters:
    - limit: int, maximum number of consecutive NaN values to interpolate
      (default: 3)

    Returns:
        Time series object with interpolated missing values
    """
```
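
The `limit` semantics mirror pandas' own interpolation limit, which can be illustrated standalone (a sketch of the gap-filling behavior, not Meteostat internals):

```python
import pandas as pd
import numpy as np

# A run of four consecutive missing values
s = pd.Series([1.0, np.nan, np.nan, np.nan, np.nan, 6.0])

# With limit=3, only the first three NaNs of the run are filled;
# the fourth stays missing
filled = s.interpolate(limit=3)
print(int(filled.isna().sum()))  # 1 remaining NaN
```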

### Temporal Aggregation

Aggregate time series data to a coarser temporal frequency.

```python { .api }
def aggregate(self, freq: str, spatial: bool = False):
    """
    Aggregate time series data to a different temporal frequency.

    Parameters:
    - freq: str, target frequency using pandas frequency strings
      ('D' for daily, 'W' for weekly, 'MS' for monthly,
      'AS' for annual - 'YS' in newer pandas versions)
    - spatial: bool, whether to perform spatial averaging across stations
      (default: False)

    Returns:
        Time series object with aggregated data at the target frequency
    """
```

### Unit Conversion

Convert meteorological parameters to different unit systems.

```python { .api }
def convert(self, units: dict):
    """
    Convert meteorological parameters to different units.

    Parameters:
    - units: dict, mapping of parameter names to conversion functions
      e.g., {'temp': units.fahrenheit, 'prcp': units.inches}

    Returns:
        Time series object with converted units
    """
```

### Cache Management

Manage the local data cache for improved performance.

```python { .api }
def clear_cache(self):
    """
    Clear cached data files associated with the time series.
    Useful for forcing fresh data downloads or freeing disk space.
    """
```
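
Meteostat caches downloaded files on disk. The cleanup pattern behind cache clearing can be sketched as a standalone helper (`clear_cache_dir` below is a hypothetical illustration, not the library's implementation — it deletes files older than `max_age` seconds from a given directory):

```python
import os
import tempfile
import time

def clear_cache_dir(cache_dir: str, max_age: int = 0) -> int:
    """Delete files older than max_age seconds; return the number removed."""
    removed = 0
    now = time.time()
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) >= max_age:
            os.remove(path)
            removed += 1
    return removed

# Usage: create a fake cached file, then clear the directory
cache = tempfile.mkdtemp()
open(os.path.join(cache, "stale.csv.gz"), "w").close()
print(clear_cache_dir(cache))  # 1
```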

## Usage Examples

### Basic Data Processing Workflow

```python
from datetime import datetime
from meteostat import Point, Daily

# Create daily time series
location = Point(52.5200, 13.4050)  # Berlin
start = datetime(2020, 1, 1)
end = datetime(2020, 12, 31)

data = Daily(location, start, end)

# Check data quality
print(f"Total observations: {data.count()}")
print(f"Overall coverage: {data.coverage():.1%}")
print(f"Precipitation coverage: {data.coverage('prcp'):.1%}")

# Fetch the data
daily_data = data.fetch()
print(f"Retrieved {len(daily_data)} daily records")
```

### Handling Missing Data

```python
from datetime import datetime
from meteostat import Point, Hourly

# Get hourly data that may have gaps
location = Point(41.8781, -87.6298)  # Chicago
start = datetime(2020, 1, 15)
end = datetime(2020, 1, 20)

data = Hourly(location, start, end)

# Check for missing values before processing
raw_data = data.fetch()
missing_before = raw_data.isnull().sum()
print("Missing values before interpolation:")
print(missing_before)

# Interpolate missing values (max 3 consecutive hours)
data = data.interpolate(limit=3)
interpolated_data = data.fetch()

missing_after = interpolated_data.isnull().sum()
print("Missing values after interpolation:")
print(missing_after)
```

### Temporal Aggregation Examples

```python
from datetime import datetime
from meteostat import Point, Hourly

# Start with hourly data
location = Point(40.7128, -74.0060)  # New York
start = datetime(2020, 6, 1)
end = datetime(2020, 8, 31)

hourly_data = Hourly(location, start, end)

# Aggregate to daily values
daily_agg = hourly_data.aggregate('D')
daily_data = daily_agg.fetch()
print(f"Aggregated to {len(daily_data)} daily records")

# Aggregate to weekly values
weekly_agg = hourly_data.aggregate('W')
weekly_data = weekly_agg.fetch()
print(f"Aggregated to {len(weekly_data)} weekly records")

# Aggregate to monthly values
monthly_agg = hourly_data.aggregate('MS')  # Month start
monthly_data = monthly_agg.fetch()
print(f"Aggregated to {len(monthly_data)} monthly records")
```

### Spatial Aggregation

```python
from datetime import datetime
from meteostat import Stations, Daily

# Get data from multiple stations in a region
stations = Stations().region('DE').nearby(52.5200, 13.4050, 100000).fetch(5)

# Create time series for multiple stations
start = datetime(2020, 1, 1)
end = datetime(2020, 12, 31)
data = Daily(stations, start, end)

# Regular aggregation (keeps the station dimension)
monthly_data = data.aggregate('MS')
station_monthly = monthly_data.fetch()
print(f"Monthly data with stations: {station_monthly.shape}")

# Spatial aggregation (averages across stations)
regional_monthly = data.aggregate('MS', spatial=True)
regional_data = regional_monthly.fetch()
print(f"Regional monthly averages: {regional_data.shape}")
```

### Unit Conversion Examples

```python
from datetime import datetime
from meteostat import Point, Daily, units

# Get daily data
location = Point(39.7392, -104.9903)  # Denver
start = datetime(2020, 1, 1)
end = datetime(2020, 12, 31)

data = Daily(location, start, end)

# Convert to Imperial units
imperial_data = data.convert({
    'tavg': units.fahrenheit,
    'tmin': units.fahrenheit,
    'tmax': units.fahrenheit,
    'prcp': units.inches
})

imperial_df = imperial_data.fetch()
print("Temperature in Fahrenheit, precipitation in inches:")
print(imperial_df[['tavg', 'tmin', 'tmax', 'prcp']].head())

# Convert to scientific units
scientific_data = data.convert({
    'tavg': units.kelvin,
    'tmin': units.kelvin,
    'tmax': units.kelvin,
    'wspd': units.ms  # m/s instead of km/h
})

scientific_df = scientific_data.fetch()
print("Temperature in Kelvin, wind speed in m/s:")
print(scientific_df[['tavg', 'wspd']].head())
```

### Custom Unit Conversions

```python
from datetime import datetime
from meteostat import Point, Daily

# Define custom conversion functions
def celsius_to_rankine(temp_c):
    """Convert Celsius to Rankine"""
    return (temp_c + 273.15) * 9 / 5

def mm_to_feet(mm):
    """Convert millimeters to feet"""
    return mm / 304.8

# Apply custom conversions
location = Point(25.7617, -80.1918)  # Miami
data = Daily(location, datetime(2020, 1, 1), datetime(2020, 3, 31))

converted_data = data.convert({
    'tavg': celsius_to_rankine,
    'prcp': mm_to_feet
})

custom_df = converted_data.fetch()
print("Custom unit conversions:")
print(custom_df[['tavg', 'prcp']].head())
```

## Aggregation Functions

Time series classes use appropriate aggregation functions when aggregating to coarser temporal resolutions:

```python { .api }
# Default aggregation functions for different parameters
aggregation_methods = {
    # Temperature - use mean values
    'temp': 'mean',
    'tavg': 'mean',
    'tmin': 'min',   # For daily aggregation: minimum of period
    'tmax': 'max',   # For daily aggregation: maximum of period
    'dwpt': 'mean',

    # Precipitation - sum over period
    'prcp': 'sum',
    'snow': 'max',   # Maximum snow depth

    # Wind - directional mean for direction, average for speed
    'wdir': 'degree_mean',  # Special circular mean
    'wspd': 'mean',
    'wpgt': 'max',   # Maximum gust

    # Pressure and other continuous variables
    'pres': 'mean',
    'rhum': 'mean',

    # Sunshine and condition codes
    'tsun': 'sum',   # Total sunshine duration
    'coco': 'max'    # Worst condition code
}
```
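
Wind direction cannot be averaged arithmetically: the mean of 350° and 10° should be 0°, not 180°. The idea behind the circular `degree_mean` can be sketched as follows (an illustration of the technique, not Meteostat's implementation):

```python
import math

def degree_mean(degrees):
    """Circular mean of compass directions, in degrees [0, 360)."""
    # Average the unit vectors, then convert back to an angle
    sin_sum = sum(math.sin(math.radians(d)) for d in degrees)
    cos_sum = sum(math.cos(math.radians(d)) for d in degrees)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360

print(round(degree_mean([350, 10])) % 360)  # 0 (not the arithmetic mean, 180)
print(round(degree_mean([90, 180])))        # 135
```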

## Data Quality Considerations

### Coverage Analysis
```python
# Assess per-parameter completeness
df = data.fetch()
coverage = {param: data.coverage(param) for param in df.columns}
high_quality = [p for p, c in coverage.items() if c > 0.8]  # >80% coverage
print(f"Parameters with good coverage: {high_quality}")
```

### Interpolation Limits
```python
# Conservative interpolation for critical applications
conservative_data = data.interpolate(limit=1)  # Only fill single gaps

# More aggressive gap-filling for visualization
visualization_data = data.interpolate(limit=6)  # Fill up to 6-hour gaps
```

### Temporal Consistency
```python
# Check for unrealistic temporal jumps in hourly data
df = data.fetch()
temp_diff = df['temp'].diff().abs()
outliers = temp_diff[temp_diff > 10]  # >10°C hourly change
print(f"Potential temperature outliers: {len(outliers)}")
```