
# Data Processing and Analysis

Meteostat provides comprehensive data processing capabilities for time series analysis, including normalization, interpolation, aggregation, unit conversion, and data quality assessment. These methods are available on all time series classes (Hourly, Daily, Monthly).

## Capabilities

### Data Retrieval

Core methods for accessing and examining time series data.

```python { .api }
def fetch(self) -> pd.DataFrame:
    """
    Fetch the processed time series data as a pandas DataFrame.

    Returns:
        pandas.DataFrame with meteorological time series data
    """

def count(self) -> int:
    """
    Count the number of non-null observations in the time series.

    Returns:
        int, total count of non-null data points across all parameters
    """

def stations(self) -> pd.Index:
    """
    Get the station IDs associated with the time series.

    Returns:
        pandas.Index of station identifiers used in the time series
    """
```

### Data Quality Assessment

Evaluate data completeness and coverage across the time series.

```python { .api }
def coverage(self, parameter: str = None) -> float:
    """
    Calculate data coverage as a ratio of available to expected observations.

    Parameters:
    - parameter: str, optional - specific parameter to calculate coverage for.
      If None, returns overall coverage across all parameters.

    Returns:
        float, coverage ratio between 0.0 and 1.0 (or slightly above 1.0 if model data is included)
    """
```
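The coverage ratio itself is simple to reason about: the number of non-null observations divided by the number of expected time steps. A minimal pandas sketch of that calculation (illustrative only, not Meteostat's internal code):

```python
import numpy as np
import pandas as pd

# Toy hourly series: 10 expected time steps, 2 missing observations
index = pd.date_range("2020-01-01", periods=10, freq="h")
temp = pd.Series(
    [1.0, 2.0, np.nan, 4.0, 5.0, np.nan, 7.0, 8.0, 9.0, 10.0],
    index=index,
)

# Coverage = available observations / expected observations
coverage = temp.notna().sum() / len(index)
print(coverage)  # 0.8
```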

### Time Series Normalization

Ensure a complete time series with regular intervals and filled gaps.

```python { .api }
def normalize(self):
    """
    Normalize the time series to ensure regular time intervals.
    Fills missing time steps with NaN values to produce a complete series.

    Returns:
        Time series object with normalized temporal coverage
    """
```
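Conceptually, normalization reindexes the data onto a complete date range at the series' native frequency, inserting NaN rows for any missing time steps. A rough pandas equivalent (a sketch, not the library's implementation):

```python
import pandas as pd

# A daily series where 2020-01-03 is missing entirely
raw = pd.DataFrame(
    {"tavg": [5.0, 6.0, 7.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"]),
)

# Reindex onto the full daily range; the missing day appears as a NaN row
full_index = pd.date_range(raw.index.min(), raw.index.max(), freq="D")
normalized = raw.reindex(full_index)
print(len(normalized))  # 4
```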

### Missing Value Interpolation

Fill gaps in time series data by interpolating between neighboring observations.

```python { .api }
def interpolate(self, limit: int = 3):
    """
    Interpolate missing values in the time series.

    Parameters:
    - limit: int, maximum number of consecutive NaN values to interpolate
      (default: 3)

    Returns:
        Time series object with interpolated missing values
    """
```
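The `limit` parameter mirrors pandas' interpolation semantics: at most `limit` consecutive NaNs are filled, so a longer gap is only partially filled from its leading edge. A self-contained sketch of that behavior:

```python
import numpy as np
import pandas as pd

# A gap of four consecutive missing values
s = pd.Series([1.0, np.nan, np.nan, np.nan, np.nan, 6.0])

# With limit=3, only the first three NaNs are filled; one remains
filled = s.interpolate(limit=3)
print(filled.isna().sum())  # 1
print(filled.iloc[1])       # 2.0
```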

### Temporal Aggregation

Aggregate time series data to different temporal frequencies.

```python { .api }
def aggregate(self, freq: str, spatial: bool = False):
    """
    Aggregate time series data to a different temporal frequency.

    Parameters:
    - freq: str, target frequency using pandas frequency strings
      ('D' for daily, 'W' for weekly, 'MS' for monthly, 'AS' for annual)
    - spatial: bool, whether to perform spatial averaging across stations
      (default: False)

    Returns:
        Time series object with aggregated data at the target frequency
    """
```
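Temporal aggregation maps onto pandas resampling: the values within each target period are collapsed with a per-parameter function. A self-contained sketch of what daily aggregation of hourly values amounts to (illustrative, not Meteostat's implementation):

```python
import pandas as pd

# 48 hourly temperature values spanning two days
hourly = pd.Series(
    range(48),
    index=pd.date_range("2020-06-01", periods=48, freq="h"),
    name="temp",
)

# Resampling to 'D' collapses each day's 24 values into one mean
daily = hourly.resample("D").mean()
print(len(daily))     # 2
print(daily.iloc[0])  # 11.5
```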

### Unit Conversion

Convert meteorological parameters to different unit systems.

```python { .api }
def convert(self, units: dict):
    """
    Convert meteorological parameters to different units.

    Parameters:
    - units: dict, mapping of parameter names to conversion functions,
      e.g., {'temp': units.fahrenheit, 'prcp': units.inches}

    Returns:
        Time series object with converted units
    """
```
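Each value in the `units` dict is simply a callable applied to that column, so any function mapping a value in the source unit to the target unit works. For instance, a Celsius-to-Fahrenheit function equivalent to what the built-in `units.fahrenheit` provides:

```python
def fahrenheit(temp_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32

print(fahrenheit(0))    # 32.0
print(fahrenheit(100))  # 212.0
```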

### Cache Management

Manage the local data cache for improved performance.

```python { .api }
def clear_cache(self):
    """
    Clear cached data files associated with the time series.
    Useful for forcing fresh data downloads or freeing disk space.
    """
```

## Usage Examples

### Basic Data Processing Workflow

```python
from datetime import datetime
from meteostat import Point, Daily

# Create daily time series
location = Point(52.5200, 13.4050)  # Berlin
start = datetime(2020, 1, 1)
end = datetime(2020, 12, 31)

data = Daily(location, start, end)

# Check data quality
print(f"Total observations: {data.count()}")
print(f"Overall data coverage: {data.coverage():.2%}")

# Fetch the data
daily_data = data.fetch()
print(f"Retrieved {len(daily_data)} daily records")
```

### Handling Missing Data

```python
from datetime import datetime
from meteostat import Point, Hourly

# Get hourly data that may have gaps
location = Point(41.8781, -87.6298)  # Chicago
start = datetime(2020, 1, 15)
end = datetime(2020, 1, 20)

data = Hourly(location, start, end)

# Check for missing values before processing
raw_data = data.fetch()
missing_before = raw_data.isnull().sum()
print("Missing values before interpolation:")
print(missing_before)

# Interpolate missing values (max 3 consecutive hours)
data = data.interpolate(limit=3)
interpolated_data = data.fetch()

missing_after = interpolated_data.isnull().sum()
print("Missing values after interpolation:")
print(missing_after)
```

### Temporal Aggregation Examples

```python
from datetime import datetime
from meteostat import Point, Hourly

# Start with hourly data
location = Point(40.7128, -74.0060)  # New York
start = datetime(2020, 6, 1)
end = datetime(2020, 8, 31)

hourly_data = Hourly(location, start, end)

# Aggregate to daily values
daily_agg = hourly_data.aggregate('D')
daily_data = daily_agg.fetch()
print(f"Aggregated to {len(daily_data)} daily records")

# Aggregate to weekly values
weekly_agg = hourly_data.aggregate('W')
weekly_data = weekly_agg.fetch()
print(f"Aggregated to {len(weekly_data)} weekly records")

# Aggregate to monthly values
monthly_agg = hourly_data.aggregate('MS')  # month start
monthly_data = monthly_agg.fetch()
print(f"Aggregated to {len(monthly_data)} monthly records")
```

### Spatial Aggregation

```python
from datetime import datetime
from meteostat import Stations, Daily

# Get data from multiple stations in a region
stations = Stations().region('DE').nearby(52.5200, 13.4050, 100000).fetch(5)

# Create time series for multiple stations
start = datetime(2020, 1, 1)
end = datetime(2020, 12, 31)
data = Daily(stations, start, end)

# Regular aggregation (keeps the station dimension)
monthly_data = data.aggregate('MS')
station_monthly = monthly_data.fetch()
print(f"Monthly data with stations: {station_monthly.shape}")

# Spatial aggregation (averages across stations)
regional_monthly = data.aggregate('MS', spatial=True)
regional_data = regional_monthly.fetch()
print(f"Regional monthly averages: {regional_data.shape}")
```

### Unit Conversion Examples

```python
from datetime import datetime
from meteostat import Point, Daily, units

# Get daily data
location = Point(39.7392, -104.9903)  # Denver
start = datetime(2020, 1, 1)
end = datetime(2020, 12, 31)

data = Daily(location, start, end)

# Convert to Imperial units
imperial_data = data.convert({
    'tavg': units.fahrenheit,
    'tmin': units.fahrenheit,
    'tmax': units.fahrenheit,
    'prcp': units.inches
})

imperial_df = imperial_data.fetch()
print("Temperature in Fahrenheit, precipitation in inches:")
print(imperial_df[['tavg', 'tmin', 'tmax', 'prcp']].head())

# Convert to scientific units
scientific_data = data.convert({
    'tavg': units.kelvin,
    'tmin': units.kelvin,
    'tmax': units.kelvin,
    'wspd': units.ms  # m/s instead of km/h
})

scientific_df = scientific_data.fetch()
print("Temperature in Kelvin, wind speed in m/s:")
print(scientific_df[['tavg', 'wspd']].head())
```

### Custom Unit Conversions

```python
from datetime import datetime

from meteostat import Point, Daily

# Define custom conversion functions
def celsius_to_rankine(temp_c):
    """Convert Celsius to Rankine"""
    return (temp_c + 273.15) * 9 / 5

def mm_to_feet(mm):
    """Convert millimeters to feet"""
    return mm / 304.8

# Apply custom conversions
location = Point(25.7617, -80.1918)  # Miami
data = Daily(location, datetime(2020, 1, 1), datetime(2020, 3, 31))

converted_data = data.convert({
    'tavg': celsius_to_rankine,
    'prcp': mm_to_feet
})

custom_df = converted_data.fetch()
print("Custom unit conversions:")
print(custom_df[['tavg', 'prcp']].head())
```

## Aggregation Functions

Time series classes use appropriate aggregation functions when aggregating to coarser temporal resolutions:

```python { .api }
# Default aggregation functions for different parameters
aggregation_methods = {
    # Temperature - use mean values
    'temp': 'mean',
    'tavg': 'mean',
    'tmin': 'min',   # for daily aggregation: minimum of the period
    'tmax': 'max',   # for daily aggregation: maximum of the period
    'dwpt': 'mean',

    # Precipitation - sum over the period
    'prcp': 'sum',
    'snow': 'max',   # maximum snow depth

    # Wind - circular mean for direction, average for speed
    'wdir': 'degree_mean',  # special circular mean
    'wspd': 'mean',
    'wpgt': 'max',   # maximum gust

    # Pressure and other continuous variables
    'pres': 'mean',
    'rhum': 'mean',

    # Sunshine and condition codes
    'tsun': 'sum',   # total sunshine duration
    'coco': 'max'    # worst condition code
}
```
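The `degree_mean` entry exists because wind directions cannot be averaged arithmetically: the mean of 350° and 10° should be 0°, not 180°. A circular mean can be sketched by averaging unit vectors (an illustration of the idea, not Meteostat's exact code):

```python
import numpy as np

def degree_mean(degrees):
    """Circular mean: average directions as unit vectors, then convert back."""
    rad = np.deg2rad(degrees)
    mean_angle = np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())
    return np.rad2deg(mean_angle) % 360

print(round(degree_mean([350, 10])) % 360)  # 0, not the naive 180
print(round(degree_mean([90, 180])))        # 135
```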

## Data Quality Considerations

### Coverage Analysis

```python
# Assess data completeness per parameter
parameters = ['tavg', 'tmin', 'tmax', 'prcp']
coverage_by_param = {p: data.coverage(p) for p in parameters}
high_quality = [p for p, c in coverage_by_param.items() if c > 0.8]  # >80% coverage
print(f"Parameters with good coverage: {high_quality}")
```

### Interpolation Limits

```python
# Conservative interpolation for critical applications
conservative_data = data.interpolate(limit=1)  # only fill single gaps

# More aggressive gap-filling for visualization
visualization_data = data.interpolate(limit=6)  # fill up to 6-hour gaps
```

### Temporal Consistency

```python
# Check for unrealistic temporal jumps in hourly data
df = data.fetch()
temp_diff = df['temp'].diff().abs()
outliers = temp_diff[temp_diff > 10]  # >10 °C change between consecutive hours
print(f"Potential temperature outliers: {len(outliers)}")
```