0
# Plot Data
1
2
Functions for generating data structures needed for diagnostic plots and visualization in forecast verification. These tools create plot-ready data for Murphy diagrams, Q-Q plots, ROC curves, and other verification visualizations.
3
4
## Capabilities
5
6
### Murphy Diagram Data
7
8
Murphy diagrams provide comprehensive visualization of forecast performance across all possible decision thresholds, showing the relationship between forecast value and expected score.
9
10
#### Murphy Score Calculation
11
12
Generates the core data for Murphy diagram plotting.
13
14
```python { .api }
15
def murphy_score(
16
fcst: XarrayLike,
17
obs: XarrayLike,
18
*,
19
reduce_dims: Optional[FlexibleDimensionTypes] = None,
20
preserve_dims: Optional[FlexibleDimensionTypes] = None,
21
weights: Optional[xr.DataArray] = None,
22
) -> XarrayLike:
23
"""
24
Calculate Murphy score for Murphy diagram generation.
25
26
Args:
27
fcst: Forecast values
28
obs: Observation values
29
reduce_dims: Dimensions to reduce
30
preserve_dims: Dimensions to preserve
31
weights: Optional weights
32
33
Returns:
34
Murphy scores across forecast value thresholds
35
36
Notes:
37
- Creates data for Murphy diagram plotting
38
- Shows expected score as function of forecast value
39
- Useful for understanding forecast value relationships
40
- Typically plotted as scatter or line plot
41
"""
42
```
43
44
#### Murphy Theta Values
45
46
Computes theta parameters for Murphy diagram construction.
47
48
```python { .api }
49
def murphy_thetas(
50
fcst: XarrayLike,
51
obs: XarrayLike,
52
*,
53
reduce_dims: Optional[FlexibleDimensionTypes] = None,
54
preserve_dims: Optional[FlexibleDimensionTypes] = None,
55
weights: Optional[xr.DataArray] = None,
56
) -> XarrayLike:
57
"""
58
Calculate Murphy theta values for Murphy diagrams.
59
60
Args:
61
fcst: Forecast values
62
obs: Observation values
63
reduce_dims: Dimensions to reduce
64
preserve_dims: Dimensions to preserve
65
weights: Optional weights
66
67
Returns:
68
Theta values for Murphy diagram axes
69
70
Notes:
71
- Provides theta parameters for Murphy diagram
72
- Used as x-axis values in Murphy plots
73
- Represents decision threshold parameters
74
"""
75
```
76
77
### Quantile-Quantile Plot Data
78
79
Q-Q plots compare the distributions of forecasts and observations by plotting their quantiles against each other.
80
81
```python { .api }
82
def qq(
83
fcst: XarrayLike,
84
obs: XarrayLike,
85
*,
86
quantiles: Optional[Sequence[float]] = None,
87
reduce_dims: Optional[FlexibleDimensionTypes] = None,
88
preserve_dims: Optional[FlexibleDimensionTypes] = None,
89
) -> xr.Dataset:
90
"""
91
Generate quantile-quantile plot data.
92
93
Args:
94
fcst: Forecast values
95
obs: Observation values
96
quantiles: Quantile levels to compute (default: 0.01 to 0.99 by 0.01)
97
reduce_dims: Dimensions to reduce
98
preserve_dims: Dimensions to preserve
99
100
Returns:
101
Dataset with forecast and observation quantiles
102
103
Output Variables:
104
- fcst_quantiles: Forecast quantile values
105
- obs_quantiles: Observation quantile values
106
- quantile_levels: Quantile probability levels
107
108
Notes:
109
- Perfect forecast would show fcst_quantiles = obs_quantiles
110
- Deviations indicate distributional biases
111
- Useful for identifying systematic forecast errors
112
- Default quantiles: [0.01, 0.02, ..., 0.99]
113
"""
114
```
115
116
### ROC Curve Data
117
118
ROC (Receiver Operating Characteristic) curves evaluate binary classification performance across all probability thresholds.
119
120
```python { .api }
121
def roc(
122
fcst: XarrayLike,
123
obs: XarrayLike,
124
*,
125
bin_edges: Union[str, Sequence[float]] = "continuous",
126
reduce_dims: Optional[FlexibleDimensionTypes] = None,
127
preserve_dims: Optional[FlexibleDimensionTypes] = None,
128
weights: Optional[xr.DataArray] = None,
129
) -> xr.Dataset:
130
"""
131
Generate ROC curve data for binary classification.
132
133
Args:
134
fcst: Probability forecast values [0, 1] or continuous values
135
obs: Binary observation values {0, 1}
136
bin_edges: Bin edges for probability discretization or "continuous"
137
reduce_dims: Dimensions to reduce
138
preserve_dims: Dimensions to preserve
139
weights: Optional weights
140
141
Returns:
142
Dataset with ROC curve coordinates and metrics
143
144
Output Variables:
145
- pod: Probability of Detection (True Positive Rate)
146
- pofd: Probability of False Detection (False Positive Rate)
147
- threshold: Probability thresholds used
148
- auc: Area Under Curve (if computed)
149
150
Notes:
151
- POD (y-axis) vs POFD (x-axis) creates ROC curve
152
- Perfect forecast: ROC curve passes through (0,1)
153
- No skill: ROC curve follows diagonal (0,0) to (1,1)
154
- Area Under Curve (AUC): scalar skill measure
155
- bin_edges="continuous" uses unique forecast values
156
"""
157
```
158
159
## Usage Patterns
160
161
### Basic Murphy Diagram
162
163
```python
164
from scores.plotdata import murphy_score, murphy_thetas
165
import matplotlib.pyplot as plt
166
import numpy as np
167
168
# Generate sample data
169
forecast = np.random.normal(10, 3, 1000)
170
observation = forecast + np.random.normal(0, 1, 1000)
171
172
# Calculate Murphy diagram data
173
murphy_scores = murphy_score(forecast, observation)
174
theta_values = murphy_thetas(forecast, observation)
175
176
# Create Murphy diagram plot
177
plt.figure(figsize=(10, 6))
178
plt.scatter(theta_values, murphy_scores, alpha=0.6)
179
plt.xlabel('Theta (Decision Threshold)')
180
plt.ylabel('Murphy Score')
181
plt.title('Murphy Diagram')
182
plt.grid(True)
183
plt.show()
184
```
185
186
### Quantile-Quantile Analysis
187
188
```python
189
from scores.plotdata import qq
190
import matplotlib.pyplot as plt
191
192
# Generate forecast and observation data with different distributions
193
forecast = np.random.gamma(2, 2, 1000) # Gamma distribution
194
observation = np.random.exponential(3, 1000) # Exponential distribution
195
196
# Calculate Q-Q plot data
197
qq_data = qq(forecast, observation)
198
199
# Extract quantile values
200
fcst_q = qq_data.fcst_quantiles
201
obs_q = qq_data.obs_quantiles
202
203
# Create Q-Q plot
204
plt.figure(figsize=(8, 8))
205
plt.scatter(fcst_q, obs_q, alpha=0.7)
206
plt.plot([0, max(fcst_q.max().values, obs_q.max().values)],
207
[0, max(fcst_q.max().values, obs_q.max().values)],
208
'r--', label='Perfect Forecast Line')
209
plt.xlabel('Forecast Quantiles')
210
plt.ylabel('Observation Quantiles')
211
plt.title('Quantile-Quantile Plot')
212
plt.legend()
213
plt.grid(True)
214
plt.show()
215
216
print(f"Q-Q data contains {len(qq_data.quantile_levels)} quantile levels")
217
```
218
219
### ROC Curve Analysis
220
221
```python
222
from scores.plotdata import roc
223
import matplotlib.pyplot as plt
224
225
# Create probability forecast and binary observations
226
prob_forecast = np.random.beta(2, 3, 1000) # Probability values [0,1]
227
# Generate observations with some skill
228
binary_obs = np.random.binomial(1, 0.3 + 0.4 * prob_forecast)
229
230
# Calculate ROC curve data
231
roc_data = roc(prob_forecast, binary_obs)
232
233
# Extract ROC coordinates
234
pod_values = roc_data.pod
235
pofd_values = roc_data.pofd
236
auc_value = roc_data.auc if 'auc' in roc_data else None
237
238
# Create ROC curve plot
239
plt.figure(figsize=(8, 8))
240
plt.plot(pofd_values, pod_values, 'b-', linewidth=2, label='ROC Curve')
241
plt.plot([0, 1], [0, 1], 'r--', label='No Skill Line')
242
plt.xlabel('Probability of False Detection (False Positive Rate)')
243
plt.ylabel('Probability of Detection (True Positive Rate)')
244
plt.title(f'ROC Curve (AUC = {auc_value.values:.3f})' if auc_value else 'ROC Curve')
245
plt.legend()
246
plt.grid(True)
247
plt.axis('equal')
248
plt.xlim([0, 1])
249
plt.ylim([0, 1])
250
plt.show()
251
```
252
253
### Custom Quantile Levels
254
255
```python
256
# Q-Q plot with custom quantile levels
257
custom_quantiles = [0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95]
258
259
qq_custom = qq(forecast, observation, quantiles=custom_quantiles)
260
261
print("Custom quantile analysis:")
262
for i, q_level in enumerate(custom_quantiles):
263
fcst_val = qq_custom.fcst_quantiles.values[i]
264
obs_val = qq_custom.obs_quantiles.values[i]
265
print(f"Q{q_level:4.2f}: forecast={fcst_val:6.2f}, observation={obs_val:6.2f}")
266
```
267
268
### Multi-dimensional Plot Data
269
270
```python
271
# Generate plot data for multi-dimensional arrays
272
forecast_3d = xr.DataArray(
273
np.random.normal(10, 2, (100, 5, 8)), # time, lat, lon
274
dims=["time", "lat", "lon"]
275
)
276
observation_3d = xr.DataArray(
277
forecast_3d.values + np.random.normal(0, 1, (100, 5, 8)),
278
dims=["time", "lat", "lon"]
279
)
280
281
# Q-Q data for each spatial point (reduce time dimension)
282
qq_spatial = qq(forecast_3d, observation_3d, reduce_dims="time")
283
print(f"Spatial Q-Q data shape: {qq_spatial.fcst_quantiles.shape}")
284
285
# Q-Q data for each time step (reduce spatial dimensions)
286
qq_temporal = qq(forecast_3d, observation_3d, reduce_dims=["lat", "lon"])
287
print(f"Temporal Q-Q data shape: {qq_temporal.fcst_quantiles.shape}")
288
289
# Overall Q-Q data (reduce all dimensions)
290
qq_overall = qq(forecast_3d, observation_3d)
291
print(f"Overall Q-Q data points: {len(qq_overall.quantile_levels)}")
292
```
293
294
### ROC Analysis with Binning
295
296
```python
297
# ROC curve with specified probability bins
298
prob_bins = np.linspace(0, 1, 21) # 20 probability bins
299
300
roc_binned = roc(prob_forecast, binary_obs, bin_edges=prob_bins)
301
302
print(f"ROC analysis with {len(prob_bins)-1} probability bins")
303
print(f"Number of ROC points: {len(roc_binned.threshold)}")
304
305
# Plot binned ROC curve
306
plt.figure(figsize=(8, 8))
307
plt.plot(roc_binned.pofd, roc_binned.pod, 'bo-', markersize=4, label='Binned ROC')
308
plt.plot([0, 1], [0, 1], 'r--', label='No Skill')
309
plt.xlabel('POFD')
310
plt.ylabel('POD')
311
plt.title('ROC Curve (Binned Probabilities)')
312
plt.legend()
313
plt.grid(True)
314
plt.show()
315
316
# Show threshold values
317
for i in range(0, len(roc_binned.threshold), 5): # Every 5th threshold
318
thresh = roc_binned.threshold.values[i]
319
pod = roc_binned.pod.values[i]
320
pofd = roc_binned.pofd.values[i]
321
print(f"Threshold {thresh:.2f}: POD={pod:.3f}, POFD={pofd:.3f}")
322
```
323
324
### Comprehensive Visualization Workflow
325
326
```python
327
from scores.plotdata import murphy_score, murphy_thetas, qq, roc
328
import matplotlib.pyplot as plt
329
330
# Sample forecast system evaluation
331
forecast_prob = np.random.beta(3, 2, 2000) # Probability forecasts
332
observations = np.random.binomial(1, forecast_prob * 0.8) # Binary outcomes with some skill
333
334
# Generate all plot data
335
murphy_data = murphy_score(forecast_prob, observations)
336
theta_data = murphy_thetas(forecast_prob, observations)
337
qq_data = qq(forecast_prob, observations)
338
roc_data = roc(forecast_prob, observations)
339
340
# Create comprehensive verification plot
341
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
342
343
# Murphy diagram
344
ax1.scatter(theta_data, murphy_data, alpha=0.6)
345
ax1.set_xlabel('Theta')
346
ax1.set_ylabel('Murphy Score')
347
ax1.set_title('Murphy Diagram')
348
ax1.grid(True)
349
350
# Q-Q plot
351
ax2.scatter(qq_data.fcst_quantiles, qq_data.obs_quantiles, alpha=0.7)
352
ax2.plot([0, 1], [0, 1], 'r--', label='Perfect Forecast')
353
ax2.set_xlabel('Forecast Quantiles')
354
ax2.set_ylabel('Observation Quantiles')
355
ax2.set_title('Q-Q Plot')
356
ax2.legend()
357
ax2.grid(True)
358
359
# ROC curve
360
ax3.plot(roc_data.pofd, roc_data.pod, 'b-', linewidth=2)
361
ax3.plot([0, 1], [0, 1], 'r--', alpha=0.7)
362
ax3.set_xlabel('POFD')
363
ax3.set_ylabel('POD')
364
ax3.set_title('ROC Curve')
365
ax3.grid(True)
366
367
# Distribution comparison
368
ax4.hist(forecast_prob, bins=20, alpha=0.6, label='Forecast', density=True)
369
ax4.hist(observations, bins=20, alpha=0.6, label='Observation', density=True)
370
ax4.set_xlabel('Value')
371
ax4.set_ylabel('Density')
372
ax4.set_title('Distribution Comparison')
373
ax4.legend()
374
ax4.grid(True)
375
376
plt.tight_layout()
377
plt.show()
378
```
379
380
### Weighted Plot Data
381
382
```python
383
from scores.functions import create_latitude_weights
384
385
# Create spatially-weighted plot data
386
lats = np.linspace(-60, 60, 25)
387
area_weights = create_latitude_weights(lats)
388
389
# Expand weights to match data dimensions
390
weights_3d = area_weights.broadcast_like(forecast_3d.isel(time=0, drop=True))
391
392
# Generate area-weighted plot data
393
qq_weighted = qq(forecast_3d, observation_3d,
394
reduce_dims=["time", "lat", "lon"])
395
396
roc_weighted = roc(forecast_3d, observation_3d.where(observation_3d > 5, 0).where(observation_3d <= 5, 1),
397
reduce_dims=["time", "lat", "lon"],
398
weights=weights_3d)
399
400
print("Area-weighted verification complete")
401
print(f"Weighted Q-Q quantiles: {len(qq_weighted.quantile_levels)}")
402
print(f"Weighted ROC points: {len(roc_weighted.threshold)}")
403
```