0
# MCMC Sampling Results
1
2
Container for Markov Chain Monte Carlo sampling results with comprehensive diagnostics and multiple data access formats. The CmdStanMCMC class provides access to posterior draws, diagnostics, and summary statistics from NUTS-HMC sampling.
3
4
## Capabilities
5
6
### Data Access Methods
7
8
Access posterior draws in multiple formats for integration with different analysis workflows.
9
10
```python { .api }
11
def draws(self, inc_warmup=False, concat_chains=False, vars=None):
12
"""
13
Get parameter draws as NumPy array.
14
15
Parameters:
16
- inc_warmup (bool): Include warmup draws
17
- concat_chains (bool): Concatenate chains into single array
18
- vars (list of str, optional): Specific variables to include
19
20
Returns:
21
np.ndarray: Draws array with shape (draws, chains, parameters) or (total_draws, parameters) if concat_chains=True
22
"""
23
24
def draws_pd(self, vars=None, inc_warmup=False):
25
"""
26
Get parameter draws as pandas DataFrame.
27
28
Parameters:
29
- vars (list of str, optional): Specific variables to include
30
- inc_warmup (bool): Include warmup draws
31
32
Returns:
33
pd.DataFrame: Draws with parameter names as columns
34
"""
35
36
def draws_xr(self, vars=None, inc_warmup=False):
37
"""
38
Get parameter draws as xarray Dataset.
39
40
Parameters:
41
- vars (list of str, optional): Specific variables to include
42
- inc_warmup (bool): Include warmup draws
43
44
Returns:
45
xr.Dataset: Draws with coordinate labels and metadata
46
"""
47
```
48
49
**Usage Examples:**
50
51
```python
52
# Get all draws as NumPy array
53
draws_array = fit.draws() # Shape: (1000, 4, 15) for 1000 draws, 4 chains, 15 parameters
54
55
# Get specific variables as DataFrame
56
theta_phi_df = fit.draws_pd(vars=["theta", "phi"])
57
58
# Get xarray Dataset with metadata
59
draws_xr = fit.draws_xr()
60
print(draws_xr.coords) # Shows parameter names, chain IDs, draw numbers
61
```
62
63
### Variable Access
64
65
Access individual Stan variables by name with automatic array reshaping.
66
67
```python { .api }
68
def stan_variable(self, var, inc_warmup=False):
69
"""
70
Get draws for specific Stan variable.
71
72
Parameters:
73
- var (str): Variable name
74
- inc_warmup (bool): Include warmup draws
75
76
Returns:
77
np.ndarray: Draws for the variable with original Stan dimensions
78
"""
79
80
def stan_variables(self):
81
"""
82
Get all Stan variables as dictionary.
83
84
Returns:
85
dict: Mapping from variable names to draw arrays
86
"""
87
```
88
89
**Usage Examples:**
90
91
```python
92
# Access individual variables
93
theta = fit.stan_variable("theta") # Returns array with Stan dimensions
94
mu = fit.stan_variable("mu[1]") # Access specific array element
95
96
# Get all variables
97
all_vars = fit.stan_variables()
98
for name, draws in all_vars.items():
99
print(f"{name}: {draws.shape}")
100
```
101
102
### Diagnostic Variables
103
104
Access MCMC diagnostic information including divergences, tree depth, and energy statistics.
105
106
```python { .api }
107
def method_variables(self):
108
"""
109
Get sampling diagnostic variables.
110
111
Returns:
112
dict: Mapping from diagnostic names to values across chains
113
"""
114
```
115
116
**Usage Examples:**
117
118
```python
119
diagnostics = fit.method_variables()
120
121
# Check for sampling issues
122
print("Divergences per chain:", diagnostics.get("divergent__"))
123
print("Max treedepth hits:", diagnostics.get("treedepth__"))
124
print("Energy statistics:", diagnostics.get("energy__"))
125
```
126
127
### Summary Statistics
128
129
Generate comprehensive summary statistics including posterior means, quantiles, and convergence diagnostics.
130
131
```python { .api }
132
def summary(self, percentiles=None, sig_figs=None):
133
"""
134
Compute summary statistics for all parameters.
135
136
Parameters:
137
- percentiles (list of float, optional): Percentiles to compute (default: [5, 50, 95])
138
- sig_figs (int, optional): Significant figures for output
139
140
Returns:
141
pd.DataFrame: Summary statistics with R-hat, ESS, and quantiles
142
"""
143
```
144
145
**Usage Examples:**
146
147
```python
148
# Default summary with 5%, 50%, 95% quantiles
149
summary = fit.summary()
150
print(summary)
151
152
# Custom quantiles
153
summary_custom = fit.summary(percentiles=[2.5, 25, 50, 75, 97.5])
154
155
# High precision output
156
summary_precise = fit.summary(sig_figs=10)
157
```
158
159
### Convergence Diagnostics
160
161
Run comprehensive convergence diagnostics using CmdStan's built-in diagnostic tools.
162
163
```python { .api }
164
def diagnose(self):
165
"""
166
Run CmdStan diagnostics on chains.
167
168
Returns:
169
str or None: Diagnostic output if issues found, None if chains look good
170
"""
171
```
172
173
**Usage Example:**
174
175
```python
176
# Check for convergence issues
177
diagnostic_output = fit.diagnose()
178
if diagnostic_output:
179
print("Diagnostic issues found:")
180
print(diagnostic_output)
181
else:
182
print("No issues detected")
183
```
184
185
## Properties
186
187
Access metadata and sampling configuration information.
188
189
```python { .api }
190
# Chain information
191
fit.chains # int: Number of chains
192
fit.chain_ids # List[int]: Chain identifiers
193
fit.num_draws_warmup # int: Warmup iterations per chain
194
fit.num_draws_sampling # int: Sampling iterations per chain
195
fit.thin # int: Thinning interval
196
197
# Parameter information
198
fit.column_names # Tuple[str, ...]: All output column names
199
fit.metadata # InferenceMetadata: Run configuration and timing
200
201
# Adaptation information
202
fit.metric_type # str or None: Mass matrix type ("diag_e", "dense_e", "unit_e")
203
fit.metric # np.ndarray or None: Mass matrix values per chain
204
fit.step_size # np.ndarray or None: Final step sizes per chain
205
206
# Diagnostic counts
207
fit.divergences # np.ndarray or None: Divergent transitions per chain
208
fit.max_treedepths # np.ndarray or None: Max treedepth hits per chain
209
```
210
211
**Usage Examples:**
212
213
```python
214
print(f"Ran {fit.chains} chains with {fit.num_draws_sampling} samples each")
215
print(f"Final step sizes: {fit.step_size}")
216
217
if fit.divergences is not None and fit.divergences.sum() > 0:
218
print(f"Warning: {fit.divergences.sum()} total divergent transitions")
219
```
220
221
## File Operations
222
223
Save and manage CSV output files for reproducibility and external analysis.
224
225
```python { .api }
226
def save_csvfiles(self, dir=None):
227
"""
228
Save CSV output files to directory.
229
230
Parameters:
231
- dir (str or PathLike, optional): Target directory (default: creates timestamped directory)
232
233
Returns:
234
None
235
"""
236
```
237
238
**Usage Example:**
239
240
```python
241
# Save to specific directory
242
fit.save_csvfiles(dir="./mcmc_results")
243
244
# Save to timestamped directory
245
fit.save_csvfiles() # Creates directory like "chain_outputs_20231201_143022"
246
```
247
248
## Advanced Usage Patterns
249
250
### Posterior Analysis Workflow
251
252
```python
253
# Basic convergence check
254
summary = fit.summary()
255
print("R-hat range:", summary["R_hat"].min(), "to", summary["R_hat"].max())
256
257
# Detailed diagnostics
258
diagnostics = fit.diagnose()
259
if diagnostics:
260
print("Sampling issues detected")
261
262
# Extract key parameters for analysis
263
theta = fit.stan_variable("theta")
264
posterior_mean = theta.mean(axis=(0, 1)) # Average across draws and chains
265
posterior_std = theta.std(axis=(0, 1))
266
267
print(f"Posterior mean: {posterior_mean}")
268
print(f"Posterior std: {posterior_std}")
269
```
270
271
### Working with Complex Parameters
272
273
```python
274
# For Stan model with parameters:
275
# parameters {
276
# matrix[N, K] beta;
277
# vector[J] alpha;
278
# real<lower=0> sigma;
279
# }
280
281
# Access full parameter arrays
282
beta = fit.stan_variable("beta") # Shape: (1000, 4, N, K)
283
alpha = fit.stan_variable("alpha") # Shape: (1000, 4, J)
284
sigma = fit.stan_variable("sigma") # Shape: (1000, 4)
285
286
# Work with specific elements
287
beta_1_2 = fit.stan_variable("beta[1,2]") # Specific matrix element
288
alpha_mean = alpha.mean(axis=(0, 1)) # Posterior means for each element
289
```
290
291
### Integration with Analysis Libraries
292
293
```python
294
import arviz as az
295
import matplotlib.pyplot as plt
296
297
# Convert to ArviZ InferenceData for advanced diagnostics
298
inference_data = az.from_cmdstanpy(fit)
299
300
# Diagnostic plots
301
az.plot_trace(inference_data, var_names=["theta", "sigma"])
302
plt.show()
303
304
# Posterior predictive analysis
305
az.plot_posterior(inference_data, var_names=["theta"])
306
plt.show()
307
308
# Effective sample size and R-hat
309
print(az.summary(inference_data, round_to=3))
310
```
311
312
### Memory Optimization
313
314
```python
315
# For large models, access specific variables to save memory
316
important_params = ["theta", "sigma"]
317
subset_draws = fit.draws_pd(vars=important_params)
318
319
# Clear full draws from memory if not needed
320
import gc
321
del fit._draws_array
322
gc.collect()
323
324
# Use xarray for efficient slicing of large datasets
325
draws_xr = fit.draws_xr()
326
chain_1_only = draws_xr.sel(chain=0) # Select specific chain
327
recent_draws = draws_xr.isel(draw=slice(-100, None)) # Last 100 draws
328
```
329
330
### Model Validation
331
332
```python
333
# Check sampling efficiency
334
summary = fit.summary()
335
low_ess_params = summary[summary["N_Eff"] < 100].index.tolist()
336
high_rhat_params = summary[summary["R_hat"] > 1.1].index.tolist()
337
338
if low_ess_params:
339
print(f"Low ESS parameters: {low_ess_params}")
340
if high_rhat_params:
341
print(f"High R-hat parameters: {high_rhat_params}")
342
343
# Energy diagnostics
344
method_vars = fit.method_variables()
345
energy = method_vars.get("energy__")
346
if energy is not None:
347
print(f"Energy statistics available for {energy.shape[1]} chains")
348
```