# ArviZ

Exploratory analysis of Bayesian models with comprehensive data manipulation, statistical diagnostics, and visualization capabilities. ArviZ provides a unified interface for working with Bayesian inference results from multiple probabilistic programming frameworks, enabling model checking, comparison, and interpretation through an extensive suite of statistical functions and publication-quality visualizations.

## Package Information

- **Package Name**: arviz
- **Language**: Python
- **Installation**: `pip install arviz`
- **Requires**: Python 3.10+

## Core Imports

```python
import arviz as az
```
## Basic Usage

```python
import arviz as az

# Load a built-in example dataset
idata = az.load_arviz_data("centered_eight")

# Create summary statistics
summary = az.summary(idata)
print(summary)

# Generate diagnostic plots
az.plot_trace(idata, var_names=["mu", "tau"])
az.plot_posterior(idata, var_names=["mu", "tau"])

# Compute diagnostics
rhat = az.rhat(idata)
ess = az.ess(idata)

# Model comparison requires InferenceData objects for several fitted models
model_dict = {"model1": idata1, "model2": idata2}  # idata1/idata2 are placeholders
comparison = az.compare(model_dict)
az.plot_compare(comparison)
```
## Architecture

ArviZ is built around the **InferenceData** structure, an xarray-based data container (serializable to NetCDF) that organizes Bayesian inference results into logical groups:

- **Posterior**: MCMC samples from the posterior distribution
- **Prior**: Samples from the prior distribution
- **Observed Data**: Observed/input data used in the model
- **Posterior Predictive**: Samples from the posterior predictive distribution
- **Sample Stats**: MCMC diagnostics and metadata (divergences, energy, etc.)
- **Log Likelihood**: Log likelihood evaluations for model comparison

This design enables seamless integration across probabilistic programming frameworks (Stan, PyMC, Pyro, NumPyro, JAX, etc.) while providing a consistent interface for analysis and visualization.
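
A minimal sketch of inspecting these groups on a built-in dataset (the exact variables shown depend on the dataset):

```python
import arviz as az

idata = az.load_arviz_data("centered_eight")

# List the groups present in this InferenceData object
print(idata.groups())

# Each group is an xarray.Dataset indexed by chain/draw plus model dimensions
print(idata.posterior)       # posterior samples (mu, tau, theta)
print(idata.sample_stats)    # divergences, energy, and other sampler metadata
print(idata.observed_data)   # the observed data used to fit the model
```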
## Capabilities

### Data Operations

Comprehensive data loading, conversion, and manipulation capabilities supporting multiple Bayesian frameworks and file formats. Create, transform, and manage InferenceData objects with built-in example datasets and extensive I/O operations.

```python { .api }
class InferenceData:
    """Main data container for Bayesian inference results."""

def load_arviz_data(dataset: str, data_home: str = None, **kwargs) -> InferenceData:
    """Load built-in example datasets."""

def convert_to_inference_data(obj, *, group: str = None, coords: dict = None, dims: dict = None, **kwargs) -> InferenceData:
    """Convert various objects to InferenceData format."""

def concat(*args, dim: str = None, copy: bool = True, inplace: bool = False, reset_dim: bool = True) -> InferenceData:
    """Concatenate multiple InferenceData objects."""
```

[Data Operations](./data-operations.md)
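
A hedged sketch of converting raw draws to InferenceData and concatenating two runs (arrays are assumed to have shape `(chain, draw, ...)`):

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(0)

# A dict of arrays with shape (chain, draw) becomes the posterior group
idata_a = az.convert_to_inference_data({"mu": rng.normal(size=(4, 500))})
idata_b = az.convert_to_inference_data({"mu": rng.normal(size=(4, 500))})

# Concatenate the two runs along the chain dimension
combined = az.concat(idata_a, idata_b, dim="chain")
print(combined.posterior.sizes)  # e.g. chain: 8, draw: 500
```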
### Statistical Analysis and Diagnostics

MCMC diagnostics, model comparison, and statistical functions for Bayesian analysis. Compute convergence diagnostics, effective sample sizes, and information criteria, and perform Bayesian model comparison using techniques such as PSIS-LOO.

```python { .api }
def summary(data: InferenceData, *, var_names: list = None, hdi_prob: float = 0.94, **kwargs) -> pd.DataFrame:
    """Create summary statistics table for inference data."""

def rhat(data: InferenceData, *, var_names: list = None, method: str = "rank") -> xr.Dataset:
    """Compute R-hat convergence diagnostic."""

def ess(data: InferenceData, *, var_names: list = None, method: str = "bulk", relative: bool = False) -> xr.Dataset:
    """Compute effective sample size."""

def compare(data_dict: dict, *, ic: str = "loo", method: str = "stacking", scale: str = "log") -> pd.DataFrame:
    """Compare models using information criteria."""

def loo(data: InferenceData, *, pointwise: bool = False, var_name: str = None, reff: float = 1.0) -> ELPDData:
    """Compute Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV)."""
```

[Statistical Analysis and Diagnostics](./statistical-analysis.md)
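
A minimal sketch of the diagnostics workflow on a built-in dataset (the `centered_eight` example ships with a log_likelihood group, so `loo` works out of the box):

```python
import arviz as az

idata = az.load_arviz_data("centered_eight")

# Tabular summary with means, HDIs, ESS, and R-hat
print(az.summary(idata, var_names=["mu", "tau"]))

# Convergence and sampling-efficiency diagnostics as xarray Datasets
print(az.rhat(idata))
print(az.ess(idata, method="bulk"))

# PSIS-LOO cross-validation for predictive accuracy
print(az.loo(idata, pointwise=True))
```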
### Visualization and Plotting

Comprehensive plotting functions for Bayesian analysis including diagnostic plots, distribution visualizations, model comparison plots, and posterior predictive checks. Support for multiple backends (Matplotlib, Bokeh, Plotly) with publication-quality output.

```python { .api }
def plot_trace(data: InferenceData, *, var_names: list = None, coords: dict = None, **kwargs):
    """Plot MCMC trace plots."""

def plot_posterior(data: InferenceData, *, var_names: list = None, coords: dict = None, **kwargs):
    """Plot posterior distributions."""

def plot_forest(data: InferenceData, *, var_names: list = None, coords: dict = None, **kwargs):
    """Plot forest plots for parameter estimates."""

def plot_compare(comp_df: pd.DataFrame, *, insample_dev: bool = True, plot_ic_diff: bool = True, **kwargs):
    """Plot model comparison results."""

def plot_ppc(data: InferenceData, *, var_names: list = None, coords: dict = None, **kwargs):
    """Plot posterior predictive checks."""
```

[Visualization and Plotting](./visualization-plotting.md)
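
A brief sketch of common plots with the default Matplotlib backend (the `centered_eight` dataset includes posterior_predictive and observed_data groups, which `plot_ppc` needs):

```python
import arviz as az
import matplotlib.pyplot as plt

idata = az.load_arviz_data("centered_eight")

# Trace plots for convergence checking
az.plot_trace(idata, var_names=["mu", "tau"])

# Forest plot of the per-school effects
az.plot_forest(idata, var_names=["theta"])

# Posterior predictive check against the observed data
az.plot_ppc(idata)

plt.show()
```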
### Framework Integrations

Convert inference results from various probabilistic programming frameworks to ArviZ's unified InferenceData format. Supports Stan (CmdStan, PyStan, CmdStanPy), PyMC, Pyro, NumPyro, JAX, emcee, and more.

```python { .api }
def from_pystan(fit, *, posterior_predictive: str = None, observed_data: dict = None, **kwargs) -> InferenceData:
    """Convert PyStan fit results to InferenceData."""

def from_cmdstanpy(fit, *, posterior_predictive: str = None, observed_data: dict = None, **kwargs) -> InferenceData:
    """Convert CmdStanPy fit results to InferenceData."""

def from_pyro(posterior, *, prior: dict = None, posterior_predictive: dict = None, **kwargs) -> InferenceData:
    """Convert Pyro MCMC results to InferenceData."""

def from_numpyro(posterior, *, prior: dict = None, posterior_predictive: dict = None, **kwargs) -> InferenceData:
    """Convert NumPyro MCMC results to InferenceData."""
```

[Framework Integrations](./framework-integrations.md)
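
A hedged sketch of converting a NumPyro run; it assumes `numpyro` and `jax` are installed, and the model and data below are illustrative only:

```python
import arviz as az
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(y=None):
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(5.0))
    numpyro.sample("y", dist.Normal(mu, sigma), obs=y)

y_obs = jnp.array([1.2, 0.4, -0.3, 2.1, 0.8])
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=500,
            num_chains=2, chain_method="sequential")
mcmc.run(random.PRNGKey(0), y=y_obs)

# Convert the fitted MCMC object; ArviZ picks up posterior samples and sample stats
idata = az.from_numpyro(mcmc)
print(az.summary(idata))
```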
### Configuration Management

Global and context-specific configuration management for ArviZ behavior, including plotting backends, statistical defaults, and data handling preferences.

```python { .api }
rcParams: RcParams
"""Global configuration parameters object."""

class rc_context:
    """Context manager for temporary configuration changes."""
    def __init__(self, rc: dict = None, fname: str = None): ...
```

[Configuration Management](./configuration-management.md)
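
A short sketch of global vs. temporary configuration (the rcParams keys shown are assumed to exist in the installed ArviZ version; check `az.rcParams.keys()` for the authoritative list):

```python
import arviz as az

# Inspect and change a global default
print(az.rcParams["plot.backend"])     # typically "matplotlib"
az.rcParams["plot.backend"] = "bokeh"  # switch the default plotting backend

# Apply settings only within a block
idata = az.load_arviz_data("centered_eight")
with az.rc_context(rc={"stats.hdi_prob": 0.9}):
    print(az.summary(idata))           # uses the 90% HDI inside the context
```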
### Performance and Utilities

Performance optimization utilities including Numba JIT compilation, Dask parallelization, and interactive backend management for Jupyter environments.

```python { .api }
class Numba:
    """Numba JIT compilation utilities."""
    numba_flag: bool
    @classmethod
    def enable_numba(cls): ...
    @classmethod
    def disable_numba(cls): ...

class Dask:
    """Dask parallel computation utilities."""
    dask_flag: bool
    @classmethod
    def enable_dask(cls, dask_kwargs: dict = None): ...
    @classmethod
    def disable_dask(cls): ...

class interactive_backend:
    """Context manager for interactive plotting backends."""
    def __init__(self, backend: str = ""): ...
```

[Performance and Utilities](./performance-utilities.md)
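
A hedged sketch of toggling the acceleration flags; Numba and Dask must be installed for the accelerated code paths to take effect, and the `dask_kwargs` values shown are assumptions passed through to xarray:

```python
import arviz as az

# Enable Numba-accelerated implementations of supported statistics
az.Numba.enable_numba()
print(az.Numba.numba_flag)

# Route supported computations through Dask
az.Dask.enable_dask(dask_kwargs={"dask": "parallelized", "output_dtypes": [float]})

idata = az.load_arviz_data("centered_eight")
print(az.ess(idata))

# Restore the defaults
az.Dask.disable_dask()
az.Numba.disable_numba()
```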
## Global Types

```python { .api }
class InferenceData:
    """
    Main data container for Bayesian inference results.

    An xarray-based data structure (serializable to NetCDF) that uses
    groups to organize posterior samples, prior samples, observed data,
    diagnostics, and metadata from Bayesian inference.
    """

class ELPDData:
    """
    Expected log pointwise predictive density data container.

    Contains information criteria results from loo() and waic(),
    including pointwise values, diagnostics, and summary statistics.
    """

CoordSpec = Dict[str, List[Any]]
"""Type alias for coordinate specifications in data conversion."""

DimSpec = Dict[str, List[str]]
"""Type alias for dimension specifications in data conversion."""
```
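
A small sketch showing how `CoordSpec`/`DimSpec`-style mappings are used when converting raw samples (the coordinate and dimension names here are illustrative):

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(0)
schools = ["A", "B", "C"]

# theta has shape (chain, draw, school); coords/dims label the trailing axis
idata = az.convert_to_inference_data(
    {"theta": rng.normal(size=(4, 500, len(schools)))},
    coords={"school": schools},       # CoordSpec: dimension name -> coordinate values
    dims={"theta": ["school"]},       # DimSpec: variable name -> dimension names
)
print(idata.posterior["theta"].dims)  # ('chain', 'draw', 'school')
```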