Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with PyTensor
npx @tessl/cli install tessl/pypi-pymc@5.25.00
# PyMC - Probabilistic Programming in Python

## Overview

PyMC is a Python library for probabilistic programming and Bayesian modeling. Users specify models in notation close to the underlying mathematics, and PyMC handles the computational details of Bayesian inference: automatic differentiation, sampling, and integration with the scientific Python ecosystem.

PyMC is built on PyTensor, which provides automatic differentiation and compilation of the model graph. It supports several inference methods, including Markov chain Monte Carlo (MCMC), variational inference, and approximate Bayesian computation.

## Package Information

- **Package Name**: pymc
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install pymc`
- **Dependencies**: PyTensor, NumPy, SciPy, ArviZ, Matplotlib
- **License**: Apache License 2.0

## Core Imports

```python { .api }
import pymc as pm

# Core modeling components
from pymc import Model, Deterministic, Potential

# Probability distributions
from pymc import Normal, Bernoulli, Beta, Gamma, Poisson
from pymc import MvNormal, Dirichlet, Categorical

# Sampling and inference
from pymc import sample, sample_prior_predictive, sample_posterior_predictive
from pymc import find_MAP, NUTS, Metropolis

# Variational inference
from pymc import ADVI, fit

# Gaussian processes
from pymc.gp import Marginal, Latent
from pymc.gp.cov import ExpQuad, Matern52

# Mathematical utilities
from pymc import logit, invlogit, logsumexp

# Data handling
from pymc import Data, Minibatch

# Model visualization and debugging
from pymc import model_to_graphviz, model_to_mermaid, model_to_networkx

# Exception handling
from pymc import SamplingError, IncorrectArgumentsError, ShapeError

# Log-probability functions
from pymc import logp, logcdf, icdf

# ODE integration (requires installation of sunode for performance)
from pymc import ode
```

## Basic Usage

### Simple Bayesian Linear Regression

```python { .api }
import pymc as pm
import numpy as np
import arviz as az

# Generate synthetic data
np.random.seed(42)
X = np.random.randn(100, 2)
true_beta = np.array([1.5, -2.0])
y = X @ true_beta + np.random.randn(100) * 0.5

# Define Bayesian model
with pm.Model() as linear_model:
    # Priors for regression coefficients
    beta = pm.Normal('beta', mu=0, sigma=1, shape=2)

    # Prior for noise standard deviation
    sigma = pm.HalfNormal('sigma', sigma=1)

    # Linear combination
    mu = pm.math.dot(X, beta)

    # Likelihood
    likelihood = pm.Normal('likelihood', mu=mu, sigma=sigma, observed=y)

    # Sample from posterior
    trace = pm.sample(2000, tune=1000, chains=4)

# Analyze results
print(az.summary(trace))
```

### Hierarchical Model Example

```python { .api }
import numpy as np
import pymc as pm

# Synthetic grouped data (placeholder values for illustration)
rng = np.random.default_rng(42)
n_groups = 5
group_idx = rng.integers(0, n_groups, size=200)
data = rng.normal(loc=group_idx, scale=1.0, size=200)

# Hierarchical model for grouped data
with pm.Model() as hierarchical_model:
    # Hyperpriors
    mu_alpha = pm.Normal('mu_alpha', mu=0, sigma=10)
    sigma_alpha = pm.HalfNormal('sigma_alpha', sigma=1)

    # Group-level parameters
    alpha = pm.Normal('alpha', mu=mu_alpha, sigma=sigma_alpha, shape=n_groups)

    # Individual-level likelihood
    y_obs = pm.Normal('y_obs', mu=alpha[group_idx], sigma=1, observed=data)

    # Sample
    trace = pm.sample()
```

## Architecture

PyMC's architecture consists of several key components:

### Core Components

1. **Model Context**: The `Model` class provides a context manager for defining probabilistic models
2. **Random Variables**: Distribution objects that represent uncertain quantities
3. **Deterministic Transformations**: Functions of random variables that don't add randomness
4. **Potential Terms**: Custom log-probability contributions to the model

### Distribution System

PyMC includes a comprehensive collection of probability distributions:

- **Continuous**: Normal, Beta, Gamma, Student-t, and 30+ others
- **Discrete**: Bernoulli, Poisson, Categorical, and more
- **Multivariate**: Multivariate Normal, Dirichlet, LKJ, and others
- **Time Series**: Random walks, autoregressive processes
- **Custom**: User-defined distributions via `CustomDist` and `DensityDist`

### Inference Engines

Multiple inference methods are supported:

- **MCMC**: NUTS, Metropolis, Slice sampling
- **Variational**: ADVI, Full-rank ADVI, SVGD
- **Sequential Monte Carlo**: SMC sampling
- **MAP Estimation**: Maximum a posteriori point estimates

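As a rough picture of what the simplest MCMC method in the list above does, here is a random-walk Metropolis sampler in plain NumPy. This is a minimal sketch, not PyMC's `Metropolis` step method: the function `metropolis`, the target density, and the step size are assumptions chosen for illustration, and the sketch has no tuning or multi-chain support.

```python
import numpy as np

def metropolis(logp, start, n_draws, step_size, rng):
    """Random-walk Metropolis for a scalar parameter."""
    draws = np.empty(n_draws)
    current = start
    current_logp = logp(current)
    for i in range(n_draws):
        proposal = current + step_size * rng.normal()
        proposal_logp = logp(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < proposal_logp - current_logp:
            current, current_logp = proposal, proposal_logp
        draws[i] = current
    return draws

def target_logp(x):
    """Unnormalized log-density of a Normal(3, 1) target."""
    return -0.5 * (x - 3.0) ** 2

rng = np.random.default_rng(0)
draws = metropolis(target_logp, start=0.0, n_draws=5000, step_size=1.0, rng=rng)
posterior_draws = draws[500:]  # discard warm-up draws
print(posterior_draws.mean(), posterior_draws.std())
```

Gradient-based samplers such as NUTS replace the blind random-walk proposal with trajectories guided by the gradient of the log-density, which is why they scale better to many parameters.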
## Capabilities

### Probability Distributions

PyMC provides 80+ probability distributions for modeling various types of data and uncertainty.

```python { .api }
import numpy as np
import pymc as pm

# Distributions must be created inside a model context
with pm.Model() as model:
    # Continuous distributions
    alpha = pm.Normal('alpha', mu=0, sigma=1)
    beta = pm.Beta('beta', alpha=2, beta=5)
    rate = pm.Gamma('rate', alpha=1, beta=1)

    # Discrete distributions
    success = pm.Bernoulli('success', p=0.7)
    counts = pm.Poisson('counts', mu=3.5)

    # Multivariate distributions
    mu_vec = pm.MvNormal('mu_vec', mu=np.zeros(3), cov=np.eye(3))
    probs = pm.Dirichlet('probs', a=np.ones(4))
```

**[Complete Distributions Reference](./distributions.md)**

### MCMC Sampling and Inference

Sampling algorithms with automatic tuning and diagnostics.

```python { .api }
# Assumes `model` is an existing pm.Model
with model:
    # Main sampling interface
    trace = pm.sample(
        draws=2000,          # Posterior draws per chain
        tune=1000,           # Tuning (warm-up) draws per chain
        chains=4,            # Number of chains
        cores=4,             # Number of chains to run in parallel
        step=None,           # Auto step method selection
        target_accept=0.8,   # Target acceptance rate
    )

    # Custom step methods
    step = pm.NUTS(target_accept=0.9)
    trace = pm.sample(step=step)

    # Predictive sampling
    prior_pred = pm.sample_prior_predictive(draws=1000)
    posterior_pred = pm.sample_posterior_predictive(trace)
```

**[Complete Sampling Reference](./sampling.md)**

### Gaussian Processes

Flexible framework for non-parametric Bayesian modeling.

```python { .api }
# Assumes X_obs has shape (n, 1) and y has shape (n,)
with pm.Model() as gp_model:
    # Length scale and variance
    ls = pm.Gamma('ls', alpha=2, beta=1)
    eta = pm.HalfNormal('eta', sigma=1)

    # Covariance function
    cov_func = eta**2 * pm.gp.cov.ExpQuad(1, ls)

    # GP implementation
    gp = pm.gp.Marginal(cov_func=cov_func)

    # Observation noise
    sigma = pm.HalfNormal('sigma', sigma=1)

    # Observe data
    y_obs = gp.marginal_likelihood('y_obs', X=X_obs, y=y, sigma=sigma)
```

**[Complete Gaussian Processes Reference](./gp.md)**

### Variational Inference

Fast approximate inference for large-scale models.

```python { .api }
# Automatic Differentiation Variational Inference
with model:
    # Mean-field approximation
    approx = pm.fit(method='advi', n=50000)

    # Full-rank approximation
    approx = pm.fit(method='fullrank_advi', n=50000)

    # Sample from approximation
    trace = approx.sample(2000)

    # Custom optimization (the optimizer is passed as `obj_optimizer`)
    advi = pm.ADVI()
    approx = pm.fit(n=50000, method=advi, obj_optimizer=pm.adam(learning_rate=0.01))
```

**[Complete Variational Inference Reference](./variational.md)**

### Model Building and Transformations

Core utilities for building and manipulating probabilistic models.

```python { .api }
# Assumes X_data and new_X_data are existing arrays
with pm.Model() as model:
    # Get current model context
    current_model = pm.modelcontext(None)

    # Data container that can be updated later
    X_new = pm.Data('X_new', X_data)

    # A probability parameter to transform
    p = pm.Beta('p', alpha=2, beta=2)

    # Deterministic transformations
    log_odds = pm.Deterministic('log_odds', pm.math.log(p / (1 - p)))

    # Custom potential terms (any scalar log-probability expression)
    custom_prior = pm.Potential('custom_prior', -0.5 * log_odds**2)

# Update data in model
with model:
    pm.set_data({'X_new': new_X_data})
```

**[Complete Model Reference](./model.md)**

### Statistical Diagnostics

Convergence diagnostics and model comparison via ArviZ integration.

```python { .api }
# Convergence diagnostics
rhat = pm.rhat(trace)
ess = pm.ess(trace)
mcse_stats = pm.mcse(trace)

# Pointwise log-likelihood (added to `trace`; needed for LOO and WAIC)
trace = pm.compute_log_likelihood(trace, model=model)

# Model comparison
loo_stats = pm.loo(trace)
waic_stats = pm.waic(trace)
```

**[Complete Statistics Reference](./stats.md)**

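To show what a convergence diagnostic measures, the following sketch computes the classic Gelman-Rubin R-hat by hand. Note the assumption: this is the original, non-split formula written for illustration, whereas ArviZ reports a rank-normalized split variant, so the numbers will not match exactly. The function name `gelman_rubin_rhat` and the synthetic chains are hypothetical.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for an array of shape (n_chains, n_draws)."""
    n_draws = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()            # W: mean within-chain variance
    between = n_draws * chains.mean(axis=1).var(ddof=1)   # B: between-chain variance
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                          # four chains, same distribution
stuck = mixed + np.array([0.0, 0.0, 0.0, 3.0])[:, None]     # one chain offset from the rest

print(gelman_rubin_rhat(mixed))
print(gelman_rubin_rhat(stuck))
```

Values close to 1 indicate that the chains agree; a chain exploring a different region inflates the between-chain variance and pushes R-hat well above 1.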
### Mathematical Utilities

Mathematical functions commonly needed when building models.

```python { .api }
# Link functions
log_odds = pm.logit(probability)
probability = pm.invlogit(log_odds)
z_score = pm.probit(probability)

# Numerical utilities
log_sum = pm.logsumexp(log_values)
stable_sum = pm.logaddexp(log_a, log_b)

# Matrix operations (n is the dimension of the resulting n x n matrix)
triangular_matrix = pm.expand_packed_triangular(n, packed_values)
```

**[Complete Math Reference](./math.md)**

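The link and log-sum-exp functions have simple definitions, written out below in plain NumPy for reference. These are illustrative re-implementations under the standard mathematical definitions, not the PyTensor-based functions that PyMC exposes.

```python
import numpy as np

def logit(p):
    """Log-odds: log(p / (1 - p))."""
    return np.log(p) - np.log1p(-p)

def invlogit(x):
    """Inverse of logit (the logistic sigmoid)."""
    return 1.0 / (1.0 + np.exp(-x))

def logsumexp(values):
    """log(sum(exp(values))), shifted by the maximum to avoid overflow."""
    values = np.asarray(values, dtype=float)
    shift = values.max()
    return shift + np.log(np.exp(values - shift).sum())

print(invlogit(logit(0.3)))          # round trip recovers the probability
print(logsumexp([1000.0, 1000.0]))   # naive exp(1000) would overflow
```

The shift by the maximum is what makes `logsumexp` safe for log-probabilities of large magnitude.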
### Data Handling and Backends

Data containers and trace storage systems.

```python { .api }
# Assumes `model` is an existing pm.Model
with model:
    # Data containers
    X_data = pm.Data('X_data', X_observed)
    y_data = pm.Data('y_data', y_observed)

    # Minibatch data for large datasets (one call keeps X and y rows aligned)
    mb_X, mb_y = pm.Minibatch(X_large, y_large, batch_size=128)

    # Trace backends
    trace = pm.sample(trace=pm.backends.NDArray())    # In-memory
    trace = pm.sample(trace=pm.backends.ZarrTrace())  # Zarr storage

# ArviZ conversion
idata = pm.to_inference_data(trace, model=model)
```

**[Complete Data Handling Reference](./data.md)**

### Ordinary Differential Equations

Solve systems of ODEs inside probabilistic models of dynamic systems.

```python { .api }
import numpy as np
import pymc as pm
from pymc import ode

# Define ODE system
def lotka_volterra(y, t, p):
    """Lotka-Volterra predator-prey equations."""
    return [p[0] * y[0] - p[1] * y[0] * y[1],
            p[2] * y[0] * y[1] - p[3] * y[1]]

# Create ODE solution in model context
with pm.Model() as ode_model:
    # Parameters
    alpha = pm.LogNormal('alpha', 0, 1)
    beta = pm.LogNormal('beta', 0, 1)
    gamma = pm.LogNormal('gamma', 0, 1)
    delta = pm.LogNormal('delta', 0, 1)

    # Initial conditions
    y0 = [1.0, 1.0]

    # ODE system definition
    ode_system = ode.DifferentialEquation(
        func=lotka_volterra,
        times=np.linspace(0, 10, 100),
        n_states=2,
        n_theta=4,
        t0=0
    )

    # ODE solution evaluated at `times` for the sampled parameters
    ode_solution = ode_system(y0=y0, theta=[alpha, beta, gamma, delta])
```

**[Complete ODE Reference](./ode.md)**

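To see what the ODE component computes for one fixed parameter vector, the sketch below integrates the same Lotka-Volterra right-hand side with a hand-written fourth-order Runge-Kutta loop in NumPy. This is only an illustration: the helper `rk4_solve` and the parameter values are assumptions, and PyMC's `DifferentialEquation` uses its own solver rather than this loop.

```python
import numpy as np

def lotka_volterra(y, t, p):
    """Lotka-Volterra right-hand side, returned as an array."""
    return np.array([p[0] * y[0] - p[1] * y[0] * y[1],
                     p[2] * y[0] * y[1] - p[3] * y[1]])

def rk4_solve(func, y0, times, p):
    """Integrate dy/dt = func(y, t, p) over `times` with classic RK4 steps."""
    states = np.empty((len(times), len(y0)))
    states[0] = y0
    for i in range(len(times) - 1):
        t, h, y = times[i], times[i + 1] - times[i], states[i]
        k1 = func(y, t, p)
        k2 = func(y + 0.5 * h * k1, t + 0.5 * h, p)
        k3 = func(y + 0.5 * h * k2, t + 0.5 * h, p)
        k4 = func(y + h * k3, t + h, p)
        states[i + 1] = y + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return states

times = np.linspace(0, 10, 100)
params = [1.0, 0.5, 0.5, 1.0]   # illustrative values for the four rate parameters
solution = rk4_solve(lotka_volterra, np.array([1.0, 1.0]), times, params)
print(solution.shape)
```

Inside a model, the same trajectory is recomputed for every parameter draw, which is why ODE-based models are comparatively expensive to sample.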
### Model Visualization and Debugging

Generate visual representations of model structure and relationships.

```python { .api }
# Generate model graphs
graphviz_graph = pm.model_to_graphviz(model)
mermaid_diagram = pm.model_to_mermaid(model)
networkx_graph = pm.model_to_networkx(model)

# Visualize model structure
graphviz_graph.render('model_structure', format='png')
```

### Exception Handling

Handle PyMC-specific errors and warnings during model building and sampling.

```python { .api }
# Common PyMC exceptions
try:
    trace = pm.sample(model=model)
except pm.SamplingError as e:
    print(f"Sampling failed: {e}")
except pm.ShapeError as e:
    print(f"Shape mismatch: {e}")
except pm.IncorrectArgumentsError as e:
    print(f"Invalid arguments: {e}")
```

## Model Workflows

PyMC supports the complete Bayesian modeling workflow:

1. **Model Specification**: Define priors, likelihood, and model structure
2. **Prior Predictive Checking**: Sample from priors to validate model setup
3. **Inference**: Fit model using MCMC, variational inference, or optimization
4. **Posterior Analysis**: Examine convergence diagnostics and parameter estimates
5. **Posterior Predictive Checking**: Validate model fit by comparing data simulated from the posterior with the observed data
6. **Model Comparison**: Compare different models using information criteria such as LOO and WAIC

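The first five workflow steps above can be walked through on a conjugate Beta-Binomial problem, where the posterior is available in closed form and no sampler is needed. This is a minimal NumPy sketch under stated assumptions: the prior, the data values, and all variable names are illustrative, the closed-form posterior stands in for the inference step that PyMC would perform with MCMC, and model comparison (step 6) is not covered.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Model specification: p ~ Beta(a, b), k ~ Binomial(n_trials, p).
a, b = 2.0, 2.0
n_trials, k_observed = 50, 36   # illustrative data

# 2. Prior predictive check: simulate outcomes using only the prior.
prior_p = rng.beta(a, b, size=4000)
prior_pred = rng.binomial(n_trials, prior_p)

# 3. Inference: the conjugate posterior is Beta(a + k, b + n - k).
post_a, post_b = a + k_observed, b + n_trials - k_observed
post_p = rng.beta(post_a, post_b, size=4000)

# 4. Posterior analysis: summarize the parameter estimate.
post_mean = post_p.mean()
interval = np.quantile(post_p, [0.03, 0.97])

# 5. Posterior predictive check: simulate replicated data and compare
#    it with the observed count.
post_pred = rng.binomial(n_trials, post_p)
ppc_pvalue = (post_pred >= k_observed).mean()

print(prior_pred.mean(), post_mean, interval, ppc_pvalue)
```

A posterior predictive p-value near 0 or 1 would signal that the model rarely reproduces data like the observation; values in between are unremarkable.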
PyMC integrates seamlessly with ArviZ for visualization and diagnostics, providing a complete toolkit for Bayesian analysis in Python.