0
# PyMC Probability Distributions
1
2
PyMC provides a comprehensive collection of 80+ probability distributions for modeling various types of data and uncertainty. All distributions support automatic differentiation, efficient sampling, and seamless integration with PyMC's inference engines.
3
4
## Core Distribution Interface
5
6
All PyMC distributions inherit from a common base class and share consistent APIs:
7
8
```python { .api }
9
from pymc import Distribution, Continuous, Discrete
10
11
class Distribution:
12
"""
13
Base class for all PyMC probability distributions.
14
15
Parameters:
16
- name (str): Name of the random variable
17
- observed (array-like, optional): Observed data values
18
- dims (tuple, optional): Dimension names for the variable
19
- **kwargs: Distribution-specific parameters
20
"""
21
22
def __init__(self, name, observed=None, dims=None, **kwargs):
23
pass
24
25
def random(self, point=None, size=None):
26
"""Generate random samples from the distribution."""
27
pass
28
29
def logp(self, value):
30
"""Calculate log-probability density/mass."""
31
pass
32
```
33
34
## Continuous Distributions
35
36
### Normal Distribution
37
38
The Gaussian/normal distribution is fundamental for continuous modeling:
39
40
```python { .api }
41
from pymc import Normal
42
43
# Basic normal distribution
44
x = Normal('x', mu=0, sigma=1)
45
46
# With array parameters
47
means = Normal('means', mu=[0, 1, 2], sigma=[1, 0.5, 2])
48
49
# Multivariate case - use MvNormal instead
50
```
51
52
### Beta Distribution
53
54
Continuous distribution on the interval [0, 1], commonly used for probabilities:
55
56
```python { .api }
57
from pymc import Beta
58
59
# Beta distribution for probability parameter
60
p = Beta('p', alpha=2, beta=5)
61
62
# Parameterized by mean and concentration
63
prob = Beta('prob', mu=0.3, sigma=0.1) # Alternative parameterization
64
```
65
66
### Gamma Distribution
67
68
Continuous distribution for positive values, often used for rate parameters:
69
70
```python { .api }
71
from pymc import Gamma
72
73
# Rate parameter
74
rate = Gamma('rate', alpha=2, beta=1)
75
76
# Scale parameterization
77
scale_param = Gamma('scale_param', alpha=2, beta=1)
78
79
# Inverse-Gamma for variance parameters
80
variance = pm.InverseGamma('variance', alpha=1, beta=1)
81
```
82
83
### Student's t-Distribution
84
85
Heavy-tailed alternative to normal distribution:
86
87
```python { .api }
88
from pymc import StudentT
89
90
# Standard Student's t
91
t_var = StudentT('t_var', nu=3, mu=0, sigma=1)
92
93
# Half Student's t for positive values
94
positive_t = pm.HalfStudentT('positive_t', nu=3, sigma=1)
95
```
96
97
### Exponential and Related Distributions
98
99
```python { .api }
100
from pymc import Exponential, Laplace, LogNormal
101
102
# Exponential distribution
103
rate_param = Exponential('rate_param', lam=1.0)
104
105
# Laplace (double exponential) distribution
106
location = Laplace('location', mu=0, b=1)
107
108
# Log-normal distribution
109
log_scale = LogNormal('log_scale', mu=0, sigma=1)
110
```
111
112
### Uniform Distributions
113
114
```python { .api }
115
from pymc import Uniform, HalfFlat, Flat
116
117
# Uniform distribution on interval
118
uniform_var = Uniform('uniform_var', lower=0, upper=10)
119
120
# Flat priors (improper uniform distributions)
121
flat_prior = Flat('flat_prior')
122
half_flat_prior = HalfFlat('half_flat_prior') # For positive values
123
```
124
125
### Specialized Continuous Distributions
126
127
```python { .api }
128
# Cauchy distribution (heavy tails)
129
cauchy_var = pm.Cauchy('cauchy_var', alpha=0, beta=1)
130
131
# Chi-squared distribution
132
chi2_var = pm.ChiSquared('chi2_var', nu=3)
133
134
# Pareto distribution (power law)
135
pareto_var = pm.Pareto('pareto_var', alpha=1, m=1)
136
137
# Weibull distribution
138
weibull_var = pm.Weibull('weibull_var', alpha=1, beta=2)
139
140
# Von Mises (circular) distribution
141
circular_var = pm.VonMises('circular_var', mu=0, kappa=1)
142
143
# Skew-normal distribution
144
skewed_var = pm.SkewNormal('skewed_var', mu=0, sigma=1, alpha=3)
145
146
# Asymmetric Laplace distribution
147
asymmetric_laplace = pm.AsymmetricLaplace('asymmetric_laplace', b=1, kappa=0.5, mu=0)
148
149
# Kumaraswamy distribution (alternative to Beta)
150
kumaraswamy = pm.Kumaraswamy('kumaraswamy', a=2, b=3)
151
152
# Rice distribution (Rician distribution)
153
rice_var = pm.Rice('rice_var', nu=1, sigma=1)
154
155
# Ex-Gaussian distribution (exponentially modified Gaussian)
156
ex_gaussian = pm.ExGaussian('ex_gaussian', mu=0, sigma=1, nu=0.5)
157
158
# Moyal distribution
159
moyal_var = pm.Moyal('moyal_var', mu=0, sigma=1)
160
161
# Logit-normal distribution
162
logit_normal = pm.LogitNormal('logit_normal', mu=0, sigma=1)
163
164
# Dirac delta distribution (point mass)
165
dirac_delta = pm.DiracDelta('dirac_delta', c=0)
166
```
167
168
## Discrete Distributions
169
170
### Bernoulli and Binomial
171
172
Basic discrete distributions for binary and count data:
173
174
```python { .api }
175
from pymc import Bernoulli, Binomial, BetaBinomial
176
177
# Bernoulli for binary outcomes
178
success = Bernoulli('success', p=0.7)
179
180
# Binomial for count of successes
181
num_successes = Binomial('num_successes', n=10, p=0.3)
182
183
# Beta-binomial for overdispersed counts
184
overdispersed = BetaBinomial('overdispersed', alpha=2, beta=5, n=20)
185
```
186
187
### Poisson and Related Distributions
188
189
```python { .api }
190
from pymc import Poisson, NegativeBinomial, ZeroInflatedPoisson
191
192
# Poisson distribution for count data
193
counts = Poisson('counts', mu=3.5)
194
195
# Negative binomial for overdispersed counts
196
overdispersed_counts = NegativeBinomial('overdispersed_counts', mu=3, alpha=2)
197
198
# Zero-inflated Poisson
199
zero_inflated = ZeroInflatedPoisson('zero_inflated', psi=0.2, mu=3)
200
```
201
202
### Categorical Distributions
203
204
```python { .api }
205
from pymc import Categorical, DiscreteUniform
206
207
# Categorical distribution
208
category = Categorical('category', p=[0.2, 0.3, 0.5])
209
210
# Discrete uniform distribution
211
discrete_uniform = DiscreteUniform('discrete_uniform', lower=1, upper=6)
212
```
213
214
### Other Discrete Distributions
215
216
```python { .api }
217
# Geometric distribution
218
waiting_time = pm.Geometric('waiting_time', p=0.1)
219
220
# Hypergeometric distribution
221
hypergeom = pm.HyperGeometric('hypergeom', N=20, K=7, n=12)
222
223
# Constant distribution
224
constant_val = pm.ConstantDist('constant_val', c=5)
225
226
# Discrete Weibull distribution
227
discrete_weibull = pm.DiscreteWeibull('discrete_weibull', q=0.8, beta=1.5)
228
229
# Ordered distributions for ordinal data
230
ordered_logistic = pm.OrderedLogistic('ordered_logistic', eta=0, cutpoints=[1, 2, 3])
231
ordered_probit = pm.OrderedProbit('ordered_probit', eta=0, cutpoints=[1, 2, 3])
232
```
233
234
## Multivariate Distributions
235
236
### Multivariate Normal
237
238
The cornerstone of multivariate continuous modeling:
239
240
```python { .api }
241
from pymc import MvNormal
242
import numpy as np
243
244
# With covariance matrix
245
mu = np.zeros(3)
246
cov = np.eye(3)
247
mv_normal = MvNormal('mv_normal', mu=mu, cov=cov)
248
249
# With precision matrix
250
prec = np.eye(3)
251
mv_normal_prec = MvNormal('mv_normal_prec', mu=mu, tau=prec)
252
253
# With Cholesky decomposition
254
chol = np.linalg.cholesky(cov)
255
mv_normal_chol = MvNormal('mv_normal_chol', mu=mu, chol=chol)
256
```
257
258
### Dirichlet Distribution
259
260
For probability vectors that sum to 1:
261
262
```python { .api }
263
from pymc import Dirichlet
264
265
# Dirichlet distribution for probability simplex
266
a = np.ones(4) * 2 # Concentration parameters
267
prob_vector = Dirichlet('prob_vector', a=a)
268
269
# Stick-breaking construction
270
stick_weights = pm.StickBreakingWeights('stick_weights', alpha=1, K=5)
271
```
272
273
### Matrix and Structured Distributions
274
275
```python { .api }
276
# Wishart distribution for covariance matrices
277
wishart_cov = pm.Wishart('wishart_cov', nu=5, V=np.eye(3))
278
279
# LKJ correlation matrix prior
280
corr_matrix = pm.LKJCorr('corr_matrix', n=3, eta=2)
281
282
# LKJ Cholesky covariance
283
lkj_cov = pm.LKJCholeskyCov('lkj_cov', n=3, eta=2, sd_dist=pm.HalfNormal.dist(1))
284
285
# Matrix normal distribution
286
matrix_normal = pm.MatrixNormal('matrix_normal', mu=np.zeros((3, 4)),
287
rowcov=np.eye(3), colcov=np.eye(4))
288
289
# Zero-sum normal (constraint: sum to zero)
290
zero_sum_normal = pm.ZeroSumNormal('zero_sum_normal', sigma=1, shape=4)
291
292
# Polyamacher-Gamma distribution for sparse modeling
293
polya_gamma = pm.PolyaGamma('polya_gamma', h=1, z=0)
294
```
295
296
### Multinomial Distributions
297
298
```python { .api }
299
from pymc import Multinomial, DirichletMultinomial
300
301
# Multinomial distribution
302
n_trials = 100
303
p_categories = [0.2, 0.3, 0.5]
304
multinomial_counts = Multinomial('multinomial_counts', n=n_trials, p=p_categories)
305
306
# Dirichlet-multinomial (overdispersed)
307
dirichlet_mult = DirichletMultinomial('dirichlet_mult', n=100, a=[2, 3, 5])
308
```
309
310
## Time Series Distributions
311
312
### Random Walk Processes
313
314
```python { .api }
315
from pymc import RandomWalk, GaussianRandomWalk, MvGaussianRandomWalk
316
317
# Basic random walk
318
random_walk = RandomWalk('random_walk', mu=0.1, sigma=1, steps=100)
319
320
# Gaussian random walk (more efficient)
321
gaussian_rw = GaussianRandomWalk('gaussian_rw', mu=0, sigma=1, steps=100)
322
323
# Multivariate Gaussian random walk
324
mv_rw = MvGaussianRandomWalk('mv_rw', mu=np.zeros(2), chol=np.eye(2), steps=100)
325
```
326
327
### Autoregressive Processes
328
329
```python { .api }
330
from pymc import AR
331
332
# Autoregressive process
333
ar_coeffs = [0.8, -0.2] # AR(2) coefficients
334
ar_process = AR('ar_process', rho=ar_coeffs, sigma=1, steps=100)
335
336
# With constant term
337
ar_with_constant = AR('ar_with_constant', rho=ar_coeffs, sigma=1,
338
constant=True, steps=100)
339
```
340
341
### Volatility Models
342
343
```python { .api }
344
from pymc import GARCH11
345
346
# GARCH(1,1) process for volatility modeling
347
garch_process = GARCH11('garch_process', omega=0.01, alpha_1=0.1,
348
beta_1=0.8, initial_vol=0.1, steps=100)
349
```
350
351
## Mixture Distributions
352
353
### General Mixture Models
354
355
```python { .api }
356
from pymc import Mixture, NormalMixture
357
358
# General mixture distribution
359
components = [pm.Normal.dist(mu=0, sigma=1), pm.Normal.dist(mu=5, sigma=2)]
360
weights = [0.3, 0.7]
361
mixture = Mixture('mixture', w=weights, comp_dists=components)
362
363
# Normal mixture (optimized implementation)
364
normal_mixture = NormalMixture('normal_mixture', w=[0.4, 0.6],
365
mu=[0, 3], sigma=[1, 1.5])
366
```
367
368
### Zero-Inflated Models
369
370
```python { .api }
371
# Zero-inflated Poisson
372
zip_model = pm.ZeroInflatedPoisson('zip_model', psi=0.3, mu=2.5)
373
374
# Zero-inflated binomial
375
zib_model = pm.ZeroInflatedBinomial('zib_model', psi=0.2, n=10, p=0.4)
376
377
# Zero-inflated negative binomial
378
zinb_model = pm.ZeroInflatedNegativeBinomial('zinb_model', psi=0.1, mu=3, alpha=2)
379
```
380
381
### Hurdle Models
382
383
```python { .api }
384
# Hurdle Poisson (two-part model)
385
hurdle_poisson = pm.HurdlePoisson('hurdle_poisson', psi=0.3, mu=2.5)
386
387
# Hurdle negative binomial
388
hurdle_nb = pm.HurdleNegativeBinomial('hurdle_nb', psi=0.2, mu=3, alpha=2)
389
390
# Hurdle log-normal
391
hurdle_lognorm = pm.HurdleLogNormal('hurdle_lognorm', psi=0.4, mu=1, sigma=0.5)
392
```
393
394
## Distribution Transformations
395
396
### Truncated Distributions
397
398
```python { .api }
399
from pymc import Truncated
400
401
# Truncated normal distribution
402
truncated_normal = Truncated('truncated_normal',
403
pm.Normal.dist(mu=0, sigma=1),
404
lower=-2, upper=2)
405
406
# Truncated exponential
407
truncated_exp = Truncated('truncated_exp',
408
pm.Exponential.dist(lam=1),
409
lower=0.1, upper=5)
410
```
411
412
### Censored Distributions
413
414
```python { .api }
415
from pymc import Censored
416
417
# Left-censored normal distribution
418
censored_normal = Censored('censored_normal',
419
pm.Normal.dist(mu=0, sigma=1),
420
lower=None, upper=2)
421
422
# Interval-censored data
423
interval_censored = Censored('interval_censored',
424
pm.LogNormal.dist(mu=0, sigma=1),
425
lower=0.5, upper=3)
426
```
427
428
## Custom Distributions
429
430
### CustomDist for New Distributions
431
432
```python { .api }
433
from pymc import CustomDist
434
import pytensor.tensor as pt
435
436
def custom_logp(value, mu, sigma):
437
"""Custom log-probability function."""
438
return pm.logp(pm.Normal.dist(mu=mu, sigma=sigma), value)
439
440
def custom_random(mu, sigma, rng=None, size=None):
441
"""Custom random sampling function."""
442
return rng.normal(mu, sigma, size=size)
443
444
# Create custom distribution
445
custom_dist = CustomDist('custom_dist',
446
mu=0, sigma=1,
447
logp=custom_logp,
448
random=custom_random)
449
```
450
451
### DensityDist for Likelihood-Only Models
452
453
```python { .api }
454
from pymc import DensityDist
455
456
def custom_likelihood(value, theta):
457
"""Custom likelihood function."""
458
return pt.sum(value * pt.log(theta) - theta)
459
460
# Density-only distribution
461
density_dist = DensityDist('density_dist',
462
theta=1.5,
463
logp=custom_likelihood,
464
observed=observed_data)
465
```
466
467
## Distribution Utilities
468
469
### Drawing Samples
470
471
```python { .api }
472
# Draw samples from distributions
473
samples = pm.draw(Normal.dist(mu=0, sigma=1), draws=1000)
474
475
# Draw from multiple distributions
476
multi_samples = pm.draw([Normal.dist(0, 1), Beta.dist(2, 3)], draws=500)
477
```
478
479
### Working with Distribution Objects
480
481
```python { .api }
482
# Create distribution objects (not random variables)
483
normal_dist = pm.Normal.dist(mu=0, sigma=1)
484
beta_dist = pm.Beta.dist(alpha=2, beta=5)
485
486
# Use in transformations
487
transformed = pm.Deterministic('transformed', pm.logit(beta_dist))
488
```
489
490
## Usage Patterns
491
492
### Hierarchical Models
493
494
```python { .api }
495
# Hierarchical normal model
496
with pm.Model() as hierarchical:
497
# Hyperparameters
498
mu_mu = pm.Normal('mu_mu', 0, 10)
499
sigma_mu = pm.HalfNormal('sigma_mu', 5)
500
501
# Group means
502
mu = pm.Normal('mu', mu=mu_mu, sigma=sigma_mu, shape=n_groups)
503
504
# Observations
505
sigma = pm.HalfNormal('sigma', 2)
506
y = pm.Normal('y', mu=mu[group_idx], sigma=sigma, observed=data)
507
```
508
509
### Mixture of Experts
510
511
```python { .api }
512
# Mixture of regression models
513
with pm.Model() as mixture_regression:
514
# Mixture weights
515
w = pm.Dirichlet('w', a=np.ones(n_components))
516
517
# Component-specific parameters
518
beta = pm.Normal('beta', 0, 1, shape=(n_components, n_features))
519
sigma = pm.HalfNormal('sigma', 1, shape=n_components)
520
521
# Component means
522
mu = pm.math.dot(X, beta.T) # Shape: (n_obs, n_components)
523
524
# Mixture likelihood
525
components = [pm.Normal.dist(mu=mu[:, i], sigma=sigma[i])
526
for i in range(n_components)]
527
y = pm.Mixture('y', w=w, comp_dists=components, observed=data)
528
```
529
530
PyMC's distribution system provides a flexible foundation for building complex probabilistic models. The consistent API across all distributions enables easy composition and transformation of uncertainty representations.