A library for Probabilistic Graphical Models
npx @tessl/cli install tessl/pypi-pgmpy@1.0.0
# pgmpy

A Python library for working with Probabilistic Graphical Models, in particular Bayesian Networks and related models such as Directed Acyclic Graphs, Dynamic Bayesian Networks, and Structural Equation Models. It combines methods from the causal inference and probabilistic inference literature so that users can move between the two domains, and it implements algorithms for structure learning, causal discovery, parameter estimation, probabilistic and causal inference, and simulation.

## Package Information

- **Package Name**: pgmpy
- **Language**: Python
- **Installation**: `pip install pgmpy`
- **Version**: 1.0.0
- **Documentation**: https://pgmpy.org/

## Core Imports

```python
import pgmpy
```

Common imports for specific functionality:

```python
# Core models
from pgmpy.models import DiscreteBayesianNetwork, BayesianNetwork
from pgmpy.models import MarkovNetwork, FactorGraph, JunctionTree
from pgmpy.models import DynamicBayesianNetwork, NaiveBayes

# Factors and distributions
from pgmpy.factors.discrete import TabularCPD, DiscreteFactor
from pgmpy.factors import FactorSet, FactorDict

# Inference algorithms
from pgmpy.inference import VariableElimination, BeliefPropagation
from pgmpy.inference import CausalInference

# Learning and estimation
from pgmpy.estimators import MaximumLikelihoodEstimator, BayesianEstimator
from pgmpy.estimators import HillClimbSearch, PC

# Sampling
from pgmpy.sampling import BayesianModelSampling, GibbsSampling

# File I/O
from pgmpy.readwrite import BIFReader, BIFWriter

# Independence and utilities
from pgmpy.independencies import Independencies
```

## Basic Usage

```python
from pgmpy.models import DiscreteBayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Create a simple Bayesian Network with edges A -> C and B -> C
model = DiscreteBayesianNetwork([('A', 'C'), ('B', 'C')])

# Define conditional probability distributions
cpd_a = TabularCPD(variable='A', variable_card=2, values=[[0.7], [0.3]])
cpd_b = TabularCPD(variable='B', variable_card=2, values=[[0.6], [0.4]])
# Columns follow the evidence order: (A=0,B=0), (A=0,B=1), (A=1,B=0), (A=1,B=1)
cpd_c = TabularCPD(variable='C', variable_card=2,
                   values=[[0.9, 0.8, 0.2, 0.1],
                           [0.1, 0.2, 0.8, 0.9]],
                   evidence=['A', 'B'], evidence_card=[2, 2])

# Add CPDs to the model
model.add_cpds(cpd_a, cpd_b, cpd_c)

# Check model validity
assert model.check_model()

# Perform inference: P(C | A=1)
inference = VariableElimination(model)
result = inference.query(variables=['C'], evidence={'A': 1})
print(result)
```

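The posterior from the Basic Usage query can be checked by hand. The sketch below uses plain NumPy rather than pgmpy and assumes the CPD column ordering described for `cpd_c` (the last evidence variable, `B`, varies fastest). Because `A` is observed, its prior drops out of the calculation.

```python
import numpy as np

# Prior over B from the Basic Usage example.
p_b = np.array([0.6, 0.4])

# P(C | A, B) reshaped to axes (C, A, B); the four CPD columns are
# ordered (A=0,B=0), (A=0,B=1), (A=1,B=0), (A=1,B=1).
p_c_given_ab = np.array([[0.9, 0.8, 0.2, 0.1],
                         [0.1, 0.2, 0.8, 0.9]]).reshape(2, 2, 2)

# Condition on A=1, then sum out B weighted by its prior:
# P(C | A=1) = sum_b P(C | A=1, B=b) * P(B=b)
posterior_c = p_c_given_ab[:, 1, :] @ p_b

print(posterior_c)  # approximately [0.16, 0.84]
```
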
## Configuration

Global configuration for compute backend and performance settings:

```python { .api }
import pgmpy

# Access global configuration
config = pgmpy.config

# Set compute backend ('numpy' or 'torch'); this call assumes PyTorch is
# installed and a CUDA device is available
config.set_backend('torch', device='cuda')

# Control progress bars
config.set_show_progress(True)

# Set data type precision
config.set_dtype('float64')
```

## Architecture

pgmpy is organized around several key concepts:

- **Models**: Graph structures representing probabilistic relationships (Bayesian Networks, Markov Networks, Factor Graphs)
- **Factors**: Probability distributions and conditional probability tables
- **Inference**: Algorithms for computing marginal and conditional probabilities
- **Estimators**: Methods for learning model structure and parameters from data
- **Sampling**: Techniques for generating samples from probabilistic models

The library supports both discrete and continuous variables, exact and approximate inference methods, and provides extensive functionality for model evaluation and validation.

## Capabilities

### Model Creation and Management

Core model classes for creating and managing probabilistic graphical models, including Bayesian networks, Markov networks, and factor graphs.

```python { .api }
class DiscreteBayesianNetwork:
    def __init__(self, ebunch=None, latents=set(), lavaan_str=None, dagitty_str=None): ...
    def add_edge(self, u, v, **kwargs): ...
    def add_cpds(self, *cpds): ...
    def check_model(self): ...
    def predict(self, data, variables=None, n_jobs=1): ...
    def simulate(self, n_samples, do=None, evidence=None): ...
```

[Models and Graph Structures](./models.md)

### Probability Factors and Distributions

Representations of probability distributions including discrete factors, conditional probability distributions, and continuous distributions.

```python { .api }
class TabularCPD:
    def __init__(self, variable, variable_card, values, evidence=None, evidence_card=None): ...
    def normalize(self, inplace=True): ...
    def marginalize(self, variables, inplace=True): ...
    def to_factor(self): ...

class DiscreteFactor:
    def __init__(self, variables, cardinality, values): ...
    def product(self, phi1, inplace=True): ...
    def marginalize(self, variables, inplace=True): ...
    def reduce(self, values, inplace=True): ...
```

[Probability Factors](./factors.md)

### Probabilistic Inference

Exact and approximate inference algorithms for computing marginal probabilities, MAP queries, and causal inference.

```python { .api }
class VariableElimination:
    def __init__(self, model): ...
    def query(self, variables, evidence=None, elimination_order="MinFill"): ...
    def map_query(self, variables=None, evidence=None): ...

class BeliefPropagation:
    def __init__(self, model): ...
    def calibrate(self): ...
    def query(self, variables, evidence=None): ...

class CausalInference:
    def __init__(self, model): ...
    def estimate_ate(self, treatment, outcome, common_causes=None): ...
```

[Inference Algorithms](./inference.md)

### Structure and Parameter Learning

Algorithms for learning model structure from data and estimating parameters, including constraint-based, score-based, and hybrid approaches.

```python { .api }
class MaximumLikelihoodEstimator:
    def __init__(self, model, data): ...
    def get_parameters(self, n_jobs=1): ...
    def estimate_cpd(self, node): ...

class HillClimbSearch:
    def __init__(self, data, use_cache=True): ...
    def estimate(self, start=None, max_indegree=None): ...

class PC:
    def __init__(self, data): ...
    def estimate(self, variant="stable", ci_test="chi_square"): ...
```

[Learning Algorithms](./learning.md)

### Data Import/Export and Sampling

File I/O capabilities for various formats and sampling algorithms for generating data from probabilistic models.

```python { .api }
class BayesianModelSampling:
    def __init__(self, model): ...
    def forward_sample(self, size=1, seed=None): ...
    def rejection_sample(self, evidence=[], size=1): ...

# File format readers/writers
class BIFReader:
    def __init__(self, path): ...
    def get_model(self): ...

class BIFWriter:
    def __init__(self, model): ...
    def write_bif(self, filename): ...
```

[Data I/O and Sampling](./data-io.md)

### Model Evaluation and Metrics

Functions for evaluating model quality, computing metrics, and validating learned structures.

```python { .api }
def log_likelihood_score(model, data): ...
def structure_score(model, data, scoring_method="bic-g"): ...
def correlation_score(model, data, test="chi_square"): ...
def SHD(true_model, est_model): ...
```

[Evaluation and Metrics](./evaluation.md)

### Independence and Graph Structure

Classes for representing conditional independence relationships and graph structures used as foundations for probabilistic models.

```python { .api }
class Independencies:
    def __init__(self, assertions=None): ...
    def add_assertions(self, *assertions): ...
    def get_assertions(self): ...

class IndependenceAssertion:
    def __init__(self, event1, event2, event3=[]): ...

# Base graph structures
class DAG:
    def __init__(self, ebunch=None): ...
    def add_edges_from(self, ebunch): ...
    def is_dag(self): ...

class UndirectedGraph:
    def __init__(self, ebunch=None): ...
    def add_edge(self, u, v): ...
```

### Utility Functions and Classes

Helper functions and classes for data processing, mathematical operations, and state management.

```python { .api }
# Math and data utilities
def cartesian(*arrays):
    """Cartesian product of input arrays."""

def sample_discrete(distribution, size=1, seed=None):
    """Sample from discrete probability distribution."""

def discretize(data, cardinality, labels=dict(), method="rounding"):
    """Discretize continuous data into discrete bins."""

def preprocess_data(df):
    """Preprocess data for pgmpy models."""

def get_example_model(model):
    """Get predefined example model by name."""

# Optimization utilities
def optimize(func, x0, method='L-BFGS-B'):
    """Optimization wrapper function."""

def pinverse(a, rcond=1e-15):
    """Compute Moore-Penrose pseudoinverse."""

# State name handling
class StateNameMixin:
    """Mixin class for handling variable state names."""

# External utilities
def tabulate(data, headers=None):
    """Format data as a table."""
```

## Types

```python { .api }
from typing import Dict, List

# Configuration class
class Config:
    def set_backend(self, backend: str, device: str = None, dtype = None): ...
    def get_backend(self) -> str: ...
    def set_device(self, device: str = None): ...
    def get_device(self): ...
    def set_dtype(self, dtype = None): ...
    def get_dtype(self): ...
    def set_show_progress(self, show_progress: bool): ...
    def get_show_progress(self) -> bool: ...

# Common data structures
StateNameType = Dict[str, List[str]]
EvidenceType = Dict[str, int]
VariableCardType = Dict[str, int]
```