Process mining library for discovering, analyzing and visualizing business processes from event data
npx @tessl/cli install tessl/pypi-pm4py@2.7.00
# PM4PY

PM4PY is a Python library for process mining. It covers reading, writing, discovering, analyzing, and visualizing process models and event logs. It supports both traditional event logs and Object-Centric Event Logs (OCEL), and offers 280+ API functions across multiple process mining paradigms.

## Package Information

- **Package Name**: pm4py
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install pm4py`
- **Documentation**: [https://pm4py.fit.fraunhofer.de/](https://pm4py.fit.fraunhofer.de/)
- **Version**: 2.7.17

## Core Imports

```python
import pm4py
```

Common pattern for accessing functionality:

```python
# Read event logs
from pm4py import read_xes, read_ocel

# Process discovery
from pm4py import discover_petri_net_inductive, discover_dfg

# Conformance checking
from pm4py import fitness_alignments, conformance_diagnostics_alignments

# Visualization
from pm4py import view_petri_net, view_dfg

# Filtering
from pm4py import filter_variants_top_k, filter_start_activities
```

## Basic Usage

```python
import pm4py
import pandas as pd

# Read event log from XES file
log = pm4py.read_xes('event_log.xes')

# Alternative: Work with DataFrame
df = pd.read_csv('event_data.csv')
log = pm4py.format_dataframe(df, case_id='case_id',
                             activity_key='activity',
                             timestamp_key='timestamp')

# Process discovery - discover process model
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# Conformance checking - measure fitness
fitness = pm4py.fitness_alignments(log, net, initial_marking, final_marking)
print(f"Fitness: {fitness['log_fitness']}")

# Visualization
pm4py.view_petri_net(net, initial_marking, final_marking)

# Filtering - keep top 10 most frequent variants
filtered_log = pm4py.filter_variants_top_k(log, 10)

# Statistics
start_activities = pm4py.get_start_activities(log)
variants = pm4py.get_variants_as_tuples(log)
```

## Architecture

PM4PY is structured around several key components:

### Data Objects

- **EventLog/DataFrame**: Traditional event logs with case-activity-timestamp structure
- **OCEL (Object-Centric Event Logs)**: Multi-dimensional event logs with objects and relationships
- **PetriNet**: Process models with places, transitions, and markings
- **ProcessTree**: Hierarchical process representations
- **BPMN**: Business Process Model and Notation objects
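
To make the case-activity-timestamp structure concrete, the sketch below groups flat event records into per-case traces using only the standard library. It does not call pm4py; the field names `case_id`, `activity`, and `timestamp` and the helper `group_into_traces` are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical flat event records: one dict per event.
events = [
    {"case_id": "c1", "activity": "register", "timestamp": "2024-01-01T09:00:00"},
    {"case_id": "c2", "activity": "register", "timestamp": "2024-01-01T09:05:00"},
    {"case_id": "c1", "activity": "approve", "timestamp": "2024-01-01T10:00:00"},
    {"case_id": "c2", "activity": "reject", "timestamp": "2024-01-01T09:30:00"},
]


def group_into_traces(events):
    """Return {case_id: [activity, ...]} with each trace ordered by timestamp."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event["case_id"]].append(event)

    traces = {}
    for case_id, case_events in by_case.items():
        case_events.sort(key=lambda e: datetime.fromisoformat(e["timestamp"]))
        traces[case_id] = [e["activity"] for e in case_events]
    return traces


traces = group_into_traces(events)
print(traces)
```

Each trace is the ordered activity sequence of one case, which is the unit most discovery and conformance techniques operate on.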

### Processing Pipeline

1. **Data Input**: Read various formats (XES, CSV, PNML, BPMN, OCEL formats)
2. **Data Preparation**: Format, filter, and preprocess event data
3. **Process Discovery**: Extract process models from event logs
4. **Conformance Checking**: Measure model-log alignment and fitness
5. **Enhancement**: Enrich models with performance and organizational data
6. **Visualization**: Generate visual representations
7. **Export**: Write results in multiple formats

### Algorithm Categories

- **Classical Discovery**: Alpha Miner, Heuristics Miner, ILP Miner
- **Modern Discovery**: Inductive Miner, POWL, Declare discovery
- **Conformance**: Token-based replay, alignments, temporal conformance
- **Object-Centric**: OCEL-specific discovery and conformance methods

## Capabilities

### I/O Operations

Read and write process mining data in various formats, including XES, PNML, BPMN, and Object-Centric Event Log formats.

```python { .api }
def read_xes(file_path, variant=None, return_legacy_log_object=False, encoding='utf-8', **kwargs): ...
def write_xes(log, file_path, case_id_key='case:concept:name', extensions=None, encoding='utf-8', **kwargs): ...
def read_ocel(file_path, objects_path=None, encoding='utf-8'): ...
def write_ocel(ocel, file_path, objects_path=None, encoding='utf-8'): ...
```

[Reading and Writing Operations](./reading-writing.md)
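
As a rough illustration of what a reader such as `read_xes` has to deal with, the sketch below parses a tiny hand-written XES document with the standard library. It is not pm4py's parser: the helper `parse_xes` is hypothetical, and the sample omits the XML namespace and extension declarations that real XES files usually carry.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written XES document: one trace (case) with two events.
XES_TEXT = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="register"/>
      <date key="time:timestamp" value="2024-01-01T09:00:00+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="approve"/>
      <date key="time:timestamp" value="2024-01-01T10:00:00+00:00"/>
    </event>
  </trace>
</log>"""


def parse_xes(text):
    """Return a list of (case_id, activity, timestamp) tuples."""
    rows = []
    root = ET.fromstring(text)
    for trace in root.findall("trace"):
        # Trace-level attributes are direct children of <trace>.
        case_id = next(
            attr.get("value")
            for attr in trace.findall("string")
            if attr.get("key") == "concept:name"
        )
        for event in trace.findall("event"):
            attrs = {attr.get("key"): attr.get("value") for attr in event}
            rows.append((case_id, attrs["concept:name"], attrs["time:timestamp"]))
    return rows


rows = parse_xes(XES_TEXT)
for row in rows:
    print(row)
```

The flat rows carry the same three pieces of information as the `case:concept:name`, `concept:name`, and `time:timestamp` keys used throughout the API signatures.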

### Process Discovery

Algorithms for discovering process models from event logs, including classical miners (Alpha, Heuristics) and modern techniques (Inductive Miner, POWL).

```python { .api }
def discover_petri_net_inductive(log, noise_threshold=0.0, multi_processing=True, activity_key='concept:name', **kwargs): ...
def discover_process_tree_inductive(log, noise_threshold=0.0, multi_processing=True, **kwargs): ...
def discover_dfg(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'): ...
def discover_heuristics_net(log, dependency_threshold=0.5, and_threshold=0.65, **kwargs): ...
```

[Process Discovery Algorithms](./process-discovery.md)
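
The simplest of these models, the directly-follows graph, can be sketched in a few lines of plain Python. The helper `directly_follows` below is hypothetical and only mirrors the three-part result (graph, start activities, end activities) that `discover_dfg` returns; it is not the library's implementation.

```python
from collections import Counter


def directly_follows(traces):
    """Count directly-follows pairs plus start and end activities."""
    dfg = Counter()
    start = Counter()
    end = Counter()
    for trace in traces:
        if not trace:
            continue  # an empty trace contributes nothing
        start[trace[0]] += 1
        end[trace[-1]] += 1
        # Pair each activity with its immediate successor.
        for source, target in zip(trace, trace[1:]):
            dfg[(source, target)] += 1
    return dict(dfg), dict(start), dict(end)


traces = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "approve"],
]
dfg, start_activities, end_activities = directly_follows(traces)
print(dfg)
print(start_activities, end_activities)
```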

### Conformance Checking

Methods for measuring how well process models align with event logs, including fitness, precision, and diagnostic capabilities.

```python { .api }
def fitness_alignments(log, petri_net, initial_marking, final_marking, multi_processing=True, **kwargs): ...
def conformance_diagnostics_alignments(log, petri_net, initial_marking, final_marking, **kwargs): ...
def fitness_token_based_replay(log, petri_net, initial_marking, final_marking, **kwargs): ...
def precision_alignments(log, petri_net, initial_marking, final_marking, **kwargs): ...
```

[Conformance Checking and Fitness](./conformance-checking.md)
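
For intuition about what a fitness value expresses, the sketch below evaluates the token-based replay fitness formula commonly given in the process mining literature, based on produced, consumed, missing, and remaining token counts. The function `replay_fitness` is hypothetical, and how pm4py aggregates these counts internally is not shown here.

```python
def replay_fitness(produced, consumed, missing, remaining):
    """Textbook token-replay fitness: 0.5*(1 - m/c) + 0.5*(1 - r/p)."""
    if produced <= 0 or consumed <= 0:
        raise ValueError("produced and consumed token counts must be positive")
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


# A perfectly replayed trace: nothing missing, nothing left behind.
perfect = replay_fitness(produced=10, consumed=10, missing=0, remaining=0)

# One missing and one remaining token out of five each.
deviating = replay_fitness(produced=5, consumed=5, missing=1, remaining=1)

print(perfect, deviating)
```

A value of 1.0 means the trace replays without missing or leftover tokens; lower values indicate deviations between the log and the model.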

### Filtering Operations

Filters for event logs and OCEL, covering behavioral, temporal, organizational, and structural criteria.

```python { .api }
def filter_variants_top_k(log, k, activity_key='concept:name', **kwargs): ...
def filter_start_activities(log, activities, retain=True, **kwargs): ...
def filter_time_range(log, dt1, dt2, **kwargs): ...
def filter_case_performance(log, min_performance, max_performance, **kwargs): ...
```

[Filtering Operations](./filtering.md)
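
The idea behind a top-k variant filter can be sketched without the library: count each distinct activity sequence, then keep only the cases whose sequence is among the k most frequent. The helper `keep_top_k_variants` is a hypothetical stand-in, not the implementation of `filter_variants_top_k`.

```python
from collections import Counter


def keep_top_k_variants(traces, k):
    """Keep only traces whose variant is among the k most frequent."""
    counts = Counter(tuple(trace) for trace in traces)
    top_variants = {variant for variant, _ in counts.most_common(k)}
    return [trace for trace in traces if tuple(trace) in top_variants]


traces = (
    [["a", "b", "c"]] * 3  # most frequent variant
    + [["a", "c"]] * 2     # second most frequent
    + [["a", "b"]]         # rare variant, dropped for k=2
)
filtered = keep_top_k_variants(traces, k=2)
print(len(filtered))
```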

### Visualization

Visualization functions for process models, statistics, and analysis results, with both viewing and saving options.

```python { .api }
def view_petri_net(petri_net, initial_marking=None, final_marking=None, format='png', **kwargs): ...
def view_dfg(dfg, start_activities=None, end_activities=None, format='png', **kwargs): ...
def save_vis_process_tree(tree, file_path, **kwargs): ...
def view_dotted_chart(log, **kwargs): ...
```

[Visualization Functions](./visualization.md)
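
To show how a model dictionary maps to something drawable, the sketch below turns a DFG into Graphviz DOT source text. This is not pm4py's renderer; the helper `dfg_to_dot` is hypothetical, and DOT is assumed here only as a convenient target format that an external Graphviz installation could render.

```python
def dfg_to_dot(dfg):
    """Serialize a {(source, target): count} mapping as Graphviz DOT text."""
    lines = ["digraph dfg {", "  rankdir=LR;"]
    # Sort edges so the output is stable across runs.
    for (source, target), count in sorted(dfg.items()):
        lines.append(f'  "{source}" -> "{target}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


dfg = {("register", "check"): 2, ("check", "approve"): 1}
dot = dfg_to_dot(dfg)
print(dot)
```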

### Object-Centric Process Mining

Specialized operations for Object-Centric Event Logs (OCEL) including discovery, analysis, and manipulation of multi-dimensional process data.

```python { .api }
def ocel_flattening(ocel, object_type): ...
def discover_ocdfg(ocel, **kwargs): ...
def discover_oc_petri_net(ocel, **kwargs): ...
def ocel_objects_interactions_summary(ocel): ...
```

[Object-Centric Operations](./object-centric.md)
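
Flattening is the bridge between object-centric and traditional logs: one object type is chosen, and each object of that type becomes a case. The sketch below illustrates the idea on plain dictionaries; the event layout and the helper `flatten` are assumptions for illustration and do not reflect pm4py's OCEL class.

```python
# Hypothetical object-centric events: each event may reference several objects.
ocel_events = [
    {"activity": "create order", "timestamp": 1,
     "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
    {"activity": "pick item", "timestamp": 2, "objects": {"item": ["i1"]}},
    {"activity": "pick item", "timestamp": 3, "objects": {"item": ["i2"]}},
    {"activity": "ship order", "timestamp": 4,
     "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
]


def flatten(events, object_type):
    """Return {object_id: [activity, ...]} using one object type as the case notion."""
    cases = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        # An event is copied into the trace of every related object of this type.
        for object_id in event["objects"].get(object_type, []):
            cases.setdefault(object_id, []).append(event["activity"])
    return cases


by_order = flatten(ocel_events, "order")
by_item = flatten(ocel_events, "item")
print(by_order)
print(by_item)
```

The same events yield different traces depending on the chosen object type, which is why the `object_type` argument is required.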

### Statistics and Analysis

Statistical analysis functions for process behavior, performance metrics, and advanced analytical operations.

```python { .api }
def get_variants_as_tuples(log, activity_key='concept:name', **kwargs): ...
def get_case_duration(log, timestamp_key='time:timestamp', case_id_key='case:concept:name'): ...
def get_start_activities(log, **kwargs): ...
def check_soundness(petri_net, initial_marking, final_marking): ...
```

[Statistics and Analysis](./statistics-analysis.md)
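
Two of these statistics are easy to picture on a toy log. The sketch below computes case durations and start-activity counts with the standard library only; the log layout and the helpers `case_durations` and `start_activities` are illustrative assumptions, not the library's code or return types.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log: case ID -> ordered list of (activity, ISO timestamp).
log = {
    "c1": [("register", "2024-01-01T09:00:00"), ("approve", "2024-01-01T11:30:00")],
    "c2": [("register", "2024-01-02T08:00:00"), ("reject", "2024-01-02T08:45:00")],
}


def case_durations(log):
    """Return {case_id: seconds between the first and last event}."""
    durations = {}
    for case_id, events in log.items():
        stamps = [datetime.fromisoformat(ts) for _, ts in events]
        durations[case_id] = (max(stamps) - min(stamps)).total_seconds()
    return durations


def start_activities(log):
    """Return {activity: number of cases that begin with it}."""
    return dict(Counter(events[0][0] for events in log.values() if events))


durations = case_durations(log)
starts = start_activities(log)
print(durations)
print(starts)
```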

### Utilities and Conversion

Utility functions for data manipulation, format conversion, and model transformation between different representations.

```python { .api }
def format_dataframe(df, case_id='case:concept:name', activity_key='concept:name', **kwargs): ...
def convert_to_petri_net(*args, **kwargs): ...
def convert_to_process_tree(*args, **kwargs): ...
def serialize(obj, file_path): ...
```

[Utilities and Conversion](./utilities-conversion.md)
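
The default key names in the signatures above (`case:concept:name`, `concept:name`, `time:timestamp`) are the standard column names the rest of the API expects. As a rough sketch of the renaming step that formatting involves, the hypothetical helper `to_standard_keys` below maps custom field names onto those keys for plain dictionaries; it does not reproduce any other processing `format_dataframe` may perform.

```python
# Mapping from hypothetical source field names to the standard keys.
STANDARD_KEYS = {
    "case_id": "case:concept:name",
    "activity": "concept:name",
    "timestamp": "time:timestamp",
}


def to_standard_keys(rows, mapping=STANDARD_KEYS):
    """Rename mapped keys in each row; unmapped keys pass through unchanged."""
    return [
        {mapping.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


rows = [
    {"case_id": "c1", "activity": "register",
     "timestamp": "2024-01-01T09:00:00", "resource": "alice"},
]
standardized = to_standard_keys(rows)
print(standardized[0])
```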

### Machine Learning and Organizational Mining

Machine learning features for predictive process analytics and organizational mining for resource and social network analysis.

```python { .api }
def extract_features_dataframe(log, **kwargs): ...
def split_train_test(log, train_percentage=0.8, **kwargs): ...
def discover_handover_of_work_network(log, beta=0, **kwargs): ...
def discover_organizational_roles(log, **kwargs): ...
```

[Machine Learning and Organizational Mining](./ml-organizational.md)
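
A handover-of-work network records how often work passes from one resource to another within a case. The sketch below counts direct handovers only and ignores consecutive events by the same resource; both choices, and the helper `handover_counts`, are assumptions for illustration rather than a description of `discover_handover_of_work_network`.

```python
from collections import Counter


def handover_counts(resource_traces):
    """Count direct handovers between distinct resources within each case."""
    counts = Counter()
    for resources in resource_traces:
        for giver, receiver in zip(resources, resources[1:]):
            if giver != receiver:  # assumption: self-handovers are not counted
                counts[(giver, receiver)] += 1
    return dict(counts)


# Each inner list holds the resources of one case, ordered by time.
resource_traces = [
    ["alice", "bob", "carol"],
    ["alice", "alice", "bob"],
]
handovers = handover_counts(resource_traces)
print(handovers)
```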

## Types

Complete type definitions for PM4PY objects referenced in the API.

```python { .api }
# Core Data Types
from typing import Dict, List, Tuple, Optional, Union, Any
import pandas as pd

# Event Log Types
EventLog = List[Dict[str, Any]]  # Collection of events with attributes
EventStream = List[Dict[str, Any]]  # Ordered sequence of events

# Process Model Types
class PetriNet:
    """Petri net with places, transitions, and arcs."""
    places: List[Any]
    transitions: List[Any]
    arcs: List[Any]

class ProcessTree:
    """Hierarchical process tree representation."""
    operator: str
    children: List['ProcessTree']
    label: Optional[str]

class BPMN:
    """Business Process Model and Notation object."""
    nodes: List[Any]
    flows: List[Any]

class HeuristicsNet:
    """Heuristics net representation."""
    activities: List[str]
    dependencies: Dict[Tuple[str, str], float]

# Discovery Types
DFG = Dict[Tuple[str, str], int]  # Directly-Follows Graph
PerformanceDFG = Dict[Tuple[str, str], float]  # Performance-annotated DFG

# OCEL Types
class OCEL:
    """Object-Centric Event Log."""
    events: pd.DataFrame
    objects: pd.DataFrame
    relations: pd.DataFrame

class OCDFG:
    """Object-Centric Directly-Follows Graph."""
    activities: List[str]
    objects: List[str]
    edges: Dict[Tuple[str, str], int]

# Conformance Types
AlignmentResult = Dict[str, Any]  # Alignment computation results
FitnessResult = Dict[str, float]  # Fitness measurement results
ReplayResult = Dict[str, Any]  # Token-based replay results

# Marking Types
Marking = Dict[Any, int]  # Petri net marking (place -> tokens)

# Analysis Types
VariantDict = Dict[Tuple[str, ...], int]  # Process variants with frequencies
CaseDuration = Dict[str, float]  # Case durations by case ID
ActivityStats = Dict[str, Any]  # Activity statistics
```
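
To illustrate how a place-to-tokens marking is used, the sketch below plays the token game on a two-transition net represented with plain dictionaries. The net layout and the helpers `is_enabled` and `fire` are hypothetical and assume arc weights of one; they are not pm4py's Petri net classes.

```python
from collections import Counter

# Hypothetical net: each transition lists its input and output places.
transitions = {
    "register": {"inputs": ["start"], "outputs": ["p1"]},
    "approve": {"inputs": ["p1"], "outputs": ["end"]},
}


def is_enabled(marking, transition):
    """A transition is enabled when every input place holds a token."""
    return all(marking[place] >= 1 for place in transition["inputs"])


def fire(marking, transition):
    """Return the marking reached by firing an enabled transition."""
    if not is_enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = Counter(marking)
    for place in transition["inputs"]:
        new_marking[place] -= 1
    for place in transition["outputs"]:
        new_marking[place] += 1
    return +new_marking  # unary plus drops places left with zero tokens


initial_marking = Counter({"start": 1})
final_marking = Counter({"end": 1})

marking = fire(initial_marking, transitions["register"])
marking = fire(marking, transitions["approve"])
print(marking == final_marking)
```

Reaching the final marking from the initial marking by firing transitions is the basic notion of a complete run through the model.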