# Process Discovery Algorithms

Process discovery algorithms extract process models from event logs. PM4Py implements classical and modern discovery techniques, including Alpha Miner, Heuristics Miner, and Inductive Miner, as well as newer approaches such as POWL and DECLARE.

## Capabilities

### Petri Net Discovery

Discover Petri net models using various algorithms, each with different strengths for handling noise, loops, and complex control flow.

```python { .api }
def discover_petri_net_inductive(log, multi_processing=True, noise_threshold=0.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
    """
    Discover Petri net using Inductive Miner algorithm.
    Best for handling noise and guaranteeing sound process models.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - multi_processing (bool): Enable parallel processing
    - noise_threshold (float): Noise threshold (0.0-1.0)
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - disable_fallthroughs (bool): Disable fallthrough operations

    Returns:
    Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
    """

def discover_petri_net_alpha(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover Petri net using Alpha Miner algorithm.
    Classical algorithm good for structured processes without noise.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
    """

def discover_petri_net_alpha_plus(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover Petri net using Alpha+ algorithm (deprecated in 2.3.0).
    Enhanced Alpha Miner with improved loop handling.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
    """

def discover_petri_net_heuristics(log, dependency_threshold=0.5, and_threshold=0.65, loop_two_threshold=0.5, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover Petri net using Heuristics Miner algorithm.
    Good balance between noise handling and model precision.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - dependency_threshold (float): Dependency threshold (0.0-1.0)
    - and_threshold (float): AND-split threshold
    - loop_two_threshold (float): Two-loop threshold
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
    """

def discover_petri_net_ilp(log, alpha=1.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover Petri net using ILP (Integer Linear Programming) Miner.
    Optimization-based approach for optimal model discovery.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - alpha (float): Alpha parameter for optimization
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
    """
```
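
Every function above returns a net together with an initial and a final marking. As a rough illustration of what that triple means, the sketch below plays the token game on a small hand-written net. The dict-based `TRANSITIONS` table and the `fire` and `replays` helpers are hypothetical stand-ins for this example only; they are not PM4Py's `PetriNet` or `Marking` classes.

```python
from collections import Counter

# Hypothetical hand-written net: transition -> (input places, output places).
# This is NOT PM4Py's PetriNet class, only a stand-in to show the idea.
TRANSITIONS = {
    "register": (["source"], ["p1"]),
    "check": (["p1"], ["p2"]),
    "decide": (["p2"], ["sink"]),
}
INITIAL_MARKING = Counter({"source": 1})
FINAL_MARKING = Counter({"sink": 1})


def fire(marking, transition):
    """Return the marking reached by firing `transition`, or None if it is not enabled."""
    if transition not in TRANSITIONS:
        return None
    inputs, outputs = TRANSITIONS[transition]
    needed = Counter(inputs)
    if any(marking[place] < count for place, count in needed.items()):
        return None
    new_marking = Counter(marking)
    new_marking.subtract(needed)
    new_marking.update(outputs)
    # Unary plus drops places with zero tokens so markings compare cleanly.
    return +new_marking


def replays(trace):
    """True if the trace leads from the initial marking exactly to the final marking."""
    marking = Counter(INITIAL_MARKING)
    for activity in trace:
        marking = fire(marking, activity)
        if marking is None:
            return False
    return marking == FINAL_MARKING
```

With this toy net, `replays(["register", "check", "decide"])` succeeds, while a trace that skips `check` fails because `decide` is not enabled.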

### Process Tree Discovery

Discover hierarchical process tree models that provide structured representations of process behavior.

```python { .api }
def discover_process_tree_inductive(log, noise_threshold=0.0, multi_processing=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
    """
    Discover process tree using Inductive Miner algorithm.
    Guarantees sound, block-structured process models.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - noise_threshold (float): Noise threshold for filtering
    - multi_processing (bool): Enable parallel processing
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - disable_fallthroughs (bool): Disable fallthrough operations

    Returns:
    ProcessTree: Hierarchical process tree model
    """
```
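
To make "block-structured" more concrete, here is a minimal sketch that enumerates the traces a small, loop-free tree can produce. The tuple encoding and the operator names `seq`, `xor`, and `par` are assumptions made for this example and are unrelated to PM4Py's `ProcessTree` class. Loops are left out because they make the set of traces infinite. The sketch shows only how to read a tree, not how the Inductive Miner builds one.

```python
from itertools import product

# Hypothetical encoding: a leaf is an activity name (str);
# an inner node is a tuple (operator, child, child, ...).


def interleavings(left, right):
    """All ways to interleave two traces while keeping each trace's own order."""
    if not left:
        return {right}
    if not right:
        return {left}
    return ({(left[0],) + rest for rest in interleavings(left[1:], right)}
            | {(right[0],) + rest for rest in interleavings(left, right[1:])})


def language(tree):
    """Return the set of traces (tuples of activities) a loop-free tree can produce."""
    if isinstance(tree, str):
        return {(tree,)}
    operator, *children = tree
    child_languages = [language(child) for child in children]
    if operator == "xor":  # exactly one child is executed
        return set().union(*child_languages)
    if operator == "seq":  # children are executed left to right
        return {sum(parts, ()) for parts in product(*child_languages)}
    if operator == "par":  # children are executed concurrently (interleaved)
        result = {()}
        for lang in child_languages:
            result = {merged for done in result for trace in lang
                      for merged in interleavings(done, trace)}
        return result
    raise ValueError(f"unknown operator: {operator}")


tree = ("seq", "register", ("xor", "approve", "reject"), ("par", "notify", "archive"))
for trace in sorted(language(tree)):
    print(" -> ".join(trace))
```

Because every operator combines complete sub-blocks, any trace of the tree runs from start to end without deadlock, which is the intuition behind the soundness guarantee mentioned above.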

### Graph-Based Discovery

Discover graph-based process representations including Directly-Follows Graphs and performance-enhanced variants.

```python { .api }
def discover_dfg(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover Directly-Follows Graph showing direct successor relationships.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[dict, dict, dict]: (dfg_dict, start_activities, end_activities)
    """

def discover_dfg_typed(log, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp'):
    """
    Discover typed DFG object with enhanced functionality.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - case_id_key (str): Case ID attribute name
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name

    Returns:
    DFG: Typed directly-follows graph object
    """

def discover_directly_follows_graph(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Alias for discover_dfg function.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[dict, dict, dict]: (dfg_dict, start_activities, end_activities)
    """

def discover_performance_dfg(log, business_hours=False, business_hour_slots=None, workcalendar=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', perf_aggregation_key='all'):
    """
    Discover performance DFG with timing information between activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - business_hours (bool): Consider only business hours
    - business_hour_slots (Optional[List]): Business hour time slots
    - workcalendar (Optional): Work calendar for time calculations
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - perf_aggregation_key (str): Performance aggregation method

    Returns:
    Tuple[dict, dict, dict]: (performance_dfg, start_activities, end_activities)
    """

def discover_eventually_follows_graph(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover eventually-follows relationships between activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Dict[Tuple[str, str], int]: Eventually-follows relationships with frequencies
    """
```
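
The difference between directly-follows and eventually-follows relations is easiest to see on a toy log. The following is a plain-Python sketch of the two counting schemes, assuming each case is already an ordered list of activity names. It mirrors the shape of the return values described above but is not PM4Py's implementation, and how PM4Py counts repeated pairs within a case is not specified here.

```python
from collections import Counter
from itertools import combinations

# Toy log: one list of activity names per case, already ordered by timestamp.
traces = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "approve"],
]


def directly_follows(traces):
    """Count (a, b) pairs where b occurs immediately after a, plus start/end activities."""
    dfg, starts, ends = Counter(), Counter(), Counter()
    for trace in traces:
        if not trace:
            continue
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        dfg.update(zip(trace, trace[1:]))
    return dfg, starts, ends


def eventually_follows(traces):
    """Count (a, b) pairs where b occurs anywhere after a in the same trace."""
    efg = Counter()
    for trace in traces:
        # combinations() keeps positional order, so a always precedes b.
        efg.update(combinations(trace, 2))
    return efg


dfg, start_activities, end_activities = directly_follows(traces)
efg = eventually_follows(traces)
```

In this log, `("register", "reject")` never appears in `dfg` because `check` always sits between the two activities, but it does appear in `efg`.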

### Heuristics Net Discovery

Discover heuristics nets that balance between precision and noise tolerance using frequency and dependency metrics.

```python { .api }
def discover_heuristics_net(log, dependency_threshold=0.5, and_threshold=0.65, loop_two_threshold=0.5, min_act_count=1, min_dfg_occurrences=1, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', decoration='frequency'):
    """
    Discover heuristics net using frequency and dependency heuristics.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - dependency_threshold (float): Dependency threshold
    - and_threshold (float): AND-split threshold
    - loop_two_threshold (float): Two-loop threshold
    - min_act_count (int): Minimum activity count
    - min_dfg_occurrences (int): Minimum DFG occurrences
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - decoration (str): Decoration type ('frequency', 'performance')

    Returns:
    HeuristicsNet: Heuristics net object
    """
```
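
To give a feel for what `dependency_threshold` filters on, the sketch below computes the dependency measure from the Heuristics Miner literature, `(|a>b| - |b>a|) / (|a>b| + |b>a| + 1)`, over toy directly-follows counts. The helper names are hypothetical, the counts are made up, and whether PM4Py compares against the threshold with `>` or `>=` is an assumption here.

```python
from collections import Counter


def dependency_measure(dfg, a, b):
    """Dependency value in (-1, 1) for the pair (a, b), from directly-follows counts."""
    ab, ba = dfg[(a, b)], dfg[(b, a)]
    if a == b:
        # Length-one loop: a directly followed by itself.
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)


def dependency_edges(dfg, dependency_threshold=0.5):
    """Keep only directly-follows pairs whose dependency value reaches the threshold."""
    return {
        pair: round(dependency_measure(dfg, *pair), 3)
        for pair in dfg
        if dependency_measure(dfg, *pair) >= dependency_threshold
    }


# Toy directly-follows counts (hypothetical numbers, not from a real log).
dfg = Counter({("a", "b"): 9, ("b", "a"): 1, ("b", "c"): 1})
print(dependency_edges(dfg))
```

The pair `("b", "a")` gets a negative value because it is observed far less often than its reverse, so it is treated as noise rather than as a real dependency. The `+ 1` in the denominator keeps rarely observed pairs such as `("b", "c")` away from a value of 1.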

### Advanced Discovery Methods

Discover BPMN models, transition systems, and prefix trees for different modeling requirements.

```python { .api }
def discover_bpmn_inductive(log, noise_threshold=0.0, multi_processing=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
    """
    Discover BPMN model using Inductive Miner algorithm.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - noise_threshold (float): Noise threshold
    - multi_processing (bool): Enable parallel processing
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - disable_fallthroughs (bool): Disable fallthrough operations

    Returns:
    BPMN: BPMN model object
    """

def discover_transition_system(log, direction='forward', window=2, view='sequence', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover transition system representing state space of the process.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - direction (str): Direction of analysis ('forward', 'backward')
    - window (int): Window size for state construction
    - view (str): View type ('sequence', 'set')
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    TransitionSystem: Transition system model
    """

def discover_prefix_tree(log, max_path_length=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover prefix tree/trie structure from process traces.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - max_path_length (Optional[int]): Maximum path length
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Trie: Prefix tree structure
    """
```
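
A prefix tree stores each shared trace prefix once. The sketch below builds one from a toy log using nested dicts; this node layout is an assumption chosen for brevity and does not reflect the attributes of PM4Py's `Trie` object.

```python
def build_prefix_tree(traces):
    """Build a trie as nested dicts; each node records its children and whether a trace ends there."""
    root = {"children": {}, "final": False}
    for trace in traces:
        node = root
        for activity in trace:
            # Reuse the child for this activity if an earlier trace already created it.
            node = node["children"].setdefault(activity, {"children": {}, "final": False})
        node["final"] = True
    return root


def count_nodes(node):
    """Number of nodes below (and excluding) the given node."""
    return sum(1 + count_nodes(child) for child in node["children"].values())


traces = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "approve"],
]
tree = build_prefix_tree(traces)
```

The three traces contain eight events in total but need only five trie nodes, because `register` and `register, check` are shared prefixes.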

### Temporal and Constraint Discovery

Discover temporal profiles that summarize the time elapsed between pairs of activities, for time-aware process analysis.

```python { .api }
def discover_temporal_profile(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover temporal profile showing time relationships between activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Dict[Tuple[str, str], Tuple[float, float]]: For each activity pair, the (mean, standard deviation) of the elapsed time
    """
```
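
As a rough sketch of what such a profile contains, the following plain-Python example collects the elapsed time for every pair of activities where the second occurs after the first in the same case, then summarizes each pair as a mean and a standard deviation. Timestamps are plain seconds here, and the use of the sample standard deviation (with 0.0 for a single observation) is an assumption rather than a statement about PM4Py's internals.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, stdev

# Toy log: per case, (activity, timestamp in seconds) pairs ordered by time.
traces = [
    [("register", 0), ("check", 60), ("approve", 200)],
    [("register", 0), ("check", 120), ("approve", 260)],
]


def temporal_profile(traces):
    """Map each eventually-following pair (a, b) to (mean, stdev) of elapsed seconds."""
    deltas = defaultdict(list)
    for trace in traces:
        for (a, t_a), (b, t_b) in combinations(trace, 2):
            deltas[(a, b)].append(t_b - t_a)
    return {
        # stdev() needs at least two observations, so fall back to 0.0 for one.
        pair: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for pair, values in deltas.items()
    }


profile = temporal_profile(traces)
```

A profile like this can flag a case as unusual when the time between two of its activities lies several standard deviations away from the mean for that pair.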

### Declarative Discovery

Discover declarative models, including log skeletons and DECLARE constraints. POWL discovery is listed in this group as well.

```python { .api }
def discover_log_skeleton(log, noise_threshold=0.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover log skeleton constraints from event log.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - noise_threshold (float): Noise threshold for constraint filtering
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Dict[str, Any]: Log skeleton constraints
    """

def discover_declare(log, allowed_templates=None, considered_activities=None, min_support_ratio=None, min_confidence_ratio=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover DECLARE model with temporal logic constraints.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - allowed_templates (Optional[List]): Allowed DECLARE templates
    - considered_activities (Optional[List]): Activities to consider
    - min_support_ratio (Optional[float]): Minimum support ratio
    - min_confidence_ratio (Optional[float]): Minimum confidence ratio
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Dict[str, Dict[Any, Dict[str, int]]]: DECLARE model constraints
    """

def discover_powl(log, variant=None, filtering_weight_factor=0.0, order_graph_filtering_threshold=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover POWL (Partially Ordered Workflow Language) model.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - variant (Optional[str]): Algorithm variant
    - filtering_weight_factor (float): Weight factor for filtering
    - order_graph_filtering_threshold (Optional[float]): Filtering threshold
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    POWL: POWL model object
    """
```
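
Declarative models describe rules rather than a full control flow. As one hedged illustration, the sketch below checks a single DECLARE-style template, `response(a, b)` ("every `a` is eventually followed by `b`"), against a toy log and reports a support and a confidence ratio. The definitions of both ratios vary between tools, so the ones used here are assumptions spelled out in the docstring. They are not necessarily the ones behind `min_support_ratio` and `min_confidence_ratio`.

```python
def response_stats(traces, a, b):
    """Support and confidence of response(a, b): every a is eventually followed by b.

    Assumed definitions (conventions differ between tools):
    - support: share of all traces in which `a` occurs and the rule holds
    - confidence: share of the traces containing `a` in which the rule holds
    """
    activated = satisfied = 0
    for trace in traces:
        if a not in trace:
            continue
        activated += 1
        last_a = len(trace) - 1 - trace[::-1].index(a)
        # A b after the last a also follows every earlier a in the trace.
        if b in trace[last_a + 1:]:
            satisfied += 1
    support = satisfied / len(traces) if traces else 0.0
    confidence = satisfied / activated if activated else 0.0
    return support, confidence


traces = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "approve", "check"],
    ["cancel"],
]
print(response_stats(traces, "register", "check"))
print(response_stats(traces, "check", "approve"))
```

Under these definitions, `response("register", "check")` holds in every trace that contains `register`, so a discovery step with high thresholds would keep it, whereas `response("check", "approve")` holds in only one of three activated traces and would be dropped.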

### Utility Discovery Functions

Discover footprints, batches, and other analytical structures from event logs.

```python { .api }
def discover_footprints(*args):
    """
    Discover footprints from logs or models for comparison purposes.

    Parameters:
    - *args: Variable arguments (log or model objects)

    Returns:
    Union[List[Dict[str, Any]], Dict[str, Any]]: Footprint representations
    """

def derive_minimum_self_distance(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Compute minimum self-distance for activities (loop detection).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Dict[str, int]: Minimum self-distances per activity
    """

def discover_batches(log, merge_distance=900, min_batch_size=2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', resource_key='org:resource'):
    """
    Discover batch activities based on temporal and resource patterns.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - merge_distance (int): Maximum time distance for batching (seconds)
    - min_batch_size (int): Minimum batch size
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - resource_key (str): Resource attribute name

    Returns:
    List[Tuple[Tuple[str, str], int, Dict[str, Any]]]: Discovered batches
    """

def correlation_miner(df, annotation='frequency', activity_key='concept:name', timestamp_key='time:timestamp'):
    """
    Correlation miner for logs without case IDs.

    Parameters:
    - df (pd.DataFrame): Event data without case identifiers
    - annotation (str): Annotation type ('frequency', 'performance')
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name

    Returns:
    Tuple[dict, dict, dict]: (dfg, start_activities, end_activities)
    """
```
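
Two of these structures are simple enough to sketch by hand. The example below computes a minimum self-distance (taken here as the number of events between two occurrences of the same activity) and a reduced footprint that splits directly-follows pairs into one-way and two-way relations. Both helpers and the result layout are assumptions for illustration; PM4Py's footprint dictionaries may carry more keys than shown.

```python
def minimum_self_distance(traces):
    """For each repeated activity, the fewest events seen between two of its occurrences."""
    best = {}
    for trace in traces:
        last_seen = {}
        for position, activity in enumerate(trace):
            if activity in last_seen:
                gap = position - last_seen[activity] - 1
                best[activity] = min(gap, best.get(activity, gap))
            last_seen[activity] = position
    return best


def footprints(traces):
    """Split directly-follows pairs into one-way ('sequence') and two-way ('parallel')."""
    pairs = {pair for trace in traces for pair in zip(trace, trace[1:])}
    parallel = {(a, b) for (a, b) in pairs if (b, a) in pairs}
    return {"sequence": pairs - parallel, "parallel": parallel}


traces = [
    ["a", "b", "c", "b", "d"],
    ["a", "c", "b", "d"],
]
msd = minimum_self_distance(traces)
fp = footprints(traces)
```

Here `b` repeats with one event in between, which hints at a loop, and `b` and `c` follow each other in both directions, so they land in the `parallel` set. Comparing the footprint of a log with the footprint of a model is one lightweight way to spot behavior that only one of them allows.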

## Usage Examples

### Basic Process Discovery

```python
import pm4py

# Load event log
log = pm4py.read_xes('event_log.xes')

# Discover Petri net using Inductive Miner (recommended)
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# Discover process tree
tree = pm4py.discover_process_tree_inductive(log)

# Discover DFG
dfg, start_activities, end_activities = pm4py.discover_dfg(log)
```

### Advanced Discovery with Parameters

```python
import pm4py

# Load the event log used by the calls below
log = pm4py.read_xes('event_log.xes')

# Inductive Miner with noise filtering
net, im, fm = pm4py.discover_petri_net_inductive(
    log,
    noise_threshold=0.2,  # Threshold for filtering infrequent behavior
    multi_processing=True
)

# Heuristics Miner with custom thresholds
net, im, fm = pm4py.discover_petri_net_heuristics(
    log,
    dependency_threshold=0.7,
    and_threshold=0.8,
    loop_two_threshold=0.9
)

# Performance DFG with business hours
perf_dfg, start_acts, end_acts = pm4py.discover_performance_dfg(
    log,
    business_hours=True,
    perf_aggregation_key='mean'
)
```

### Declarative Process Discovery

```python
import pm4py

# Load the event log used by the calls below
log = pm4py.read_xes('event_log.xes')

# Discover DECLARE constraints
declare_model = pm4py.discover_declare(
    log,
    min_support_ratio=0.8,
    min_confidence_ratio=0.9
)

# Discover log skeleton
skeleton = pm4py.discover_log_skeleton(log, noise_threshold=0.1)

# Discover POWL model
powl_model = pm4py.discover_powl(log, filtering_weight_factor=0.5)
```