# Filtering Operations

Comprehensive filtering capabilities for event logs and Object-Centric Event Logs (OCEL). PM4Py provides behavioral, temporal, organizational, and structural filters to preprocess data and focus analysis on specific aspects of process behavior.

## Capabilities

### Event and Case Filtering

Filter events and cases based on attribute values and occurrence patterns.

```python { .api }
def filter_log_relative_occurrence_event_attribute(log, min_relative_stake, attribute_key='concept:name', level='cases', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep events whose attribute value occurs in at least a minimum share of events or cases.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - min_relative_stake (float): Minimum relative occurrence (0.0-1.0)
    - attribute_key (str): Attribute to filter on
    - level (str): Level at which occurrences are counted ('cases' or 'events')
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_start_activities(log, activities, retain=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by start activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activities (List[str]): List of start activities to filter
    - retain (bool): True to keep, False to remove matching cases
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_end_activities(log, activities, retain=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by end activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activities (List[str]): List of end activities to filter
    - retain (bool): True to keep, False to remove matching cases
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_event_attribute_values(log, attribute_key, values, level='case', retain=True, case_id_key='case:concept:name'):
    """
    Filter by the values of an event attribute, at case or event level.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - attribute_key (str): Event attribute to filter on
    - values (List[Any]): Attribute values to match
    - level (str): 'case' filters whole cases that contain a matching event; 'event' filters individual events
    - retain (bool): True to keep, False to remove matches
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_trace_attribute_values(log, attribute_key, values, retain=True, case_id_key='case:concept:name'):
    """
    Filter traces by the values of a trace (case-level) attribute.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - attribute_key (str): Trace attribute to filter on
    - values (List[Any]): Attribute values to match
    - retain (bool): True to keep, False to remove matching traces
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """
```
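
The effect of the `retain` flag can be illustrated with a small sketch. This is plain Python over a hypothetical toy log (a dict mapping case IDs to activity lists), not PM4Py's implementation; the helper name `keep_by_start_activity` is made up for illustration.

```python
from typing import Dict, List

# Hypothetical toy log: case id -> ordered list of activity names.
ToyLog = Dict[str, List[str]]


def keep_by_start_activity(log: ToyLog, activities: List[str], retain: bool = True) -> ToyLog:
    """Keep (retain=True) or drop (retain=False) cases whose first activity is listed."""
    allowed = set(activities)
    return {
        case_id: trace
        for case_id, trace in log.items()
        if trace and ((trace[0] in allowed) == retain)
    }


toy_log = {
    "c1": ["Register", "Check", "Pay"],
    "c2": ["Check", "Pay"],
    "c3": ["Register", "Pay"],
}

# retain=True keeps the cases that start with "Register"
kept = keep_by_start_activity(toy_log, ["Register"])
# retain=False keeps the complement
dropped = keep_by_start_activity(toy_log, ["Register"], retain=False)
```

With `retain=True` the matching cases survive; with `retain=False` the same predicate selects the cases to discard, so the two results partition the log.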

### Behavioral Filtering

Filter based on process behavior patterns including variants and activity relationships.

```python { .api }
def filter_variants(log, variants, retain=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter by trace variants (activity sequences).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - variants (List[Tuple[str, ...]]): List of variants to filter
    - retain (bool): True to keep, False to remove matching variants
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_variants_by_coverage_percentage(log, min_coverage_percentage, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep only the variants that each account for at least the given share of cases.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - min_coverage_percentage (float): Minimum share of cases a single variant must cover (0.0-1.0)
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_variants_top_k(log, k, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep top-k most frequent variants.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - k (int): Number of top variants to keep
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_directly_follows_relation(log, relations, retain=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by directly-follows relations between activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - relations (List[Tuple[str, str]]): List of directly-follows relations
    - retain (bool): True to keep, False to remove cases with relations
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_eventually_follows_relation(log, relations, retain=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by eventually-follows relations between activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - relations (List[Tuple[str, str]]): List of eventually-follows relations
    - retain (bool): True to keep, False to remove cases with relations
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """
```
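
To make "variant" and "directly-follows relation" concrete, here is a conceptual sketch in plain Python. It works on a hypothetical toy log rather than a PM4Py log object, and the helper names `keep_top_k_variants` and `has_directly_follows` are invented for this illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy log: case id -> ordered list of activity names.
ToyLog = Dict[str, List[str]]


def keep_top_k_variants(log: ToyLog, k: int) -> ToyLog:
    """Keep the cases whose activity sequence is among the k most frequent ones."""
    counts = Counter(tuple(trace) for trace in log.values())
    top = {variant for variant, _ in counts.most_common(k)}
    return {cid: trace for cid, trace in log.items() if tuple(trace) in top}


def has_directly_follows(trace: List[str], relation: Tuple[str, str]) -> bool:
    """True if relation[1] occurs immediately after relation[0] somewhere in the trace."""
    return relation in zip(trace, trace[1:])


toy_log = {
    "c1": ["A", "B", "C"],
    "c2": ["A", "B", "C"],
    "c3": ["A", "C", "B"],
}

# The variant (A, B, C) occurs twice, (A, C, B) once
top_one = keep_top_k_variants(toy_log, 1)

# Only cases in which B comes right after A
with_a_then_b = {
    cid: trace for cid, trace in toy_log.items()
    if has_directly_follows(trace, ("A", "B"))
}
```

An eventually-follows check would differ only in allowing other activities between the two; the directly-follows version looks at adjacent pairs only.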

### Time-Based Filtering

Filter events and cases based on temporal criteria and performance metrics.

```python { .api }
def filter_time_range(log, dt1, dt2, mode='events', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter events or cases within a specific time range.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - dt1 (datetime): Start of time range
    - dt2 (datetime): End of time range
    - mode (str): 'events' keeps events inside the range, 'traces_contained' keeps cases fully inside it, 'traces_intersecting' keeps cases overlapping it
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_case_performance(log, min_performance, max_performance, timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by performance (duration) thresholds.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - min_performance (float): Minimum case duration (seconds)
    - max_performance (float): Maximum case duration (seconds)
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """
```
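
Case performance is the time elapsed between the first and the last event of a case, expressed in seconds. The following sketch shows the idea on a hypothetical toy log of timestamps per case. It is not PM4Py's implementation, and treating both bounds as inclusive is an assumption of the sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical toy log: case id -> event timestamps of that case.
ToyLog = Dict[str, List[datetime]]


def case_duration_seconds(timestamps: List[datetime]) -> float:
    """Seconds between the earliest and the latest event of a case."""
    return (max(timestamps) - min(timestamps)).total_seconds()


def keep_by_duration(log: ToyLog, min_seconds: float, max_seconds: float) -> ToyLog:
    """Keep cases whose duration lies within [min_seconds, max_seconds]."""
    return {
        cid: stamps
        for cid, stamps in log.items()
        if min_seconds <= case_duration_seconds(stamps) <= max_seconds
    }


start = datetime(2023, 1, 1, 9, 0)
toy_log = {
    "fast": [start, start + timedelta(minutes=30)],   # 1,800 s
    "normal": [start, start + timedelta(hours=2)],    # 7,200 s
    "slow": [start, start + timedelta(days=10)],      # 864,000 s
}

# Between 1 hour (3,600 s) and 1 week (604,800 s)
kept = keep_by_duration(toy_log, 3600, 604800)
```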

### Structural Filtering

Filter based on structural properties like case size and activity patterns.

```python { .api }
def filter_case_size(log, min_size, max_size, case_id_key='case:concept:name'):
    """
    Filter cases by number of events (case size).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - min_size (int): Minimum number of events per case
    - max_size (int): Maximum number of events per case
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_between(log, act1, act2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Extract the sub-cases that lead from one activity to another.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - act1 (str): First activity (start marker)
    - act2 (str): Second activity (end marker)
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_activities_rework(log, activity, min_occurrences=2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep cases in which a given activity is repeated (rework).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity (str): Activity to check for rework
    - min_occurrences (int): Minimum occurrences of the activity to count as rework
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_paths_performance(log, path, min_performance, max_performance, keep=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by the time elapsed along a specific path between two activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - path (Tuple[str, str]): Activity path (source, target) to measure
    - min_performance (float): Minimum path performance (seconds)
    - max_performance (float): Maximum path performance (seconds)
    - keep (bool): True to keep, False to remove cases with a matching path
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """
```
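
Filtering "between" two activities can be read as cutting each trace into the sub-traces that run from the start marker to the end marker. The sketch below shows one plausible reading in plain Python. The helper `subtraces_between` is hypothetical, and restarting a segment at the latest occurrence of the start marker is an assumption of the sketch, not a documented guarantee of PM4Py.

```python
from typing import List


def subtraces_between(trace: List[str], start_marker: str, end_marker: str) -> List[List[str]]:
    """Return every sub-trace that begins at start_marker and ends at the next end_marker."""
    segments: List[List[str]] = []
    current = None  # events collected since the latest start_marker, if any
    for activity in trace:
        if activity == start_marker:
            current = [activity]  # (re)start at the latest start marker
        elif current is not None:
            current.append(activity)
            if activity == end_marker:
                segments.append(current)
                current = None
    return segments


# One segment in a simple trace
simple = subtraces_between(list("ABCDEF"), "B", "C")

# Several segments when the markers repeat
repeated = subtraces_between(list("ABFCBCBEFC"), "B", "C")
```

Each returned segment would become its own sub-case in the filtered log, which is why a single input case can yield more than one output case.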

### Trace Segment Filtering

Extract specific segments of traces for focused analysis.

```python { .api }
def filter_prefixes(log, activity, strict=True, first_or_last='first', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep, for each case, the prefix leading up to a given activity.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity (str): Target activity at which the trace is cut
    - strict (bool): True to exclude the target activity itself from the prefix
    - first_or_last (str): Cut at the 'first' or the 'last' occurrence of the activity
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log with prefixes
    """

def filter_suffixes(log, activity, strict=True, first_or_last='first', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep, for each case, the suffix following a given activity.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity (str): Target activity at which the trace is cut
    - strict (bool): True to exclude the target activity itself from the suffix
    - first_or_last (str): Cut at the 'first' or the 'last' occurrence of the activity
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log with suffixes
    """

def filter_trace_segments(log, admitted_traces, positive=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep the trace segments (sub-traces) that match one of the admitted patterns.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - admitted_traces (List[List[str]]): Admitted activity patterns; '...' stands for any sequence of activities
    - positive (bool): True to keep matching segments, False to remove them
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log with segments
    """
```

### Organizational Filtering

Filter based on organizational patterns and resource behavior.

```python { .api }
def filter_four_eyes_principle(log, activity1, activity2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', resource_key='org:resource', keep_violations=False):
    """
    Filter cases by the four-eyes principle on two activities (a violation is the same resource performing both).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity1 (str): First activity of the pair
    - activity2 (str): Second activity of the pair
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - resource_key (str): Resource attribute name
    - keep_violations (bool): True to keep the violating cases, False to keep the compliant ones

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_activity_done_different_resources(log, activity, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', resource_key='org:resource'):
    """
    Filter cases where specified activity is performed by different resources.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity (str): Activity to check for resource diversity
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - resource_key (str): Resource attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """
```
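
The four-eyes principle is usually checked for a pair of activities: a case violates it when the same resource performs both. The sketch below illustrates that check in plain Python on hypothetical `(activity, resource)` pairs. It is a conceptual illustration under that assumption, not PM4Py's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy log: case id -> list of (activity, resource) events.
ToyLog = Dict[str, List[Tuple[str, str]]]


def violates_four_eyes(events: List[Tuple[str, str]], activity1: str, activity2: str) -> bool:
    """True if at least one resource performed both activities in this case."""
    resources1 = {res for act, res in events if act == activity1}
    resources2 = {res for act, res in events if act == activity2}
    return bool(resources1 & resources2)


toy_log = {
    "c1": [("Submit", "alice"), ("Approve", "bob")],
    "c2": [("Submit", "carol"), ("Approve", "carol")],
    "c3": [("Submit", "dave")],
}

violations = {
    cid: events
    for cid, events in toy_log.items()
    if violates_four_eyes(events, "Submit", "Approve")
}
```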

### OCEL Filtering

Specialized filtering operations for Object-Centric Event Logs.

```python { .api }
def filter_ocel_event_attribute(ocel, attribute_key, attribute_values):
    """
    Filter OCEL events by attribute values.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - attribute_key (str): Event attribute to filter on
    - attribute_values (List[Any]): Values to retain

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_object_attribute(ocel, attribute_key, attribute_values):
    """
    Filter OCEL objects by attribute values.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - attribute_key (str): Object attribute to filter on
    - attribute_values (List[Any]): Values to retain

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_object_types_allowed_activities(ocel, object_types_allowed_activities):
    """
    Filter OCEL by allowed activities per object type.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_types_allowed_activities (Dict[str, List[str]]): Allowed activities per object type

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_object_per_type_count(ocel, min_num_obj_type):
    """
    Keep events related to at least a minimum number of objects per object type.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - min_num_obj_type (Dict[str, int]): Minimum number of related objects per object type

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_start_events_per_object_type(ocel, object_type):
    """
    Keep the events that start the lifecycle of an object of the given type.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_type (str): Object type to consider

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_end_events_per_object_type(ocel, object_type):
    """
    Keep the events that end the lifecycle of an object of the given type.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_type (str): Object type to consider

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_events_timestamp(ocel, timestamp_from, timestamp_to):
    """
    Filter OCEL events by timestamp range.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - timestamp_from (datetime): Start timestamp
    - timestamp_to (datetime): End timestamp

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_events(ocel, event_ids):
    """
    Filter OCEL by specific event IDs.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - event_ids (List[str]): Event IDs to retain

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_objects(ocel, object_ids):
    """
    Filter OCEL by specific object IDs.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_ids (List[str]): Object IDs to retain

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_object_types(ocel, object_types):
    """
    Filter OCEL by object types.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_types (List[str]): Object types to retain

    Returns:
    OCEL: Filtered object-centric event log
    """
```
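
The "allowed activities per object type" filter can be pictured as a filter on event-to-object relations. The sketch below uses a hypothetical flat list of `(event_id, activity, object_id, object_type)` tuples instead of a real OCEL object. Dropping relations whose object type is missing from the dictionary is an assumption of the sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical relation row: (event_id, activity, object_id, object_type)
Relation = Tuple[str, str, str, str]


def keep_allowed_relations(relations: List[Relation], allowed: Dict[str, List[str]]) -> List[Relation]:
    """Keep a relation only if its activity is allowed for its object type."""
    return [
        (event_id, activity, object_id, object_type)
        for event_id, activity, object_id, object_type in relations
        if activity in allowed.get(object_type, ())
    ]


relations = [
    ("e1", "Create Order", "o1", "Order"),
    ("e1", "Create Order", "p1", "Product"),
    ("e2", "Add to Cart", "p1", "Product"),
    ("e3", "Ship Order", "o1", "Order"),
]
allowed = {
    "Order": ["Create Order", "Ship Order"],
    "Product": ["Add to Cart"],
}

# "Create Order" stays related to the order o1 but not to the product p1
kept = keep_allowed_relations(relations, allowed)
```

The event `e1` itself survives because it still has one allowed relation; only its link to the product is removed.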

### OCEL Connected Component Filtering

Filter OCEL based on connected component analysis.

```python { .api }
def filter_ocel_cc_object(ocel, object_id):
    """
    Filter OCEL by connected component containing specific object.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_id (str): Object ID to find connected component for

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_cc_length(ocel, min_length, max_length):
    """
    Filter OCEL by connected component length.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - min_length (int): Minimum component length
    - max_length (int): Maximum component length

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_cc_otype(ocel, object_type):
    """
    Filter OCEL by connected components containing specific object type.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_type (str): Object type to filter by

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_cc_activity(ocel, activity):
    """
    Filter OCEL by connected components containing specific activity.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - activity (str): Activity to filter by

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_activities_connected_object_type(ocel, object_type):
    """
    Filter OCEL activities connected to specific object type.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_type (str): Object type to filter activities for

    Returns:
    OCEL: Filtered object-centric event log
    """
```
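
A connected component groups objects that are linked, directly or transitively, through shared events. The following union-find sketch shows how such components could be computed, assuming two objects are connected whenever they take part in the same event. The input format and the helper name `object_components` are hypothetical, and PM4Py's own definition may include further details.

```python
from collections import defaultdict
from typing import Dict, List, Set


def object_components(event_objects: Dict[str, List[str]]) -> List[Set[str]]:
    """Group object ids into components; objects sharing an event are connected."""
    parent: Dict[str, str] = {}

    def find(obj: str) -> str:
        parent.setdefault(obj, obj)
        while parent[obj] != obj:
            parent[obj] = parent[parent[obj]]  # path halving
            obj = parent[obj]
        return obj

    for objects in event_objects.values():
        for obj in objects:
            find(obj)  # register every object, even isolated ones
        for obj in objects[1:]:
            parent[find(objects[0])] = find(obj)  # union with the first object

    groups: Dict[str, Set[str]] = defaultdict(set)
    for obj in list(parent):
        groups[find(obj)].add(obj)
    return list(groups.values())


# Hypothetical event -> related objects mapping
event_objects = {
    "e1": ["order1", "invoice1"],
    "e2": ["invoice1", "payment1"],
    "e3": ["order2"],
}

components = object_components(event_objects)
```

Here `order1`, `invoice1`, and `payment1` end up in one component because `invoice1` links the first two events, while `order2` forms a component of its own.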

### DFG Filtering

Filter Directly-Follows Graphs based on activity and path frequencies.

```python { .api }
def filter_dfg_activities_percentage(dfg, start_activities, end_activities, percentage):
    """
    Filter DFG by activity percentage threshold.

    Parameters:
    - dfg (dict): Directly-follows graph
    - start_activities (dict): Start activities and frequencies
    - end_activities (dict): End activities and frequencies
    - percentage (float): Percentage threshold (0.0-1.0)

    Returns:
    Tuple[dict, dict, dict]: Filtered (dfg, start_activities, end_activities)
    """

def filter_dfg_paths_percentage(dfg, start_activities, end_activities, percentage):
    """
    Filter DFG by path percentage threshold.

    Parameters:
    - dfg (dict): Directly-follows graph
    - start_activities (dict): Start activities and frequencies
    - end_activities (dict): End activities and frequencies
    - percentage (float): Percentage threshold (0.0-1.0)

    Returns:
    Tuple[dict, dict, dict]: Filtered (dfg, start_activities, end_activities)
    """
```
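
A DFG is commonly represented as a dictionary that maps `(source, target)` activity pairs to frequencies. The sketch below shows a deliberately naive path filter on that structure: it keeps the most frequent share of edges and nothing else. It is an illustration only. The helper `keep_top_paths` is hypothetical, and a production filter would typically also make sure the remaining graph stays connected from start to end activities.

```python
import math
from typing import Dict, Tuple

Dfg = Dict[Tuple[str, str], int]


def keep_top_paths(dfg: Dfg, percentage: float) -> Dfg:
    """Keep the most frequent edges, up to the given share of all edges (rounded up)."""
    ranked = sorted(dfg.items(), key=lambda item: item[1], reverse=True)
    n_keep = math.ceil(len(ranked) * percentage)
    return dict(ranked[:n_keep])


# Hypothetical DFG with four edges
toy_dfg = {
    ("A", "B"): 10,
    ("B", "C"): 8,
    ("A", "C"): 1,
    ("C", "D"): 9,
}

# Half of the four edges: the two most frequent ones
kept = keep_top_paths(toy_dfg, 0.5)
```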

## Usage Examples

### Basic Filtering Operations

```python
import pm4py

# Load event log
log = pm4py.read_xes('event_log.xes')

# Keep only top 10 most frequent variants
filtered_log = pm4py.filter_variants_top_k(log, 10)

# Filter by start activities
filtered_log = pm4py.filter_start_activities(log, ['Start Process', 'Initialize'])

# Filter by case performance (duration between 1 hour and 1 week)
filtered_log = pm4py.filter_case_performance(log, 3600, 604800)

# Filter by case size (between 5 and 50 events)
filtered_log = pm4py.filter_case_size(log, 5, 50)
```

### Advanced Behavioral Filtering

```python
import pm4py

log = pm4py.read_xes('event_log.xes')

# Filter cases containing specific directly-follows relations
relations = [('Submit Application', 'Review Application'),
             ('Review Application', 'Make Decision')]
filtered_log = pm4py.filter_directly_follows_relation(log, relations, retain=True)

# Keep only variants that each account for at least 10% of cases
filtered_log = pm4py.filter_variants_by_coverage_percentage(log, 0.1)

# Keep cases where 'Review Application' occurs at least twice (rework)
rework_log = pm4py.filter_activities_rework(log, 'Review Application', min_occurrences=2)
```

### Time-Based Filtering

```python
import pm4py
from datetime import datetime

log = pm4py.read_xes('event_log.xes')

# Filter events within specific time range
start_date = datetime(2023, 1, 1)
end_date = datetime(2023, 12, 31)
time_filtered_log = pm4py.filter_time_range(log, start_date, end_date)

# Filter by the performance of a single path (source activity, target activity)
perf_filtered_log = pm4py.filter_paths_performance(
    log, ('Submit', 'Approve'),
    min_performance=3600,   # 1 hour minimum
    max_performance=86400   # 1 day maximum
)
```

### Trace Segment Analysis

```python
import pm4py

log = pm4py.read_xes('event_log.xes')

# Keep, per case, the events before the first 'Make Decision' (e.g. for predictive modeling)
prefixes = pm4py.filter_prefixes(log, 'Make Decision')

# Keep, per case, the events after the first 'Submit Application'
suffixes = pm4py.filter_suffixes(log, 'Submit Application')

# Keep segments matching a pattern; '...' stands for any sequence of activities
segments = pm4py.filter_trace_segments(log, [['Submit Application', '...', 'Make Decision']])
```

### Organizational Filtering

```python
import pm4py

log = pm4py.read_xes('event_log.xes')

# Keep cases violating the four-eyes principle for a pair of activities
violations = pm4py.filter_four_eyes_principle(
    log, 'Submit Application', 'Review Application', keep_violations=True
)

# Filter cases where 'Approval' activity is done by different resources
diverse_approval = pm4py.filter_activity_done_different_resources(log, 'Approval')
```

### OCEL Filtering

```python
import pm4py
from datetime import datetime

# Load OCEL
ocel = pm4py.read_ocel('ocel_data.csv')

# Filter by object types
filtered_ocel = pm4py.filter_ocel_object_types(ocel, ['Order', 'Invoice'])

# Filter by timestamp range
start_time = datetime(2023, 1, 1)
end_time = datetime(2023, 6, 30)
time_filtered_ocel = pm4py.filter_ocel_events_timestamp(ocel, start_time, end_time)

# Filter by connected component length (minimum 10, maximum 100)
cc_filtered_ocel = pm4py.filter_ocel_cc_length(ocel, 10, 100)

# Filter by object type constraints
constraints = {
    'Order': ['Create Order', 'Process Payment', 'Ship Order'],
    'Product': ['Add to Cart', 'Remove from Cart', 'Purchase']
}
constrained_ocel = pm4py.filter_ocel_object_types_allowed_activities(ocel, constraints)
```

### Combining Multiple Filters

```python
import pm4py

def create_analysis_subset(log):
    """Create a focused subset for detailed analysis."""

    # Start with variants that each account for at least 5% of cases
    filtered_log = pm4py.filter_variants_by_coverage_percentage(log, 0.05)

    # Remove very short and very long cases
    filtered_log = pm4py.filter_case_size(filtered_log, 3, 30)

    # Filter by reasonable case duration (1 hour to 30 days)
    filtered_log = pm4py.filter_case_performance(filtered_log, 3600, 2592000)

    # Keep only cases starting with specific activities
    start_activities = ['Register', 'Submit Application', 'Create Order']
    filtered_log = pm4py.filter_start_activities(filtered_log, start_activities)

    return filtered_log

log = pm4py.read_xes('event_log.xes')
analysis_log = create_analysis_subset(log)

# Assumes the log is a pandas DataFrame, where len() would count events, not cases
print(f"Original log: {log['case:concept:name'].nunique()} cases")
print(f"Filtered log: {analysis_log['case:concept:name'].nunique()} cases")
```

### DFG Filtering

```python
import pm4py

log = pm4py.read_xes('event_log.xes')

# Discover DFG
dfg, start_activities, end_activities = pm4py.discover_dfg(log)

# Filter DFG to keep top 80% of activities by frequency
filtered_dfg, filtered_start, filtered_end = pm4py.filter_dfg_activities_percentage(
    dfg, start_activities, end_activities, 0.8
)

# Filter DFG to keep top 90% of paths by frequency
path_filtered_dfg, path_start, path_end = pm4py.filter_dfg_paths_percentage(
    dfg, start_activities, end_activities, 0.9
)
```