# Filtering Operations

Comprehensive filtering capabilities for event logs and Object-Centric Event Logs (OCEL). PM4PY provides behavioral, temporal, organizational, and structural filters to preprocess data and focus analysis on specific aspects of process behavior.

## Capabilities

### Event and Case Filtering

Filter events and cases based on attribute values and occurrence patterns.

```python { .api }
def filter_log_relative_occurrence_event_attribute(log, min_relative_stake, attribute_key='concept:name', level='cases', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter by relative occurrence of event attributes.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - min_relative_stake (float): Minimum relative occurrence (0.0-1.0)
    - attribute_key (str): Attribute to filter on
    - level (str): Filtering level ('cases', 'events')
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_start_activities(log, activities, retain=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by start activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activities (List[str]): List of start activities to filter
    - retain (bool): True to keep, False to remove matching cases
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_end_activities(log, activities, retain=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by end activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activities (List[str]): List of end activities to filter
    - retain (bool): True to keep, False to remove matching cases
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_event_attribute_values(log, attribute_key, values, level='case', retain=True, case_id_key='case:concept:name'):
    """
    Filter events or cases by the values of an event attribute.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - attribute_key (str): Event attribute to filter on
    - values (List[Any]): Values to filter on
    - level (str): 'case' filters whole cases containing a matching event, 'event' filters individual events
    - retain (bool): True to keep, False to remove matches
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_trace_attribute_values(log, attribute_key, values, retain=True, case_id_key='case:concept:name'):
    """
    Filter traces by the values of a trace (case-level) attribute.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - attribute_key (str): Trace attribute to filter on
    - values (List[Any]): Values to filter on
    - retain (bool): True to keep, False to remove matching traces
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """
```
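
To make the `retain` flag concrete, the following plain-Python sketch shows what filtering cases by start activity amounts to. It is an illustration only, not PM4Py's implementation: the toy `cases` mapping and the helper `keep_cases_by_start_activity` are hypothetical names introduced here.

```python
from typing import Dict, List


def keep_cases_by_start_activity(
    cases: Dict[str, List[str]], activities: List[str], retain: bool = True
) -> Dict[str, List[str]]:
    """Keep (retain=True) or drop (retain=False) cases whose first activity is listed."""
    wanted = set(activities)
    return {
        case_id: trace
        for case_id, trace in cases.items()
        if trace and ((trace[0] in wanted) == retain)
    }


# Hypothetical toy log: case ID -> ordered list of activity names
cases = {
    "c1": ["Register", "Check", "Pay"],
    "c2": ["Check", "Pay"],
    "c3": ["Register", "Cancel"],
}

kept = keep_cases_by_start_activity(cases, ["Register"])
dropped = keep_cases_by_start_activity(cases, ["Register"], retain=False)
```

With `retain=True` the result contains the matching cases (`c1`, `c3`); with `retain=False` it contains the complement (`c2`).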

### Behavioral Filtering

Filter based on process behavior patterns including variants and activity relationships.

```python { .api }
def filter_variants(log, variants, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', retain=True):
    """
    Filter by trace variants (activity sequences).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - variants (List[Tuple[str, ...]]): List of variants to filter
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - retain (bool): True to keep, False to remove matching variants

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_variants_by_coverage_percentage(log, min_coverage_percentage, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep variants that each cover at least the given share of cases.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - min_coverage_percentage (float): Minimum share of cases a variant must cover (0.0-1.0)
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_variants_top_k(log, k, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep top-k most frequent variants.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - k (int): Number of top variants to keep
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_directly_follows_relation(log, relations, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', retain=True):
    """
    Filter by directly-follows relations between activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - relations (List[Tuple[str, str]]): List of directly-follows relations
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - retain (bool): True to keep, False to remove cases with relations

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_eventually_follows_relation(log, relations, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', retain=True):
    """
    Filter by eventually-follows relations between activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - relations (List[Tuple[str, str]]): List of eventually-follows relations
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - retain (bool): True to keep, False to remove cases with relations

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """
```
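
The difference between directly-follows and eventually-follows relations, and the idea of a variant, can be sketched in plain Python. This is a conceptual illustration rather than PM4Py code: the helpers `has_directly_follows`, `has_eventually_follows`, and `top_k_variants` and the toy `traces` list are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def has_directly_follows(trace: List[str], a: str, b: str) -> bool:
    """True if b occurs immediately after a somewhere in the trace."""
    return any(x == a and y == b for x, y in zip(trace, trace[1:]))


def has_eventually_follows(trace: List[str], a: str, b: str) -> bool:
    """True if some occurrence of b comes after some occurrence of a."""
    return any(x == a and b in trace[i + 1:] for i, x in enumerate(trace))


def top_k_variants(traces: List[List[str]], k: int) -> List[Tuple[str, ...]]:
    """Return the k most frequent activity sequences (variants)."""
    counts = Counter(tuple(trace) for trace in traces)
    return [variant for variant, _ in counts.most_common(k)]


# Hypothetical toy log: one list of activity names per case
traces = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "C"],
    ["A", "X", "B", "C"],
]

df_ab = [has_directly_follows(t, "A", "B") for t in traces]
ef_ab = [has_eventually_follows(t, "A", "B") for t in traces]
top1 = top_k_variants(traces, 1)
```

In the last trace, `B` eventually follows `A` but does not directly follow it, so the two relation types select different cases.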

### Time-Based Filtering

Filter events and cases based on temporal criteria and performance metrics.

```python { .api }
def filter_time_range(log, dt1, dt2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter events within specific time range.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - dt1 (datetime): Start of time range
    - dt2 (datetime): End of time range
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_case_performance(log, min_performance, max_performance, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by performance (duration) thresholds.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - min_performance (float): Minimum case duration (seconds)
    - max_performance (float): Maximum case duration (seconds)
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """
```
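
Because the performance thresholds are expressed in seconds, it can help to see how a case duration is derived from timestamps. The sketch below is a plain-Python illustration, not PM4Py's implementation; `case_duration_seconds`, `keep_cases_by_duration`, and the toy `cases` data are hypothetical, and treating both bounds as inclusive is an assumption.

```python
from datetime import datetime
from typing import Dict, List


def case_duration_seconds(timestamps: List[datetime]) -> float:
    """Duration of a case: time between its earliest and latest event, in seconds."""
    return (max(timestamps) - min(timestamps)).total_seconds()


def keep_cases_by_duration(
    cases: Dict[str, List[datetime]], min_seconds: float, max_seconds: float
) -> Dict[str, List[datetime]]:
    """Keep cases whose duration lies within [min_seconds, max_seconds] (assumed inclusive)."""
    return {
        case_id: stamps
        for case_id, stamps in cases.items()
        if min_seconds <= case_duration_seconds(stamps) <= max_seconds
    }


# Hypothetical toy log: case ID -> event timestamps
cases = {
    "c1": [datetime(2023, 1, 1, 9, 0), datetime(2023, 1, 1, 11, 0)],   # 2 hours
    "c2": [datetime(2023, 1, 1, 9, 0), datetime(2023, 1, 1, 9, 10)],   # 10 minutes
    "c3": [datetime(2023, 1, 1, 9, 0), datetime(2023, 1, 20, 9, 0)],   # 19 days
}

# Keep cases lasting between 1 hour (3600 s) and 1 week (604800 s)
kept = keep_cases_by_duration(cases, 3600, 604800)
```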

### Structural Filtering

Filter based on structural properties like case size and activity patterns.

```python { .api }
def filter_case_size(log, min_size, max_size, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by number of events (case size).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - min_size (int): Minimum number of events per case
    - max_size (int): Maximum number of events per case
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_between(log, activity1, activity2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter events that occur between two specific activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity1 (str): First activity (start marker)
    - activity2 (str): Second activity (end marker)
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_activities_rework(log, activity, min_occurrences=2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases in which a given activity is reworked (repeated).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity (str): Activity to check for rework
    - min_occurrences (int): Minimum occurrences to consider as rework
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_paths_performance(log, path, min_performance, max_performance, keep=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Filter cases by the performance (elapsed time) of a specific path between two activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - path (Tuple[str, str]): Activity path (source, target) to measure
    - min_performance (float): Minimum path performance (seconds)
    - max_performance (float): Maximum path performance (seconds)
    - keep (bool): True to keep, False to remove matching cases
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """
```
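
Rework means that an activity repeats within the same case. A minimal plain-Python sketch of that idea follows; it is not PM4Py's implementation, and the helper `cases_with_rework` and the toy `cases` mapping are hypothetical.

```python
from typing import Dict, List


def cases_with_rework(
    cases: Dict[str, List[str]], activity: str, min_occurrences: int = 2
) -> Dict[str, List[str]]:
    """Keep cases in which `activity` occurs at least `min_occurrences` times."""
    return {
        case_id: trace
        for case_id, trace in cases.items()
        if trace.count(activity) >= min_occurrences
    }


# Hypothetical toy log: case ID -> ordered list of activity names
cases = {
    "c1": ["Submit", "Review", "Fix", "Review", "Approve"],
    "c2": ["Submit", "Review", "Approve"],
}

rework = cases_with_rework(cases, "Review")
```

Only `c1` is kept, because `Review` happens twice there and once in `c2`.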

### Trace Segment Filtering

Extract specific segments of traces for focused analysis.

```python { .api }
def filter_prefixes(log, activity, strict=True, first_or_last='first', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep, for each case, the prefix of events leading up to a given activity.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity (str): Target activity that ends the prefix
    - strict (bool): True to exclude the target activity itself from the prefix
    - first_or_last (str): Use the 'first' or 'last' occurrence of the activity in each case
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log with prefixes
    """

def filter_suffixes(log, activity, strict=True, first_or_last='first', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep, for each case, the suffix of events following a given activity.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity (str): Target activity that starts the suffix
    - strict (bool): True to exclude the target activity itself from the suffix
    - first_or_last (str): Use the 'first' or 'last' occurrence of the activity in each case
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log with suffixes
    """

def filter_trace_segments(log, admitted_traces, positive=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Keep the trace segments (sub-traces) that match the given patterns.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - admitted_traces (List[List[str]]): Activity patterns to match; '...' acts as a wildcard
    - positive (bool): True to keep matches, False to discard them
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log with segments
    """
```

### Organizational Filtering

Filter based on organizational patterns and resource behavior.

```python { .api }
def filter_four_eyes_principle(log, activity1, activity2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', resource_key='org:resource', keep_violations=False):
    """
    Filter cases by the four-eyes principle on two activities (a violation is the same resource performing both).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity1 (str): First activity
    - activity2 (str): Second activity
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - resource_key (str): Resource attribute name
    - keep_violations (bool): True to retain violating cases, False to discard them

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """

def filter_activity_done_different_resources(log, activity, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', resource_key='org:resource'):
    """
    Filter cases where specified activity is performed by different resources.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity (str): Activity to check for resource diversity
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - resource_key (str): Resource attribute name

    Returns:
    Union[EventLog, pd.DataFrame]: Filtered event log
    """
```
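
The four-eyes principle requires that two sensitive activities in a case are not carried out by the same resource. The check can be sketched in plain Python as below. This is a conceptual illustration, not PM4Py's implementation; the `Event` alias, the helper `violates_four_eyes`, and the sample cases are hypothetical.

```python
from typing import List, Tuple

# Hypothetical event representation: (activity name, resource name)
Event = Tuple[str, str]


def violates_four_eyes(events: List[Event], activity1: str, activity2: str) -> bool:
    """True if at least one resource performed both activities in the case."""
    resources1 = {resource for activity, resource in events if activity == activity1}
    resources2 = {resource for activity, resource in events if activity == activity2}
    return bool(resources1 & resources2)


case_ok = [("Submit", "alice"), ("Approve", "bob")]
case_bad = [("Submit", "alice"), ("Check", "carol"), ("Approve", "alice")]

ok_result = violates_four_eyes(case_ok, "Submit", "Approve")
bad_result = violates_four_eyes(case_bad, "Submit", "Approve")
```

Here `case_bad` is a violation because `alice` both submits and approves.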

### OCEL Filtering

Specialized filtering operations for Object-Centric Event Logs.

```python { .api }
def filter_ocel_event_attribute(ocel, attribute_key, attribute_values):
    """
    Filter OCEL events by attribute values.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - attribute_key (str): Event attribute to filter on
    - attribute_values (List[Any]): Values to retain

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_object_attribute(ocel, attribute_key, attribute_values):
    """
    Filter OCEL objects by attribute values.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - attribute_key (str): Object attribute to filter on
    - attribute_values (List[Any]): Values to retain

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_object_types_allowed_activities(ocel, object_types_allowed_activities):
    """
    Filter OCEL by allowed activities per object type.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_types_allowed_activities (Dict[str, List[str]]): Allowed activities per object type

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_object_per_type_count(ocel, min_num_obj_type):
    """
    Keep events related to at least a given number of objects per object type.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - min_num_obj_type (Dict[str, int]): Minimum number of related objects per object type

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_start_events_per_object_type(ocel, object_type):
    """
    Keep the events in which an object of the given type starts its lifecycle.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_type (str): Object type to consider

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_end_events_per_object_type(ocel, object_type):
    """
    Keep the events in which an object of the given type ends its lifecycle.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_type (str): Object type to consider

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_events_timestamp(ocel, timestamp_from, timestamp_to):
    """
    Filter OCEL events by timestamp range.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - timestamp_from (datetime): Start timestamp
    - timestamp_to (datetime): End timestamp

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_events(ocel, event_ids):
    """
    Filter OCEL by specific event IDs.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - event_ids (List[str]): Event IDs to retain

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_objects(ocel, object_ids):
    """
    Filter OCEL by specific object IDs.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_ids (List[str]): Object IDs to retain

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_object_types(ocel, object_types):
    """
    Filter OCEL by object types.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_types (List[str]): Object types to retain

    Returns:
    OCEL: Filtered object-centric event log
    """
```

### OCEL Connected Component Filtering

Filter OCEL based on connected component analysis.

```python { .api }
def filter_ocel_cc_object(ocel, object_id):
    """
    Filter OCEL by connected component containing specific object.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_id (str): Object ID to find connected component for

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_cc_length(ocel, min_length, max_length):
    """
    Filter OCEL by connected component length.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - min_length (int): Minimum component length
    - max_length (int): Maximum component length

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_cc_otype(ocel, object_type):
    """
    Filter OCEL by connected components containing specific object type.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_type (str): Object type to filter by

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_cc_activity(ocel, activity):
    """
    Filter OCEL by connected components containing specific activity.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - activity (str): Activity to filter by

    Returns:
    OCEL: Filtered object-centric event log
    """

def filter_ocel_activities_connected_object_type(ocel, object_type):
    """
    Filter OCEL activities connected to specific object type.

    Parameters:
    - ocel (OCEL): Object-centric event log
    - object_type (str): Object type to filter activities for

    Returns:
    OCEL: Filtered object-centric event log
    """
```
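
A connected component can be pictured as a group of objects that are linked, directly or transitively, through shared events. The plain-Python sketch below illustrates that notion under the assumption that two objects are connected when they appear in the same event; it is not PM4Py's implementation, and `object_components` and the toy `event_objects` mapping are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def object_components(event_objects: Dict[str, List[str]]) -> List[Set[str]]:
    """Group objects into components; objects sharing an event are linked."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for objects in event_objects.values():
        for obj in objects:
            neighbours[obj].update(objects)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`
        component: Set[str] = set()
        stack = [start]
        while stack:
            obj = stack.pop()
            if obj in component:
                continue
            component.add(obj)
            stack.extend(neighbours[obj] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy OCEL relations: event ID -> related object IDs
event_objects = {
    "e1": ["order1", "item1"],
    "e2": ["item1", "package1"],
    "e3": ["order2", "item2"],
}

components = object_components(event_objects)
```

`order1`, `item1`, and `package1` end up in one component because `item1` bridges events `e1` and `e2`; `order2` and `item2` form a second one.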

### DFG Filtering

Filter Directly-Follows Graphs based on activity and path frequencies.

```python { .api }
def filter_dfg_activities_percentage(dfg, start_activities, end_activities, percentage):
    """
    Filter DFG by activity percentage threshold.

    Parameters:
    - dfg (dict): Directly-follows graph
    - start_activities (dict): Start activities and frequencies
    - end_activities (dict): End activities and frequencies
    - percentage (float): Percentage threshold (0.0-1.0)

    Returns:
    Tuple[dict, dict, dict]: Filtered (dfg, start_activities, end_activities)
    """

def filter_dfg_paths_percentage(dfg, start_activities, end_activities, percentage):
    """
    Filter DFG by path percentage threshold.

    Parameters:
    - dfg (dict): Directly-follows graph
    - start_activities (dict): Start activities and frequencies
    - end_activities (dict): End activities and frequencies
    - percentage (float): Percentage threshold (0.0-1.0)

    Returns:
    Tuple[dict, dict, dict]: Filtered (dfg, start_activities, end_activities)
    """
```
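
The three dictionaries these functions take and return can be pictured with a small plain-Python sketch that counts directly-follows pairs, start activities, and end activities from toy traces. It only illustrates the data shapes (pairs of activities mapped to frequencies); `build_dfg` and the `traces` list are hypothetical and this is not how PM4Py computes or filters a DFG.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_dfg(
    traces: List[List[str]],
) -> Tuple[Dict[Tuple[str, str], int], Dict[str, int], Dict[str, int]]:
    """Count directly-follows pairs, start activities, and end activities."""
    dfg: Counter = Counter()
    start: Counter = Counter()
    end: Counter = Counter()
    for trace in traces:
        if not trace:
            continue
        start[trace[0]] += 1
        end[trace[-1]] += 1
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dict(dfg), dict(start), dict(end)


# Hypothetical toy log: one list of activity names per case
traces = [["A", "B", "C"], ["A", "B", "C"], ["A", "C"]]

dfg, start_activities, end_activities = build_dfg(traces)
```

In this toy log the path `('A', 'C')` is the least frequent edge, which is the kind of path a frequency-based DFG filter would be expected to drop first.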

## Usage Examples

### Basic Filtering Operations

```python
import pm4py

# Load event log
log = pm4py.read_xes('event_log.xes')

# Keep only top 10 most frequent variants
filtered_log = pm4py.filter_variants_top_k(log, 10)

# Filter by start activities
filtered_log = pm4py.filter_start_activities(log, ['Start Process', 'Initialize'])

# Filter by case performance (duration between 1 hour and 1 week)
filtered_log = pm4py.filter_case_performance(log, 3600, 604800)

# Filter by case size (between 5 and 50 events)
filtered_log = pm4py.filter_case_size(log, 5, 50)
```

### Advanced Behavioral Filtering

```python
import pm4py

# Filter cases containing specific directly-follows relations
relations = [('Submit Application', 'Review Application'),
             ('Review Application', 'Make Decision')]
filtered_log = pm4py.filter_directly_follows_relation(log, relations, retain=True)

# Keep variants that each account for at least 10% of cases
filtered_log = pm4py.filter_variants_by_coverage_percentage(log, 0.1)

# Filter cases with rework ('Review Application' occurring at least twice)
rework_log = pm4py.filter_activities_rework(log, 'Review Application', min_occurrences=2)
```

### Time-Based Filtering

```python
import pm4py
from datetime import datetime

# Filter events within specific time range
start_date = datetime(2023, 1, 1)
end_date = datetime(2023, 12, 31)
time_filtered_log = pm4py.filter_time_range(log, start_date, end_date)

# Filter by the performance of a single path (source activity, target activity)
path = ('Submit', 'Approve')
perf_filtered_log = pm4py.filter_paths_performance(
    log, path,
    min_performance=3600,  # 1 hour minimum
    max_performance=86400  # 1 day maximum
)
```

### Trace Segment Analysis

```python
import pm4py

# Keep, for each case, the events before the first 'Review Application'
prefixes = pm4py.filter_prefixes(log, 'Review Application')

# Keep, for each case, the events after the first 'Review Application'
suffixes = pm4py.filter_suffixes(log, 'Review Application')

# Keep segments matching a pattern; '...' acts as a wildcard
segments = pm4py.filter_trace_segments(
    log, [['...', 'Review Application', 'Make Decision', '...']]
)
```

### Organizational Filtering

```python
import pm4py

# Keep only cases that violate the four-eyes principle for two activities
violations = pm4py.filter_four_eyes_principle(
    log, 'Review Application', 'Make Decision', keep_violations=True
)

# Filter cases where 'Approval' activity is done by different resources
diverse_approval = pm4py.filter_activity_done_different_resources(log, 'Approval')
```

### OCEL Filtering

```python
import pm4py
from datetime import datetime

# Load OCEL
ocel = pm4py.read_ocel('ocel_data.csv')

# Filter by object types
filtered_ocel = pm4py.filter_ocel_object_types(ocel, ['Order', 'Invoice'])

# Filter by timestamp range
start_time = datetime(2023, 1, 1)
end_time = datetime(2023, 6, 30)
time_filtered_ocel = pm4py.filter_ocel_events_timestamp(ocel, start_time, end_time)

# Filter by connected component length (minimum 10, maximum 100), passed positionally
cc_filtered_ocel = pm4py.filter_ocel_cc_length(ocel, 10, 100)

# Filter by object type constraints
constraints = {
    'Order': ['Create Order', 'Process Payment', 'Ship Order'],
    'Product': ['Add to Cart', 'Remove from Cart', 'Purchase']
}
constrained_ocel = pm4py.filter_ocel_object_types_allowed_activities(ocel, constraints)
```

### Combining Multiple Filters

```python
import pm4py

def create_analysis_subset(log):
    """Create a focused subset for detailed analysis."""

    # Drop rare variants: keep variants that each account for at least 5% of cases
    filtered_log = pm4py.filter_variants_by_coverage_percentage(log, 0.05)

    # Remove very short and very long cases
    filtered_log = pm4py.filter_case_size(filtered_log, 3, 30)

    # Filter by reasonable case duration (1 hour to 30 days)
    filtered_log = pm4py.filter_case_performance(filtered_log, 3600, 2592000)

    # Keep only cases starting with specific activities
    start_activities = ['Register', 'Submit Application', 'Create Order']
    filtered_log = pm4py.filter_start_activities(filtered_log, start_activities)

    return filtered_log

analysis_log = create_analysis_subset(log)

# Assumes the log is a pandas DataFrame, where len(log) would count events rather than cases
print(f"Original log: {log['case:concept:name'].nunique()} cases")
print(f"Filtered log: {analysis_log['case:concept:name'].nunique()} cases")
```

### DFG Filtering

```python
import pm4py

# Discover DFG
dfg, start_activities, end_activities = pm4py.discover_dfg(log)

# Filter DFG to keep top 80% of activities by frequency
filtered_dfg, filtered_start, filtered_end = pm4py.filter_dfg_activities_percentage(
    dfg, start_activities, end_activities, 0.8
)

# Filter DFG to keep top 90% of paths by frequency
path_filtered_dfg, path_start, path_end = pm4py.filter_dfg_paths_percentage(
    dfg, start_activities, end_activities, 0.9
)
```