
# Process Discovery Algorithms

PM4Py provides a comprehensive set of process discovery algorithms for extracting process models from event logs. It covers classical and modern techniques such as the Alpha Miner, Heuristics Miner, and Inductive Miner, as well as advanced approaches like POWL and DECLARE.

## Capabilities

### Petri Net Discovery

Discover Petri net models using various algorithms, each with different strengths for handling noise, loops, and complex control flow.

```python { .api }
def discover_petri_net_inductive(log, multi_processing=True, noise_threshold=0.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
    """
    Discover a Petri net using the Inductive Miner algorithm.
    Always yields a sound, block-structured model; a non-zero noise_threshold
    filters out infrequent behavior, which makes it well suited to noisy logs.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - multi_processing (bool): Enable parallel processing
    - noise_threshold (float): Noise threshold (0.0-1.0)
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - disable_fallthroughs (bool): Disable fallthrough operations

    Returns:
    Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
    """

def discover_petri_net_alpha(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover a Petri net using the Alpha Miner algorithm.
    Classical algorithm suited to structured processes without noise.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
    """

def discover_petri_net_alpha_plus(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover a Petri net using the Alpha+ algorithm (deprecated in 2.3.0).
    Extends the Alpha Miner with improved handling of short loops.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
    """

def discover_petri_net_heuristics(log, dependency_threshold=0.5, and_threshold=0.65, loop_two_threshold=0.5, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover a Petri net using the Heuristics Miner algorithm.
    Balances noise tolerance against model precision.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - dependency_threshold (float): Dependency threshold (0.0-1.0)
    - and_threshold (float): AND-split threshold
    - loop_two_threshold (float): Length-two loop threshold
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
    """

def discover_petri_net_ilp(log, alpha=1.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover a Petri net using the ILP (Integer Linear Programming) Miner.
    Derives places by solving integer linear programs over the observed behavior (region-based discovery).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - alpha (float): Noise threshold for the sequence encoding graph (1.0 = no filtering, 0.0 = maximum filtering)
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
    """
```
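
The Alpha Miner derives its places from ordering relations between activities. These are the same relations that a log footprint records. The sketch below computes those relations from plain lists of traces. It is a from-scratch illustration, not pm4py's implementation; in pm4py the footprint itself is available through `discover_footprints`. The helper name `footprint_relations` is hypothetical.

```python
from itertools import product


def footprint_relations(traces):
    """Classify every ordered activity pair into Alpha Miner footprint relations.

    traces: list of traces, each a list of activity names.
    Returns a dict mapping (a, b) to '->' (causality), '<-' (reverse causality),
    '||' (parallel) or '#' (no direct succession in either direction).
    """
    # Directly-follows pairs: b immediately follows a somewhere in the log.
    follows = set()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            follows.add((a, b))

    activities = sorted({act for trace in traces for act in trace})
    relations = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            relations[(a, b)] = "->"
        elif ba and not ab:
            relations[(a, b)] = "<-"
        elif ab and ba:
            relations[(a, b)] = "||"
        else:
            relations[(a, b)] = "#"
    return relations


# 'b' and 'c' occur in both orders, so the footprint marks them as parallel.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
rel = footprint_relations(log)
print(rel[("a", "b")], rel[("b", "c")], rel[("b", "e")])  # -> || #
```

The Alpha Miner then turns maximal groups of causally related, mutually unrelated activities into places. That step is omitted here.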

### Process Tree Discovery

Discover hierarchical process tree models that provide structured representations of process behavior.

```python { .api }
def discover_process_tree_inductive(log, noise_threshold=0.0, multi_processing=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
    """
    Discover process tree using Inductive Miner algorithm.
    Guarantees sound, block-structured process models.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - noise_threshold (float): Noise threshold for filtering
    - multi_processing (bool): Enable parallel processing
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - disable_fallthroughs (bool): Disable fallthrough operations

    Returns:
    ProcessTree: Hierarchical process tree model
    """
```
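
A process tree is read top-down: each operator node combines the behavior of its children. The sketch below enumerates the traces allowed by a small loop-free tree. It uses a hypothetical tuple encoding: `'->'` for sequence, `'X'` for exclusive choice, and `'+'` for parallel. It does not use pm4py's `ProcessTree` class, and loop operators are deliberately left out.

```python
from itertools import product


def interleavings(s, t):
    """All order-preserving merges (shuffles) of two tuples s and t."""
    if not s:
        return {t}
    if not t:
        return {s}
    # Either the first element of s or the first element of t comes first.
    return ({(s[0],) + rest for rest in interleavings(s[1:], t)}
            | {(t[0],) + rest for rest in interleavings(s, t[1:])})


def language(node):
    """Set of traces (tuples of activities) accepted by a loop-free tree.

    A node is an activity name (str) or a tuple (operator, child, ...), where
    operator is '->' (sequence), 'X' (exclusive choice) or '+' (parallel).
    """
    if isinstance(node, str):
        return {(node,)}
    operator, *children = node
    child_langs = [language(child) for child in children]
    if operator == "X":
        # Exclusive choice: the union of the children's languages.
        return set().union(*child_langs)
    if operator == "->":
        # Sequence: concatenate one trace from each child, in order.
        return {sum(parts, ()) for parts in product(*child_langs)}
    if operator == "+":
        # Parallel: every interleaving of one trace from each child.
        result = {()}
        for lang in child_langs:
            result = {merged for acc in result for trace in lang
                      for merged in interleavings(acc, trace)}
        return result
    raise ValueError(f"unsupported operator: {operator}")


# 'a', then 'b' and 'c' in parallel, then a choice between 'd' and 'e'.
tree = ("->", "a", ("+", "b", "c"), ("X", "d", "e"))
for trace in sorted(language(tree)):
    print(trace)  # four traces: abcd, abce, acbd, acbe
```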

### Graph-Based Discovery

Discover graph-based process representations including Directly-Follows Graphs and performance-enhanced variants.

```python { .api }
def discover_dfg(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover the Directly-Follows Graph (DFG): how often each activity is directly
    followed by another activity within the same case.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[dict, dict, dict]: (dfg_dict, start_activities, end_activities), where dfg_dict
    maps (a, b) activity pairs to frequencies and the other two map activities to counts
    """

def discover_dfg_typed(log, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp'):
    """
    Discover a DFG and return it as a typed DFG object instead of a tuple of dictionaries.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - case_id_key (str): Case ID attribute name
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name

    Returns:
    DFG: Typed directly-follows graph object
    """

def discover_directly_follows_graph(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Alias for discover_dfg.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Tuple[dict, dict, dict]: (dfg_dict, start_activities, end_activities)
    """

def discover_performance_dfg(log, business_hours=False, business_hour_slots=None, workcalendar=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', perf_aggregation_key='all'):
    """
    Discover a performance DFG annotated with the time elapsed between directly-following activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - business_hours (bool): Count only time that falls within business hours
    - business_hour_slots (Optional[List]): Business hour time slots
    - workcalendar (Optional): Work calendar for time calculations
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - perf_aggregation_key (str): Aggregation reported per arc, e.g. 'mean', 'median', 'min' or 'max' ('all' keeps every aggregation)

    Returns:
    Tuple[dict, dict, dict]: (performance_dfg, start_activities, end_activities)
    """

def discover_eventually_follows_graph(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover eventually-follows relationships between activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Dict[Tuple[str, str], int]: Eventually-follows relationships with frequencies
    """
```
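
To make these return values concrete, the sketch below recomputes three things from plain lists of traces: a frequency DFG, the start and end activity counts, and eventually-follows counts. It mirrors the shape of the `(dfg, start_activities, end_activities)` tuple, but it is a from-scratch illustration, not pm4py's code. The helper names are hypothetical. The eventually-follows counting rule is also an assumption of the sketch: each pair of positions counts once.

```python
from collections import Counter


def dfg_from_traces(traces):
    """Return (dfg, start_activities, end_activities) as Counters."""
    dfg, starts, ends = Counter(), Counter(), Counter()
    for trace in traces:
        if not trace:
            continue
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        # Count each pair of consecutive activities (a directly followed by b).
        dfg.update(zip(trace, trace[1:]))
    return dfg, starts, ends


def eventually_follows(traces):
    """Count pairs (a, b) where b occurs anywhere after a in the same trace."""
    efg = Counter()
    for trace in traces:
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                efg[(a, b)] += 1
    return efg


traces = [["register", "check", "pay"],
          ["register", "pay"],
          ["register", "check", "check", "pay"]]
dfg, starts, ends = dfg_from_traces(traces)
efg = eventually_follows(traces)
print(dfg[("register", "check")], dfg[("check", "check")], starts["register"], ends["pay"])  # 2 1 3 3
print(efg[("register", "pay")])  # 3
```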

### Heuristics Net Discovery

Discover heuristics nets that balance precision and noise tolerance using frequency and dependency metrics.

```python { .api }
def discover_heuristics_net(log, dependency_threshold=0.5, and_threshold=0.65, loop_two_threshold=0.5, min_act_count=1, min_dfg_occurrences=1, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', decoration='frequency'):
    """
    Discover heuristics net using frequency and dependency heuristics.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - dependency_threshold (float): Dependency threshold
    - and_threshold (float): AND-split threshold
    - loop_two_threshold (float): Two-loop threshold
    - min_act_count (int): Minimum activity count
    - min_dfg_occurrences (int): Minimum DFG occurrences
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - decoration (str): Decoration type ('frequency', 'performance')

    Returns:
    HeuristicsNet: Heuristics net object
    """
```
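
The classic Heuristics Miner measures the dependency between two activities a and b as (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| is how often b directly follows a. For a self-loop it uses |a>a| / (|a>a| + 1). The sketch below applies that measure and keeps the arcs that reach a dependency threshold. It omits the AND-split and length-two-loop handling that `and_threshold` and `loop_two_threshold` control. It is not pm4py's implementation, and the helper name is hypothetical.

```python
from collections import Counter


def dependency_graph(traces, dependency_threshold=0.5):
    """Keep arcs (a, b) whose dependency value reaches the threshold.

    Returns a dict mapping (a, b) to its dependency value, rounded to 3 decimals.
    """
    follows = Counter()
    for trace in traces:
        follows.update(zip(trace, trace[1:]))

    arcs = {}
    for (a, b), ab in follows.items():
        if a == b:
            # Length-one loop: a => a = |a>a| / (|a>a| + 1)
            value = ab / (ab + 1)
        else:
            ba = follows[(b, a)]
            value = (ab - ba) / (ab + ba + 1)
        if value >= dependency_threshold:
            arcs[(a, b)] = round(value, 3)
    return arcs


# Ten clean traces plus one noisy trace where 'c' and 'b' are swapped.
log = [["a", "b", "c"]] * 10 + [["a", "c", "b"]]
print(dependency_graph(log, dependency_threshold=0.6))  # {('a', 'b'): 0.909, ('b', 'c'): 0.75}
```

The rare arc a -> c scores only 0.5, so a threshold of 0.6 removes it. This is the mechanism by which a higher `dependency_threshold` trades recall for robustness against noise.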

### Advanced Discovery Methods

Discover BPMN models, transition systems, and prefix trees for different modeling requirements.

```python { .api }
def discover_bpmn_inductive(log, noise_threshold=0.0, multi_processing=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
    """
    Discover BPMN model using Inductive Miner algorithm.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - noise_threshold (float): Noise threshold
    - multi_processing (bool): Enable parallel processing
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - disable_fallthroughs (bool): Disable fallthrough operations

    Returns:
    BPMN: BPMN model object
    """

def discover_transition_system(log, direction='forward', window=2, view='sequence', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover transition system representing state space of the process.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - direction (str): Direction of analysis ('forward', 'backward')
    - window (int): Window size for state construction
    - view (str): View type ('sequence', 'set')
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    TransitionSystem: Transition system model
    """

def discover_prefix_tree(log, max_path_length=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover prefix tree/trie structure from process traces.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - max_path_length (Optional[int]): Maximum path length
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Trie: Prefix tree structure
    """
```
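
A prefix tree (trie) merges traces that share the same beginning. The sketch below builds one from nested dicts and can truncate traces to a maximum length, mirroring the `max_path_length` parameter above. The nested-dict representation and the helper names are hypothetical stand-ins for pm4py's `Trie` object.

```python
def build_prefix_tree(traces, max_path_length=None):
    """Build a trie as nested dicts: {activity: {"count": n, "children": {...}}}."""
    root = {}
    for trace in traces:
        # Optionally keep only the first max_path_length activities.
        path = trace if max_path_length is None else trace[:max_path_length]
        level = root
        for activity in path:
            node = level.setdefault(activity, {"count": 0, "children": {}})
            node["count"] += 1  # number of traces passing through this prefix
            level = node["children"]
    return root


def count_nodes(level):
    """Total number of nodes below a trie level."""
    return sum(1 + count_nodes(node["children"]) for node in level.values())


traces = [["a", "b", "c"], ["a", "b", "d"], ["a", "e"]]
tree = build_prefix_tree(traces)
print(tree["a"]["count"], tree["a"]["children"]["b"]["count"], count_nodes(tree))  # 3 2 5
print(count_nodes(build_prefix_tree(traces, max_path_length=2)))  # 3
```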

### Temporal and Constraint Discovery

Discover temporal profiles that capture the typical time elapsed between pairs of activities for time-aware process analysis.

```python { .api }
def discover_temporal_profile(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover a temporal profile describing the time elapsed between pairs of activities.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Dict[Tuple[str, str], Tuple[float, float]]: For each pair (a, b) where b eventually
    follows a in at least one case, the (mean, standard deviation) of the time between them
    """
```
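
pm4py's temporal profile stores, for each activity pair, the mean and standard deviation of the time elapsed between them. The sketch below computes such a profile for cases given as lists of `(activity, timestamp_in_seconds)` tuples, pairing every event with each later event in the same case. The input format, the helper name, and the use of the population standard deviation are assumptions of the sketch, not pm4py's exact behavior.

```python
from statistics import mean, pstdev


def temporal_profile(cases):
    """Map (a, b) to (mean, population std dev) of the time from a to a later b.

    cases: list of cases, each a list of (activity, timestamp_seconds) tuples.
    """
    deltas = {}
    for case in cases:
        events = sorted(case, key=lambda event: event[1])  # order by timestamp
        for i, (a, t_a) in enumerate(events):
            for b, t_b in events[i + 1:]:
                deltas.setdefault((a, b), []).append(t_b - t_a)
    return {pair: (mean(values), pstdev(values)) for pair, values in deltas.items()}


cases = [
    [("submit", 0), ("review", 3600), ("approve", 7200)],
    [("submit", 0), ("review", 10800), ("approve", 14400)],
]
profile = temporal_profile(cases)
print(profile[("submit", "review")])  # mean 7200 s (2 h), std dev 3600 s (1 h)
```

A profile like this is typically used to flag cases whose elapsed time for a pair deviates from the mean by more than a chosen number of standard deviations.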

### Declarative Discovery

Discover declarative models such as log skeletons and DECLARE constraints, as well as POWL (Partially Ordered Workflow Language) models.

```python { .api }
def discover_log_skeleton(log, noise_threshold=0.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover log skeleton constraints from event log.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - noise_threshold (float): Noise threshold for constraint filtering
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Dict[str, Any]: Log skeleton constraints
    """

def discover_declare(log, allowed_templates=None, considered_activities=None, min_support_ratio=None, min_confidence_ratio=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover DECLARE model with temporal logic constraints.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - allowed_templates (Optional[List]): Allowed DECLARE templates
    - considered_activities (Optional[List]): Activities to consider
    - min_support_ratio (Optional[float]): Minimum support ratio
    - min_confidence_ratio (Optional[float]): Minimum confidence ratio
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Dict[str, Dict[Any, Dict[str, int]]]: DECLARE model constraints
    """

def discover_powl(log, variant=None, filtering_weight_factor=0.0, order_graph_filtering_threshold=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Discover POWL (Partially Ordered Workflow Language) model.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - variant (Optional[str]): Algorithm variant
    - filtering_weight_factor (float): Weight factor for filtering
    - order_graph_filtering_threshold (Optional[float]): Filtering threshold
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    POWL: POWL model object
    """
```
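
Log skeletons describe a log through simple relations that hold across its cases. Two examples are "always after" (every occurrence of a is eventually followed by b) and "never together" (a and b never occur in the same case). The sketch below checks these two relations exhaustively, which corresponds to a noise threshold of 0.0. It covers only a subset of what `discover_log_skeleton` returns, and the function names are hypothetical.

```python
from itertools import permutations


def always_after(traces):
    """Pairs (a, b) such that, in every trace, each occurrence of a is followed later by b."""
    activities = sorted({act for trace in traces for act in trace})
    result = set()
    for a, b in permutations(activities, 2):
        holds = all(
            b in trace[i + 1:]
            for trace in traces
            for i, act in enumerate(trace)
            if act == a
        )
        if holds:
            result.add((a, b))
    return result


def never_together(traces):
    """Unordered pairs {a, b} that never occur in the same trace."""
    activities = sorted({act for trace in traces for act in trace})
    present = [set(trace) for trace in traces]
    return {
        frozenset((a, b))
        for i, a in enumerate(activities)
        for b in activities[i + 1:]
        if not any(a in acts and b in acts for acts in present)
    }


traces = [["a", "b", "d"], ["a", "c", "d"], ["a", "b", "b", "d"]]
print(sorted(always_after(traces)))  # [('a', 'd'), ('b', 'd'), ('c', 'd')]
print(never_together(traces))  # only the pair {b, c}
```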

### Utility Discovery Functions

Discover footprints, minimum self-distances, batches, and other analytical structures from event logs, and mine DFGs from event data without case identifiers.

```python { .api }
def discover_footprints(*args):
    """
    Discover footprints from logs or models for comparison purposes.

    Parameters:
    - *args: Variable arguments (log or model objects)

    Returns:
    Union[List[Dict[str, Any]], Dict[str, Any]]: Footprint representations
    """

def derive_minimum_self_distance(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
    """
    Compute minimum self-distance for activities (loop detection).

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name

    Returns:
    Dict[str, int]: Minimum self-distances per activity
    """

def discover_batches(log, merge_distance=900, min_batch_size=2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', resource_key='org:resource'):
    """
    Discover batch activities based on temporal and resource patterns.

    Parameters:
    - log (Union[EventLog, pd.DataFrame]): Event log data
    - merge_distance (int): Maximum time distance for batching (seconds)
    - min_batch_size (int): Minimum batch size
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name
    - case_id_key (str): Case ID attribute name
    - resource_key (str): Resource attribute name

    Returns:
    List[Tuple[Tuple[str, str], int, Dict[str, Any]]]: Discovered batches
    """

def correlation_miner(df, annotation='frequency', activity_key='concept:name', timestamp_key='time:timestamp'):
    """
    Correlation miner for logs without case IDs.

    Parameters:
    - df (pd.DataFrame): Event data without case identifiers
    - annotation (str): Annotation type ('frequency', 'performance')
    - activity_key (str): Activity attribute name
    - timestamp_key (str): Timestamp attribute name

    Returns:
    Tuple[dict, dict, dict]: (dfg, start_activities, end_activities)
    """
```
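
Minimum self-distance is usually defined as the fewest events observed between two occurrences of the same activity in a case. A direct repetition such as <a, a> gives 0, and <a, b, a> gives 1, which makes the measure useful for spotting short loops. The sketch below implements that definition from scratch and is not pm4py's code. It assumes that activities which never repeat are left out of the result, and the helper name is hypothetical.

```python
def minimum_self_distance(traces):
    """Map each repeating activity to the fewest events between two of its occurrences."""
    msd = {}
    for trace in traces:
        last_seen = {}  # activity -> index of its previous occurrence in this trace
        for i, activity in enumerate(trace):
            if activity in last_seen:
                gap = i - last_seen[activity] - 1  # events strictly in between
                msd[activity] = min(gap, msd.get(activity, gap))
            last_seen[activity] = i
    return msd


traces = [["a", "b", "a", "c"], ["a", "a", "c", "d", "c"]]
print(minimum_self_distance(traces))  # {'a': 0, 'c': 1}
```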

## Usage Examples

### Basic Process Discovery

```python
import pm4py

# Load event log
log = pm4py.read_xes('event_log.xes')

# Discover Petri net using Inductive Miner (recommended)
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# Discover process tree
tree = pm4py.discover_process_tree_inductive(log)

# Discover DFG
dfg, start_activities, end_activities = pm4py.discover_dfg(log)
```

### Advanced Discovery with Parameters

```python
import pm4py

# Load event log
log = pm4py.read_xes('event_log.xes')

# Inductive Miner with noise filtering
net, im, fm = pm4py.discover_petri_net_inductive(
    log,
    noise_threshold=0.2,  # filter out infrequent behavior (threshold in 0.0-1.0)
    multi_processing=True
)

# Heuristics Miner with custom thresholds
net, im, fm = pm4py.discover_petri_net_heuristics(
    log,
    dependency_threshold=0.7,
    and_threshold=0.8,
    loop_two_threshold=0.9
)

# Performance DFG counting only business hours, aggregated by mean
perf_dfg, start_acts, end_acts = pm4py.discover_performance_dfg(
    log,
    business_hours=True,
    perf_aggregation_key='mean'
)
```

### Declarative Process Discovery

```python
import pm4py

# Load event log
log = pm4py.read_xes('event_log.xes')

# Discover DECLARE constraints
declare_model = pm4py.discover_declare(
    log,
    min_support_ratio=0.8,
    min_confidence_ratio=0.9
)

# Discover log skeleton
skeleton = pm4py.discover_log_skeleton(log, noise_threshold=0.1)

# Discover POWL model
powl_model = pm4py.discover_powl(log, filtering_weight_factor=0.5)
```