Process mining library for discovering, analyzing, and visualizing business processes from event data
Comprehensive process discovery algorithms for extracting process models from event logs. PM4PY implements classical and modern discovery techniques including Alpha Miner, Heuristics Miner, Inductive Miner, and advanced approaches like POWL and DECLARE.
Discover Petri net models using various algorithms, each with different strengths for handling noise, loops, and complex control flow.
def discover_petri_net_inductive(log, multi_processing=True, noise_threshold=0.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
"""
Discover Petri net using Inductive Miner algorithm.
Best for handling noise and guaranteeing sound process models.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- multi_processing (bool): Enable parallel processing
- noise_threshold (float): Noise threshold (0.0-1.0)
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- disable_fallthroughs (bool): Disable fallthrough operations
Returns:
Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
"""
def discover_petri_net_alpha(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover Petri net using Alpha Miner algorithm.
Classical algorithm good for structured processes without noise.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
"""
def discover_petri_net_alpha_plus(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover Petri net using Alpha+ algorithm (deprecated in 2.3.0).
Enhanced Alpha Miner with improved loop handling.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
"""
def discover_petri_net_heuristics(log, dependency_threshold=0.5, and_threshold=0.65, loop_two_threshold=0.5, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover Petri net using Heuristics Miner algorithm.
Good balance between noise handling and model precision.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- dependency_threshold (float): Dependency threshold (0.0-1.0)
- and_threshold (float): AND-split threshold
- loop_two_threshold (float): Two-loop threshold
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
"""
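The dependency, AND, and loop thresholds above all act on the Heuristics Miner's dependency measure. As an illustration (a pure-Python sketch, not pm4py's internal implementation), the classical dependency measure dep(a, b) = (|a>b| - |b>a|) / (|a>b| + |b>a| + 1) can be computed from directly-follows counts like this:

```python
from collections import Counter

def dependency_measure(traces):
    """Compute the Heuristics Miner dependency measure for each activity pair.

    dep(a, b) = (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts
    how often b directly follows a across all traces.
    """
    follows = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1
    deps = {}
    for (a, b), ab in follows.items():
        ba = follows.get((b, a), 0)
        deps[(a, b)] = (ab - ba) / (ab + ba + 1)
    return deps

traces = [["register", "check", "pay"],
          ["register", "check", "pay"],
          ["register", "pay", "check"]]
deps = dependency_measure(traces)
# "register" is directly followed by "check" twice and never the reverse:
print(deps[("register", "check")])  # 0.666...
```

Pairs whose dependency value falls below `dependency_threshold` are dropped from the resulting net, which is how the miner tolerates noise.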
def discover_petri_net_ilp(log, alpha=1.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover Petri net using ILP (Integer Linear Programming) Miner.
Optimization-based approach that discovers places by solving integer linear programs.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- alpha (float): Alpha parameter for optimization
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
"""
Discover hierarchical process tree models that provide structured representations of process behavior.
def discover_process_tree_inductive(log, noise_threshold=0.0, multi_processing=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
"""
Discover process tree using Inductive Miner algorithm.
Guarantees sound, block-structured process models.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- noise_threshold (float): Noise threshold for filtering
- multi_processing (bool): Enable parallel processing
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- disable_fallthroughs (bool): Disable fallthrough operations
Returns:
ProcessTree: Hierarchical process tree model
"""
Discover graph-based process representations including Directly-Follows Graphs and performance-enhanced variants.
def discover_dfg(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover Directly-Follows Graph showing direct successor relationships.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[dict, dict, dict]: (dfg_dict, start_activities, end_activities)
"""
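The return shape documented above is simple enough to sketch by hand. The following pure-Python illustration (not pm4py's implementation) builds the same three structures from a list of traces:

```python
from collections import Counter

def dfg_from_traces(traces):
    """Build a directly-follows graph in the shape discover_dfg documents:
    (dfg_dict, start_activities, end_activities), all keyed by frequency."""
    dfg = Counter()
    starts = Counter()
    ends = Counter()
    for trace in traces:
        if not trace:
            continue
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dict(dfg), dict(starts), dict(ends)

traces = [["register", "check", "pay"], ["register", "pay"]]
dfg, sa, ea = dfg_from_traces(traces)
print(dfg)      # {('register', 'check'): 1, ('check', 'pay'): 1, ('register', 'pay'): 1}
print(sa, ea)   # {'register': 2} {'pay': 2}
```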
def discover_dfg_typed(log, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp'):
"""
Discover the DFG as a typed object that bundles the graph with its start and end activities.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- case_id_key (str): Case ID attribute name
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
Returns:
DFG: Typed directly-follows graph object
"""
def discover_directly_follows_graph(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Alias for discover_dfg function.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[dict, dict, dict]: (dfg_dict, start_activities, end_activities)
"""
def discover_performance_dfg(log, business_hours=False, business_hour_slots=None, workcalendar=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', perf_aggregation_key='all'):
"""
Discover performance DFG with timing information between activities.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- business_hours (bool): Consider only business hours
- business_hour_slots (Optional[List]): Business hour time slots
- workcalendar (Optional): Work calendar for time calculations
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- perf_aggregation_key (str): Performance aggregation method
Returns:
Tuple[dict, dict, dict]: (performance_dfg, start_activities, end_activities)
"""
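A performance DFG annotates each directly-follows edge with elapsed-time statistics instead of frequencies. As a rough illustration of what `perf_aggregation_key='mean'` produces (a simplified sketch, ignoring business hours and work calendars), one can aggregate the seconds between consecutive timestamped events per activity pair:

```python
from collections import defaultdict
from datetime import datetime

def mean_performance_dfg(traces):
    """Sketch of a performance DFG: mean elapsed seconds between directly
    following events. Each trace is a list of (activity, datetime) pairs."""
    durations = defaultdict(list)
    for trace in traces:
        for (a, ta), (b, tb) in zip(trace, trace[1:]):
            durations[(a, b)].append((tb - ta).total_seconds())
    return {pair: sum(v) / len(v) for pair, v in durations.items()}

trace = [("register", datetime(2024, 1, 1, 9, 0)),
         ("check", datetime(2024, 1, 1, 9, 30)),
         ("pay", datetime(2024, 1, 1, 10, 0))]
perf = mean_performance_dfg([trace])
print(perf[("register", "check")])  # 1800.0 (30 minutes)
```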
def discover_eventually_follows_graph(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover eventually-follows relationships between activities.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Dict[Tuple[str, str], int]: Eventually-follows relationships with frequencies
"""
Discover heuristics nets that balance precision and noise tolerance using frequency and dependency metrics.
def discover_heuristics_net(log, dependency_threshold=0.5, and_threshold=0.65, loop_two_threshold=0.5, min_act_count=1, min_dfg_occurrences=1, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', decoration='frequency'):
"""
Discover heuristics net using frequency and dependency heuristics.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- dependency_threshold (float): Dependency threshold
- and_threshold (float): AND-split threshold
- loop_two_threshold (float): Two-loop threshold
- min_act_count (int): Minimum activity count
- min_dfg_occurrences (int): Minimum DFG occurrences
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- decoration (str): Decoration type ('frequency', 'performance')
Returns:
HeuristicsNet: Heuristics net object
"""
Discover BPMN models and transition systems for different modeling requirements.
def discover_bpmn_inductive(log, noise_threshold=0.0, multi_processing=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
"""
Discover BPMN model using Inductive Miner algorithm.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- noise_threshold (float): Noise threshold
- multi_processing (bool): Enable parallel processing
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- disable_fallthroughs (bool): Disable fallthrough operations
Returns:
BPMN: BPMN model object
"""
def discover_transition_system(log, direction='forward', window=2, view='sequence', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover transition system representing state space of the process.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- direction (str): Direction of analysis ('forward', 'backward')
- window (int): Window size for state construction
- view (str): View type ('sequence', 'set')
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
TransitionSystem: Transition system model
"""
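To make the `window` and `view` parameters concrete, here is a pure-Python sketch (an illustration of the abstraction idea, not pm4py's implementation) of a forward, sequence-view transition system where each state is the tuple of the last `window` activities observed:

```python
from collections import defaultdict

def transition_system(traces, window=2):
    """Sketch of a forward, sequence-view transition system: states are tuples
    of the last `window` activities; executing an activity moves the state."""
    transitions = defaultdict(set)
    for trace in traces:
        state = ()
        for act in trace:
            new_state = (state + (act,))[-window:]
            transitions[state].add((act, new_state))
            state = new_state
    return dict(transitions)

ts = transition_system([["a", "b", "c"], ["a", "b", "d"]])
# From state ('a', 'b') the process can branch to 'c' or 'd':
print(ts[("a", "b")])  # {('c', ('b', 'c')), ('d', ('b', 'd'))}
```

With `view='set'` the state would instead be the (unordered) set of recent activities, yielding a coarser abstraction.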
def discover_prefix_tree(log, max_path_length=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover prefix tree/trie structure from process traces.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- max_path_length (Optional[int]): Maximum path length
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Trie: Prefix tree structure
"""
Discover temporal profiles and constraint-based models for time-aware process analysis.
def discover_temporal_profile(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover temporal profile showing time relationships between activities.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Dict[Tuple[str, str], Tuple[float, float]]: Per activity pair, aggregate timing statistics (mean and standard deviation of the elapsed time between them)
"""
Discover declarative models including log skeletons and DECLARE constraints.
def discover_log_skeleton(log, noise_threshold=0.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover log skeleton constraints from event log.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- noise_threshold (float): Noise threshold for constraint filtering
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Dict[str, Any]: Log skeleton constraints
"""
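A log skeleton is a set of declarative relations over activities. As a flavor of what one such relation looks like, here is a pure-Python sketch (one simplified relation only, not pm4py's full skeleton) of an "always after" relation: the activities that occur after every occurrence of a given activity, in every trace:

```python
def always_after(traces):
    """Sketch of one log-skeleton relation: for each activity a, the set of
    activities that occur after *every* occurrence of a, in every trace."""
    result = {}
    activities = {a for t in traces for a in t}
    for a in activities:
        candidates = set(activities)
        for trace in traces:
            for i, act in enumerate(trace):
                if act == a:
                    candidates &= set(trace[i + 1:])
        result[a] = candidates
    return result

rel = always_after([["register", "check", "pay"], ["register", "pay"]])
print(rel["register"])  # {'pay'} — 'check' is missing from the second trace
```

The `noise_threshold` parameter relaxes such relations so they need only hold in most traces rather than all of them.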
def discover_declare(log, allowed_templates=None, considered_activities=None, min_support_ratio=None, min_confidence_ratio=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover DECLARE model with temporal logic constraints.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- allowed_templates (Optional[List]): Allowed DECLARE templates
- considered_activities (Optional[List]): Activities to consider
- min_support_ratio (Optional[float]): Minimum support ratio
- min_confidence_ratio (Optional[float]): Minimum confidence ratio
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Dict[str, Dict[Any, Dict[str, int]]]: DECLARE model constraints
"""
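To illustrate what a DECLARE template checks, here is a pure-Python sketch of the classic response(a, b) constraint ("every a is eventually followed by b") with a simplified per-trace confidence notion; the names and the exact confidence definition are illustrative, not pm4py's implementation:

```python
def response_confidence(traces, a, b):
    """Fraction of traces containing `a` in which every occurrence of `a`
    is eventually followed by `b` (simplified confidence, for illustration)."""
    relevant = satisfied = 0
    for trace in traces:
        if a not in trace:
            continue
        relevant += 1
        if all(b in trace[i + 1:] for i, act in enumerate(trace) if act == a):
            satisfied += 1
    return satisfied / relevant if relevant else 0.0

traces = [["a", "x", "b"],   # satisfied
          ["a", "x"],        # violated: no b after a
          ["x", "b"]]        # not relevant: no a
print(response_confidence(traces, "a", "b"))  # 0.5
```

`min_support_ratio` and `min_confidence_ratio` prune constraints whose support or confidence over the log falls below the given thresholds.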
def discover_powl(log, variant=None, filtering_weight_factor=0.0, order_graph_filtering_threshold=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover POWL (Partially Ordered Workflow Language) model.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- variant (Optional[str]): Algorithm variant
- filtering_weight_factor (float): Weight factor for filtering
- order_graph_filtering_threshold (Optional[float]): Filtering threshold
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
POWL: POWL model object
"""
Discover footprints, batches, and other analytical structures from event logs.
def discover_footprints(*args):
"""
Discover footprints from logs or models for comparison purposes.
Parameters:
- *args: Variable arguments (log or model objects)
Returns:
Union[List[Dict[str, Any]], Dict[str, Any]]: Footprint representations
"""
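A footprint classifies every activity pair by its ordering relation. The following pure-Python sketch (illustrative, not pm4py's representation) derives the classic footprint matrix of the Alpha Miner from directly-follows pairs: sequence (`->` / `<-`), parallel (`||`), or no relation (`#`):

```python
def footprint_matrix(traces):
    """Sketch of a footprint: classify each activity pair as sequence (->/<-),
    parallel (||), or unrelated (#) from the directly-follows pairs."""
    follows = set()
    activities = set()
    for trace in traces:
        activities.update(trace)
        follows.update(zip(trace, trace[1:]))
    matrix = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in follows, (b, a) in follows
            matrix[(a, b)] = "||" if ab and ba else "->" if ab else "<-" if ba else "#"
    return matrix

fp = footprint_matrix([["a", "b", "c"], ["a", "c", "b"]])
print(fp[("b", "c")])  # '||' — b and c occur in both orders
print(fp[("a", "b")])  # '->'
```

Comparing the footprints of a log and a model is a cheap conformance check, which is why `discover_footprints` accepts either.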
def derive_minimum_self_distance(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Compute minimum self-distance for activities (loop detection).
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Dict[str, int]: Minimum self-distances per activity
"""
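The idea behind minimum self-distance is easy to sketch in pure Python (an illustration of the concept, not pm4py's implementation): for each activity, the smallest number of events separating two of its occurrences within a trace; small distances hint at short loops.

```python
def minimum_self_distance(traces):
    """Smallest number of events between two occurrences of the same
    activity within a trace (used for loop detection)."""
    msd = {}
    for trace in traces:
        last_seen = {}
        for i, act in enumerate(trace):
            if act in last_seen:
                dist = i - last_seen[act] - 1
                msd[act] = min(msd.get(act, dist), dist)
            last_seen[act] = i
    return msd

print(minimum_self_distance([["a", "b", "a", "c", "b"]]))  # {'a': 1, 'b': 2}
```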
def discover_batches(log, merge_distance=900, min_batch_size=2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', resource_key='org:resource'):
"""
Discover batch activities based on temporal and resource patterns.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- merge_distance (int): Maximum time distance for batching (seconds)
- min_batch_size (int): Minimum batch size
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- resource_key (str): Resource attribute name
Returns:
List[Tuple[Tuple[str, str], int, Dict[str, Any]]]: Discovered batches
"""
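The core batching idea can be sketched in pure Python (a simplified illustration; pm4py additionally distinguishes batch subtypes such as simultaneous or sequential batching): events performed by the same resource for the same activity are merged into one batch when they lie at most `merge_distance` seconds apart:

```python
from collections import defaultdict

def find_batches(events, merge_distance=900, min_batch_size=2):
    """Sketch of batch detection over (activity, resource, timestamp) events:
    merge same-key events whose gap is <= merge_distance seconds, and keep
    groups of at least min_batch_size events."""
    grouped = defaultdict(list)
    for activity, resource, timestamp in events:
        grouped[(activity, resource)].append(timestamp)
    batches = []
    for key, times in grouped.items():
        times.sort()
        current = [times[0]]
        for t in times[1:]:
            if t - current[-1] <= merge_distance:
                current.append(t)
            else:
                if len(current) >= min_batch_size:
                    batches.append((key, len(current)))
                current = [t]
        if len(current) >= min_batch_size:
            batches.append((key, len(current)))
    return batches

events = [("approve", "alice", 0), ("approve", "alice", 100),
          ("approve", "alice", 5000), ("approve", "bob", 50)]
print(find_batches(events))  # [(('approve', 'alice'), 2)]
```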
def correlation_miner(df, annotation='frequency', activity_key='concept:name', timestamp_key='time:timestamp'):
"""
Correlation miner for logs without case IDs.
Parameters:
- df (pd.DataFrame): Event data without case identifiers
- annotation (str): Annotation type ('frequency', 'performance')
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
Returns:
Tuple[dict, dict, dict]: (dfg, start_activities, end_activities)
"""

import pm4py
# Load event log
log = pm4py.read_xes('event_log.xes')
# Discover Petri net using Inductive Miner (recommended)
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
# Discover process tree
tree = pm4py.discover_process_tree_inductive(log)
# Discover DFG
dfg, start_activities, end_activities = pm4py.discover_dfg(log)

import pm4py
# Inductive Miner with noise handling
net, im, fm = pm4py.discover_petri_net_inductive(
log,
noise_threshold=0.2, # Handle 20% noise
multi_processing=True
)
# Heuristics Miner with custom thresholds
net, im, fm = pm4py.discover_petri_net_heuristics(
log,
dependency_threshold=0.7,
and_threshold=0.8,
loop_two_threshold=0.9
)
# Performance DFG with business hours
perf_dfg, start_acts, end_acts = pm4py.discover_performance_dfg(
log,
business_hours=True,
perf_aggregation_key='mean'
)

import pm4py
# Discover DECLARE constraints
declare_model = pm4py.discover_declare(
log,
min_support_ratio=0.8,
min_confidence_ratio=0.9
)
# Discover log skeleton
skeleton = pm4py.discover_log_skeleton(log, noise_threshold=0.1)
# Discover POWL model
powl_model = pm4py.discover_powl(log, filtering_weight_factor=0.5)

Install with Tessl CLI
npx tessl i tessl/pypi-pm4py