Process mining library for discovering, analyzing, and visualizing business processes from event data
Comprehensive process discovery algorithms for extracting process models from event logs. PM4PY implements classical and modern discovery techniques including Alpha Miner, Heuristics Miner, Inductive Miner, and advanced approaches like POWL and DECLARE.
Discover Petri net models using various algorithms, each with different strengths for handling noise, loops, and complex control flow.
def discover_petri_net_inductive(log, multi_processing=True, noise_threshold=0.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
"""
Discover Petri net using Inductive Miner algorithm.
Best for handling noise and guaranteeing sound process models.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- multi_processing (bool): Enable parallel processing
- noise_threshold (float): Noise threshold (0.0-1.0)
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- disable_fallthroughs (bool): Disable fallthrough operations
Returns:
Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
"""
def discover_petri_net_alpha(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover Petri net using Alpha Miner algorithm.
Classical algorithm good for structured processes without noise.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
"""
def discover_petri_net_alpha_plus(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover Petri net using Alpha+ algorithm (deprecated in 2.3.0).
Enhanced Alpha Miner with improved loop handling.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
"""
def discover_petri_net_heuristics(log, dependency_threshold=0.5, and_threshold=0.65, loop_two_threshold=0.5, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover Petri net using Heuristics Miner algorithm.
Good balance between noise handling and model precision.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- dependency_threshold (float): Dependency threshold (0.0-1.0)
- and_threshold (float): AND-split threshold
- loop_two_threshold (float): Two-loop threshold
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
"""
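The dependency, AND, and loop thresholds above all act on the Heuristics Miner's dependency measure. As an illustration (a pure-Python sketch, not pm4py's internal implementation), the classical dependency measure dep(a, b) = (|a>b| - |b>a|) / (|a>b| + |b>a| + 1) can be computed from directly-follows counts like this:

```python
from collections import Counter

def dependency_measure(traces):
    """Compute the Heuristics Miner dependency measure for each activity pair.

    dep(a, b) = (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts
    how often b directly follows a across all traces.
    """
    follows = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1
    deps = {}
    for (a, b), ab in follows.items():
        ba = follows.get((b, a), 0)
        deps[(a, b)] = (ab - ba) / (ab + ba + 1)
    return deps

traces = [["register", "check", "pay"],
          ["register", "check", "pay"],
          ["register", "pay", "check"]]
deps = dependency_measure(traces)
# "register" is directly followed by "check" twice and never the reverse:
print(deps[("register", "check")])  # 0.666...
```

Pairs whose dependency value falls below `dependency_threshold` are dropped from the resulting net, which is how the miner tolerates noise.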
def discover_petri_net_ilp(log, alpha=1.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover Petri net using ILP (Integer Linear Programming) Miner.
Optimization-based approach that discovers places by solving integer linear programs.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- alpha (float): Alpha parameter for optimization
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[PetriNet, Marking, Marking]: (petri_net, initial_marking, final_marking)
"""
Discover hierarchical process tree models that provide structured representations of process behavior.
def discover_process_tree_inductive(log, noise_threshold=0.0, multi_processing=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
"""
Discover process tree using Inductive Miner algorithm.
Guarantees sound, block-structured process models.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- noise_threshold (float): Noise threshold for filtering
- multi_processing (bool): Enable parallel processing
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- disable_fallthroughs (bool): Disable fallthrough operations
Returns:
ProcessTree: Hierarchical process tree model
"""
Discover graph-based process representations including Directly-Follows Graphs and performance-enhanced variants.
def discover_dfg(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover Directly-Follows Graph showing direct successor relationships.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[dict, dict, dict]: (dfg_dict, start_activities, end_activities)
"""
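The return shape documented above is simple enough to sketch by hand. The following pure-Python illustration (not pm4py's implementation) builds the same three structures from a list of traces:

```python
from collections import Counter

def dfg_from_traces(traces):
    """Build a directly-follows graph in the shape discover_dfg documents:
    (dfg_dict, start_activities, end_activities), all keyed by frequency."""
    dfg = Counter()
    starts = Counter()
    ends = Counter()
    for trace in traces:
        if not trace:
            continue
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dict(dfg), dict(starts), dict(ends)

traces = [["register", "check", "pay"], ["register", "pay"]]
dfg, sa, ea = dfg_from_traces(traces)
print(dfg)      # {('register', 'check'): 1, ('check', 'pay'): 1, ('register', 'pay'): 1}
print(sa, ea)   # {'register': 2} {'pay': 2}
```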
def discover_dfg_typed(log, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp'):
"""
Discover the DFG as a typed object that bundles the graph with its start and end activities.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- case_id_key (str): Case ID attribute name
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
Returns:
DFG: Typed directly-follows graph object
"""
def discover_directly_follows_graph(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Alias for discover_dfg function.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Tuple[dict, dict, dict]: (dfg_dict, start_activities, end_activities)
"""
def discover_performance_dfg(log, business_hours=False, business_hour_slots=None, workcalendar=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', perf_aggregation_key='all'):
"""
Discover performance DFG with timing information between activities.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- business_hours (bool): Consider only business hours
- business_hour_slots (Optional[List]): Business hour time slots
- workcalendar (Optional): Work calendar for time calculations
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- perf_aggregation_key (str): Performance aggregation method
Returns:
Tuple[dict, dict, dict]: (performance_dfg, start_activities, end_activities)
"""
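A performance DFG annotates each directly-follows edge with elapsed-time statistics instead of frequencies. As a rough illustration of what `perf_aggregation_key='mean'` produces (a simplified sketch, ignoring business hours and work calendars), one can aggregate the seconds between consecutive timestamped events per activity pair:

```python
from collections import defaultdict
from datetime import datetime

def mean_performance_dfg(traces):
    """Sketch of a performance DFG: mean elapsed seconds between directly
    following events. Each trace is a list of (activity, datetime) pairs."""
    durations = defaultdict(list)
    for trace in traces:
        for (a, ta), (b, tb) in zip(trace, trace[1:]):
            durations[(a, b)].append((tb - ta).total_seconds())
    return {pair: sum(v) / len(v) for pair, v in durations.items()}

trace = [("register", datetime(2024, 1, 1, 9, 0)),
         ("check", datetime(2024, 1, 1, 9, 30)),
         ("pay", datetime(2024, 1, 1, 10, 0))]
perf = mean_performance_dfg([trace])
print(perf[("register", "check")])  # 1800.0 (30 minutes)
```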
def discover_eventually_follows_graph(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover eventually-follows relationships between activities.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Dict[Tuple[str, str], int]: Eventually-follows relationships with frequencies
"""
Discover heuristics nets that balance precision and noise tolerance using frequency and dependency metrics.
def discover_heuristics_net(log, dependency_threshold=0.5, and_threshold=0.65, loop_two_threshold=0.5, min_act_count=1, min_dfg_occurrences=1, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', decoration='frequency'):
"""
Discover heuristics net using frequency and dependency heuristics.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- dependency_threshold (float): Dependency threshold
- and_threshold (float): AND-split threshold
- loop_two_threshold (float): Two-loop threshold
- min_act_count (int): Minimum activity count
- min_dfg_occurrences (int): Minimum DFG occurrences
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- decoration (str): Decoration type ('frequency', 'performance')
Returns:
HeuristicsNet: Heuristics net object
"""
Discover BPMN models and transition systems for different modeling requirements.
def discover_bpmn_inductive(log, noise_threshold=0.0, multi_processing=True, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', disable_fallthroughs=False):
"""
Discover BPMN model using Inductive Miner algorithm.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- noise_threshold (float): Noise threshold
- multi_processing (bool): Enable parallel processing
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- disable_fallthroughs (bool): Disable fallthrough operations
Returns:
BPMN: BPMN model object
"""
def discover_transition_system(log, direction='forward', window=2, view='sequence', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover transition system representing state space of the process.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- direction (str): Direction of analysis ('forward', 'backward')
- window (int): Window size for state construction
- view (str): View type ('sequence', 'set')
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
TransitionSystem: Transition system model
"""
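To make the `window` and `view` parameters concrete, here is a pure-Python sketch (an illustration of the abstraction idea, not pm4py's implementation) of a forward, sequence-view transition system where each state is the tuple of the last `window` activities observed:

```python
from collections import defaultdict

def transition_system(traces, window=2):
    """Sketch of a forward, sequence-view transition system: states are tuples
    of the last `window` activities; executing an activity moves the state."""
    transitions = defaultdict(set)
    for trace in traces:
        state = ()
        for act in trace:
            new_state = (state + (act,))[-window:]
            transitions[state].add((act, new_state))
            state = new_state
    return dict(transitions)

ts = transition_system([["a", "b", "c"], ["a", "b", "d"]])
# From state ('a', 'b') the process can branch to 'c' or 'd':
print(ts[("a", "b")])  # {('c', ('b', 'c')), ('d', ('b', 'd'))}
```

With `view='set'` the state would instead be the (unordered) set of recent activities, yielding a coarser abstraction.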
def discover_prefix_tree(log, max_path_length=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover prefix tree/trie structure from process traces.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- max_path_length (Optional[int]): Maximum path length
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Trie: Prefix tree structure
"""
Discover temporal profiles and constraint-based models for time-aware process analysis.
def discover_temporal_profile(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover temporal profile showing time relationships between activities.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Dict[Tuple[str, str], Tuple[float, float]]: Per activity pair, aggregate timing statistics (mean and standard deviation of the elapsed time between them)
"""
Discover declarative models including log skeletons and DECLARE constraints.
def discover_log_skeleton(log, noise_threshold=0.0, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover log skeleton constraints from event log.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- noise_threshold (float): Noise threshold for constraint filtering
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Dict[str, Any]: Log skeleton constraints
"""
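A log skeleton is a set of declarative relations over activities. As a flavor of what one such relation looks like, here is a pure-Python sketch (one simplified relation only, not pm4py's full skeleton) of an "always after" relation: the activities that occur after every occurrence of a given activity, in every trace:

```python
def always_after(traces):
    """Sketch of one log-skeleton relation: for each activity a, the set of
    activities that occur after *every* occurrence of a, in every trace."""
    result = {}
    activities = {a for t in traces for a in t}
    for a in activities:
        candidates = set(activities)
        for trace in traces:
            for i, act in enumerate(trace):
                if act == a:
                    candidates &= set(trace[i + 1:])
        result[a] = candidates
    return result

rel = always_after([["register", "check", "pay"], ["register", "pay"]])
print(rel["register"])  # {'pay'} — 'check' is missing from the second trace
```

The `noise_threshold` parameter relaxes such relations so they need only hold in most traces rather than all of them.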
def discover_declare(log, allowed_templates=None, considered_activities=None, min_support_ratio=None, min_confidence_ratio=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover DECLARE model with temporal logic constraints.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- allowed_templates (Optional[List]): Allowed DECLARE templates
- considered_activities (Optional[List]): Activities to consider
- min_support_ratio (Optional[float]): Minimum support ratio
- min_confidence_ratio (Optional[float]): Minimum confidence ratio
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Dict[str, Dict[Any, Dict[str, int]]]: DECLARE model constraints
"""
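To illustrate what a DECLARE template checks, here is a pure-Python sketch of the classic response(a, b) constraint ("every a is eventually followed by b") with a simplified per-trace confidence notion; the names and the exact confidence definition are illustrative, not pm4py's implementation:

```python
def response_confidence(traces, a, b):
    """Fraction of traces containing `a` in which every occurrence of `a`
    is eventually followed by `b` (simplified confidence, for illustration)."""
    relevant = satisfied = 0
    for trace in traces:
        if a not in trace:
            continue
        relevant += 1
        if all(b in trace[i + 1:] for i, act in enumerate(trace) if act == a):
            satisfied += 1
    return satisfied / relevant if relevant else 0.0

traces = [["a", "x", "b"],   # satisfied
          ["a", "x"],        # violated: no b after a
          ["x", "b"]]        # not relevant: no a
print(response_confidence(traces, "a", "b"))  # 0.5
```

`min_support_ratio` and `min_confidence_ratio` prune constraints whose support or confidence over the log falls below the given thresholds.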
def discover_powl(log, variant=None, filtering_weight_factor=0.0, order_graph_filtering_threshold=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Discover POWL (Partially Ordered Workflow Language) model.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- variant (Optional[str]): Algorithm variant
- filtering_weight_factor (float): Weight factor for filtering
- order_graph_filtering_threshold (Optional[float]): Filtering threshold
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
POWL: POWL model object
"""
Discover footprints, batches, and other analytical structures from event logs.
def discover_footprints(*args):
"""
Discover footprints from logs or models for comparison purposes.
Parameters:
- *args: Variable arguments (log or model objects)
Returns:
Union[List[Dict[str, Any]], Dict[str, Any]]: Footprint representations
"""
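A footprint classifies every activity pair by its ordering relation. The following pure-Python sketch (illustrative, not pm4py's representation) derives the classic footprint matrix of the Alpha Miner from directly-follows pairs: sequence (`->` / `<-`), parallel (`||`), or no relation (`#`):

```python
def footprint_matrix(traces):
    """Sketch of a footprint: classify each activity pair as sequence (->/<-),
    parallel (||), or unrelated (#) from the directly-follows pairs."""
    follows = set()
    activities = set()
    for trace in traces:
        activities.update(trace)
        follows.update(zip(trace, trace[1:]))
    matrix = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in follows, (b, a) in follows
            matrix[(a, b)] = "||" if ab and ba else "->" if ab else "<-" if ba else "#"
    return matrix

fp = footprint_matrix([["a", "b", "c"], ["a", "c", "b"]])
print(fp[("b", "c")])  # '||' — b and c occur in both orders
print(fp[("a", "b")])  # '->'
```

Comparing the footprints of a log and a model is a cheap conformance check, which is why `discover_footprints` accepts either.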
def derive_minimum_self_distance(log, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name'):
"""
Compute minimum self-distance for activities (loop detection).
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
Returns:
Dict[str, int]: Minimum self-distances per activity
"""
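The idea behind minimum self-distance is easy to sketch in pure Python (an illustration of the concept, not pm4py's implementation): for each activity, the smallest number of events separating two of its occurrences within a trace; small distances hint at short loops.

```python
def minimum_self_distance(traces):
    """Smallest number of events between two occurrences of the same
    activity within a trace (used for loop detection)."""
    msd = {}
    for trace in traces:
        last_seen = {}
        for i, act in enumerate(trace):
            if act in last_seen:
                dist = i - last_seen[act] - 1
                msd[act] = min(msd.get(act, dist), dist)
            last_seen[act] = i
    return msd

print(minimum_self_distance([["a", "b", "a", "c", "b"]]))  # {'a': 1, 'b': 2}
```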
def discover_batches(log, merge_distance=900, min_batch_size=2, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', resource_key='org:resource'):
"""
Discover batch activities based on temporal and resource patterns.
Parameters:
- log (Union[EventLog, pd.DataFrame]): Event log data
- merge_distance (int): Maximum time distance for batching (seconds)
- min_batch_size (int): Minimum batch size
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
- case_id_key (str): Case ID attribute name
- resource_key (str): Resource attribute name
Returns:
List[Tuple[Tuple[str, str], int, Dict[str, Any]]]: Discovered batches
"""
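The core batching idea can be sketched in pure Python (a simplified illustration; pm4py additionally distinguishes batch subtypes such as simultaneous or sequential batching): events performed by the same resource for the same activity are merged into one batch when they lie at most `merge_distance` seconds apart:

```python
from collections import defaultdict

def find_batches(events, merge_distance=900, min_batch_size=2):
    """Sketch of batch detection over (activity, resource, timestamp) events:
    merge same-key events whose gap is <= merge_distance seconds, and keep
    groups of at least min_batch_size events."""
    grouped = defaultdict(list)
    for activity, resource, timestamp in events:
        grouped[(activity, resource)].append(timestamp)
    batches = []
    for key, times in grouped.items():
        times.sort()
        current = [times[0]]
        for t in times[1:]:
            if t - current[-1] <= merge_distance:
                current.append(t)
            else:
                if len(current) >= min_batch_size:
                    batches.append((key, len(current)))
                current = [t]
        if len(current) >= min_batch_size:
            batches.append((key, len(current)))
    return batches

events = [("approve", "alice", 0), ("approve", "alice", 100),
          ("approve", "alice", 5000), ("approve", "bob", 50)]
print(find_batches(events))  # [(('approve', 'alice'), 2)]
```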
def correlation_miner(df, annotation='frequency', activity_key='concept:name', timestamp_key='time:timestamp'):
"""
Correlation miner for logs without case IDs.
Parameters:
- df (pd.DataFrame): Event data without case identifiers
- annotation (str): Annotation type ('frequency', 'performance')
- activity_key (str): Activity attribute name
- timestamp_key (str): Timestamp attribute name
Returns:
Tuple[dict, dict, dict]: (dfg, start_activities, end_activities)
"""

import pm4py
# Load event log
log = pm4py.read_xes('event_log.xes')
# Discover Petri net using Inductive Miner (recommended)
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
# Discover process tree
tree = pm4py.discover_process_tree_inductive(log)
# Discover DFG
dfg, start_activities, end_activities = pm4py.discover_dfg(log)

import pm4py
# Inductive Miner with noise handling
net, im, fm = pm4py.discover_petri_net_inductive(
log,
noise_threshold=0.2, # Handle 20% noise
multi_processing=True
)
# Heuristics Miner with custom thresholds
net, im, fm = pm4py.discover_petri_net_heuristics(
log,
dependency_threshold=0.7,
and_threshold=0.8,
loop_two_threshold=0.9
)
# Performance DFG with business hours
perf_dfg, start_acts, end_acts = pm4py.discover_performance_dfg(
log,
business_hours=True,
perf_aggregation_key='mean'
)

import pm4py
# Discover DECLARE constraints
declare_model = pm4py.discover_declare(
log,
min_support_ratio=0.8,
min_confidence_ratio=0.9
)
# Discover log skeleton
skeleton = pm4py.discover_log_skeleton(log, noise_threshold=0.1)
# Discover POWL model
powl_model = pm4py.discover_powl(log, filtering_weight_factor=0.5)

Install with Tessl CLI
npx tessl i tessl/pypi-pm4py