Distance measures for time series with Dynamic Time Warping as the primary focus
—
Advanced DTW with custom weighting functions and machine learning integration for learning optimal feature weights from labeled data. This module enables domain-specific DTW customization through learned weights, decision tree-based feature importance, and constraint incorporation from must-link/cannot-link relationships.
DTW computation with custom weighting functions that modify the local distance calculations based on learned or domain-specific importance patterns.
def warping_paths(s1, s2, weights=None, window=None, **kwargs):
"""
DTW with custom weight functions.
Applies position-dependent or feature-dependent weights to modify
the local distance computation during DTW alignment.
Parameters:
- s1, s2: array-like, input sequences
- weights: array-like/function, weight values or weighting function
- window: int, warping window constraint
- **kwargs: additional DTW parameters
Returns:
tuple: (distance, paths_matrix)
- distance: float, weighted DTW distance
- paths_matrix: 2D array, accumulated cost matrix with weights applied
"""
def distance_matrix(s, weights, window=None, show_progress=False, **kwargs):
"""
Distance matrix computation with weights.
Computes pairwise weighted DTW distances between all sequences
in a collection using the specified weighting scheme.
Parameters:
- s: list/array, collection of sequences
- weights: array-like/function, weights to apply during distance computation
- window: int, warping window constraint
- show_progress: bool, display progress bar
- **kwargs: additional DTW parameters
Returns:
array: weighted distance matrix
"""Learn optimal weights from labeled time series data using decision tree algorithms specifically designed for temporal data analysis.
def compute_weights_using_dt(series, labels, prototypeidx, **kwargs):
"""
Learn weights using decision trees.
Trains decision tree classifiers to identify discriminative time points
or features for distinguishing between different time series classes.
Parameters:
- series: list/array, collection of time series sequences
- labels: array-like, class labels for each sequence
- prototypeidx: int or list of ints, index (or indices) of the prototype sequence(s) used as class reference
- **kwargs: additional parameters for decision tree training
Returns:
tuple: (weights, importances)
- weights: array, learned importance weights for time points/features
- importances: array, feature importance scores from decision trees
"""
def series_to_dt(series, labels, prototypeidx, classifier=None, max_clfs=None,
min_ig=0, **kwargs):
"""
Convert time series to decision tree features.
Extracts features from time series data and prepares them for
decision tree classification, enabling weight learning.
Parameters:
- series: list/array, time series collection
- labels: array-like, class labels
- prototypeidx: int or list of ints, prototype sequence index or indices
- classifier: classifier object, optional pre-configured classifier
- max_clfs: int, maximum number of classifiers to train
- min_ig: float, minimum information gain threshold
- **kwargs: additional feature extraction parameters
Returns:
tuple: (ml_values, cl_values, classifiers, importances)
- ml_values: array, must-link constraint values
- cl_values: array, cannot-link constraint values
- classifiers: list, trained decision tree classifiers
- importances: array, feature importance scores
"""Convert must-link and cannot-link constraints into weight values for DTW distance computation.
def compute_weights_from_mlclvalues(serie, ml_values, cl_values, only_max=False,
strict_cl=True, **kwargs):
"""
Compute weights from must-link/cannot-link values.
Converts constraint information (which time points should be linked
vs separated) into weight values for biasing DTW computations.
Parameters:
- serie: array-like, reference time series sequence
- ml_values: array, must-link constraint strengths
- cl_values: array, cannot-link constraint strengths
- only_max: bool, use only maximum constraint values
- strict_cl: bool, apply cannot-link constraints strictly
- **kwargs: additional weight computation parameters
Returns:
array: computed weight values for DTW distance modification
"""Specialized plotting functions for visualizing learned weights and their effects on time series analysis.
def plot_margins(serie, weights, filename=None, ax=None, origin=(0, 0),
scaling=(1, 1), y_limit=None, importances=None):
"""
Plot weight margins on time series.
Visualizes the learned or assigned weights overlaid on the time series,
showing which time points or regions are considered most important.
Parameters:
- serie: array-like, time series sequence to plot
- weights: array-like, weight values corresponding to time points
- filename: str, optional file path to save plot
- ax: matplotlib axis, optional axis for plotting
- origin: tuple, plot origin coordinates
- scaling: tuple, scaling factors for axes
- y_limit: tuple, y-axis limits
- importances: array, optional feature importance values
Returns:
tuple: (figure, axes) matplotlib objects
"""Custom decision tree implementation optimized for time series weight learning with temporal-specific splitting criteria.
class DecisionTreeClassifier:
"""
Custom decision tree for DTW weight learning.
Specialized decision tree that considers temporal relationships
and DTW-specific constraints when learning feature importance.
"""
def __init__(self):
"""Initialize decision tree classifier."""
def fit(self, features, targets, use_feature_once=True,
ignore_features=None, min_ig=0):
"""
Train decision tree classifier.
Parameters:
- features: array, feature matrix from time series
- targets: array, target labels for classification
- use_feature_once: bool, prevent reusing features in same path
- ignore_features: list, features to exclude from consideration
- min_ig: float, minimum information gain for splits
Returns:
self: fitted classifier
"""
def score(self, max_kd):
"""
Calculate classifier score.
Parameters:
- max_kd: float, maximum k-distance threshold
Returns:
float: classifier performance score
"""
@staticmethod
def entropy(targets):
"""
Calculate entropy of target distribution.
Parameters:
- targets: array, target labels
Returns:
float: entropy value
"""
@staticmethod
def informationgain_continuous(features, targets, threshold):
"""
Calculate information gain for continuous features.
Parameters:
- features: array, feature values
- targets: array, target labels
- threshold: float, split threshold
Returns:
float: information gain value
"""
@staticmethod
def kdistance(point1, point2):
"""
Calculate k-distance between points.
Parameters:
- point1, point2: array-like, data points
Returns:
float: k-distance value
"""
class Tree:
"""
Decision tree representation for weight learning.
Represents the structure of learned decision trees with
nodes, splits, and importance information.
"""
def add(self):
"""
Add new node to the tree.
Returns:
int: new node identifier
"""
@property
def nb_nodes(self):
"""
Get number of nodes in tree.
Returns:
int: node count
"""
@property
def used_features(self):
"""
Get set of features used in tree.
Returns:
set: feature indices used in decision tree
"""
@property
def depth(self):
"""
Get tree depth.
Returns:
int: maximum depth of decision tree
"""from dtaidistance import dtw_weighted
import numpy as np
import matplotlib.pyplot as plt
# Create time series with known important regions
np.random.seed(42)
t = np.linspace(0, 4*np.pi, 100)
# Base sequences
s1 = np.sin(t) + 0.1 * np.random.randn(100)
s2 = np.sin(t * 1.1) + 0.1 * np.random.randn(100)
# Define custom weights (higher weights = more important)
# Make the middle section more important
weights = np.ones(100)
weights[30:70] = 3.0 # Emphasize middle region
weights[45:55] = 5.0 # Highly emphasize center
# Compute weighted DTW
weighted_distance, weighted_paths = dtw_weighted.warping_paths(s1, s2, weights=weights)
# Compare with unweighted DTW
from dtaidistance import dtw
unweighted_distance, unweighted_paths = dtw.warping_paths(s1, s2)
print(f"Unweighted DTW distance: {unweighted_distance:.3f}")
print(f"Weighted DTW distance: {weighted_distance:.3f}")
# Visualize the effect of weighting
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(12, 10))
# Plot sequences with weights
ax1.plot(s1, 'b-', label='Sequence 1', linewidth=2)
ax1.plot(s2, 'r-', label='Sequence 2', linewidth=2)
ax1_twin = ax1.twinx()
ax1_twin.fill_between(range(len(weights)), 0, weights, alpha=0.3, color='green', label='Weights')
ax1.set_title('Time Series with Weight Distribution')
ax1.legend(loc='upper left')
ax1_twin.legend(loc='upper right')
ax1.grid(True)
# Plot unweighted warping paths
ax2.imshow(unweighted_paths, cmap='viridis', origin='lower')
ax2.set_title('Unweighted DTW Warping Paths')
ax2.set_xlabel('Sequence 2 Index')
ax2.set_ylabel('Sequence 1 Index')
# Plot weighted warping paths
ax3.imshow(weighted_paths, cmap='viridis', origin='lower')
ax3.set_title('Weighted DTW Warping Paths')
ax3.set_xlabel('Sequence 2 Index')
ax3.set_ylabel('Sequence 1 Index')
plt.tight_layout()
plt.show()
from dtaidistance import dtw_weighted
import numpy as np
import matplotlib.pyplot as plt
# Generate synthetic labeled time series data
np.random.seed(42)
def generate_class_data(class_type, n_samples=10, length=80):
"""Generate time series data for different classes."""
t = np.linspace(0, 4*np.pi, length)
sequences = []
for i in range(n_samples):
if class_type == 'sine':
# Sine waves with characteristic frequency
freq = 1.0 + 0.1 * np.random.randn()
signal = np.sin(freq * t) + 0.1 * np.random.randn(length)
# Add discriminative spike in middle region
spike_pos = length // 2 + np.random.randint(-5, 6)
signal[spike_pos] += 1.5
elif class_type == 'cosine':
# Cosine waves with characteristic frequency
freq = 1.2 + 0.1 * np.random.randn()
signal = np.cos(freq * t) + 0.1 * np.random.randn(length)
# Add discriminative dip in first quarter
dip_pos = length // 4 + np.random.randint(-5, 6)
signal[dip_pos] -= 1.0
elif class_type == 'linear':
# Linear trends with characteristic slope
slope = 0.5 + 0.2 * np.random.randn()
signal = slope * np.linspace(0, 1, length) + 0.1 * np.random.randn(length)
# Add discriminative oscillation in last quarter
osc_region = slice(3*length//4, length)
signal[osc_region] += 0.5 * np.sin(8 * t[osc_region])
sequences.append(signal)
return sequences
# Generate training data
class_sine = generate_class_data('sine', n_samples=8)
class_cosine = generate_class_data('cosine', n_samples=8)
class_linear = generate_class_data('linear', n_samples=6)
all_sequences = class_sine + class_cosine + class_linear
all_labels = [0] * 8 + [1] * 8 + [2] * 6
print(f"Generated {len(all_sequences)} labeled sequences")
print(f"Class distribution: {np.bincount(all_labels)}")
# Select prototype sequences (representative of each class)
prototype_indices = [0, 8, 16] # First sequence from each class
# Learn weights using decision trees
try:
weights, importances = dtw_weighted.compute_weights_using_dt(
all_sequences,
all_labels,
prototype_indices,
max_clfs=5,
min_ig=0.01
)
print(f"Learned weights shape: {weights.shape}")
print(f"Weight statistics: min={np.min(weights):.3f}, max={np.max(weights):.3f}, mean={np.mean(weights):.3f}")
# Visualize learned weights for prototype sequences
fig, axes = plt.subplots(3, 2, figsize=(14, 12))
class_names = ['Sine', 'Cosine', 'Linear']
for class_idx in range(3):
proto_seq = all_sequences[prototype_indices[class_idx]]
# Plot prototype sequence
axes[class_idx, 0].plot(proto_seq, 'b-', linewidth=2)
axes[class_idx, 0].set_title(f'{class_names[class_idx]} Class - Prototype Sequence')
axes[class_idx, 0].grid(True)
# Plot learned weights (assuming weights correspond to time points)
if weights.ndim > 1:
class_weights = weights[class_idx] if weights.shape[0] == 3 else weights[0]
else:
class_weights = weights
axes[class_idx, 1].plot(class_weights, 'r-', linewidth=2)
axes[class_idx, 1].set_title(f'{class_names[class_idx]} Class - Learned Weights')
axes[class_idx, 1].set_ylabel('Weight Importance')
axes[class_idx, 1].grid(True)
plt.tight_layout()
plt.show()
except Exception as e:
print(f"Weight learning failed: {e}")
print("Using uniform weights for demonstration")
    weights = np.ones(len(all_sequences[0]))
from dtaidistance import dtw_weighted
import numpy as np
# Create sequences with known constraint relationships
np.random.seed(42)
# Reference sequence
reference = np.sin(np.linspace(0, 4*np.pi, 60)) + 0.1 * np.random.randn(60)
# Sequence that should be similar (must-link)
similar_seq = reference + 0.2 * np.random.randn(60)
# Sequence that should be different (cannot-link)
different_seq = np.cos(np.linspace(0, 6*np.pi, 60)) + 0.1 * np.random.randn(60)
# Define must-link and cannot-link constraint values
# Higher values indicate stronger constraints
ml_values = np.zeros(len(reference))
cl_values = np.zeros(len(reference))
# Strong must-link constraints in middle region (these points should align)
ml_values[20:40] = 2.0
ml_values[28:32] = 5.0 # Very strong constraint
# Strong cannot-link constraints at the ends (these should not align)
cl_values[0:10] = 3.0
cl_values[50:60] = 3.0
# Compute weights from constraints
constraint_weights = dtw_weighted.compute_weights_from_mlclvalues(
reference,
ml_values,
cl_values,
only_max=False,
strict_cl=True
)
print(f"Constraint weights shape: {constraint_weights.shape}")
print(f"Weight range: [{np.min(constraint_weights):.3f}, {np.max(constraint_weights):.3f}]")
# Apply constraint-based weights to DTW computations
from dtaidistance import dtw
# Regular DTW distances
dist_ref_similar = dtw.distance(reference, similar_seq)
dist_ref_different = dtw.distance(reference, different_seq)
# Weighted DTW distances (if implementation supports it)
try:
weighted_dist_similar, _ = dtw_weighted.warping_paths(reference, similar_seq, weights=constraint_weights)
weighted_dist_different, _ = dtw_weighted.warping_paths(reference, different_seq, weights=constraint_weights)
print("\\nDistance Comparison:")
print(f"Reference vs Similar (regular): {dist_ref_similar:.3f}")
print(f"Reference vs Similar (weighted): {weighted_dist_similar:.3f}")
print(f"Reference vs Different (regular): {dist_ref_different:.3f}")
print(f"Reference vs Different (weighted): {weighted_dist_different:.3f}")
except Exception as e:
print(f"Weighted distance computation failed: {e}")
# Visualize constraints and weights
import matplotlib.pyplot as plt
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))
# Plot sequences
ax1.plot(reference, 'b-', label='Reference', linewidth=2)
ax1.plot(similar_seq, 'g--', label='Similar (Must-Link)', linewidth=2)
ax1.plot(different_seq, 'r:', label='Different (Cannot-Link)', linewidth=2)
ax1.set_title('Time Series with Constraint Relationships')
ax1.legend()
ax1.grid(True)
# Plot must-link constraints
ax2.fill_between(range(len(ml_values)), 0, ml_values, alpha=0.7, color='green')
ax2.set_title('Must-Link Constraints')
ax2.set_ylabel('Constraint Strength')
ax2.grid(True)
# Plot cannot-link constraints
ax3.fill_between(range(len(cl_values)), 0, cl_values, alpha=0.7, color='red')
ax3.set_title('Cannot-Link Constraints')
ax3.set_ylabel('Constraint Strength')
ax3.grid(True)
# Plot computed weights
ax4.plot(constraint_weights, 'purple', linewidth=2)
ax4.set_title('Computed Constraint Weights')
ax4.set_ylabel('Weight Value')
ax4.set_xlabel('Time Point')
ax4.grid(True)
plt.tight_layout()
plt.show()
from dtaidistance.dtw_weighted import DecisionTreeClassifier, Tree
import numpy as np
# Generate training data with clear discriminative patterns
np.random.seed(42)
def create_discriminative_series(class_id, n_samples=15, length=50):
"""Create series with class-specific discriminative patterns."""
series_list = []
for i in range(n_samples):
t = np.linspace(0, 2*np.pi, length)
if class_id == 0:
# Class 0: Peak in first third
signal = 0.2 * np.random.randn(length)
peak_pos = length // 3 + np.random.randint(-3, 4)
signal[peak_pos] = 2.0 + 0.3 * np.random.randn()
elif class_id == 1:
# Class 1: Peak in middle third
signal = 0.2 * np.random.randn(length)
peak_pos = length // 2 + np.random.randint(-3, 4)
signal[peak_pos] = 2.0 + 0.3 * np.random.randn()
else:
# Class 2: Peak in last third
signal = 0.2 * np.random.randn(length)
peak_pos = 2 * length // 3 + np.random.randint(-3, 4)
signal[peak_pos] = 2.0 + 0.3 * np.random.randn()
series_list.append(signal)
return series_list
# Generate training data
class0_series = create_discriminative_series(0, n_samples=10)
class1_series = create_discriminative_series(1, n_samples=10)
class2_series = create_discriminative_series(2, n_samples=8)
all_training_series = class0_series + class1_series + class2_series
training_labels = [0] * 10 + [1] * 10 + [2] * 8
print(f"Training data: {len(all_training_series)} series")
print(f"Class distribution: {np.bincount(training_labels)}")
# Extract features for decision tree (simple: use sequence values as features)
feature_matrix = np.array(all_training_series)
print(f"Feature matrix shape: {feature_matrix.shape}")
# Train custom decision tree
dt_classifier = DecisionTreeClassifier()
try:
dt_classifier.fit(
feature_matrix,
training_labels,
use_feature_once=False, # Allow reusing time points
min_ig=0.1 # Require reasonable information gain
)
# Get classifier score
score = dt_classifier.score(max_kd=1.0)
print(f"Decision tree classifier score: {score:.3f}")
# Create and analyze tree structure
tree = Tree()
for i in range(5): # Add some nodes for demonstration
node_id = tree.add()
print(f"Added node {node_id}")
print(f"Tree statistics:")
print(f" Number of nodes: {tree.nb_nodes}")
print(f" Tree depth: {tree.depth}")
print(f" Used features: {len(tree.used_features)} out of {feature_matrix.shape[1]}")
except Exception as e:
print(f"Decision tree training failed: {e}")
# Visualize the discriminative patterns
import matplotlib.pyplot as plt
fig, axes = plt.subplots(3, 1, figsize=(12, 10))
class_names = ['Early Peak', 'Middle Peak', 'Late Peak']
class_data = [class0_series, class1_series, class2_series]
for class_idx, (class_series, class_name) in enumerate(zip(class_data, class_names)):
ax = axes[class_idx]
# Plot all series in the class
for i, series in enumerate(class_series[:5]): # Show first 5
ax.plot(series, alpha=0.6, linewidth=1)
# Plot class average
class_mean = np.mean(class_series, axis=0)
ax.plot(class_mean, 'k-', linewidth=3, label='Class Average')
ax.set_title(f'Class {class_idx}: {class_name}')
ax.legend()
ax.grid(True)
plt.tight_layout()
plt.show()
This weighted DTW module enables sophisticated customization of the DTW distance computation through learned weights, constraint incorporation, and machine learning integration, making it possible to adapt DTW to domain-specific applications with prior knowledge or labeled training data.
Install with Tessl CLI
npx tessl i tessl/pypi-dtaidistance