Distance measures for time series with Dynamic Time Warping as the primary focus
—
Advanced DTW with custom weighting functions and machine learning integration for learning optimal feature weights from labeled data. This module enables domain-specific DTW customization through learned weights, decision tree-based feature importance, and constraint incorporation from must-link/cannot-link relationships.
DTW computation with custom weighting functions that modify the local distance calculations based on learned or domain-specific importance patterns.
def warping_paths(s1, s2, weights=None, window=None, **kwargs):
"""
DTW with custom weight functions.
Applies position-dependent or feature-dependent weights to modify
the local distance computation during DTW alignment.
Parameters:
- s1, s2: array-like, input sequences
- weights: array-like/function, weight values or weighting function
- window: int, warping window constraint
- **kwargs: additional DTW parameters
Returns:
tuple: (distance, paths_matrix)
- distance: float, weighted DTW distance
- paths_matrix: 2D array, accumulated cost matrix with weights applied
"""
def distance_matrix(s, weights, window=None, show_progress=False, **kwargs):
"""
Distance matrix computation with weights.
Computes pairwise weighted DTW distances between all sequences
in a collection using the specified weighting scheme.
Parameters:
- s: list/array, collection of sequences
- weights: array-like/function, weights to apply during distance computation
- window: int, warping window constraint
- show_progress: bool, display progress bar
- **kwargs: additional DTW parameters
Returns:
array: weighted distance matrix
"""Learn optimal weights from labeled time series data using decision tree algorithms specifically designed for temporal data analysis.
def compute_weights_using_dt(series, labels, prototypeidx, **kwargs):
"""
Learn weights using decision trees.
Trains decision tree classifiers to identify discriminative time points
or features for distinguishing between different time series classes.
Parameters:
- series: list/array, collection of time series sequences
- labels: array-like, class labels for each sequence
- prototypeidx: int or list of ints, index (or indices) of the prototype sequence(s) used as class reference
- **kwargs: additional parameters for decision tree training
Returns:
tuple: (weights, importances)
- weights: array, learned importance weights for time points/features
- importances: array, feature importance scores from decision trees
"""
def series_to_dt(series, labels, prototypeidx, classifier=None, max_clfs=None,
min_ig=0, **kwargs):
"""
Convert time series to decision tree features.
Extracts features from time series data and prepares them for
decision tree classification, enabling weight learning.
Parameters:
- series: list/array, time series collection
- labels: array-like, class labels
- prototypeidx: int or list of ints, prototype sequence index or indices
- classifier: classifier object, optional pre-configured classifier
- max_clfs: int, maximum number of classifiers to train
- min_ig: float, minimum information gain threshold
- **kwargs: additional feature extraction parameters
Returns:
tuple: (ml_values, cl_values, classifiers, importances)
- ml_values: array, must-link constraint values
- cl_values: array, cannot-link constraint values
- classifiers: list, trained decision tree classifiers
- importances: array, feature importance scores
"""Convert must-link and cannot-link constraints into weight values for DTW distance computation.
def compute_weights_from_mlclvalues(serie, ml_values, cl_values, only_max=False,
strict_cl=True, **kwargs):
"""
Compute weights from must-link/cannot-link values.
Converts constraint information (which time points should be linked
vs separated) into weight values for biasing DTW computations.
Parameters:
- serie: array-like, reference time series sequence
- ml_values: array, must-link constraint strengths
- cl_values: array, cannot-link constraint strengths
- only_max: bool, use only maximum constraint values
- strict_cl: bool, apply cannot-link constraints strictly
- **kwargs: additional weight computation parameters
Returns:
array: computed weight values for DTW distance modification
"""Specialized plotting functions for visualizing learned weights and their effects on time series analysis.
def plot_margins(serie, weights, filename=None, ax=None, origin=(0, 0),
scaling=(1, 1), y_limit=None, importances=None):
"""
Plot weight margins on time series.
Visualizes the learned or assigned weights overlaid on the time series,
showing which time points or regions are considered most important.
Parameters:
- serie: array-like, time series sequence to plot
- weights: array-like, weight values corresponding to time points
- filename: str, optional file path to save plot
- ax: matplotlib axis, optional axis for plotting
- origin: tuple, plot origin coordinates
- scaling: tuple, scaling factors for axes
- y_limit: tuple, y-axis limits
- importances: array, optional feature importance values
Returns:
tuple: (figure, axes) matplotlib objects
"""Custom decision tree implementation optimized for time series weight learning with temporal-specific splitting criteria.
class DecisionTreeClassifier:
"""
Custom decision tree for DTW weight learning.
Specialized decision tree that considers temporal relationships
and DTW-specific constraints when learning feature importance.
"""
def __init__(self):
"""Initialize decision tree classifier."""
def fit(self, features, targets, use_feature_once=True,
ignore_features=None, min_ig=0):
"""
Train decision tree classifier.
Parameters:
- features: array, feature matrix from time series
- targets: array, target labels for classification
- use_feature_once: bool, prevent reusing features in same path
- ignore_features: list, features to exclude from consideration
- min_ig: float, minimum information gain for splits
Returns:
self: fitted classifier
"""
def score(self, max_kd):
"""
Calculate classifier score.
Parameters:
- max_kd: float, maximum k-distance threshold
Returns:
float: classifier performance score
"""
@staticmethod
def entropy(targets):
"""
Calculate entropy of target distribution.
Parameters:
- targets: array, target labels
Returns:
float: entropy value
"""
@staticmethod
def informationgain_continuous(features, targets, threshold):
"""
Calculate information gain for continuous features.
Parameters:
- features: array, feature values
- targets: array, target labels
- threshold: float, split threshold
Returns:
float: information gain value
"""
@staticmethod
def kdistance(point1, point2):
"""
Calculate k-distance between points.
Parameters:
- point1, point2: array-like, data points
Returns:
float: k-distance value
"""
class Tree:
"""
Decision tree representation for weight learning.
Represents the structure of learned decision trees with
nodes, splits, and importance information.
"""
def add(self):
"""
Add new node to the tree.
Returns:
int: new node identifier
"""
@property
def nb_nodes(self):
"""
Get number of nodes in tree.
Returns:
int: node count
"""
@property
def used_features(self):
"""
Get set of features used in tree.
Returns:
set: feature indices used in decision tree
"""
@property
def depth(self):
"""
Get tree depth.
Returns:
int: maximum depth of decision tree
"""from dtaidistance import dtw_weighted
import numpy as np
import matplotlib.pyplot as plt
# Create time series with known important regions
np.random.seed(42)
t = np.linspace(0, 4*np.pi, 100)
# Base sequences
s1 = np.sin(t) + 0.1 * np.random.randn(100)
s2 = np.sin(t * 1.1) + 0.1 * np.random.randn(100)
# Define custom weights (higher weights = more important)
# Make the middle section more important
weights = np.ones(100)
weights[30:70] = 3.0 # Emphasize middle region
weights[45:55] = 5.0 # Highly emphasize center
# Compute weighted DTW
weighted_distance, weighted_paths = dtw_weighted.warping_paths(s1, s2, weights=weights)
# Compare with unweighted DTW
from dtaidistance import dtw
unweighted_distance, unweighted_paths = dtw.warping_paths(s1, s2)
print(f"Unweighted DTW distance: {unweighted_distance:.3f}")
print(f"Weighted DTW distance: {weighted_distance:.3f}")
# Visualize the effect of weighting
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(12, 10))
# Plot sequences with weights
ax1.plot(s1, 'b-', label='Sequence 1', linewidth=2)
ax1.plot(s2, 'r-', label='Sequence 2', linewidth=2)
ax1_twin = ax1.twinx()
ax1_twin.fill_between(range(len(weights)), 0, weights, alpha=0.3, color='green', label='Weights')
ax1.set_title('Time Series with Weight Distribution')
ax1.legend(loc='upper left')
ax1_twin.legend(loc='upper right')
ax1.grid(True)
# Plot unweighted warping paths
ax2.imshow(unweighted_paths, cmap='viridis', origin='lower')
ax2.set_title('Unweighted DTW Warping Paths')
ax2.set_xlabel('Sequence 2 Index')
ax2.set_ylabel('Sequence 1 Index')
# Plot weighted warping paths
ax3.imshow(weighted_paths, cmap='viridis', origin='lower')
ax3.set_title('Weighted DTW Warping Paths')
ax3.set_xlabel('Sequence 2 Index')
ax3.set_ylabel('Sequence 1 Index')
plt.tight_layout()
plt.show()
from dtaidistance import dtw_weighted
import numpy as np
import matplotlib.pyplot as plt
# Generate synthetic labeled time series data
np.random.seed(42)
def generate_class_data(class_type, n_samples=10, length=80):
"""Generate time series data for different classes."""
t = np.linspace(0, 4*np.pi, length)
sequences = []
for i in range(n_samples):
if class_type == 'sine':
# Sine waves with characteristic frequency
freq = 1.0 + 0.1 * np.random.randn()
signal = np.sin(freq * t) + 0.1 * np.random.randn(length)
# Add discriminative spike in middle region
spike_pos = length // 2 + np.random.randint(-5, 6)
signal[spike_pos] += 1.5
elif class_type == 'cosine':
# Cosine waves with characteristic frequency
freq = 1.2 + 0.1 * np.random.randn()
signal = np.cos(freq * t) + 0.1 * np.random.randn(length)
# Add discriminative dip in first quarter
dip_pos = length // 4 + np.random.randint(-5, 6)
signal[dip_pos] -= 1.0
elif class_type == 'linear':
# Linear trends with characteristic slope
slope = 0.5 + 0.2 * np.random.randn()
signal = slope * np.linspace(0, 1, length) + 0.1 * np.random.randn(length)
# Add discriminative oscillation in last quarter
osc_region = slice(3*length//4, length)
signal[osc_region] += 0.5 * np.sin(8 * t[osc_region])
sequences.append(signal)
return sequences
# Generate training data
class_sine = generate_class_data('sine', n_samples=8)
class_cosine = generate_class_data('cosine', n_samples=8)
class_linear = generate_class_data('linear', n_samples=6)
all_sequences = class_sine + class_cosine + class_linear
all_labels = [0] * 8 + [1] * 8 + [2] * 6
print(f"Generated {len(all_sequences)} labeled sequences")
print(f"Class distribution: {np.bincount(all_labels)}")
# Select prototype sequences (representative of each class)
prototype_indices = [0, 8, 16] # First sequence from each class
# Learn weights using decision trees
try:
weights, importances = dtw_weighted.compute_weights_using_dt(
all_sequences,
all_labels,
prototype_indices,
max_clfs=5,
min_ig=0.01
)
print(f"Learned weights shape: {weights.shape}")
print(f"Weight statistics: min={np.min(weights):.3f}, max={np.max(weights):.3f}, mean={np.mean(weights):.3f}")
# Visualize learned weights for prototype sequences
fig, axes = plt.subplots(3, 2, figsize=(14, 12))
class_names = ['Sine', 'Cosine', 'Linear']
for class_idx in range(3):
proto_seq = all_sequences[prototype_indices[class_idx]]
# Plot prototype sequence
axes[class_idx, 0].plot(proto_seq, 'b-', linewidth=2)
axes[class_idx, 0].set_title(f'{class_names[class_idx]} Class - Prototype Sequence')
axes[class_idx, 0].grid(True)
# Plot learned weights (assuming weights correspond to time points)
if weights.ndim > 1:
class_weights = weights[class_idx] if weights.shape[0] == 3 else weights[0]
else:
class_weights = weights
axes[class_idx, 1].plot(class_weights, 'r-', linewidth=2)
axes[class_idx, 1].set_title(f'{class_names[class_idx]} Class - Learned Weights')
axes[class_idx, 1].set_ylabel('Weight Importance')
axes[class_idx, 1].grid(True)
plt.tight_layout()
plt.show()
except Exception as e:
print(f"Weight learning failed: {e}")
print("Using uniform weights for demonstration")
    weights = np.ones(len(all_sequences[0]))
from dtaidistance import dtw_weighted
import numpy as np
# Create sequences with known constraint relationships
np.random.seed(42)
# Reference sequence
reference = np.sin(np.linspace(0, 4*np.pi, 60)) + 0.1 * np.random.randn(60)
# Sequence that should be similar (must-link)
similar_seq = reference + 0.2 * np.random.randn(60)
# Sequence that should be different (cannot-link)
different_seq = np.cos(np.linspace(0, 6*np.pi, 60)) + 0.1 * np.random.randn(60)
# Define must-link and cannot-link constraint values
# Higher values indicate stronger constraints
ml_values = np.zeros(len(reference))
cl_values = np.zeros(len(reference))
# Strong must-link constraints in middle region (these points should align)
ml_values[20:40] = 2.0
ml_values[28:32] = 5.0 # Very strong constraint
# Strong cannot-link constraints at the ends (these should not align)
cl_values[0:10] = 3.0
cl_values[50:60] = 3.0
# Compute weights from constraints
constraint_weights = dtw_weighted.compute_weights_from_mlclvalues(
reference,
ml_values,
cl_values,
only_max=False,
strict_cl=True
)
print(f"Constraint weights shape: {constraint_weights.shape}")
print(f"Weight range: [{np.min(constraint_weights):.3f}, {np.max(constraint_weights):.3f}]")
# Apply constraint-based weights to DTW computations
from dtaidistance import dtw
# Regular DTW distances
dist_ref_similar = dtw.distance(reference, similar_seq)
dist_ref_different = dtw.distance(reference, different_seq)
# Weighted DTW distances (if implementation supports it)
try:
weighted_dist_similar, _ = dtw_weighted.warping_paths(reference, similar_seq, weights=constraint_weights)
weighted_dist_different, _ = dtw_weighted.warping_paths(reference, different_seq, weights=constraint_weights)
print("\\nDistance Comparison:")
print(f"Reference vs Similar (regular): {dist_ref_similar:.3f}")
print(f"Reference vs Similar (weighted): {weighted_dist_similar:.3f}")
print(f"Reference vs Different (regular): {dist_ref_different:.3f}")
print(f"Reference vs Different (weighted): {weighted_dist_different:.3f}")
except Exception as e:
print(f"Weighted distance computation failed: {e}")
# Visualize constraints and weights
import matplotlib.pyplot as plt
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))
# Plot sequences
ax1.plot(reference, 'b-', label='Reference', linewidth=2)
ax1.plot(similar_seq, 'g--', label='Similar (Must-Link)', linewidth=2)
ax1.plot(different_seq, 'r:', label='Different (Cannot-Link)', linewidth=2)
ax1.set_title('Time Series with Constraint Relationships')
ax1.legend()
ax1.grid(True)
# Plot must-link constraints
ax2.fill_between(range(len(ml_values)), 0, ml_values, alpha=0.7, color='green')
ax2.set_title('Must-Link Constraints')
ax2.set_ylabel('Constraint Strength')
ax2.grid(True)
# Plot cannot-link constraints
ax3.fill_between(range(len(cl_values)), 0, cl_values, alpha=0.7, color='red')
ax3.set_title('Cannot-Link Constraints')
ax3.set_ylabel('Constraint Strength')
ax3.grid(True)
# Plot computed weights
ax4.plot(constraint_weights, 'purple', linewidth=2)
ax4.set_title('Computed Constraint Weights')
ax4.set_ylabel('Weight Value')
ax4.set_xlabel('Time Point')
ax4.grid(True)
plt.tight_layout()
plt.show()
from dtaidistance.dtw_weighted import DecisionTreeClassifier, Tree
import numpy as np
# Generate training data with clear discriminative patterns
np.random.seed(42)
def create_discriminative_series(class_id, n_samples=15, length=50):
"""Create series with class-specific discriminative patterns."""
series_list = []
for i in range(n_samples):
t = np.linspace(0, 2*np.pi, length)
if class_id == 0:
# Class 0: Peak in first third
signal = 0.2 * np.random.randn(length)
peak_pos = length // 3 + np.random.randint(-3, 4)
signal[peak_pos] = 2.0 + 0.3 * np.random.randn()
elif class_id == 1:
# Class 1: Peak in middle third
signal = 0.2 * np.random.randn(length)
peak_pos = length // 2 + np.random.randint(-3, 4)
signal[peak_pos] = 2.0 + 0.3 * np.random.randn()
else:
# Class 2: Peak in last third
signal = 0.2 * np.random.randn(length)
peak_pos = 2 * length // 3 + np.random.randint(-3, 4)
signal[peak_pos] = 2.0 + 0.3 * np.random.randn()
series_list.append(signal)
return series_list
# Generate training data
class0_series = create_discriminative_series(0, n_samples=10)
class1_series = create_discriminative_series(1, n_samples=10)
class2_series = create_discriminative_series(2, n_samples=8)
all_training_series = class0_series + class1_series + class2_series
training_labels = [0] * 10 + [1] * 10 + [2] * 8
print(f"Training data: {len(all_training_series)} series")
print(f"Class distribution: {np.bincount(training_labels)}")
# Extract features for decision tree (simple: use sequence values as features)
feature_matrix = np.array(all_training_series)
print(f"Feature matrix shape: {feature_matrix.shape}")
# Train custom decision tree
dt_classifier = DecisionTreeClassifier()
try:
dt_classifier.fit(
feature_matrix,
training_labels,
use_feature_once=False, # Allow reusing time points
min_ig=0.1 # Require reasonable information gain
)
# Get classifier score
score = dt_classifier.score(max_kd=1.0)
print(f"Decision tree classifier score: {score:.3f}")
# Create and analyze tree structure
tree = Tree()
for i in range(5): # Add some nodes for demonstration
node_id = tree.add()
print(f"Added node {node_id}")
print(f"Tree statistics:")
print(f" Number of nodes: {tree.nb_nodes}")
print(f" Tree depth: {tree.depth}")
print(f" Used features: {len(tree.used_features)} out of {feature_matrix.shape[1]}")
except Exception as e:
print(f"Decision tree training failed: {e}")
# Visualize the discriminative patterns
import matplotlib.pyplot as plt
fig, axes = plt.subplots(3, 1, figsize=(12, 10))
class_names = ['Early Peak', 'Middle Peak', 'Late Peak']
class_data = [class0_series, class1_series, class2_series]
for class_idx, (class_series, class_name) in enumerate(zip(class_data, class_names)):
ax = axes[class_idx]
# Plot all series in the class
for i, series in enumerate(class_series[:5]): # Show first 5
ax.plot(series, alpha=0.6, linewidth=1)
# Plot class average
class_mean = np.mean(class_series, axis=0)
ax.plot(class_mean, 'k-', linewidth=3, label='Class Average')
ax.set_title(f'Class {class_idx}: {class_name}')
ax.legend()
ax.grid(True)
plt.tight_layout()
plt.show()
This weighted DTW module enables sophisticated customization of the DTW distance computation through learned weights, constraint incorporation, and machine learning integration, making it possible to adapt DTW to domain-specific applications with prior knowledge or labeled training data.
Install with Tessl CLI
npx tessl i tessl/pypi-dtaidistance