# Weighted DTW and Machine Learning

Advanced DTW with custom weighting functions and machine learning integration for learning optimal feature weights from labeled data. This module enables domain-specific DTW customization through learned weights, decision-tree-based feature importance, and constraints derived from must-link/cannot-link relationships.

## Capabilities

### Weighted DTW Core Functions

DTW computation with custom weighting functions that modify the local distance calculations based on learned or domain-specific importance patterns.

```python { .api }
def warping_paths(s1, s2, weights=None, window=None, **kwargs):
    """
    DTW with custom weight functions.

    Applies position-dependent or feature-dependent weights to modify
    the local distance computation during DTW alignment.

    Parameters:
    - s1, s2: array-like, input sequences
    - weights: array-like/function, weight values or weighting function
    - window: int, warping window constraint
    - **kwargs: additional DTW parameters

    Returns:
    tuple: (distance, paths_matrix)
    - distance: float, weighted DTW distance
    - paths_matrix: 2D array, accumulated cost matrix with weights applied
    """

def distance_matrix(s, weights, window=None, show_progress=False, **kwargs):
    """
    Distance matrix computation with weights.

    Computes pairwise weighted DTW distances between all sequences
    in a collection using the specified weighting scheme.

    Parameters:
    - s: list/array, collection of sequences
    - weights: array-like/function, weights to apply during distance computation
    - window: int, warping window constraint
    - show_progress: bool, display progress bar
    - **kwargs: additional DTW parameters

    Returns:
    array: weighted distance matrix
    """
```
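
For collections of sequences, `distance_matrix` applies one weighting scheme across all pairwise comparisons. A minimal sketch, assuming the signature documented above and that a single weight vector is shared by every comparison:

```python
from dtaidistance import dtw_weighted
import numpy as np

# Three related sequences: phase-shifted sine waves
series = [np.sin(np.linspace(0, 2 * np.pi, 50) + phase)
          for phase in (0.0, 0.3, 1.5)]

# One shared weight vector: emphasize the central region in every comparison
weights = np.ones(50)
weights[20:30] = 4.0

# Pairwise weighted DTW distances (assumes the signature documented above)
dists = dtw_weighted.distance_matrix(series, weights, window=10)
print(dists)
```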

### Machine Learning Integration

Learn optimal weights from labeled time series data using decision tree algorithms specifically designed for temporal data analysis.

```python { .api }
def compute_weights_using_dt(series, labels, prototypeidx, **kwargs):
    """
    Learn weights using decision trees.

    Trains decision tree classifiers to identify discriminative time points
    or features for distinguishing between different time series classes.

    Parameters:
    - series: list/array, collection of time series sequences
    - labels: array-like, class labels for each sequence
    - prototypeidx: int, index of the prototype sequence to learn weights for
    - **kwargs: additional parameters for decision tree training

    Returns:
    tuple: (weights, importances)
    - weights: array, learned importance weights for time points/features
    - importances: array, feature importance scores from decision trees
    """

def series_to_dt(series, labels, prototypeidx, classifier=None, max_clfs=None,
                 min_ig=0, **kwargs):
    """
    Convert time series to decision tree features.

    Extracts features from time series data and prepares them for
    decision tree classification, enabling weight learning.

    Parameters:
    - series: list/array, time series collection
    - labels: array-like, class labels
    - prototypeidx: int, index of the prototype sequence
    - classifier: classifier object, optional pre-configured classifier
    - max_clfs: int, maximum number of classifiers to train
    - min_ig: float, minimum information gain threshold
    - **kwargs: additional feature extraction parameters

    Returns:
    tuple: (ml_values, cl_values, classifiers, importances)
    - ml_values: array, must-link constraint values
    - cl_values: array, cannot-link constraint values
    - classifiers: list, trained decision tree classifiers
    - importances: array, feature importance scores
    """
```
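
These two functions are designed to work together: `series_to_dt` produces must-link/cannot-link values, which `compute_weights_from_mlclvalues` (documented in the next section) turns into weights. A hedged sketch of that pipeline, assuming the tuple ordering documented above and that the `serie` argument of the conversion step is the prototype sequence:

```python
from dtaidistance import dtw_weighted
import numpy as np

# Tiny synthetic two-class problem: flat noise vs. noise with a central bump
rng = np.random.default_rng(0)
length = 40
series = [rng.normal(0, 0.1, length) for _ in range(6)]
for s in series[3:]:
    s[15:25] += 1.0  # class-1 sequences carry a bump in the middle
labels = [0, 0, 0, 1, 1, 1]

# Learn must-link/cannot-link values relative to prototype 0
# (tuple ordering as documented above)
ml_values, cl_values, classifiers, importances = dtw_weighted.series_to_dt(
    series, labels, prototypeidx=0, max_clfs=3, min_ig=0.01)

# Convert the constraint values into DTW weights for the prototype
weights = dtw_weighted.compute_weights_from_mlclvalues(
    series[0], ml_values, cl_values)
```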

### Weight Computation from Constraints

Convert must-link and cannot-link constraints into weight values for DTW distance computation.

```python { .api }
def compute_weights_from_mlclvalues(serie, ml_values, cl_values, only_max=False,
                                    strict_cl=True, **kwargs):
    """
    Compute weights from must-link/cannot-link values.

    Converts constraint information (which time points should be linked
    vs separated) into weight values for biasing DTW computations.

    Parameters:
    - serie: array-like, reference time series sequence
    - ml_values: array, must-link constraint strengths
    - cl_values: array, cannot-link constraint strengths
    - only_max: bool, use only maximum constraint values
    - strict_cl: bool, apply cannot-link constraints strictly
    - **kwargs: additional weight computation parameters

    Returns:
    array: computed weight values for DTW distance modification
    """
```

### Visualization Integration

Specialized plotting functions for visualizing learned weights and their effects on time series analysis.

```python { .api }
def plot_margins(serie, weights, filename=None, ax=None, origin=(0, 0),
                 scaling=(1, 1), y_limit=None, importances=None):
    """
    Plot weight margins on time series.

    Visualizes the learned or assigned weights overlaid on the time series,
    showing which time points or regions are considered most important.

    Parameters:
    - serie: array-like, time series sequence to plot
    - weights: array-like, weight values corresponding to time points
    - filename: str, optional file path to save plot
    - ax: matplotlib axis, optional axis for plotting
    - origin: tuple, plot origin coordinates
    - scaling: tuple, scaling factors for axes
    - y_limit: tuple, y-axis limits
    - importances: array, optional feature importance values

    Returns:
    tuple: (figure, axes) matplotlib objects
    """
```
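
A short sketch of `plot_margins`, assuming the signature and `(figure, axes)` return documented above (matplotlib required):

```python
from dtaidistance import dtw_weighted
import numpy as np

# A sequence with a weight profile that highlights its middle section
serie = np.sin(np.linspace(0, 4 * np.pi, 80))
weights = np.ones(80)
weights[30:50] = 3.0

# Overlay the weights on the series and save the figure to disk
fig, ax = dtw_weighted.plot_margins(serie, weights, filename="margins.png")
```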

### Decision Tree Classifier

Custom decision tree implementation optimized for time series weight learning with temporal-specific splitting criteria.

```python { .api }
class DecisionTreeClassifier:
    """
    Custom decision tree for DTW weight learning.

    Specialized decision tree that considers temporal relationships
    and DTW-specific constraints when learning feature importance.
    """

    def __init__(self):
        """Initialize decision tree classifier."""

    def fit(self, features, targets, use_feature_once=True,
            ignore_features=None, min_ig=0):
        """
        Train decision tree classifier.

        Parameters:
        - features: array, feature matrix from time series
        - targets: array, target labels for classification
        - use_feature_once: bool, prevent reusing features in same path
        - ignore_features: list, features to exclude from consideration
        - min_ig: float, minimum information gain for splits

        Returns:
        self: fitted classifier
        """

    def score(self, max_kd):
        """
        Calculate classifier score.

        Parameters:
        - max_kd: float, maximum k-distance threshold

        Returns:
        float: classifier performance score
        """

    @staticmethod
    def entropy(targets):
        """
        Calculate entropy of target distribution.

        Parameters:
        - targets: array, target labels

        Returns:
        float: entropy value
        """

    @staticmethod
    def informationgain_continuous(features, targets, threshold):
        """
        Calculate information gain for continuous features.

        Parameters:
        - features: array, feature values
        - targets: array, target labels
        - threshold: float, split threshold

        Returns:
        float: information gain value
        """

    @staticmethod
    def kdistance(point1, point2):
        """
        Calculate k-distance between points.

        Parameters:
        - point1, point2: array-like, data points

        Returns:
        float: k-distance value
        """

class Tree:
    """
    Decision tree representation for weight learning.

    Represents the structure of learned decision trees with
    nodes, splits, and importance information.
    """

    def add(self):
        """
        Add new node to the tree.

        Returns:
        int: new node identifier
        """

    @property
    def nb_nodes(self):
        """
        Get number of nodes in tree.

        Returns:
        int: node count
        """

    @property
    def used_features(self):
        """
        Get set of features used in tree.

        Returns:
        set: feature indices used in decision tree
        """

    @property
    def depth(self):
        """
        Get tree depth.

        Returns:
        int: maximum depth of decision tree
        """
```
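
The static helpers can be exercised on their own. A small sketch, assuming `entropy` takes an array of class labels (H = -sum_c p_c * log2(p_c)) and `informationgain_continuous` takes a feature column, labels, and a candidate split threshold, as documented above:

```python
import numpy as np
from dtaidistance.dtw_weighted import DecisionTreeClassifier

# Label distribution {0: 2, 1: 3, 2: 1} over six samples
targets = np.array([0, 0, 1, 1, 1, 2])
print(DecisionTreeClassifier.entropy(targets))  # H = -sum_c p_c * log2(p_c)

# One continuous feature that separates class 0 (low) from classes 1 and 2 (high)
features = np.array([0.10, 0.20, 0.80, 0.90, 0.70, 0.95])
print(DecisionTreeClassifier.informationgain_continuous(features, targets, 0.5))
```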

## Usage Examples

### Basic Weighted DTW

```python
from dtaidistance import dtw_weighted
import numpy as np
import matplotlib.pyplot as plt

# Create time series with known important regions
np.random.seed(42)
t = np.linspace(0, 4*np.pi, 100)

# Base sequences
s1 = np.sin(t) + 0.1 * np.random.randn(100)
s2 = np.sin(t * 1.1) + 0.1 * np.random.randn(100)

# Define custom weights (higher weights = more important)
# Make the middle section more important
weights = np.ones(100)
weights[30:70] = 3.0  # Emphasize middle region
weights[45:55] = 5.0  # Highly emphasize center

# Compute weighted DTW
weighted_distance, weighted_paths = dtw_weighted.warping_paths(s1, s2, weights=weights)

# Compare with unweighted DTW
from dtaidistance import dtw
unweighted_distance, unweighted_paths = dtw.warping_paths(s1, s2)

print(f"Unweighted DTW distance: {unweighted_distance:.3f}")
print(f"Weighted DTW distance: {weighted_distance:.3f}")

# Visualize the effect of weighting
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(12, 10))

# Plot sequences with weights
ax1.plot(s1, 'b-', label='Sequence 1', linewidth=2)
ax1.plot(s2, 'r-', label='Sequence 2', linewidth=2)
ax1_twin = ax1.twinx()
ax1_twin.fill_between(range(len(weights)), 0, weights, alpha=0.3, color='green', label='Weights')
ax1.set_title('Time Series with Weight Distribution')
ax1.legend(loc='upper left')
ax1_twin.legend(loc='upper right')
ax1.grid(True)

# Plot unweighted warping paths
ax2.imshow(unweighted_paths, cmap='viridis', origin='lower')
ax2.set_title('Unweighted DTW Warping Paths')
ax2.set_xlabel('Sequence 2 Index')
ax2.set_ylabel('Sequence 1 Index')

# Plot weighted warping paths
ax3.imshow(weighted_paths, cmap='viridis', origin='lower')
ax3.set_title('Weighted DTW Warping Paths')
ax3.set_xlabel('Sequence 2 Index')
ax3.set_ylabel('Sequence 1 Index')

plt.tight_layout()
plt.show()
```

### Learning Weights from Labeled Data

```python
from dtaidistance import dtw_weighted
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic labeled time series data
np.random.seed(42)

def generate_class_data(class_type, n_samples=10, length=80):
    """Generate time series data for different classes."""
    t = np.linspace(0, 4*np.pi, length)
    sequences = []

    for i in range(n_samples):
        if class_type == 'sine':
            # Sine waves with characteristic frequency
            freq = 1.0 + 0.1 * np.random.randn()
            signal = np.sin(freq * t) + 0.1 * np.random.randn(length)
            # Add discriminative spike in middle region
            spike_pos = length // 2 + np.random.randint(-5, 6)
            signal[spike_pos] += 1.5

        elif class_type == 'cosine':
            # Cosine waves with characteristic frequency
            freq = 1.2 + 0.1 * np.random.randn()
            signal = np.cos(freq * t) + 0.1 * np.random.randn(length)
            # Add discriminative dip in first quarter
            dip_pos = length // 4 + np.random.randint(-5, 6)
            signal[dip_pos] -= 1.0

        elif class_type == 'linear':
            # Linear trends with characteristic slope
            slope = 0.5 + 0.2 * np.random.randn()
            signal = slope * np.linspace(0, 1, length) + 0.1 * np.random.randn(length)
            # Add discriminative oscillation in last quarter
            osc_region = slice(3*length//4, length)
            signal[osc_region] += 0.5 * np.sin(8 * t[osc_region])

        sequences.append(signal)

    return sequences

# Generate training data
class_sine = generate_class_data('sine', n_samples=8)
class_cosine = generate_class_data('cosine', n_samples=8)
class_linear = generate_class_data('linear', n_samples=6)

all_sequences = class_sine + class_cosine + class_linear
all_labels = [0] * 8 + [1] * 8 + [2] * 6

print(f"Generated {len(all_sequences)} labeled sequences")
print(f"Class distribution: {np.bincount(all_labels)}")

# Select prototype sequences (first sequence from each class);
# compute_weights_using_dt expects a single prototype index per call
prototype_indices = [0, 8, 16]

# Learn weights using decision trees, one prototype at a time
try:
    learned_weights = {}
    for class_idx, proto_idx in enumerate(prototype_indices):
        weights, importances = dtw_weighted.compute_weights_using_dt(
            all_sequences,
            all_labels,
            proto_idx,
            max_clfs=5,
            min_ig=0.01
        )
        learned_weights[class_idx] = weights
        print(f"Class {class_idx}: weights shape {weights.shape}, "
              f"min={np.min(weights):.3f}, max={np.max(weights):.3f}, "
              f"mean={np.mean(weights):.3f}")

    # Visualize learned weights for prototype sequences
    fig, axes = plt.subplots(3, 2, figsize=(14, 12))

    class_names = ['Sine', 'Cosine', 'Linear']
    for class_idx in range(3):
        proto_seq = all_sequences[prototype_indices[class_idx]]

        # Plot prototype sequence
        axes[class_idx, 0].plot(proto_seq, 'b-', linewidth=2)
        axes[class_idx, 0].set_title(f'{class_names[class_idx]} Class - Prototype Sequence')
        axes[class_idx, 0].grid(True)

        # Plot learned weights (assuming weights correspond to time points)
        class_weights = learned_weights[class_idx]
        if class_weights.ndim > 1:
            class_weights = class_weights[:, 0]

        axes[class_idx, 1].plot(class_weights, 'r-', linewidth=2)
        axes[class_idx, 1].set_title(f'{class_names[class_idx]} Class - Learned Weights')
        axes[class_idx, 1].set_ylabel('Weight Importance')
        axes[class_idx, 1].grid(True)

    plt.tight_layout()
    plt.show()

except Exception as e:
    print(f"Weight learning failed: {e}")
    print("Using uniform weights for demonstration")
    weights = np.ones(len(all_sequences[0]))
```

### Must-Link/Cannot-Link Constraints

```python
from dtaidistance import dtw_weighted
import numpy as np

# Create sequences with known constraint relationships
np.random.seed(42)

# Reference sequence
reference = np.sin(np.linspace(0, 4*np.pi, 60)) + 0.1 * np.random.randn(60)

# Sequence that should be similar (must-link)
similar_seq = reference + 0.2 * np.random.randn(60)

# Sequence that should be different (cannot-link)
different_seq = np.cos(np.linspace(0, 6*np.pi, 60)) + 0.1 * np.random.randn(60)

# Define must-link and cannot-link constraint values
# Higher values indicate stronger constraints
ml_values = np.zeros(len(reference))
cl_values = np.zeros(len(reference))

# Strong must-link constraints in middle region (these points should align)
ml_values[20:40] = 2.0
ml_values[28:32] = 5.0  # Very strong constraint

# Strong cannot-link constraints at the ends (these should not align)
cl_values[0:10] = 3.0
cl_values[50:60] = 3.0

# Compute weights from constraints
constraint_weights = dtw_weighted.compute_weights_from_mlclvalues(
    reference,
    ml_values,
    cl_values,
    only_max=False,
    strict_cl=True
)

print(f"Constraint weights shape: {constraint_weights.shape}")
print(f"Weight range: [{np.min(constraint_weights):.3f}, {np.max(constraint_weights):.3f}]")

# Apply constraint-based weights to DTW computations
from dtaidistance import dtw

# Regular DTW distances
dist_ref_similar = dtw.distance(reference, similar_seq)
dist_ref_different = dtw.distance(reference, different_seq)

# Weighted DTW distances (if implementation supports it)
try:
    weighted_dist_similar, _ = dtw_weighted.warping_paths(reference, similar_seq, weights=constraint_weights)
    weighted_dist_different, _ = dtw_weighted.warping_paths(reference, different_seq, weights=constraint_weights)

    print("\nDistance Comparison:")
    print(f"Reference vs Similar (regular): {dist_ref_similar:.3f}")
    print(f"Reference vs Similar (weighted): {weighted_dist_similar:.3f}")
    print(f"Reference vs Different (regular): {dist_ref_different:.3f}")
    print(f"Reference vs Different (weighted): {weighted_dist_different:.3f}")

except Exception as e:
    print(f"Weighted distance computation failed: {e}")

# Visualize constraints and weights
import matplotlib.pyplot as plt

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))

# Plot sequences
ax1.plot(reference, 'b-', label='Reference', linewidth=2)
ax1.plot(similar_seq, 'g--', label='Similar (Must-Link)', linewidth=2)
ax1.plot(different_seq, 'r:', label='Different (Cannot-Link)', linewidth=2)
ax1.set_title('Time Series with Constraint Relationships')
ax1.legend()
ax1.grid(True)

# Plot must-link constraints
ax2.fill_between(range(len(ml_values)), 0, ml_values, alpha=0.7, color='green')
ax2.set_title('Must-Link Constraints')
ax2.set_ylabel('Constraint Strength')
ax2.grid(True)

# Plot cannot-link constraints
ax3.fill_between(range(len(cl_values)), 0, cl_values, alpha=0.7, color='red')
ax3.set_title('Cannot-Link Constraints')
ax3.set_ylabel('Constraint Strength')
ax3.grid(True)

# Plot computed weights
ax4.plot(constraint_weights, 'purple', linewidth=2)
ax4.set_title('Computed Constraint Weights')
ax4.set_ylabel('Weight Value')
ax4.set_xlabel('Time Point')
ax4.grid(True)

plt.tight_layout()
plt.show()
```

### Custom Decision Tree Weight Learning

```python
from dtaidistance.dtw_weighted import DecisionTreeClassifier, Tree
import numpy as np

# Generate training data with clear discriminative patterns
np.random.seed(42)

def create_discriminative_series(class_id, n_samples=15, length=50):
    """Create series with class-specific discriminative patterns."""
    series_list = []

    for i in range(n_samples):
        if class_id == 0:
            # Class 0: Peak in first third
            signal = 0.2 * np.random.randn(length)
            peak_pos = length // 3 + np.random.randint(-3, 4)
            signal[peak_pos] = 2.0 + 0.3 * np.random.randn()

        elif class_id == 1:
            # Class 1: Peak in middle third
            signal = 0.2 * np.random.randn(length)
            peak_pos = length // 2 + np.random.randint(-3, 4)
            signal[peak_pos] = 2.0 + 0.3 * np.random.randn()

        else:
            # Class 2: Peak in last third
            signal = 0.2 * np.random.randn(length)
            peak_pos = 2 * length // 3 + np.random.randint(-3, 4)
            signal[peak_pos] = 2.0 + 0.3 * np.random.randn()

        series_list.append(signal)

    return series_list

# Generate training data
class0_series = create_discriminative_series(0, n_samples=10)
class1_series = create_discriminative_series(1, n_samples=10)
class2_series = create_discriminative_series(2, n_samples=8)

all_training_series = class0_series + class1_series + class2_series
training_labels = [0] * 10 + [1] * 10 + [2] * 8

print(f"Training data: {len(all_training_series)} series")
print(f"Class distribution: {np.bincount(training_labels)}")

# Extract features for decision tree (simple: use sequence values as features)
feature_matrix = np.array(all_training_series)
print(f"Feature matrix shape: {feature_matrix.shape}")

# Train custom decision tree
dt_classifier = DecisionTreeClassifier()

try:
    dt_classifier.fit(
        feature_matrix,
        training_labels,
        use_feature_once=False,  # Allow reusing time points
        min_ig=0.1  # Require reasonable information gain
    )

    # Get classifier score
    score = dt_classifier.score(max_kd=1.0)
    print(f"Decision tree classifier score: {score:.3f}")

    # Create and analyze tree structure
    tree = Tree()
    for i in range(5):  # Add some nodes for demonstration
        node_id = tree.add()
        print(f"Added node {node_id}")

    print("Tree statistics:")
    print(f"  Number of nodes: {tree.nb_nodes}")
    print(f"  Tree depth: {tree.depth}")
    print(f"  Used features: {len(tree.used_features)} out of {feature_matrix.shape[1]}")

except Exception as e:
    print(f"Decision tree training failed: {e}")

# Visualize the discriminative patterns
import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 1, figsize=(12, 10))

class_names = ['Early Peak', 'Middle Peak', 'Late Peak']
class_data = [class0_series, class1_series, class2_series]

for class_idx, (class_series, class_name) in enumerate(zip(class_data, class_names)):
    ax = axes[class_idx]

    # Plot all series in the class
    for i, series in enumerate(class_series[:5]):  # Show first 5
        ax.plot(series, alpha=0.6, linewidth=1)

    # Plot class average
    class_mean = np.mean(class_series, axis=0)
    ax.plot(class_mean, 'k-', linewidth=3, label='Class Average')

    ax.set_title(f'Class {class_idx}: {class_name}')
    ax.legend()
    ax.grid(True)

plt.tight_layout()
plt.show()
```

This weighted DTW module enables fine-grained customization of DTW distance computation through learned weights, constraint incorporation, and machine learning integration, adapting DTW to domain-specific applications where prior knowledge or labeled training data is available.