# dtaidistance

A Python library for computing distance measures between time series, with Dynamic Time Warping (DTW) as the primary focus. It provides both a pure Python implementation and an optimized C implementation for performance-critical applications, so researchers and developers can measure similarity between temporal sequences under various constraints and optimization options.

## Package Information

- **Package Name**: dtaidistance
- **Language**: Python
- **Installation**: `pip install dtaidistance`
- **License**: Apache License, Version 2.0

## Core Imports

```python
import dtaidistance
```

For DTW functionality:

```python
from dtaidistance import dtw
```

For fast C implementation (if available):

```python
from dtaidistance import dtw_c
```

## Basic Usage

```python
from dtaidistance import dtw
import numpy as np

# Create two time series
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]

# Compute DTW distance
distance = dtw.distance(s1, s2)
print(f"DTW distance: {distance}")

# Compute distance with constraints
distance_constrained = dtw.distance(s1, s2, window=3, max_dist=5.0)

# Get the optimal warping path
path = dtw.warping_path(s1, s2)
print(f"Warping path: {path}")

# Compute distance matrix for multiple series
series = [[0, 0, 1, 2, 1, 0, 1, 0, 0],
          [0, 1, 2, 0, 0, 0, 0, 0, 0],
          [1, 2, 0, 0, 0, 0, 0, 1, 1]]

distances = dtw.distance_matrix(series)
print("Distance matrix shape:", distances.shape)
```

## Architecture

The dtaidistance library is organized around several key components:

- **Core DTW**: Main distance calculation and warping path algorithms
- **C Extensions**: High-performance implementations for CPU-intensive operations
- **Constraints**: Windowing, early stopping, and penalty systems for optimized computation
- **Visualization**: Tools for plotting warping paths, distance matrices, and time series relationships
- **Clustering**: Hierarchical clustering algorithms specifically designed for time series
- **Multi-dimensional Support**: DTW for sequences with multiple features per time point

## Capabilities

### Core DTW Distance Calculation

Fundamental DTW distance computation between time series pairs, including constraint-based optimizations, early stopping mechanisms, and both Python and C implementations for performance flexibility.

```python { .api }
def distance(s1, s2, window=None, max_dist=None, max_step=None,
             max_length_diff=None, penalty=None, psi=None, use_c=False):
    """
    Compute DTW distance between two sequences.

    Parameters:
    - s1, s2: array-like, input sequences
    - window: int, warping window constraint
    - max_dist: float, early stopping threshold
    - max_step: float, maximum step size
    - max_length_diff: int, maximum length difference
    - penalty: float, penalty for compression/expansion
    - psi: int, psi relaxation parameter
    - use_c: bool, use C implementation

    Returns:
    float: DTW distance
    """

def distance_fast(s1, s2, window=None, max_dist=None, max_step=None,
                  max_length_diff=None, penalty=None, psi=None):
    """Fast C version of DTW distance calculation."""

def lb_keogh(s1, s2, window=None, max_dist=None, max_step=None,
             max_length_diff=None):
    """Lower bound LB_KEOGH calculation."""
```
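
As a rough illustration of what `distance` computes, the following pure-Python sketch fills a cumulative cost matrix restricted to a band around the diagonal. It is not the library's implementation: `dtw_distance` is a hypothetical helper, and both the local cost (squared difference with a final square root) and the reading of `window` as a bound on `|i - j|` are assumptions made for the example.

```python
import math


def dtw_distance(s1, s2, window=None):
    """Hypothetical helper: DTW distance with an optional band constraint."""
    n, m = len(s1), len(s2)
    # Assumption: window bounds |i - j|; widen it so unequal lengths can still align.
    if window is None:
        window = max(n, m)
    window = max(window, abs(n - m))
    # cost[i][j] is the cheapest cumulative cost of aligning s1[:i] with s2[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = (s1[i - 1] - s2[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return math.sqrt(cost[n][m])


s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
unconstrained = dtw_distance(s1, s2)
banded = dtw_distance(s1, s2, window=3)
print(unconstrained, banded)
```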

[Core DTW](./core-dtw.md)

### Warping Path Analysis

Computation and analysis of optimal warping paths between sequences, including path extraction, penalty calculations, and warping amount quantification for understanding sequence alignment patterns.

```python { .api }
def warping_paths(s1, s2, window=None, max_dist=None, max_step=None,
                  max_length_diff=None, penalty=None, psi=None):
    """
    DTW with full warping paths matrix.

    Returns:
    tuple: (distance, paths_matrix)
    """

def warping_path(from_s, to_s, **kwargs):
    """Compute optimal warping path between sequences."""

def warp(from_s, to_s, **kwargs):
    """
    Warp one sequence to match another.

    Returns:
    tuple: (warped_sequence, path)
    """
```
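
To make the idea of path extraction concrete, here is a minimal pure-Python sketch that builds the cumulative cost matrix and then walks back from the last cell to the first. The function name `backtrack_path` is hypothetical, and the squared-difference local cost is an assumption; the library's own routines may differ in details such as tie-breaking.

```python
import math


def backtrack_path(s1, s2):
    """Hypothetical helper: optimal alignment as a list of (i, j) index pairs."""
    n, m = len(s1), len(s2)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (s1[i - 1] - s2[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Walk back from the final cell, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (cost[i - 1][j], i - 1, j),          # advance in s1 only
            (cost[i][j - 1], i, j - 1),          # advance in s2 only
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return path


s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
path = backtrack_path(s1, s2)
print(path)
```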

[Warping Paths](./warping-paths.md)

### Distance Matrix Operations

Efficient computation of distance matrices for multiple time series, supporting parallel processing, memory optimization through blocking, and various output formats for large-scale time series analysis.

```python { .api }
def distance_matrix(s, max_dist=None, max_length_diff=None, window=None,
                    max_step=None, penalty=None, psi=None, block=None,
                    compact=False, parallel=False, use_c=False,
                    use_nogil=False, show_progress=False):
    """
    Compute distance matrix for all sequence pairs.

    Parameters:
    - s: list/array of sequences
    - compact: bool, return condensed array if True
    - parallel: bool, enable parallel computation
    - show_progress: bool, show progress bar

    Returns:
    array: distance matrix or condensed array
    """

def distances_array_to_matrix(dists, nb_series, block=None):
    """Convert condensed distance array to full matrix."""
```
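
The relationship between a condensed array and a full matrix can be sketched as follows. This example assumes the condensed array lists the upper-triangle entries row by row, and it mirrors them into a symmetric matrix; `condensed_to_full` is a hypothetical helper, and the library's own conversion may order or fill entries differently.

```python
def condensed_to_full(dists, nb_series):
    """Hypothetical helper: expand upper-triangle distances into a square matrix."""
    expected = nb_series * (nb_series - 1) // 2
    if len(dists) != expected:
        raise ValueError(f"expected {expected} distances, got {len(dists)}")
    matrix = [[0.0] * nb_series for _ in range(nb_series)]
    k = 0
    for i in range(nb_series):
        for j in range(i + 1, nb_series):
            # Assumed order: (0,1), (0,2), ..., (1,2), ...
            matrix[i][j] = matrix[j][i] = dists[k]
            k += 1
    return matrix


full = condensed_to_full([1.0, 2.0, 3.0], 3)
for row in full:
    print(row)
```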

[Distance Matrices](./distance-matrices.md)

### Time Series Clustering

Hierarchical clustering algorithms specifically designed for time series data, including multiple clustering strategies, tree representations, and visualization capabilities for discovering patterns in temporal datasets.

```python { .api }
class Hierarchical:
    """Hierarchical clustering for time series."""

    def __init__(self, dists_fun, dists_options, max_dist=np.inf,
                 merge_hook=None, order_hook=None, show_progress=True):
        """Initialize hierarchical clustering."""

    def fit(self, series):
        """
        Perform clustering.

        Returns:
        dict: cluster hierarchy
        """

class HierarchicalTree:
    """Hierarchical clustering with tree tracking."""

    def plot(self, filename=None, axes=None, **kwargs):
        """Plot hierarchy and time series."""
```
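
The general idea of hierarchical (agglomerative) clustering with a `max_dist` cutoff can be sketched in a few lines. Everything here is illustrative: `agglomerate` and `euclidean` are hypothetical helpers, single linkage is an assumed merge strategy, and plain Euclidean distance stands in for a DTW distance to keep the example short.

```python
import math


def euclidean(a, b):
    # Stand-in distance for equal-length series; a DTW distance could be used instead.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(series, dist=euclidean, max_dist=math.inf):
    """Hypothetical helper: single-linkage merging until clusters are too far apart."""
    clusters = {i: [i] for i in range(len(series))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(dist(series[i], series[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > max_dist:
            break  # the closest remaining clusters exceed the cutoff
        clusters[a].extend(clusters.pop(b))
        merges.append((a, b, d))
    return clusters, merges


series = [[0, 0, 1], [0, 0, 1.1], [5, 5, 5], [5, 5, 5.2]]
clusters, merges = agglomerate(series, max_dist=1.0)
print(clusters)
```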

[Clustering](./clustering.md)

### Visualization Tools

Comprehensive visualization capabilities for DTW analysis, including warping path plots, distance matrix heatmaps, and time series alignment visualizations for both 1D and multi-dimensional data.

```python { .api }
def plot_warping(s1, s2, path, filename=None):
    """
    Plot optimal warping between sequences.

    Returns:
    tuple: (figure, axes)
    """

def plot_warpingpaths(s1, s2, paths, path=None, filename=None,
                      shownumbers=False):
    """Plot warping paths matrix with sequences."""

def plot_matrix(distances, filename=None, ax=None, shownumbers=False):
    """Plot distance matrix."""
```

[Visualization](./visualization.md)

### N-Dimensional DTW

DTW algorithms optimized for multi-dimensional time series where each time point contains multiple features, using Euclidean distance for point-wise comparisons and supporting the same constraint and optimization options as 1D DTW.

```python { .api }
def distance(s1, s2, window=None, max_dist=None, max_step=None,
             max_length_diff=None, penalty=None, psi=None, use_c=False):
    """DTW for N-dimensional sequences using Euclidean distance."""

def distance_matrix(s, max_dist=None, max_length_diff=None, window=None,
                    max_step=None, penalty=None, psi=None, block=None,
                    parallel=False, use_c=False, show_progress=False):
    """Distance matrix for N-dimensional sequences."""
```
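
The only change relative to the 1D case is the point-wise cost, as the sketch below tries to show. It is a pure-Python illustration under assumptions: `dtw_ndim_distance` is a hypothetical helper, and it accumulates squared Euclidean distances between time points and takes a single square root at the end.

```python
import math


def dtw_ndim_distance(s1, s2):
    """Hypothetical helper: DTW where every time point is a vector of features."""
    n, m = len(s1), len(s2)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared Euclidean distance between the two feature vectors.
            d = sum((a - b) ** 2 for a, b in zip(s1[i - 1], s2[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return math.sqrt(cost[n][m])


a = [(0, 0), (1, 1), (2, 2)]
b = [(0, 0), (0, 0), (1, 1), (2, 3)]
result = dtw_ndim_distance(a, b)
print(result)
```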

[N-Dimensional DTW](./ndim-dtw.md)

### Weighted DTW and Machine Learning

Advanced DTW with custom weighting functions and machine learning integration for learning optimal feature weights from labeled data, including decision tree-based weight learning and must-link/cannot-link constraint incorporation.

```python { .api }
def warping_paths(s1, s2, weights=None, window=None, **kwargs):
    """DTW with custom weight functions."""

def compute_weights_using_dt(series, labels, prototypeidx, **kwargs):
    """
    Learn weights using decision trees.

    Returns:
    tuple: (weights, importances)
    """

class DecisionTreeClassifier:
    """Custom decision tree for DTW weight learning."""

    def fit(self, features, targets, use_feature_once=True,
            ignore_features=None, min_ig=0):
        """Train classifier."""
```

[Weighted DTW](./weighted-dtw.md)

### Sequence Alignment

Global sequence alignment algorithms like Needleman-Wunsch for optimal alignment of time series with gap penalties, providing alternative approaches to DTW for sequence comparison and alignment tasks.

```python { .api }
def needleman_wunsch(s1, s2, window=None, max_dist=None, max_step=None,
                     max_length_diff=None, psi=None):
    """
    Global sequence alignment.

    Returns:
    tuple: (alignment_score, alignment_matrix)
    """

def best_alignment(paths, s1=None, s2=None, gap="-", order=None):
    """
    Compute optimal alignment from paths matrix.

    Returns:
    tuple: (path, aligned_s1, aligned_s2)
    """
```
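
For readers unfamiliar with Needleman-Wunsch, the textbook form of the algorithm is sketched below on short symbol sequences. The scoring scheme (`match`, `mismatch`, `gap`) and the helper name `nw_score` are assumptions chosen for illustration; the library's function operates on time series and may score substitutions differently.

```python
def nw_score(s1, s2, match=1, mismatch=-1, gap=-1):
    """Hypothetical helper: classic global alignment, returns (score, matrix)."""
    n, m = len(s1), len(s2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap per element.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align the two elements
                              score[i - 1][j] + gap,      # gap in s2
                              score[i][j - 1] + gap)      # gap in s1
    return score[n][m], score


value, matrix = nw_score("ABCD", "ABD")
print(value)
```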

[Sequence Alignment](./alignment.md)

## Common Parameters

Most DTW functions share these common parameters:

- **window**: Warping window constraint limiting how far the warping path can deviate from the diagonal
- **max_dist**: Early stopping threshold that terminates computation once the distance exceeds the limit
- **max_step**: Maximum allowable step size in the warping path
- **max_length_diff**: Maximum allowed difference in sequence lengths
- **penalty**: Penalty applied to compression/expansion operations in warping
- **psi**: Psi relaxation parameter for handling cyclical or periodic sequences
- **use_c**: Flag to enable the optimized C implementation when available
- **parallel**: Enable parallel computation for distance matrices
- **show_progress**: Display a progress bar for long-running computations
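
How `max_length_diff` and `max_dist` can cut a computation short is sketched below in pure Python. This is a sketch under stated assumptions, not the library's code: `dtw_with_limits` is a hypothetical helper, and returning infinity when a limit is exceeded is a convention assumed for the example.

```python
import math


def dtw_with_limits(s1, s2, max_dist=None, max_length_diff=None):
    """Hypothetical helper: DTW that gives up once a limit is known to be exceeded."""
    n, m = len(s1), len(s2)
    if max_length_diff is not None and abs(n - m) > max_length_diff:
        return math.inf
    # Compare squared costs so the square root is taken only once at the end.
    limit = math.inf if max_dist is None else max_dist ** 2
    prev = [0.0] + [math.inf] * m
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            d = (s1[i - 1] - s2[j - 1]) ** 2
            curr[j] = d + min(prev[j - 1], prev[j], curr[j - 1])
        # Every path crosses this row, so its minimum bounds the final cost from below.
        if min(curr) > limit:
            return math.inf
        prev = curr
    return math.inf if prev[m] > limit else math.sqrt(prev[m])


s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
print(dtw_with_limits(s1, s2, max_dist=5.0))
print(dtw_with_limits(s1, s2, max_dist=1.0))
```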

## Types

```python { .api }
class SeriesContainer:
    """Container for handling multiple sequence data formats."""

    def __init__(self, series):
        """Initialize with various data types (list, numpy array, etc.)."""

    def c_data(self):
        """Return C-compatible data structure."""

    def get_max_y(self):
        """Get maximum Y value across all series."""

    @staticmethod
    def wrap(series):
        """Wrap series in container."""
```