# Distance Matrices

Efficient computation of distance matrices for multiple time series, supporting parallel processing, memory optimization through blocking, and various output formats for large-scale time series analysis. Distance matrices are essential for clustering, similarity search, and comparative analysis of time series datasets.

## Capabilities

### Full Distance Matrix Computation

Compute pairwise DTW distances between all sequences in a collection, with support for both full matrices and condensed formats.
```python { .api }
def distance_matrix(s, max_dist=None, max_length_diff=None, window=None,
                    max_step=None, penalty=None, psi=None, block=None,
                    compact=False, parallel=False, use_c=False,
                    use_nogil=False, show_progress=False):
    """
    Compute distance matrix for all sequence pairs in a collection.

    Parameters:
    - s: list/array/SeriesContainer, collection of time series sequences
    - max_dist: float, early stopping threshold for individual distances
    - max_length_diff: int, maximum length difference between sequences
    - window: int, warping window constraint
    - max_step: float, maximum step size constraint
    - penalty: float, penalty for compression/expansion
    - psi: int, psi relaxation parameter
    - block: tuple, restrict computation to a block of the matrix,
      e.g. ((row_start, row_end), (col_start, col_end))
    - compact: bool, return condensed array instead of full matrix
    - parallel: bool, enable parallel computation
    - use_c: bool, use C implementation
    - use_nogil: bool, use GIL-free C implementation for better parallelization
    - show_progress: bool, display progress bar

    Returns:
    array: distance matrix of shape (n, n) or condensed array of length n*(n-1)/2
    """

def distance_matrix_fast(s, max_dist=None, max_length_diff=None, window=None,
                         max_step=None, penalty=None, psi=None, block=None,
                         compact=False, parallel=True):
    """
    Fast C version of distance matrix computation.

    Automatically uses the optimized C implementation, with parallel
    processing enabled by default for better performance on multi-core
    systems.

    Parameters:
    Same as distance_matrix(), but with parallel=True by default.

    Returns:
    array: distance matrix or condensed array
    """

def distance_matrix_python(s, block=None, show_progress=False,
                           max_length_diff=None, dist_opts=None):
    """
    Pure Python distance matrix implementation.

    Provides a reference implementation that does not require C extensions,
    useful for debugging or when C extensions are unavailable.

    Parameters:
    - s: list/array, collection of sequences
    - block: tuple, memory blocking configuration
    - show_progress: bool, display progress bar
    - max_length_diff: int, maximum length difference
    - dist_opts: dict, options passed to the distance function

    Returns:
    array: condensed distance array
    """
```
### Distance Matrix Configuration

Get pre-configured distance matrix functions with specific settings for repeated use with consistent parameters.
```python { .api }
def distance_matrix_func(use_c=False, use_nogil=False, parallel=False,
                         show_progress=False):
    """
    Return a configured distance matrix function.

    Creates a partially configured distance matrix function with the
    specified performance and display options, useful for repeated
    computations with consistent settings.

    Parameters:
    - use_c: bool, use C implementation
    - use_nogil: bool, use GIL-free implementation
    - parallel: bool, enable parallel processing
    - show_progress: bool, show progress bar

    Returns:
    function: configured distance matrix function
    """
```
### Matrix Format Conversion

Convert between different distance matrix representations for compatibility with various analysis tools.
```python { .api }
def distances_array_to_matrix(dists, nb_series, block=None):
    """
    Convert condensed distance array to full symmetric matrix.

    Transforms the condensed array format (n*(n-1)/2 elements) into a full
    symmetric n x n matrix with zeros on the diagonal.

    Parameters:
    - dists: array, condensed distance array
    - nb_series: int, number of original sequences
    - block: tuple, optional blocking for memory efficiency

    Returns:
    array: full symmetric distance matrix of shape (nb_series, nb_series)
    """

def distance_array_index(a, b, nb_series):
    """
    Get the index in the condensed distance array for sequence pair (a, b).

    Computes the position in the condensed array where the distance
    between sequences a and b is stored.

    Parameters:
    - a, b: int, sequence indices (with a < b)
    - nb_series: int, total number of sequences

    Returns:
    int: index in condensed distance array
    """
```
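The condensed layout is the conventional row-major upper-triangle ordering (the same convention used by scipy's `squareform`). As a standalone sketch, independent of the library, a hypothetical `condensed_index` helper mirroring `distance_array_index` works out to:

```python
# Hypothetical helper: in the row-major upper-triangle layout, rows
# 0..a-1 contribute (n-1) + (n-2) + ... + (n-a) entries before row a,
# then b is offset within row a.
def condensed_index(a, b, nb_series):
    return a * nb_series - a * (a + 1) // 2 + (b - a - 1)

# Enumerate the pairs of 4 series in storage order and check the formula
pairs = [(a, b) for a in range(4) for b in range(a + 1, 4)]
for idx, (a, b) in enumerate(pairs):
    assert condensed_index(a, b, 4) == idx

print(pairs)  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

The same formula underlies the conversion to a full matrix: entry (a, b) of the matrix is read from this position in the condensed array.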
## Usage Examples

### Basic Distance Matrix
```python
from dtaidistance import dtw

# Create a collection of time series
series = [
    [1, 2, 3, 2, 1],
    [0, 1, 2, 3, 2, 1, 0],
    [1, 3, 2, 1],
    [2, 1, 0, 1, 2, 3, 2]
]

# Compute full distance matrix
distances = dtw.distance_matrix(series)
print("Distance matrix shape:", distances.shape)
print("Distance matrix:")
print(distances)

# Compute condensed distance array (more memory efficient)
distances_condensed = dtw.distance_matrix(series, compact=True)
print("Condensed array length:", len(distances_condensed))
print("Condensed distances:", distances_condensed)
```
### High-Performance Distance Matrix
```python
from dtaidistance import dtw
import numpy as np
import time

# Generate a larger dataset
np.random.seed(42)
series = [np.cumsum(np.random.randn(100)) for _ in range(50)]

# Compare different implementations; note that distance_matrix_python
# returns a condensed array, while the others return a full matrix
methods = [
    ("Python", lambda: dtw.distance_matrix_python(series)),
    ("C sequential", lambda: dtw.distance_matrix(series, use_c=True, parallel=False)),
    ("C parallel", lambda: dtw.distance_matrix_fast(series)),
    ("C nogil parallel", lambda: dtw.distance_matrix(series, use_c=True,
                                                     use_nogil=True, parallel=True))
]

for name, method in methods:
    start = time.time()
    result = method()
    elapsed = time.time() - start
    print(f"{name}: {elapsed:.2f}s, shape: {np.asarray(result).shape}")
```
### Memory-Efficient Blocking
```python
from dtaidistance import dtw
import numpy as np

# Large dataset that might not fit in memory all at once
large_series = [np.random.randn(200) for _ in range(100)]

# Process in blocks to manage memory usage
block_size = 25
n_series = len(large_series)

distances = np.zeros((n_series, n_series))

for i in range(0, n_series, block_size):
    for j in range(i, n_series, block_size):
        end_i = min(i + block_size, n_series)
        end_j = min(j + block_size, n_series)

        if i == j:
            # Diagonal block: all pairs within the same block
            block_series = large_series[i:end_i]
            distances[i:end_i, i:end_i] = dtw.distance_matrix(block_series,
                                                              compact=False)
        else:
            # Off-diagonal block: cross distances between two blocks
            for x in range(i, end_i):
                for y in range(j, end_j):
                    dist = dtw.distance(large_series[x], large_series[y])
                    distances[x, y] = dist
                    distances[y, x] = dist  # symmetric

print(f"Computed {n_series}x{n_series} distance matrix in blocks")
```
### Distance Matrix with Constraints
```python
from dtaidistance import dtw

# Time series with different characteristics
series = [
    [1, 2, 3, 4, 5],           # Increasing trend
    [5, 4, 3, 2, 1],           # Decreasing trend
    [1, 3, 2, 4, 1],           # Oscillating
    [2, 2, 2, 2, 2],           # Constant
    [1, 2, 3, 4, 5, 6, 7, 8]   # Longer increasing
]

# Distance matrix with various constraints
constraints = {
    'window': 3,           # Sakoe-Chiba band width
    'max_dist': 10.0,      # Early stopping threshold
    'max_length_diff': 4,  # Maximum allowed length difference
    'penalty': 0.1         # Warping penalty
}

distances = dtw.distance_matrix(series, **constraints, show_progress=True)

print("Constrained distance matrix:")
print(distances)

# Inspect all pairwise distances
n = len(series)
for i in range(n):
    for j in range(i + 1, n):
        print(f"Distance between series {i} and {j}: {distances[i, j]:.2f}")
```
### Integration with Clustering
```python
from dtaidistance import dtw, clustering
import numpy as np

# Generate sample time series data
np.random.seed(42)
series_length = 50

# Create three groups of similar time series (20 series in total)
cluster1 = [np.sin(np.linspace(0, 4*np.pi, series_length)) +
            0.1*np.random.randn(series_length) for _ in range(7)]
cluster2 = [np.cos(np.linspace(0, 3*np.pi, series_length)) +
            0.1*np.random.randn(series_length) for _ in range(6)]
cluster3 = [np.linspace(0, 1, series_length) +
            0.1*np.random.randn(series_length) for _ in range(7)]

all_series = cluster1 + cluster2 + cluster3

# Compute the distance matrix directly (e.g. for inspection); note that
# the clusterer below computes its own distances via dists_fun
distances = dtw.distance_matrix(all_series, use_c=True, parallel=True)

# Perform hierarchical clustering
clusterer = clustering.Hierarchical(
    dists_fun=dtw.distance_matrix_fast,
    dists_options={},
    max_dist=np.inf
)

cluster_tree = clusterer.fit(all_series)
print("Clustering completed")
print(f"Number of clusters found: {len(cluster_tree)}")
```
### Format Conversion Examples
```python
from dtaidistance import dtw

series = [[1, 2, 3], [2, 3, 1], [3, 1, 2], [1, 3, 2]]

# Get condensed distance array
condensed = dtw.distance_matrix(series, compact=True)
print("Condensed array:", condensed)

# Convert to full matrix
full_matrix = dtw.distances_array_to_matrix(condensed, len(series))
print("Full matrix:")
print(full_matrix)

# Verify the conversion by accessing specific distances
for i in range(len(series)):
    for j in range(i + 1, len(series)):
        condensed_idx = dtw.distance_array_index(i, j, len(series))
        condensed_dist = condensed[condensed_idx]
        matrix_dist = full_matrix[i, j]
        print(f"Distance ({i},{j}): condensed={condensed_dist:.3f}, "
              f"matrix={matrix_dist:.3f}, match={abs(condensed_dist - matrix_dist) < 1e-10}")
```
### Performance Optimization Strategies
```python
from dtaidistance import dtw
from dtaidistance.util import SeriesContainer
import numpy as np

# Prepare the data
raw_series = [list(np.random.randn(100)) for _ in range(30)]

# Use SeriesContainer for better performance with the C implementation
series_container = SeriesContainer(raw_series)

# Configure an optimized distance matrix function for repeated use
fast_distance_matrix = dtw.distance_matrix_func(
    use_c=True,
    use_nogil=True,
    parallel=True,
    show_progress=True
)

# Compute with optimizations
distances = fast_distance_matrix(
    series_container,
    window=10,      # Reasonable warping window constraint
    max_dist=50.0,  # Early stopping threshold
    compact=True    # Memory-efficient condensed format
)

print(f"Computed pairwise distances for {len(raw_series)} series")
print(f"Result shape: {distances.shape if hasattr(distances, 'shape') else len(distances)}")
```
## Memory and Performance Considerations

### Memory Usage

- **Full matrix**: O(n²) memory for n sequences
- **Condensed array**: n(n-1)/2 entries, roughly half the memory of the full matrix and more practical for large datasets
- **Blocking**: process subsets to bound memory usage on very large datasets
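To make the difference concrete, a back-of-the-envelope sketch (plain Python; the dataset size `n` and float64 entries are assumptions for illustration):

```python
n = 5000             # hypothetical number of sequences
bytes_per_entry = 8  # float64

full_bytes = n * n * bytes_per_entry                  # full n x n matrix
condensed_bytes = n * (n - 1) // 2 * bytes_per_entry  # upper triangle only

print(f"Full matrix:     {full_bytes / 1e9:.2f} GB")      # 0.20 GB
print(f"Condensed array: {condensed_bytes / 1e9:.2f} GB") # 0.10 GB
```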
### Computational Complexity

- **Time complexity**: O(n² × m²) for n sequences of average length m
- **Parallel processing**: near-linear speedup on multi-core systems with `parallel=True`
- **C extensions**: typically 10-100x faster than the pure Python implementation
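As a rough cost model (a sketch, not a measurement): each of the n(n-1)/2 pairs fills an m × m warping matrix, and a Sakoe-Chiba `window` of width w shrinks that to roughly m(2w + 1) cells per pair:

```python
def dtw_cell_estimate(n, m, window=None):
    # Number of pairwise comparisons times cells per warping matrix
    pairs = n * (n - 1) // 2
    cells_per_pair = m * (2 * window + 1) if window is not None else m * m
    return pairs * cells_per_pair

print(f"{dtw_cell_estimate(100, 200):,}")             # 198,000,000 cells
print(f"{dtw_cell_estimate(100, 200, window=10):,}")  # 20,790,000 cells
```

The nearly tenfold reduction from a modest window is why constraining the warping path is usually the first optimization to try.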
### Optimization Guidelines

1. Use `compact=True` for memory efficiency
2. Enable `parallel=True` for multi-core systems
3. Set appropriate `window` constraints to reduce computation
4. Use `max_dist` for early stopping in similarity search
5. Consider blocking for very large datasets (>1000 sequences)