or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

alignment.mdclustering.mdcore-dtw.mddistance-matrices.mdindex.mdndim-dtw.mdvisualization.mdwarping-paths.mdweighted-dtw.md

distance-matrices.mddocs/

0

# Distance Matrices

1

2

Efficient computation of distance matrices for multiple time series, supporting parallel processing, memory optimization through blocking, and various output formats for large-scale time series analysis. Distance matrices are essential for clustering, similarity search, and comparative analysis of time series datasets.

3

4

## Capabilities

5

6

### Full Distance Matrix Computation

7

8

Compute pairwise DTW distances between all sequences in a collection, with support for both full matrices and condensed formats.

9

10

```python { .api }

11

def distance_matrix(s, max_dist=None, max_length_diff=None, window=None,

12

max_step=None, penalty=None, psi=None, block=None,

13

compact=False, parallel=False, use_c=False,

14

use_nogil=False, show_progress=False):

15

"""

16

Compute distance matrix for all sequence pairs in a collection.

17

18

Parameters:

19

- s: list/array/SeriesContainer, collection of time series sequences

20

- max_dist: float, early stopping threshold for individual distances

21

- max_length_diff: int, maximum length difference between sequences

22

- window: int, warping window constraint

23

- max_step: float, maximum step size constraint

24

- penalty: float, penalty for compression/expansion

25

- psi: int, psi relaxation parameter

26

- block: tuple, memory blocking configuration (start_idx, end_idx)

27

- compact: bool, return condensed array instead of full matrix

28

- parallel: bool, enable parallel computation

29

- use_c: bool, use C implementation

30

- use_nogil: bool, use GIL-free C implementation for better parallelization

31

- show_progress: bool, display progress bar

32

33

Returns:

34

array: distance matrix of shape (n, n) or condensed array of length n*(n-1)/2

35

"""

36

37

def distance_matrix_fast(s, max_dist=None, max_length_diff=None, window=None,

38

max_step=None, penalty=None, psi=None, block=None,

39

compact=False, parallel=True):

40

"""

41

Fast C version of distance matrix computation.

42

43

Automatically uses optimized C implementation with parallel processing

44

enabled by default for better performance on multi-core systems.

45

46

Parameters:

47

Same as distance_matrix() but with parallel=True by default.

48

49

Returns:

50

array: distance matrix or condensed array

51

"""

52

53

def distance_matrix_python(s, block=None, show_progress=False,

54

max_length_diff=None, dist_opts=None):

55

"""

56

Pure Python distance matrix implementation.

57

58

Provides a reference implementation that doesn't require C extensions,

59

useful for debugging or when C extensions are unavailable.

60

61

Parameters:

62

- s: list/array, collection of sequences

63

- block: tuple, memory blocking configuration

64

- show_progress: bool, display progress bar

65

- max_length_diff: int, maximum length difference

66

- dist_opts: dict, options passed to distance function

67

68

Returns:

69

array: condensed distance array

70

"""

71

```

72

73

### Distance Matrix Configuration

74

75

Get pre-configured distance matrix functions with specific settings for repeated use with consistent parameters.

76

77

```python { .api }

78

def distance_matrix_func(use_c=False, use_nogil=False, parallel=False,

79

show_progress=False):

80

"""

81

Return a configured distance matrix function.

82

83

Creates a partially configured distance matrix function with specified

84

performance and display options, useful for repeated computations.

85

86

Parameters:

87

- use_c: bool, use C implementation

88

- use_nogil: bool, use GIL-free implementation

89

- parallel: bool, enable parallel processing

90

- show_progress: bool, show progress bar

91

92

Returns:

93

function: configured distance matrix function

94

"""

95

```

96

97

### Matrix Format Conversion

98

99

Convert between different distance matrix representations for compatibility with various analysis tools.

100

101

```python { .api }

102

def distances_array_to_matrix(dists, nb_series, block=None):

103

"""

104

Convert condensed distance array to full symmetric matrix.

105

106

Transforms the condensed array format (n*(n-1)/2 elements) to a full

107

symmetric matrix format (n x n) with zeros on the diagonal.

108

109

Parameters:

110

- dists: array, condensed distance array

111

- nb_series: int, number of original sequences

112

- block: tuple, optional blocking for memory efficiency

113

114

Returns:

115

array: full symmetric distance matrix of shape (nb_series, nb_series)

116

"""

117

118

def distance_array_index(a, b, nb_series):

119

"""

120

Get index in condensed distance array for sequence pair (a, b).

121

122

Computes the position in the condensed array where the distance

123

between sequences a and b is stored.

124

125

Parameters:

126

- a, b: int, sequence indices (where a < b)

127

- nb_series: int, total number of sequences

128

129

Returns:

130

int: index in condensed distance array

131

"""

132

```

133

134

## Usage Examples

135

136

### Basic Distance Matrix

137

138

```python

139

from dtaidistance import dtw

140

import numpy as np

141

142

# Create a collection of time series

143

series = [

144

[1, 2, 3, 2, 1],

145

[0, 1, 2, 3, 2, 1, 0],

146

[1, 3, 2, 1],

147

[2, 1, 0, 1, 2, 3, 2]

148

]

149

150

# Compute full distance matrix

151

distances = dtw.distance_matrix(series)

152

print("Distance matrix shape:", distances.shape)

153

print("Distance matrix:")

154

print(distances)

155

156

# Compute condensed distance array (more memory efficient)

157

distances_condensed = dtw.distance_matrix(series, compact=True)

158

print("Condensed array length:", len(distances_condensed))

159

print("Condensed distances:", distances_condensed)

160

```

161

162

### High-Performance Distance Matrix

163

164

```python

165

from dtaidistance import dtw

166

import numpy as np

167

import time

168

169

# Generate larger dataset

170

np.random.seed(42)

171

series = [np.cumsum(np.random.randn(100)) for _ in range(50)]

172

173

# Compare different implementations

174

methods = [

175

("Python", lambda: dtw.distance_matrix_python(series)),

176

("C sequential", lambda: dtw.distance_matrix(series, use_c=True, parallel=False)),

177

("C parallel", lambda: dtw.distance_matrix_fast(series)),

178

("C nogil parallel", lambda: dtw.distance_matrix(series, use_c=True,

179

use_nogil=True, parallel=True))

180

]

181

182

for name, method in methods:

183

start = time.time()

184

result = method()

185

elapsed = time.time() - start

186

print(f"{name}: {elapsed:.2f}s, shape: {result.shape}")

187

```

188

189

### Memory-Efficient Blocking

190

191

```python

192

from dtaidistance import dtw

193

import numpy as np

194

195

# Large dataset that might not fit in memory

196

large_series = [np.random.randn(200) for _ in range(100)]

197

198

# Process in blocks to manage memory usage

199

block_size = 25

200

n_series = len(large_series)

201

202

# Compute distance matrix in blocks

203

distances = np.zeros((n_series, n_series))

204

205

for i in range(0, n_series, block_size):

206

for j in range(i, n_series, block_size):

207

end_i = min(i + block_size, n_series)

208

end_j = min(j + block_size, n_series)

209

210

# Extract block

211

block_series = large_series[i:end_i]

212

213

if i == j:

214

# Diagonal block

215

block_distances = dtw.distance_matrix(block_series, compact=False)

216

distances[i:end_i, i:end_i] = block_distances

217

else:

218

# Off-diagonal block - compute cross distances

219

for x in range(len(block_series)):

220

for y in range(j, end_j):

221

if y < len(large_series):

222

dist = dtw.distance(block_series[x], large_series[y])

223

distances[i + x, y] = dist

224

distances[y, i + x] = dist # Symmetric

225

226

print(f"Computed {n_series}x{n_series} distance matrix in blocks")

227

```

228

229

### Distance Matrix with Constraints

230

231

```python

232

from dtaidistance import dtw

233

234

# Time series with different characteristics

235

series = [

236

[1, 2, 3, 4, 5], # Increasing trend

237

[5, 4, 3, 2, 1], # Decreasing trend

238

[1, 3, 2, 4, 1], # Oscillating

239

[2, 2, 2, 2, 2], # Constant

240

[1, 2, 3, 4, 5, 6, 7, 8] # Longer increasing

241

]

242

243

# Distance matrix with various constraints

244

constraints = {

245

'window': 3, # Sakoe-Chiba band

246

'max_dist': 10.0, # Early stopping

247

'max_length_diff': 4, # Length difference limit

248

'penalty': 0.1 # Warping penalty

249

}

250

251

distances = dtw.distance_matrix(series, **constraints, show_progress=True)

252

253

print("Constrained distance matrix:")

254

print(distances)

255

256

# Find most similar pairs

257

n = len(series)

258

for i in range(n):

259

for j in range(i + 1, n):

260

print(f"Distance between series {i} and {j}: {distances[i, j]:.2f}")

261

```

262

263

### Integration with Clustering

264

265

```python

266

from dtaidistance import dtw, clustering

267

import numpy as np

268

269

# Generate sample time series data

270

np.random.seed(42)

271

n_series = 20

272

series_length = 50

273

274

# Create three clusters of similar time series

275

cluster1 = [np.sin(np.linspace(0, 4*np.pi, series_length)) +

276

0.1*np.random.randn(series_length) for _ in range(7)]

277

cluster2 = [np.cos(np.linspace(0, 3*np.pi, series_length)) +

278

0.1*np.random.randn(series_length) for _ in range(6)]

279

cluster3 = [np.linspace(0, 1, series_length) +

280

0.1*np.random.randn(series_length) for _ in range(7)]

281

282

all_series = cluster1 + cluster2 + cluster3

283

284

# Compute distance matrix

285

distances = dtw.distance_matrix(all_series, use_c=True, parallel=True)

286

287

# Perform hierarchical clustering

288

clusterer = clustering.Hierarchical(

289

dists_fun=dtw.distance_matrix_fast,

290

dists_options={},

291

max_dist=np.inf

292

)

293

294

cluster_tree = clusterer.fit(all_series)

295

print("Clustering completed")

296

print(f"Number of clusters found: {len(cluster_tree)}")

297

```

298

299

### Format Conversion Examples

300

301

```python

302

from dtaidistance import dtw

303

import numpy as np

304

305

series = [[1, 2, 3], [2, 3, 1], [3, 1, 2], [1, 3, 2]]

306

307

# Get condensed distance array

308

condensed = dtw.distance_matrix(series, compact=True)

309

print("Condensed array:", condensed)

310

311

# Convert to full matrix

312

full_matrix = dtw.distances_array_to_matrix(condensed, len(series))

313

print("Full matrix:")

314

print(full_matrix)

315

316

# Verify conversion by accessing specific distances

317

for i in range(len(series)):

318

for j in range(i + 1, len(series)):

319

condensed_idx = dtw.distance_array_index(i, j, len(series))

320

condensed_dist = condensed[condensed_idx]

321

matrix_dist = full_matrix[i, j]

322

print(f"Distance ({i},{j}): condensed={condensed_dist:.3f}, "

323

f"matrix={matrix_dist:.3f}, match={abs(condensed_dist - matrix_dist) < 1e-10}")

324

```

325

326

### Performance Optimization Strategies

327

328

```python

329

from dtaidistance import dtw

330

from dtaidistance.util import SeriesContainer

331

import numpy as np

332

333

# Optimize data format

334

raw_series = [list(np.random.randn(100)) for _ in range(30)]

335

336

# Use SeriesContainer for better performance

337

series_container = SeriesContainer(raw_series)

338

339

# Configure optimized distance matrix function

340

fast_distance_matrix = dtw.distance_matrix_func(

341

use_c=True,

342

use_nogil=True,

343

parallel=True,

344

show_progress=True

345

)

346

347

# Compute with optimization

348

distances = fast_distance_matrix(

349

series_container,

350

window=10, # Reasonable window constraint

351

max_dist=50.0, # Early stopping

352

compact=True # Memory efficient format

353

)

354

355

print(f"Computed {len(raw_series)}x{len(raw_series)} distances")

356

print(f"Result shape: {distances.shape if hasattr(distances, 'shape') else len(distances)}")

357

```

358

359

## Memory and Performance Considerations

360

361

### Memory Usage

362

- **Full matrix**: O(n²) memory for n sequences

363

- **Condensed array**: O(n²/2) memory, more efficient for large datasets

364

- **Blocking**: Process subsets to manage memory with very large datasets

365

366

### Computational Complexity

367

- **Time complexity**: O(n² × m²) where n is number of sequences, m is average sequence length

368

- **Parallel processing**: Near-linear speedup on multi-core systems with `parallel=True`

369

- **C extensions**: 10-100x speedup over pure Python implementation

370

371

### Optimization Guidelines

372

1. Use `compact=True` for memory efficiency

373

2. Enable `parallel=True` for multi-core systems

374

3. Set appropriate `window` constraints to reduce computation

375

4. Use `max_dist` for early stopping in similarity search

376

5. Consider blocking for very large datasets (>1000 sequences)