# Utility Functions and Tools

The `ot.utils` and `ot.datasets` modules provide essential utility functions and data generation tools that support optimal transport computations. These include distance calculations, distribution generators, timing functions, array manipulations, and synthetic datasets for testing and benchmarking.

## Timing Functions

```python { .api }
def ot.utils.tic():
    """
    Start timer for performance measurement.

    Initializes a global timer to measure elapsed time for code execution.
    Use in combination with toc() or toq() for timing code blocks.

    Example:
        ot.tic()
        # ... code to time ...
        elapsed = ot.toq()
    """

def ot.utils.toc(message="Elapsed time : {} s"):
    """
    End timer and print elapsed time with custom message.

    Prints the elapsed time since the last tic() call with a customizable
    message format, and also returns it.

    Parameters:
    - message: str, default="Elapsed time : {} s"
      Format string for the elapsed time message. Should contain a {}
      placeholder for the time value.

    Returns:
    - elapsed_time: float
      Elapsed time in seconds.

    Example:
        ot.tic()
        # ... computation ...
        ot.toc("Computation took: {:.3f} seconds")
    """

def ot.utils.toq():
    """
    End timer and return elapsed time without printing.

    Returns the elapsed time since the last tic() call as a float value
    without printing any message.

    Returns:
    - elapsed_time: float
      Elapsed time in seconds.

    Example:
        ot.tic()
        result = expensive_computation()
        time_taken = ot.toq()
        print(f"Computation took {time_taken:.2f} seconds")
    """
```
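
The tic/toq pattern above boils down to storing a start time in module-level state and subtracting it later. The following is a minimal standard-library sketch of that idea, not POT's source; the names `start_timer` and `elapsed_quiet` are hypothetical stand-ins for `tic()` and `toq()`.

```python
import time

_start = None  # module-level start time, playing the role of the global timer


def start_timer():
    """Record the current time as the start of a measurement (stand-in for tic)."""
    global _start
    _start = time.perf_counter()


def elapsed_quiet():
    """Return seconds elapsed since start_timer() without printing (stand-in for toq)."""
    if _start is None:
        raise RuntimeError("start_timer() must be called first")
    return time.perf_counter() - _start


start_timer()
total = sum(i * i for i in range(100_000))  # the code being timed
elapsed = elapsed_quiet()
print(f"Summation took {elapsed:.6f} seconds")
```

Because the start time lives in shared state, nested or concurrent measurements overwrite each other; for those cases keeping the start time in a local variable is usually safer.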

## Distribution Functions

```python { .api }
def ot.utils.unif(n, type_as=None):
    """
    Generate uniform distribution over n points.

    Creates a uniform probability distribution (histogram) with equal mass
    on each of n support points.

    Parameters:
    - n: int
      Number of points in the distribution.
    - type_as: array-like, optional
      Reference array to determine the output array type and backend.
      If None, returns a numpy array.

    Returns:
    - distribution: ndarray, shape (n,)
      Uniform distribution with each entry equal to 1/n.

    Example:
        uniform_dist = ot.unif(5)  # [0.2, 0.2, 0.2, 0.2, 0.2]
    """

def ot.utils.clean_zeros(a, b, M):
    """
    Remove zero entries from distributions and corresponding cost matrix entries.

    Filters out zero-weight points from source and target distributions
    and removes corresponding rows/columns from the cost matrix to avoid
    numerical issues and reduce computation.

    Parameters:
    - a: array-like, shape (n_source,)
      Source distribution (may contain zeros).
    - b: array-like, shape (n_target,)
      Target distribution (may contain zeros).
    - M: array-like, shape (n_source, n_target)
      Cost matrix.

    Returns:
    - a_clean: ndarray
      Source distribution with zeros removed.
    - b_clean: ndarray
      Target distribution with zeros removed.
    - M_clean: ndarray
      Cost matrix with corresponding rows/columns removed.

    Example:
        # Pass arrays rather than Python lists so boolean masking works
        a = np.array([0.5, 0.0, 0.5])
        b = np.array([0.3, 0.7])
        M = np.array([[1, 2], [3, 4], [5, 6]])
        a_clean, b_clean, M_clean = ot.utils.clean_zeros(a, b, M)
        # Returns: [0.5, 0.5], [0.3, 0.7], [[1, 2], [5, 6]]
    """
```
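
The zero-filtering described for `clean_zeros` can be pictured as boolean masking on the two marginals, with the same masks applied to the rows and columns of the cost matrix. The sketch below uses plain NumPy; `drop_zero_mass` is a hypothetical helper written for illustration, not the library function.

```python
import numpy as np


def drop_zero_mass(a, b, M):
    """Keep only support points with strictly positive mass (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    M = np.asarray(M, dtype=float)
    keep_a = a > 0  # rows of M to keep
    keep_b = b > 0  # columns of M to keep
    return a[keep_a], b[keep_b], M[keep_a][:, keep_b]


a2, b2, M2 = drop_zero_mass(
    [0.5, 0.0, 0.5],
    [0.3, 0.7],
    [[1, 2], [3, 4], [5, 6]],
)
print(a2)  # the zero-weight source point is gone
print(M2)  # and so is the matching row of the cost matrix
```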

## Distance Functions

```python { .api }
def ot.utils.dist(x1, x2=None, metric='sqeuclidean'):
    """
    Compute distance matrix between sample sets.

    Computes pairwise distances between points in x1 and x2 using the
    specified metric. This is the primary function for generating cost
    matrices from sample coordinates.

    Parameters:
    - x1: array-like, shape (n1, d)
      First set of samples (source points).
    - x2: array-like, shape (n2, d), optional
      Second set of samples (target points). If None, computes distances
      within x1 (i.e., x2 = x1).
    - metric: str, default='sqeuclidean'
      Distance metric to use. Options include:
      'sqeuclidean', 'euclidean', 'cityblock', 'cosine', 'correlation',
      'hamming', 'jaccard', 'chebyshev', 'minkowski', 'mahalanobis'

    Returns:
    - distance_matrix: ndarray, shape (n1, n2)
      Matrix of pairwise distances. Entry (i, j) is the distance between
      x1[i] and x2[j].

    Example:
        X1 = np.array([[0, 0], [1, 1]])
        X2 = np.array([[0, 1], [1, 0]])
        M = ot.dist(X1, X2)  # [[1, 1], [1, 1]]
    """

def ot.utils.euclidean_distances(X, Y, squared=False):
    """
    Compute Euclidean distances between samples.

    Efficient computation of Euclidean distances with option for squared distances.

    Parameters:
    - X: array-like, shape (n_samples_X, n_features)
      First sample set.
    - Y: array-like, shape (n_samples_Y, n_features)
      Second sample set.
    - squared: bool, default=False
      If True, return squared Euclidean distances.

    Returns:
    - distances: ndarray, shape (n_samples_X, n_samples_Y)
      Euclidean distance matrix.
    """

def ot.utils.dist0(n, method='lin_square'):
    """
    Generate a standard ground cost matrix of size (n, n).

    Builds the cost matrix for n points sampled linearly between 0 and n-1,
    as commonly used for 1D histograms in discrete optimal transport.

    Parameters:
    - n: int
      Number of points.
    - method: str, default='lin_square'
      Type of cost matrix. 'lin_square' uses linear sampling between
      0 and n-1 with squared Euclidean cost.

    Returns:
    - cost_matrix: ndarray, shape (n, n)
      Ground cost matrix.

    Example:
        M = ot.utils.dist0(3, method='lin_square')
        # Returns 3x3 matrix with squared distances on a 1D line
    """
```
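
To make the default `'sqeuclidean'` metric concrete, the pairwise squared Euclidean cost can be written with NumPy broadcasting. This is a sketch of the formula only; `pairwise_sqeuclidean` is a hypothetical name and the library may compute the same quantity differently.

```python
import numpy as np


def pairwise_sqeuclidean(x1, x2):
    """Squared Euclidean distance between every row of x1 and every row of x2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    diff = x1[:, None, :] - x2[None, :, :]  # shape (n1, n2, d)
    return np.sum(diff ** 2, axis=-1)       # shape (n1, n2)


X1 = np.array([[0, 0], [1, 1]])
X2 = np.array([[0, 1], [1, 0]])
M = pairwise_sqeuclidean(X1, X2)
print(M)  # every pair differs by exactly one unit step, so all entries are 1
```

The broadcasted difference allocates an array of shape (n1, n2, d), so this form is best suited to small inputs.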

## Projection Functions

```python { .api }
def ot.utils.proj_simplex(v, z=1):
    """
    Projection onto the probability simplex.

    Projects a vector onto the probability simplex: {x : x_i >= 0, sum(x) = z}.
    Essential for many optimization algorithms in optimal transport.

    Parameters:
    - v: array-like, shape (n,)
      Input vector to project.
    - z: float, default=1
      Sum constraint for the simplex.

    Returns:
    - projected_vector: ndarray, shape (n,)
      Projection of v onto the simplex.

    Example:
        v = np.array([2.0, -1.0, 3.0])
        p = ot.utils.proj_simplex(v)  # Projects to valid probability distribution
    """

def ot.utils.projection_sparse_simplex(V, max_nz, z=1):
    """
    Projection onto sparse simplex with cardinality constraint.

    Projects onto the intersection of probability simplex and sparsity constraint
    (at most max_nz non-zero entries).

    Parameters:
    - V: array-like, shape (n,)
      Input vector.
    - max_nz: int
      Maximum number of non-zero entries.
    - z: float, default=1
      Sum constraint.

    Returns:
    - projected_vector: ndarray, shape (n,)
      Sparse simplex projection.
    """

def ot.utils.proj_SDP(S, nx=None, vmin=0.0):
    """
    Projection onto positive semidefinite cone.

    Projects a symmetric matrix onto the cone of positive semidefinite matrices
    by eigendecomposition and thresholding negative eigenvalues.

    Parameters:
    - S: array-like, shape (n, n)
      Symmetric matrix to project.
    - nx: backend, optional
      Numerical backend to use.
    - vmin: float, default=0.0
      Minimum eigenvalue threshold.

    Returns:
    - S_projected: ndarray, shape (n, n)
      Positive semidefinite projection of S.
    """
```
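
The Euclidean projection onto {x : x_i >= 0, sum(x) = z} has a well-known sort-based solution: find a threshold `theta` so that clipping `v - theta` at zero yields the required sum. The sketch below implements that textbook procedure for a 1D vector; `project_onto_simplex` is a hypothetical name, and nothing here claims to mirror the internals of `ot.utils.proj_simplex`.

```python
import numpy as np


def project_onto_simplex(v, z=1.0):
    """Euclidean projection of a 1D vector onto {x >= 0, sum(x) = z}, with z > 0."""
    v = np.asarray(v, dtype=float)
    n = v.shape[0]
    u = np.sort(v)[::-1]               # entries in decreasing order
    css = np.cumsum(u) - z             # cumulative sums shifted by the target sum
    k = np.arange(1, n + 1)
    cond = u - css / k > 0             # prefixes whose entries stay positive
    rho = k[cond][-1]                  # size of the final support
    theta = css[cond][-1] / rho        # shift that makes the support sum to z
    return np.maximum(v - theta, 0.0)


v = np.array([2.0, -1.0, 3.0])
p = project_onto_simplex(v)
print(p, p.sum())
```

For this input the threshold works out to 2, so all mass ends up on the largest coordinate.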

## Array Manipulation Functions

```python { .api }
def ot.utils.list_to_array(*lst, nx=None):
    """
    Convert lists or mixed types to arrays with consistent backend.

    Standardizes input data to arrays using the specified backend,
    handling mixed input types and ensuring compatibility.

    Parameters:
    - lst: sequence of array-like objects
      Input data to convert to arrays.
    - nx: backend, optional
      Target backend for conversion.

    Returns:
    - arrays: tuple of ndarrays
      Converted arrays in the target backend format.
    """

def ot.utils.cost_normalization(C, norm=None, nx=None):
    """
    Normalize cost matrix using various normalization schemes.

    Applies normalization to cost matrices to improve numerical stability
    and algorithm convergence. The 'median' and 'max' schemes divide by a
    scalar and may modify C in place; pass a copy to keep the original.

    Parameters:
    - C: array-like, shape (n, m)
      Cost matrix to normalize.
    - norm: str, optional
      Normalization method. Options: 'median', 'max', 'log', 'loglog'.
      If None, C is returned unchanged.
    - nx: backend, optional
      Numerical backend.

    Returns:
    - C_normalized: ndarray
      Normalized cost matrix.
    """

def ot.utils.dots(*args):
    """
    Compute chained dot products.

    Multiplies the given matrices from left to right, as a reduction with
    the dot product. The multiplication order is not reoptimized.

    Parameters:
    - args: sequence of arrays
      Matrices to multiply in sequence.

    Returns:
    - result: ndarray
      Result of chained matrix multiplication.

    Example:
        A = np.random.rand(2, 3)
        B = np.random.rand(3, 4)
        C = np.random.rand(4, 5)
        result = ot.utils.dots(A, B, C)  # Equivalent to A @ B @ C
    """

def ot.utils.is_all_finite(*args):
    """
    Check if all elements in arrays are finite.

    Validates that arrays contain only finite values (no NaN or infinity),
    useful for debugging numerical issues.

    Parameters:
    - args: sequence of arrays
      Arrays to check.

    Returns:
    - all_finite: bool
      True if all elements in all arrays are finite.
    """
```
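
A chained product and a finiteness check are both short reductions in NumPy. The sketch below shows one way to write them, multiplying strictly left to right with no attempt to pick a cheaper multiplication order; `chain_dot` and `all_finite` are hypothetical names used only for illustration.

```python
from functools import reduce

import numpy as np


def chain_dot(*mats):
    """Multiply matrices left to right: ((A @ B) @ C) @ ..."""
    return reduce(np.dot, mats)


def all_finite(*arrays):
    """True when no array contains NaN or infinity."""
    return all(bool(np.all(np.isfinite(np.asarray(a)))) for a in arrays)


rng = np.random.default_rng(0)
A = rng.random((2, 3))
B = rng.random((3, 4))
C = rng.random((4, 5))

product = chain_dot(A, B, C)
print(product.shape)
print(all_finite(A, B, C, product))
```

When the matrix shapes differ widely and the order matters for cost, `numpy.linalg.multi_dot` is the NumPy routine that selects the multiplication order.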

## Label Processing Functions

```python { .api }
def ot.utils.label_normalization(y, start=0, nx=None):
    """
    Shift a label array so that its smallest label equals a given value.

    Subtracts a constant offset from numeric labels so that they start at
    `start`. It does not remap arbitrary values (such as strings) to
    consecutive integers; encode those as integers first.

    Parameters:
    - y: array-like, shape (n,)
      Numeric labels.
    - start: int, default=0
      Desired value for the smallest label.
    - nx: backend, optional
      Numerical backend for array operations.

    Returns:
    - y_normalized: array-like, shape (n,)
      Labels shifted so that the minimum equals 'start'.

    Example:
        y = np.array([2, 3, 2, 5])
        y_norm = ot.utils.label_normalization(y)
        # y_norm: [0, 1, 0, 3]
    """

def ot.utils.labels_to_masks(y, type_as=None, nx=None):
    """
    Convert label array to binary mask matrix.

    Creates one-hot encoded masks from categorical labels, where each column
    corresponds to one class.

    Parameters:
    - y: array-like, shape (n,)
      Integer labels.
    - type_as: array-like, optional
      Reference array for output type.
    - nx: backend, optional
      Numerical backend.

    Returns:
    - masks: ndarray, shape (n, n_classes)
      Binary mask matrix where masks[i, j] = 1 if y[i] is the j-th
      unique label (in sorted order).

    Example:
        y = np.array([0, 1, 0, 2])
        masks = ot.utils.labels_to_masks(y)
        # masks: [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
    """
```
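
One-hot masks, and the step of turning arbitrary label values into integer codes, can both be derived from `numpy.unique`. The following is a NumPy-only sketch under that assumption; `one_hot_masks` is a hypothetical helper, not the library function.

```python
import numpy as np


def one_hot_masks(y):
    """Return (masks, classes): one column per unique label, in sorted order."""
    classes, inverse = np.unique(np.asarray(y), return_inverse=True)
    inverse = inverse.reshape(-1)            # integer code of each sample
    masks = np.eye(len(classes))[inverse]    # pick the matching identity row
    return masks, classes


y = np.array([0, 1, 0, 2])
masks, classes = one_hot_masks(y)
print(masks)

# The same call encodes string labels as integer codes
names = np.array(["cat", "dog", "cat", "bird"])
name_masks, name_classes = one_hot_masks(names)
codes = name_masks.argmax(axis=1)
print(name_classes, codes)
```

Note that `numpy.unique` sorts the classes, so the codes follow sorted order rather than order of first appearance.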

## Geometric and Kernel Functions

```python { .api }
def ot.utils.kernel(x1, x2, method='gaussian', sigma=1.0):
    """
    Compute kernel matrix between sample sets.

    Generates a kernel matrix between two sample sets, useful for
    kernel-based optimal transport methods.

    Parameters:
    - x1: array-like, shape (n1, d)
      First sample set.
    - x2: array-like, shape (n2, d)
      Second sample set.
    - method: str, default='gaussian'
      Kernel type. POT implements the Gaussian kernel ('gaussian').
    - sigma: float, default=1.0
      Kernel bandwidth parameter (for Gaussian kernel).

    Returns:
    - kernel_matrix: ndarray, shape (n1, n2)
      Kernel values between samples.
    """

def ot.utils.laplacian(x):
    """
    Compute graph Laplacian matrix.

    Constructs the graph Laplacian L = D - x from a similarity matrix x,
    where D is the diagonal degree matrix. Used in graph-based optimal
    transport and manifold learning.

    Parameters:
    - x: array-like, shape (n, n)
      Similarity (adjacency) matrix of the graph.

    Returns:
    - laplacian: ndarray, shape (n, n)
      Graph Laplacian matrix.
    """

def ot.utils.get_coordinate_circle(x):
    """
    Get 1D coordinates of points on the unit circle.

    Maps points on the unit circle in R^2 to coordinates in [0, 1), used
    for circular/periodic optimal transport problems.

    Parameters:
    - x: array-like, shape (n, 2)
      Points on the unit circle.

    Returns:
    - circle_coords: ndarray, shape (n,)
      Coordinates in [0, 1).
    """
```
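
A Gaussian kernel matrix is usually built from pairwise squared distances and a bandwidth `sigma`. The sketch below assumes the common convention K = exp(-||x - y||^2 / (2 sigma^2)); that scaling is an assumption of this example, so check the library documentation before relying on an exact match with `ot.utils.kernel`. The name `gaussian_kernel` is hypothetical.

```python
import numpy as np


def gaussian_kernel(x1, x2, sigma=1.0):
    """K[i, j] = exp(-||x1[i] - x2[j]||^2 / (2 * sigma^2)) (assumed convention)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    sq_dist = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


rng = np.random.default_rng(0)
X = rng.random((4, 2))
K = gaussian_kernel(X, X, sigma=0.5)
print(K.shape)
print(np.diag(K))  # each point has kernel value 1 with itself
```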

## Parallel and Random Utilities

```python { .api }
def ot.utils.parmap(f, X, nprocs='default'):
    """
    Map a function over an iterable.

    Applies function f to each element of X. This function was originally
    a multiprocessing map; recent POT releases deprecate it and perform a
    regular sequential map.

    Parameters:
    - f: callable
      Function to apply to each element.
    - X: iterable
      Input data to process.
    - nprocs: int or 'default'
      Number of processes (kept for compatibility).

    Returns:
    - results: list
      Results of applying f to each element of X.
    """

def ot.utils.check_random_state(seed):
    """
    Validate and convert random seed to RandomState object.

    Ensures consistent random number generation across different input types.

    Parameters:
    - seed: int, RandomState, or None
      Random seed specification.

    Returns:
    - random_state: numpy.random.RandomState
      Validated random state object.
    """

def ot.utils.check_params(**kwargs):
    """
    Check whether required parameters are missing.

    Generic parameter check for POT functions. A parameter counts as
    missing when its value is None; missing names are reported in a
    printed warning.

    Parameters:
    - kwargs: dict
      Named parameters to check.

    Returns:
    - check: bool
      True if no parameter is None, False otherwise.
    """
```

## Backend Utilities

```python { .api }
def ot.utils.reduce_lazytensor(a, func, axis=None, nx=None, batch_size=100):
    """
    Reduce a lazy tensor along an axis, processing it in batches.

    Applies a reduction to a LazyTensor without materializing it in full,
    by evaluating slices of batch_size entries at a time.

    Parameters:
    - a: LazyTensor
      Input lazy tensor.
    - func: callable
      Reduction function applied to each batch (for example a backend sum).
    - axis: int, optional
      Axis along which to reduce. If None, reduces to a scalar.
    - nx: backend, optional
      Numerical backend.
    - batch_size: int, default=100
      Number of entries evaluated per batch.

    Returns:
    - result: array
      Result of reduction operation.
    """

def ot.utils.get_lowrank_lazytensor(Q, R, d=None, nx=None):
    """
    Create low-rank lazy tensor representation.

    Constructs a lazy tensor T = Q @ R.T, or T = Q @ diag(d) @ R.T when d
    is given, for low-rank matrix operations.

    Parameters:
    - Q: array-like, shape (n, r)
      Left factor matrix.
    - R: array-like, shape (m, r)
      Right factor matrix.
    - d: array-like, shape (r,), optional
      Diagonal weights applied between the two factors.
    - nx: backend, optional
      Numerical backend.

    Returns:
    - lazy_tensor: LazyTensor
      Low-rank lazy tensor representation.
    """

def ot.utils.get_parameter_pair(parameter):
    """
    Convert single parameter to parameter pair for source/target.

    Utility for handling parameters that can be specified as single values
    or pairs for source and target separately.

    Parameters:
    - parameter: float or tuple
      Parameter value(s).

    Returns:
    - param_source: float
    - param_target: float
    """
```

## Dataset Generation (`ot.datasets`)

```python { .api }
def ot.datasets.make_1D_gauss(n, m, s):
    """
    Generate 1D Gaussian histogram.

    Creates a discrete 1D Gaussian distribution over n bins located at the
    integer positions 0, ..., n-1.

    Parameters:
    - n: int
      Number of bins/points in the histogram.
    - m: float
      Mean of the Gaussian distribution, in bin-index units.
    - s: float
      Standard deviation of the Gaussian, in bin-index units.

    Returns:
    - histogram: ndarray, shape (n,)
      Gaussian histogram normalized to sum to 1.

    Example:
        hist = ot.datasets.make_1D_gauss(100, m=20, s=5)
    """

def ot.datasets.make_2D_samples_gauss(n, m, sigma, random_state=None):
    """
    Generate 2D Gaussian samples.

    Creates n samples from a 2D Gaussian distribution with specified
    mean and covariance matrix.

    Parameters:
    - n: int
      Number of samples to generate.
    - m: array-like, shape (2,)
      Mean vector of the Gaussian.
    - sigma: array-like, shape (2, 2)
      Covariance matrix of the Gaussian.
    - random_state: int, optional
      Random seed for reproducibility.

    Returns:
    - samples: ndarray, shape (n, 2)
      Generated 2D Gaussian samples.

    Example:
        mean = [0, 0]
        cov = [[1, 0.5], [0.5, 1]]
        X = ot.datasets.make_2D_samples_gauss(1000, mean, cov)
    """

def ot.datasets.make_data_classif(dataset, n, nz=0.5, theta=0, p=0.5, random_state=None, **kwargs):
    """
    Generate classification datasets for domain adaptation.

    Creates synthetic datasets commonly used for testing domain adaptation
    algorithms with optimal transport.

    Parameters:
    - dataset: str
      Dataset type. Options: '3gauss', '3gauss2', 'gaussrot'
    - n: int
      Number of samples to generate.
    - nz: float, default=0.5
      Noise level.
    - theta: float, default=0
      Rotation angle for domain shift.
    - p: float, default=0.5
      Proportion parameter.
    - random_state: int, optional
      Random seed.
    - kwargs: dict
      Additional dataset-specific parameters.

    Returns:
    - X: ndarray, shape (n, 2)
      Sample coordinates.
    - y: ndarray, shape (n,)
      Class labels.

    Example:
        X, y = ot.datasets.make_data_classif('3gauss', 100, nz=0.1)
    """
```
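
A discrete Gaussian histogram amounts to evaluating the Gaussian density at each bin position and renormalizing so the weights sum to 1. The sketch below assumes bins at the integer positions 0, ..., n-1, so the mean and standard deviation are expressed in bin units; that placement and the name `gaussian_histogram` are assumptions of this example.

```python
import numpy as np


def gaussian_histogram(n, m, s):
    """Gaussian weights on bins 0..n-1, normalized to sum to 1 (illustrative)."""
    x = np.arange(n, dtype=float)                 # assumed bin positions
    h = np.exp(-((x - m) ** 2) / (2.0 * s ** 2))  # unnormalized density
    return h / h.sum()


h = gaussian_histogram(100, m=20, s=5)
print(h.sum())     # total mass after normalization
print(h.argmax())  # index of the bin carrying the most mass
```

Under this convention a mean such as 0.5 with a standard deviation of 0.1 would place nearly all mass on the first two bins, so parameters are best chosen on the scale of the bin indices.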

## Usage Examples

### Basic Utility Usage
```python
import ot
import numpy as np

# Timing code execution
ot.tic()
result = np.linalg.eig(np.random.rand(1000, 1000))
elapsed = ot.toq()
print(f"Eigendecomposition took {elapsed:.3f} seconds")

# Generate uniform distribution
uniform_dist = ot.unif(10)
print("Uniform distribution:", uniform_dist)

# Compute distance matrix
X = np.random.rand(5, 2)
Y = np.random.rand(3, 2)
distances = ot.dist(X, Y)
print("Distance matrix shape:", distances.shape)
```

### Working with Labels
```python
import numpy as np
import ot

# Label normalization: shift integer labels so the smallest becomes 0
labels = np.array([3, 5, 3, 4, 5])
normalized_labels = ot.utils.label_normalization(labels, start=0)
print("Normalized labels:", normalized_labels)

# Convert to masks
masks = ot.utils.labels_to_masks(normalized_labels)
print("One-hot masks shape:", masks.shape)
```

### Dataset Generation
```python
import numpy as np
import ot

# 1D Gaussian histogram (mean and std are given in bin-index units)
hist = ot.datasets.make_1D_gauss(50, 15, 5)
print("1D histogram sum:", np.sum(hist))

# 2D Gaussian samples
mean = [1, -1]
cov = [[0.5, 0.2], [0.2, 0.8]]
samples = ot.datasets.make_2D_samples_gauss(200, mean, cov)
print("2D samples shape:", samples.shape)

# Classification dataset
X_cls, y_cls = ot.datasets.make_data_classif('3gauss', 100, nz=0.2)
print("3gauss dataset:", X_cls.shape, "Classes:", np.unique(y_cls))
```

### Projections and Normalizations
```python
import numpy as np
import ot

# Simplex projection
v = np.array([2.0, -1.0, 3.0, 0.5])
projected = ot.utils.proj_simplex(v)
print("Original vector:", v)
print("Projected (simplex):", projected)
print("Sum after projection:", np.sum(projected))

# Cost matrix normalization
C = np.random.rand(10, 10) * 100
original_range = [np.min(C), np.max(C)]  # record before normalizing
# Pass a copy: 'median' and 'max' may rescale the input array in place
C_normalized = ot.utils.cost_normalization(C.copy(), norm='median')
print("Original cost range:", original_range)
print("Normalized cost range:", [np.min(C_normalized), np.max(C_normalized)])
```

The utilities and datasets modules provide the foundational tools needed for most optimal transport applications, from basic array manipulations to specialized dataset generation for research and benchmarking.