Tessl Tile for pypi/pot@0.9.0

# Utility Functions and Tools

The `ot.utils` and `ot.datasets` modules provide essential utility functions and data generation tools that support optimal transport computations. These include distance calculations, distribution generators, timing functions, array manipulations, and synthetic datasets for testing and benchmarking.

## Timing Functions

```python { .api }
def ot.utils.tic():
    """
    Start timer for performance measurement.

    Initializes a global timer to measure elapsed time for code execution.
    Use in combination with toc() or toq() for timing code blocks.

    Example:
        ot.tic()
        # ... code to time ...
        elapsed = ot.toq()
    """

def ot.utils.toc(message="Elapsed time : {} s"):
    """
    End timer and print elapsed time with custom message.

    Prints the elapsed time since the last tic() call with a customizable
    message format.

    Parameters:
    - message: str, default="Elapsed time : {} s"
         Format string for the elapsed time message. Should contain {} placeholder
         for the time value.

    Example:
        ot.tic()
        # ... computation ...
        ot.toc("Computation took: {:.3f} seconds")
    """

def ot.utils.toq():
    """
    End timer and return elapsed time without printing.

    Returns the elapsed time since the last tic() call as a float value
    without printing any message.

    Returns:
    - elapsed_time: float
         Elapsed time in seconds.

    Example:
        ot.tic()
        result = expensive_computation()
        time_taken = ot.toq()
        print(f"Computation took {time_taken:.2f} seconds")
    """
```
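To make the tic/toc/toq pattern concrete, here is a minimal standard-library sketch of a global timer. It is not POT's implementation; the names `_start`, `tic`, `toc`, and `toq` are local stand-ins, and using `time.perf_counter` is an assumption made for this illustration.

```python
import time

_start = None  # module-level start time shared by the three helpers (illustrative name)


def tic():
    """Record the current time as the start of the timed block."""
    global _start
    _start = time.perf_counter()


def toq():
    """Return the seconds elapsed since the last tic() without printing."""
    return time.perf_counter() - _start


def toc(message="Elapsed time : {} s"):
    """Print the elapsed time using the format string, then return it."""
    elapsed = toq()
    print(message.format(elapsed))
    return elapsed


tic()
total = sum(i * i for i in range(100_000))  # code being timed
elapsed = toq()
```

Because the start time is a single global, nested or concurrent timings overwrite each other; that is the trade-off of this convenience style.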

## Distribution Functions

```python { .api }
def ot.utils.unif(n, type_as=None):
    """
    Generate uniform distribution over n points.

    Creates a uniform probability distribution (histogram) with equal mass
    on each of n support points.

    Parameters:
    - n: int
         Number of points in the distribution.
    - type_as: array-like, optional
         Reference array to determine the output array type and backend.
         If None, returns numpy array.

    Returns:
    - distribution: ndarray, shape (n,)
         Uniform distribution with each entry equal to 1/n.

    Example:
        uniform_dist = ot.unif(5)  # [0.2, 0.2, 0.2, 0.2, 0.2]
    """

def ot.utils.clean_zeros(a, b, M):
    """
    Remove zero entries from distributions and corresponding cost matrix entries.

    Filters out zero-weight points from source and target distributions
    and removes corresponding rows/columns from the cost matrix to avoid
    numerical issues and reduce computation.

    Parameters:
    - a: array-like, shape (n_source,)
         Source distribution (may contain zeros).
    - b: array-like, shape (n_target,)
         Target distribution (may contain zeros).
    - M: array-like, shape (n_source, n_target)
         Cost matrix.

    Returns:
    - a_clean: ndarray
         Source distribution with zeros removed.
    - b_clean: ndarray
         Target distribution with zeros removed.
    - M_clean: ndarray
         Cost matrix with corresponding rows/columns removed.

    Example:
        # Inputs must be arrays (boolean masking is used), not plain lists
        a = np.array([0.5, 0.0, 0.5])
        b = np.array([0.3, 0.7])
        M = np.array([[1, 2], [3, 4], [5, 6]])
        a_clean, b_clean, M_clean = ot.utils.clean_zeros(a, b, M)
        # Returns: [0.5, 0.5], [0.3, 0.7], [[1, 2], [5, 6]]
    """
```
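The behaviour described for `unif` and `clean_zeros` can be sketched in a few lines of NumPy. The helper names `unif_sketch` and `clean_zeros_sketch` are hypothetical and exist only for this illustration; they mirror the documented behaviour rather than reproduce the library code.

```python
import numpy as np


def unif_sketch(n):
    """Uniform histogram: n entries, each equal to 1/n."""
    return np.ones(n) / n


def clean_zeros_sketch(a, b, M):
    """Drop zero-weight points and the matching rows/columns of M."""
    keep_a = a > 0  # boolean mask over source points
    keep_b = b > 0  # boolean mask over target points
    return a[keep_a], b[keep_b], M[keep_a][:, keep_b]


a = np.array([0.5, 0.0, 0.5])
b = np.array([0.3, 0.7])
M = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

a2, b2, M2 = clean_zeros_sketch(a, b, M)
u = unif_sketch(5)
```

The second source point has zero mass, so its row of `M` is removed while both columns are kept.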

## Distance Functions

```python { .api }
def ot.utils.dist(x1, x2=None, metric='sqeuclidean'):
    """
    Compute distance matrix between sample sets.

    Computes pairwise distances between points in x1 and x2 using the
    specified metric. This is the primary function for generating cost
    matrices from sample coordinates.

    Parameters:
    - x1: array-like, shape (n1, d)
         First set of samples (source points).
    - x2: array-like, shape (n2, d), optional
         Second set of samples (target points). If None, computes distances
         within x1 (i.e., x2 = x1).
    - metric: str, default='sqeuclidean'
         Distance metric to use. Options include:
         'sqeuclidean', 'euclidean', 'cityblock', 'cosine', 'correlation',
         'hamming', 'jaccard', 'chebyshev', 'minkowski', 'mahalanobis'

    Returns:
    - distance_matrix: ndarray, shape (n1, n2)
         Matrix of pairwise distances. Entry (i,j) is the distance between
         x1[i] and x2[j].

    Example:
        X1 = np.array([[0, 0], [1, 1]])
        X2 = np.array([[0, 1], [1, 0]])
        M = ot.dist(X1, X2)  # [[1, 1], [1, 1]]
    """

def ot.utils.euclidean_distances(X, Y, squared=False):
    """
    Compute Euclidean distances between samples.

    Efficient computation of Euclidean distances with option for squared distances.

    Parameters:
    - X: array-like, shape (n_samples_X, n_features)
         First sample set.
    - Y: array-like, shape (n_samples_Y, n_features)
         Second sample set.
    - squared: bool, default=False
         If True, return squared Euclidean distances.

    Returns:
    - distances: ndarray, shape (n_samples_X, n_samples_Y)
         Euclidean distance matrix.
    """

def ot.utils.dist0(n, method='lin_square'):
    """
    Generate ground cost matrix for n points on a 1D grid.

    Creates the standard cost matrix for points placed at the integer
    positions 0, 1, ..., n-1 on a line, commonly used with 1D histograms.

    Parameters:
    - n: int
         Number of points on the 1D grid.
    - method: str, default='lin_square'
         Type of cost matrix. The POT documentation lists a single option:
         'lin_square': linear sampling between 0 and n-1 with squared
         (quadratic) distances.

    Returns:
    - cost_matrix: ndarray, shape (n, n)
         Ground cost matrix for the 1D grid.

    Example:
        M = ot.utils.dist0(3, method='lin_square')
        # Returns 3x3 matrix with squared distances on 1D line
    """
```
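As a rough illustration of what a squared-Euclidean cost matrix contains, the sketch below computes it with NumPy broadcasting and reuses it for a 1D grid cost. The function names are hypothetical, and the broadcasting approach is chosen for readability, not because the library is known to compute it this way.

```python
import numpy as np


def sqeuclidean_sketch(x1, x2=None):
    """Pairwise squared Euclidean distances, shape (n1, n2)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = x1 if x2 is None else np.asarray(x2, dtype=float)
    # (n1, 1, d) - (1, n2, d) -> (n1, n2, d), then sum over the feature axis
    diff = x1[:, None, :] - x2[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def grid_cost_sketch(n):
    """Squared distances between the integer positions 0..n-1 on a line."""
    x = np.arange(n, dtype=float).reshape(n, 1)
    return sqeuclidean_sketch(x)


M = sqeuclidean_sketch([[0, 0], [1, 1]], [[0, 1], [1, 0]])
G = grid_cost_sketch(3)
```

Here `M` matches the `[[1, 1], [1, 1]]` result quoted in the `dist` example, and `G` is the 3x3 matrix of squared index differences.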

## Projection Functions

```python { .api }
def ot.utils.proj_simplex(v, z=1):
    """
    Projection onto the probability simplex.

    Projects a vector onto the probability simplex: {x : x_i >= 0, sum(x) = z}.
    Essential for many optimization algorithms in optimal transport.

    Parameters:
    - v: array-like, shape (n,)
         Input vector to project.
    - z: float, default=1
         Sum constraint for the simplex.

    Returns:
    - projected_vector: ndarray, shape (n,)
         Projection of v onto the simplex.

    Example:
        v = np.array([2.0, -1.0, 3.0])
        p = ot.utils.proj_simplex(v)  # Projects to valid probability distribution
    """

def ot.utils.projection_sparse_simplex(V, max_nz, z=1):
    """
    Projection onto sparse simplex with cardinality constraint.

    Projects onto the intersection of probability simplex and sparsity constraint
    (at most max_nz non-zero entries).

    Parameters:
    - V: array-like, shape (n,)
         Input vector.
    - max_nz: int
         Maximum number of non-zero entries.
    - z: float, default=1
         Sum constraint.

    Returns:
    - projected_vector: ndarray, shape (n,)
         Sparse simplex projection.
    """

def ot.utils.proj_SDP(S, nx=None, vmin=0.0):
    """
    Projection onto positive semidefinite cone.

    Projects a symmetric matrix onto the cone of positive semidefinite matrices
    by eigendecomposition and thresholding negative eigenvalues.

    Parameters:
    - S: array-like, shape (n, n)
         Symmetric matrix to project.
    - nx: backend, optional
         Numerical backend to use.
    - vmin: float, default=0.0
         Minimum eigenvalue threshold.

    Returns:
    - S_projected: ndarray, shape (n, n)
         Positive semidefinite projection of S.
    """
```
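The two projections described above can be sketched with NumPy. The simplex projection below uses the common sort-and-threshold algorithm, and the PSD projection clips eigenvalues after a symmetric eigendecomposition. Both helpers are illustrative, with hypothetical names, and are not claimed to match the library's internals.

```python
import numpy as np


def proj_simplex_sketch(v, z=1.0):
    """Euclidean projection of a 1D vector onto {x : x >= 0, sum(x) = z}."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                 # entries in decreasing order
    cssv = np.cumsum(u) - z              # shifted cumulative sums
    ind = np.arange(1, v.size + 1)
    rho = ind[u - cssv / ind > 0][-1]    # number of entries that stay positive
    theta = cssv[rho - 1] / rho          # threshold subtracted from every entry
    return np.maximum(v - theta, 0.0)


def proj_psd_sketch(S, vmin=0.0):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues."""
    S = np.asarray(S, dtype=float)
    S = (S + S.T) / 2                    # enforce symmetry before eigh
    w, Q = np.linalg.eigh(S)
    return (Q * np.maximum(w, vmin)) @ Q.T


p = proj_simplex_sketch([2.0, -1.0, 3.0])
P = proj_psd_sketch([[1.0, 2.0], [2.0, 1.0]])
```

For the vector `[2, -1, 3]` the threshold is 2, so all mass ends up on the last entry. The matrix `[[1, 2], [2, 1]]` has eigenvalues 3 and -1, and clipping the negative one leaves a rank-one PSD matrix.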

## Array Manipulation Functions

```python { .api }
def ot.utils.list_to_array(*lst, nx=None):
    """
    Convert lists or mixed types to arrays with consistent backend.

    Standardizes input data to arrays using the specified backend,
    handling mixed input types and ensuring compatibility.

    Parameters:
    - lst: sequence of array-like objects
         Input data to convert to arrays.
    - nx: backend, optional
         Target backend for conversion.

    Returns:
    - arrays: tuple of ndarrays
         Converted arrays in the target backend format.
    """

def ot.utils.cost_normalization(C, norm=None, nx=None):
    """
    Normalize cost matrix using various normalization schemes.

    Applies normalization to cost matrices to improve numerical stability
    and algorithm convergence.

    Parameters:
    - C: array-like, shape (n, m)
         Cost matrix to normalize.
    - norm: str, optional
         Normalization method. Options: 'median', 'max', 'log', 'loglog'
    - nx: backend, optional
         Numerical backend.

    Returns:
    - C_normalized: ndarray
         Normalized cost matrix.
    """

def ot.utils.dots(*args):
    """
    Compute chained dot products.

    Multiplies the given matrices in sequence, from left to right.
    The multiplication order is not optimized.

    Parameters:
    - args: sequence of arrays
         Matrices to multiply in sequence.

    Returns:
    - result: ndarray
         Result of chained matrix multiplication.

    Example:
        A, B, C = np.eye(2), np.ones((2, 3)), np.ones((3, 1))
        result = ot.utils.dots(A, B, C)  # Equivalent to A @ B @ C
    """

def ot.utils.is_all_finite(*args):
    """
    Check if all elements in arrays are finite.

    Validates that arrays contain only finite values (no NaN or infinity),
    useful for debugging numerical issues.

    Parameters:
    - args: sequence of arrays
         Arrays to check.

    Returns:
    - all_finite: bool
         True if all elements in all arrays are finite.
    """
```
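A small NumPy sketch can show what the four normalization options and a chained dot product amount to. The formulas used here (division by the median or maximum, `log(1 + C)`, and `log(1 + log(1 + C))`) are assumptions about what each option name means, and the helper names are hypothetical.

```python
from functools import reduce

import numpy as np


def cost_normalization_sketch(C, norm=None):
    """Return a rescaled copy of the cost matrix C."""
    C = np.asarray(C, dtype=float)
    if norm is None:
        return C.copy()
    if norm == "median":
        return C / np.median(C)
    if norm == "max":
        return C / np.max(C)
    if norm == "log":
        return np.log1p(C)
    if norm == "loglog":
        return np.log1p(np.log1p(C))
    raise ValueError(f"unknown norm: {norm!r}")


def dots_sketch(*args):
    """Chain matrix products from left to right."""
    return reduce(np.dot, args)


C = np.array([[1.0, 2.0], [3.0, 4.0]])
C_max = cost_normalization_sketch(C, "max")
C_med = cost_normalization_sketch(C, "median")

A, B, D = np.eye(2), np.ones((2, 3)), np.ones((3, 1))
chained = dots_sketch(A, B, D)
```

After 'max' normalization the largest cost is 1, and after 'median' normalization the median cost is 1, which keeps regularization parameters comparable across datasets.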

## Label Processing Functions

```python { .api }
def ot.utils.label_normalization(y, start=0, nx=None):
    """
    Shift a label array so that its smallest label equals the specified value.

    Subtracts a constant offset from the labels so that the minimum label
    is `start`. Useful for domain adaptation and classification tasks.

    Parameters:
    - y: array-like, shape (n,)
         Input numeric labels.
    - start: int, default=0
         Value of the smallest label after normalization.
    - nx: backend, optional
         Numerical backend for array operations.

    Returns:
    - y_normalized: ndarray, shape (n,)
         Labels shifted so that the minimum equals 'start'.

    Example:
        y = np.array([3, 5, 3, 4])
        y_norm = ot.utils.label_normalization(y)
        # y_norm: [0, 2, 0, 1]
    """

def ot.utils.labels_to_masks(y, type_as=None, nx=None):
    """
    Convert label array to binary mask matrix.

    Creates one-hot encoded masks from categorical labels, where each column
    corresponds to one class.

    Parameters:
    - y: array-like, shape (n,)
         Integer labels.
    - type_as: array-like, optional
         Reference array for output type.
    - nx: backend, optional
         Numerical backend.

    Returns:
    - masks: ndarray, shape (n, n_classes)
         Binary mask matrix where masks[i, j] = 1 if y[i] is the j-th unique label.

    Example:
        y = [0, 1, 0, 2]
        masks = ot.utils.labels_to_masks(y)
        # masks: [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
    """
```
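The label helpers can be mimicked in plain NumPy, which also shows one way to handle string labels before passing them on. All function names below are hypothetical, and the string-to-integer step via `np.unique(..., return_inverse=True)` is a suggestion of this sketch rather than something the functions above do.

```python
import numpy as np


def shift_labels_sketch(y, start=0):
    """Shift numeric labels so that the smallest one equals `start`."""
    y = np.asarray(y)
    return y - (y.min() - start)


def labels_to_masks_sketch(y):
    """One-hot masks with one column per unique label (sorted order)."""
    y = np.asarray(y)
    classes = np.unique(y)
    return (y[:, None] == classes[None, :]).astype(float)


# Numeric labels: shift, then one-hot encode
y = shift_labels_sketch([3, 5, 3, 4])
masks = labels_to_masks_sketch(y)

# String labels: map to integer codes first
classes, codes = np.unique(["cat", "dog", "cat", "bird"], return_inverse=True)
```

Note that `np.unique` orders the classes alphabetically, so the integer codes follow sorted order rather than order of first appearance.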

## Geometric and Kernel Functions

```python { .api }
def ot.utils.kernel(x1, x2, method='gaussian', sigma=1.0):
    """
    Compute kernel matrix between sample sets.

    Generates the kernel matrix between two sample sets, useful for
    kernel-based optimal transport methods.

    Parameters:
    - x1: array-like, shape (n1, d)
         First sample set.
    - x2: array-like, shape (n2, d)
         Second sample set.
    - method: str, default='gaussian'
         Kernel type. POT implements the Gaussian (RBF) kernel,
         exp(-||x1[i] - x2[j]||^2 / (2 * sigma^2)).
    - sigma: float, default=1.0
         Kernel bandwidth parameter (for Gaussian kernel).

    Returns:
    - kernel_matrix: ndarray, shape (n1, n2)
         Kernel values between samples.
    """

def ot.utils.laplacian(x):
    """
    Compute graph Laplacian matrix.

    Constructs the graph Laplacian L = diag(sum(x)) - x from a similarity
    matrix, used in graph-based optimal transport and manifold learning.

    Parameters:
    - x: array-like, shape (n, n)
         Similarity (adjacency) matrix of the graph.

    Returns:
    - laplacian: ndarray, shape (n, n)
         Graph Laplacian matrix.
    """

def ot.utils.get_coordinate_circle(x):
    """
    Get 1D coordinates of points on the unit circle for circular optimal transport.

    Maps 2D points on the unit circle to their position along the circle,
    expressed as a fraction of a turn in [0, 1). Used for circular/periodic
    optimal transport problems.

    Parameters:
    - x: array-like, shape (n, 2)
         Points on the unit circle, in ambient 2D coordinates.

    Returns:
    - circle_coords: ndarray, shape (n,)
         Coordinates along the circle, in [0, 1).
    """
```
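The three constructions in this section reduce to short formulas, sketched here with NumPy. The helper names are hypothetical, and the circle mapping uses the formula u = (pi + atan2(-x2, -x1)) / (2 pi), which is assumed here to be the convention for turning a point on the circle into a fraction of a turn.

```python
import numpy as np


def gaussian_kernel_sketch(x1, x2, sigma=1.0):
    """K[i, j] = exp(-||x1[i] - x2[j]||^2 / (2 * sigma^2))."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    sq = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))


def laplacian_sketch(W):
    """Graph Laplacian diag(column sums) - W of a similarity matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=0)) - W


def circle_coordinate_sketch(x):
    """Map points on the unit circle (n, 2) to fractions of a turn (n,)."""
    x = np.asarray(x, dtype=float)
    return (np.pi + np.arctan2(-x[:, 1], -x[:, 0])) / (2 * np.pi)


pts = np.array([[0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
K = gaussian_kernel_sketch(pts, pts)
L = laplacian_sketch(K)
u = circle_coordinate_sketch(pts)
```

Because `K` is symmetric, every row of the resulting Laplacian sums to zero, and the three sample points land at a quarter, a half, and three quarters of a turn.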

## Parallel and Random Utilities

```python { .api }
def ot.utils.parmap(f, X, nprocs='default'):
    """
    Map a function over an iterable (formerly a multiprocessing map).

    Applies function f to each element of X. In the POT source this function
    is marked as deprecated and performs a regular, sequential map.

    Parameters:
    - f: callable
         Function to apply to each element.
    - X: iterable
         Input data to process.
    - nprocs: int or 'default'
         Number of processes. Kept for backward compatibility.

    Returns:
    - results: list
         Results of applying f to each element of X.
    """

def ot.utils.check_random_state(seed):
    """
    Validate and convert random seed to RandomState object.

    Ensures consistent random number generation across different input types.

    Parameters:
    - seed: int, RandomState, or None
         Random seed specification.

    Returns:
    - random_state: numpy.random.RandomState
         Validated random state object.
    """

def ot.utils.check_params(**kwargs):
    """
    Check whether required parameters are missing.

    Treats every keyword argument whose value is None as missing and
    prints a warning listing the missing parameter names.

    Parameters:
    - kwargs: dict
         Parameters to check, passed as name=value pairs.

    Returns:
    - check: bool
         True if no parameter is None, False otherwise.
    """
```
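The seed handling and missing-parameter check can be illustrated as follows. Both helpers are hypothetical sketches: `check_random_state_sketch` creates a fresh generator when the seed is `None`, which is a simplification chosen here, and the warning text printed by `check_params_sketch` is invented for the example.

```python
import numpy as np


def check_random_state_sketch(seed):
    """Turn None, an int, or a RandomState into a RandomState instance."""
    if seed is None:
        return np.random.RandomState()      # unseeded generator (simplification)
    if isinstance(seed, (int, np.integer)):
        return np.random.RandomState(seed)  # reproducible generator
    if isinstance(seed, np.random.RandomState):
        return seed                         # already a generator, pass through
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")


def check_params_sketch(**kwargs):
    """Return True when no keyword argument is None, else warn and return False."""
    missing = [name for name, value in kwargs.items() if value is None]
    if missing:
        print("Warning: missing parameters:", ", ".join(missing))
    return not missing


rng_a = check_random_state_sketch(42)
rng_b = check_random_state_sketch(42)
draw_a = rng_a.rand(3)
draw_b = rng_b.rand(3)

ok = check_params_sketch(Xs=[1, 2], Xt=[3, 4])
not_ok = check_params_sketch(Xs=[1, 2], Xt=None)
```

Two generators built from the same integer seed produce identical draws, which is what makes a `random_state` argument useful for reproducible experiments.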

## Backend Utilities

```python { .api }
def ot.utils.reduce_lazytensor(a, func, dim=None, **kwargs):
    """
    Reduce lazy tensor along specified dimensions.

    Efficient reduction operations for lazy tensor backends like KeOps.

    Parameters:
    - a: LazyTensor
         Input lazy tensor.
    - func: str
         Reduction function ('sum', 'max', 'min', etc.).
    - dim: int, optional
         Dimension along which to reduce.
    - kwargs: dict
         Additional arguments for reduction.

    Returns:
    - result: array
         Result of reduction operation.
    """

def ot.utils.get_lowrank_lazytensor(Q, R, X, Y):
    """
    Create low-rank lazy tensor representation.

    Constructs efficient lazy tensor for low-rank matrix operations.

    Parameters:
    - Q: array-like
         Left factor matrix.
    - R: array-like
         Right factor matrix.
    - X: array-like
         Source coordinates.
    - Y: array-like
         Target coordinates.

    Returns:
    - lazy_tensor: LazyTensor
         Low-rank lazy tensor representation.
    """

def ot.utils.get_parameter_pair(parameter):
    """
    Convert single parameter to parameter pair for source/target.

    Utility for handling parameters that can be specified as single values
    or pairs for source and target separately.

    Parameters:
    - parameter: float or tuple
         Parameter value(s).

    Returns:
    - param_source: float
    - param_target: float
    """
```
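The scalar-or-pair convention described for `get_parameter_pair` can be sketched in pure Python. The helper name and the exact handling of length-one sequences and invalid inputs are assumptions of this example.

```python
def get_parameter_pair_sketch(parameter):
    """Expand a scalar or a short sequence into a (source, target) pair."""
    if isinstance(parameter, (int, float)):
        return parameter, parameter          # same value for both sides
    values = tuple(parameter)
    if len(values) == 1:
        return values[0], values[0]          # single-element sequence
    if len(values) == 2:
        return values[0], values[1]          # explicit source/target pair
    raise ValueError("expected a scalar or a sequence of length 1 or 2")


pair_from_scalar = get_parameter_pair_sketch(0.1)
pair_from_tuple = get_parameter_pair_sketch((0.1, 0.5))
```

This pattern lets a solver accept a single regularization strength for convenience while still supporting different values for the source and target marginals.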

## Dataset Generation (`ot.datasets`)

```python { .api }
def ot.datasets.make_1D_gauss(n, m, s):
    """
    Generate 1D Gaussian histogram.

    Creates a discrete 1D Gaussian distribution on the regular grid
    0, 1, ..., n-1.

    Parameters:
    - n: int
         Number of bins/points in the histogram.
    - m: float
         Mean of the Gaussian distribution, in bin units.
    - s: float
         Standard deviation of the Gaussian, in bin units.

    Returns:
    - histogram: ndarray, shape (n,)
         Normalized 1D Gaussian histogram (sums to 1).

    Example:
        hist = ot.datasets.make_1D_gauss(100, m=50, s=10)
    """

def ot.datasets.make_2D_samples_gauss(n, m, sigma, random_state=None):
    """
    Generate 2D Gaussian samples.

    Creates n samples from a 2D Gaussian distribution with specified
    mean and covariance matrix.

    Parameters:
    - n: int
         Number of samples to generate.
    - m: array-like, shape (2,)
         Mean vector of the Gaussian.
    - sigma: array-like, shape (2, 2)
         Covariance matrix of the Gaussian.
    - random_state: int, optional
         Random seed for reproducibility.

    Returns:
    - samples: ndarray, shape (n, 2)
         Generated 2D Gaussian samples.

    Example:
        mean = [0, 0]
        cov = [[1, 0.5], [0.5, 1]]
        X = ot.datasets.make_2D_samples_gauss(1000, mean, cov)
    """

def ot.datasets.make_data_classif(dataset, n, nz=0.5, theta=0, p=0.5, random_state=None, **kwargs):
    """
    Generate classification datasets for domain adaptation.

    Creates synthetic datasets commonly used for testing domain adaptation
    algorithms with optimal transport.

    Parameters:
    - dataset: str
         Dataset type. Options: '3gauss', '3gauss2', 'gaussrot', '2gauss_prop'
    - n: int
         Number of samples to generate.
    - nz: float, default=0.5
         Noise level.
    - theta: float, default=0
         Rotation angle for domain shift (used by 'gaussrot').
    - p: float, default=0.5
         Proportion parameter (used by '2gauss_prop').
    - random_state: int, optional
         Random seed.
    - kwargs: dict
         Additional dataset-specific parameters.

    Returns:
    - X: ndarray, shape (n, 2)
         Sample coordinates.
    - y: ndarray, shape (n,)
         Class labels.

    Example:
        X, y = ot.datasets.make_data_classif('3gauss', 100, nz=0.1)
    """
```
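To show what the two Gaussian generators produce, the sketch below builds a normalized histogram on the integer grid and draws correlated 2D samples. The helper names are hypothetical, and the sampler uses a Cholesky factor of the covariance, which is a choice made for this example and not necessarily how the library transforms its samples.

```python
import numpy as np


def make_1d_gauss_sketch(n, m, s):
    """Normalized Gaussian histogram on the grid 0, 1, ..., n-1."""
    x = np.arange(n, dtype=float)
    h = np.exp(-(x - m) ** 2 / (2 * s ** 2))
    return h / h.sum()


def make_2d_samples_gauss_sketch(n, m, sigma, seed=None):
    """Draw n samples from N(m, sigma) using a Cholesky factor of sigma."""
    rng = np.random.RandomState(seed)
    chol = np.linalg.cholesky(np.asarray(sigma, dtype=float))
    # Standard normal draws are mapped so that their covariance becomes sigma
    return rng.randn(n, 2) @ chol.T + np.asarray(m, dtype=float)


hist = make_1d_gauss_sketch(100, m=50, s=10)
X = make_2d_samples_gauss_sketch(500, [1, -1], [[0.5, 0.2], [0.2, 0.8]], seed=0)
```

The mean and standard deviation of the histogram are expressed in bin units, so `m=50` places the peak at bin 50 of the 100-bin grid.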

## Usage Examples

### Basic Utility Usage
```python
import ot
import numpy as np

# Timing code execution
ot.tic()
result = np.linalg.eig(np.random.rand(1000, 1000))
elapsed = ot.toq()
print(f"Eigendecomposition took {elapsed:.3f} seconds")

# Generate uniform distribution
uniform_dist = ot.unif(10)
print("Uniform distribution:", uniform_dist)

# Compute distance matrix
X = np.random.rand(5, 2)
Y = np.random.rand(3, 2)
distances = ot.dist(X, Y)
print("Distance matrix shape:", distances.shape)
```

### Working with Labels
```python
import numpy as np
import ot

# Label normalization: shift numeric labels so the smallest one becomes 0
labels = np.array([3, 5, 3, 4, 5])
normalized_labels = ot.utils.label_normalization(labels)
print("Normalized labels:", normalized_labels)

# Convert to masks
masks = ot.utils.labels_to_masks(normalized_labels)
print("One-hot masks shape:", masks.shape)
```

### Dataset Generation
```python
import numpy as np
import ot

# 1D Gaussian histogram (mean and standard deviation are in bin units)
hist = ot.datasets.make_1D_gauss(50, m=15, s=5)
print("1D histogram sum:", np.sum(hist))

# 2D Gaussian samples
mean = [1, -1]
cov = [[0.5, 0.2], [0.2, 0.8]]
samples = ot.datasets.make_2D_samples_gauss(200, mean, cov)
print("2D samples shape:", samples.shape)

# Classification dataset
X, y = ot.datasets.make_data_classif('3gauss', 100, nz=0.2)
print("3gauss dataset:", X.shape, "Classes:", np.unique(y))
```

### Projections and Normalizations
```python
# Simplex projection
v = np.array([2.0, -1.0, 3.0, 0.5])
projected = ot.utils.proj_simplex(v)
print("Original vector:", v)
print("Projected (simplex):", projected)
print("Sum after projection:", np.sum(projected))

# Cost matrix normalization
C = np.random.rand(10, 10) * 100
C_normalized = ot.utils.cost_normalization(C, norm='median')
print("Original cost range:", [np.min(C), np.max(C)])
print("Normalized cost range:", [np.min(C_normalized), np.max(C_normalized)])
```

The utilities and datasets modules provide the foundational tools needed for most optimal transport applications, from basic array manipulations to specialized dataset generation for research and benchmarking.
