# Utility Functions and Tools

The `ot.utils` and `ot.datasets` modules provide essential utility functions and data generation tools that support optimal transport computations. These include distance calculations, distribution generators, timing functions, array manipulations, and synthetic datasets for testing and benchmarking.

## Timing Functions

```python { .api }
def ot.utils.tic():
    """
    Start timer for performance measurement.

    Initializes a global timer to measure elapsed time for code execution.
    Use in combination with toc() or toq() for timing code blocks.

    Example:
        ot.tic()
        # ... code to time ...
        elapsed = ot.toq()
    """

def ot.utils.toc(message="Elapsed time : {} s"):
    """
    End timer and print elapsed time with custom message.

    Prints the elapsed time since the last tic() call with a customizable
    message format.

    Parameters:
    - message: str, default="Elapsed time : {} s"
      Format string for the elapsed time message. Should contain a {} placeholder
      for the time value.

    Example:
        ot.tic()
        # ... computation ...
        ot.toc("Computation took: {:.3f} seconds")
    """

def ot.utils.toq():
    """
    End timer and return elapsed time without printing.

    Returns the elapsed time since the last tic() call as a float value
    without printing any message.

    Returns:
    - elapsed_time: float
      Elapsed time in seconds.

    Example:
        ot.tic()
        result = expensive_computation()
        time_taken = ot.toq()
        print(f"Computation took {time_taken:.2f} seconds")
    """
```
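
The tic/toc/toq pattern described above can be sketched with the standard library alone. This is a minimal illustration, not POT's implementation: the module-level `_start` variable and the use of `time.perf_counter` are assumptions made for the example.

```python
import time

# Hypothetical module-level state holding the last start time.
_start = None


def tic():
    """Record the current time as the start of the measured interval."""
    global _start
    _start = time.perf_counter()


def toq():
    """Return the seconds elapsed since the last tic(), without printing."""
    if _start is None:
        raise RuntimeError("call tic() before toq()")
    return time.perf_counter() - _start


def toc(message="Elapsed time : {} s"):
    """Print the elapsed time using `message` and return it."""
    elapsed = toq()
    print(message.format(elapsed))
    return elapsed


tic()
total = sum(i * i for i in range(100_000))
elapsed = toq()
print(f"Sum of squares took {elapsed:.4f} seconds")
```

Because the start time is a single shared value, nested timings overwrite each other; each `tic()` resets the interval for every later `toc()` or `toq()`.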

## Distribution Functions

```python { .api }
def ot.utils.unif(n, type_as=None):
    """
    Generate uniform distribution over n points.

    Creates a uniform probability distribution (histogram) with equal mass
    on each of n support points.

    Parameters:
    - n: int
      Number of points in the distribution.
    - type_as: array-like, optional
      Reference array to determine the output array type and backend.
      If None, returns numpy array.

    Returns:
    - distribution: ndarray, shape (n,)
      Uniform distribution with each entry equal to 1/n.

    Example:
        uniform_dist = ot.unif(5)  # [0.2, 0.2, 0.2, 0.2, 0.2]
    """

def ot.utils.clean_zeros(a, b, M):
    """
    Remove zero entries from distributions and corresponding cost matrix entries.

    Filters out zero-weight points from source and target distributions
    and removes corresponding rows/columns from the cost matrix to avoid
    numerical issues and reduce computation.

    Parameters:
    - a: array-like, shape (n_source,)
      Source distribution (may contain zeros).
    - b: array-like, shape (n_target,)
      Target distribution (may contain zeros).
    - M: array-like, shape (n_source, n_target)
      Cost matrix.

    Returns:
    - a_clean: ndarray
      Source distribution with zeros removed.
    - b_clean: ndarray
      Target distribution with zeros removed.
    - M_clean: ndarray
      Cost matrix with corresponding rows/columns removed.

    Example:
        # Use arrays rather than plain lists so boolean masking works
        a = np.array([0.5, 0.0, 0.5])
        b = np.array([0.3, 0.7])
        M = np.array([[1, 2], [3, 4], [5, 6]])
        a_clean, b_clean, M_clean = ot.utils.clean_zeros(a, b, M)
        # Returns: [0.5, 0.5], [0.3, 0.7], [[1, 2], [5, 6]]
    """
```
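
The zero-removal step described above amounts to boolean masking. The following NumPy-only sketch shows the idea without calling POT; the variable names are illustrative and the masking approach is an assumption about how such filtering is typically done.

```python
import numpy as np

# A uniform histogram over 5 points: every entry equals 1/5.
uniform = np.full(5, 1.0 / 5)

a = np.array([0.5, 0.0, 0.5])   # source weights, one zero entry
b = np.array([0.3, 0.7])        # target weights, no zeros
M = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Keep only the support points that carry mass.
keep_rows = a > 0
keep_cols = b > 0

a_clean = a[keep_rows]
b_clean = b[keep_cols]
# Drop the matching rows first, then the matching columns.
M_clean = M[keep_rows][:, keep_cols]

print(a_clean, b_clean)
print(M_clean)
```

Dropping zero-mass points leaves the transport problem unchanged, since no mass can be moved from or to them.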

## Distance Functions

```python { .api }
def ot.utils.dist(x1, x2=None, metric='sqeuclidean'):
    """
    Compute distance matrix between sample sets.

    Computes pairwise distances between points in x1 and x2 using the
    specified metric. This is the primary function for generating cost
    matrices from sample coordinates.

    Parameters:
    - x1: array-like, shape (n1, d)
      First set of samples (source points).
    - x2: array-like, shape (n2, d), optional
      Second set of samples (target points). If None, computes distances
      within x1 (i.e., x2 = x1).
    - metric: str, default='sqeuclidean'
      Distance metric to use. Options include:
      'sqeuclidean', 'euclidean', 'cityblock', 'cosine', 'correlation',
      'hamming', 'jaccard', 'chebyshev', 'minkowski', 'mahalanobis'

    Returns:
    - distance_matrix: ndarray, shape (n1, n2)
      Matrix of pairwise distances. Entry (i,j) is the distance between
      x1[i] and x2[j].

    Example:
        X1 = np.array([[0, 0], [1, 1]])
        X2 = np.array([[0, 1], [1, 0]])
        M = ot.dist(X1, X2)  # [[1, 1], [1, 1]]
    """

def ot.utils.euclidean_distances(X, Y, squared=False):
    """
    Compute Euclidean distances between samples.

    Efficient computation of Euclidean distances with option for squared distances.

    Parameters:
    - X: array-like, shape (n_samples_X, n_features)
      First sample set.
    - Y: array-like, shape (n_samples_Y, n_features)
      Second sample set.
    - squared: bool, default=False
      If True, return squared Euclidean distances.

    Returns:
    - distances: ndarray, shape (n_samples_X, n_samples_Y)
      Euclidean distance matrix.
    """

def ot.utils.dist0(n, method='lin_square'):
    """
    Generate ground cost matrix for n points on a 1D grid.

    Creates a standard cost matrix for n points sampled linearly at
    positions 0, 1, ..., n-1, commonly used for discrete optimal transport
    between histograms.

    Parameters:
    - n: int
      Number of points on the grid.
    - method: str, default='lin_square'
      Type of cost matrix. Options:
      'lin_square': linear sampling between 0 and n-1 with squared distances

    Returns:
    - cost_matrix: ndarray, shape (n, n)
      Ground cost matrix for the grid.

    Example:
        M = ot.utils.dist0(3, method='lin_square')
        # Returns 3x3 matrix with squared distances on 1D line:
        # [[0, 1, 4], [1, 0, 1], [4, 1, 0]]
    """
```
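
To make the cost-matrix definitions above concrete, here is a NumPy-only sketch of a squared Euclidean cost matrix built by broadcasting. The helper name `sqeuclidean_cost` is hypothetical and this is not how POT computes it internally; it only reproduces the same quantity on small inputs.

```python
import numpy as np


def sqeuclidean_cost(x1, x2):
    """Pairwise squared Euclidean distances, shape (n1, n2)."""
    # Broadcasting: (n1, 1, d) - (1, n2, d) -> (n1, n2, d)
    diff = x1[:, None, :] - x2[None, :, :]
    return np.sum(diff ** 2, axis=-1)


X1 = np.array([[0.0, 0.0], [1.0, 1.0]])
X2 = np.array([[0.0, 1.0], [1.0, 0.0]])
M = sqeuclidean_cost(X1, X2)
print(M)  # every pair is at squared distance 1

# Cost matrix for 3 points at positions 0, 1, 2 on a line.
grid = np.arange(3, dtype=float).reshape(-1, 1)
M_grid = sqeuclidean_cost(grid, grid)
print(M_grid)
```

The broadcasting form allocates an intermediate of size n1 × n2 × d, so it is best suited to small examples like these.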

## Projection Functions

```python { .api }
def ot.utils.proj_simplex(v, z=1):
    """
    Projection onto the probability simplex.

    Projects a vector onto the probability simplex: {x : x_i >= 0, sum(x) = z}.
    Essential for many optimization algorithms in optimal transport.

    Parameters:
    - v: array-like, shape (n,)
      Input vector to project.
    - z: float, default=1
      Sum constraint for the simplex.

    Returns:
    - projected_vector: ndarray, shape (n,)
      Projection of v onto the simplex.

    Example:
        v = np.array([2.0, -1.0, 3.0])
        p = ot.utils.proj_simplex(v)  # Projects to valid probability distribution
    """

def ot.utils.projection_sparse_simplex(V, max_nz, z=1):
    """
    Projection onto sparse simplex with cardinality constraint.

    Projects onto the intersection of probability simplex and sparsity constraint
    (at most max_nz non-zero entries).

    Parameters:
    - V: array-like, shape (n,)
      Input vector.
    - max_nz: int
      Maximum number of non-zero entries.
    - z: float, default=1
      Sum constraint.

    Returns:
    - projected_vector: ndarray, shape (n,)
      Sparse simplex projection.
    """

def ot.utils.proj_SDP(S, nx=None, vmin=0.0):
    """
    Projection onto positive semidefinite cone.

    Projects a symmetric matrix onto the cone of positive semidefinite matrices
    by eigendecomposition and thresholding negative eigenvalues.

    Parameters:
    - S: array-like, shape (n, n)
      Symmetric matrix to project.
    - nx: backend, optional
      Numerical backend to use.
    - vmin: float, default=0.0
      Minimum eigenvalue threshold.

    Returns:
    - S_projected: ndarray, shape (n, n)
      Positive semidefinite projection of S.
    """
```
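
The simplex projection described above has a well-known sort-based solution: find a threshold `theta` such that clipping `v - theta` at zero sums to `z`. The sketch below implements that standard algorithm in NumPy for a 1D vector. It is an illustration under that assumption, and `project_to_simplex` is a hypothetical name, not POT's function.

```python
import numpy as np


def project_to_simplex(v, z=1.0):
    """Euclidean projection of a 1D vector onto {x : x_i >= 0, sum(x) = z}."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                 # entries in decreasing order
    cssv = np.cumsum(u) - z              # cumulative sums shifted by z
    ks = np.arange(1, v.size + 1)
    # Largest k for which the k-th sorted entry stays positive after shifting.
    rho = ks[u - cssv / ks > 0][-1]
    theta = cssv[rho - 1] / rho          # shift that makes the result sum to z
    return np.maximum(v - theta, 0.0)


p = project_to_simplex([2.0, -1.0, 3.0])
print(p, p.sum())
```

For this input the threshold is 2, so only the largest entry survives and the projection puts all mass on the third coordinate.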

## Array Manipulation Functions

```python { .api }
def ot.utils.list_to_array(*lst, nx=None):
    """
    Convert lists or mixed types to arrays with consistent backend.

    Standardizes input data to arrays using the specified backend,
    handling mixed input types and ensuring compatibility.

    Parameters:
    - lst: sequence of array-like objects
      Input data to convert to arrays.
    - nx: backend, optional
      Target backend for conversion.

    Returns:
    - arrays: tuple of ndarrays
      Converted arrays in the target backend format.
    """

def ot.utils.cost_normalization(C, norm=None, nx=None):
    """
    Normalize cost matrix using various normalization schemes.

    Applies normalization to cost matrices to improve numerical stability
    and algorithm convergence.

    Parameters:
    - C: array-like, shape (n, m)
      Cost matrix to normalize.
    - norm: str, optional
      Normalization method. Options: 'median', 'max', 'log', 'loglog'
    - nx: backend, optional
      Numerical backend.

    Returns:
    - C_normalized: ndarray
      Normalized cost matrix.
    """

def ot.utils.dots(*args):
    """
    Compute chained dot products.

    Computes the dot product of multiple matrices by multiplying them
    left to right, in the order given.

    Parameters:
    - args: sequence of arrays
      Matrices to multiply in sequence.

    Returns:
    - result: ndarray
      Result of chained matrix multiplication.

    Example:
        A, B, C = np.eye(2), np.ones((2, 3)), np.ones((3, 1))
        result = ot.utils.dots(A, B, C)  # Equivalent to A @ B @ C
    """

def ot.utils.is_all_finite(*args):
    """
    Check if all elements in arrays are finite.

    Validates that arrays contain only finite values (no NaN or infinity),
    useful for debugging numerical issues.

    Parameters:
    - args: sequence of arrays
      Arrays to check.

    Returns:
    - all_finite: bool
      True if all elements in all arrays are finite.
    """
```
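
The helpers above can be illustrated with plain NumPy. In the sketch below, the meaning of each normalization name is an assumption (`'median'` and `'max'` divide by that statistic, `'log'` applies log(1 + C), `'loglog'` applies it twice), and `normalize_cost` is a hypothetical stand-in rather than the library function.

```python
from functools import reduce

import numpy as np


def normalize_cost(C, norm=None):
    """Rescale a cost matrix; the scheme definitions are assumed, see lead-in."""
    C = np.asarray(C, dtype=float)
    if norm is None:
        return C
    if norm == "median":
        return C / np.median(C)
    if norm == "max":
        return C / np.max(C)
    if norm == "log":
        return np.log1p(C)
    if norm == "loglog":
        return np.log1p(np.log1p(C))
    raise ValueError(f"unknown normalization: {norm!r}")


C = np.array([[1.0, 2.0], [3.0, 4.0]])
C_med = normalize_cost(C, "median")   # median becomes 1
C_max = normalize_cost(C, "max")      # maximum becomes 1
C_log = normalize_cost(C, "log")

# Chained product, multiplied left to right: (I @ C) @ v
chained = reduce(np.dot, (np.eye(2), C, np.ones(2)))

# Finite check across several arrays at once.
all_finite = all(np.all(np.isfinite(x)) for x in (C_med, C_max, C_log))
print(chained, all_finite)
```

Rescaling the cost matrix rescales the effective regularization strength in entropic solvers, which is why normalization is often applied before solving.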

## Label Processing Functions

```python { .api }
def ot.utils.label_normalization(y, start=0, nx=None):
    """
    Shift a label array so that its smallest label equals a specified value.

    Subtracts a constant offset from every label, so the spacing between
    labels is preserved. Useful for domain adaptation and classification tasks.

    Parameters:
    - y: array-like, shape (n,)
      Input numeric labels.
    - start: int, default=0
      Desired value for the smallest label.
    - nx: backend, optional
      Numerical backend for array operations.

    Returns:
    - y_normalized: ndarray, shape (n,)
      Labels shifted so that the smallest one equals 'start'.

    Example:
        y = np.array([3, 4, 3, 5])
        y_norm = ot.utils.label_normalization(y)
        # y_norm: [0, 1, 0, 2]
    """

def ot.utils.labels_to_masks(y, type_as=None, nx=None):
    """
    Convert label array to binary mask matrix.

    Creates one-hot encoded masks from categorical labels, where each column
    corresponds to one class.

    Parameters:
    - y: array-like, shape (n,)
      Integer labels.
    - type_as: array-like, optional
      Reference array for output type.
    - nx: backend, optional
      Numerical backend.

    Returns:
    - masks: ndarray, shape (n, n_classes)
      Binary mask matrix where masks[i, j] = 1 if y[i] == j.

    Example:
        y = np.array([0, 1, 0, 2])
        masks = ot.utils.labels_to_masks(y)
        # masks: [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
    """
```
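
As a complement to the functions above, this NumPy-only sketch shows one common way to turn arbitrary labels into integer codes and then into one-hot masks. It relies on `np.unique`, which sorts the distinct labels, so codes follow sorted order rather than order of first appearance. None of this is POT's own code.

```python
import numpy as np

labels = np.array(["cat", "dog", "cat", "bird"])

# Distinct labels (sorted) and, for each sample, the index of its label.
classes, codes = np.unique(labels, return_inverse=True)

# One row per sample, one column per class.
masks = np.eye(classes.size)[codes]

# Shifting integer labels so that the smallest one becomes `start`.
start = 0
ints = np.array([3, 4, 3, 5])
shifted = ints - (ints.min() - start)

print(classes, codes)
print(masks)
print(shifted)
```

Each row of `masks` contains exactly one 1, so summing over columns recovers a vector of ones and summing over rows gives the class counts.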

## Geometric and Kernel Functions

```python { .api }
def ot.utils.kernel(x1, x2, method='gaussian', sigma=1.0):
    """
    Compute kernel matrix between sample sets.

    Generates a Gaussian (RBF) kernel matrix, useful for
    kernel-based optimal transport methods.

    Parameters:
    - x1: array-like, shape (n1, d)
      First sample set.
    - x2: array-like, shape (n2, d)
      Second sample set.
    - method: str, default='gaussian'
      Kernel type. 'gaussian' (also accepted as 'gauss' or 'rbf').
    - sigma: float, default=1.0
      Kernel bandwidth parameter (for Gaussian kernel).

    Returns:
    - kernel_matrix: ndarray, shape (n1, n2)
      Kernel values between samples.
    """

def ot.utils.laplacian(x):
    """
    Compute graph Laplacian matrix.

    Constructs the graph Laplacian from a similarity matrix (degree matrix
    minus similarity matrix), used in graph-based optimal transport and
    manifold learning.

    Parameters:
    - x: array-like, shape (n, n)
      Similarity (adjacency) matrix of the graph.

    Returns:
    - laplacian: ndarray, shape (n, n)
      Graph Laplacian matrix.
    """

def ot.utils.get_coordinate_circle(x):
    """
    Get coordinates on unit circle for circular optimal transport.

    Maps 1D coordinates to points on the unit circle, used for
    circular/periodic optimal transport problems.

    Parameters:
    - x: array-like, shape (n,)
      1D coordinates in [0, 1), where 1 corresponds to a full turn.

    Returns:
    - circle_coords: ndarray, shape (n, 2)
      2D coordinates on unit circle.
    """
```
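
The three constructions above can be written out in a few lines of NumPy. The formulas used here are assumptions chosen for illustration: a Gaussian kernel exp(-‖x − y‖² / (2σ²)), a Laplacian formed as degree matrix minus similarity matrix, and circle coordinates obtained by reading the input as a fraction of a full turn. All helper names are hypothetical.

```python
import numpy as np


def gaussian_kernel(x1, x2, sigma=1.0):
    """K[i, j] = exp(-||x1[i] - x2[j]||^2 / (2 * sigma^2))."""
    sq = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def graph_laplacian(W):
    """Degree matrix minus similarity matrix."""
    return np.diag(W.sum(axis=0)) - W


def circle_coordinates(t):
    """Map values in [0, 1) to (cos, sin) pairs on the unit circle."""
    angles = 2.0 * np.pi * np.asarray(t, dtype=float)
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(X, X)            # symmetric, ones on the diagonal
L = graph_laplacian(K)               # every column sums to zero
pts = circle_coordinates([0.0, 0.25, 0.5])
print(np.round(pts, 6))
```

Using the kernel matrix as the similarity input ties the pieces together: nearby samples get large weights, and the resulting Laplacian penalizes differences between them.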

## Parallel and Random Utilities

```python { .api }
def ot.utils.parmap(f, X, nprocs='default'):
    """
    Parallel map function for multiprocessing.

    Applies function f to elements of X in parallel using multiple processes.

    Parameters:
    - f: callable
      Function to apply to each element.
    - X: iterable
      Input data to process.
    - nprocs: int or 'default'
      Number of processes. If 'default', uses all available cores.

    Returns:
    - results: list
      Results of applying f to each element of X.
    """

def ot.utils.check_random_state(seed):
    """
    Validate and convert random seed to RandomState object.

    Ensures consistent random number generation across different input types.

    Parameters:
    - seed: int, RandomState, or None
      Random seed specification.

    Returns:
    - random_state: numpy.random.RandomState
      Validated random state object.
    """

def ot.utils.check_params(**kwargs):
    """
    Check whether any required parameters are missing.

    Generic parameter validation utility for POT functions. A parameter is
    treated as missing when its value is None; missing names are reported
    in a printed warning.

    Parameters:
    - kwargs: dict
      Parameters to check, passed as keyword arguments.

    Returns:
    - check: bool
      True if no parameter is None, False otherwise.
    """
```
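
The seed-handling convention described above (accept `None`, an integer, or an existing `RandomState`) can be sketched as follows. The helper names `as_random_state` and `missing_parameters` are hypothetical, and the behavior shown is an assumption modeled on the common scikit-learn-style convention rather than a copy of POT's code.

```python
import numpy as np


def as_random_state(seed=None):
    """Turn None, an int, or a RandomState into a RandomState instance."""
    if seed is None:
        return np.random.RandomState()       # freshly seeded generator
    if isinstance(seed, (int, np.integer)):
        return np.random.RandomState(seed)   # reproducible generator
    if isinstance(seed, np.random.RandomState):
        return seed                          # pass through unchanged
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")


def missing_parameters(**kwargs):
    """Return the names of keyword arguments whose value is None."""
    return [name for name, value in kwargs.items() if value is None]


rs_a = as_random_state(42)
rs_b = as_random_state(42)
draws_a = rs_a.rand(3)
draws_b = rs_b.rand(3)       # same seed, same draws

shared = as_random_state(rs_a)
missing = missing_parameters(Xs=np.zeros(2), Xt=None)
print(draws_a, missing)
```

Passing an existing generator through unchanged lets a caller share one random stream across several functions.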

## Backend Utilities

```python { .api }
def ot.utils.reduce_lazytensor(a, func, dim=None, **kwargs):
    """
    Reduce lazy tensor along specified dimensions.

    Efficient reduction operations for lazy tensor backends like KeOps.

    Parameters:
    - a: LazyTensor
      Input lazy tensor.
    - func: str
      Reduction function ('sum', 'max', 'min', etc.).
    - dim: int, optional
      Dimension along which to reduce.
    - kwargs: dict
      Additional arguments for reduction.

    Returns:
    - result: array
      Result of reduction operation.
    """

def ot.utils.get_lowrank_lazytensor(Q, R, X, Y):
    """
    Create low-rank lazy tensor representation.

    Constructs efficient lazy tensor for low-rank matrix operations.

    Parameters:
    - Q: array-like
      Left factor matrix.
    - R: array-like
      Right factor matrix.
    - X: array-like
      Source coordinates.
    - Y: array-like
      Target coordinates.

    Returns:
    - lazy_tensor: LazyTensor
      Low-rank lazy tensor representation.
    """

def ot.utils.get_parameter_pair(parameter):
    """
    Convert single parameter to parameter pair for source/target.

    Utility for handling parameters that can be specified as single values
    or pairs for source and target separately.

    Parameters:
    - parameter: float or tuple
      Parameter value(s).

    Returns:
    - param_source: float
    - param_target: float
    """
```
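
The point of low-rank and lazy representations is to avoid materializing a large matrix. The NumPy sketch below shows the two ideas on a toy example: applying a rank-2 matrix `Q @ R.T` to a vector through its factors, and reducing it batch by batch. Sizes, the batch length, and variable names are illustrative assumptions; no POT lazy-tensor class is used.

```python
import numpy as np

rng = np.random.RandomState(0)
Q = rng.rand(6, 2)   # left factor, shape (n, r)
R = rng.rand(5, 2)   # right factor, shape (m, r)
v = rng.rand(5)

# Dense route: build the full (n, m) matrix, then multiply.
dense = (Q @ R.T) @ v

# Factored route: never forms the (n, m) matrix.
factored = Q @ (R.T @ v)

# Batched reduction: sum all entries using row blocks of size `batch`.
batch = 2
total = 0.0
for i in range(0, Q.shape[0], batch):
    block = Q[i:i + batch] @ R.T     # only `batch` rows exist at a time
    total += block.sum()

print(np.allclose(dense, factored), total)
```

The factored product costs on the order of (n + m) · r operations instead of n · m, which is what makes the low-rank form attractive for large problems.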

## Dataset Generation (`ot.datasets`)

```python { .api }
def ot.datasets.make_1D_gauss(n, m, s):
    """
    Generate 1D Gaussian histogram.

    Creates a discrete 1D Gaussian distribution on a regular grid of bins
    located at positions 0, 1, ..., n-1.

    Parameters:
    - n: int
      Number of bins/points in the histogram.
    - m: float
      Mean of the Gaussian distribution, in bin units.
    - s: float
      Standard deviation of the Gaussian, in bin units.

    Returns:
    - histogram: ndarray, shape (n,)
      Normalized 1D Gaussian histogram (sums to 1).

    Example:
        hist = ot.datasets.make_1D_gauss(100, m=50, s=10)
    """

def ot.datasets.make_2D_samples_gauss(n, m, sigma, random_state=None):
    """
    Generate 2D Gaussian samples.

    Creates n samples from a 2D Gaussian distribution with specified
    mean and covariance matrix.

    Parameters:
    - n: int
      Number of samples to generate.
    - m: array-like, shape (2,)
      Mean vector of the Gaussian.
    - sigma: array-like, shape (2, 2)
      Covariance matrix of the Gaussian.
    - random_state: int, optional
      Random seed for reproducibility.

    Returns:
    - samples: ndarray, shape (n, 2)
      Generated 2D Gaussian samples.

    Example:
        mean = [0, 0]
        cov = [[1, 0.5], [0.5, 1]]
        X = ot.datasets.make_2D_samples_gauss(1000, mean, cov)
    """

def ot.datasets.make_data_classif(dataset, n, nz=0.5, theta=0, p=0.5, random_state=None, **kwargs):
    """
    Generate classification datasets for domain adaptation.

    Creates synthetic datasets commonly used for testing domain adaptation
    algorithms with optimal transport.

    Parameters:
    - dataset: str
      Dataset type. Options: '3gauss', '3gauss2', 'gaussrot', '2gauss_prop'
    - n: int
      Number of samples to generate.
    - nz: float, default=0.5
      Noise level.
    - theta: float, default=0
      Rotation angle for domain shift.
    - p: float, default=0.5
      Proportion parameter.
    - random_state: int, optional
      Random seed.
    - kwargs: dict
      Additional dataset-specific parameters.

    Returns:
    - X: ndarray, shape (n_total, n_features)
      Sample coordinates.
    - y: ndarray, shape (n_total,)
      Class labels.

    Example:
        X, y = ot.datasets.make_data_classif('3gauss', 100, nz=0.1)
    """
```
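
For intuition about the generators above, the following NumPy sketch builds a discretized Gaussian histogram and draws 2D Gaussian samples directly. It assumes the histogram bins sit at integer positions 0 to n-1, so the mean and standard deviation are expressed in bin units; `gaussian_histogram` is a hypothetical helper, not the library function.

```python
import numpy as np


def gaussian_histogram(n, m, s):
    """Gaussian evaluated at bins 0..n-1, normalized to sum to 1."""
    x = np.arange(n, dtype=np.float64)
    h = np.exp(-((x - m) ** 2) / (2 * s ** 2))
    return h / h.sum()


hist = gaussian_histogram(100, m=50, s=10)

# 2D Gaussian samples with a fixed seed for reproducibility.
rng = np.random.RandomState(0)
mean = np.array([1.0, -1.0])
cov = np.array([[0.5, 0.2], [0.2, 0.8]])
samples = rng.multivariate_normal(mean, cov, size=200)

print(hist.sum(), hist.argmax(), samples.shape)
```

With bins at integer positions, a mean such as 0.5 with standard deviation 0.1 would concentrate nearly all mass on the first two bins, so values on the scale of `n` are usually what is intended.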

## Usage Examples

### Basic Utility Usage
```python
import ot
import numpy as np

# Timing code execution
ot.tic()
result = np.linalg.eig(np.random.rand(1000, 1000))
elapsed = ot.toq()
print(f"Eigendecomposition took {elapsed:.3f} seconds")

# Generate uniform distribution
uniform_dist = ot.unif(10)
print("Uniform distribution:", uniform_dist)

# Compute distance matrix
X = np.random.rand(5, 2)
Y = np.random.rand(3, 2)
distances = ot.dist(X, Y)
print("Distance matrix shape:", distances.shape)
```

### Working with Labels
```python
import ot
import numpy as np

# Label normalization: shift integer labels so the smallest becomes 0
labels = np.array([3, 4, 3, 5, 4])
normalized_labels = ot.utils.label_normalization(labels)
print("Normalized labels:", normalized_labels)

# Convert to masks
masks = ot.utils.labels_to_masks(normalized_labels)
print("One-hot masks shape:", masks.shape)
```

### Dataset Generation
```python
import ot
import numpy as np

# 1D Gaussian histogram (mean and standard deviation are in bin units)
hist = ot.datasets.make_1D_gauss(50, m=15, s=5)
print("1D histogram sum:", np.sum(hist))

# 2D Gaussian samples
mean = [1, -1]
cov = [[0.5, 0.2], [0.2, 0.8]]
samples = ot.datasets.make_2D_samples_gauss(200, mean, cov)
print("2D samples shape:", samples.shape)

# Classification dataset
X_cls, y_cls = ot.datasets.make_data_classif('3gauss', 100, nz=0.2)
print("3gauss dataset:", X_cls.shape, "Classes:", np.unique(y_cls))
```

### Projections and Normalizations
```python
# Simplex projection
v = np.array([2.0, -1.0, 3.0, 0.5])
projected = ot.utils.proj_simplex(v)
print("Original vector:", v)
print("Projected (simplex):", projected)
print("Sum after projection:", np.sum(projected))

# Cost matrix normalization
C = np.random.rand(10, 10) * 100
C_normalized = ot.utils.cost_normalization(C, norm='median')
print("Original cost range:", [np.min(C), np.max(C)])
print("Normalized cost range:", [np.min(C_normalized), np.max(C_normalized)])
```

The utilities and datasets modules provide the foundational tools needed for most optimal transport applications, from basic array manipulations to specialized dataset generation for research and benchmarking.