Tessl Tile for pypi/torchmetrics@1.8.0

# Multimodal Metrics

Metrics for evaluating multimodal AI systems, covering video-audio synchronization and cross-modal quality assessment.

## Capabilities

### Video-Audio Synchronization

`LipVertexError` measures lip-sync quality by comparing predicted lip vertex positions with ground-truth positions, frame by frame.

```python { .api }
class LipVertexError(Metric):
    def __init__(
        self,
        mouth_map: List[int],  # indices of the mouth-region vertices
        validate_args: bool = True,
        **kwargs
    ): ...
```
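
To make the quantity concrete, here is a minimal, dependency-free sketch of how a lip vertex error is commonly defined: for each frame, take the largest Euclidean distance between predicted and ground-truth positions over the mouth vertices, then average over frames. The function name `lip_vertex_error_sketch` and the toy data are hypothetical, and this page does not say whether torchmetrics uses the plain or the squared distance, so treat it as an illustration of the idea rather than the library's implementation.

```python
import math
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # one frame: a sequence of (x, y, z) vertices


def lip_vertex_error_sketch(
    preds: Sequence[Frame], target: Sequence[Frame], mouth_map: List[int]
) -> float:
    """Mean over frames of the max L2 distance across mouth vertices (illustrative)."""
    if len(preds) != len(target) or not preds:
        raise ValueError("preds and target need the same non-zero number of frames")
    per_frame_max = []
    for pred_frame, target_frame in zip(preds, target):
        # Distance for each mouth vertex in this frame; keep only the worst one.
        worst = max(math.dist(pred_frame[i], target_frame[i]) for i in mouth_map)
        per_frame_max.append(worst)
    return sum(per_frame_max) / len(per_frame_max)


# Toy data: 2 frames, 3 vertices; vertices 0 and 1 are treated as the mouth.
target = [[(0.0, 0.0, 0.0)] * 3, [(0.0, 0.0, 0.0)] * 3]
preds = [
    [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0), (9.0, 9.0, 9.0)],  # worst mouth error: 5.0
    [(0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (9.0, 9.0, 9.0)],  # worst mouth error: 1.0
]
lve_value = lip_vertex_error_sketch(preds, target, mouth_map=[0, 1])
print(lve_value)
```

Vertex 2 is far off in both frames but does not affect the result, because only the indices listed in `mouth_map` are compared.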

### Cross-Modal Quality Assessment

CLIP-based metrics for evaluating cross-modal quality. Both require the optional `transformers` package.

```python { .api }
class CLIPScore(Metric):
    def __init__(
        self,
        model_name_or_path: str = "openai/clip-vit-large-patch14",
        **kwargs
    ): ...

class CLIPImageQualityAssessment(Metric):
    def __init__(
        self,
        model_name_or_path: str = "clip_iqa",
        data_range: float = 1.0,
        prompts: tuple = ("quality",),
        **kwargs
    ): ...
```
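
CLIPScore is documented as the cosine similarity between the CLIP image embedding and the CLIP text embedding, scaled by 100 and clamped at zero. The sketch below applies that formula to hand-written toy vectors standing in for embeddings. `clip_score_from_embeddings` is a hypothetical helper, not part of torchmetrics, and a real score needs embeddings produced by the CLIP model itself.

```python
import math
from typing import Sequence


def clip_score_from_embeddings(
    image_emb: Sequence[float], text_emb: Sequence[float]
) -> float:
    """max(100 * cos(image_emb, text_emb), 0) for a single image-text pair."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    cosine = dot / (norm_image * norm_text)
    # Negative similarities are clamped so the score stays in [0, 100].
    return max(100.0 * cosine, 0.0)


# Toy 2-D "embeddings" (assumed values, for illustration only).
aligned = clip_score_from_embeddings([1.0, 0.0], [2.0, 0.0])      # same direction
orthogonal = clip_score_from_embeddings([1.0, 0.0], [0.0, 3.0])   # unrelated
opposed = clip_score_from_embeddings([1.0, 0.0], [-1.0, 0.0])     # opposite direction
print(aligned, orthogonal, opposed)
```

Because only the direction of each vector matters, rescaling an embedding leaves the score unchanged.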

## Usage Examples

```python
import torch
from torchmetrics.multimodal import LipVertexError

# Lip vertex error for 3D talking-head evaluation.
# mouth_map holds the indices of the mouth-region vertices; these values are placeholders.
lve = LipVertexError(mouth_map=[0, 1, 2, 3, 4])

# Sample mesh vertices (frames, vertices, coords)
preds = torch.randn(10, 100, 3)  # 10 frames, 100 vertices, x-y-z coords
target = torch.randn(10, 100, 3)

# Compute lip synchronization error
lve_score = lve(preds, target)
print(f"Lip Vertex Error: {lve_score:.4f}")

# CLIP Score (requires transformers)
try:
    from torchmetrics.multimodal import CLIPScore

    clip_metric = CLIPScore()

    # Sample images (uint8, one per caption) and captions
    images = torch.randint(0, 256, (4, 3, 224, 224), dtype=torch.uint8)
    texts = ["a photo of a cat", "a dog playing", "a beautiful sunset", "a city skyline"]

    # Compute CLIP score
    clip_score = clip_metric(images, texts)
    print(f"CLIP Score: {clip_score:.4f}")

except ImportError:
    print("CLIP metrics require 'transformers' package")
```
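
The usage example does not cover `CLIPImageQualityAssessment`. The idea behind CLIP-IQA is to compare an image against a pair of opposing prompts (for example "Good photo." and "Bad photo."), apply a softmax over the two similarities, and report the share assigned to the positive prompt. The sketch below shows only that last step on assumed similarity values; `prompt_pair_probability` is a hypothetical helper and the scale factor of 100 is an assumption, so this is not the library's implementation.

```python
import math


def prompt_pair_probability(
    sim_positive: float, sim_negative: float, scale: float = 100.0
) -> float:
    """Softmax over a (positive, negative) prompt pair; returns the positive share."""
    logits = (scale * sim_positive, scale * sim_negative)
    shift = max(logits)  # subtract the max for numerical stability
    exp_pos, exp_neg = (math.exp(value - shift) for value in logits)
    return exp_pos / (exp_pos + exp_neg)


# Assumed cosine similarities between one image and the two prompts.
balanced = prompt_pair_probability(0.20, 0.20)      # no preference between prompts
leaning_good = prompt_pair_probability(0.25, 0.20)  # closer to the positive prompt
print(balanced, leaning_good)
```

A value near 1 means the image sits much closer to the positive prompt than to the negative one; a value of 0.5 means the two prompts are indistinguishable for that image.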

## Types

```python { .api }
VideoLandmarks = Tensor  # Shape: (frames, vertices, 3), x-y-z coordinates per vertex
TextPrompts = List[str]  # Text descriptions or prompts
```
