
# Multimodal Metrics

Metrics for evaluating multimodal LLM outputs involving text and images. These metrics assess image generation quality, visual question answering, image coherence, and multimodal RAG systems.

## Imports

```python
from deepeval.metrics import (
    MultimodalGEval,
    TextToImageMetric,
    ImageEditingMetric,
    ImageCoherenceMetric,
    ImageHelpfulnessMetric,
    ImageReferenceMetric,
    MultimodalContextualRecallMetric,
    MultimodalContextualRelevancyMetric,
    MultimodalContextualPrecisionMetric,
    MultimodalAnswerRelevancyMetric,
    MultimodalFaithfulnessMetric,
    MultimodalToolCorrectnessMetric
)
```

## Capabilities

### Multimodal G-Eval

G-Eval for multimodal test cases with custom evaluation criteria.

```python { .api }
class MultimodalGEval:
    """
    G-Eval for multimodal test cases.

    Parameters:
    - name (str): Name of the metric
    - criteria (str): Evaluation criteria
    - evaluation_params (List[MLLMTestCaseParams]): Parameters to evaluate
    - evaluation_steps (List[str], optional): Steps for evaluation
    - threshold (float): Success threshold (default: 0.5)
    - model (Union[str, DeepEvalBaseMLLM], optional): Multimodal evaluation model
    - async_mode (bool): Async mode (default: True)

    Attributes:
    - score (float): Evaluation score (0-1)
    - reason (str): Explanation
    - success (bool): Whether score meets threshold
    """
```

### Text-to-Image Metric

Evaluates text-to-image generation quality.

```python { .api }
class TextToImageMetric:
    """
    Evaluates text-to-image generation quality.

    Parameters:
    - threshold (float): Success threshold (default: 0.5)
    - model (Union[str, DeepEvalBaseMLLM], optional): Evaluation model
    - include_reason (bool): Include reason (default: True)

    Required Test Case Parameters:
    - INPUT (text prompt)
    - ACTUAL_OUTPUT (generated image)

    Attributes:
    - score (float): Image quality score (0-1)
    - reason (str): Explanation
    - success (bool): Whether score meets threshold
    """
```

### Image Coherence Metric

Evaluates coherence of images in context.

```python { .api }
class ImageCoherenceMetric:
    """
    Evaluates coherence of images in context.

    Parameters:
    - threshold (float): Success threshold (default: 0.5)
    - model (Union[str, DeepEvalBaseMLLM], optional): Evaluation model

    Required Test Case Parameters:
    - INPUT
    - ACTUAL_OUTPUT (images)
    - CONTEXT

    Attributes:
    - score (float): Coherence score (0-1)
    - reason (str): Explanation
    - success (bool): Whether score meets threshold
    """
```

### Image Helpfulness Metric

Evaluates helpfulness of images in responses.

```python { .api }
class ImageHelpfulnessMetric:
    """
    Evaluates helpfulness of images.

    Parameters:
    - threshold (float): Success threshold (default: 0.5)
    - model (Union[str, DeepEvalBaseMLLM], optional): Evaluation model

    Required Test Case Parameters:
    - INPUT
    - ACTUAL_OUTPUT (response with images)

    Attributes:
    - score (float): Helpfulness score (0-1)
    - reason (str): Explanation
    - success (bool): Whether score meets threshold
    """
```

### Multimodal RAG Metrics

RAG metrics adapted for multimodal inputs and outputs.

```python { .api }
class MultimodalAnswerRelevancyMetric:
    """
    Answer relevancy for multimodal inputs.

    Parameters:
    - threshold (float): Success threshold (default: 0.5)
    - model (Union[str, DeepEvalBaseMLLM], optional): Evaluation model
    """

class MultimodalFaithfulnessMetric:
    """
    Faithfulness for multimodal outputs.

    Parameters:
    - threshold (float): Success threshold (default: 0.5)
    - model (Union[str, DeepEvalBaseMLLM], optional): Evaluation model
    """

class MultimodalContextualRecallMetric:
    """
    Contextual recall for multimodal inputs.

    Parameters:
    - threshold (float): Success threshold (default: 0.5)
    - model (Union[str, DeepEvalBaseMLLM], optional): Evaluation model
    """

class MultimodalContextualRelevancyMetric:
    """
    Contextual relevancy for multimodal inputs.

    Parameters:
    - threshold (float): Success threshold (default: 0.5)
    - model (Union[str, DeepEvalBaseMLLM], optional): Evaluation model
    """

class MultimodalContextualPrecisionMetric:
    """
    Contextual precision for multimodal inputs.

    Parameters:
    - threshold (float): Success threshold (default: 0.5)
    - model (Union[str, DeepEvalBaseMLLM], optional): Evaluation model
    """
```

Usage example:

```python
from deepeval.metrics import (
    MultimodalAnswerRelevancyMetric,
    MultimodalFaithfulnessMetric
)
from deepeval.test_case import MLLMTestCase, MLLMImage

# Visual QA with retrieval
test_case = MLLMTestCase(
    input=[
        "What safety equipment is visible in this image?",
        MLLMImage(url="construction_site.jpg", local=True)
    ],
    actual_output=["Hard hats, safety vests, and steel-toed boots are visible."],
    retrieval_context=[
        "Safety requirements: hard hats, safety vests, steel-toed boots",
        MLLMImage(url="safety_guide.jpg")
    ]
)

metrics = [
    MultimodalAnswerRelevancyMetric(threshold=0.7),
    MultimodalFaithfulnessMetric(threshold=0.8)
]

for metric in metrics:
    metric.measure(test_case)
    print(f"{metric.__class__.__name__}: {metric.score:.2f}")
```

### Multimodal Tool Correctness

Tool correctness for multimodal contexts.

```python { .api }
class MultimodalToolCorrectnessMetric:
    """
    Tool correctness for multimodal contexts.

    Parameters:
    - threshold (float): Success threshold (default: 0.5)
    - model (Union[str, DeepEvalBaseMLLM], optional): Evaluation model

    Required Test Case Parameters:
    - TOOLS_CALLED
    - EXPECTED_TOOLS
    """
```