# Test Cases

Test cases are structured containers representing LLM interactions to be evaluated. DeepEval provides specialized test case classes for different evaluation scenarios: standard LLM tests, multi-turn conversations, multimodal inputs, and arena-style comparisons.

## Imports

```python
from deepeval.test_case import (
    LLMTestCase,
    LLMTestCaseParams,
    ConversationalTestCase,
    Turn,
    TurnParams,
    MLLMTestCase,
    MLLMImage,
    MLLMTestCaseParams,
    ArenaTestCase,
    Arena,
    ToolCall,
    ToolCallParams,
    MCPServer,
    MCPToolCall,
    MCPPromptCall,
    MCPResourceCall
)
```

## Capabilities

### LLM Test Case

Standard test case for evaluating single LLM interactions, supporting inputs, outputs, context, and tool usage.

```python { .api }
class LLMTestCase:
    """
    Represents a test case for evaluating LLM outputs.

    Parameters:
    - input (str): Input prompt to the LLM
    - actual_output (str, optional): Actual output from the LLM
    - expected_output (str, optional): Expected output
    - context (List[str], optional): Context information
    - retrieval_context (List[str], optional): Retrieved context for RAG applications
    - additional_metadata (Dict, optional): Additional metadata
    - tools_called (List[ToolCall], optional): Tools called by the LLM
    - expected_tools (List[ToolCall], optional): Expected tools to be called
    - comments (str, optional): Comments about the test case
    - token_cost (float, optional): Cost in tokens
    - completion_time (float, optional): Time to complete in seconds
    - name (str, optional): Name of the test case
    - tags (List[str], optional): Tags for organization
    - mcp_servers (List[MCPServer], optional): MCP servers configuration
    - mcp_tools_called (List[MCPToolCall], optional): MCP tools called
    - mcp_resources_called (List[MCPResourceCall], optional): MCP resources called
    - mcp_prompts_called (List[MCPPromptCall], optional): MCP prompts called
    """
```

Usage example:

```python
from deepeval.test_case import LLMTestCase, ToolCall

# Basic test case
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
    expected_output="Paris"
)

# RAG test case with retrieval context
rag_test_case = LLMTestCase(
    input="What's our refund policy?",
    actual_output="We offer a 30-day full refund at no extra cost.",
    expected_output="30-day full refund policy",
    retrieval_context=[
        "All customers are eligible for a 30 day full refund at no extra costs.",
        "Refunds are processed within 5-7 business days."
    ],
    context=["Customer support FAQ"]
)

# Agentic test case with tool calls
agentic_test_case = LLMTestCase(
    input="What's the weather in New York?",
    actual_output="The current weather in New York is 72°F and sunny.",
    tools_called=[
        ToolCall(
            name="get_weather",
            input_parameters={"location": "New York", "unit": "fahrenheit"},
            output={"temperature": 72, "condition": "sunny"}
        )
    ],
    expected_tools=[
        ToolCall(name="get_weather", input_parameters={"location": "New York"})
    ]
)
```

### LLM Test Case Parameters

Enumeration of test case parameters for use with metrics.

```python { .api }
class LLMTestCaseParams:
    """
    Enumeration of test case parameters.

    Values:
    - INPUT: "input"
    - ACTUAL_OUTPUT: "actual_output"
    - EXPECTED_OUTPUT: "expected_output"
    - CONTEXT: "context"
    - RETRIEVAL_CONTEXT: "retrieval_context"
    - TOOLS_CALLED: "tools_called"
    - EXPECTED_TOOLS: "expected_tools"
    - MCP_SERVERS: "mcp_servers"
    - MCP_TOOLS_CALLED: "mcp_tools_called"
    - MCP_RESOURCES_CALLED: "mcp_resources_called"
    - MCP_PROMPTS_CALLED: "mcp_prompts_called"
    """
```

Usage example:

```python
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

# Use params to specify what to evaluate
metric = GEval(
    name="Answer Relevancy",
    criteria="Determine if the actual output is relevant to the input.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT]
)
```

137

138

### Tool Call

139

140

Represents a tool call made by an LLM or expected to be called.

141

142

```python { .api }

143

class ToolCall:

144

"""

145

Represents a tool call made by an LLM.

146

147

Parameters:

148

- name (str): Name of the tool

149

- description (str, optional): Description of the tool

150

- reasoning (str, optional): Reasoning for calling the tool

151

- output (Any, optional): Output from the tool

152

- input_parameters (Dict[str, Any], optional): Input parameters to the tool

153

"""

154

```

155

156

Usage example:

157

158

```python

159

from deepeval.test_case import ToolCall

160

161

# Define a tool call

162

tool_call = ToolCall(

163

name="search_database",

164

description="Searches the product database",

165

reasoning="Need to find product information",

166

input_parameters={

167

"query": "wireless headphones",

168

"max_results": 10

169

},

170

output=[

171

{"id": 1, "name": "Premium Wireless Headphones"},

172

{"id": 2, "name": "Budget Wireless Headphones"}

173

]

174

)

175

```

### Tool Call Parameters

Enumeration of tool call parameters.

```python { .api }
class ToolCallParams:
    """
    Enumeration of tool call parameters.

    Values:
    - INPUT_PARAMETERS: "input_parameters"
    - OUTPUT: "output"
    """
```
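To illustrate how string-valued parameter enums like this can drive a field-by-field comparison between an actual and an expected tool call, here is a standalone sketch. It defines its own minimal `ToolCall` dataclass, a `ToolCallParams` enum mirroring the documented values, and a hypothetical `fields_match` helper — none of these are DeepEval's own implementations, just stand-ins for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional

# Standalone stand-ins mirroring the documented values; not DeepEval's classes.
class ToolCallParams(Enum):
    INPUT_PARAMETERS = "input_parameters"
    OUTPUT = "output"

@dataclass
class ToolCall:
    name: str
    input_parameters: Optional[Dict[str, Any]] = None
    output: Any = None

def fields_match(actual: ToolCall, expected: ToolCall,
                 params: List[ToolCallParams]) -> bool:
    """Hypothetical helper: compare only the fields named by `params`.

    Each enum member's string value doubles as the attribute name,
    so getattr() can fetch the corresponding field from each call.
    """
    return all(
        getattr(actual, p.value) == getattr(expected, p.value) for p in params
    )

actual = ToolCall(name="get_weather",
                  input_parameters={"location": "New York"},
                  output={"temp": 72})
expected = ToolCall(name="get_weather",
                    input_parameters={"location": "New York"})

print(fields_match(actual, expected, [ToolCallParams.INPUT_PARAMETERS]))  # True
print(fields_match(actual, expected, [ToolCallParams.OUTPUT]))            # False
```

Because the enum values equal the attribute names, adding a new comparable field only requires adding one enum member.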

### Conversational Test Case

Test case for evaluating multi-turn conversational interactions.

```python { .api }
class ConversationalTestCase:
    """
    Represents a multi-turn conversational test case.

    Parameters:
    - turns (List[Turn]): List of conversation turns
    - scenario (str, optional): Scenario description
    - context (List[str], optional): Context information
    - name (str, optional): Name of the test case
    - user_description (str, optional): Description of the user
    - expected_outcome (str, optional): Expected outcome of the conversation
    - chatbot_role (str, optional): Role of the chatbot
    - additional_metadata (Dict, optional): Additional metadata
    - comments (str, optional): Comments
    - tags (List[str], optional): Tags for organization
    - mcp_servers (List[MCPServer], optional): MCP servers configuration
    """
```

Usage example:

```python
from deepeval.test_case import ConversationalTestCase, Turn

# Multi-turn customer support conversation
conversation = ConversationalTestCase(
    scenario="Customer inquiring about product return",
    chatbot_role="Customer support agent",
    user_description="Customer who wants to return a product",
    expected_outcome="Customer understands return process and is satisfied",
    context=["30-day return policy", "Free return shipping"],
    turns=[
        Turn(
            role="user",
            content="I want to return my purchase"
        ),
        Turn(
            role="assistant",
            content="I'd be happy to help with your return. Can you provide your order number?"
        ),
        Turn(
            role="user",
            content="My order number is #12345"
        ),
        Turn(
            role="assistant",
            content="Thank you. I've initiated your return. You'll receive a prepaid return label via email within 24 hours.",
            retrieval_context=["Order #12345 placed on 2024-01-15"]
        )
    ]
)
```

### Turn

Represents a single turn in a conversation.

```python { .api }
class Turn:
    """
    Represents a single turn in a conversation.

    Parameters:
    - role (Literal["user", "assistant"]): Role of the speaker
    - content (str): Content of the turn
    - user_id (str, optional): User identifier
    - retrieval_context (List[str], optional): Retrieved context for this turn
    - tools_called (List[ToolCall], optional): Tools called during this turn
    - mcp_tools_called (List[MCPToolCall], optional): MCP tools called
    - mcp_resources_called (List[MCPResourceCall], optional): MCP resources called
    - mcp_prompts_called (List[MCPPromptCall], optional): MCP prompts called
    - additional_metadata (Dict, optional): Additional metadata
    """
```

Usage example:

```python
from deepeval.test_case import Turn, ToolCall

# Assistant turn with tool usage
turn = Turn(
    role="assistant",
    content="I've checked the weather for you. It's currently 72°F and sunny in New York.",
    tools_called=[
        ToolCall(
            name="get_weather",
            input_parameters={"city": "New York"},
            output={"temp": 72, "condition": "sunny"}
        )
    ],
    retrieval_context=["User prefers Fahrenheit for temperature"]
)
```

### Turn Parameters

Enumeration of turn parameters for use with conversational metrics.

```python { .api }
class TurnParams:
    """
    Enumeration of turn parameters.

    Values:
    - ROLE: "role"
    - CONTENT: "content"
    - SCENARIO: "scenario"
    - EXPECTED_OUTCOME: "expected_outcome"
    - RETRIEVAL_CONTEXT: "retrieval_context"
    - TOOLS_CALLED: "tools_called"
    - MCP_TOOLS: "mcp_tools_called"
    - MCP_RESOURCES: "mcp_resources_called"
    - MCP_PROMPTS: "mcp_prompts_called"
    """
```
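As with the other parameter enums, each member's string value names a field on a turn. The standalone sketch below (a minimal `Turn` dataclass, a `TurnParams` enum covering a few of the documented values, and a hypothetical `collect` helper — none of them DeepEval's own code) shows how a metric could pull one field from every turn in a conversation by enum value.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

# Standalone stand-ins mirroring a subset of the documented values.
class TurnParams(Enum):
    ROLE = "role"
    CONTENT = "content"
    RETRIEVAL_CONTEXT = "retrieval_context"

@dataclass
class Turn:
    role: str
    content: str
    retrieval_context: List[str] = field(default_factory=list)

def collect(turns: List[Turn], param: TurnParams) -> list:
    """Hypothetical helper: gather one field from every turn,
    using the enum's string value as the attribute name."""
    return [getattr(t, param.value) for t in turns]

turns = [
    Turn(role="user", content="I want to return my purchase"),
    Turn(role="assistant", content="Sure, what's your order number?",
         retrieval_context=["30-day return policy"]),
]

print(collect(turns, TurnParams.ROLE))  # ['user', 'assistant']
```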

### Multimodal LLM Test Case

Test case for evaluating multimodal LLM interactions involving text and images.

```python { .api }
class MLLMTestCase:
    """
    Represents a test case for multimodal LLMs (text + images).

    Parameters:
    - input (List[Union[str, MLLMImage]]): Input with text and images
    - actual_output (List[Union[str, MLLMImage]]): Actual output
    - expected_output (List[Union[str, MLLMImage]], optional): Expected output
    - context (List[Union[str, MLLMImage]], optional): Context
    - retrieval_context (List[Union[str, MLLMImage]], optional): Retrieved context
    - additional_metadata (Dict, optional): Additional metadata
    - comments (str, optional): Comments
    - tools_called (List[ToolCall], optional): Tools called
    - expected_tools (List[ToolCall], optional): Expected tools
    - token_cost (float, optional): Token cost
    - completion_time (float, optional): Completion time in seconds
    - name (str, optional): Name
    """
```

Usage example:

```python
from deepeval.test_case import MLLMTestCase, MLLMImage

# Image description test case
mllm_test_case = MLLMTestCase(
    input=[
        "Describe what you see in this image:",
        MLLMImage(url="path/to/image.jpg", local=True)
    ],
    actual_output=["A golden retriever playing in a park with a red ball."],
    expected_output=["A dog playing with a ball in a park."]
)

# Visual question answering
vqa_test_case = MLLMTestCase(
    input=[
        "What color is the car in the image?",
        MLLMImage(url="https://example.com/car.jpg")
    ],
    actual_output=["The car is red."],
    expected_output=["Red"]
)
```

### MLLM Image

Represents an image in a multimodal test case.

```python { .api }
class MLLMImage:
    """
    Represents an image in a multimodal test case.

    Parameters:
    - url (str): URL or file path to the image
    - local (bool, optional): Whether the image is local (default: False)

    Computed Attributes (only populated for local images):
    - filename (Optional[str]): Filename extracted from URL
    - mimeType (Optional[str]): MIME type of the image
    - dataBase64 (Optional[str]): Base64 encoded image data

    Static Methods:
    - process_url(url: str) -> str: Processes a URL and returns the processed path
    - is_local_path(url: str) -> bool: Determines if a URL is a local file path
    """
```

Usage example:

```python
from deepeval.test_case import MLLMImage

# Local image
local_image = MLLMImage(
    url="/path/to/local/image.png",
    local=True
)

# Remote image
remote_image = MLLMImage(
    url="https://example.com/image.jpg"
)
```
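To give an intuition for what an `is_local_path`-style check does, here is a rough standalone sketch; the helper name `looks_like_local_path` and its scheme-based heuristic are assumptions for illustration, not DeepEval's actual implementation.

```python
from urllib.parse import urlparse

def looks_like_local_path(url: str) -> bool:
    """Rough illustrative heuristic (not DeepEval's implementation):
    treat anything without an http/https scheme as a local file path."""
    return urlparse(url).scheme not in ("http", "https")

print(looks_like_local_path("/path/to/local/image.png"))       # True
print(looks_like_local_path("https://example.com/image.jpg"))  # False
```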

### MLLM Test Case Parameters

Enumeration of multimodal test case parameters.

```python { .api }
class MLLMTestCaseParams:
    """
    Enumeration of multimodal test case parameters.

    Values:
    - INPUT: "input"
    - ACTUAL_OUTPUT: "actual_output"
    - EXPECTED_OUTPUT: "expected_output"
    - CONTEXT: "context"
    - RETRIEVAL_CONTEXT: "retrieval_context"
    - TOOLS_CALLED: "tools_called"
    - EXPECTED_TOOLS: "expected_tools"
    """
```

### Arena Test Case

Test case for comparing multiple LLM outputs in arena-style evaluation.

```python { .api }
class ArenaTestCase:
    """
    Represents a test case for comparing multiple LLM outputs (arena-style).

    Parameters:
    - contestants (Dict[str, LLMTestCase]): Dictionary mapping contestant names to test cases
    """
```

Usage example:

```python
from deepeval.test_case import ArenaTestCase, LLMTestCase
from deepeval.metrics import ArenaGEval

# Compare outputs from different models
arena_test = ArenaTestCase(
    contestants={
        "gpt-4": LLMTestCase(
            input="Write a haiku about coding",
            actual_output="Lines of code flow\nBugs emerge, then disappear\nSoftware takes its form"
        ),
        "claude-3": LLMTestCase(
            input="Write a haiku about coding",
            actual_output="Keys click through the night\nAlgorithms come alive\nCode compiles at dawn"
        ),
        "gemini-pro": LLMTestCase(
            input="Write a haiku about coding",
            actual_output="Functions nested deep\nVariables dance in loops\nPrograms bloom to life"
        )
    }
)

# Evaluate which is best
arena_metric = ArenaGEval(
    name="Haiku Quality",
    criteria="Determine which haiku best captures the essence of coding"
)
arena_metric.measure(arena_test)
print(f"Winner: {arena_metric.winner}")  # Name of the winning contestant
```

### Arena

Container for multiple arena test cases.

```python { .api }
class Arena:
    """
    Container for managing multiple arena test cases.

    Parameters:
    - test_cases (List[ArenaTestCase]): List of arena test cases to manage
    """
```

Usage example:

```python
from deepeval.test_case import Arena, ArenaTestCase, LLMTestCase

# Create multiple arena test cases
arena = Arena(test_cases=[
    ArenaTestCase(contestants={
        "model-a": LLMTestCase(input="Question 1", actual_output="Answer A1"),
        "model-b": LLMTestCase(input="Question 1", actual_output="Answer B1")
    }),
    ArenaTestCase(contestants={
        "model-a": LLMTestCase(input="Question 2", actual_output="Answer A2"),
        "model-b": LLMTestCase(input="Question 2", actual_output="Answer B2")
    })
])
```

### MCP Types

Model Context Protocol (MCP) support for advanced tool and resource management.

```python { .api }
class MCPServer:
    """
    Represents an MCP (Model Context Protocol) server configuration.

    Parameters:
    - server_name (str): Name of the server
    - transport (Literal["stdio", "sse", "streamable-http"], optional): Transport protocol
    - available_tools (List, optional): Available tools
    - available_resources (List, optional): Available resources
    - available_prompts (List, optional): Available prompts
    """

class MCPToolCall(BaseModel):
    """
    Represents an MCP tool call.

    Parameters:
    - name (str): Name of the tool
    - args (Dict): Tool arguments
    - result (object): Tool execution result
    """

class MCPResourceCall(BaseModel):
    """
    Represents an MCP resource call.

    Parameters:
    - uri (AnyUrl): URI of the resource (pydantic AnyUrl type)
    - result (object): Resource retrieval result
    """

class MCPPromptCall(BaseModel):
    """
    Represents an MCP prompt call.

    Parameters:
    - name (str): Name of the prompt
    - result (object): Prompt execution result
    """
```

Usage example:

```python
from deepeval.test_case import LLMTestCase, MCPServer, MCPToolCall

# Test case with MCP server usage
mcp_test_case = LLMTestCase(
    input="Search for Python tutorials",
    actual_output="Here are the top Python tutorials I found...",
    mcp_servers=[
        MCPServer(
            server_name="search-server",
            transport="stdio",
            available_tools=["web_search", "database_query"]
        )
    ],
    mcp_tools_called=[
        MCPToolCall(
            name="web_search",
            args={"query": "Python tutorials", "limit": 10},
            result={"count": 10, "results": [...]}
        )
    ]
)
```