Multi-module test support framework for Embabel Agent applications providing integration testing, mock AI services, and test configuration utilities
Common testing patterns and recipes for the Embabel Agent test support framework.
Pattern 1: Simple Stub and Execute
Stub an LLM response and execute code.
@Test
void testSimpleStubAndExecute() {
    // 1. Stub
    whenGenerateText(prompt -> prompt.contains("hello"))
        .thenReturn("Hello, world!");
    // 2. Execute
    String result = myAgent.greet();
    // 3. Assert
    assertEquals("Hello, world!", result);
}

When to use: Basic single LLM call testing.
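The stubbing calls in this guide pair a prompt predicate with a canned response. As a rough mental model only, the matching can be pictured as a first-match lookup over registered predicates. The `PromptStubs` class below is hypothetical and is not part of the Embabel test support API; how the framework resolves stubs internally is an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration of predicate-based stubbing; not the Embabel implementation.
class PromptStubs {
    // Insertion-ordered so that earlier stubs take precedence over later ones.
    private final Map<Predicate<String>, String> stubs = new LinkedHashMap<>();

    // Register a canned response for every prompt the matcher accepts.
    PromptStubs when(Predicate<String> matcher, String response) {
        stubs.put(matcher, response);
        return this;
    }

    // Return the response of the first stub whose matcher accepts the prompt.
    String generateText(String prompt) {
        for (Map.Entry<Predicate<String>, String> stub : stubs.entrySet()) {
            if (stub.getKey().test(prompt)) {
                return stub.getValue();
            }
        }
        throw new IllegalStateException("No stub matches prompt: " + prompt);
    }
}
```

In this sketch an unmatched prompt fails loudly, which is one possible design choice: it surfaces missing stubs immediately instead of returning a silent default.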
Pattern 2: Stub Object Creation
Stub structured object extraction.
@Test
void testObjectExtraction() {
    // Create expected object
    Person expected = new Person("Alice", 30);
    // Stub
    whenCreateObject(p -> p.contains("extract"), Person.class)
        .thenReturn(expected);
    // Execute
    Person result = myAgent.extractPerson("Alice is 30 years old");
    // Assert
    assertEquals(expected, result);
}

When to use: Testing LLM object creation/parsing.
Pattern 3: Multiple Independent Stubs
Stub multiple unrelated operations.
@Test
void testMultipleStubs() {
    // Stub operation A
    whenGenerateText(p -> p.contains("greet"))
        .thenReturn("Hello!");
    // Stub operation B
    whenGenerateText(p -> p.contains("farewell"))
        .thenReturn("Goodbye!");
    // Execute both
    String greeting = myAgent.greet();
    String farewell = myAgent.farewell();
    // Assert both
    assertEquals("Hello!", greeting);
    assertEquals("Goodbye!", farewell);
}

When to use: Testing multiple independent operations.
Pattern 4: Verify and Assert
Verify the LLM call and assert the result.
@Test
void testVerifyAndAssert() {
    // Execute
    String result = myAgent.process("user input");
    // Verify LLM call
    verifyGenerateText(prompt ->
        prompt.contains("user input") &&
        prompt.contains("process")
    );
    // Assert result
    assertNotNull(result);
    assertTrue(result.length() > 0);
}

When to use: Ensuring both LLM interaction and result correctness.
Pattern 5: Verify No Interactions
Ensure code doesn't call the LLM.
@Test
void testNoLlmUsage() {
    // Execute cached/fast path
    String result = myAgent.getCached("key");
    // Verify no LLM calls
    verifyNoInteractions();
    // Assert result from cache
    assertNotNull(result);
}

When to use: Testing caching, fast paths, or optimization logic.
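To make the "no interactions" idea concrete, here is a minimal sketch of a cached fast path with a call counter. `CachingClient` and its `Function<String, String>` stand-in for the LLM are hypothetical names used for illustration; the real agent and its caching strategy are not shown in this guide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cached client; the Function stands in for a real LLM call.
class CachingClient {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> llm;
    private int llmCalls = 0;

    CachingClient(Function<String, String> llm) {
        this.llm = llm;
    }

    // Serve from the cache when possible; only a cache miss reaches the LLM.
    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        llmCalls++;
        String fresh = llm.apply(key);
        cache.put(key, fresh);
        return fresh;
    }

    // Number of times the LLM stand-in was actually invoked.
    int llmCalls() {
        return llmCalls;
    }
}
```

A second lookup for the same key leaves the counter unchanged, which is the behavior a no-interaction verification is meant to pin down.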
Ensure only expected calls occurred.
@Test
void testOnlyExpectedCalls() {
    myAgent.singleOperation();
    // Verify expected call
    verifyGenerateText(p -> p.contains("operation"));
    // Ensure no other calls
    verifyNoMoreInteractions();
}

When to use: Preventing unexpected LLM calls.
Pattern 7: Sequential Operations
Test multi-step workflows.
@Test
void testSequentialWorkflow() {
    // Stub step 1
    whenGenerateText(p -> p.contains("step1"))
        .thenReturn("Result 1");
    // Stub step 2
    whenGenerateText(p -> p.contains("step2"))
        .thenReturn("Result 2");
    // Execute multi-step
    WorkflowResult result = myAgent.executeWorkflow();
    // Verify both steps
    verifyGenerateText(p -> p.contains("step1"));
    verifyGenerateText(p -> p.contains("step2"));
    // Assert final result
    assertNotNull(result);
}

When to use: Testing complex multi-step agent workflows.
Test conditional execution paths.
@Test
void testConditionalBranch() {
    // Stub path A
    whenGenerateText(p -> p.contains("simple"))
        .thenReturn("Simple result");
    // Stub path B
    whenGenerateText(p -> p.contains("complex"))
        .thenReturn("Complex result");
    // Test simple path
    String simpleResult = myAgent.process(simpleInput);
    verifyGenerateText(p -> p.contains("simple"));
    assertEquals("Simple result", simpleResult);
    // Test complex path
    String complexResult = myAgent.process(complexInput);
    verifyGenerateText(p -> p.contains("complex"));
    assertEquals("Complex result", complexResult);
}

When to use: Testing branching logic based on input.
Test iterative LLM calls.
@Test
void testLoopProcessing() {
    // Stub for each iteration
    whenGenerateText(p -> p.contains("item"))
        .thenReturn("processed");
    // Execute on multiple items
    List<String> items = List.of("item1", "item2", "item3");
    List<String> results = myAgent.processAll(items);
    // Verify called for each
    verify(llmOperations, times(3)).generateText(any(), any());
    // Assert all processed
    assertEquals(3, results.size());
}

When to use: Testing batch processing or iteration.
Pattern 10: Exception Handling
Test error handling logic.
@Test
void testErrorHandling() {
    // Stub to throw exception
    whenGenerateText(p -> p.contains("fail"))
        .thenThrow(new RuntimeException("LLM error"));
    // Execute and expect exception handling
    assertThrows(AgentException.class, () -> {
        myAgent.processWithError();
    });
}

When to use: Testing error handling and resilience.
Pattern 11: Retry Logic
Test retry mechanisms.
@Test
void testRetryLogic() {
    // First call fails, second succeeds
    whenGenerateText(p -> p.contains("retry"))
        .thenThrow(new RuntimeException("Temporary error"))
        .thenReturn("Success after retry");
    // Execute with retry
    String result = myAgent.processWithRetry();
    // Verify retried
    verify(llmOperations, times(2)).generateText(any(), any());
    // Assert final success
    assertEquals("Success after retry", result);
}

When to use: Testing retry and failure recovery.
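The test above assumes that `processWithRetry()` re-invokes the LLM after a failure. The guide does not show that method, so the following is only a sketch of what such retry logic might look like; `Retry.withRetry` and its `maxAttempts` parameter are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical retry helper; the code under test may implement retries differently.
class Retry {
    // Invoke the call up to maxAttempts times and return the first successful result.
    static <T> T withRetry(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        // Every attempt failed: rethrow the most recent failure.
        throw last;
    }
}
```

With a stub that fails once and then succeeds, a helper like this makes exactly two calls, which matches the `times(2)` verification in the test.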
Test fallback to an alternative approach.
@Test
void testFallback() {
    // Primary approach fails
    whenGenerateText(p -> p.contains("primary"))
        .thenThrow(new RuntimeException());
    // Fallback succeeds
    whenGenerateText(p -> p.contains("fallback"))
        .thenReturn("Fallback result");
    // Execute with fallback
    String result = myAgent.processWithFallback();
    // Verify fallback was used
    verifyGenerateText(p -> p.contains("fallback"));
    // Assert result
    assertEquals("Fallback result", result);
}

When to use: Testing graceful degradation.
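For orientation, the production-side counterpart of this test could be as simple as trying strategies in order until one succeeds. `Fallback.firstSuccessful` below is a hypothetical helper written for this illustration; it is not how `processWithFallback()` is known to work.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical fallback helper: try each strategy in order until one succeeds.
class Fallback {
    static String firstSuccessful(List<Supplier<String>> strategies) {
        RuntimeException last = new IllegalStateException("No strategies supplied");
        for (Supplier<String> strategy : strategies) {
            try {
                return strategy.get();
            } catch (RuntimeException e) {
                last = e; // this strategy failed, move on to the next one
            }
        }
        // Nothing succeeded: surface the last failure.
        throw last;
    }
}
```

Ordering the list from preferred to least preferred keeps the degradation path explicit and easy to assert on.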
Test storing and retrieving embeddings.
@Test
fun `test vector storage`() {
    val embeddingModel = FakeEmbeddingModel()
    val vectorStore = VectorStore(embeddingModel)
    // Add documents
    val docs = listOf(
        Document("doc1"),
        Document("doc2")
    )
    vectorStore.add(docs)
    // Retrieve
    val retrieved = vectorStore.getAll()
    assertEquals(2, retrieved.size)
    retrieved.forEach { doc ->
        assertNotNull(doc.embedding)
        assertEquals(1536, doc.embedding!!.size)
    }
}

When to use: Testing vector database integration.
Test search engine structure (not semantic quality).
@Test
fun `test search structure`() {
    val model = FakeEmbeddingModel()
    val searchEngine = SearchEngine(model)
    // Index documents
    searchEngine.index(listOf("doc1", "doc2", "doc3"))
    // Search
    val results = searchEngine.search("query", topK = 2)
    // Assert structure
    assertEquals(2, results.size)
    results.forEach { result ->
        assertNotNull(result.document)
        assertTrue(result.score >= 0.0)
    }
}

When to use: Testing search engine structure without semantic validation.
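Structural assertions are the safe choice here because the fake embeddings are described later in this guide as random rather than semantic. The sketch below shows why: a random vector has a fixed dimension, and cosine similarity between any two non-zero vectors stays within [-1, 1], but its value says nothing about meaning. `FakeEmbeddings` is a hypothetical stand-in; the `float[]` element type and the use of cosine similarity are assumptions, and only the 1536 dimension comes from the examples.

```java
import java.util.Random;

// Hypothetical stand-in for a fake embedding model; not the framework's FakeEmbeddingModel.
class FakeEmbeddings {
    static final int DIMENSIONS = 1536;

    // Produce a random vector with components in [-1, 1); it carries no semantic meaning.
    static float[] randomEmbedding(Random random) {
        float[] vector = new float[DIMENSIONS];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = random.nextFloat() * 2f - 1f;
        }
        return vector;
    }

    // Cosine similarity of two equal-length, non-zero vectors.
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Tests can therefore check dimensions and value ranges reliably, while any threshold on similarity between two specific texts would depend on chance.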
Test processing large batches.
@Test
fun `test batch processing`() {
    val model = FakeEmbeddingModel()
    val processor = BatchProcessor(model)
    // Large batch
    val texts = (1..1000).map { "Document $it" }
    // Process
    val embeddings = processor.process(texts)
    // Assert all processed
    assertEquals(1000, embeddings.size)
    embeddings.forEach {
        assertEquals(1536, it.size)
    }
}

When to use: Testing batch processing and performance.
Test services with injected dependencies.
@SpringBootTest
@Import(FakeAiConfiguration::class)
class ServiceInjectionTest {

    @Autowired
    private lateinit var cheapest: LlmService<*>

    @Autowired
    private lateinit var myService: MyService

    @Test
    fun `test service with injected LLM`() {
        val result = myService.process("input", cheapest)
        assertNotNull(result)
    }
}

When to use: Testing Spring components with AI dependencies.
Combine Mockito stubs with Spring beans.
@SpringBootTest
@Import(FakeAiConfiguration::class)
class CombinedTest : EmbabelMockitoIntegrationTest() {

    @Autowired
    private lateinit var embeddingService: EmbeddingService

    @Test
    fun `test with stubbing and beans`() {
        // Mockito stub
        whenGenerateText { it.contains("test") }
            .thenReturn("stubbed")
        // Spring bean
        val embedding = embeddingService.embed("test")
        // Execute
        val result = myAgent.process("test")
        // Verify
        verifyGenerateText { it.contains("test") }
        assertEquals(1536, embedding.size)
    }
}

When to use: Combining multiple testing approaches.
Test that code selects the appropriate model tier.
@Test
fun `test tier selection`() {
    // Simple task → cheap model
    val simpleResult = processor.process(simpleTask, cheapest)
    assertNotNull(simpleResult)
    // Complex task → best model
    val complexResult = processor.process(complexTask, best)
    assertNotNull(complexResult)
}

When to use: Testing model selection logic.
Test cost-aware execution.
@Test
fun `test cost optimization`() {
    val optimizer = CostOptimizer(cheapest, best)
    // Small input uses cheap
    optimizer.process(smallInput)
    verify(exactly = 1) { cheapest.generate(any()) }
    // Large input uses best
    optimizer.process(largeInput)
    verify(exactly = 1) { best.generate(any()) }
}

When to use: Testing cost optimization logic.
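The `CostOptimizer` in this test is not defined in the guide. One plausible shape for the decision it makes is a size threshold, sketched below. `TierSelector`, the `Tier` enum, and the 500-character threshold are all hypothetical values chosen for illustration.

```java
// Hypothetical tier selection by input size; the real CostOptimizer may use other signals.
class TierSelector {
    enum Tier { CHEAPEST, BEST }

    // Assumed cut-off: inputs longer than this go to the more capable model.
    static final int LARGE_INPUT_THRESHOLD = 500;

    static Tier select(String input) {
        return input.length() > LARGE_INPUT_THRESHOLD ? Tier.BEST : Tier.CHEAPEST;
    }
}
```

Keeping the decision in a pure function like this lets it be unit tested without any mocks, leaving the mock-based test to confirm that the chosen tier is the one actually invoked.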
Test fallback from an expensive model to a cheap one.
@Test
fun `test fallback to cheaper`() {
    // Best model fails
    every { best.generate(any()) } throws RuntimeException()
    // Should fallback to cheapest
    val result = processor.processWithFallback(input, best, cheapest)
    // Verify fallback used
    verify(exactly = 1) { cheapest.generate(any()) }
    assertNotNull(result)
}

When to use: Testing model fallback strategies.
Capture and inspect LLM call details.
@Test
void testArgumentCapture() {
    myAgent.process("test input");
    // Capture interaction
    ArgumentCaptor<LlmInteraction> captor = captureLlmInteraction();
    verifyGenerateText(p -> true);
    // Inspect details
    LlmInteraction interaction = captor.getValue();
    assertEquals("gpt-4", interaction.getModel());
    // Compare floating-point values with a tolerance
    assertEquals(0.7, interaction.getTemperature(), 1e-9);
    assertTrue(interaction.getMaxTokens() > 0);
}

When to use: Detailed validation of LLM configuration.
Use complex logic for matching.
@Test
void testComplexMatching() {
    whenGenerateText(prompt ->
        prompt.contains("analyze") &&
        prompt.length() > 100 &&
        prompt.toLowerCase().contains("data") &&
        !prompt.contains("skip")
    ).thenReturn("Analysis result");
    String result = myAgent.analyzeData(largeDataset);
    assertEquals("Analysis result", result);
}

When to use: Precise prompt matching requirements.
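When a matcher grows this long, it can help to build it from small named pieces using the standard `java.util.function.Predicate` combinators (`and`, `negate`). The `PromptMatchers` helper below is hypothetical, and passing its result to `whenGenerateText` assumes that method accepts a `Predicate<String>`, which the guide does not confirm.

```java
import java.util.function.Predicate;

// Hypothetical helper that composes prompt matchers from small, named predicates.
class PromptMatchers {
    static Predicate<String> contains(String fragment) {
        return prompt -> prompt.contains(fragment);
    }

    static Predicate<String> longerThan(int length) {
        return prompt -> prompt.length() > length;
    }

    // Same conditions as the inline lambda above, expressed with Predicate combinators.
    static Predicate<String> analysisPrompt() {
        Predicate<String> mentionsData = prompt -> prompt.toLowerCase().contains("data");
        return contains("analyze")
            .and(longerThan(100))
            .and(mentionsData)
            .and(contains("skip").negate());
    }
}
```

Named predicates can be reused across tests and unit tested on their own, which keeps a failing stub match easier to diagnose.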
Generate responses based on input.
@Test
void testDynamicResponse() {
    whenGenerateText(p -> true).thenAnswer(invocation -> {
        List<Message> messages = invocation.getArgument(0);
        String prompt = messages.get(0).getContent();
        // Guard against prompts shorter than 10 characters
        return "Processed: " + prompt.substring(0, Math.min(10, prompt.length()));
    });
    String result = myAgent.process("long input text here");
    assertTrue(result.startsWith("Processed:"));
}

When to use: Response depends on actual input.
Test prompt construction and templates.
@Test
void testPromptTemplate() {
    // Capture to inspect prompt
    ArgumentCaptor<String> promptCaptor = capturePrompt();
    whenGenerateText(p -> true).thenReturn("result");
    myAgent.processWithTemplate(userData);
    // Inspect constructed prompt
    String actualPrompt = promptCaptor.getValue();
    assertTrue(actualPrompt.contains("User: " + userData.getName()));
    assertTrue(actualPrompt.contains("Role: " + userData.getRole()));
}

When to use: Validating prompt construction logic.
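The assertions above imply a template that emits `User:` and `Role:` lines. A minimal sketch of such a template follows; `PromptTemplate.render` and the extra `Request:` line are hypothetical, since the guide does not show how `processWithTemplate` builds its prompt.

```java
// Hypothetical prompt template; the field labels mirror the assertions in the test above.
class PromptTemplate {
    static String render(String name, String role, String request) {
        return String.join("\n",
            "User: " + name,
            "Role: " + role,
            "Request: " + request);
    }
}
```

Asserting on labelled fragments such as `"User: " + name` rather than on the whole string keeps the test stable when the template gains new lines.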
// Bad: Too specific, brittle
whenGenerateText(p -> p.equals("Exact prompt with every word"))
    .thenReturn("result");
// Good: Focus on key elements
whenGenerateText(p ->
    p.contains("key concept") &&
    p.contains("important data")
).thenReturn("result");

// Bad: Only asserting result, not verifying LLM usage
@Test
void bad() {
    whenGenerateText(p -> true).thenReturn("result");
    String result = myAgent.process();
    assertEquals("result", result); // Missing verification!
}
// Good: Verify LLM call
@Test
void good() {
    whenGenerateText(p -> p.contains("expected")).thenReturn("result");
    String result = myAgent.process();
    verifyGenerateText(p -> p.contains("expected")); // Verify!
    assertEquals("result", result);
}

// Bad: Fake embeddings are random, not semantic
@Test
fun bad() {
    val model = FakeEmbeddingModel()
    val similarity = calculateSimilarity(
        model.embed(Document("cat")),
        model.embed(Document("dog"))
    )
    assertTrue(similarity > 0.8) // Will randomly fail!
}
// Good: Test structure, not semantics
@Test
fun good() {
    val model = FakeEmbeddingModel()
    val emb1 = model.embed(Document("cat"))
    val emb2 = model.embed(Document("dog"))
    assertEquals(1536, emb1.size) // Test structure
    assertEquals(1536, emb2.size)
    assertTrue(calculateSimilarity(emb1, emb2) in -1.0..1.0) // Valid range
}

| Scenario | Recommended Pattern |
|---|---|
| Single LLM call | Pattern 1: Simple Stub and Execute |
| Object extraction | Pattern 2: Stub Object Creation |
| Multiple operations | Pattern 3: Multiple Independent Stubs |
| Verify interactions | Pattern 4: Verify and Assert |
| Test caching | Pattern 5: Verify No Interactions |
| Multi-step workflow | Pattern 7: Sequential Operations |
| Error handling | Pattern 10: Exception Handling |
| Retry logic | Pattern 11: Retry Logic |
| Embeddings | Patterns 13-15: Embedding patterns |
| Spring components | Patterns 16-17: Spring patterns |
| Model tiers | Patterns 18-20: Model tier patterns |