tessl/maven-com-embabel-agent--embabel-agent-test-support

Multi-module test support framework for Embabel Agent applications providing integration testing, mock AI services, and test configuration utilities

Testing Patterns

Name: tessl/maven-com-embabel-agent--embabel-agent-test-support
Author: tessl

Common testing patterns and recipes for the Embabel Agent test support framework.

Basic Stubbing Patterns

Pattern 1: Simple Stub and Execute

Stub an LLM response, then execute the code under test.

@Test
void testSimpleStubAndExecute() {
    // 1. Stub
    whenGenerateText(prompt -> prompt.contains("hello"))
        .thenReturn("Hello, world!");

    // 2. Execute
    String result = myAgent.greet();

    // 3. Assert
    assertEquals("Hello, world!", result);
}

When to use: Basic single LLM call testing.

Pattern 2: Stub Object Creation

Stub structured object extraction.

@Test
void testObjectExtraction() {
    // Create expected object
    Person expected = new Person("Alice", 30);

    // Stub
    whenCreateObject(p -> p.contains("extract"), Person.class)
        .thenReturn(expected);

    // Execute
    Person result = myAgent.extractPerson("Alice is 30 years old");

    // Assert
    assertEquals(expected, result);
}

When to use: Testing LLM object creation/parsing.

Pattern 3: Multiple Independent Stubs

Stub multiple unrelated operations.

@Test
void testMultipleStubs() {
    // Stub operation A
    whenGenerateText(p -> p.contains("greet"))
        .thenReturn("Hello!");

    // Stub operation B
    whenGenerateText(p -> p.contains("farewell"))
        .thenReturn("Goodbye!");

    // Execute both
    String greeting = myAgent.greet();
    String farewell = myAgent.farewell();

    // Assert both
    assertEquals("Hello!", greeting);
    assertEquals("Goodbye!", farewell);
}

When to use: Testing multiple independent operations.
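
When several predicate-based stubs coexist, it helps to know which one answers if more than one predicate matches a prompt. The sketch below is a simplified stand-in, not the framework's implementation: `PromptStubs` and its methods are hypothetical names, and it assumes the helpers follow Mockito's rule that the most recently registered matching stub wins.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical stand-in for predicate-based stubbing; not the framework's API.
final class PromptStubs {

    private static final class Stub {
        final Predicate<String> matcher;
        final String response;

        Stub(Predicate<String> matcher, String response) {
            this.matcher = matcher;
            this.response = response;
        }
    }

    // The newest stub sits at the head, so it is consulted first.
    private final Deque<Stub> stubs = new ArrayDeque<>();

    void when(Predicate<String> matcher, String response) {
        stubs.addFirst(new Stub(matcher, response));
    }

    // Response of the most recently registered stub whose predicate matches.
    Optional<String> answer(String prompt) {
        return stubs.stream()
                .filter(stub -> stub.matcher.test(prompt))
                .map(stub -> stub.response)
                .findFirst();
    }
}
```

Under this assumption, a broad predicate such as `p -> true` registered last would shadow every earlier stub, so keeping predicates disjoint (as "greet" and "farewell" are above) keeps the stubs independent.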

Verification Patterns

Pattern 4: Verify and Assert

Verify the LLM call and assert on the result.

@Test
void testVerifyAndAssert() {
    // Execute
    String result = myAgent.process("user input");

    // Verify LLM call
    verifyGenerateText(prompt ->
        prompt.contains("user input") &&
        prompt.contains("process")
    );

    // Assert result
    assertNotNull(result);
    assertFalse(result.isEmpty());
}

When to use: Ensuring both LLM interaction and result correctness.

Pattern 5: Verify No Interactions

Ensure the code under test makes no LLM calls.

@Test
void testNoLlmUsage() {
    // Execute cached/fast path
    String result = myAgent.getCached("key");

    // Verify no LLM calls
    verifyNoInteractions();

    // Assert result from cache
    assertNotNull(result);
}

When to use: Testing caching, fast paths, or optimization logic.
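
For context, the kind of code this pattern protects is a cache placed in front of the LLM. The following is a minimal sketch under stated assumptions: `CachingAgent` is a hypothetical class, and the LLM is modelled as a plain `Function<String, String>` with a call counter standing in for mock verification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache-in-front-of-LLM wrapper, for illustration only.
final class CachingAgent {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> llm;
    private int llmCalls = 0;

    CachingAgent(Function<String, String> llm) {
        this.llm = llm;
    }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // fast path: the LLM is not touched
        }
        llmCalls++;
        String fresh = llm.apply(key);
        cache.put(key, fresh);
        return fresh;
    }

    // Number of times the LLM function was actually invoked.
    int llmCalls() {
        return llmCalls;
    }
}
```

A test of the fast path would warm the cache first, reset or record the counter, and then check that a second lookup leaves it unchanged, which is what `verifyNoInteractions()` expresses against a mock.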

Pattern 6: Verify No More Interactions

Ensure only expected calls occurred.

@Test
void testOnlyExpectedCalls() {
    myAgent.singleOperation();

    // Verify expected call
    verifyGenerateText(p -> p.contains("operation"));

    // Ensure no other calls
    verifyNoMoreInteractions();
}

When to use: Preventing unexpected LLM calls.

Multi-Step Workflow Patterns

Pattern 7: Sequential Operations

Test multi-step workflows.

@Test
void testSequentialWorkflow() {
    // Stub step 1
    whenGenerateText(p -> p.contains("step1"))
        .thenReturn("Result 1");

    // Stub step 2
    whenGenerateText(p -> p.contains("step2"))
        .thenReturn("Result 2");

    // Execute multi-step
    WorkflowResult result = myAgent.executeWorkflow();

    // Verify both steps
    verifyGenerateText(p -> p.contains("step1"));
    verifyGenerateText(p -> p.contains("step2"));

    // Assert final result
    assertNotNull(result);
}

When to use: Testing complex multi-step agent workflows.

Pattern 8: Conditional Branching

Test conditional execution paths.

@Test
void testConditionalBranch() {
    // Stub path A
    whenGenerateText(p -> p.contains("simple"))
        .thenReturn("Simple result");

    // Stub path B
    whenGenerateText(p -> p.contains("complex"))
        .thenReturn("Complex result");

    // Test simple path
    String simpleResult = myAgent.process(simpleInput);
    verifyGenerateText(p -> p.contains("simple"));
    assertEquals("Simple result", simpleResult);

    // Test complex path
    String complexResult = myAgent.process(complexInput);
    verifyGenerateText(p -> p.contains("complex"));
    assertEquals("Complex result", complexResult);
}

When to use: Testing branching logic based on input.

Pattern 9: Loop Processing

Test iterative LLM calls.

@Test
void testLoopProcessing() {
    // Stub for each iteration
    whenGenerateText(p -> p.contains("item"))
        .thenReturn("processed");

    // Execute on multiple items
    List<String> items = List.of("item1", "item2", "item3");
    List<String> results = myAgent.processAll(items);

    // Verify called for each
    verify(llmOperations, times(3)).generateText(any(), any());

    // Assert all processed
    assertEquals(3, results.size());
}

When to use: Testing batch processing or iteration.

Error Handling Patterns

Pattern 10: Exception Handling

Test error handling logic.

@Test
void testErrorHandling() {
    // Stub to throw exception
    whenGenerateText(p -> p.contains("fail"))
        .thenThrow(new RuntimeException("LLM error"));

    // Execute and expect exception handling
    assertThrows(AgentException.class, () -> {
        myAgent.processWithError();
    });
}

When to use: Testing error handling and resilience.
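
The test above expects a raw `RuntimeException` from the LLM to surface as a domain exception. One way the code under test might do that is sketched below; `AgentFailure` and `SafeCall` are hypothetical names (the real `AgentException` may be defined differently), and the LLM call is modelled as a `Supplier<String>`.

```java
import java.util.function.Supplier;

// Hypothetical domain exception; the document's AgentException may differ.
final class AgentFailure extends RuntimeException {
    AgentFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

final class SafeCall {
    private SafeCall() {}

    // Runs the LLM call and wraps any runtime failure, keeping the cause.
    static String run(Supplier<String> llmCall) {
        try {
            return llmCall.get();
        } catch (RuntimeException e) {
            throw new AgentFailure("LLM call failed: " + e.getMessage(), e);
        }
    }
}
```

Keeping the original exception as the cause lets a test assert on both the wrapper type and the underlying failure.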

Pattern 11: Retry Logic

Test retry mechanisms.

@Test
void testRetryLogic() {
    // First call fails, second succeeds
    whenGenerateText(p -> p.contains("retry"))
        .thenThrow(new RuntimeException("Temporary error"))
        .thenReturn("Success after retry");

    // Execute with retry
    String result = myAgent.processWithRetry();

    // Verify retried
    verify(llmOperations, times(2)).generateText(any(), any());

    // Assert final success
    assertEquals("Success after retry", result);
}

When to use: Testing retry and failure recovery.
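
The stub above fails once and then succeeds, which only makes sense if the code under test retries. A minimal sketch of such a retry loop follows; `Retry.withRetry` is a hypothetical helper, not part of the framework, and it retries on any `RuntimeException` without backoff.

```java
import java.util.function.Supplier;

final class Retry {
    private Retry() {}

    // Calls the supplier up to maxAttempts times and rethrows the last failure.
    static <T> T withRetry(int maxAttempts, Supplier<T> call) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

With a supplier that throws on the first call and returns on the second, the helper makes exactly two calls, which matches the `times(2)` verification in the test.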

Pattern 12: Fallback Strategy

Test fallback to an alternative approach.

@Test
void testFallback() {
    // Primary approach fails
    whenGenerateText(p -> p.contains("primary"))
        .thenThrow(new RuntimeException());

    // Fallback succeeds
    whenGenerateText(p -> p.contains("fallback"))
        .thenReturn("Fallback result");

    // Execute with fallback
    String result = myAgent.processWithFallback();

    // Verify fallback was used
    verifyGenerateText(p -> p.contains("fallback"));

    // Assert result
    assertEquals("Fallback result", result);
}

When to use: Testing graceful degradation.
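
As a rough illustration of the behaviour being tested, a fallback can be as small as a try/catch around the primary call. `Fallback.firstSuccessful` below is a hypothetical helper, and both approaches are modelled as `Supplier` values rather than real LLM calls.

```java
import java.util.function.Supplier;

final class Fallback {
    private Fallback() {}

    // Returns the primary result, or the fallback result if the primary throws.
    static <T> T firstSuccessful(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.get();
        }
    }
}
```

Note that the fallback supplier is only invoked when the primary fails, so a test can also check that it is left untouched on the happy path.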

Embedding Testing Patterns

Pattern 13: Vector Storage Testing

Test storing and retrieving embeddings.

@Test
fun `test vector storage`() {
    val embeddingModel = FakeEmbeddingModel()
    val vectorStore = VectorStore(embeddingModel)

    // Add documents
    val docs = listOf(
        Document("doc1"),
        Document("doc2")
    )
    vectorStore.add(docs)

    // Retrieve
    val retrieved = vectorStore.getAll()

    assertEquals(2, retrieved.size)
    retrieved.forEach { doc ->
        assertNotNull(doc.embedding)
        assertEquals(1536, doc.embedding!!.size)
    }
}

When to use: Testing vector database integration.

Pattern 14: Semantic Search Structure

Test search engine structure (not semantic quality).

@Test
fun `test search structure`() {
    val model = FakeEmbeddingModel()
    val searchEngine = SearchEngine(model)

    // Index documents
    searchEngine.index(listOf("doc1", "doc2", "doc3"))

    // Search
    val results = searchEngine.search("query", topK = 2)

    // Assert structure
    assertEquals(2, results.size)
    results.forEach { result ->
        assertNotNull(result.document)
        assertTrue(result.score >= 0.0)
    }
}

When to use: Testing search engine structure without semantic validation.

Pattern 15: Batch Embedding Processing

Test processing large batches.

@Test
fun `test batch processing`() {
    val model = FakeEmbeddingModel()
    val processor = BatchProcessor(model)

    // Large batch
    val texts = (1..1000).map { "Document $it" }

    // Process
    val embeddings = processor.process(texts)

    // Assert all processed
    assertEquals(1000, embeddings.size)
    embeddings.forEach {
        assertEquals(1536, it.size)
    }
}

When to use: Testing batch processing and performance.

Spring Integration Patterns

Pattern 16: Service Injection Testing

Test services with injected dependencies.

@SpringBootTest
@Import(FakeAiConfiguration::class)
class ServiceInjectionTest {

    @Autowired
    private lateinit var cheapest: LlmService<*>

    @Autowired
    private lateinit var myService: MyService

    @Test
    fun `test service with injected LLM`() {
        val result = myService.process("input", cheapest)
        assertNotNull(result)
    }
}

When to use: Testing Spring components with AI dependencies.

Pattern 17: Combined Stubbing and Beans

Combine Mockito stubs with Spring beans.

@SpringBootTest
@Import(FakeAiConfiguration::class)
class CombinedTest : EmbabelMockitoIntegrationTest() {

    @Autowired
    private lateinit var embeddingService: EmbeddingService

    @Test
    fun `test with stubbing and beans`() {
        // Mockito stub
        whenGenerateText { it.contains("test") }
            .thenReturn("stubbed")

        // Spring bean
        val embedding = embeddingService.embed("test")

        // Execute
        val result = myAgent.process("test")

        // Verify
        verifyGenerateText { it.contains("test") }
        assertEquals(1536, embedding.size)
    }
}

When to use: Combining multiple testing approaches.

Model Tier Patterns

Pattern 18: Tier Selection Testing

Test that the code selects the appropriate model tier.

@Test
fun `test tier selection`() {
    // Simple task → cheap model
    val simpleResult = processor.process(simpleTask, cheapest)
    assertNotNull(simpleResult)

    // Complex task → best model
    val complexResult = processor.process(complexTask, best)
    assertNotNull(complexResult)
}

When to use: Testing model selection logic.

Pattern 19: Cost Optimization Testing

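Test cost-aware execution. This example uses MockK's verify(exactly = ...) syntax, so cheapest and best must be MockK mocks rather than Mockito mocks.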

@Test
fun `test cost optimization`() {
    val optimizer = CostOptimizer(cheapest, best)

    // Small input uses cheap
    optimizer.process(smallInput)
    verify(exactly = 1) { cheapest.generate(any()) }

    // Large input uses best
    optimizer.process(largeInput)
    verify(exactly = 1) { best.generate(any()) }
}

When to use: Testing cost optimization logic.
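
To make the routing rule concrete, here is a sketch of a size-based router. It is an assumption about how something like `CostOptimizer` could work, not its actual logic: `CostRouter`, the length threshold, and the call counters are all hypothetical, and each model is modelled as a `UnaryOperator<String>`.

```java
import java.util.function.UnaryOperator;

// Hypothetical router; the document's CostOptimizer may work differently.
final class CostRouter {
    private final UnaryOperator<String> cheap;
    private final UnaryOperator<String> best;
    private final int maxCheapLength;
    private int cheapCalls;
    private int bestCalls;

    CostRouter(UnaryOperator<String> cheap, UnaryOperator<String> best, int maxCheapLength) {
        this.cheap = cheap;
        this.best = best;
        this.maxCheapLength = maxCheapLength;
    }

    // Inputs up to maxCheapLength characters go to the cheap model.
    String process(String input) {
        if (input.length() <= maxCheapLength) {
            cheapCalls++;
            return cheap.apply(input);
        }
        bestCalls++;
        return best.apply(input);
    }

    int cheapCalls() { return cheapCalls; }

    int bestCalls() { return bestCalls; }
}
```

The counters play the role that `verify(exactly = 1)` plays against mocks: one small input and one large input should leave each counter at exactly one.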

Pattern 20: Fallback to Cheaper Model

Test fallback from the expensive model to the cheap one. This example uses MockK's every/verify syntax.

@Test
fun `test fallback to cheaper`() {
    // Best model fails
    every { best.generate(any()) } throws RuntimeException()

    // Should fallback to cheapest
    val result = processor.processWithFallback(input, best, cheapest)

    // Verify fallback used
    verify(exactly = 1) { cheapest.generate(any()) }
    assertNotNull(result)
}

When to use: Testing model fallback strategies.

Advanced Patterns

Pattern 21: Argument Capture and Inspection

Capture and inspect LLM call details.

@Test
void testArgumentCapture() {
    myAgent.process("test input");

    // Capture interaction
    ArgumentCaptor<LlmInteraction> captor = captureLlmInteraction();
    verifyGenerateText(p -> true);

    // Inspect details
    LlmInteraction interaction = captor.getValue();
    assertEquals("gpt-4", interaction.getModel());
    assertEquals(0.7, interaction.getTemperature(), 1e-9);  // compare floating-point values with a tolerance
    assertTrue(interaction.getMaxTokens() > 0);
}

When to use: Detailed validation of LLM configuration.

Pattern 22: Complex Predicate Matching

Use complex logic for matching.

@Test
void testComplexMatching() {
    whenGenerateText(prompt ->
        prompt.contains("analyze") &&
        prompt.length() > 100 &&
        prompt.toLowerCase().contains("data") &&
        !prompt.contains("skip")
    ).thenReturn("Analysis result");

    String result = myAgent.analyzeData(largeDataset);

    assertEquals("Analysis result", result);
}

When to use: The stub must match only prompts that meet several precise conditions.
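
Long inline lambdas become hard to read and reuse. Because the matcher is an ordinary `java.util.function.Predicate<String>`, the same conditions can be built from small named pieces with `and` and `negate`. The sketch below assumes the stubbing helper accepts a `Predicate<String>`; `PromptMatchers` and its members are hypothetical names.

```java
import java.util.Locale;
import java.util.function.Predicate;

final class PromptMatchers {
    private PromptMatchers() {}

    static Predicate<String> contains(String text) {
        return prompt -> prompt.contains(text);
    }

    static Predicate<String> containsIgnoreCase(String text) {
        String needle = text.toLowerCase(Locale.ROOT);
        return prompt -> prompt.toLowerCase(Locale.ROOT).contains(needle);
    }

    static Predicate<String> longerThan(int length) {
        return prompt -> prompt.length() > length;
    }

    // Same conditions as the inline lambda above, composed from named parts.
    static final Predicate<String> ANALYSIS_REQUEST =
            contains("analyze")
                    .and(longerThan(100))
                    .and(containsIgnoreCase("data"))
                    .and(contains("skip").negate());
}
```

A composed predicate can also be unit-tested on its own with `test(...)`, independently of any stubbing.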

Pattern 23: Dynamic Response Based on Input

Generate responses based on input.

@Test
void testDynamicResponse() {
    whenGenerateText(p -> true).thenAnswer(invocation -> {
        List<Message> messages = invocation.getArgument(0);
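        String prompt = messages.get(0).getContent();
        // Guard against prompts shorter than 10 characters, which would make substring throw
        return "Processed: " + prompt.substring(0, Math.min(10, prompt.length()));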
    });

    String result = myAgent.process("long input text here");

    assertTrue(result.startsWith("Processed:"));
}

When to use: The stubbed response must depend on the actual input.

Pattern 24: Testing Prompt Templates

Test prompt construction and templates.

@Test
void testPromptTemplate() {
    // Capture to inspect prompt
    ArgumentCaptor<String> promptCaptor = capturePrompt();

    whenGenerateText(p -> true).thenReturn("result");

    myAgent.processWithTemplate(userData);

    // Inspect constructed prompt
    String actualPrompt = promptCaptor.getValue();
    assertTrue(actualPrompt.contains("User: " + userData.getName()));
    assertTrue(actualPrompt.contains("Role: " + userData.getRole()));
}

When to use: Validating prompt construction logic.

Testing Anti-Patterns to Avoid

Anti-Pattern 1: Over-Specific Predicates

// Bad: Too specific, brittle
whenGenerateText(p -> p.equals("Exact prompt with every word"))
    .thenReturn("result");

// Good: Focus on key elements
whenGenerateText(p ->
    p.contains("key concept") &&
    p.contains("important data")
).thenReturn("result");

Anti-Pattern 2: Not Verifying LLM Calls

// Bad: Only asserting result, not verifying LLM usage
@Test
void bad() {
    whenGenerateText(p -> true).thenReturn("result");
    String result = myAgent.process();
    assertEquals("result", result);  // Missing verification!
}

// Good: Verify LLM call
@Test
void good() {
    whenGenerateText(p -> p.contains("expected")).thenReturn("result");
    String result = myAgent.process();
    verifyGenerateText(p -> p.contains("expected"));  // Verify!
    assertEquals("result", result);
}

Anti-Pattern 3: Testing Semantic Quality with Fake Embeddings

// Bad: Fake embeddings are random, not semantic
@Test
fun bad() {
    val model = FakeEmbeddingModel()
    val similarity = calculateSimilarity(
        model.embed(Document("cat")),
        model.embed(Document("dog"))
    )
    assertTrue(similarity > 0.8)  // Will randomly fail!
}

// Good: Test structure, not semantics
@Test
fun good() {
    val model = FakeEmbeddingModel()
    val emb1 = model.embed(Document("cat"))
    val emb2 = model.embed(Document("dog"))

    assertEquals(1536, emb1.size)  // Test structure
    assertEquals(1536, emb2.size)
    assertTrue(calculateSimilarity(emb1, emb2) in -1.0..1.0)  // Valid range
}
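
To see why the structural check is safe while the threshold check is not, consider a sketch of what `calculateSimilarity` might compute. This is an assumption, since the document does not define it: the code below uses cosine similarity, and `randomVector` is a hypothetical stand-in for a fake embedding with zero-mean random entries.

```java
import java.util.Random;

final class Similarity {
    private Similarity() {}

    // Cosine similarity, clamped to [-1, 1] to absorb floating-point rounding.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen here for zero vectors
        }
        double value = dot / (Math.sqrt(normA) * Math.sqrt(normB));
        return Math.max(-1.0, Math.min(1.0, value));
    }

    // Stand-in for a fake embedding: seeded uniform values in [-1, 1).
    static float[] randomVector(int dimensions, long seed) {
        Random random = new Random(seed);
        float[] vector = new float[dimensions];
        for (int i = 0; i < dimensions; i++) {
            vector[i] = random.nextFloat() * 2f - 1f;
        }
        return vector;
    }
}
```

The range check holds for any pair of vectors, whereas two independent zero-mean random vectors in 1536 dimensions tend to have similarity close to zero, so an assertion such as `similarity > 0.8` says nothing about the code under test.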

Pattern Selection Guide

Scenario	Recommended Pattern
Single LLM call	Pattern 1: Simple Stub and Execute
Object extraction	Pattern 2: Stub Object Creation
Multiple operations	Pattern 3: Multiple Independent Stubs
Verify interactions	Pattern 4: Verify and Assert
Test caching	Pattern 5: Verify No Interactions
Multi-step workflow	Pattern 7: Sequential Operations
Error handling	Pattern 10: Exception Handling
Retry logic	Pattern 11: Retry Logic
Embeddings	Patterns 13-15: Embedding patterns
Spring components	Patterns 16-17: Spring patterns
Model tiers	Patterns 18-20: Model tier patterns

Next Steps

Stubbing Guide - Detailed stubbing
Verification Guide - Detailed verification
Your First Test - Getting started
Common Tasks - Quick reference

tessl i tessl/maven-com-embabel-agent--embabel-agent-test-support@0.3.0
