tessl/maven-dev-langchain4j--langchain4j

Build LLM-powered applications in Java with support for chatbots, agents, RAG, tools, and much more

Data Types

Core data types for working with documents, embeddings, tools, and structured data. These types provide the foundation for document processing, semantic search, tool execution, and structured output parsing.

Overview

LangChain4j data types are designed to be:

  • Immutable (mostly): Value types avoid mutation where practical; Metadata and Embedding are the notable mutable exceptions
  • Type-Safe: Typed accessors and builders catch many mistakes at compile time
  • Interoperable: Work across all LangChain4j components
  • Efficient: Optimized for performance-critical paths

Capabilities

Document

Interface representing an unstructured text document with metadata.

package dev.langchain4j.data.document;

/**
 * Represents a document (unstructured text) with associated metadata
 */
public interface Document {
    /**
     * Common metadata key for file name
     */
    String FILE_NAME = "file_name";

    /**
     * Common metadata key for absolute directory path
     */
    String ABSOLUTE_DIRECTORY_PATH = "absolute_directory_path";

    /**
     * Common metadata key for URL
     */
    String URL = "url";

    /**
     * Get document text
     * @return Document text
     */
    String text();

    /**
     * Get document metadata
     * @return Metadata
     */
    Metadata metadata();

    /**
     * Convert to text segment
     * @return TextSegment with same content
     */
    TextSegment toTextSegment();

    /**
     * Create document from text
     * @param text Document text
     * @return Document instance
     */
    static Document from(String text);

    /**
     * Create document from text and metadata
     * @param text Document text
     * @param metadata Document metadata
     * @return Document instance
     */
    static Document from(String text, Metadata metadata);

    /**
     * Create document from text (alias)
     * @param text Document text
     * @return Document instance
     */
    static Document document(String text);

    /**
     * Create document from text and metadata (alias)
     * @param text Document text
     * @param metadata Document metadata
     * @return Document instance
     */
    static Document document(String text, Metadata metadata);
}

Thread Safety:

  • Document instances are immutable (text and metadata are set at construction)
  • The default implementation is thread-safe for read operations
  • Metadata object may be mutable (see Metadata section)
  • Safe to share across threads once constructed

Common Pitfalls:

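  • DO NOT mutate the returned Metadata object if sharing Document across threads
  • DO NOT assume text() is non-null in custom implementations - the default implementation validates text at construction, but the interface itself does not enforce it
  • DO NOT use Document for large files (>100MB) - the whole text is held in memory; consider processing the source in smaller parts
  • DO NOT store binary data in the text field - parse it to text first (see DocumentParser)

Edge Cases:

  • Null or blank text is rejected by the default factory methods in current releases (verify against your version)
  • Null metadata is likewise rejected by from(String, Metadata); use from(String) for a document with empty Metadata
  • Very large documents may cause memory issues
  • toTextSegment() creates new object (not cached)

Exception Handling:

  • Factory methods throw an unchecked exception on invalid input (IllegalArgumentException in current releases, not NullPointerException)
  • No checked exceptions thrown
  • text() may return null in custom implementations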

Performance Notes:

  • Document objects are lightweight (just references to text and metadata)
  • text() is O(1) - returns reference
  • metadata() is O(1) - returns reference
  • toTextSegment() creates new object - O(1) but allocates memory
  • Standard metadata keys are constants for efficient comparison

Usage Example:

// Simple document
Document doc1 = Document.from("Hello, world!");

// Document with metadata
Metadata metadata = new Metadata()
    .put(Document.FILE_NAME, "greeting.txt")
    .put(Document.ABSOLUTE_DIRECTORY_PATH, "/docs");
Document doc2 = Document.from("Hello, world!", metadata);

// Convert to segment for embedding
TextSegment segment = doc2.toTextSegment();

Related APIs:

  • TextSegment - Document chunk for embeddings
  • Metadata - Metadata container
  • DocumentParser - Parse files into Documents
  • DocumentSplitter - Split documents into segments

Metadata

Key-value storage for document and segment metadata. Supports typed access for String, UUID, Integer, Long, Float, and Double values.

package dev.langchain4j.data.document;

import java.util.Map;
import java.util.UUID;

/**
 * Metadata container with typed accessors
 * Supports String, UUID, Integer, Long, Float, Double values
 */
public class Metadata {
    /**
     * Create empty metadata
     */
    public Metadata();

    /**
     * Create from map
     * @param metadata Initial metadata
     */
    public Metadata(Map<String, ?> metadata);

    /**
     * Get string value
     * @param key Metadata key
     * @return String value or null
     */
    public String getString(String key);

    /**
     * Get UUID value
     * @param key Metadata key
     * @return UUID value or null
     */
    public UUID getUUID(String key);

    /**
     * Get integer value
     * @param key Metadata key
     * @return Integer value or null
     */
    public Integer getInteger(String key);

    /**
     * Get long value
     * @param key Metadata key
     * @return Long value or null
     */
    public Long getLong(String key);

    /**
     * Get float value
     * @param key Metadata key
     * @return Float value or null
     */
    public Float getFloat(String key);

    /**
     * Get double value
     * @param key Metadata key
     * @return Double value or null
     */
    public Double getDouble(String key);

    /**
     * Check if key exists
     * @param key Metadata key
     * @return true if key exists
     */
    public boolean containsKey(String key);

    /**
     * Put string value
     * @param key Metadata key
     * @param value String value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, String value);

    /**
     * Put UUID value
     * @param key Metadata key
     * @param value UUID value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, UUID value);

    /**
     * Put integer value
     * @param key Metadata key
     * @param value Integer value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, int value);

    /**
     * Put long value
     * @param key Metadata key
     * @param value Long value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, long value);

    /**
     * Put float value
     * @param key Metadata key
     * @param value Float value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, float value);

    /**
     * Put double value
     * @param key Metadata key
     * @param value Double value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, double value);

    /**
     * Put all from map
     * @param metadata Map of metadata
     * @return This metadata (fluent)
     */
    public Metadata putAll(Map<String, Object> metadata);

    /**
     * Remove key
     * @param key Metadata key
     * @return This metadata (fluent)
     */
    public Metadata remove(String key);

    /**
     * Create a copy
     * @return New metadata with same values
     */
    public Metadata copy();

    /**
     * Convert to map
     * @return Map representation
     */
    public Map<String, Object> toMap();

    /**
     * Merge with another metadata
     * @param another Metadata to merge
     * @return This metadata with merged values (fluent)
     */
    public Metadata merge(Metadata another);

    /**
     * Create from single key-value pair
     * @param key Metadata key
     * @param value String value
     * @return New metadata instance
     */
    public static Metadata from(String key, String value);

    /**
     * Create from map
     * @param metadata Map of metadata
     * @return New metadata instance
     */
    public static Metadata from(Map<String, ?> metadata);

    /**
     * Create from single key-value pair (alias)
     * @param key Metadata key
     * @param value String value
     * @return New metadata instance
     */
    public static Metadata metadata(String key, String value);
}

Thread Safety:

  • Metadata is MUTABLE and NOT thread-safe
  • Concurrent modifications without synchronization lead to undefined behavior
  • Use copy() to create thread-safe snapshots
  • Recommended: Create once, don't mutate after sharing
  • For concurrent access, use external synchronization or use immutable copies
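
The snapshot advice above can be sketched without LangChain4j on the classpath. The example below is only an illustration: SnapshotDemo is a hypothetical class and a plain HashMap stands in for a mutable Metadata. With the real type, the equivalent of snapshot() would be metadata.copy().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a HashMap plays the role of a mutable Metadata.
final class SnapshotDemo {

    // Returns an independent copy, analogous to Metadata.copy().
    static Map<String, Object> snapshot(Map<String, Object> source) {
        return new HashMap<>(source);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Object> live = new HashMap<>();
        live.put("file_name", "report.pdf");

        // Take the snapshot before handing data to another thread.
        Map<String, Object> shared = snapshot(live);

        Thread reader = new Thread(() ->
                System.out.println("reader sees: " + shared.get("file_name")));
        reader.start();

        // The owner keeps mutating its own instance; the reader is unaffected.
        live.put("file_name", "draft.pdf");
        reader.join();
    }
}
```

The reader thread only ever touches the snapshot, so no synchronization is needed as long as nobody mutates the snapshot after it is shared.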

Common Pitfalls:

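  • DO NOT share and mutate Metadata across threads without synchronization
  • DO NOT call a getter of the wrong type and expect null - in current releases a value that cannot be converted to the requested type raises a RuntimeException
  • DO NOT use arbitrary Object types - only the supported types (String, UUID, Integer, Long, Float, Double) work
  • DO NOT modify Metadata after adding to Document/TextSegment if shared
  • DO NOT rely on iteration order - the backing map is unordered; sort the keys of toMap() if a stable order matters

Edge Cases:

  • Null or blank keys are rejected with an unchecked exception
  • Null values are rejected rather than stored; use remove() to clear a key
  • A missing key returns null from every typed getter
  • A value of the wrong type does not return null: getters convert where they can (for example a numeric String read through getInteger()) and throw otherwise
  • copy() copies the entries into a new Metadata; because the supported value types are immutable, the copy can be modified independently
  • merge() returns a new Metadata containing both sets of entries and, in current releases, throws IllegalArgumentException when the two share a key
  • toMap() returns defensive copy (modifications don't affect Metadata)

Exception Handling:

  • put() throws an unchecked exception (IllegalArgumentException in current releases) if the key is null or blank, or the value is null
  • Type mismatches that cannot be converted throw a RuntimeException
  • No exceptions for missing keys (returns null)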
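
Because getter behavior on a type mismatch is easy to get wrong, a defensive lookup that checks the runtime type explicitly may be useful. The sketch below is an assumption-laden illustration: TypedLookup is a hypothetical helper, not part of LangChain4j, and it operates on the plain Map<String, Object> that toMap() returns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper operating on the Map<String, Object> returned by toMap().
final class TypedLookup {

    // Empty when the key is missing or the value has a different runtime type.
    static <T> Optional<T> find(Map<String, Object> map, String key, Class<T> type) {
        Object value = map.get(key);
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        map.put("page_number", 5);
        map.put("file_name", "document.pdf");

        System.out.println(find(map, "page_number", Integer.class)); // Optional[5]
        System.out.println(find(map, "file_name", Integer.class));   // Optional.empty
    }
}
```

Returning Optional makes "missing" and "wrong type" explicit at the call site instead of relying on null checks or exception handling.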

Performance Notes:

  • Backed by HashMap - O(1) get/put operations
  • copy() is O(n) where n is number of entries
  • merge() is O(m) where m is entries in other Metadata
  • toMap() creates defensive copy - O(n) allocation
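  • Typed getters return boxed values (Integer, Long, Float, Double), so assigning the result for a missing key to a primitive throws NullPointerException
  • Use containsKey() to test for presence instead of comparing a getter result to null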

Usage Example:

// Fluent builder pattern
Metadata metadata = new Metadata()
    .put("file_name", "document.pdf")
    .put("page_number", 5)
    .put("confidence", 0.95)
    .put("document_id", UUID.randomUUID());

// Type-safe access
String fileName = metadata.getString("file_name"); // "document.pdf"
Integer pageNumber = metadata.getInteger("page_number"); // 5
Double confidence = metadata.getDouble("confidence"); // 0.95
UUID docId = metadata.getUUID("document_id");

// A value that cannot be converted to the requested type throws
// (here UUID.fromString would reject "document.pdf"):
// UUID wrongType = metadata.getUUID("file_name");

// Thread-safe snapshot
Metadata snapshot = metadata.copy();
// Safe to mutate snapshot independently

// Merge metadata (returns a new instance; the two must not share keys)
Metadata additional = new Metadata().put("author", "John Doe");
Metadata merged = metadata.merge(additional); // merged has all keys

Testing Patterns:

@Test
void testMetadata() {
    Metadata metadata = new Metadata()
        .put("key", "value")
        .put("count", 42);

    assertEquals("value", metadata.getString("key"));
    assertEquals(Integer.valueOf(42), metadata.getInteger("count"));
    assertNull(metadata.getString("missing")); // Missing key returns null
    assertThrows(RuntimeException.class, () -> metadata.getInteger("key")); // "value" is not numeric

    // Test copy independence
    Metadata copy = metadata.copy();
    copy.put("key", "modified");
    assertEquals("value", metadata.getString("key")); // Original unchanged
}

Related APIs:

  • Document - Uses Metadata for document attributes
  • TextSegment - Uses Metadata for segment attributes
  • ContentRetriever - Filters based on metadata
  • EmbeddingStore - Stores embeddings with metadata

TextSegment

Represents a chunk or segment of text with associated metadata. Used for embeddings and RAG.

package dev.langchain4j.data.segment;

import dev.langchain4j.data.document.Metadata;

/**
 * Represents a text segment with metadata
 * Typically a chunk of a larger document
 */
public class TextSegment {
    /**
     * Create text segment
     * @param text Segment text
     * @param metadata Segment metadata
     */
    public TextSegment(String text, Metadata metadata);

    /**
     * Get segment text
     * @return Text content
     */
    public String text();

    /**
     * Get segment metadata
     * @return Metadata
     */
    public Metadata metadata();

    /**
     * Create from text
     * @param text Segment text
     * @return TextSegment instance
     */
    public static TextSegment from(String text);

    /**
     * Create from text and metadata
     * @param text Segment text
     * @param metadata Segment metadata
     * @return TextSegment instance
     */
    public static TextSegment from(String text, Metadata metadata);

    /**
     * Create from text (alias)
     * @param text Segment text
     * @return TextSegment instance
     */
    public static TextSegment textSegment(String text);

    /**
     * Create from text and metadata (alias)
     * @param text Segment text
     * @param metadata Segment metadata
     * @return TextSegment instance
     */
    public static TextSegment textSegment(String text, Metadata metadata);
}

Thread Safety:

  • TextSegment is immutable - text and metadata reference set at construction
  • However, Metadata itself is mutable - don't mutate after construction if sharing
  • Safe to share across threads if Metadata is not mutated
  • Recommended: Don't modify metadata() after creating TextSegment

Common Pitfalls:

  • DO NOT mutate metadata() after construction if sharing across threads
  • DO NOT create very large segments (>8KB text) - hurts embedding quality
  • DO NOT create very small segments (<50 chars) - loses context
  • DO NOT forget to include relevant metadata for filtering
  • DO NOT reuse metadata objects across segments without copy()

Edge Cases:

  • Null or blank text is rejected at construction in current releases (verify against your version)
  • Null metadata is rejected by the two-argument constructor and factories; from(String) attaches an empty Metadata
  • No automatic text normalization (whitespace and formatting are preserved)
  • No text length limits enforced (but embedding models have limits)

Exception Handling:

  • Constructor throws an unchecked exception (IllegalArgumentException in current releases) if text is null or blank
  • No checked exceptions
  • text() and metadata() don't throw exceptions

Performance Notes:

  • TextSegment is lightweight wrapper (two references)
  • text() and metadata() are O(1)
  • Typically created in bulk by DocumentSplitter
  • Keep segment size reasonable (512-1024 tokens recommended)
  • Metadata adds minimal overhead (few KB per segment)

Recommended Segment Sizes:

  • Small segments (128-256 tokens): High precision, less context
  • Medium segments (512-1024 tokens): Balanced - recommended for most use cases
  • Large segments (2048+ tokens): More context, lower precision
  • Consider overlap between segments (e.g., 20% overlap)
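
To make the overlap recommendation concrete, here is a minimal sliding-window sketch. It is a simplification under stated assumptions: OverlapChunker is a hypothetical helper, it counts characters rather than tokens, and it ignores sentence and paragraph boundaries, which the library's own splitters take into account.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: fixed-size character windows with overlap.
final class OverlapChunker {

    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // size 10 with 20% overlap: each window repeats the previous 2 characters
        System.out.println(chunk("abcdefghijklmnopqrstuvwxyz", 10, 2));
        // prints [abcdefghij, ijklmnopqr, qrstuvwxyz]
    }
}
```

The overlap means a sentence cut at a window boundary still appears whole in at least one neighbouring segment, at the cost of embedding some text twice.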

Usage Example:

// Simple segment
TextSegment segment1 = TextSegment.from("This is a text chunk.");

// Segment with metadata
Metadata metadata = new Metadata()
    .put("document_id", UUID.randomUUID())
    .put("chunk_index", 0)
    .put("source", "manual.pdf");
TextSegment segment2 = TextSegment.from("Text content", metadata);

// Common pattern: Document → Segments
Document document = Document.from(longText);
DocumentSplitter splitter = DocumentSplitters.recursive(1000, 200); // max 1000 chars per segment, up to 200 chars overlap
List<TextSegment> segments = splitter.split(document);

// Each segment can be embedded
for (TextSegment segment : segments) {
    Embedding embedding = embeddingModel.embed(segment).content();
    embeddingStore.add(embedding, segment);
}

Testing Patterns:

@Test
void testTextSegment() {
    Metadata metadata = new Metadata().put("key", "value");
    TextSegment segment = TextSegment.from("content", metadata);

    assertEquals("content", segment.text());
    assertEquals("value", segment.metadata().getString("key"));

    // Verify immutability concern
    segment.metadata().put("key", "modified");
    assertEquals("modified", segment.metadata().getString("key"));
    // Note: metadata was mutated! Use copy() if sharing.
}

Related APIs:

  • Document - Source of segments
  • DocumentSplitter - Creates segments from documents
  • Embedding - Vector representation of segment
  • EmbeddingStore - Stores segments with embeddings
  • ContentRetriever - Retrieves relevant segments

Embedding

Represents a dense vector representation of text for semantic search and similarity operations.

package dev.langchain4j.data.embedding;

import java.util.List;

/**
 * Represents a text embedding (vector representation)
 */
public class Embedding {
    /**
     * Create embedding from vector
     * @param vector Float array vector
     */
    public Embedding(float[] vector);

    /**
     * Get vector as array
     * @return Float array vector
     */
    public float[] vector();

    /**
     * Get vector as list
     * @return List of Float values
     */
    public List<Float> vectorAsList();

    /**
     * Normalize the embedding vector in-place
     */
    public void normalize();

    /**
     * Get embedding dimension
     * @return Vector dimension
     */
    public int dimension();

    /**
     * Create from array
     * @param vector Float array vector
     * @return Embedding instance
     */
    public static Embedding from(float[] vector);

    /**
     * Create from list
     * @param vector List of Float values
     * @return Embedding instance
     */
    public static Embedding from(List<Float> vector);
}

Thread Safety:

  • Embedding is MUTABLE - normalize() modifies internal vector
  • NOT thread-safe if normalize() is called concurrently
  • vector() returns reference to internal array (not defensive copy)
  • Concurrent reads are safe if no writes occur
  • Recommended: Don't modify after creation, or synchronize access
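
The aliasing risk described above can be shown with a small stand-in. VectorHolder is a hypothetical class that, like the notes say of Embedding, hands out its internal array without a defensive copy; the sketch assumes nothing else about the real implementation.

```java
import java.util.Arrays;

// Hypothetical stand-in for a class that exposes its internal array.
final class VectorHolder {
    private final float[] vector;

    VectorHolder(float[] vector) {
        this.vector = vector; // stored by reference, no defensive copy
    }

    float[] vector() {
        return vector; // caller receives the internal array itself
    }

    public static void main(String[] args) {
        VectorHolder holder = new VectorHolder(new float[] {1f, 2f, 3f});

        float[] safe = holder.vector().clone(); // private copy
        safe[0] = 99f;                          // does not touch the holder

        float[] alias = holder.vector();        // same array as the holder's
        alias[1] = -1f;                         // visible through the holder

        System.out.println(Arrays.toString(holder.vector())); // [1.0, -1.0, 3.0]
    }
}
```

Calling clone() before any modification costs one O(n) copy and keeps the shared instance safe for concurrent readers.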

Common Pitfalls:

  • DO NOT modify returned vector array - affects internal state
  • DO NOT assume embeddings are normalized - call normalize() if needed
  • DO NOT mix embeddings from different models (different dimensions)
  • DO NOT store embeddings as doubles - use float for efficiency
  • DO NOT call normalize() repeatedly - each call recomputes the norm in O(n), and an already-normalized vector gains nothing from it

Edge Cases:

  • Empty vector (dimension 0) is technically valid but semantically meaningless
  • Null vector in constructor is rejected with an unchecked exception (IllegalArgumentException in current releases)
  • normalize() on zero vector results in NaN values
  • Very large dimensions (>10K) are rare but supported
  • Negative values are valid (embeddings can have negative components)

Exception Handling:

  • Constructor throws an unchecked exception if vector is null
  • normalize() may produce NaN if vector is all zeros
  • No exceptions for dimension mismatches (caller's responsibility)

Performance Notes:

  • Embedding stores float array directly - no boxing overhead
  • vector() returns reference - O(1), no allocation
  • vectorAsList() boxes floats to Float - O(n) allocation, avoid if possible
  • normalize() is O(n) - modifies in place
  • dimension() is O(1) - just array length
  • Typical embedding sizes: 384 (e.g., all-MiniLM-L6-v2), 768 (BERT-base), 1536 (OpenAI text-embedding-3-small), 3072 (OpenAI text-embedding-3-large)

Memory Usage:

  • float[n] uses 4n bytes plus object overhead (~16 bytes)
  • Example: 1536-dimensional embedding ≈ 6KB
  • 1 million embeddings ≈ 6GB RAM (without compression)
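
The arithmetic behind these figures is simple enough to check in code. EmbeddingMemory is a hypothetical helper, and the 16-byte array header is the approximation quoted above rather than a guaranteed JVM value.

```java
// Hypothetical helper for back-of-the-envelope sizing.
final class EmbeddingMemory {

    // 4 bytes per float plus an assumed ~16-byte array header.
    static long bytesPerEmbedding(int dimension) {
        return 4L * dimension + 16L;
    }

    static long totalBytes(int dimension, long count) {
        return bytesPerEmbedding(dimension) * count;
    }

    public static void main(String[] args) {
        long one = bytesPerEmbedding(1536);          // 6160 bytes
        long million = totalBytes(1536, 1_000_000L); // 6,160,000,000 bytes
        System.out.println("one: " + one + " B, one million: " + million + " B");
    }
}
```

Index structures and stored TextSegment text come on top of this, so treat the result as a lower bound.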

Normalization:

  • L2 normalization: divides each component by vector magnitude
  • Required for some similarity metrics (cosine similarity with normalized vectors = dot product)
  • Many embedding models return pre-normalized vectors
  • Check model documentation before normalizing
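
As an illustration of L2 normalization that avoids the zero-vector NaN case, the sketch below returns a normalized copy and leaves an all-zero input unchanged. SafeNormalizer is a hypothetical helper; returning zeros for a zero vector is a design choice made for this example, not the library's behavior.

```java
// Hypothetical helper: L2 normalization that leaves a zero vector unchanged.
final class SafeNormalizer {

    static float[] normalized(float[] v) {
        double sumSquares = 0.0;
        for (float x : v) {
            sumSquares += (double) x * x;
        }
        double norm = Math.sqrt(sumSquares);
        float[] out = v.clone(); // never modify the caller's array
        if (norm == 0.0) {
            return out;          // avoid 0/0 = NaN
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = (float) (out[i] / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] unit = normalized(new float[] {3f, 4f}); // magnitude 5
        System.out.println(unit[0] + ", " + unit[1]);    // roughly 0.6, 0.8
    }
}
```

Working on a copy also sidesteps the thread-safety concern of normalizing a shared instance in place.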

Usage Example:

// Create from array
float[] vector = new float[]{0.1f, 0.2f, 0.3f};
Embedding embedding = Embedding.from(vector);

// Get dimension
int dim = embedding.dimension(); // 3

// Normalize (modifies in-place)
embedding.normalize();

// Calculate cosine similarity (assuming both normalized);
// `other` is a second, already-normalized Embedding, and
// cosineSimilarity is the helper defined under Similarity Calculations
float similarity = cosineSimilarity(embedding, other);

// Common pattern: Embed text
Response<Embedding> response = embeddingModel.embed("Hello world");
Embedding textEmbedding = response.content();

// Store with metadata
embeddingStore.add(textEmbedding, textSegment);

Similarity Calculations:

// Cosine similarity (for normalized embeddings)
public float cosineSimilarity(Embedding e1, Embedding e2) {
    float[] v1 = e1.vector();
    float[] v2 = e2.vector();

    float dotProduct = 0.0f;
    for (int i = 0; i < v1.length; i++) {
        dotProduct += v1[i] * v2[i];
    }
    return dotProduct; // Already normalized, so dot product = cosine similarity
}

// Euclidean distance
public float euclideanDistance(Embedding e1, Embedding e2) {
    float[] v1 = e1.vector();
    float[] v2 = e2.vector();

    float sumSquares = 0.0f;
    for (int i = 0; i < v1.length; i++) {
        float diff = v1[i] - v2[i];
        sumSquares += diff * diff;
    }
    return (float) Math.sqrt(sumSquares);
}
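
The cosineSimilarity helper above is only correct for unit-length inputs of equal dimension. A minimal sketch that drops both assumptions is shown below; GeneralCosine is a hypothetical helper that works on raw float arrays, and throwing on a zero vector is a choice made for this example.

```java
// Hypothetical helper: cosine similarity without assuming unit-length inputs.
final class GeneralCosine {

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosine(new float[] {1f, 0f}, new float[] {2f, 0f})); // parallel: 1.0
        System.out.println(cosine(new float[] {1f, 0f}, new float[] {0f, 3f})); // orthogonal: 0.0
    }
}
```

Accumulating in double reduces rounding error for high-dimensional vectors, and the explicit dimension check catches embeddings produced by different models.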

Testing Patterns:

@Test
void testEmbedding() {
    float[] vector = {3.0f, 4.0f}; // Magnitude = 5.0
    Embedding embedding = Embedding.from(vector);

    assertEquals(2, embedding.dimension());

    // Test normalization
    embedding.normalize();
    float[] normalized = embedding.vector();
    assertEquals(0.6f, normalized[0], 0.001f); // 3/5
    assertEquals(0.8f, normalized[1], 0.001f); // 4/5

    // Verify magnitude is 1.0
    float magnitude = (float) Math.sqrt(
        normalized[0] * normalized[0] + normalized[1] * normalized[1]
    );
    assertEquals(1.0f, magnitude, 0.001f);
}

Related APIs:

  • EmbeddingModel - Generates embeddings
  • EmbeddingStore - Stores and retrieves embeddings
  • TextSegment - Text to be embedded
  • ContentRetriever - Uses embeddings for semantic search

ToolExecutionRequest

Represents a request from the AI to execute a tool/function.

package dev.langchain4j.agent.tool;

/**
 * Represents a tool execution request from the AI
 */
public class ToolExecutionRequest {
    /**
     * Get tool execution ID
     * @return Execution ID
     */
    public String id();

    /**
     * Get tool name
     * @return Tool name
     */
    public String name();

    /**
     * Get tool arguments as JSON string
     * @return JSON arguments
     */
    public String arguments();

    /**
     * Create builder for modification
     * @return Builder with current values
     */
    public Builder toBuilder();

    /**
     * Create new builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety:

  • ToolExecutionRequest is immutable once built
  • Thread-safe for read operations
  • Builder is NOT thread-safe (build once per thread)
  • Safe to share built instances across threads

Common Pitfalls:

  • DO NOT assume arguments are valid JSON - validate before parsing
  • DO NOT ignore the id - required for tracking execution results
  • DO NOT modify arguments string directly - use builder to create new request
  • DO NOT assume tool name exists - validate against available tools
  • DO NOT parse arguments by hand - use a JSON library such as Gson or Jackson

Edge Cases:

  • An empty JSON object ("{}") is valid for parameterless tools
  • Null id may occur in some model implementations (validate!)
  • Tool name is case-sensitive
  • Arguments may contain complex nested JSON
  • Malformed JSON in arguments requires error handling

Exception Handling:

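  • build() is not guaranteed to validate fields (a null id is possible, as noted under Edge Cases); check id, name, and arguments yourself
  • JSON parsing of arguments can fail with a library-specific exception (with Gson, a JsonParseException such as JsonSyntaxException)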
  • Tool execution may throw any exception (tool-dependent)

Performance Notes:

  • ToolExecutionRequest is lightweight (three string references)
  • arguments() returns string reference - O(1)
  • Parsing arguments has JSON parsing overhead
  • Keep tool argument schemas simple for faster parsing

Execution Flow:

  1. Model generates ToolExecutionRequest in response
  2. Framework parses request
  3. Framework validates tool exists
  4. Framework parses arguments JSON
  5. Framework invokes tool with parsed arguments
  6. Framework sends result back to model
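
Steps 3 to 6 of the flow above can be sketched in plain Java. This is not how the framework is implemented: ToolDispatcher and its handler signature are hypothetical, and each handler receives the raw arguments string, leaving JSON parsing to the handler.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: maps tool names to handlers that take raw JSON arguments.
final class ToolDispatcher {
    private final Map<String, Function<String, String>> tools = new HashMap<>();

    void register(String name, Function<String, String> handler) {
        tools.put(name, handler);
    }

    // Validate the tool name, invoke the handler, and return a result string.
    String dispatch(String toolName, String argumentsJson) {
        Function<String, String> handler = tools.get(toolName); // names are case-sensitive
        if (handler == null) {
            return "Error: unknown tool '" + toolName + "'";
        }
        try {
            return handler.apply(argumentsJson);
        } catch (RuntimeException e) {
            // Report the failure as text so the model can react to it.
            return "Error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        ToolDispatcher dispatcher = new ToolDispatcher();
        dispatcher.register("echo", json -> "echo: " + json);
        System.out.println(dispatcher.dispatch("echo", "{\"text\":\"hi\"}"));
        System.out.println(dispatcher.dispatch("Echo", "{}")); // wrong case: unknown tool
    }
}
```

Returning errors as text rather than propagating exceptions mirrors the idea in step 6: the model always gets a result it can act on.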

Usage Example:

// Typically created by framework, not manually
ToolExecutionRequest request = ToolExecutionRequest.builder()
    .id("call_123")
    .name("getCurrentWeather")
    .arguments("{\"location\": \"San Francisco\", \"unit\": \"celsius\"}")
    .build();

// Access fields
String id = request.id(); // "call_123"
String name = request.name(); // "getCurrentWeather"
String args = request.arguments(); // JSON string

// Parse arguments (framework does this automatically);
// gson is a com.google.gson.Gson instance and WeatherArgs an application-defined class
WeatherArgs parsedArgs = gson.fromJson(args, WeatherArgs.class);

// Execute tool
String result = weatherTool.getCurrentWeather(
    parsedArgs.location,
    parsedArgs.unit
);

// Create result message to send back to model (carries the request's id and tool name)
ToolExecutionResultMessage resultMessage = ToolExecutionResultMessage.from(request, result);

Testing Patterns:

@Test
void testToolExecutionRequest() {
    ToolExecutionRequest request = ToolExecutionRequest.builder()
        .id("test_id")
        .name("testTool")
        .arguments("{\"param\": \"value\"}")
        .build();

    assertEquals("test_id", request.id());
    assertEquals("testTool", request.name());
    assertTrue(request.arguments().contains("param"));

    // Test JSON parsing
    JsonObject json = JsonParser.parseString(request.arguments())
        .getAsJsonObject();
    assertEquals("value", json.get("param").getAsString());
}

Related APIs:

  • ToolSpecification - Describes tool to LLM
  • ToolExecutionResultMessage - Chat message carrying the tool result back to the LLM
  • @Tool - Annotation for defining tools
  • AiServices - Handles tool execution automatically

ToolExecutionRequest Builder

/**
 * Builder for ToolExecutionRequest
 */
public static final class Builder {
    /**
     * Set execution ID
     * @param id Execution ID
     * @return Builder instance
     */
    public Builder id(String id);

    /**
     * Set tool name
     * @param name Tool name
     * @return Builder instance
     */
    public Builder name(String name);

    /**
     * Set tool arguments
     * @param arguments JSON arguments string
     * @return Builder instance
     */
    public Builder arguments(String arguments);

    /**
     * Build the request
     * @return ToolExecutionRequest instance
     */
    public ToolExecutionRequest build();
}

Thread Safety:

  • Builder is NOT thread-safe
  • Don't share builder across threads
  • Built ToolExecutionRequest is immutable and thread-safe

Common Pitfalls:

  • DO NOT forget to set all required fields (id, name, arguments)
  • DO NOT reuse builder after build() - create new builder
  • DO NOT pass null to setter methods

Exception Handling:

  • build() is not guaranteed to reject null fields; validate id, name, and arguments before building

Related APIs:

  • ToolExecutionRequest - Built request object

ToolSpecification

Describes a tool for the LLM, including name, description, and parameter schema.

package dev.langchain4j.agent.tool;

import dev.langchain4j.model.chat.request.json.JsonObjectSchema;
import java.util.Map;

/**
 * Describes a tool/function for the LLM
 * Includes name, description, parameters schema, and provider-specific metadata
 */
public class ToolSpecification {
    /**
     * Get tool name
     * @return Tool name
     */
    public String name();

    /**
     * Get tool description
     * @return Tool description
     */
    public String description();

    /**
     * Get parameters schema
     * @return JSON object schema for parameters
     */
    public JsonObjectSchema parameters();

    /**
     * Get provider-specific metadata
     * @return Metadata map
     */
    public Map<String, Object> metadata();

    /**
     * Create builder for modification
     * @return Builder with current values
     */
    public Builder toBuilder();

    /**
     * Create new builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety:

  • ToolSpecification is immutable once built
  • Thread-safe for read operations
  • metadata() returns unmodifiable map
  • Safe to share across threads

Common Pitfalls:

  • DO NOT use vague descriptions - LLM relies on description for tool selection
  • DO NOT forget to mark required parameters in schema
  • DO NOT use overly complex parameter schemas (keep simple)
  • DO NOT use generic names like "tool1" - be specific
  • DO NOT skip parameter descriptions - help LLM understand usage

Edge Cases:

  • Empty parameters schema is valid for parameterless tools
  • Very long descriptions may be truncated by some models
  • Metadata is optional (can be null or empty)
  • Parameter schema can nest objects (but keep reasonable depth)

Exception Handling:

  • Builder throws IllegalStateException if required fields missing
  • No exceptions during read operations

Performance Notes:

  • ToolSpecification is lightweight
  • Serialized to JSON when sent to LLM
  • Complex schemas increase token usage
  • Keep descriptions concise (under 200 chars recommended)

Best Practices for Descriptions:

  • Be specific about what the tool does
  • Include when to use it
  • Mention any limitations
  • Use active voice
  • Example: "Gets current weather for a city. Use when user asks about weather conditions. Requires city name."
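
Some of these guidelines can be checked mechanically, for example in a unit test over all registered tools. The sketch below is one possible approach, not a LangChain4j feature: DescriptionLint is a hypothetical helper, the 200-character limit is the recommendation from the performance notes, and the "generic name" pattern is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lint check for tool names and descriptions.
final class DescriptionLint {
    static final int MAX_LENGTH = 200;

    static List<String> check(String name, String description) {
        List<String> problems = new ArrayList<>();
        if (description == null || description.isBlank()) {
            problems.add("description is missing");
            return problems;
        }
        if (description.length() > MAX_LENGTH) {
            problems.add("description exceeds " + MAX_LENGTH + " characters");
        }
        // Flags names such as "tool", "tool1", "Tool42" (assumed pattern).
        if (name != null && name.matches("(?i)tool\\d*")) {
            problems.add("name '" + name + "' is too generic");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("getCurrentWeather",
                "Gets current weather for a city. Use when user asks about weather conditions."));
        System.out.println(check("tool1", "Does things."));
    }
}
```

A check like this cannot judge whether a description is actually helpful to the model; it only catches the omissions that are easy to detect.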

Usage Example:

// Manual creation (usually auto-generated from @Tool)
ToolSpecification spec = ToolSpecification.builder()
    .name("getCurrentWeather")
    .description("Get current weather for a specific location")
    .parameters(JsonObjectSchema.builder()
        .addStringProperty("location", "City name (e.g., 'San Francisco')")
        .addEnumProperty("unit", List.of("celsius", "fahrenheit"), "Temperature unit")
        .required("location")
        .build())
    .build();

// Typically auto-generated from @Tool annotation
@Tool("Get current weather for a specific location")
public String getCurrentWeather(
    @P("City name") String location,
    @P("Temperature unit") TemperatureUnit unit
) {
    // Implementation
}
// Framework generates ToolSpecification automatically

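// Use with AiServices: pass the object whose methods carry @Tool;
// the framework derives each ToolSpecification from the annotations
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .tools(new WeatherTools()) // hypothetical class containing getCurrentWeather
    .build();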

Testing Patterns:

@Test
void testToolSpecification() {
    ToolSpecification spec = ToolSpecification.builder()
        .name("testTool")
        .description("Test description")
        .parameters(JsonObjectSchema.builder()
            .addStringProperty("param1")
            .required("param1")
            .build())
        .build();

    assertEquals("testTool", spec.name());
    assertEquals("Test description", spec.description());
    assertNotNull(spec.parameters());
    assertTrue(spec.parameters().required().contains("param1"));
}

Related APIs:

  • ToolExecutionRequest - Request to execute tool
  • @Tool - Annotation for defining tools
  • JsonObjectSchema - Parameter schema definition
  • AiServices - Handles tool integration

ToolSpecification Builder

/**
 * Builder for ToolSpecification
 */
public static final class Builder {
    /**
     * Set tool name
     * @param name Tool name
     * @return Builder instance
     */
    public Builder name(String name);

    /**
     * Set tool description
     * @param description Tool description
     * @return Builder instance
     */
    public Builder description(String description);

    /**
     * Set parameters schema
     * @param parameters JSON object schema
     * @return Builder instance
     */
    public Builder parameters(JsonObjectSchema parameters);

    /**
     * Set metadata map
     * @param metadata Provider-specific metadata
     * @return Builder instance
     */
    public Builder metadata(Map<String, Object> metadata);

    /**
     * Add single metadata entry
     * @param key Metadata key
     * @param value Metadata value
     * @return Builder instance
     */
    public Builder addMetadata(String key, Object value);

    /**
     * Build the specification
     * @return ToolSpecification instance
     */
    public ToolSpecification build();
}

Thread Safety:

  • Builder is NOT thread-safe
  • Confine each Builder instance to a single thread; do not share it
  • Built ToolSpecification is immutable and thread-safe

Common Pitfalls:

  • DO NOT forget name and description (required)
  • DO NOT skip parameters schema for tools with parameters
  • DO NOT add provider-specific metadata unless necessary

Exception Handling:

  • build() throws IllegalStateException if name or description is null

Related APIs:

  • ToolSpecification - Built specification object
  • JsonObjectSchema.Builder - Build parameter schemas

JsonObjectSchema

JSON object schema for defining tool parameters and structured output formats.

package dev.langchain4j.model.chat.request.json;

import java.util.List;
import java.util.Map;

/**
 * JSON object schema for tool parameters and structured outputs
 */
public class JsonObjectSchema implements JsonSchemaElement {
    /**
     * Get schema description
     * @return Description
     */
    public String description();

    /**
     * Get properties
     * @return Map of property name to schema element
     */
    public Map<String, JsonSchemaElement> properties();

    /**
     * Get required property names
     * @return List of required property names
     */
    public List<String> required();

    /**
     * Check if additional properties allowed
     * @return true if additional properties allowed, false if not, null if unspecified
     */
    public Boolean additionalProperties();

    /**
     * Get schema definitions
     * @return Map of definition name to schema element
     */
    public Map<String, JsonSchemaElement> definitions();

    /**
     * Create builder for modification
     * @return Builder with current values
     */
    public Builder toBuilder();

    /**
     * Create new builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety:

  • JsonObjectSchema is immutable once built
  • Thread-safe for read operations
  • properties() and definitions() return unmodifiable maps
  • Safe to share across threads

Common Pitfalls:

  • DO NOT forget to mark required properties
  • DO NOT use overly nested schemas (limit to 3-4 levels)
  • DO NOT skip property descriptions - helps LLM understand
  • DO NOT use additionalProperties=true unless necessary (less strict validation)
  • DO NOT create circular references in definitions

Edge Cases:

  • Empty properties map is valid (for objects with no properties)
  • Empty required list means all properties optional
  • Null additionalProperties means unspecified (provider default)
  • Definitions allow schema reuse (like JSON Schema $ref)

Exception Handling:

  • Builder validation occurs at build() time
  • No exceptions during read operations

Performance Notes:

  • Schema serialization impacts token count
  • Keep schemas minimal for efficiency
  • Reuse definitions for common structures
  • Deeply nested schemas slow parsing

JSON Schema Compliance:

  • Follows JSON Schema Draft 7 (subset)
  • Supports common types: string, number, integer, boolean, array, object
  • Supports enum for string values
  • Supports required properties
  • Limited support for advanced features (patterns, formats, etc.)

Usage Example:

// Simple schema
JsonObjectSchema simple = JsonObjectSchema.builder()
    .addStringProperty("name", "Person's name")
    .addIntegerProperty("age", "Person's age")
    .required("name")
    .build();

// Complex schema with nesting
JsonObjectSchema complex = JsonObjectSchema.builder()
    .description("User profile data")
    .addStringProperty("username", "Unique username")
    .addStringProperty("email", "Email address")
    .addProperty("address", JsonObjectSchema.builder()
        .addStringProperty("street")
        .addStringProperty("city")
        .addStringProperty("zipCode")
        .required("street", "city")
        .build())
    .addEnumProperty("status",
        List.of("active", "inactive", "suspended"),
        "Account status")
    .required("username", "email")
    .additionalProperties(false) // Strict validation
    .build();

// Schema with reusable definitions
JsonObjectSchema withDefs = JsonObjectSchema.builder()
    .definitions(Map.of(
        "Address", JsonObjectSchema.builder()
            .addStringProperty("street")
            .addStringProperty("city")
            .build()
    ))
    // Use definition in properties...
    .build();

Testing Patterns:

@Test
void testJsonObjectSchema() {
    JsonObjectSchema schema = JsonObjectSchema.builder()
        .addStringProperty("prop1", "Description")
        .addIntegerProperty("prop2")
        .required("prop1")
        .build();

    assertEquals(2, schema.properties().size());
    assertTrue(schema.properties().containsKey("prop1"));
    assertEquals(1, schema.required().size());
    assertTrue(schema.required().contains("prop1"));
}

Related APIs:

  • JsonSchemaElement - Base interface for all schema types
  • ToolSpecification - Uses JsonObjectSchema for parameters
  • StructuredOutputParser - Parses output based on schema

JsonObjectSchema Builder

/**
 * Builder for JsonObjectSchema
 */
public static class Builder {
    /**
     * Set schema description
     * @param description Description
     * @return Builder instance
     */
    public Builder description(String description);

    /**
     * Add properties
     * @param properties Map of properties
     * @return Builder instance
     */
    public Builder addProperties(Map<String, JsonSchemaElement> properties);

    /**
     * Add single property
     * @param name Property name
     * @param jsonSchemaElement Property schema
     * @return Builder instance
     */
    public Builder addProperty(String name, JsonSchemaElement jsonSchemaElement);

    /**
     * Add string property
     * @param name Property name
     * @return Builder instance
     */
    public Builder addStringProperty(String name);

    /**
     * Add string property with description
     * @param name Property name
     * @param description Property description
     * @return Builder instance
     */
    public Builder addStringProperty(String name, String description);

    /**
     * Add integer property
     * @param name Property name
     * @return Builder instance
     */
    public Builder addIntegerProperty(String name);

    /**
     * Add integer property with description
     * @param name Property name
     * @param description Property description
     * @return Builder instance
     */
    public Builder addIntegerProperty(String name, String description);

    /**
     * Add number property
     * @param name Property name
     * @return Builder instance
     */
    public Builder addNumberProperty(String name);

    /**
     * Add number property with description
     * @param name Property name
     * @param description Property description
     * @return Builder instance
     */
    public Builder addNumberProperty(String name, String description);

    /**
     * Add boolean property
     * @param name Property name
     * @return Builder instance
     */
    public Builder addBooleanProperty(String name);

    /**
     * Add boolean property with description
     * @param name Property name
     * @param description Property description
     * @return Builder instance
     */
    public Builder addBooleanProperty(String name, String description);

    /**
     * Add enum property
     * @param name Property name
     * @param enumValues Allowed values
     * @return Builder instance
     */
    public Builder addEnumProperty(String name, List<String> enumValues);

    /**
     * Add enum property with description
     * @param name Property name
     * @param enumValues Allowed values
     * @param description Property description
     * @return Builder instance
     */
    public Builder addEnumProperty(String name, List<String> enumValues, String description);

    /**
     * Set required properties
     * @param required List of required property names
     * @return Builder instance
     */
    public Builder required(List<String> required);

    /**
     * Set required properties (varargs)
     * @param required Required property names
     * @return Builder instance
     */
    public Builder required(String... required);

    /**
     * Set additional properties allowed
     * @param additionalProperties true to allow, false to disallow
     * @return Builder instance
     */
    public Builder additionalProperties(Boolean additionalProperties);

    /**
     * Set definitions
     * @param definitions Map of definitions
     * @return Builder instance
     */
    public Builder definitions(Map<String, JsonSchemaElement> definitions);

    /**
     * Build the schema
     * @return JsonObjectSchema instance
     */
    public JsonObjectSchema build();
}

Thread Safety:

  • Builder is NOT thread-safe
  • Confine each Builder instance to a single thread; do not share it
  • Built JsonObjectSchema is immutable and thread-safe

Common Pitfalls:

  • DO NOT call required() with non-existent property names
  • DO NOT forget descriptions for better LLM understanding
  • DO NOT mix addProperty with convenience methods inconsistently
  • DO NOT add same property twice (last wins)
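
The first pitfall can be caught before a schema is sent to a model. The following is a minimal sketch, not part of LangChain4j: `RequiredPropertyCheck` and `unknownRequired` are hypothetical names, and the helper works on plain collections so that it can be fed from `schema.properties().keySet()` and `schema.required()`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not a LangChain4j class). */
final class RequiredPropertyCheck {

    private RequiredPropertyCheck() {}

    /**
     * Returns every name in {@code required} that is not a declared property,
     * in the order the names appear in {@code required}.
     */
    static List<String> unknownRequired(Set<String> propertyNames,
                                        Collection<String> required) {
        List<String> unknown = new ArrayList<>();
        for (String name : required) {
            if (!propertyNames.contains(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }
}
```

An empty result means every required name refers to a declared property; a non-empty result lists the names to fix.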

Exception Handling:

  • No validation during add operations
  • Validation occurs at build() time (if any)

Performance Notes:

  • Convenience methods (addStringProperty, etc.) are preferred for readability
  • No significant performance difference between methods
  • build() is O(n) where n is number of properties

Related APIs:

  • JsonObjectSchema - Built schema object
  • JsonSchemaElement - Base type for property schemas

JsonSchemaElement

Base interface for all JSON schema types.

package dev.langchain4j.model.chat.request.json;

/**
 * Base interface for JSON schema elements
 * Implementations: JsonObjectSchema, JsonArraySchema, JsonStringSchema,
 * JsonIntegerSchema, JsonNumberSchema, JsonBooleanSchema, JsonEnumSchema, etc.
 */
public interface JsonSchemaElement {
    /**
     * Get element description
     * @return Description or null
     */
    String description();
}

Thread Safety:

  • All implementations are immutable
  • Thread-safe for read operations

Common Pitfalls:

  • DO NOT forget this is just a base interface - use concrete types
  • DO NOT assume description() is never null

Exception Handling:

  • No exceptions thrown by interface methods

Implementations:

  • JsonObjectSchema - Object type with properties
  • JsonArraySchema - Array type with item schema
  • JsonStringSchema - String type
  • JsonIntegerSchema - Integer number type
  • JsonNumberSchema - Floating point number type
  • JsonBooleanSchema - Boolean type
  • JsonEnumSchema - String with enumerated values

Usage Example:

// Typically used via JsonObjectSchema builder
JsonSchemaElement stringElement = JsonStringSchema.builder()
    .description("A string value")
    .build();

JsonSchemaElement objectElement = JsonObjectSchema.builder()
    .addProperty("field", stringElement)
    .build();

Related APIs:

  • JsonObjectSchema - Most common implementation
  • All specific schema type implementations

Image

Represents an image with URL or base64 data.

package dev.langchain4j.data.image;

import java.net.URI;

/**
 * Represents an image with URL or base64 data
 */
public final class Image {
    /**
     * Get image URL
     * @return Image URL or null
     */
    public URI url();

    /**
     * Get base64-encoded image data
     * @return Base64 data or null
     */
    public String base64Data();

    /**
     * Get MIME type
     * @return MIME type or null
     */
    public String mimeType();

    /**
     * Get revised prompt (for image generation)
     * @return Revised prompt or null
     */
    public String revisedPrompt();

    /**
     * Create builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety:

  • Image is immutable once built
  • Thread-safe for read operations
  • Safe to share across threads

Common Pitfalls:

  • DO NOT assume both url() and base64Data() are non-null (only one should be set)
  • DO NOT forget mimeType when using base64Data
  • DO NOT load very large images into base64 (memory intensive)
  • DO NOT forget to validate URL accessibility
  • DO NOT assume revisedPrompt is set (only for image generation)

Edge Cases:

  • Either url or base64Data should be set, not both
  • mimeType is required for base64Data but optional for URLs
  • revisedPrompt is only populated by image generation models
  • Very large base64 strings can cause memory issues
  • URL images require network access

Exception Handling:

  • Builder throws IllegalStateException if neither url nor base64Data set
  • No exceptions during read operations
  • URL validation not performed (may be invalid)

Performance Notes:

  • URL-based images don't consume memory (just reference)
  • base64Data can be large (1MB+ for high-res images)
  • Consider streaming for large images
  • URL loading is lazy (not loaded until used)
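
To make the memory cost of base64 data concrete, here is a small sketch using only `java.util.Base64`. The class name `Base64Sizing` is hypothetical; the encoded string is the kind of value one would pass to `base64Data(...)`, assuming the standard (non-URL-safe) Base64 alphabet with padding.

```java
import java.util.Base64;

/** Hypothetical helper (not a LangChain4j class). */
final class Base64Sizing {

    private Base64Sizing() {}

    /** Number of characters in padded Base64 text for {@code rawBytes} input bytes. */
    static long encodedLength(long rawBytes) {
        // Every group of up to 3 input bytes becomes 4 output characters.
        return 4 * ((rawBytes + 2) / 3);
    }

    /** Encodes raw image bytes with the standard Base64 encoder. */
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }
}
```

Base64 grows the payload by roughly one third: a 1 MiB image (1,048,576 bytes) encodes to 1,398,104 characters, and a Java `String` of that length may occupy more memory than the raw bytes did.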

Supported MIME Types:

  • image/png - PNG format
  • image/jpeg - JPEG format
  • image/gif - GIF format
  • image/webp - WebP format
  • Model-specific support varies
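
When image bytes come from a file, the MIME type usually has to be derived by the caller. The sketch below maps common file extensions to the MIME types listed above. `ImageMimeTypes` and `fromFileName` are hypothetical names, and the mapping is by extension only; it does not inspect file contents.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper (not a LangChain4j class). */
final class ImageMimeTypes {

    private ImageMimeTypes() {}

    /** Returns the MIME type for a known image extension, or empty if unknown. */
    static Optional<String> fromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension present
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (extension) {
            case "png":
                return Optional.of("image/png");
            case "jpg":
            case "jpeg":
                return Optional.of("image/jpeg");
            case "gif":
                return Optional.of("image/gif");
            case "webp":
                return Optional.of("image/webp");
            default:
                return Optional.empty();
        }
    }
}
```

Returning `Optional.empty()` for unknown extensions leaves the decision to the caller, who can reject the file or fall back to content sniffing.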

Usage Example:

// Image from URL
Image urlImage = Image.builder()
    .url("https://example.com/image.png")
    .mimeType("image/png") // Optional for URLs
    .build();

// Image from base64 data
String base64 = "iVBORw0KGgoAAAANSUhEUgAAAAUA...";
Image base64Image = Image.builder()
    .base64Data(base64)
    .mimeType("image/png") // Required for base64
    .build();

// Generated image with revised prompt
Image generated = Image.builder()
    .url(generatedUrl)
    .revisedPrompt("A photo of a cat (revised by model)")
    .build();

// Use with vision models (urlImage is the Image built above)
ChatResponse response = visionModel.chat(
    UserMessage.from(
        TextContent.from("What's in this image?"),
        ImageContent.from(urlImage)
    )
);

Testing Patterns:

@Test
void testImage() {
    Image image = Image.builder()
        .url("https://example.com/test.png")
        .mimeType("image/png")
        .build();

    assertNotNull(image.url());
    assertEquals("image/png", image.mimeType());
    assertNull(image.base64Data()); // Only URL set
}

Related APIs:

  • ImageContent - Wraps Image for messages
  • UserMessage - Can contain image content
  • ImageModel - Generates images
  • Vision models (ChatModel with vision support)

Image Builder

/**
 * Builder for Image
 */
public static class Builder {
    /**
     * Set image URL
     * @param url Image URL
     * @return Builder instance
     */
    public Builder url(URI url);

    /**
     * Set image URL from string
     * @param url Image URL string
     * @return Builder instance
     */
    public Builder url(String url);

    /**
     * Set base64 data
     * @param base64Data Base64-encoded image data
     * @return Builder instance
     */
    public Builder base64Data(String base64Data);

    /**
     * Set MIME type
     * @param mimeType MIME type (e.g., "image/png")
     * @return Builder instance
     */
    public Builder mimeType(String mimeType);

    /**
     * Set revised prompt
     * @param revisedPrompt Revised prompt from image generation
     * @return Builder instance
     */
    public Builder revisedPrompt(String revisedPrompt);

    /**
     * Build the image
     * @return Image instance
     */
    public Image build();
}

Thread Safety:

  • Builder is NOT thread-safe
  • Confine each Builder instance to a single thread; do not share it
  • Built Image is immutable and thread-safe

Common Pitfalls:

  • DO NOT set both url and base64Data (only one)
  • DO NOT forget mimeType for base64Data
  • DO NOT skip validation of URL format

Exception Handling:

  • build() throws IllegalStateException if neither url nor base64Data set
  • url(String) may throw an unchecked IllegalArgumentException for strings that are not valid URIs (the method declares no checked exceptions)
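
Because the builder does not confirm that a URL is usable, callers may want to validate user-supplied strings first. This is a minimal sketch under stated assumptions: `ImageUrlCheck` and `isHttpUrl` are hypothetical names, only `http` and `https` are accepted, and reachability is not tested. It parses with `java.net.URI.create`, which reports malformed input with `IllegalArgumentException`.

```java
import java.net.URI;

/** Hypothetical helper (not a LangChain4j class). */
final class ImageUrlCheck {

    private ImageUrlCheck() {}

    /** Returns true if the text parses as an absolute http(s) URI with a host. */
    static boolean isHttpUrl(String text) {
        if (text == null) {
            return false;
        }
        URI uri;
        try {
            uri = URI.create(text.trim());
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
        String scheme = uri.getScheme();
        boolean httpScheme = "http".equalsIgnoreCase(scheme)
                || "https".equalsIgnoreCase(scheme);
        return httpScheme && uri.getHost() != null;
    }
}
```

A check like this only covers syntax. Whether the model provider can fetch the image is known only once the request is made.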

Related APIs:

  • Image - Built image object

Related APIs

Document Processing

  • DocumentParser - Parse files into Documents
  • DocumentSplitter - Split documents into segments
  • DocumentTransformer - Transform document content

Embeddings

  • EmbeddingModel - Generate embeddings
  • EmbeddingStore - Store and retrieve embeddings
  • ContentRetriever - Semantic search using embeddings

Tools

  • @Tool - Annotation for defining tools
  • ToolExecutionResult - Result of tool execution
  • AiServices - Automatic tool integration

Structured Output

  • OutputParser - Parse structured output
  • StructuredOutputParser - Schema-based parsing
  • @StructuredPrompt - Structured input templates

Testing Patterns

Testing with Immutable Types

@Test
void testImmutableTypes() {
    // Document and TextSegment are safe to share
    Document doc = Document.from("text");
    TextSegment segment = TextSegment.from("text");

    // Safe to pass to multiple threads
    executor.submit(() -> processDocument(doc));
    executor.submit(() -> processSegment(segment));
}

Testing with Mutable Types

@Test
void testMutableTypes() {
    // Metadata and Embedding are mutable
    Metadata metadata = new Metadata().put("key", "value");

    // Create copy for thread safety
    Metadata copy = metadata.copy();

    // Safe to modify independently
    metadata.put("key", "modified");
    assertEquals("value", copy.getString("key"));
}

Testing Metadata Thread Safety

@Test
void testMetadataThreadSafety() throws Exception {
    Metadata shared = new Metadata().put("counter", 0);

    // UNSAFE: concurrent modification
    // CountDownLatch latch = new CountDownLatch(2);
    // executor.submit(() -> {
    //     for (int i = 0; i < 1000; i++) {
    //         shared.put("counter", shared.getInteger("counter") + 1);
    //     }
    //     latch.countDown();
    // });
    // Result: race condition, lost updates

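    // SAFE: each task works on its own copy
    int workers = 2;
    CountDownLatch latch = new CountDownLatch(workers); // one count per submitted task
    List<Metadata> results = new CopyOnWriteArrayList<>();

    for (int w = 0; w < workers; w++) {
        executor.submit(() -> {
            try {
                Metadata local = shared.copy();
                for (int i = 0; i < 1000; i++) {
                    local.put("counter", local.getInteger("counter") + 1);
                }
                results.add(local);
            } finally {
                latch.countDown(); // always release the latch, even on failure
            }
        });
    }

    latch.await();
    assertEquals(workers, results.size());
    // Merge results as needed
}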
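
One way to handle the "merge results" step is to sum each worker's change relative to the shared starting value. The sketch below illustrates that copy-then-merge idea with the standard library only: a plain `HashMap` stands in for `Metadata`, and `CopyThenMerge` is a hypothetical class, not a LangChain4j API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration; a HashMap stands in for Metadata. */
final class CopyThenMerge {

    private CopyThenMerge() {}

    /** Each worker increments a private copy; the deltas are summed afterwards. */
    static int run(int workers, int increments) {
        Map<String, Integer> shared = new HashMap<>();
        shared.put("counter", 0);
        int base = shared.get("counter");

        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Map<String, Integer>>> tasks = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                tasks.add(() -> {
                    // Private copy: no other thread ever touches this map.
                    Map<String, Integer> local = new HashMap<>(shared);
                    for (int i = 0; i < increments; i++) {
                        local.merge("counter", 1, Integer::sum);
                    }
                    return local;
                });
            }
            int merged = base;
            for (Future<Map<String, Integer>> result : executor.invokeAll(tasks)) {
                merged += result.get().get("counter") - base; // add this worker's delta
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Worker failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

With two workers and 1000 increments each, `run(2, 1000)` returns 2000, because no update is lost when every thread writes only to its own copy.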

Performance Tips

  1. Prefer Immutable Types: Document, TextSegment, Image are immutable - safe and fast
  2. Use copy() for Metadata: Create independent copies for thread safety
  3. Avoid vectorAsList(): Use vector() for embeddings (no boxing overhead)
  4. Normalize Once: Call embedding.normalize() only once
  5. Batch Operations: Process multiple segments/embeddings together
  6. Reasonable Segment Sizes: 512-1024 tokens is a common starting range for RAG; tune it for your model and corpus
  7. Reuse Tool Specifications: Create once, use many times
  8. Keep Schemas Simple: Simpler schemas = faster parsing and lower token usage
  9. Use URL Images: Prefer URL over base64 for large images
  10. Cache Parsed Arguments: Don't re-parse tool arguments if used multiple times
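
Tips 3 and 4 can be illustrated with primitive arrays. This is a sketch only: `VectorMath` is a hypothetical class, and the `float[]` inputs are assumed to come from something like `embedding.vector()`. Unlike an in-place normalize, `normalized` here returns a copy.

```java
/** Hypothetical helper (not a LangChain4j class). */
final class VectorMath {

    private VectorMath() {}

    /** Returns a unit-length copy of v; a zero vector is returned as an unchanged copy. */
    static float[] normalized(float[] v) {
        double sumOfSquares = 0.0;
        for (float x : v) {
            sumOfSquares += (double) x * x;
        }
        float[] out = v.clone();
        if (sumOfSquares == 0.0) {
            return out;
        }
        double norm = Math.sqrt(sumOfSquares);
        for (int i = 0; i < out.length; i++) {
            out[i] = (float) (out[i] / norm);
        }
        return out;
    }

    /** Dot product over primitives; equals cosine similarity for unit-length inputs. */
    static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }
}
```

Normalizing each vector once and storing the result lets every later comparison use the plain dot product, and working on `float[]` avoids allocating a boxed `Float` per dimension.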

Common Integration Patterns

RAG Pipeline

// 1. Load and split document (path points to the source file)
Document document = FileSystemDocumentLoader.loadDocument(path);
List<TextSegment> segments = splitter.split(document);

// 2. Embed and store
for (TextSegment segment : segments) {
    Embedding embedding = embeddingModel.embed(segment).content();
    embeddingStore.add(embedding, segment);
}

// 3. Retrieve and generate (query is the user's question)
Embedding queryEmbedding = embeddingModel.embed(query).content();
List<EmbeddingMatch<TextSegment>> relevant = embeddingStore.search(
        EmbeddingSearchRequest.builder()
            .queryEmbedding(queryEmbedding)
            .maxResults(5)
            .build())
    .matches();
String response = chatModel.chat(buildPrompt(relevant));

Tool Execution

// 1. Define tools with @Tool annotation
@Tool("Get current weather")
String getWeather(@P("City name") String city) {
    return weatherService.getWeather(city);
}

// 2. Framework generates ToolSpecification
// 3. Model generates ToolExecutionRequest
// 4. Framework executes and returns ToolExecutionResult
// All handled automatically by AiServices

Structured Output

// 1. Define schema
JsonObjectSchema schema = JsonObjectSchema.builder()
    .addStringProperty("name", "Person name")
    .addIntegerProperty("age", "Person age")
    .required("name", "age")
    .build();

// 2. Request structured output: the POJO return type tells AiServices to parse the reply
@UserMessage("Extract person info: {{text}}")
Person extractPerson(@V("text") String text);

// 3. Automatic parsing to POJO
Person person = assistant.extractPerson("John is 30 years old");
