tessl/maven-dev-langchain4j--langchain4j

Build LLM-powered applications in Java with support for chatbots, agents, RAG, tools, and much more

Data Types

Name: tessl/maven-dev-langchain4j--langchain4j
Author: tessl

Core data types for working with documents, embeddings, tools, and structured data. These types provide the foundation for document processing, semantic search, tool execution, and structured output parsing.

Overview

LangChain4j data types are designed to be:

Immutable (mostly): Prefer immutable designs for thread safety
Type-Safe: Typed accessors and builders catch many mistakes at compile time
Interoperable: Work across all LangChain4j components
Efficient: Optimized for performance-critical paths

Capabilities

Document

Interface representing an unstructured text document with metadata.

package dev.langchain4j.data.document;

/**
 * Represents a document (unstructured text) with associated metadata
 */
public interface Document {
    /**
     * Common metadata key for file name
     */
    String FILE_NAME = "file_name";

    /**
     * Common metadata key for absolute directory path
     */
    String ABSOLUTE_DIRECTORY_PATH = "absolute_directory_path";

    /**
     * Common metadata key for URL
     */
    String URL = "url";

    /**
     * Get document text
     * @return Document text
     */
    String text();

    /**
     * Get document metadata
     * @return Metadata
     */
    Metadata metadata();

    /**
     * Convert to text segment
     * @return TextSegment with same content
     */
    TextSegment toTextSegment();

    /**
     * Create document from text
     * @param text Document text
     * @return Document instance
     */
    static Document from(String text);

    /**
     * Create document from text and metadata
     * @param text Document text
     * @param metadata Document metadata
     * @return Document instance
     */
    static Document from(String text, Metadata metadata);

    /**
     * Create document from text (alias)
     * @param text Document text
     * @return Document instance
     */
    static Document document(String text);

    /**
     * Create document from text and metadata (alias)
     * @param text Document text
     * @param metadata Document metadata
     * @return Document instance
     */
    static Document document(String text, Metadata metadata);
}

Thread Safety:

Document instances are immutable (text and metadata are set at construction)
The default implementation is thread-safe for read operations
Metadata object may be mutable (see Metadata section)
Safe to share across threads once constructed

Common Pitfalls:

DO NOT mutate the returned Metadata object if sharing Document across threads
DO NOT assume text() is non-null in custom implementations - the default implementation rejects null or blank text at construction
DO NOT use Document for large files (>100MB) - consider streaming
DO NOT store binary data in text field - use appropriate Document types
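
The advice to stream large files can be sketched with plain JDK types. ChunkedTextReader below is a hypothetical helper, not part of LangChain4j; it assumes the caller turns each bounded chunk into its own Document (or processes it directly), and it does not try to respect word or sentence boundaries.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not a LangChain4j class): streams text in bounded chunks
// so a very large file never has to be held in memory as a single String.
final class ChunkedTextReader {

    // Reads up to chunkSize characters at a time and hands each chunk to the consumer.
    static void forEachChunk(Reader reader, int chunkSize, Consumer<String> consumer) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        char[] buffer = new char[chunkSize];
        int filled = 0;
        try {
            int read;
            while ((read = reader.read(buffer, filled, chunkSize - filled)) != -1) {
                filled += read;
                if (filled == chunkSize) {
                    consumer.accept(new String(buffer, 0, filled));
                    filled = 0; // start the next chunk
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (filled > 0) {
            consumer.accept(new String(buffer, 0, filled)); // trailing partial chunk
        }
    }

    public static void main(String[] args) {
        List<String> chunks = new ArrayList<>();
        forEachChunk(new StringReader("abcdefghij"), 4, chunks::add);
        System.out.println(chunks); // each element could become its own Document
    }
}
```

In practice the Reader would wrap a file, and the chunk size would be chosen to fit comfortably in memory.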

Edge Cases:

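Null, empty, or blank text is rejected by the default implementation (IllegalArgumentException)
Null metadata is rejected when passed explicitly; from(String) supplies an empty Metadata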
Very large documents may cause memory issues
toTextSegment() creates new object (not cached)

Exception Handling:

Construction methods throw IllegalArgumentException if text is null or blank, or if metadata is null
No checked exceptions thrown
text() may return null in custom implementations

Performance Notes:

Document objects are lightweight (two references: text and metadata)
text() is O(1) - returns reference
metadata() is O(1) - returns reference
toTextSegment() creates new object - O(1) but allocates memory
Standard metadata keys are shared String constants, which avoids typos in key names

Usage Example:

// Simple document
Document doc1 = Document.from("Hello, world!");

// Document with metadata
Metadata metadata = new Metadata()
    .put(Document.FILE_NAME, "greeting.txt")
    .put(Document.ABSOLUTE_DIRECTORY_PATH, "/docs");
Document doc2 = Document.from("Hello, world!", metadata);

// Convert to segment for embedding
TextSegment segment = doc2.toTextSegment();

Related APIs:

TextSegment - Document chunk for embeddings
Metadata - Metadata container
DocumentParser - Parse files into Documents
DocumentSplitter - Split documents into segments

Metadata

Key-value storage for document and segment metadata. Supports typed access for String, UUID, Integer, Long, Float, and Double values.

package dev.langchain4j.data.document;

import java.util.Map;
import java.util.UUID;

/**
 * Metadata container with typed accessors
 * Supports String, UUID, Integer, Long, Float, Double values
 */
public class Metadata {
    /**
     * Create empty metadata
     */
    public Metadata();

    /**
     * Create from map
     * @param metadata Initial metadata
     */
    public Metadata(Map<String, ?> metadata);

    /**
     * Get string value
     * @param key Metadata key
     * @return String value or null
     */
    public String getString(String key);

    /**
     * Get UUID value
     * @param key Metadata key
     * @return UUID value or null
     */
    public UUID getUUID(String key);

    /**
     * Get integer value
     * @param key Metadata key
     * @return Integer value or null
     */
    public Integer getInteger(String key);

    /**
     * Get long value
     * @param key Metadata key
     * @return Long value or null
     */
    public Long getLong(String key);

    /**
     * Get float value
     * @param key Metadata key
     * @return Float value or null
     */
    public Float getFloat(String key);

    /**
     * Get double value
     * @param key Metadata key
     * @return Double value or null
     */
    public Double getDouble(String key);

    /**
     * Check if key exists
     * @param key Metadata key
     * @return true if key exists
     */
    public boolean containsKey(String key);

    /**
     * Put string value
     * @param key Metadata key
     * @param value String value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, String value);

    /**
     * Put UUID value
     * @param key Metadata key
     * @param value UUID value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, UUID value);

    /**
     * Put integer value
     * @param key Metadata key
     * @param value Integer value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, int value);

    /**
     * Put long value
     * @param key Metadata key
     * @param value Long value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, long value);

    /**
     * Put float value
     * @param key Metadata key
     * @param value Float value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, float value);

    /**
     * Put double value
     * @param key Metadata key
     * @param value Double value
     * @return This metadata (fluent)
     */
    public Metadata put(String key, double value);

    /**
     * Put all from map
     * @param metadata Map of metadata
     * @return This metadata (fluent)
     */
    public Metadata putAll(Map<String, Object> metadata);

    /**
     * Remove key
     * @param key Metadata key
     * @return This metadata (fluent)
     */
    public Metadata remove(String key);

    /**
     * Create a copy
     * @return New metadata with same values
     */
    public Metadata copy();

    /**
     * Convert to map
     * @return Map representation
     */
    public Map<String, Object> toMap();

    /**
     * Merge with another metadata
     * @param another Metadata to merge
     * @return This metadata with merged values (fluent)
     */
    public Metadata merge(Metadata another);

    /**
     * Create from single key-value pair
     * @param key Metadata key
     * @param value String value
     * @return New metadata instance
     */
    public static Metadata from(String key, String value);

    /**
     * Create from map
     * @param metadata Map of metadata
     * @return New metadata instance
     */
    public static Metadata from(Map<String, ?> metadata);

    /**
     * Create from single key-value pair (alias)
     * @param key Metadata key
     * @param value String value
     * @return New metadata instance
     */
    public static Metadata metadata(String key, String value);
}

Thread Safety:

Metadata is MUTABLE and NOT thread-safe
Concurrent modifications without synchronization lead to undefined behavior
Use copy() to create thread-safe snapshots
Recommended: Create once, don't mutate after sharing
For concurrent access, use external synchronization or use immutable copies
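
The "create once, don't mutate after sharing" advice can be sketched as a copy-on-write holder. MetadataSnapshotHolder is a hypothetical stand-in built on plain JDK maps, not a LangChain4j class; with Metadata itself, the analogous step would be to call copy() before handing an instance to another thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for shared metadata (not a LangChain4j class).
final class MetadataSnapshotHolder {

    // Readers always see a complete, immutable snapshot.
    private final AtomicReference<Map<String, Object>> current =
            new AtomicReference<>(Map.of());

    // Copy-on-write update: build a new map, then publish it atomically.
    void put(String key, Object value) {
        current.updateAndGet(old -> {
            Map<String, Object> next = new HashMap<>(old);
            next.put(key, value);
            return Map.copyOf(next); // immutable copy; rejects null keys and values
        });
    }

    // Returns the latest snapshot; it never changes after being returned.
    Map<String, Object> snapshot() {
        return current.get();
    }

    public static void main(String[] args) {
        MetadataSnapshotHolder holder = new MetadataSnapshotHolder();
        holder.put("file_name", "document.pdf");
        Map<String, Object> before = holder.snapshot();
        holder.put("page_number", 5);
        System.out.println(before.size() + " -> " + holder.snapshot().size());
    }
}
```

Writers pay an O(n) copy per update, which is usually acceptable because metadata maps are small and written rarely.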

Common Pitfalls:

DO NOT share and mutate Metadata across threads without synchronization
DO NOT call a getter that does not match the stored type - incompatible values raise a runtime exception instead of returning null
DO NOT use arbitrary Object types - only supported types work
DO NOT modify Metadata after adding to Document/TextSegment if shared
DO NOT rely on insertion order - use LinkedHashMap if order matters

Edge Cases:

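Null or blank keys throw IllegalArgumentException
Null values throw IllegalArgumentException (they are not stored)
Typed getters convert where possible (for example, getInteger() parses a String or narrows a Number); values that cannot be converted raise a runtime exception
Missing keys return null
copy() creates an independent Metadata; stored values are immutable types, so the copy can be modified safely
merge() returns a new Metadata and throws IllegalArgumentException if both sides contain the same key
toMap() returns defensive copy (modifications don't affect Metadata)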

Exception Handling:

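put() throws IllegalArgumentException if the key is null or blank, or if the value is null
Typed getters throw a runtime exception when the stored value cannot be converted (for example, NumberFormatException from getInteger() on non-numeric text)
No exceptions for missing keys (returns null)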
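
When callers prefer an empty result over an exception for unexpected values, a small wrapper over a plain map (for example the map returned by toMap()) is one option. SafeMetadataReader is a hypothetical helper, not part of LangChain4j, and the conversion rules it applies are an assumption chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not a LangChain4j class): reads an integer from a plain
// map without throwing when the value is missing or not numeric.
final class SafeMetadataReader {

    static Optional<Integer> intValue(Map<String, Object> map, String key) {
        Object value = map.get(key);
        if (value instanceof Number) {
            return Optional.of(((Number) value).intValue());
        }
        if (value instanceof String) {
            try {
                return Optional.of(Integer.parseInt(((String) value).trim()));
            } catch (NumberFormatException e) {
                return Optional.empty(); // present, but not an integer
            }
        }
        return Optional.empty(); // missing key or unsupported type
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        map.put("page_number", 5);
        map.put("file_name", "document.pdf");
        System.out.println(intValue(map, "page_number"));
        System.out.println(intValue(map, "file_name"));
    }
}
```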

Performance Notes:

Backed by HashMap - O(1) get/put operations
copy() is O(n) where n is number of entries
merge() is O(n + m): it copies both maps into a new Metadata
toMap() creates defensive copy - O(n) allocation
Typed getters return wrapper types (Integer, Long, Float, Double), so values are boxed
containsKey() checks for presence without performing any type conversion

Usage Example:

// Fluent builder pattern
Metadata metadata = new Metadata()
    .put("file_name", "document.pdf")
    .put("page_number", 5)
    .put("confidence", 0.95)
    .put("document_id", UUID.randomUUID());

// Type-safe access
String fileName = metadata.getString("file_name"); // "document.pdf"
Integer pageNumber = metadata.getInteger("page_number"); // 5
Double confidence = metadata.getDouble("confidence"); // 0.95
UUID docId = metadata.getUUID("document_id");

// Incompatible type raises an exception instead of returning null
// metadata.getUUID("file_name"); // throws: "document.pdf" is not a valid UUID

// Thread-safe snapshot
Metadata snapshot = metadata.copy();
// Safe to mutate snapshot independently

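// Merge metadata: merge() returns a new Metadata and leaves both inputs unchanged
Metadata additional = new Metadata().put("author", "John Doe");
Metadata merged = metadata.merge(additional); // throws IllegalArgumentException on duplicate keys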

Testing Patterns:

@Test
void testMetadata() {
    Metadata metadata = new Metadata()
        .put("key", "value")
        .put("count", 42);

    assertEquals("value", metadata.getString("key"));
    assertEquals(Integer.valueOf(42), metadata.getInteger("count"));
    assertNull(metadata.getString("missing")); // Missing key returns null
    assertThrows(NumberFormatException.class, () -> metadata.getInteger("key")); // "value" is not numeric

    // Test copy independence
    Metadata copy = metadata.copy();
    copy.put("key", "modified");
    assertEquals("value", metadata.getString("key")); // Original unchanged
}

Related APIs:

Document - Uses Metadata for document attributes
TextSegment - Uses Metadata for segment attributes
ContentRetriever - Filters based on metadata
EmbeddingStore - Stores embeddings with metadata

TextSegment

Represents a chunk or segment of text with associated metadata. Used for embeddings and RAG.

package dev.langchain4j.data.segment;

import dev.langchain4j.data.document.Metadata;

/**
 * Represents a text segment with metadata
 * Typically a chunk of a larger document
 */
public class TextSegment {
    /**
     * Create text segment
     * @param text Segment text
     * @param metadata Segment metadata
     */
    public TextSegment(String text, Metadata metadata);

    /**
     * Get segment text
     * @return Text content
     */
    public String text();

    /**
     * Get segment metadata
     * @return Metadata
     */
    public Metadata metadata();

    /**
     * Create from text
     * @param text Segment text
     * @return TextSegment instance
     */
    public static TextSegment from(String text);

    /**
     * Create from text and metadata
     * @param text Segment text
     * @param metadata Segment metadata
     * @return TextSegment instance
     */
    public static TextSegment from(String text, Metadata metadata);

    /**
     * Create from text (alias)
     * @param text Segment text
     * @return TextSegment instance
     */
    public static TextSegment textSegment(String text);

    /**
     * Create from text and metadata (alias)
     * @param text Segment text
     * @param metadata Segment metadata
     * @return TextSegment instance
     */
    public static TextSegment textSegment(String text, Metadata metadata);
}

Thread Safety:

TextSegment is immutable - text and metadata reference set at construction
However, Metadata itself is mutable - don't mutate after construction if sharing
Safe to share across threads if Metadata is not mutated
Recommended: Don't modify metadata() after creating TextSegment

Common Pitfalls:

DO NOT mutate metadata() after construction if sharing across threads
DO NOT create very large segments (>8KB text) - hurts embedding quality
DO NOT create very small segments (<50 chars) - loses context
DO NOT forget to include relevant metadata for filtering
DO NOT reuse metadata objects across segments without copy()

Edge Cases:

Null, empty, or blank text is rejected (IllegalArgumentException)
Null metadata is rejected when passed explicitly; from(String) supplies an empty Metadata
No automatic text normalization (preserve whitespace/formatting)
No text length limits enforced (but embedding models have limits)

Exception Handling:

Constructor throws IllegalArgumentException if text is null or blank, or if metadata is null
No checked exceptions
text() and metadata() don't throw exceptions

Performance Notes:

TextSegment is lightweight wrapper (two references)
text() and metadata() are O(1)
Typically created in bulk by DocumentSplitter
Keep segment size reasonable (512-1024 tokens recommended)
Metadata adds minimal overhead (few KB per segment)

Recommended Segment Sizes:

Small segments (128-256 tokens): High precision, less context
Medium segments (512-1024 tokens): Balanced - recommended for most use cases
Large segments (2048+ tokens): More context, lower precision
Consider overlap between segments (e.g., 20% overlap)
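
The idea of overlapping segments can be sketched as a sliding window. OverlapChunker is a hypothetical helper, not the splitter LangChain4j ships; it measures sizes in characters rather than tokens and ignores sentence boundaries, which a real DocumentSplitter would normally respect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sliding-window chunker (not a LangChain4j class).
// Sizes are in characters; token-based sizing would need a tokenizer.
final class OverlapChunker {

    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Window of 5 characters with 1 character (20%) of overlap
        System.out.println(chunk("abcdefghij", 5, 1));
    }
}
```

Each chunk repeats the final characters of the previous one, so a sentence cut at a boundary still appears intact in at least one chunk when the overlap is large enough.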

Usage Example:

// Simple segment
TextSegment segment1 = TextSegment.from("This is a text chunk.");

// Segment with metadata
Metadata metadata = new Metadata()
    .put("document_id", UUID.randomUUID())
    .put("chunk_index", 0)
    .put("source", "manual.pdf");
TextSegment segment2 = TextSegment.from("Text content", metadata);

// Common pattern: Document → Segments
Document document = Document.from(longText);
DocumentSplitter splitter = DocumentSplitters.recursive(1000, 200); // max segment size and overlap, in characters
List<TextSegment> segments = splitter.split(document);

// Each segment can be embedded
for (TextSegment segment : segments) {
    Embedding embedding = embeddingModel.embed(segment).content();
    embeddingStore.add(embedding, segment);
}

Testing Patterns:

@Test
void testTextSegment() {
    Metadata metadata = new Metadata().put("key", "value");
    TextSegment segment = TextSegment.from("content", metadata);

    assertEquals("content", segment.text());
    assertEquals("value", segment.metadata().getString("key"));

    // Verify immutability concern
    segment.metadata().put("key", "modified");
    assertEquals("modified", segment.metadata().getString("key"));
    // Note: metadata was mutated! Use copy() if sharing.
}

Related APIs:

Document - Source of segments
DocumentSplitter - Creates segments from documents
Embedding - Vector representation of segment
EmbeddingStore - Stores segments with embeddings
ContentRetriever - Retrieves relevant segments

Embedding

Represents a dense vector representation of text for semantic search and similarity operations.

package dev.langchain4j.data.embedding;

import java.util.List;

/**
 * Represents a text embedding (vector representation)
 */
public class Embedding {
    /**
     * Create embedding from vector
     * @param vector Float array vector
     */
    public Embedding(float[] vector);

    /**
     * Get vector as array
     * @return Float array vector
     */
    public float[] vector();

    /**
     * Get vector as list
     * @return List of Float values
     */
    public List<Float> vectorAsList();

    /**
     * Normalize the embedding vector in-place
     */
    public void normalize();

    /**
     * Get embedding dimension
     * @return Vector dimension
     */
    public int dimension();

    /**
     * Create from array
     * @param vector Float array vector
     * @return Embedding instance
     */
    public static Embedding from(float[] vector);

    /**
     * Create from list
     * @param vector List of Float values
     * @return Embedding instance
     */
    public static Embedding from(List<Float> vector);
}

Thread Safety:

Embedding is MUTABLE - normalize() modifies internal vector
NOT thread-safe if normalize() is called concurrently
vector() returns reference to internal array (not defensive copy)
Concurrent reads are safe if no writes occur
Recommended: Don't modify after creation, or synchronize access

Common Pitfalls:

DO NOT modify returned vector array - affects internal state
DO NOT assume embeddings are normalized - call normalize() if needed
DO NOT mix embeddings from different models (different dimensions)
DO NOT store embeddings as doubles - use float for efficiency
DO NOT call normalize() repeatedly - each call recomputes the norm (O(n)) and leaves an already-normalized vector essentially unchanged

Edge Cases:

Empty vector (dimension 0) is technically valid but semantically meaningless
Null vector in constructor throws IllegalArgumentException
normalize() on zero vector results in NaN values
Very large dimensions (>10K) are rare but supported
Negative values are valid (embeddings can have negative components)

Exception Handling:

Constructor throws IllegalArgumentException if vector is null
normalize() may produce NaN if vector is all zeros
No exceptions for dimension mismatches (caller's responsibility)
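
Because dimension mismatches are the caller's responsibility, a similarity helper can check them explicitly. The sketch below works on raw float arrays (such as those returned by vector()); VectorSimilarity is a hypothetical class, and returning 0.0 for a zero-magnitude vector is a convention chosen here, not library behavior.

```java
// Hypothetical helper (not a LangChain4j class): cosine similarity for vectors
// that are not necessarily normalized, with an explicit dimension check.
final class VectorSimilarity {

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // chosen convention: avoids the NaN from dividing by zero
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosine(new float[]{3f, 4f}, new float[]{6f, 8f}));
        System.out.println(cosine(new float[]{1f, 0f}, new float[]{0f, 1f}));
    }
}
```

Accumulating in double reduces rounding error for high-dimensional vectors at negligible cost.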

Performance Notes:

Embedding stores float array directly - no boxing overhead
vector() returns reference - O(1), no allocation
vectorAsList() boxes floats to Float - O(n) allocation, avoid if possible
normalize() is O(n) - modifies in place
dimension() is O(1) - just array length
Typical embedding sizes: 384 (e.g., all-MiniLM-L6-v2), 768 (BERT-base), 1536 (OpenAI text-embedding-3-small), 3072 (OpenAI text-embedding-3-large)

Memory Usage:

float[n] uses 4n bytes plus object overhead (~16 bytes)
Example: 1536-dimensional embedding ≈ 6KB
1 million embeddings ≈ 6GB RAM (without compression)
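
The arithmetic above can be reproduced with a few lines of code. EmbeddingMemoryEstimate is a hypothetical helper; the 16-byte array header is the approximation used in this section, and real JVM overhead varies with the runtime and its settings.

```java
// Hypothetical estimator (not a LangChain4j class) for raw float[] storage.
final class EmbeddingMemoryEstimate {

    // Approximate per-array object overhead, as assumed in the text above.
    static final long ARRAY_HEADER_BYTES = 16;

    // 4 bytes per float component plus the array header.
    static long bytesPerEmbedding(int dimension) {
        return 4L * dimension + ARRAY_HEADER_BYTES;
    }

    static long totalBytes(int dimension, long count) {
        return bytesPerEmbedding(dimension) * count;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerEmbedding(1536) + " bytes per embedding");
        System.out.println(totalBytes(1536, 1_000_000L) / 1e9 + " GB for one million");
    }
}
```

The estimate covers only the vectors; the Embedding wrappers, attached segments, and any index structures add to it.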

Normalization:

L2 normalization: divides each component by vector magnitude
Lets cosine similarity be computed as a plain dot product, because both magnitudes equal 1
Many embedding models return pre-normalized vectors
Check model documentation before normalizing
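
L2 normalization itself is short enough to sketch directly. L2Normalizer is a hypothetical helper operating on raw float arrays; unlike the in-place normalize() described above, it returns a new array, and it leaves a zero vector unchanged to avoid NaN values (a design choice made for this example).

```java
// Hypothetical helper (not a LangChain4j class): returns a unit-length copy.
final class L2Normalizer {

    static float[] normalized(float[] vector) {
        double sumSquares = 0.0;
        for (float v : vector) {
            sumSquares += (double) v * v;
        }
        double norm = Math.sqrt(sumSquares); // vector magnitude
        float[] result = vector.clone();     // the input is left untouched
        if (norm == 0.0) {
            return result; // zero vector: dividing would produce NaN
        }
        for (int i = 0; i < result.length; i++) {
            result[i] = (float) (vector[i] / norm);
        }
        return result;
    }

    public static void main(String[] args) {
        float[] unit = normalized(new float[]{3.0f, 4.0f});
        System.out.println(unit[0] + ", " + unit[1]);
    }
}
```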

Usage Example:

// Create from array
float[] vector = new float[]{0.1f, 0.2f, 0.3f};
Embedding embedding = Embedding.from(vector);

// Get dimension
int dim = embedding.dimension(); // 3

// Normalize (modifies in-place)
embedding.normalize();

// Calculate cosine similarity (dot-product shortcut, so normalize both first)
Embedding other = Embedding.from(new float[]{0.3f, 0.2f, 0.1f});
other.normalize();
float similarity = cosineSimilarity(embedding, other);

// Common pattern: Embed text
Response<Embedding> response = embeddingModel.embed("Hello world");
Embedding textEmbedding = response.content();

// Store with metadata
embeddingStore.add(textEmbedding, textSegment);

Similarity Calculations:

// Cosine similarity (valid only when both embeddings are already L2-normalized)
public float cosineSimilarity(Embedding e1, Embedding e2) {
    float[] v1 = e1.vector();
    float[] v2 = e2.vector();
    if (v1.length != v2.length) {
        throw new IllegalArgumentException("Dimension mismatch: " + v1.length + " vs " + v2.length);
    }

    float dotProduct = 0.0f;
    for (int i = 0; i < v1.length; i++) {
        dotProduct += v1[i] * v2[i];
    }
    return dotProduct; // Already normalized, so dot product = cosine similarity
}

// Euclidean distance
public float euclideanDistance(Embedding e1, Embedding e2) {
    float[] v1 = e1.vector();
    float[] v2 = e2.vector();
    if (v1.length != v2.length) {
        throw new IllegalArgumentException("Dimension mismatch: " + v1.length + " vs " + v2.length);
    }

    float sumSquares = 0.0f;
    for (int i = 0; i < v1.length; i++) {
        float diff = v1[i] - v2[i];
        sumSquares += diff * diff;
    }
    return (float) Math.sqrt(sumSquares);
}

Testing Patterns:

@Test
void testEmbedding() {
    float[] vector = {3.0f, 4.0f}; // Magnitude = 5.0
    Embedding embedding = Embedding.from(vector);

    assertEquals(2, embedding.dimension());

    // Test normalization
    embedding.normalize();
    float[] normalized = embedding.vector();
    assertEquals(0.6f, normalized[0], 0.001f); // 3/5
    assertEquals(0.8f, normalized[1], 0.001f); // 4/5

    // Verify magnitude is 1.0
    float magnitude = (float) Math.sqrt(
        normalized[0] * normalized[0] + normalized[1] * normalized[1]
    );
    assertEquals(1.0f, magnitude, 0.001f);
}

Related APIs:

EmbeddingModel - Generates embeddings
EmbeddingStore - Stores and retrieves embeddings
TextSegment - Text to be embedded
ContentRetriever - Uses embeddings for semantic search

ToolExecutionRequest

Represents a request from the AI to execute a tool/function.

package dev.langchain4j.agent.tool;

/**
 * Represents a tool execution request from the AI
 */
public class ToolExecutionRequest {
    /**
     * Get tool execution ID
     * @return Execution ID
     */
    public String id();

    /**
     * Get tool name
     * @return Tool name
     */
    public String name();

    /**
     * Get tool arguments as JSON string
     * @return JSON arguments
     */
    public String arguments();

    /**
     * Create builder for modification
     * @return Builder with current values
     */
    public Builder toBuilder();

    /**
     * Create new builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety:

ToolExecutionRequest is immutable once built
Thread-safe for read operations
Builder is NOT thread-safe (build once per thread)
Safe to share built instances across threads

Common Pitfalls:

DO NOT assume arguments are valid JSON - validate before parsing
DO NOT ignore the id - required for tracking execution results
DO NOT modify arguments string directly - use builder to create new request
DO NOT assume tool name exists - validate against available tools
DO NOT parse arguments with ad-hoc string manipulation - use a JSON library

Edge Cases:

An empty JSON object ("{}") is valid for parameterless tools
Null id may occur in some model implementations (validate!)
Tool name is case-sensitive
Arguments may contain complex nested JSON
Malformed JSON in arguments requires error handling

Exception Handling:

Do not rely on build() to reject missing fields - validate id, name, and arguments in calling code
JSON parsing of arguments can throw a parser-specific exception (for example, Gson's JsonParseException)
Tool execution may throw any exception (tool-dependent)

Performance Notes:

ToolExecutionRequest is lightweight (three string references)
arguments() returns string reference - O(1)
Parsing arguments has JSON parsing overhead
Keep tool argument schemas simple for faster parsing

Execution Flow:

Model generates ToolExecutionRequest in response
Framework parses request
Framework validates tool exists
Framework parses arguments JSON
Framework invokes tool with parsed arguments
Framework sends result back to model
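
The flow above can be sketched as a small dispatcher. ToolDispatcher is a hypothetical class written with plain JDK types; it passes the raw JSON arguments string to the handler instead of parsing it, and it leaves pairing the result with the request id to the caller. AiServices performs the real version of this work automatically.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher (not a LangChain4j class) illustrating the flow:
// look up the tool, run it, and turn failures into text for the model.
final class ToolDispatcher {

    // Tool name -> handler taking the raw JSON arguments and returning a result.
    private final Map<String, Function<String, String>> tools = new HashMap<>();

    void register(String name, Function<String, String> handler) {
        tools.put(name, handler);
    }

    String dispatch(String name, String argumentsJson) {
        Function<String, String> handler = tools.get(name); // lookup is case-sensitive
        if (handler == null) {
            return "Error: unknown tool '" + name + "'";
        }
        // Treat missing arguments as an empty JSON object.
        String arguments = (argumentsJson == null || argumentsJson.trim().isEmpty())
                ? "{}"
                : argumentsJson;
        try {
            return handler.apply(arguments);
        } catch (RuntimeException e) {
            return "Error: " + e.getMessage(); // reported back instead of propagating
        }
    }

    public static void main(String[] args) {
        ToolDispatcher dispatcher = new ToolDispatcher();
        dispatcher.register("getCurrentWeather", json -> "Sunny for request " + json);
        System.out.println(dispatcher.dispatch("getCurrentWeather", "{\"location\": \"San Francisco\"}"));
        System.out.println(dispatcher.dispatch("unknownTool", "{}"));
    }
}
```

Returning an error string rather than throwing lets the model see what went wrong and retry with different arguments.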

Usage Example:

// Typically created by framework, not manually
ToolExecutionRequest request = ToolExecutionRequest.builder()
    .id("call_123")
    .name("getCurrentWeather")
    .arguments("{\"location\": \"San Francisco\", \"unit\": \"celsius\"}")
    .build();

// Access fields
String id = request.id(); // "call_123"
String name = request.name(); // "getCurrentWeather"
String args = request.arguments(); // JSON string

// Parse arguments (framework does this automatically)
// WeatherArgs is a hypothetical class with location and unit fields; Gson is one possible JSON library
WeatherArgs parsedArgs = new Gson().fromJson(args, WeatherArgs.class);

// Execute tool
String result = weatherTool.getCurrentWeather(
    parsedArgs.location,
    parsedArgs.unit
);

// Create the result message to send back to the model;
// from(request, result) carries over the request's id and tool name
ToolExecutionResultMessage resultMessage = ToolExecutionResultMessage.from(request, result);

Testing Patterns:

@Test
void testToolExecutionRequest() {
    ToolExecutionRequest request = ToolExecutionRequest.builder()
        .id("test_id")
        .name("testTool")
        .arguments("{\"param\": \"value\"}")
        .build();

    assertEquals("test_id", request.id());
    assertEquals("testTool", request.name());
    assertTrue(request.arguments().contains("param"));

    // Test JSON parsing
    JsonObject json = JsonParser.parseString(request.arguments())
        .getAsJsonObject();
    assertEquals("value", json.get("param").getAsString());
}

Related APIs:

ToolSpecification - Describes tool to LLM
ToolExecutionResultMessage - Result sent back to LLM
@Tool - Annotation for defining tools
AiServices - Handles tool execution automatically

ToolExecutionRequest Builder

/**
 * Builder for ToolExecutionRequest
 */
public static final class Builder {
    /**
     * Set execution ID
     * @param id Execution ID
     * @return Builder instance
     */
    public Builder id(String id);

    /**
     * Set tool name
     * @param name Tool name
     * @return Builder instance
     */
    public Builder name(String name);

    /**
     * Set tool arguments
     * @param arguments JSON arguments string
     * @return Builder instance
     */
    public Builder arguments(String arguments);

    /**
     * Build the request
     * @return ToolExecutionRequest instance
     */
    public ToolExecutionRequest build();
}

Thread Safety:

Builder is NOT thread-safe
Don't share builder across threads
Built ToolExecutionRequest is immutable and thread-safe

Common Pitfalls:

DO NOT forget to set all required fields (id, name, arguments)
DO NOT reuse builder after build() - create new builder
DO NOT pass null to setter methods

Exception Handling:

Do not rely on build() to reject null fields - validate id, name, and arguments before building

Related APIs:

ToolExecutionRequest - Built request object

ToolSpecification

Describes a tool for the LLM, including name, description, and parameter schema.

package dev.langchain4j.agent.tool;

import dev.langchain4j.model.chat.request.json.JsonObjectSchema;
import java.util.Map;

/**
 * Describes a tool/function for the LLM
 * Includes name, description, parameters schema, and provider-specific metadata
 */
public class ToolSpecification {
    /**
     * Get tool name
     * @return Tool name
     */
    public String name();

    /**
     * Get tool description
     * @return Tool description
     */
    public String description();

    /**
     * Get parameters schema
     * @return JSON object schema for parameters
     */
    public JsonObjectSchema parameters();

    /**
     * Get provider-specific metadata
     * @return Metadata map
     */
    public Map<String, Object> metadata();

    /**
     * Create builder for modification
     * @return Builder with current values
     */
    public Builder toBuilder();

    /**
     * Create new builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety:

ToolSpecification is immutable once built
Thread-safe for read operations
metadata() returns unmodifiable map
Safe to share across threads

Common Pitfalls:

DO NOT use vague descriptions - LLM relies on description for tool selection
DO NOT forget to mark required parameters in schema
DO NOT use overly complex parameter schemas (keep simple)
DO NOT use generic names like "tool1" - be specific
DO NOT skip parameter descriptions - help LLM understand usage

Edge Cases:

Empty parameters schema is valid for parameterless tools
Very long descriptions may be truncated by some models
Metadata is optional (can be null or empty)
Parameter schema can nest objects (but keep reasonable depth)

Exception Handling:

Do not rely on build() to reject missing fields - validate the name and parameters in calling code
No exceptions during read operations

Performance Notes:

ToolSpecification is lightweight
Serialized to JSON when sent to LLM
Complex schemas increase token usage
Keep descriptions concise (under 200 chars recommended)

Best Practices for Descriptions:

Be specific about what the tool does
Include when to use it
Mention any limitations
Use active voice
Example: "Gets current weather for a city. Use when user asks about weather conditions. Requires city name."

Usage Example:

// Manual creation (usually auto-generated from @Tool)
ToolSpecification spec = ToolSpecification.builder()
    .name("getCurrentWeather")
    .description("Get current weather for a specific location")
    .parameters(JsonObjectSchema.builder()
        .addStringProperty("location", "City name (e.g., 'San Francisco')")
        .addEnumProperty("unit", List.of("celsius", "fahrenheit"), "Temperature unit")
        .required("location")
        .build())
    .build();

// Typically auto-generated from @Tool annotation
@Tool("Get current weather for a specific location")
public String getCurrentWeather(
    @P("City name") String location,
    @P("Temperature unit") TemperatureUnit unit
) {
    // Implementation
}
// Framework generates ToolSpecification automatically

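// Use with AiServices: pass objects whose @Tool methods implement the tools;
// the framework derives their ToolSpecifications, so spec above is not passed in
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .tools(toolImplementations) // object(s) containing @Tool-annotated methods
    .build();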

Testing Patterns:

@Test
void testToolSpecification() {
    ToolSpecification spec = ToolSpecification.builder()
        .name("testTool")
        .description("Test description")
        .parameters(JsonObjectSchema.builder()
            .addStringProperty("param1")
            .required("param1")
            .build())
        .build();

    assertEquals("testTool", spec.name());
    assertEquals("Test description", spec.description());
    assertNotNull(spec.parameters());
    assertTrue(spec.parameters().required().contains("param1"));
}

Related APIs:

ToolExecutionRequest - Request to execute tool
@Tool - Annotation for defining tools
JsonObjectSchema - Parameter schema definition
AiServices - Handles tool integration

ToolSpecification Builder

/**
 * Builder for ToolSpecification
 */
public static final class Builder {
    /**
     * Set tool name
     * @param name Tool name
     * @return Builder instance
     */
    public Builder name(String name);

    /**
     * Set tool description
     * @param description Tool description
     * @return Builder instance
     */
    public Builder description(String description);

    /**
     * Set parameters schema
     * @param parameters JSON object schema
     * @return Builder instance
     */
    public Builder parameters(JsonObjectSchema parameters);

    /**
     * Set metadata map
     * @param metadata Provider-specific metadata
     * @return Builder instance
     */
    public Builder metadata(Map<String, Object> metadata);

    /**
     * Add single metadata entry
     * @param key Metadata key
     * @param value Metadata value
     * @return Builder instance
     */
    public Builder addMetadata(String key, Object value);

    /**
     * Build the specification
     * @return ToolSpecification instance
     */
    public ToolSpecification build();
}

Thread Safety:

Builder is NOT thread-safe
Build once per thread
Built ToolSpecification is immutable and thread-safe

Common Pitfalls:

DO NOT omit name or description - the model selects tools based on them
DO NOT skip parameters schema for tools with parameters
DO NOT add provider-specific metadata unless necessary

Exception Handling:

Validate name and description before calling build() - do not rely on build() to reject null values

Related APIs:

ToolSpecification - Built specification object
JsonObjectSchema.Builder - Build parameter schemas

JsonObjectSchema

JSON object schema for defining tool parameters and structured output formats.

package dev.langchain4j.model.chat.request.json;

import java.util.List;
import java.util.Map;

/**
 * JSON object schema for tool parameters and structured outputs
 */
public class JsonObjectSchema implements JsonSchemaElement {
    /**
     * Get schema description
     * @return Description
     */
    public String description();

    /**
     * Get properties
     * @return Map of property name to schema element
     */
    public Map<String, JsonSchemaElement> properties();

    /**
     * Get required property names
     * @return List of required property names
     */
    public List<String> required();

    /**
     * Check if additional properties allowed
     * @return true if additional properties allowed, false if not, null if unspecified
     */
    public Boolean additionalProperties();

    /**
     * Get schema definitions
     * @return Map of definition name to schema element
     */
    public Map<String, JsonSchemaElement> definitions();

    /**
     * Create builder for modification
     * @return Builder with current values
     */
    public Builder toBuilder();

    /**
     * Create new builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety:

JsonObjectSchema is immutable once built
Thread-safe for read operations
properties() and definitions() return unmodifiable maps
Safe to share across threads

Common Pitfalls:

DO NOT forget to mark required properties
DO NOT use overly nested schemas (limit to 3-4 levels)
DO NOT skip property descriptions - helps LLM understand
DO NOT use additionalProperties=true unless necessary (less strict validation)
DO NOT create circular references in definitions

Edge Cases:

Empty properties map is valid (for objects with no properties)
Empty required list means all properties optional
Null additionalProperties means unspecified (provider default)
Definitions allow schema reuse (like JSON Schema $ref)
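
The edge cases above are easier to see in the JSON a schema roughly corresponds to. The sketch below is a hypothetical stand-in written against the JDK only: SchemaJsonSketch and render are illustrative names, not LangChain4j API, and the output is not the library's or any provider's actual serialization. It assumes property names need no JSON escaping.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class SchemaJsonSketch {

    /**
     * Renders an object schema whose properties map names to JSON type names.
     * additionalProperties == null means "unspecified", so the key is omitted.
     * Assumption: names contain no characters that need JSON escaping.
     */
    static String render(Map<String, String> propertyTypes,
                         List<String> required,
                         Boolean additionalProperties) {
        // An empty map renders as {} and an empty list as []
        StringJoiner props = new StringJoiner(",", "{", "}");
        propertyTypes.forEach((name, type) ->
                props.add("\"" + name + "\":{\"type\":\"" + type + "\"}"));

        StringJoiner req = new StringJoiner(",", "[", "]");
        required.forEach(name -> req.add("\"" + name + "\""));

        StringBuilder json = new StringBuilder("{\"type\":\"object\",\"properties\":")
                .append(props)
                .append(",\"required\":")
                .append(req);
        if (additionalProperties != null) {
            json.append(",\"additionalProperties\":").append(additionalProperties);
        }
        return json.append('}').toString();
    }
}
```

For example, render(Map.of("name", "string"), List.of("name"), null) yields {"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}, with no additionalProperties key because the value was left unspecified.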

Exception Handling:

Builder validation occurs at build() time
No exceptions during read operations

Performance Notes:

Schema serialization impacts token count
Keep schemas minimal for efficiency
Reuse definitions for common structures
Deeply nested schemas slow parsing

JSON Schema Compliance:

Follows JSON Schema Draft 7 (subset)
Supports common types: string, number, integer, boolean, array, object
Supports enum for string values
Supports required properties
Limited support for advanced features (patterns, formats, etc.)

Usage Example:

// Simple schema
JsonObjectSchema simple = JsonObjectSchema.builder()
    .addStringProperty("name", "Person's name")
    .addIntegerProperty("age", "Person's age")
    .required("name")
    .build();

// Complex schema with nesting
JsonObjectSchema complex = JsonObjectSchema.builder()
    .description("User profile data")
    .addStringProperty("username", "Unique username")
    .addStringProperty("email", "Email address")
    .addProperty("address", JsonObjectSchema.builder()
        .addStringProperty("street")
        .addStringProperty("city")
        .addStringProperty("zipCode")
        .required("street", "city")
        .build())
    .addEnumProperty("status",
        List.of("active", "inactive", "suspended"),
        "Account status")
    .required("username", "email")
    .additionalProperties(false) // Strict validation
    .build();

// Schema with reusable definitions
JsonObjectSchema withDefs = JsonObjectSchema.builder()
    .definitions(Map.of(
        "Address", JsonObjectSchema.builder()
            .addStringProperty("street")
            .addStringProperty("city")
            .build()
    ))
    // Use definition in properties...
    .build();

Testing Patterns:

@Test
void testJsonObjectSchema() {
    JsonObjectSchema schema = JsonObjectSchema.builder()
        .addStringProperty("prop1", "Description")
        .addIntegerProperty("prop2")
        .required("prop1")
        .build();

    assertEquals(2, schema.properties().size());
    assertTrue(schema.properties().containsKey("prop1"));
    assertEquals(1, schema.required().size());
    assertTrue(schema.required().contains("prop1"));
}

Related APIs:

JsonSchemaElement - Base interface for all schema types
ToolSpecification - Uses JsonObjectSchema for parameters
StructuredOutputParser - Parses output based on schema

JsonObjectSchema Builder

/**
 * Builder for JsonObjectSchema
 */
public static class Builder {
    /**
     * Set schema description
     * @param description Description
     * @return Builder instance
     */
    public Builder description(String description);

    /**
     * Add properties
     * @param properties Map of properties
     * @return Builder instance
     */
    public Builder addProperties(Map<String, JsonSchemaElement> properties);

    /**
     * Add single property
     * @param name Property name
     * @param jsonSchemaElement Property schema
     * @return Builder instance
     */
    public Builder addProperty(String name, JsonSchemaElement jsonSchemaElement);

    /**
     * Add string property
     * @param name Property name
     * @return Builder instance
     */
    public Builder addStringProperty(String name);

    /**
     * Add string property with description
     * @param name Property name
     * @param description Property description
     * @return Builder instance
     */
    public Builder addStringProperty(String name, String description);

    /**
     * Add integer property
     * @param name Property name
     * @return Builder instance
     */
    public Builder addIntegerProperty(String name);

    /**
     * Add integer property with description
     * @param name Property name
     * @param description Property description
     * @return Builder instance
     */
    public Builder addIntegerProperty(String name, String description);

    /**
     * Add number property
     * @param name Property name
     * @return Builder instance
     */
    public Builder addNumberProperty(String name);

    /**
     * Add number property with description
     * @param name Property name
     * @param description Property description
     * @return Builder instance
     */
    public Builder addNumberProperty(String name, String description);

    /**
     * Add boolean property
     * @param name Property name
     * @return Builder instance
     */
    public Builder addBooleanProperty(String name);

    /**
     * Add boolean property with description
     * @param name Property name
     * @param description Property description
     * @return Builder instance
     */
    public Builder addBooleanProperty(String name, String description);

    /**
     * Add enum property
     * @param name Property name
     * @param enumValues Allowed values
     * @return Builder instance
     */
    public Builder addEnumProperty(String name, List<String> enumValues);

    /**
     * Add enum property with description
     * @param name Property name
     * @param enumValues Allowed values
     * @param description Property description
     * @return Builder instance
     */
    public Builder addEnumProperty(String name, List<String> enumValues, String description);

    /**
     * Set required properties
     * @param required List of required property names
     * @return Builder instance
     */
    public Builder required(List<String> required);

    /**
     * Set required properties (varargs)
     * @param required Required property names
     * @return Builder instance
     */
    public Builder required(String... required);

    /**
     * Set additional properties allowed
     * @param additionalProperties true to allow, false to disallow
     * @return Builder instance
     */
    public Builder additionalProperties(Boolean additionalProperties);

    /**
     * Set definitions
     * @param definitions Map of definitions
     * @return Builder instance
     */
    public Builder definitions(Map<String, JsonSchemaElement> definitions);

    /**
     * Build the schema
     * @return JsonObjectSchema instance
     */
    public JsonObjectSchema build();
}

Thread Safety:

Builder is NOT thread-safe; never share one Builder instance across threads
Create a separate Builder wherever a schema is built
Built JsonObjectSchema is immutable and thread-safe

Common Pitfalls:

DO NOT call required() with non-existent property names
DO NOT forget descriptions for better LLM understanding
DO NOT mix addProperty with convenience methods inconsistently
DO NOT add same property twice (last wins)

Exception Handling:

No validation during add operations
Validation occurs at build() time (if any)
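
Because add operations perform no validation, a misspelled name passed to required() may go unnoticed. A minimal pre-flight check, assuming only the JDK; RequiredNameCheck and unknownRequired are hypothetical helper names, not part of LangChain4j:

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RequiredNameCheck {

    /** Returns the required names that have no matching property, in encounter order. */
    static Set<String> unknownRequired(Collection<String> required, Set<String> propertyNames) {
        Set<String> unknown = new LinkedHashSet<>(required);
        unknown.removeAll(propertyNames);
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> properties = Set.of("name", "age");
        // "emial" is a deliberate typo to show what the check catches
        System.out.println(unknownRequired(List.of("name", "emial"), properties)); // [emial]
    }
}
```

On a built schema the inputs would come from the accessors documented above: unknownRequired(schema.required(), schema.properties().keySet()).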

Performance Notes:

Convenience methods (addStringProperty, etc.) are preferred for readability
No significant performance difference between methods
build() is O(n), where n is the number of properties

Related APIs:

JsonObjectSchema - Built schema object
JsonSchemaElement - Base type for property schemas

JsonSchemaElement

Base interface for all JSON schema types.

package dev.langchain4j.model.chat.request.json;

/**
 * Base interface for JSON schema elements
 * Implementations: JsonObjectSchema, JsonArraySchema, JsonStringSchema,
 * JsonIntegerSchema, JsonNumberSchema, JsonBooleanSchema, JsonEnumSchema, etc.
 */
public interface JsonSchemaElement {
    /**
     * Get element description
     * @return Description or null
     */
    String description();
}

Thread Safety:

All implementations are immutable
Thread-safe for read operations

Common Pitfalls:

DO NOT forget this is just a base interface - use concrete types
DO NOT assume description() is never null

Exception Handling:

No exceptions thrown by interface methods

Implementations:

JsonObjectSchema - Object type with properties
JsonArraySchema - Array type with item schema
JsonStringSchema - String type
JsonIntegerSchema - Integer number type
JsonNumberSchema - Floating point number type
JsonBooleanSchema - Boolean type
JsonEnumSchema - String with enumerated values

Usage Example:

// Typically used via JsonObjectSchema builder
JsonSchemaElement stringElement = JsonStringSchema.builder()
    .description("A string value")
    .build();

JsonSchemaElement objectElement = JsonObjectSchema.builder()
    .addProperty("field", stringElement)
    .build();

Related APIs:

JsonObjectSchema - Most common implementation
All specific schema type implementations

Image

Represents an image with URL or base64 data.

package dev.langchain4j.data.image;

import java.net.URI;

/**
 * Represents an image with URL or base64 data
 */
public final class Image {
    /**
     * Get image URL
     * @return Image URL or null
     */
    public URI url();

    /**
     * Get base64-encoded image data
     * @return Base64 data or null
     */
    public String base64Data();

    /**
     * Get MIME type
     * @return MIME type or null
     */
    public String mimeType();

    /**
     * Get revised prompt (for image generation)
     * @return Revised prompt or null
     */
    public String revisedPrompt();

    /**
     * Create builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety:

Image is immutable once built
Thread-safe for read operations
Safe to share across threads

Common Pitfalls:

DO NOT assume both url() and base64Data() are non-null (only one should be set)
DO NOT forget mimeType when using base64Data
DO NOT load very large images into base64 (memory intensive)
DO NOT forget to validate URL accessibility
DO NOT assume revisedPrompt is set (only for image generation)

Edge Cases:

Either url or base64Data should be set, not both
mimeType is required for base64Data but optional for URLs
revisedPrompt is only populated by image generation models
Very large base64 strings can cause memory issues
URL images require network access

Exception Handling:

Builder throws IllegalStateException if neither url nor base64Data set
No exceptions during read operations
URL validation not performed (may be invalid)

Performance Notes:

URL-based images hold only a URI reference, so their memory footprint is negligible
base64Data can be large (1 MB+ for high-resolution images)
Consider streaming for large images
URL loading is lazy (not loaded until used)
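
To put a number on the base64 cost: padded Base64 turns every 3 raw bytes into 4 characters, so the encoded string is about a third larger than the file. A small JDK-only sketch; Base64Overhead is a hypothetical helper name, and the zero-filled array merely stands in for real image bytes:

```java
import java.util.Base64;

final class Base64Overhead {

    /** Length in characters of the padded Base64 encoding of rawBytes bytes: 4 * ceil(n / 3). */
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[1_000_000]; // stand-in for 1 MB of image bytes
        String encoded = Base64.getEncoder().encodeToString(raw);
        System.out.println(encoded.length());          // 1333336
        System.out.println(encodedLength(raw.length)); // 1333336
    }
}
```

The formula lets a caller estimate the size of base64Data before encoding, which can help decide whether a URL reference is the better choice.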

Supported MIME Types:

image/png - PNG format
image/jpeg - JPEG format
image/gif - GIF format
image/webp - WebP format
Model-specific support varies
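
Since mimeType should accompany base64Data, it can be convenient to derive both values from a file in one place. The sketch below assumes only the JDK; ImagePayloads, mimeTypeFor, and base64Of are hypothetical helper names, and the extension table covers just the four types listed above:

```java
import java.util.Base64;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class ImagePayloads {

    // Only the types listed above; extend as needed for a given model.
    private static final Map<String, String> MIME_BY_EXTENSION = Map.of(
            "png", "image/png",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "gif", "image/gif",
            "webp", "image/webp");

    /** Guesses a MIME type from the file extension; empty if the extension is unknown. */
    static Optional<String> mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(MIME_BY_EXTENSION.get(extension));
    }

    /** Base64-encodes image bytes that are already in memory. */
    static String base64Of(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }
}
```

The bytes would typically come from Files.readAllBytes(path), and the two results would then be passed to base64Data(...) and mimeType(...) on the Image builder.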

Usage Example:

// Image from URL
Image urlImage = Image.builder()
    .url("https://example.com/image.png")
    .mimeType("image/png") // Optional for URLs
    .build();

// Image from base64 data
String base64 = "iVBORw0KGgoAAAANSUhEUgAAAAUA...";
Image base64Image = Image.builder()
    .base64Data(base64)
    .mimeType("image/png") // Required for base64
    .build();

// Generated image with revised prompt
// (generatedUrl: URL returned by an image generation model)
Image generated = Image.builder()
    .url(generatedUrl)
    .revisedPrompt("A photo of a cat (revised by model)")
    .build();

// Use with vision models (reuses urlImage from above)
ChatResponse response = visionModel.chat(
    UserMessage.from(
        TextContent.from("What's in this image?"),
        ImageContent.from(urlImage)
    )
);

Testing Patterns:

@Test
void testImage() {
    Image image = Image.builder()
        .url("https://example.com/test.png")
        .mimeType("image/png")
        .build();

    assertNotNull(image.url());
    assertEquals("image/png", image.mimeType());
    assertNull(image.base64Data()); // Only URL set
}

Related APIs:

ImageContent - Wraps Image for messages
UserMessage - Can contain image content
ImageModel - Generates images
Vision models (ChatModel with vision support)

Image Builder

/**
 * Builder for Image
 */
public static class Builder {
    /**
     * Set image URL
     * @param url Image URL
     * @return Builder instance
     */
    public Builder url(URI url);

    /**
     * Set image URL from string
     * @param url Image URL string
     * @return Builder instance
     */
    public Builder url(String url);

    /**
     * Set base64 data
     * @param base64Data Base64-encoded image data
     * @return Builder instance
     */
    public Builder base64Data(String base64Data);

    /**
     * Set MIME type
     * @param mimeType MIME type (e.g., "image/png")
     * @return Builder instance
     */
    public Builder mimeType(String mimeType);

    /**
     * Set revised prompt
     * @param revisedPrompt Revised prompt from image generation
     * @return Builder instance
     */
    public Builder revisedPrompt(String revisedPrompt);

    /**
     * Build the image
     * @return Image instance
     */
    public Image build();
}

Thread Safety:

Builder is NOT thread-safe; never share one Builder instance across threads
Create a separate Builder wherever an image is built
Built Image is immutable and thread-safe

Common Pitfalls:

DO NOT set both url and base64Data (only one)
DO NOT forget mimeType for base64Data
DO NOT skip validation of URL format

Exception Handling:

build() throws IllegalStateException if neither url nor base64Data set
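url(String) may throw IllegalArgumentException for malformed URLs (the signature declares no checked exception, so a URISyntaxException cannot propagate directly)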
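
Because the builder does not check that a URL is well-formed or reachable, callers may want to validate untrusted strings first. A minimal sketch using only java.net.URI; ImageUrls and parseHttpUrl are hypothetical names, and limiting the check to http(s) is an assumption of this example rather than a library rule:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

final class ImageUrls {

    /** Parses an http(s) URL; returns empty for malformed, relative, or non-http(s) input. */
    static Optional<URI> parseHttpUrl(String candidate) {
        if (candidate == null) {
            return Optional.empty();
        }
        try {
            URI uri = new URI(candidate.trim());
            String scheme = uri.getScheme();
            boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            // A usable image URL needs both an http(s) scheme and a host
            return (http && uri.getHost() != null) ? Optional.of(uri) : Optional.empty();
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }
}
```

A present result can be handed to url(URI). This checks syntax only; it does not confirm that the image is accessible over the network.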

Related APIs:

Image - Built image object

Related APIs

Document Processing

DocumentParser - Parse files into Documents
DocumentSplitter - Split documents into segments
DocumentTransformer - Transform document content

Embeddings

EmbeddingModel - Generate embeddings
EmbeddingStore - Store and retrieve embeddings
ContentRetriever - Semantic search using embeddings

Tools

@Tool - Annotation for defining tools
ToolExecutionResult - Result of tool execution
AiServices - Automatic tool integration

Structured Output

OutputParser - Parse structured output
StructuredOutputParser - Schema-based parsing
@StructuredPrompt - Structured input templates

Testing Patterns

Testing with Immutable Types

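@Test
void testImmutableTypes() throws Exception {
    // Document and TextSegment are safe to share
    Document doc = Document.from("text");
    TextSegment segment = TextSegment.from("text");

    // Safe to pass to multiple threads
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
        Future<?> first = executor.submit(() -> processDocument(doc));
        Future<?> second = executor.submit(() -> processSegment(segment));
        first.get();  // rethrows any failure from the task
        second.get();
    } finally {
        executor.shutdown();
    }
}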

Testing with Mutable Types

@Test
void testMutableTypes() {
    // Metadata and Embedding are mutable
    Metadata metadata = new Metadata().put("key", "value");

    // Create copy for thread safety
    Metadata copy = metadata.copy();

    // Safe to modify independently
    metadata.put("key", "modified");
    assertEquals("value", copy.getString("key"));
}

Testing Metadata Thread Safety

@Test
void testMetadataThreadSafety() throws Exception {
    Metadata shared = new Metadata().put("counter", 0);
    int tasks = 2;
    ExecutorService executor = Executors.newFixedThreadPool(tasks);

    // UNSAFE: tasks mutating the shared instance concurrently
    // executor.submit(() -> {
    //     for (int i = 0; i < 1000; i++) {
    //         shared.put("counter", shared.getInteger("counter") + 1);
    //     }
    // });
    // Result: race condition, lost updates

    // SAFE: each task works on its own copy
    CountDownLatch latch = new CountDownLatch(tasks);
    List<Metadata> results = new CopyOnWriteArrayList<>();

    for (int t = 0; t < tasks; t++) {
        executor.submit(() -> {
            Metadata local = shared.copy();
            for (int i = 0; i < 1000; i++) {
                local.put("counter", local.getInteger("counter") + 1);
            }
            results.add(local);
            latch.countDown(); // one countDown per task, matching the latch count
        });
    }

    assertTrue(latch.await(5, TimeUnit.SECONDS));
    executor.shutdown();

    // Each copy was incremented independently; merge results as needed
    assertEquals(tasks, results.size());
    results.forEach(m -> assertEquals(1000, m.getInteger("counter").intValue()));
}

Performance Tips

Prefer Immutable Types: Document, TextSegment, Image are immutable - safe and fast
Use copy() for Metadata: Create independent copies for thread safety
Avoid vectorAsList(): Use vector() for embeddings (no boxing overhead)
Normalize Once: Call embedding.normalize() only once
Batch Operations: Process multiple segments/embeddings together
Reasonable Segment Sizes: 512-1024 tokens for best RAG performance
Reuse Tool Specifications: Create once, use many times
Keep Schemas Simple: Simpler schemas = faster parsing and lower token usage
Use URL Images: Prefer URL over base64 for large images
Cache Parsed Arguments: Don't re-parse tool arguments if used multiple times
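
The Normalize Once tip follows from what normalization does: it scales a vector to unit L2 length, after which cosine similarity reduces to a plain dot product and normalizing again only repeats work. A sketch on raw float[] arrays, assuming only the JDK; VectorMath is a hypothetical helper and does not show how Embedding implements normalize():

```java
final class VectorMath {

    /** Returns a copy scaled to unit L2 length; a zero vector is returned as an unchanged copy. */
    static float[] normalized(float[] v) {
        double sumSquares = 0.0;
        for (float x : v) {
            sumSquares += (double) x * x;
        }
        float[] out = v.clone();
        if (sumSquares == 0.0) {
            return out;
        }
        double norm = Math.sqrt(sumSquares);
        for (int i = 0; i < out.length; i++) {
            out[i] = (float) (out[i] / norm);
        }
        return out;
    }

    /** Dot product; equals cosine similarity when both inputs have unit length. */
    static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }
}
```

Working on primitive float[] throughout also reflects the vector() tip above: no Float boxing occurs anywhere in the loop.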

Common Integration Patterns

RAG Pipeline

// 1. Load and split document (documentParser is a DocumentParser instance)
Document document = documentParser.parse(inputStream);
List<TextSegment> segments = splitter.split(document);

// 2. Embed and store
for (TextSegment segment : segments) {
    Embedding embedding = embeddingModel.embed(segment).content();
    embeddingStore.add(embedding, segment);
}

// 3. Retrieve and generate
Embedding queryEmbedding = embeddingModel.embed(question).content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(5)
    .build();
List<EmbeddingMatch<TextSegment>> relevant = embeddingStore.search(request).matches();
String response = chatModel.chat(buildPrompt(relevant));
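
The buildPrompt helper used in step 3 is not defined in this document. One possible shape, sketched on plain strings so it compiles with the JDK alone; PromptBuilder, its two-argument signature, and the prompt wording are all assumptions of this example:

```java
import java.util.List;

final class PromptBuilder {

    /** Numbers each context snippet and appends the question. */
    static String buildPrompt(List<String> contexts, String question) {
        StringBuilder prompt = new StringBuilder("Answer using only the context below.\n\n");
        for (int i = 0; i < contexts.size(); i++) {
            prompt.append('[').append(i + 1).append("] ").append(contexts.get(i)).append('\n');
        }
        return prompt.append("\nQuestion: ").append(question).toString();
    }
}
```

In the pipeline above, the context strings would be taken from the matches, for example by mapping each match to match.embedded().text().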

Tool Execution

// 1. Define tools with @Tool annotation
@Tool("Get current weather")
String getWeather(@P("City name") String city) {
    return weatherService.getWeather(city);
}

// 2. Framework generates ToolSpecification
// 3. Model generates ToolExecutionRequest
// 4. Framework executes and returns ToolExecutionResult
// All handled automatically by AiServices

Structured Output

// 1. Define schema
JsonObjectSchema schema = JsonObjectSchema.builder()
    .addStringProperty("name", "Person name")
    .addIntegerProperty("age", "Person age")
    .required("name", "age")
    .build();

// 2. Request structured output
@UserMessage("Extract person info: {{text}}")
@StructuredOutput
Person extractPerson(@V("text") String text);

// 3. Automatic parsing to POJO
Person person = assistant.extractPerson("John is 30 years old");


Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j@1.11.0
