tessl/maven-dev-langchain4j--langchain4j-weaviate

LangChain4j integration for Weaviate vector database enabling embedding storage and similarity search in Java applications

LangChain4j Weaviate Integration

Version: 1.11.0 (unreleased development version - see Installation section)

A Java integration library that connects LangChain4j with Weaviate vector database, enabling vector embedding storage, retrieval, and similarity search for building semantic search applications, retrieval-augmented generation (RAG) systems, and AI-powered applications.

Package Information

  • Package Name: langchain4j-weaviate
  • Package Type: Maven
  • Group ID: dev.langchain4j
  • Artifact ID: langchain4j-weaviate
  • Language: Java

Installation

Note: This documentation describes version 1.11.0, which corresponds to the unreleased development version 1.11.0-beta19 and is not yet available on Maven Central. It includes configuration options and API changes that are not available in earlier releases.

For Development/Source Build (version 1.11.0):

Clone and build from source:

git clone https://github.com/langchain4j/langchain4j.git
cd langchain4j
git checkout 1.11.0
mvn clean install -DskipTests

Then add to your Maven pom.xml:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-weaviate</artifactId>
    <version>1.11.0-beta19</version>
</dependency>

Or for Gradle:

implementation 'dev.langchain4j:langchain4j-weaviate:1.11.0-beta19'

For Production Use (latest published version):

Check Maven Central for the latest published release:

  • Maven Central Repository

Note that published versions may have a different API surface than documented here. For example, version 0.34.0 does not include textFieldName(), metadataFieldName(), and other configuration methods documented in this tile.

Core Imports

import dev.langchain4j.store.embedding.weaviate.WeaviateEmbeddingStore;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;

Basic Usage

Creating a Store (Builder Pattern - Recommended)

import dev.langchain4j.store.embedding.weaviate.WeaviateEmbeddingStore;

// For cloud Weaviate deployment
WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
    .apiKey("your-api-key")
    .scheme("https")
    .host("your-cluster.weaviate.network")
    .objectClass("Document")
    .build();

// For local Weaviate deployment (no API key required)
WeaviateEmbeddingStore localStore = WeaviateEmbeddingStore.builder()
    .scheme("http")
    .host("localhost")
    .port(8080)
    .objectClass("Document")
    .build();

Adding Embeddings

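import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;

import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Add single embedding with auto-generated ID
Embedding embedding = new Embedding(new float[]{0.1f, 0.2f, 0.3f});
String id = store.add(embedding);

// Add embedding with text segment
TextSegment segment = TextSegment.from("This is a document about AI");
String idWithText = store.add(embedding, segment);

// Add multiple embeddings (the store generates the IDs)
Embedding embedding1 = new Embedding(new float[]{0.1f, 0.2f, 0.3f});
Embedding embedding2 = new Embedding(new float[]{0.4f, 0.5f, 0.6f});
Embedding embedding3 = new Embedding(new float[]{0.7f, 0.8f, 0.9f});
List<Embedding> embeddings = Arrays.asList(embedding1, embedding2, embedding3);
List<String> ids = store.addAll(embeddings);

// Add embeddings with text segments under caller-supplied UUIDs
List<TextSegment> segments = Arrays.asList(
    TextSegment.from("First document"),
    TextSegment.from("Second document"),
    TextSegment.from("Third document"));
List<String> customIds = Arrays.asList(
    UUID.randomUUID().toString(),
    UUID.randomUUID().toString(),
    UUID.randomUUID().toString());
store.addAll(customIds, embeddings, segments);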

Searching for Similar Embeddings

import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingMatch;

// Create query embedding
Embedding queryEmbedding = new Embedding(new float[]{0.15f, 0.25f, 0.35f});

// Build search request
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(5)
    .minScore(0.7)
    .build();

// Execute search
EmbeddingSearchResult<TextSegment> result = store.search(request);

// Process results
for (EmbeddingMatch<TextSegment> match : result.matches()) {
    String embeddingId = match.embeddingId();
    double score = match.score(); // Weaviate's certainty metric (0.0 to 1.0)
    TextSegment segment = match.embedded();
    System.out.println("Found match with score: " + score);
}

Removing Embeddings

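import dev.langchain4j.store.embedding.filter.Filter;

import java.util.Arrays;

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

// Remove by specific IDs (must be valid UUIDs, such as those returned by add/addAll)
store.removeAll(Arrays.asList(id, idWithText));

// Remove by filter
Filter filter = metadataKey("category").isEqualTo("outdated");
store.removeAll(filter);

// Remove all embeddings
store.removeAll();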

Architecture

The langchain4j-weaviate integration provides a bridge between LangChain4j's EmbeddingStore interface and Weaviate's vector database capabilities:

  • EmbeddingStore Implementation: Implements the standard EmbeddingStore<TextSegment> interface from langchain4j-core, providing seamless integration with LangChain4j's RAG and semantic search workflows
  • Dual Protocol Support: Supports both HTTP (for all operations) and GRPC (for batch inserts only), with GRPC offering better performance for large-scale data ingestion
  • Consistency Model: Configurable consistency levels (ONE, QUORUM, ALL) for distributed operations, allowing trade-offs between performance and data consistency across Weaviate replicas
  • Duplicate Prevention: Content-based ID hashing strategy (when avoidDups=true) generates deterministic UUIDs from content, preventing duplicate entries while maintaining Weaviate's UUID requirements
  • Metadata Handling: Flexible metadata storage supporting both nested (under _metadata field) and root-level approaches, with selective persistence via metadataKeys configuration
  • Client-Side Operations: Search filters and some removal operations (removeAll() and removeAll(Filter)) are evaluated client-side, so their cost grows with the number of objects stored in the class

This design enables efficient vector similarity search while maintaining flexibility for various deployment scenarios (cloud vs local, high-consistency vs high-performance).

Capabilities

Store Creation and Configuration

Create and configure a WeaviateEmbeddingStore instance to connect to Weaviate vector database with flexible connection options including HTTP/GRPC protocols, authentication, consistency levels, and metadata handling.

Builder Pattern Constructor

/**
 * Returns a builder for constructing WeaviateEmbeddingStore instances.
 *
 * @return WeaviateEmbeddingStoreBuilder instance
 */
public static WeaviateEmbeddingStoreBuilder builder()

Direct Constructor

/**
 * Creates a new WeaviateEmbeddingStore instance with full configuration options.
 *
 * @param apiKey            Weaviate API key for authentication (null for local deployments)
 * @param scheme            Connection scheme ("https" or "http")
 * @param host              Weaviate cluster host address
 * @param port              HTTP port number (optional, can be null)
 * @param useGrpcForInserts Use GRPC protocol for batch inserts (HTTP still required for search)
 * @param securedGrpc       Enable secured GRPC connection
 * @param grpcPort          GRPC port number (optional, defaults to 50051)
 * @param objectClass       Weaviate object class name (must start with uppercase letter)
 * @param avoidDups         Generate content-based hashed IDs to prevent duplicates (default: true)
 * @param consistencyLevel  Consistency level: "ONE", "QUORUM" (default), or "ALL"
 * @param metadataKeys      Collection of metadata keys to persist (can be empty)
 * @param textFieldName     Field name for text content storage (default: "text")
 * @param metadataFieldName Field name for metadata storage (default: "_metadata", empty string for root level)
 */
public WeaviateEmbeddingStore(
    String apiKey,
    String scheme,
    String host,
    Integer port,
    Boolean useGrpcForInserts,
    Boolean securedGrpc,
    Integer grpcPort,
    String objectClass,
    Boolean avoidDups,
    String consistencyLevel,
    Collection<String> metadataKeys,
    String textFieldName,
    String metadataFieldName
)

Builder Configuration Methods

public class WeaviateEmbeddingStoreBuilder {

    /**
     * Sets the API key for Weaviate authentication.
     *
     * @param apiKey API key string
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder apiKey(String apiKey)

    /**
     * Sets the connection scheme.
     *
     * @param scheme "https" or "http"
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder scheme(String scheme)

    /**
     * Sets the Weaviate host address.
     *
     * @param host Host address (e.g., "cluster.weaviate.network" or "localhost")
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder host(String host)

    /**
     * Sets the HTTP port number.
     *
     * @param port Port number (e.g., 8080)
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder port(Integer port)

    /**
     * Enables GRPC protocol for batch insert operations.
     * Note: HTTP is still required for search operations.
     *
     * @param useGrpcForInserts true to use GRPC for inserts
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder useGrpcForInserts(Boolean useGrpcForInserts)

    /**
     * Enables secured GRPC connection.
     *
     * @param securedGrpc true for secured GRPC
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder securedGrpc(Boolean securedGrpc)

    /**
     * Sets the GRPC port number.
     *
     * @param grpcPort GRPC port (defaults to 50051 if not specified)
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder grpcPort(Integer grpcPort)

    /**
     * Sets the Weaviate object class name.
     * Must start with an uppercase letter.
     *
     * @param objectClass Object class name (e.g., "Document", "Embedding")
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder objectClass(String objectClass)

    /**
     * Enables duplicate avoidance through content-based ID hashing.
     * When true, generates deterministic IDs from content hash.
     * When false, generates random UUIDs.
     *
     * @param avoidDups true to avoid duplicates (default: true)
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder avoidDups(Boolean avoidDups)

    /**
     * Sets the consistency level for distributed operations.
     *
     * @param consistencyLevel "ONE" (single replica), "QUORUM" (majority, default), or "ALL" (all replicas)
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder consistencyLevel(String consistencyLevel)

    /**
     * Specifies which metadata keys should be persisted to Weaviate.
     * Only listed keys will be stored.
     *
     * @param metadataKeys Collection of metadata key names
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder metadataKeys(Collection<String> metadataKeys)

    /**
     * Sets the field name for storing text content.
     *
     * @param textFieldName Field name (default: "text")
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder textFieldName(String textFieldName)

    /**
     * Sets the field name for metadata storage.
     * Use "_metadata" for nested storage (default).
     * Use empty string to store metadata at root level.
     *
     * @param metadataFieldName Field name for metadata
     * @return this builder
     */
    public WeaviateEmbeddingStoreBuilder metadataFieldName(String metadataFieldName)

    /**
     * Builds and returns the configured WeaviateEmbeddingStore instance.
     *
     * @return Configured WeaviateEmbeddingStore
     */
    public WeaviateEmbeddingStore build()
}

Configuration Defaults:

| Parameter | Default Value | Notes |
| --- | --- | --- |
| objectClass | "Default" | Must start with uppercase |
| avoidDups | true | Content-based hashing enabled |
| consistencyLevel | "QUORUM" | Majority replica acknowledgment |
| metadataFieldName | "_metadata" | Nested metadata storage |
| textFieldName | "text" | Default text field name |
| grpcPort | 50051 | Standard GRPC port |
| securedGrpc | false | GRPC security disabled |
| useGrpcForInserts | false | HTTP used by default |
| metadataKeys | empty collection | No metadata persisted |

Adding Embeddings

Store vector embeddings with optional text segments and metadata in Weaviate database. Supports single and batch operations with automatic or custom ID assignment.

Add Single Embedding (Auto-generated ID)

/**
 * Adds a single embedding to the store with automatically generated UUID.
 *
 * @param embedding The embedding vector to store
 * @return Generated UUID string for the stored embedding
 */
public String add(Embedding embedding)

Add Embedding with Specific ID

/**
 * Adds an embedding with a specific UUID.
 * ID must be in valid UUID format (Weaviate requirement).
 *
 * @param id        UUID identifier (must be valid UUID format)
 * @param embedding The embedding vector to store
 */
public void add(String id, Embedding embedding)
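Because the ID must be a valid UUID, it can help to check caller-supplied IDs before passing them to add(String, Embedding). The following is a minimal sketch using only the JDK; the class and method names (UuidCheck, isCanonicalUuid) are hypothetical and not part of the library.

```java
import java.util.UUID;

final class UuidCheck {

    // Returns true only for the canonical 8-4-4-4-12 hexadecimal form.
    static boolean isCanonicalUuid(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            // UUID.fromString tolerates shortened groups such as "1-1-1-1-1",
            // so compare the round-tripped value with the original input.
            return UUID.fromString(candidate).toString().equalsIgnoreCase(candidate);
        } catch (IllegalArgumentException e) {
            // Thrown for malformed input, including non-hex characters.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isCanonicalUuid("550e8400-e29b-41d4-a716-446655440000"));
        System.out.println(isCanonicalUuid("doc-42"));
    }
}
```

Application-level keys such as "doc-42" fail this check, so they would need to be mapped to a UUID before being used as an embedding ID.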

Add Embedding with Text Segment

/**
 * Adds an embedding along with its associated text segment and metadata.
 * If avoidDups is true, generates deterministic ID from content hash.
 * If avoidDups is false, generates random UUID.
 *
 * @param embedding    The embedding vector
 * @param textSegment  Associated text content and metadata
 * @return Generated or computed UUID string
 */
public String add(Embedding embedding, TextSegment textSegment)

Add Multiple Embeddings (Batch)

/**
 * Adds multiple embeddings in batch with auto-generated IDs.
 *
 * @param embeddings List of embedding vectors to store
 * @return List of generated UUID strings in corresponding order
 */
public List<String> addAll(List<Embedding> embeddings)

Add Multiple Embeddings with IDs and Text Segments

/**
 * Adds multiple embeddings with specific IDs and optional text segments in batch.
 * All lists must have the same size.
 *
 * @param ids       List of UUID identifiers
 * @param embeddings List of embedding vectors
 * @param embedded  List of text segments (can contain null entries)
 */
public void addAll(List<String> ids, List<Embedding> embeddings, List<TextSegment> embedded)

Important Notes:

  • All IDs must be in valid UUID format (e.g., "550e8400-e29b-41d4-a716-446655440000")
  • When avoidDups=true, IDs are generated from content hash for duplicate prevention
  • When avoidDups=false, random UUIDs are generated
  • GRPC can be used for batch inserts if configured via useGrpcForInserts(true)
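The difference between the two avoidDups modes can be illustrated with plain JDK calls. This is a sketch of the idea only: the hash function the store actually applies is not stated here, so UUID.nameUUIDFromBytes (a name-based, MD5-derived UUID) is used as a stand-in, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class DeterministicIds {

    // Same text always yields the same UUID, so re-adding identical content
    // targets the same object ID instead of creating a second one.
    // Assumption: name-based UUID as a stand-in for the library's own hashing.
    static String contentBasedId(String text) {
        return UUID.nameUUIDFromBytes(text.getBytes(StandardCharsets.UTF_8)).toString();
    }

    // A fresh random UUID on every call, so identical content is stored twice.
    static String randomId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String text = "This is a document about AI";
        System.out.println(contentBasedId(text).equals(contentBasedId(text)));
        System.out.println(randomId().equals(randomId()));
    }
}
```

With deterministic IDs, ingesting the same document twice is idempotent; with random IDs, deduplication has to happen elsewhere.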

Searching for Similar Embeddings

Perform vector similarity search to find embeddings most similar to a query embedding. Returns matches with Weaviate's certainty score and associated text segments.

/**
 * Searches for embeddings similar to the query embedding using vector similarity.
 *
 * The score in each EmbeddingMatch represents Weaviate's certainty metric,
 * which ranges from 0.0 (least certain) to 1.0 (most certain).
 *
 * This implementation assumes cosine distance metric is used in Weaviate.
 *
 * @param request Search request containing query embedding and search parameters
 * @return EmbeddingSearchResult containing matched embeddings with text segments and scores
 */
public EmbeddingSearchResult<TextSegment> search(EmbeddingSearchRequest request)
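For the cosine metric, Weaviate derives certainty from distance as certainty = 1 - distance / 2, where distance = 1 - cosine similarity. The sketch below reproduces that arithmetic locally to show how a score relates to two vectors; it does not call Weaviate, and the class and method names are hypothetical.

```java
final class CertaintySketch {

    // Cosine similarity in [-1, 1]; returns NaN if either vector has zero length.
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Maps cosine distance (0 = identical direction, 2 = opposite) to [0, 1].
    static double certainty(float[] a, float[] b) {
        double distance = 1.0 - cosineSimilarity(a, b);
        return 1.0 - distance / 2.0;
    }

    public static void main(String[] args) {
        float[] query = {0.15f, 0.25f, 0.35f};
        float[] stored = {0.1f, 0.2f, 0.3f};
        System.out.println("certainty = " + certainty(query, stored));
    }
}
```

Under this mapping, orthogonal vectors score 0.5, so a minScore of 0.7 corresponds to a cosine similarity of at least 0.4.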

Search Request Parameters:

The EmbeddingSearchRequest object can be constructed using its builder with the following parameters:

  • queryEmbedding (required): The embedding vector to search for
  • maxResults: Maximum number of results to return (optional; langchain4j-core defaults to 3 when unset)
  • minScore: Minimum certainty score threshold, from 0.0 to 1.0 (optional; defaults to 0.0, which applies no threshold)
  • filter: Optional metadata filter to restrict search scope
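The combined effect of minScore and maxResults can be sketched without a running store: drop candidates below the threshold, order the rest by descending score, and keep the first maxResults. The names below (SearchWindow, topMatches) are hypothetical, and treating minScore as an inclusive bound is an assumption of this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SearchWindow {

    // Returns the IDs of the best-scoring candidates, highest score first.
    static List<String> topMatches(Map<String, Double> scoresById, int maxResults, double minScore) {
        return scoresById.entrySet().stream()
                .filter(entry -> entry.getValue() >= minScore)   // inclusive threshold (assumed)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(maxResults)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> scores = Map.of("a", 0.91, "b", 0.65, "c", 0.78);
        System.out.println(topMatches(scores, 5, 0.7));
    }
}
```

A high minScore can therefore return fewer than maxResults matches, or none at all.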

Search Result Structure:

The EmbeddingSearchResult<TextSegment> contains:

  • matches(): List of EmbeddingMatch<TextSegment> objects, each containing:
    • embeddingId(): UUID of the matched embedding
    • score(): Weaviate's certainty score (0.0 to 1.0)
    • embedding(): The matched embedding vector
    • embedded(): Associated TextSegment with text and metadata

Example:

EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(10)
    .minScore(0.7)
    .build();

EmbeddingSearchResult<TextSegment> result = store.search(request);

for (EmbeddingMatch<TextSegment> match : result.matches()) {
    String id = match.embeddingId();
    double certainty = match.score(); // 0.0 to 1.0
    TextSegment segment = match.embedded();
    String text = segment.text();
}

Removing Embeddings

Delete embeddings from the store by ID, by filter criteria, or remove all embeddings. Supports both targeted and bulk removal operations.

Remove by IDs

/**
 * Removes embeddings by their UUID identifiers.
 *
 * @param ids Collection of UUID strings to remove
 * @throws IllegalArgumentException if ids is null or empty
 */
public void removeAll(Collection<String> ids)

Remove All Embeddings

/**
 * Removes all embeddings from the store.
 * This operation retrieves all IDs first, then deletes them in batch.
 */
public void removeAll()

Remove by Filter

/**
 * Removes embeddings that match the specified filter criteria.
 * The filter is evaluated against metadata associated with each embedding.
 *
 * @param filter Filter criteria for selecting embeddings to remove
 * @throws IllegalArgumentException if filter is null
 */
public void removeAll(Filter filter)

Filter Examples:

import dev.langchain4j.store.embedding.filter.Filter;

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

// Remove by metadata value
Filter categoryFilter = metadataKey("category").isEqualTo("temporary");
store.removeAll(categoryFilter);

// Remove by date (ISO-8601 date strings compare correctly as plain strings)
Filter dateFilter = metadataKey("created").isLessThan("2024-01-01");
store.removeAll(dateFilter);

// Compound filter: both conditions must match
Filter complexFilter = metadataKey("status").isEqualTo("archived")
    .and(metadataKey("age").isGreaterThan(365));
store.removeAll(complexFilter);

Important Notes:

  • Remove operations respect the configured consistencyLevel
  • removeAll() without parameters retrieves all IDs first, which may be slow for large datasets
  • removeAll(Filter) evaluates filters client-side after retrieving all objects
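The cost profile of client-side filtering can be seen in a small stand-alone model: every stored object is visited and tested locally, and only then are the matching IDs deleted. This sketch uses an in-memory map in place of Weaviate and string-only metadata; ClientSideRemoval and idsToRemove are hypothetical names, not library code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ClientSideRemoval {

    // Scans every object (ID -> metadata) and collects the IDs whose metadata
    // matches the filter. Work is proportional to the total number of objects,
    // not to the number of matches.
    static List<String> idsToRemove(Map<String, Map<String, String>> allObjects,
                                    Predicate<Map<String, String>> filter) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : allObjects.entrySet()) {
            if (filter.test(entry.getValue())) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> stored = new LinkedHashMap<>();
        stored.put("id-1", Map.of("category", "temporary"));
        stored.put("id-2", Map.of("category", "permanent"));
        stored.put("id-3", Map.of("category", "temporary"));
        System.out.println(idsToRemove(stored, m -> "temporary".equals(m.get("category"))));
    }
}
```

For large classes, it may be worth scheduling such removals off the request path, since the full scan happens even when few objects match.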

Types

Embedding

/**
 * Represents a vector embedding (from langchain4j-core).
 * Encapsulates a float array that captures semantic information of text.
 */
public class Embedding {
    public Embedding(float[] vector)
    public static Embedding from(float[] vector)
    public static Embedding from(List<Float> vector)
    public float[] vector()
    public List<Float> vectorAsList()
}

TextSegment

/**
 * Represents a text segment with optional metadata (from langchain4j-core).
 */
public class TextSegment {
    public static TextSegment from(String text)
    public static TextSegment from(String text, Metadata metadata)
    public String text()
    public Metadata metadata()
}

Metadata

/**
 * Key-value metadata associated with text segments (from langchain4j-core).
 * Supports various data types including String, Integer, Long, Float, Double, and UUID.
 */
public class Metadata {
    public Metadata()
    public Metadata(Map<String, ?> metadata)

    // Typed put methods
    public Metadata put(String key, String value)
    public Metadata put(String key, int value)
    public Metadata put(String key, long value)
    public Metadata put(String key, float value)
    public Metadata put(String key, double value)
    public Metadata put(String key, UUID value)

    // Typed get methods
    public String getString(String key)
    public Integer getInteger(String key)
    public Long getLong(String key)
    public Float getFloat(String key)
    public Double getDouble(String key)
    public UUID getUUID(String key)

    // Utility methods
    public Metadata putAll(Map<String, Object> metadata)
    public Metadata remove(String key)
    public Metadata copy()
    public Map<String, Object> toMap()
}

EmbeddingSearchRequest

/**
 * Search request parameters (from langchain4j-core).
 */
public class EmbeddingSearchRequest {
    public static EmbeddingSearchRequestBuilder builder()

    public static class EmbeddingSearchRequestBuilder {
        public EmbeddingSearchRequestBuilder queryEmbedding(Embedding embedding)
        public EmbeddingSearchRequestBuilder maxResults(Integer maxResults)
        public EmbeddingSearchRequestBuilder minScore(Double minScore)
        public EmbeddingSearchRequestBuilder filter(Filter filter)
        public EmbeddingSearchRequest build()
    }
}

EmbeddingSearchResult

/**
 * Search results container (from langchain4j-core).
 *
 * @param <Embedded> Type of embedded content (typically TextSegment)
 */
public class EmbeddingSearchResult<Embedded> {
    public List<EmbeddingMatch<Embedded>> matches()
}

EmbeddingMatch

/**
 * Individual search result match (from langchain4j-core).
 *
 * @param <Embedded> Type of embedded content (typically TextSegment)
 */
public class EmbeddingMatch<Embedded> {
    public String embeddingId()
    public double score()
    public Embedding embedding()
    public Embedded embedded()
}

Filter

/**
 * Filter interface for querying embeddings by metadata (from langchain4j-core).
 * Filters are constructed using MetadataFilterBuilder.
 */
public interface Filter {
    boolean test(Object object)
    Filter and(Filter filter)
    Filter or(Filter filter)
}

/**
 * Builder for constructing metadata filters (from langchain4j-core).
 * Import: import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;
 */
public class MetadataFilterBuilder {
    public static MetadataFilterBuilder metadataKey(String key)

    // Comparison methods that return Filter
    public Filter isEqualTo(Object value)
    public Filter isNotEqualTo(Object value)
    public Filter isGreaterThan(Comparable value)
    public Filter isGreaterThanOrEqualTo(Comparable value)
    public Filter isLessThan(Comparable value)
    public Filter isLessThanOrEqualTo(Comparable value)
    public Filter isIn(Collection<?> values)
    public Filter isNotIn(Collection<?> values)
}

Configuration Examples

Cloud Weaviate with Full Configuration

WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
    .apiKey("your-weaviate-api-key")
    .scheme("https")
    .host("your-cluster.weaviate.network")
    .port(443)
    .objectClass("DocumentEmbedding")
    .avoidDups(true)
    .consistencyLevel("QUORUM")
    .metadataKeys(Arrays.asList("author", "category", "timestamp"))
    .textFieldName("content")
    .metadataFieldName("_metadata")
    .build();

Local Weaviate with GRPC for Inserts

WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
    .scheme("http")
    .host("localhost")
    .port(8080)
    .useGrpcForInserts(true)
    .securedGrpc(false)
    .grpcPort(50051)
    .objectClass("LocalDocument")
    .build();

Minimal Configuration

// Simplest configuration for local Weaviate
WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
    .scheme("http")
    .host("localhost")
    .objectClass("Embedding")
    .build();

Root-Level Metadata Storage

// Store metadata at root level instead of nested field
WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
    .scheme("https")
    .host("your-cluster.weaviate.network")
    .apiKey("your-api-key")
    .objectClass("Document")
    .metadataFieldName("") // Empty string stores at root
    .metadataKeys(Arrays.asList("title", "author", "date"))
    .build();
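To make the two layouts concrete, the sketch below builds the property map that would be sent for one object under each setting. It is an illustration of the behaviour described in this document, not the library's code: MetadataLayout and toProperties are hypothetical names, and skipping keys that are absent from the segment's metadata is an assumption.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetadataLayout {

    static Map<String, Object> toProperties(String text,
                                            Map<String, ?> metadata,
                                            Collection<String> metadataKeys,
                                            String textFieldName,
                                            String metadataFieldName) {
        // Keep only the keys listed in metadataKeys.
        Map<String, Object> selected = new LinkedHashMap<>();
        for (String key : metadataKeys) {
            if (metadata.containsKey(key)) {
                selected.put(key, metadata.get(key));
            }
        }
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put(textFieldName, text);
        if (metadataFieldName.isEmpty()) {
            properties.putAll(selected);                  // root-level layout
        } else if (!selected.isEmpty()) {
            properties.put(metadataFieldName, selected);  // nested layout
        }
        return properties;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = Map.of("title", "Intro", "author", "Ada", "draft", "yes");
        List<String> keys = List.of("title", "author", "date");
        System.out.println(toProperties("hello", metadata, keys, "text", "_metadata"));
        System.out.println(toProperties("hello", metadata, keys, "text", ""));
    }
}
```

In the root-level layout, metadata keys share a namespace with the text field, so a metadata key equal to textFieldName would collide with the stored text.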

High Consistency Configuration

// Require all replicas to acknowledge writes
WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
    .scheme("https")
    .host("your-cluster.weaviate.network")
    .apiKey("your-api-key")
    .objectClass("CriticalData")
    .consistencyLevel("ALL")
    .build();
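The trade-off between the three levels comes down to how many replicas must acknowledge an operation. A minimal sketch of that arithmetic, assuming the usual majority rule for QUORUM (n / 2 + 1 for replication factor n); ConsistencyMath and requiredAcks are hypothetical names.

```java
final class ConsistencyMath {

    // Number of replica acknowledgments needed for the given consistency level.
    static int requiredAcks(String level, int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replicationFactor must be at least 1");
        }
        switch (level) {
            case "ONE":
                return 1;
            case "QUORUM":
                return replicationFactor / 2 + 1;   // strict majority
            case "ALL":
                return replicationFactor;
            default:
                throw new IllegalArgumentException("Unknown consistency level: " + level);
        }
    }

    public static void main(String[] args) {
        for (String level : new String[]{"ONE", "QUORUM", "ALL"}) {
            System.out.println(level + " with 3 replicas -> " + requiredAcks(level, 3));
        }
    }
}
```

With three replicas, ALL tolerates no unavailable replica, whereas QUORUM tolerates one; on a single-node deployment all three levels require the same single acknowledgment.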

Important Implementation Notes

  1. Distance Metric: This implementation assumes Weaviate is configured with the cosine distance metric

  2. Certainty Score: Search results use Weaviate's "certainty" metric (0.0 to 1.0), not raw distance

  3. UUID Requirement: All embedding IDs must be valid UUIDs (Weaviate requirement)

  4. GRPC Limitation: GRPC can only be used for insert operations; HTTP is still required for search

  5. Duplicate Prevention:

    • When avoidDups=true: IDs generated from content hash (deterministic)
    • When avoidDups=false: Random UUIDs generated

  6. Metadata Handling:

    • Only metadata keys listed in metadataKeys are persisted
    • Can be stored nested (default _metadata field) or at root level (empty metadataFieldName)

  7. Consistency Levels:

    • ONE: Single replica acknowledgment (fastest, least consistent)
    • QUORUM: Majority replica acknowledgment (balanced, default)
    • ALL: All replicas must acknowledge (slowest, most consistent)

  8. Filter Evaluation: The removeAll(Filter) method evaluates filters client-side, which may be inefficient for large datasets

Dependencies

This library requires:

  • dev.langchain4j:langchain4j-core:1.11.0 - Core LangChain4j interfaces and types
  • io.weaviate:client:5.3.0 - Weaviate Java client library

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-weaviate
