LangChain4j integration for Weaviate vector database enabling embedding storage and similarity search in Java applications
Version: 1.11.0 (unreleased development version - see Installation section)
A Java integration library that connects LangChain4j with Weaviate vector database, enabling vector embedding storage, retrieval, and similarity search for building semantic search applications, retrieval-augmented generation (RAG) systems, and AI-powered applications.
Note: This documentation describes version 1.11.0, which corresponds to the unreleased development version 1.11.0-beta19 and is not yet available in Maven Central. It includes configuration options and API improvements that earlier releases lack.
For Development/Source Build (version 1.11.0):
Clone and build from source:
git clone https://github.com/langchain4j/langchain4j.git
cd langchain4j
git checkout 1.11.0
mvn clean install -DskipTests

Then add to your Maven pom.xml:
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-weaviate</artifactId>
<version>1.11.0-beta19</version>
</dependency>

Or for Gradle:
implementation 'dev.langchain4j:langchain4j-weaviate:1.11.0-beta19'

For Production Use (latest published version):
Check Maven Central for the latest published release.
Note that published versions may have a different API surface than documented here. For example, version 0.34.0 does not include textFieldName(), metadataFieldName(), and other configuration methods documented in this tile.
import dev.langchain4j.store.embedding.weaviate.WeaviateEmbeddingStore;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;

import dev.langchain4j.store.embedding.weaviate.WeaviateEmbeddingStore;
// For cloud Weaviate deployment
WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
.apiKey("your-api-key")
.scheme("https")
.host("your-cluster.weaviate.network")
.objectClass("Document")
.build();
// For local Weaviate deployment (no API key required)
WeaviateEmbeddingStore localStore = WeaviateEmbeddingStore.builder()
.scheme("http")
.host("localhost")
.port(8080)
.objectClass("Document")
.build();

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import java.util.Arrays;
import java.util.List;
// Add single embedding with auto-generated ID
Embedding embedding = new Embedding(new float[]{0.1f, 0.2f, 0.3f});
String id = store.add(embedding);
// Add embedding with text segment
TextSegment segment = TextSegment.from("This is a document about AI");
String idWithText = store.add(embedding, segment);
// Add multiple embeddings (embedding1..embedding3 are Embedding instances created like `embedding` above)
List<Embedding> embeddings = Arrays.asList(embedding1, embedding2, embedding3);
List<String> ids = store.addAll(embeddings);
// Add embeddings with text segments (segment1..segment3 are TextSegment instances created like `segment` above)
List<TextSegment> segments = Arrays.asList(segment1, segment2, segment3);
store.addAll(ids, embeddings, segments);

import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingMatch;
// Create query embedding
Embedding queryEmbedding = new Embedding(new float[]{0.15f, 0.25f, 0.35f});
// Build search request
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(5)
.minScore(0.7)
.build();
// Execute search
EmbeddingSearchResult<TextSegment> result = store.search(request);
// Process results
for (EmbeddingMatch<TextSegment> match : result.matches()) {
String embeddingId = match.embeddingId();
double score = match.score(); // Weaviate's certainty metric (0.0 to 1.0)
TextSegment segment = match.embedded();
System.out.println("Found match with score: " + score);
}

// Remove by specific IDs
store.removeAll(Arrays.asList("id1", "id2", "id3"));
// Remove all embeddings
store.removeAll();
// Remove by filter (needs dev.langchain4j.store.embedding.filter.Filter and a static import of MetadataFilterBuilder.metadataKey)
Filter filter = metadataKey("category").isEqualTo("outdated");
store.removeAll(filter);

The langchain4j-weaviate integration provides a bridge between LangChain4j's EmbeddingStore interface and Weaviate's vector database capabilities:
- Implements the EmbeddingStore<TextSegment> interface from langchain4j-core, providing seamless integration with LangChain4j's RAG and semantic search workflows
- Content-based ID hashing (avoidDups=true) generates deterministic UUIDs from content, preventing duplicate entries while maintaining Weaviate's UUID requirements
- Metadata storage supports both nested (_metadata field) and root-level approaches, with selective persistence via metadataKeys configuration

This design enables efficient vector similarity search while maintaining flexibility for various deployment scenarios (cloud vs local, high-consistency vs high-performance).
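The duplicate-avoidance idea described above can be sketched with the JDK alone. This is a minimal illustration, not the library's implementation: the class `DeterministicIdSketch` and the helper `idFor` are hypothetical, and the use of a name-based (type 3) UUID is an assumption, since this document does not state which hashing scheme the store applies.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class DeterministicIdSketch {

    // Hypothetical helper: returns a UUID string for a piece of text.
    // With avoidDups=true the ID is derived from the content, so the same
    // text always maps to the same ID. With avoidDups=false it is random.
    // Assumption: a name-based UUID stands in for the library's real hashing.
    static String idFor(String text, boolean avoidDups) {
        if (avoidDups) {
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            return UUID.nameUUIDFromBytes(bytes).toString();
        }
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String first = idFor("This is a document about AI", true);
        String second = idFor("This is a document about AI", true);
        // Identical content yields identical IDs, so a second insert
        // would target the same object instead of creating a duplicate.
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

Because the derived value is still a well-formed UUID, it satisfies the ID format Weaviate requires while remaining stable across runs.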
Create and configure a WeaviateEmbeddingStore instance to connect to Weaviate vector database with flexible connection options including HTTP/GRPC protocols, authentication, consistency levels, and metadata handling.
/**
* Returns a builder for constructing WeaviateEmbeddingStore instances.
*
* @return WeaviateEmbeddingStoreBuilder instance
*/
public static WeaviateEmbeddingStoreBuilder builder()

/**
* Creates a new WeaviateEmbeddingStore instance with full configuration options.
*
* @param apiKey Weaviate API key for authentication (null for local deployments)
* @param scheme Connection scheme ("https" or "http")
* @param host Weaviate cluster host address
* @param port HTTP port number (optional, can be null)
* @param useGrpcForInserts Use GRPC protocol for batch inserts (HTTP still required for search)
* @param securedGrpc Enable secured GRPC connection
* @param grpcPort GRPC port number (optional, defaults to 50051)
* @param objectClass Weaviate object class name (must start with uppercase letter)
* @param avoidDups Generate content-based hashed IDs to prevent duplicates (default: true)
* @param consistencyLevel Consistency level: "ONE", "QUORUM" (default), or "ALL"
* @param metadataKeys Collection of metadata keys to persist (can be empty)
* @param textFieldName Field name for text content storage (default: "text")
* @param metadataFieldName Field name for metadata storage (default: "_metadata", empty string for root level)
*/
public WeaviateEmbeddingStore(
String apiKey,
String scheme,
String host,
Integer port,
Boolean useGrpcForInserts,
Boolean securedGrpc,
Integer grpcPort,
String objectClass,
Boolean avoidDups,
String consistencyLevel,
Collection<String> metadataKeys,
String textFieldName,
String metadataFieldName
)

public class WeaviateEmbeddingStoreBuilder {
/**
* Sets the API key for Weaviate authentication.
*
* @param apiKey API key string
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder apiKey(String apiKey)
/**
* Sets the connection scheme.
*
* @param scheme "https" or "http"
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder scheme(String scheme)
/**
* Sets the Weaviate host address.
*
* @param host Host address (e.g., "cluster.weaviate.network" or "localhost")
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder host(String host)
/**
* Sets the HTTP port number.
*
* @param port Port number (e.g., 8080)
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder port(Integer port)
/**
* Enables GRPC protocol for batch insert operations.
* Note: HTTP is still required for search operations.
*
* @param useGrpcForInserts true to use GRPC for inserts
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder useGrpcForInserts(Boolean useGrpcForInserts)
/**
* Enables secured GRPC connection.
*
* @param securedGrpc true for secured GRPC
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder securedGrpc(Boolean securedGrpc)
/**
* Sets the GRPC port number.
*
* @param grpcPort GRPC port (defaults to 50051 if not specified)
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder grpcPort(Integer grpcPort)
/**
* Sets the Weaviate object class name.
* Must start with an uppercase letter.
*
* @param objectClass Object class name (e.g., "Document", "Embedding")
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder objectClass(String objectClass)
/**
* Enables duplicate avoidance through content-based ID hashing.
* When true, generates deterministic IDs from content hash.
* When false, generates random UUIDs.
*
* @param avoidDups true to avoid duplicates (default: true)
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder avoidDups(Boolean avoidDups)
/**
* Sets the consistency level for distributed operations.
*
* @param consistencyLevel "ONE" (single replica), "QUORUM" (majority, default), or "ALL" (all replicas)
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder consistencyLevel(String consistencyLevel)
/**
* Specifies which metadata keys should be persisted to Weaviate.
* Only listed keys will be stored.
*
* @param metadataKeys Collection of metadata key names
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder metadataKeys(Collection<String> metadataKeys)
/**
* Sets the field name for storing text content.
*
* @param textFieldName Field name (default: "text")
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder textFieldName(String textFieldName)
/**
* Sets the field name for metadata storage.
* Use "_metadata" for nested storage (default).
* Use empty string to store metadata at root level.
*
* @param metadataFieldName Field name for metadata
* @return this builder
*/
public WeaviateEmbeddingStoreBuilder metadataFieldName(String metadataFieldName)
/**
* Builds and returns the configured WeaviateEmbeddingStore instance.
*
* @return Configured WeaviateEmbeddingStore
*/
public WeaviateEmbeddingStore build()
}

Configuration Defaults:
| Parameter | Default Value | Notes |
|---|---|---|
| objectClass | "Default" | Must start with uppercase |
| avoidDups | true | Content-based hashing enabled |
| consistencyLevel | "QUORUM" | Majority replica acknowledgment |
| metadataFieldName | "_metadata" | Nested metadata storage |
| textFieldName | "text" | Default text field name |
| grpcPort | 50051 | Standard GRPC port |
| securedGrpc | false | GRPC security disabled |
| useGrpcForInserts | false | HTTP used by default |
| metadataKeys | empty collection | No metadata persisted |
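To make the metadataKeys, textFieldName, and metadataFieldName rows more concrete, here is a rough sketch of how a property map could be assembled under those settings. The class `MetadataLayoutSketch` and the method `toProperties` are hypothetical, and the exact shape the library sends to Weaviate (for example, whether an empty nested map is written) is an assumption.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetadataLayoutSketch {

    // Hypothetical helper: builds the properties for one stored object.
    // Only keys listed in metadataKeys are kept. A non-empty
    // metadataFieldName nests them under that field; an empty string
    // places them next to the text field at root level.
    static Map<String, Object> toProperties(String text,
                                            Map<String, Object> metadata,
                                            Collection<String> metadataKeys,
                                            String textFieldName,
                                            String metadataFieldName) {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put(textFieldName, text);

        Map<String, Object> kept = new LinkedHashMap<>();
        for (String key : metadataKeys) {
            if (metadata.containsKey(key)) {
                kept.put(key, metadata.get(key));
            }
        }

        if (metadataFieldName.isEmpty()) {
            properties.putAll(kept);
        } else {
            properties.put(metadataFieldName, kept);
        }
        return properties;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("author", "Ada");
        metadata.put("draft", true); // not listed in metadataKeys, so dropped

        System.out.println(toProperties("hello", metadata, List.of("author"), "text", "_metadata"));
        // prints "{text=hello, _metadata={author=Ada}}"
        System.out.println(toProperties("hello", metadata, List.of("author"), "text", ""));
        // prints "{text=hello, author=Ada}"
    }
}
```

With the default empty metadataKeys collection, no metadata entries survive this step, which matches the "No metadata persisted" note in the table.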
Store vector embeddings with optional text segments and metadata in Weaviate database. Supports single and batch operations with automatic or custom ID assignment.
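When IDs are assigned by the caller, they must be valid UUIDs and the batch lists must line up. A pre-flight check along the following lines may help surface mistakes before a request is sent. `AddAllPreflightSketch`, `isValidUuid`, and `check` are hypothetical names, and whether the store performs equivalent validation itself is not stated here.

```java
import java.util.List;
import java.util.UUID;

final class AddAllPreflightSketch {

    // Accepts only canonical 36-character UUID strings. UUID.fromString is
    // lenient about short groups (e.g. "1-1-1-1-1"), so the parsed value is
    // compared with the input to reject non-canonical forms.
    static boolean isValidUuid(String id) {
        if (id == null) {
            return false;
        }
        try {
            return UUID.fromString(id).toString().equalsIgnoreCase(id);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Verifies that the three batch lists have equal size and that every
    // ID is a valid UUID; throws IllegalArgumentException otherwise.
    static void check(List<String> ids, List<?> embeddings, List<?> embedded) {
        if (ids.size() != embeddings.size() || ids.size() != embedded.size()) {
            throw new IllegalArgumentException("ids, embeddings and embedded must have the same size");
        }
        for (String id : ids) {
            if (!isValidUuid(id)) {
                throw new IllegalArgumentException("Not a valid UUID: " + id);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUuid("id1")); // prints "false"
        System.out.println(isValidUuid(UUID.randomUUID().toString())); // prints "true"
    }
}
```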
/**
* Adds a single embedding to the store with automatically generated UUID.
*
* @param embedding The embedding vector to store
* @return Generated UUID string for the stored embedding
*/
public String add(Embedding embedding)

/**
* Adds an embedding with a specific UUID.
* ID must be in valid UUID format (Weaviate requirement).
*
* @param id UUID identifier (must be valid UUID format)
* @param embedding The embedding vector to store
*/
public void add(String id, Embedding embedding)

/**
* Adds an embedding along with its associated text segment and metadata.
* If avoidDups is true, generates deterministic ID from content hash.
* If avoidDups is false, generates random UUID.
*
* @param embedding The embedding vector
* @param textSegment Associated text content and metadata
* @return Generated or computed UUID string
*/
public String add(Embedding embedding, TextSegment textSegment)

/**
* Adds multiple embeddings in batch with auto-generated IDs.
*
* @param embeddings List of embedding vectors to store
* @return List of generated UUID strings in corresponding order
*/
public List<String> addAll(List<Embedding> embeddings)

/**
* Adds multiple embeddings with specific IDs and optional text segments in batch.
* All lists must have the same size.
*
* @param ids List of UUID identifiers
* @param embeddings List of embedding vectors
* @param embedded List of text segments (can contain null entries)
*/
public void addAll(List<String> ids, List<Embedding> embeddings, List<TextSegment> embedded)

Important Notes:
- When avoidDups=true, IDs are generated from content hash for duplicate prevention
- When avoidDups=false, random UUIDs are generated
- Batch inserts can use GRPC by setting useGrpcForInserts(true)

Perform vector similarity search to find embeddings most similar to a query embedding. Returns matches with Weaviate's certainty score and associated text segments.
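For intuition about the certainty score, the sketch below computes cosine similarity locally and maps it to certainty using Weaviate's documented relation for cosine distance (distance = 1 - cosine similarity, certainty = 1 - distance / 2). The class `CertaintySketch` is hypothetical and only illustrates the arithmetic; the store obtains the score from Weaviate rather than computing it on the client.

```java
final class CertaintySketch {

    // Cosine similarity of two vectors of equal length, in [-1.0, 1.0].
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Maps cosine similarity to certainty in [0.0, 1.0]:
    // certainty = 1 - (1 - similarity) / 2 = (1 + similarity) / 2.
    static double certainty(double cosineSimilarity) {
        return (1.0 + cosineSimilarity) / 2.0;
    }

    public static void main(String[] args) {
        float[] query = {1.0f, 0.0f};
        float[] orthogonal = {0.0f, 1.0f};
        System.out.println(certainty(cosineSimilarity(query, query)));      // prints "1.0"
        System.out.println(certainty(cosineSimilarity(query, orthogonal))); // prints "0.5"
    }
}
```

Under this relation, a minScore of 0.7 corresponds to a cosine similarity of at least 0.4, which is worth keeping in mind when porting thresholds from systems that report raw cosine similarity.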
/**
* Searches for embeddings similar to the query embedding using vector similarity.
*
* The score in each EmbeddingMatch represents Weaviate's certainty metric,
* which ranges from 0.0 (least certain) to 1.0 (most certain).
*
* This implementation assumes cosine distance metric is used in Weaviate.
*
* @param request Search request containing query embedding and search parameters
* @return EmbeddingSearchResult containing matched embeddings with text segments and scores
*/
public EmbeddingSearchResult<TextSegment> search(EmbeddingSearchRequest request)

Search Request Parameters:
The EmbeddingSearchRequest object can be constructed using its builder with the following parameters:
- queryEmbedding (required): The embedding vector to search for
- maxResults: Maximum number of results to return (default varies by implementation)
- minScore: Minimum certainty score threshold (0.0 to 1.0)
- filter: Optional metadata filter to restrict search scope

Search Result Structure:
The EmbeddingSearchResult<TextSegment> contains:
- matches(): List of EmbeddingMatch<TextSegment> objects, each containing:
  - embeddingId(): UUID of the matched embedding
  - score(): Weaviate's certainty score (0.0 to 1.0)
  - embedding(): The matched embedding vector
  - embedded(): Associated TextSegment with text and metadata

Example:
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(10)
.minScore(0.7)
.build();
EmbeddingSearchResult<TextSegment> result = store.search(request);
for (EmbeddingMatch<TextSegment> match : result.matches()) {
String id = match.embeddingId();
double certainty = match.score(); // 0.0 to 1.0
TextSegment segment = match.embedded();
String text = segment.text();
}

Delete embeddings from the store by ID, by filter criteria, or remove all embeddings. Supports both targeted and bulk removal operations.
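The notes for this section state that filter-based removal evaluates the filter on the client after retrieving all objects. The following sketch shows that general pattern with plain collections. `ClientSideFilterSketch` and `idsToRemove` are hypothetical, a `Predicate` over a metadata map stands in for the library's Filter type, and the in-memory map stands in for objects fetched from Weaviate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ClientSideFilterSketch {

    // Scans every fetched object and collects the IDs whose metadata
    // satisfies the filter. The cost grows with the total number of
    // stored objects, not with the number of matches.
    static List<String> idsToRemove(Map<String, Map<String, Object>> metadataById,
                                    Predicate<Map<String, Object>> filter) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> entry : metadataById.entrySet()) {
            if (filter.test(entry.getValue())) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Stand-in for objects retrieved from the store (ID -> metadata).
        Map<String, Map<String, Object>> fetched = new LinkedHashMap<>();
        fetched.put("a", Map.of("category", "temporary"));
        fetched.put("b", Map.of("category", "keep"));

        List<String> ids = idsToRemove(fetched, m -> "temporary".equals(m.get("category")));
        System.out.println(ids); // prints "[a]"
        // The collected IDs would then be passed to removeAll(Collection<String>).
    }
}
```

Since every object is inspected, narrowing the data set up front (for example, by using separate object classes) may be preferable for large collections.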
/**
* Removes embeddings by their UUID identifiers.
*
* @param ids Collection of UUID strings to remove
* @throws IllegalArgumentException if ids is null or empty
*/
public void removeAll(Collection<String> ids)

/**
* Removes all embeddings from the store.
* This operation retrieves all IDs first, then deletes them in batch.
*/
public void removeAll()

/**
* Removes embeddings that match the specified filter criteria.
* The filter is evaluated against metadata associated with each embedding.
*
* @param filter Filter criteria for selecting embeddings to remove
* @throws IllegalArgumentException if filter is null
*/
public void removeAll(Filter filter)

Filter Examples:
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;
// Remove by metadata value
Filter categoryFilter = metadataKey("category").isEqualTo("temporary");
store.removeAll(categoryFilter);
// Remove by date
Filter dateFilter = metadataKey("created").isLessThan("2024-01-01");
store.removeAll(dateFilter);
// Complex filter with AND/OR
Filter complexFilter = metadataKey("status").isEqualTo("archived")
.and(metadataKey("age").isGreaterThan(365));
store.removeAll(complexFilter);

Important Notes:
- Removal operations use the configured consistencyLevel
- removeAll() without parameters retrieves all IDs first, which may be slow for large datasets
- removeAll(Filter) evaluates filters client-side after retrieving all objects

/**
* Represents a vector embedding (from langchain4j-core).
* Encapsulates a float array that captures semantic information of text.
*/
public class Embedding {
public Embedding(float[] vector)
public static Embedding from(float[] vector)
public static Embedding from(List<Float> vector)
public float[] vector()
public List<Float> vectorAsList()
}

/**
* Represents a text segment with optional metadata (from langchain4j-core).
*/
public class TextSegment {
public static TextSegment from(String text)
public static TextSegment from(String text, Metadata metadata)
public String text()
public Metadata metadata()
}

/**
* Key-value metadata associated with text segments (from langchain4j-core).
* Supports various data types including String, Integer, Long, Float, Double, and UUID.
*/
public class Metadata {
public Metadata()
public Metadata(Map<String, ?> metadata)
// Typed put methods
public Metadata put(String key, String value)
public Metadata put(String key, int value)
public Metadata put(String key, long value)
public Metadata put(String key, float value)
public Metadata put(String key, double value)
public Metadata put(String key, UUID value)
// Typed get methods
public String getString(String key)
public Integer getInteger(String key)
public Long getLong(String key)
public Float getFloat(String key)
public Double getDouble(String key)
public UUID getUUID(String key)
// Utility methods
public Metadata putAll(Map<String, Object> metadata)
public Metadata remove(String key)
public Metadata copy()
public Map<String, Object> toMap()
}

/**
* Search request parameters (from langchain4j-core).
*/
public class EmbeddingSearchRequest {
public static EmbeddingSearchRequestBuilder builder()
public static class EmbeddingSearchRequestBuilder {
public EmbeddingSearchRequestBuilder queryEmbedding(Embedding embedding)
public EmbeddingSearchRequestBuilder maxResults(Integer maxResults)
public EmbeddingSearchRequestBuilder minScore(Double minScore)
public EmbeddingSearchRequestBuilder filter(Filter filter)
public EmbeddingSearchRequest build()
}
}

/**
* Search results container (from langchain4j-core).
*
* @param <Embedded> Type of embedded content (typically TextSegment)
*/
public class EmbeddingSearchResult<Embedded> {
public List<EmbeddingMatch<Embedded>> matches()
}

/**
* Individual search result match (from langchain4j-core).
*
* @param <Embedded> Type of embedded content (typically TextSegment)
*/
public class EmbeddingMatch<Embedded> {
public String embeddingId()
public double score()
public Embedding embedding()
public Embedded embedded()
}

/**
* Filter interface for querying embeddings by metadata (from langchain4j-core).
* Filters are constructed using MetadataFilterBuilder.
*/
public interface Filter {
boolean test(Object object)
Filter and(Filter filter)
Filter or(Filter filter)
}
/**
* Builder for constructing metadata filters (from langchain4j-core).
* Import: import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;
*/
public class MetadataFilterBuilder {
public static MetadataFilterBuilder metadataKey(String key)
// Comparison methods that return Filter
public Filter isEqualTo(Object value)
public Filter isNotEqualTo(Object value)
public Filter isGreaterThan(Comparable value)
public Filter isGreaterThanOrEqualTo(Comparable value)
public Filter isLessThan(Comparable value)
public Filter isLessThanOrEqualTo(Comparable value)
public Filter isIn(Collection<?> values)
public Filter isNotIn(Collection<?> values)
}

WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
.apiKey("your-weaviate-api-key")
.scheme("https")
.host("your-cluster.weaviate.network")
.port(443)
.objectClass("DocumentEmbedding")
.avoidDups(true)
.consistencyLevel("QUORUM")
.metadataKeys(Arrays.asList("author", "category", "timestamp"))
.textFieldName("content")
.metadataFieldName("_metadata")
.build();

WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
.scheme("http")
.host("localhost")
.port(8080)
.useGrpcForInserts(true)
.securedGrpc(false)
.grpcPort(50051)
.objectClass("LocalDocument")
.build();

// Simplest configuration for local Weaviate
WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
.scheme("http")
.host("localhost")
.objectClass("Embedding")
.build();

// Store metadata at root level instead of nested field
WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
.scheme("https")
.host("your-cluster.weaviate.network")
.apiKey("your-api-key")
.objectClass("Document")
.metadataFieldName("") // Empty string stores at root
.metadataKeys(Arrays.asList("title", "author", "date"))
.build();

// Require all replicas to acknowledge writes
WeaviateEmbeddingStore store = WeaviateEmbeddingStore.builder()
.scheme("https")
.host("your-cluster.weaviate.network")
.apiKey("your-api-key")
.objectClass("CriticalData")
.consistencyLevel("ALL")
.build();

Distance Metric: This implementation assumes Weaviate is configured with cosine distance metric
Certainty Score: Search results use Weaviate's "certainty" metric (0.0 to 1.0), not raw distance
UUID Requirement: All embedding IDs must be valid UUIDs (Weaviate requirement)
GRPC Limitation: GRPC can only be used for insert operations; HTTP is still required for search
Duplicate Prevention:
- avoidDups=true: IDs generated from content hash (deterministic)
- avoidDups=false: Random UUIDs generated

Metadata Handling:
- Only keys listed in metadataKeys are persisted
- Metadata is stored either nested (_metadata field) or at root level (empty metadataFieldName)

Consistency Levels:
- ONE: Single replica acknowledgment (fastest, least consistent)
- QUORUM: Majority replica acknowledgment (balanced, default)
- ALL: All replicas must acknowledge (slowest, most consistent)

Filter Evaluation: The removeAll(Filter) method evaluates filters client-side, which may be inefficient for large datasets
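As a rough illustration of the three consistency levels listed above, the sketch below computes how many replica acknowledgments each level implies for a given replication factor. `ConsistencyLevelSketch` and `requiredAcks` are hypothetical and exist only to show the arithmetic; the actual coordination happens inside Weaviate.

```java
final class ConsistencyLevelSketch {

    // Number of replicas that must acknowledge an operation.
    // QUORUM is a strict majority: floor(replicas / 2) + 1.
    static int requiredAcks(String consistencyLevel, int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be at least 1");
        }
        switch (consistencyLevel) {
            case "ONE":
                return 1;
            case "QUORUM":
                return replicas / 2 + 1;
            case "ALL":
                return replicas;
            default:
                throw new IllegalArgumentException("Unknown consistency level: " + consistencyLevel);
        }
    }

    public static void main(String[] args) {
        int replicas = 3;
        System.out.println(requiredAcks("ONE", replicas));    // prints "1"
        System.out.println(requiredAcks("QUORUM", replicas)); // prints "2"
        System.out.println(requiredAcks("ALL", replicas));    // prints "3"
    }
}
```

On a single-node deployment all three levels reduce to one acknowledgment, so the setting only changes behavior once replication is enabled.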
This library requires:
- dev.langchain4j:langchain4j-core:1.11.0 - Core LangChain4j interfaces and types
- io.weaviate:client:5.3.0 - Weaviate Java client library

Install with Tessl CLI
npx tessl i tessl/maven-dev-langchain4j--langchain4j-weaviate@1.11.0