LangChain4j PGVector integration for PostgreSQL-based vector embedding storage and retrieval
Add, remove, and manage embeddings with support for single and batch operations, including text segments and metadata.
Add a single embedding to the store with auto-generated ID.
/**
* Adds an embedding to the store with auto-generated ID
* @param embedding The embedding to be added to the store
* @return The auto-generated ID (UUID) associated with the added embedding
*/
String add(Embedding embedding);
Usage Example:
import dev.langchain4j.data.embedding.Embedding;
Embedding embedding = embeddingModel.embed("sample text").content();
String id = embeddingStore.add(embedding);
System.out.println("Added embedding with ID: " + id);
Add a single embedding to the store with a specific ID.
/**
* Adds an embedding to the store with a specific ID
* If an embedding with this ID already exists, it will be replaced (upsert behavior)
* @param id The unique identifier for the embedding to be added
* @param embedding The embedding to be added to the store
*/
void add(String id, Embedding embedding);
Usage Example:
String customId = "doc-123";
Embedding embedding = embeddingModel.embed("sample text").content();
embeddingStore.add(customId, embedding);
Add an embedding along with the original text content and metadata.
/**
* Adds an embedding and the corresponding content that has been embedded to the store
* @param embedding The embedding to be added to the store
* @param textSegment Original content that was embedded, including text and optional metadata
* @return The auto-generated ID (UUID) associated with the added embedding
*/
String add(Embedding embedding, TextSegment textSegment);
Usage Example:
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.document.Metadata;
String text = "LangChain4j is a Java framework for building LLM applications";
Metadata metadata = new Metadata();
metadata.put("source", "documentation");
metadata.put("page", 1);
TextSegment segment = TextSegment.from(text, metadata);
Embedding embedding = embeddingModel.embed(text).content();
String id = embeddingStore.add(embedding, segment);
Add multiple embeddings in a single batch operation.
/**
* Adds multiple embeddings to the store with auto-generated IDs
* More efficient than adding embeddings one by one
* @param embeddings A list of embeddings to be added to the store
* @return A list of auto-generated IDs (UUIDs) associated with the added embeddings
*/
List<String> addAll(List<Embedding> embeddings);
Usage Example:
import java.util.List;
import java.util.stream.Collectors;
List<String> texts = List.of("text 1", "text 2", "text 3");
List<Embedding> embeddings = texts.stream()
    .map(text -> embeddingModel.embed(text).content())
    .collect(Collectors.toList());
List<String> ids = embeddingStore.addAll(embeddings);
System.out.println("Added " + ids.size() + " embeddings");
Add multiple embeddings with their IDs and optional text segments.
/**
* Adds multiple embeddings with their IDs and optional text segments
* Performs upsert - if an ID already exists, the embedding is replaced
* @param ids List of unique identifiers for the embeddings
* @param embeddings List of embeddings to be added
* @param embedded List of text segments (can be null, or individual elements can be null)
* @throws IllegalArgumentException if ids and embeddings sizes don't match, or if embedded is non-null and size doesn't match
*/
void addAll(List<String> ids, List<Embedding> embeddings, List<TextSegment> embedded);
Usage Example:
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
List<String> texts = List.of("text 1", "text 2", "text 3");
List<String> ids = IntStream.range(0, texts.size())
    .mapToObj(i -> "doc-" + i)
    .collect(Collectors.toList());
List<Embedding> embeddings = texts.stream()
    .map(text -> embeddingModel.embed(text).content())
    .collect(Collectors.toList());
List<TextSegment> segments = texts.stream()
    .map(TextSegment::from)
    .collect(Collectors.toList());
embeddingStore.addAll(ids, embeddings, segments);
Remove a specific embedding by its ID.
/**
* Removes a single embedding by its ID
* This is a convenience method equivalent to removeAll(Collections.singleton(id))
* @param id The ID of the embedding to remove
*/
void remove(String id);
Usage Example:
String idToRemove = "doc-123";
embeddingStore.remove(idToRemove);
Remove all embeddings from the store (truncates the table).
/**
* Removes all embeddings from the store
* This operation truncates the table and cannot be undone
*/
void removeAll();
Usage Example:
// Clear all embeddings
embeddingStore.removeAll();
Remove specific embeddings by their IDs.
/**
* Removes embeddings by their IDs
* @param ids Collection of embedding IDs to remove
* @throws IllegalArgumentException if ids collection is null or empty
*/
void removeAll(Collection<String> ids);
Usage Example:
import java.util.List;
List<String> idsToRemove = List.of("doc-1", "doc-2", "doc-3");
embeddingStore.removeAll(idsToRemove);
Remove embeddings that match specific metadata filter criteria.
/**
* Removes all embeddings that match the specified filter
* The filter is applied to metadata fields
* @param filter Filter to match embeddings for removal
* @throws IllegalArgumentException if filter is null
*/
void removeAll(Filter filter);
Usage Example:
import dev.langchain4j.store.embedding.filter.Filter;
import dev.langchain4j.store.embedding.filter.MetadataFilterBuilder;
// Remove all embeddings from a specific source
Filter filter = MetadataFilterBuilder.metadataKey("source").isEqualTo("outdated_docs");
embeddingStore.removeAll(filter);
// Remove embeddings older than a certain date
// (assumes created_date is stored as an ISO-8601 string, so string comparison follows date order)
Filter dateFilter = MetadataFilterBuilder.metadataKey("created_date")
    .isLessThan("2024-01-01");
embeddingStore.removeAll(dateFilter);
For large datasets, use batch operations instead of single operations:
// Less efficient - many individual operations
for (String text : largeTextList) {
    Embedding embedding = embeddingModel.embed(text).content();
    embeddingStore.add(embedding);
}
// More efficient - single batch operation
List<Embedding> embeddings = largeTextList.stream()
    .map(text -> embeddingModel.embed(text).content())
    .collect(Collectors.toList());
embeddingStore.addAll(embeddings);
Complete workflow for ingesting documents:
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import java.util.List;
import java.util.stream.Collectors;
// Load document
Document document = FileSystemDocumentLoader.loadDocument("/path/to/document.txt");
// Split into chunks
DocumentSplitter splitter = DocumentSplitters.recursive(300, 50);
List<TextSegment> segments = splitter.split(document);
// Generate embeddings
List<Embedding> embeddings = segments.stream()
    .map(segment -> embeddingModel.embed(segment.text()).content())
    .collect(Collectors.toList());
// Generate IDs
List<String> ids = segments.stream()
    .map(segment -> java.util.UUID.randomUUID().toString())
    .collect(Collectors.toList());
// Store all at once
embeddingStore.addAll(ids, embeddings, segments);
The add and addAll methods with explicit IDs perform upsert operations:
// First insert
embeddingStore.add("doc-1", embedding1);
// Update with same ID - replaces the existing embedding
embeddingStore.add("doc-1", embedding2);
// Batch upsert
List<String> ids = List.of("doc-1", "doc-2", "doc-3");
List<Embedding> embeddings = List.of(emb1, emb2, emb3);
List<TextSegment> segments = List.of(seg1, seg2, seg3);
// Will replace doc-1, insert doc-2 and doc-3
embeddingStore.addAll(ids, embeddings, segments);
Handle potential errors during operations:
import java.sql.SQLException;
try {
    embeddingStore.add(embedding, textSegment);
} catch (RuntimeException e) {
    if (e.getCause() instanceof SQLException) {
        // Handle database connection or constraint errors
        logger.error("Database error: " + e.getMessage(), e);
    } else {
        throw e;
    }
}
Metadata is stored according to the configured MetadataStorageConfig:
import dev.langchain4j.data.document.Metadata;
// Create metadata
Metadata metadata = new Metadata();
metadata.put("source", "documentation");
metadata.put("page", 42);
metadata.put("section", "installation");
// Create text segment with metadata
TextSegment segment = TextSegment.from("Installation instructions...", metadata);
// Add with metadata (stored according to configuration)
embeddingStore.add(embedding, segment);
addAll(List<Embedding>) with an empty list is a no-op.
Install with Tessl CLI:
npx tessl i tessl/maven-dev-langchain4j--langchain4j-pgvector@1.11.0
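The batch guidance above can be taken one step further for very large inputs: rather than passing one enormous list to addAll, the input can be split into fixed-size chunks that are embedded and stored one chunk at a time. The following is a minimal sketch using only the Java standard library. BatchPartitioner, partition, and the batch size of 2 are hypothetical names and values chosen for illustration; they are not part of LangChain4j. An empty input produces no chunks, so no store call would be made at all.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchPartitioner {

    /** Splits items into consecutive chunks of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each chunk is independent of the source list
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> texts = List.of("text 1", "text 2", "text 3", "text 4", "text 5");
        for (List<String> chunk : partition(texts, 2)) {
            // In real code (assumption): embed the chunk, then call embeddingStore.addAll(...)
            System.out.println("Would store batch of " + chunk.size() + ": " + chunk);
        }
    }
}
```

In an ingestion loop, each chunk would be embedded and passed to a single addAll call, which keeps memory use and statement size bounded while still avoiding one round trip per embedding. A suitable batch size depends on the embedding dimension and database configuration, so it is best determined by measurement.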