Zero-configuration RAG package that bundles document parsing, embedding, and splitting for easy Retrieval-Augmented Generation in Java applications
Storage interfaces and implementations for embedding stores.
Interface for storing and searching embeddings.
package dev.langchain4j.store.embedding;
public interface EmbeddingStore<Embedded> {
// Add embeddings
String add(Embedding embedding)
void add(String id, Embedding embedding)
String add(Embedding embedding, Embedded embedded)
List<String> addAll(List<Embedding> embeddings)
// Search
EmbeddingSearchResult<Embedded> search(EmbeddingSearchRequest request)
}

Type Parameter:
Embedded - Type of object embedded (typically TextSegment)

Add Methods:
add(embedding) - Add embedding, returns generated ID
add(id, embedding) - Add with specific ID (no return value)
add(embedding, embedded) - Add embedding with associated object, returns ID
addAll(embeddings) - Add multiple embeddings, returns list of IDs

Search:
search(request) - Search for similar embeddings

Example:
EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
// Add embedding with object
TextSegment segment = TextSegment.from("Some text");
Embedding embedding = embeddingModel.embed(segment).content();
String id = store.add(embedding, segment);
// Search
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(5)
.build();
EmbeddingSearchResult<TextSegment> result = store.search(request);

In-memory implementation of EmbeddingStore. Useful for development, testing, and small datasets.
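To make the add and search contract described above concrete, here is a minimal, dependency-free sketch of what an in-memory store does conceptually: keep vectors in a map, hand back generated IDs, and rank entries by similarity at query time. `TinyVectorStore` and all of its members are hypothetical names chosen for illustration. This is not the library's implementation, and the real InMemoryEmbeddingStore may score and order results differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch only: NOT the langchain4j implementation.
class TinyVectorStore {
    private final Map<String, float[]> vectors = new LinkedHashMap<>();

    // Mirrors add(embedding): stores the vector under a generated ID and returns that ID.
    String add(float[] vector) {
        String id = UUID.randomUUID().toString();
        vectors.put(id, vector.clone());
        return id;
    }

    // Mirrors add(id, embedding): stores the vector under a caller-chosen ID.
    void add(String id, float[] vector) {
        vectors.put(id, vector.clone());
    }

    // Returns up to maxResults IDs ordered by cosine similarity to the query, best first.
    List<String> search(float[] query, int maxResults) {
        List<String> ids = new ArrayList<>(vectors.keySet());
        ids.sort((a, b) -> Double.compare(
                cosine(query, vectors.get(b)),
                cosine(query, vectors.get(a))));
        return new ArrayList<>(ids.subList(0, Math.min(maxResults, ids.size())));
    }

    int size() {
        return vectors.size();
    }

    // Cosine similarity of two equal-length, non-zero vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        TinyVectorStore store = new TinyVectorStore();
        store.add("east", new float[] {1f, 0f});
        store.add("north", new float[] {0f, 1f});
        String generatedId = store.add(new float[] {0.9f, 0.1f});
        System.out.println("Generated ID: " + generatedId);
        System.out.println("Top 2 for an east-pointing query: "
                + store.search(new float[] {1f, 0f}, 2));
    }
}
```

A linear scan like this is cheap to build and easy to reason about, which fits the development and small-dataset use cases mentioned above; it also shows why search cost grows with the number of stored vectors.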
package dev.langchain4j.store.embedding.inmemory;
public class InMemoryEmbeddingStore<Embedded> implements EmbeddingStore<Embedded> {
// Constructors
public InMemoryEmbeddingStore()
public InMemoryEmbeddingStore(Collection<Entry<Embedded>> entries)
// Add methods
public String add(Embedding embedding)
public void add(String id, Embedding embedding)
public String add(Embedding embedding, Embedded embedded)
public void add(String id, Embedding embedding, Embedded embedded)
public List<String> addAll(List<Embedding> embeddings)
public void addAll(List<String> ids, List<Embedding> embeddings, List<Embedded> embedded)
// Remove methods
public void removeAll(Collection<String> ids)
public void removeAll(Filter filter)
public void removeAll()
// Search
public EmbeddingSearchResult<Embedded> search(EmbeddingSearchRequest request)
// Persistence
public String serializeToJson()
public void serializeToFile(Path filePath)
public void serializeToFile(String filePath)
public static InMemoryEmbeddingStore<TextSegment> fromJson(String json)
public static InMemoryEmbeddingStore<TextSegment> fromFile(Path filePath)
public static InMemoryEmbeddingStore<TextSegment> fromFile(String filePath)
// Merge
public static <Embedded> InMemoryEmbeddingStore<Embedded> merge(
Collection<InMemoryEmbeddingStore<Embedded>> stores
)
public static <Embedded> InMemoryEmbeddingStore<Embedded> merge(
InMemoryEmbeddingStore<Embedded> first,
InMemoryEmbeddingStore<Embedded> second
)
// Utility
public int size()
public boolean isEmpty()
}

Constructors:
InMemoryEmbeddingStore() - Create empty store
InMemoryEmbeddingStore(entries) - Create from existing entries

Add Methods:
add(embedding) - Add embedding, returns generated ID
add(id, embedding) - Add with specific ID
add(embedding, embedded) - Add with associated object, returns ID
add(id, embedding, embedded) - Add with specific ID and object
addAll(embeddings) - Add multiple, returns IDs
addAll(ids, embeddings, embedded) - Add multiple with IDs and objects

Remove Methods:
removeAll(ids) - Remove by IDs
removeAll(filter) - Remove by metadata filter
removeAll() - Clear all embeddings

Search:
search(request) - Find similar embeddings

Persistence:
serializeToJson() - Export to JSON string
serializeToFile(path) - Save to file
fromJson(json) - Load from JSON string
fromFile(path) - Load from file

Merge:
merge(stores) - Merge multiple stores
merge(first, second) - Merge two stores

Utility:
size() - Number of embeddings
isEmpty() - Check if empty

Example:
// Create and populate
InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
TextSegment segment = TextSegment.from("Example text");
Embedding embedding = model.embed(segment).content();
store.add(embedding, segment);
// Persist to file
store.serializeToFile("embeddings.json");
// Load from file
InMemoryEmbeddingStore<TextSegment> loadedStore =
InMemoryEmbeddingStore.fromFile("embeddings.json");
// Check size
System.out.println("Store contains " + loadedStore.size() + " embeddings");
// Clear
store.removeAll();

Request parameters for embedding search.
package dev.langchain4j.store.embedding;
public class EmbeddingSearchRequest {
// Constructor
public EmbeddingSearchRequest(
Embedding queryEmbedding,
Integer maxResults,
Double minScore,
Filter filter
)
// Builder
public static EmbeddingSearchRequestBuilder builder()
// Getters
public Embedding queryEmbedding()
public int maxResults()
public double minScore()
public Filter filter()
}

Constructor:
EmbeddingSearchRequest(queryEmbedding, maxResults, minScore, filter) - Create a request from all four parameters

Builder:
builder() - Create builder for fluent configuration

Getters:
queryEmbedding() - The query embedding vector
maxResults() - Maximum results to return
minScore() - Minimum similarity score threshold
filter() - Metadata filter

Example:
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;
Embedding queryEmbedding = model.embed("search query").content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(10)
.minScore(0.7)
.filter(metadataKey("category").isEqualTo("technical"))
.build();
EmbeddingSearchResult<TextSegment> result = store.search(request);

Builder for EmbeddingSearchRequest.
package dev.langchain4j.store.embedding;
public interface EmbeddingSearchRequestBuilder {
EmbeddingSearchRequestBuilder queryEmbedding(Embedding queryEmbedding)
EmbeddingSearchRequestBuilder maxResults(Integer maxResults)
EmbeddingSearchRequestBuilder minScore(Double minScore)
EmbeddingSearchRequestBuilder filter(Filter filter)
EmbeddingSearchRequest build()
}

Methods:
queryEmbedding(embedding) - Set query vector (required)
maxResults(max) - Set max results
minScore(min) - Set minimum score threshold
filter(filter) - Set metadata filter
build() - Build the request

Result of embedding search containing matches.
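Before the result type itself, it may help to see how the maxResults and minScore settings could shape the list of matches a search returns. The sketch below is a plain-Java illustration under stated assumptions: `SearchLimitsSketch`, `Scored`, and `select` are hypothetical names, and treating the threshold as inclusive (score >= minScore) is an assumption rather than something this page states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: shows one plausible way the two limits interact.
class SearchLimitsSketch {

    // Hypothetical stand-in for a scored match: an ID plus a score in [0, 1].
    static final class Scored {
        final String id;
        final double score;

        Scored(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // Keeps candidates scoring at least minScore (inclusive threshold is an assumption),
    // sorts them highest score first, then truncates to maxResults.
    static List<Scored> select(List<Scored> candidates, int maxResults, double minScore) {
        List<Scored> kept = new ArrayList<>();
        for (Scored candidate : candidates) {
            if (candidate.score >= minScore) {
                kept.add(candidate);
            }
        }
        kept.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return new ArrayList<>(kept.subList(0, Math.min(maxResults, kept.size())));
    }

    public static void main(String[] args) {
        List<Scored> candidates = Arrays.asList(
                new Scored("a", 0.95),
                new Scored("b", 0.60),
                new Scored("c", 0.82),
                new Scored("d", 0.75));
        for (Scored match : select(candidates, 2, 0.7)) {
            System.out.println(match.id + ": " + match.score);
        }
    }
}
```

Filtering before truncating means a high minScore can return fewer than maxResults matches, or none at all, so calling code should not assume the list is full.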
package dev.langchain4j.store.embedding;
public class EmbeddingSearchResult<Embedded> {
// Constructor
public EmbeddingSearchResult(List<EmbeddingMatch<Embedded>> matches)
// Methods
public List<EmbeddingMatch<Embedded>> matches()
}

Constructor:
EmbeddingSearchResult(matches) - Create with match list

Methods:
matches() - Get list of matches (sorted by score, highest first)

Example:
EmbeddingSearchResult<TextSegment> result = store.search(request);
for (EmbeddingMatch<TextSegment> match : result.matches()) {
System.out.println("Score: " + match.score());
System.out.println("Text: " + match.embedded().text());
}

Single match from embedding search.
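Each match carries a score in the 0.0-1.0 range. One common way to obtain a score in that range is to rescale cosine similarity, which lies in [-1, 1], using (cosine + 1) / 2. The sketch below illustrates that rescaling; that the store derives its scores this way is an assumption, not something stated on this page, and `RelevanceScoreSketch` and its methods are hypothetical names.

```java
// Illustrative sketch only: rescales cosine similarity into the [0, 1] range.
class RelevanceScoreSketch {

    // Cosine similarity of two equal-length, non-zero vectors; result lies in [-1, 1].
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Maps [-1, 1] linearly onto [0, 1]; assumed here, not confirmed by the source.
    static double toRelevanceScore(double cosine) {
        return (cosine + 1.0) / 2.0;
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0};
        double[] same = {1.0, 0.0};
        double[] orthogonal = {0.0, 1.0};
        double[] opposite = {-1.0, 0.0};
        System.out.println("same direction: " + toRelevanceScore(cosineSimilarity(query, same)));
        System.out.println("orthogonal:     " + toRelevanceScore(cosineSimilarity(query, orthogonal)));
        System.out.println("opposite:       " + toRelevanceScore(cosineSimilarity(query, opposite)));
    }
}
```

Under this rescaling, unrelated (orthogonal) vectors land near the middle of the range rather than at zero, which is worth keeping in mind when choosing a minScore threshold.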
package dev.langchain4j.store.embedding;
public class EmbeddingMatch<Embedded> {
// Methods
public double score()
public String embeddingId()
public Embedding embedding()
public Embedded embedded()
}

Methods:
score() - Similarity score (0.0-1.0, higher is more similar)
embeddingId() - ID of the embedding in the store
embedding() - The embedding vector
embedded() - The associated object (e.g., TextSegment)

Example:
EmbeddingSearchResult<TextSegment> result = store.search(request);
for (EmbeddingMatch<TextSegment> match : result.matches()) {
double score = match.score();
String id = match.embeddingId();
TextSegment segment = match.embedded();
if (score > 0.8) {
System.out.println("High confidence match:");
System.out.println(" ID: " + id);
System.out.println(" Score: " + score);
System.out.println(" Text: " + segment.text());
}
}

Entry type for InMemoryEmbeddingStore initialization.
package dev.langchain4j.store.embedding.inmemory;
public class Entry<Embedded> {
// Constructor
public Entry(String id, Embedding embedding, Embedded embedded)
// Methods
public String id()
public Embedding embedding()
public Embedded embedded()
}

Constructor:
Entry(id, embedding, embedded) - Create entry with all fields

Methods:
id() - Get entry ID
embedding() - Get embedding vector
embedded() - Get associated object

Example:
import dev.langchain4j.store.embedding.inmemory.Entry;
// Create entries
List<Entry<TextSegment>> entries = new ArrayList<>();
entries.add(new Entry<>("id1", embedding1, segment1));
entries.add(new Entry<>("id2", embedding2, segment2));
// Initialize store with entries
InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>(entries);

Interface for filtering embeddings by metadata.
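As a rough picture of what a composable metadata filter does, the sketch below models a filter as a predicate over a key/value map and builds AND, OR, and NOT on top of it. `MetadataFilterSketch`, `MiniFilter`, and `isEqualTo` are hypothetical names for illustration; this is not the library's Filter interface, and the real one tests library-specific objects rather than a plain map.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a miniature of composable metadata filtering.
class MetadataFilterSketch {

    // Hypothetical filter over a metadata map; a single abstract method, so lambdas work.
    interface MiniFilter {
        boolean test(Map<String, Object> metadata);

        // Matches only when both this filter and the other match.
        default MiniFilter and(MiniFilter other) {
            return metadata -> test(metadata) && other.test(metadata);
        }

        // Matches when either this filter or the other matches.
        default MiniFilter or(MiniFilter other) {
            return metadata -> test(metadata) || other.test(metadata);
        }

        // Inverts the given filter.
        static MiniFilter not(MiniFilter filter) {
            return metadata -> !filter.test(metadata);
        }

        // Matches when the metadata value stored under key equals the expected value.
        static MiniFilter isEqualTo(String key, Object expected) {
            return metadata -> expected.equals(metadata.get(key));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put("category", "technical");
        metadata.put("language", "en");

        MiniFilter technicalEnglish = MiniFilter.isEqualTo("category", "technical")
                .and(MiniFilter.isEqualTo("language", "en"));

        System.out.println("AND matches: " + technicalEnglish.test(metadata));
        System.out.println("NOT matches: " + MiniFilter.not(technicalEnglish).test(metadata));
    }
}
```

Because every combinator returns another filter, conditions can be chained to any depth and then passed around as a single value, which is the same shape the search request and removeAll(filter) rely on.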
package dev.langchain4j.store.embedding.filter;
public interface Filter {
// Test if object matches filter
boolean test(Object object)
// Combine filters
default Filter and(Filter filter)
static Filter and(Filter left, Filter right)
default Filter or(Filter filter)
static Filter or(Filter left, Filter right)
static Filter not(Filter expression)
}

Test:
test(object) - Check if object matches filter

Combinators:
and(filter) - Combine with AND logic
or(filter) - Combine with OR logic
not(filter) - Negate filter

Example:
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;
// Single condition
Filter categoryFilter = metadataKey("category").isEqualTo("technical");
// Combined conditions
Filter complexFilter = metadataKey("category").isEqualTo("technical")
.and(metadataKey("language").isEqualTo("en"))
.and(metadataKey("version").isGreaterThan(2.0));
// Use in search
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.filter(complexFilter)
.build();

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
// Store is ready to use with EmbeddingStoreIngestor
EmbeddingStoreIngestor.ingest(documents, store);

// Save after ingestion
store.serializeToFile("knowledge-base.json");
// Load later (a separate variable name avoids redeclaring store)
InMemoryEmbeddingStore<TextSegment> loadedStore =
InMemoryEmbeddingStore.fromFile("knowledge-base.json");
// Use immediately
ContentRetriever retriever = EmbeddingStoreContentRetriever.from(loadedStore);

// Ingest different document sets
InMemoryEmbeddingStore<TextSegment> techDocs = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(technicalDocuments, techDocs);
InMemoryEmbeddingStore<TextSegment> userGuides = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(guideDocuments, userGuides);
// Merge into single store
InMemoryEmbeddingStore<TextSegment> allDocs =
InMemoryEmbeddingStore.merge(techDocs, userGuides);

import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
// Embed query
Embedding queryEmbedding = embeddingModel.embed("How to configure?").content();
// Create search request
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(5)
.minScore(0.7)
.build();
// Execute search
EmbeddingSearchResult<TextSegment> result = store.search(request);
// Process results
for (EmbeddingMatch<TextSegment> match : result.matches()) {
System.out.println(match.score() + ": " + match.embedded().text());
}

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;
// Search only in specific category
Filter filter = metadataKey("category").isEqualTo("api-docs");
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.filter(filter)
.maxResults(5)
.build();
EmbeddingSearchResult<TextSegment> result = store.search(request);

// Remove by IDs
List<String> idsToRemove = Arrays.asList("id1", "id2", "id3");
store.removeAll(idsToRemove);
// Remove by filter
Filter oldDocsFilter = metadataKey("version").isLessThan(2.0);
store.removeAll(oldDocsFilter);
// Clear everything
store.removeAll();

InMemoryEmbeddingStore Limitations:

For production, consider:
langchain4j-<database-name>

When InMemoryEmbeddingStore is sufficient:
Install with Tessl CLI
npx tessl i tessl/maven-dev-langchain4j--langchain4j-easy-rag