LangChain4j integration for Chroma embedding store enabling storage, retrieval, and similarity search of vector embeddings with metadata filtering support for both API V1 and V2.
Metadata filtering interface and operations for searching and removing embeddings.
Base interface for metadata filters.
package dev.langchain4j.store.embedding.filter;
public interface Filter

default Filter and(Filter filter);
Combines this filter with another using AND logic.

default Filter or(Filter filter);
Combines this filter with another using OR logic.
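The chaining behaviour of these two default methods can be illustrated with a minimal, self-contained sketch. It does not use LangChain4j: `MiniFilter`, `keyEquals`, and the `Map`-based metadata are hypothetical stand-ins chosen for illustration, and the library's actual `Filter` implementation may differ.

```java
import java.util.Map;

// Hypothetical stand-in for a metadata filter; not the LangChain4j type.
interface MiniFilter {
    boolean test(Map<String, Object> metadata);

    // Both this filter and the other one must match.
    default MiniFilter and(MiniFilter other) {
        return metadata -> this.test(metadata) && other.test(metadata);
    }

    // At least one of the two filters must match.
    default MiniFilter or(MiniFilter other) {
        return metadata -> this.test(metadata) || other.test(metadata);
    }

    // Illustrative leaf filter: matches when the key holds exactly this value.
    static MiniFilter keyEquals(String key, Object value) {
        return metadata -> value.equals(metadata.get(key));
    }
}

class MiniFilterDemo {
    public static void main(String[] args) {
        MiniFilter publishedTech = MiniFilter.keyEquals("status", "published")
                .and(MiniFilter.keyEquals("category", "tech"));
        Map<String, Object> doc = Map.of("status", "published", "category", "tech");
        System.out.println(publishedTech.test(doc)); // true
    }
}
```

Because each call returns a new filter, chains such as `a.and(b).or(c)` are evaluated left to right, which is why the mixed-logic examples use explicit nesting to control grouping.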
Static utility for building metadata filters.
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.*;
Filter filter = metadataKey("author").isEqualTo("John Doe");

// Start with metadata key
metadataKey(String key)
Returns a builder for creating filters on the specified metadata key.

Filter isEqualTo(String key, Object value);
Matches if metadata value equals specified value.
Example:
Filter f = metadataKey("status").isEqualTo("published");
Filter f = metadataKey("count").isEqualTo(42);

Filter isNotEqualTo(String key, Object value);
Matches if metadata value does not equal specified value.
Example:
Filter f = metadataKey("status").isNotEqualTo("draft");

Filter isGreaterThan(String key, Number value);
Matches if metadata value is greater than specified number.
Chroma Limitation: Only works with numeric metadata values.
Example:
Filter f = metadataKey("year").isGreaterThan(2020);
Filter f = metadataKey("rating").isGreaterThan(4.5);

Filter isGreaterThanOrEqualTo(String key, Number value);
Matches if metadata value is greater than or equal to specified number.
Example:
Filter f = metadataKey("year").isGreaterThanOrEqualTo(2020);

Filter isLessThan(String key, Number value);
Matches if metadata value is less than specified number.
Example:
Filter f = metadataKey("age").isLessThan(30);

Filter isLessThanOrEqualTo(String key, Number value);
Matches if metadata value is less than or equal to specified number.
Example:
Filter f = metadataKey("price").isLessThanOrEqualTo(100.0);

Filter isIn(String key, Collection<?> values);
Matches if metadata value is in the specified collection.
Example:
Filter f = metadataKey("category")
    .isIn(Arrays.asList("tech", "science", "math"));

Filter isNotIn(String key, Collection<?> values);
Matches if metadata value is not in the specified collection.
Example:
Filter f = metadataKey("status")
    .isNotIn(Arrays.asList("draft", "archived"));

Filter and(Filter... filters);
Combines filters with AND logic - all must match.
Example:
Filter f = metadataKey("status").isEqualTo("published")
    .and(metadataKey("year").isGreaterThanOrEqualTo(2020));

Filter or(Filter... filters);
Combines filters with OR logic - at least one must match.
Example:
Filter f = metadataKey("category").isEqualTo("tech")
    .or(metadataKey("category").isEqualTo("science"));

Filter not(Filter filter);
Negates a filter.
Chroma Limitation: NOT is automatically converted to equivalent positive operations where possible.
Example:
Filter f = Filter.not(metadataKey("status").isEqualTo("draft"));
// Equivalent to: isNotEqualTo("draft")

// Exact match
Filter exactMatch = metadataKey("author").isEqualTo("John Doe");
// Numeric comparison
Filter recentYears = metadataKey("year").isGreaterThanOrEqualTo(2020);
// Collection membership
Filter categories = metadataKey("category")
    .isIn(Arrays.asList("tech", "science", "engineering"));

// AND combination
Filter published2024 = metadataKey("status").isEqualTo("published")
    .and(metadataKey("year").isEqualTo(2024));
// OR combination
Filter multiCategory = metadataKey("category").isEqualTo("tech")
    .or(metadataKey("category").isEqualTo("science"))
    .or(metadataKey("category").isEqualTo("math"));
// Mixed logic
Filter complex = metadataKey("status").isEqualTo("published")
    .and(
        metadataKey("priority").isGreaterThanOrEqualTo(5)
            // Boolean metadata is not supported, so the flag is compared as a string
            .or(metadataKey("urgent").isEqualTo("true"))
    );

// Between range (inclusive)
Filter yearRange = metadataKey("year").isGreaterThanOrEqualTo(2020)
    .and(metadataKey("year").isLessThanOrEqualTo(2024));
// Rating threshold
Filter highRated = metadataKey("rating").isGreaterThan(4.0);
// Price filter
Filter affordable = metadataKey("price").isLessThanOrEqualTo(99.99);

// Exact match
Filter author = metadataKey("author").isEqualTo("Jane Smith");
// Multiple values
Filter authors = metadataKey("author")
    .isIn(Arrays.asList("John Doe", "Jane Smith", "Bob Johnson"));
// Exclusion
Filter notDraft = metadataKey("status").isNotEqualTo("draft");

// Recent documents (last 90 days)
long ninetyDaysAgo = System.currentTimeMillis() - (90L * 24 * 60 * 60 * 1000);
Filter recent = metadataKey("timestamp").isGreaterThanOrEqualTo(ninetyDaysAgo);
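// Specific date range
// parseDate is a user-supplied helper (not shown), assumed to return epoch milliseconds
long startDate = parseDate("2024-01-01");
long endDate = parseDate("2024-12-31");
// The upper bound is exclusive: entries at or after endDate are not matched
Filter dateRange = metadataKey("created_at").isGreaterThanOrEqualTo(startDate)
    .and(metadataKey("created_at").isLessThan(endDate));

// VALID - numeric comparison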
Filter f1 = metadataKey("year").isGreaterThan(2020);
Filter f2 = metadataKey("rating").isGreaterThanOrEqualTo(4.5);
// INVALID - string comparison will error
// Filter f3 = metadataKey("name").isGreaterThan("M"); // Error!

Use isEqualTo, isNotEqualTo, isIn, or isNotIn for string values.
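When an ordering over strings is still needed, one possible workaround is to compute the qualifying values on the client and pass them to a membership filter. The sketch below assumes the set of candidate values is already known to the application. `StringRangeWorkaround` and `valuesGreaterThan` are hypothetical names, not part of the library.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not part of LangChain4j.
class StringRangeWorkaround {

    // Returns the known values that sort strictly after the bound
    // (lexicographic, case-sensitive String ordering).
    static List<String> valuesGreaterThan(Collection<String> knownValues, String bound) {
        return knownValues.stream()
                .filter(v -> v.compareTo(bound) > 0)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Nora", "Zed", "Mike");
        List<String> afterM = valuesGreaterThan(names, "M");
        System.out.println(afterM); // [Mike, Nora, Zed]
        // The result could then feed a membership filter, for example:
        // Filter f = metadataKey("name").isIn(afterM);
    }
}
```

This only scales to value sets small enough to enumerate. For larger sets, storing a numeric sort key alongside the string is likely the better fit.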
Chroma doesn't natively support NOT. The library converts NOT filters:
// This NOT filter
Filter f1 = Filter.not(metadataKey("status").isEqualTo("draft"));
// Is converted to
Filter f2 = metadataKey("status").isNotEqualTo("draft");

Complex NOT operations may not work as expected.
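To make the idea of such a conversion concrete, here is a minimal sketch of pushing a NOT down to the leaf comparisons using De Morgan's laws. It is an illustration under assumptions, not the library's actual code: `NotPushdownSketch` and its nested types are hypothetical, and the rewritten tree is only equivalent for entries that actually carry the key with a comparable value, which is one reason nested NOT expressions can behave unexpectedly.

```java
// Hypothetical mini filter tree for illustration; these are not LangChain4j types.
class NotPushdownSketch {

    enum Op {
        EQ, NE, GT, GTE, LT, LTE, IN, NIN;

        // The comparison that holds exactly when this one does not
        // (for entries that carry the key with a comparable value).
        Op opposite() {
            switch (this) {
                case EQ:  return NE;
                case NE:  return EQ;
                case GT:  return LTE;
                case GTE: return LT;
                case LT:  return GTE;
                case LTE: return GT;
                case IN:  return NIN;
                default:  return IN; // NIN
            }
        }
    }

    interface Node {
        Node negate(); // equivalent tree with the NOT pushed down to the leaves
    }

    static final class Cmp implements Node {
        final String key; final Op op; final Object value;
        Cmp(String key, Op op, Object value) { this.key = key; this.op = op; this.value = value; }
        public Node negate() { return new Cmp(key, op.opposite(), value); }
        @Override public String toString() { return key + " " + op + " " + value; }
    }

    static final class And implements Node {
        final Node left; final Node right;
        And(Node left, Node right) { this.left = left; this.right = right; }
        // De Morgan: NOT (a AND b) == (NOT a) OR (NOT b)
        public Node negate() { return new Or(left.negate(), right.negate()); }
        @Override public String toString() { return "(" + left + " AND " + right + ")"; }
    }

    static final class Or implements Node {
        final Node left; final Node right;
        Or(Node left, Node right) { this.left = left; this.right = right; }
        // De Morgan: NOT (a OR b) == (NOT a) AND (NOT b)
        public Node negate() { return new And(left.negate(), right.negate()); }
        @Override public String toString() { return "(" + left + " OR " + right + ")"; }
    }

    public static void main(String[] args) {
        Node original = new And(new Cmp("status", Op.EQ, "draft"), new Cmp("year", Op.LT, 2020));
        System.out.println(original.negate()); // (status NE draft OR year GTE 2020)
    }
}
```

When in doubt, writing the positive form directly (for example `isNotEqualTo` or `isNotIn`) avoids relying on any automatic rewriting.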
// NOT SUPPORTED - Boolean type
// Metadata meta = new Metadata().put("active", true); // Will fail
// WORKAROUND - Use string or int
Metadata meta = new Metadata().put("active", "true");
Filter f = metadataKey("active").isEqualTo("true");
// Or use 0/1
Metadata meta2 = new Metadata().put("active", 1);
Filter f2 = metadataKey("active").isEqualTo(1);

Filter filter = metadataKey("category").isEqualTo("tech");
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(10)
    .filter(filter)
    .build();
EmbeddingSearchResult<TextSegment> result = store.search(request);

Filter filter = metadataKey("status").isEqualTo("archived")
    .or(metadataKey("year").isLessThan(2020));
store.removeAll(filter);

More selective filters perform better:
// High selectivity - fast
Filter specific = metadataKey("id").isEqualTo("exact-id");
// Low selectivity - slower
Filter broad = metadataKey("year").isGreaterThan(2000);

Deeply nested filters may impact performance:
// Simple - fast
Filter simple = metadataKey("status").isEqualTo("published");
// Complex - slower
Filter complex = metadataKey("status").isEqualTo("published")
    .and(
        metadataKey("category").isIn(manyCategories)
            .or(metadataKey("priority").isGreaterThan(5))
    )
    .and(metadataKey("year").isGreaterThanOrEqualTo(2020));

Apply filters in search requests rather than filtering results afterward:
// GOOD - filter during search
Filter filter = metadataKey("category").isEqualTo("tech");
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .filter(filter)
    .build();
// BAD - filter after retrieval
EmbeddingSearchResult<TextSegment> all = store.search(requestWithoutFilter);
List<EmbeddingMatch<TextSegment>> filtered = all.matches().stream()
    .filter(match -> "tech".equals(match.embedded().metadata().getString("category")))
    .collect(Collectors.toList());

Preview filter results before using in remove operations:
Filter filter = metadataKey("status").isEqualTo("outdated");
// Test with search first
EmbeddingSearchRequest testRequest = EmbeddingSearchRequest.builder()
    .queryEmbedding(anyEmbedding)
    .maxResults(100)
    .filter(filter)
    .build();
EmbeddingSearchResult<TextSegment> preview = store.search(testRequest);
// The count is capped at maxResults, so a value of 100 may mean more entries match
System.out.println("Filter matches: " + preview.matches().size() + " documents");
// Then apply to remove (confirm() is an application-supplied check, not shown)
if (confirm()) {
    store.removeAll(filter);
}
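The preview-then-remove pattern can be wrapped in a small guard. The sketch below is plain Java with stand-ins for the store calls. `GuardedRemoval` and `removeIfPreviewIsComplete` are hypothetical names, and the rule of refusing to delete when the preview hits its result limit is an assumption about what a cautious caller might want, since a capped search cannot show how many entries match in total.

```java
import java.util.function.IntSupplier;

// Hypothetical helper; neither the class nor the method is part of LangChain4j.
class GuardedRemoval {

    // Runs the removal only if the preview found at least one match and
    // strictly fewer matches than previewLimit. Reaching the limit means the
    // preview may be truncated, so the true match count is unknown.
    static boolean removeIfPreviewIsComplete(IntSupplier previewCount, int previewLimit, Runnable removal) {
        int count = previewCount.getAsInt();
        if (count == 0 || count >= previewLimit) {
            return false; // nothing to remove, or preview possibly truncated
        }
        removal.run();
        return true;
    }

    public static void main(String[] args) {
        // Stand-ins for store.search(testRequest).matches().size() and store.removeAll(filter)
        boolean removed = removeIfPreviewIsComplete(() -> 7, 100, () -> System.out.println("removing"));
        System.out.println(removed); // true
        boolean skipped = removeIfPreviewIsComplete(() -> 100, 100, () -> System.out.println("removing"));
        System.out.println(skipped); // false
    }
}
```

In real use the two lambdas would wrap the search and `removeAll` calls shown above, with `previewLimit` set to the same value passed to `maxResults`.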