LangChain4j integration for Chroma embedding store enabling storage, retrieval, and similarity search of vector embeddings with metadata filtering support for both API V1 and V2.
Metadata filtering interface and operations for searching and removing embeddings.
Base interface for metadata filters.
package dev.langchain4j.store.embedding.filter;
public interface Filter
default Filter and(Filter filter);
Combines this filter with another using AND logic.
default Filter or(Filter filter);
Combines this filter with another using OR logic.
Static utility for building metadata filters.
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.*;
Filter filter = metadataKey("author").isEqualTo("John Doe");
// Start with metadata key
metadataKey(String key)
Returns a builder for creating filters on the specified metadata key.
Filter isEqualTo(String key, Object value);
Matches if metadata value equals specified value.
Example:
Filter f = metadataKey("status").isEqualTo("published");
Filter f = metadataKey("count").isEqualTo(42);
Filter isNotEqualTo(String key, Object value);
Matches if metadata value does not equal specified value.
Example:
Filter f = metadataKey("status").isNotEqualTo("draft");
Filter isGreaterThan(String key, Number value);
Matches if metadata value is greater than specified number.
Chroma Limitation: Only works with numeric metadata values.
Example:
Filter f = metadataKey("year").isGreaterThan(2020);
Filter f = metadataKey("rating").isGreaterThan(4.5);
Filter isGreaterThanOrEqualTo(String key, Number value);
Matches if metadata value is greater than or equal to specified number.
Example:
Filter f = metadataKey("year").isGreaterThanOrEqualTo(2020);
Filter isLessThan(String key, Number value);
Matches if metadata value is less than specified number.
Example:
Filter f = metadataKey("age").isLessThan(30);
Filter isLessThanOrEqualTo(String key, Number value);
Matches if metadata value is less than or equal to specified number.
Example:
Filter f = metadataKey("price").isLessThanOrEqualTo(100.0);
Filter isIn(String key, Collection<?> values);
Matches if metadata value is in the specified collection.
Example:
Filter f = metadataKey("category")
.isIn(Arrays.asList("tech", "science", "math"));
Filter isNotIn(String key, Collection<?> values);
Matches if metadata value is not in the specified collection.
Example:
Filter f = metadataKey("status")
.isNotIn(Arrays.asList("draft", "archived"));
Filter and(Filter... filters);
Combines filters with AND logic - all must match.
Example:
Filter f = metadataKey("status").isEqualTo("published")
.and(metadataKey("year").isGreaterThanOrEqualTo(2020));
Filter or(Filter... filters);
Combines filters with OR logic - at least one must match.
Example:
Filter f = metadataKey("category").isEqualTo("tech")
.or(metadataKey("category").isEqualTo("science"));
Filter not(Filter filter);
Negates a filter.
Chroma Limitation: NOT is automatically converted to equivalent positive operations where possible.
Example:
Filter f = Filter.not(metadataKey("status").isEqualTo("draft"));
// Equivalent to: isNotEqualTo("draft")
// Exact match
Filter exactMatch = metadataKey("author").isEqualTo("John Doe");
// Numeric comparison
Filter recentYears = metadataKey("year").isGreaterThanOrEqualTo(2020);
// Collection membership
Filter categories = metadataKey("category")
.isIn(Arrays.asList("tech", "science", "engineering"));
// AND combination
Filter published2024 = metadataKey("status").isEqualTo("published")
.and(metadataKey("year").isEqualTo(2024));
// OR combination
Filter multiCategory = metadataKey("category").isEqualTo("tech")
.or(metadataKey("category").isEqualTo("science"))
.or(metadataKey("category").isEqualTo("math"));
// Mixed logic
Filter complex = metadataKey("status").isEqualTo("published")
.and(
metadataKey("priority").isGreaterThanOrEqualTo(5)
.or(metadataKey("urgent").isEqualTo("true")) // flag stored as a string; boolean metadata is not supported
);
// Between range (inclusive)
Filter yearRange = metadataKey("year").isGreaterThanOrEqualTo(2020)
.and(metadataKey("year").isLessThanOrEqualTo(2024));
// Rating threshold
Filter highRated = metadataKey("rating").isGreaterThan(4.0);
// Price filter
Filter affordable = metadataKey("price").isLessThanOrEqualTo(99.99);
// Exact match
Filter author = metadataKey("author").isEqualTo("Jane Smith");
// Multiple values
Filter authors = metadataKey("author")
.isIn(Arrays.asList("John Doe", "Jane Smith", "Bob Johnson"));
// Exclusion
Filter notDraft = metadataKey("status").isNotEqualTo("draft");
// Recent documents (last 90 days)
long ninetyDaysAgo = System.currentTimeMillis() - (90L * 24 * 60 * 60 * 1000);
Filter recent = metadataKey("timestamp").isGreaterThanOrEqualTo(ninetyDaysAgo);
// Specific date range
long startDate = parseDate("2024-01-01");
long endDate = parseDate("2024-12-31");
Filter dateRange = metadataKey("created_at").isGreaterThanOrEqualTo(startDate)
.and(metadataKey("created_at").isLessThan(endDate));
// VALID - numeric comparison
Filter f1 = metadataKey("year").isGreaterThan(2020);
Filter f2 = metadataKey("rating").isGreaterThanOrEqualTo(4.5);
// INVALID - string comparison will error
// Filter f3 = metadataKey("name").isGreaterThan("M"); // Error!
Use isEqualTo, isNotEqualTo, isIn, or isNotIn for string values.
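The date-range examples earlier in this section call a `parseDate` helper that this page never defines, and they compute the 90-day cutoff by hand. The following is a minimal sketch of such helpers using only `java.time`, assuming timestamps are stored as epoch milliseconds in UTC. The class name `FilterDates` and both method names are hypothetical and are not part of LangChain4j.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper class; not part of LangChain4j or the Chroma integration.
final class FilterDates {
    private FilterDates() {}

    // Parses an ISO date such as "2024-01-01" to epoch milliseconds at midnight UTC.
    static long parseDate(String isoDate) {
        return LocalDate.parse(isoDate)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    // Returns the epoch-millisecond timestamp that lies `days` days before `now`.
    static long daysAgo(Instant now, long days) {
        return now.minus(Duration.ofDays(days)).toEpochMilli();
    }
}
```

With these helpers, `FilterDates.daysAgo(Instant.now(), 90)` yields a 90-day cutoff. Because the range example uses `isLessThan(endDate)`, an `endDate` of `parseDate("2024-12-31")` excludes 31 December itself; passing `parseDate("2025-01-01")` as the exclusive upper bound would cover the whole of 2024.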
Chroma doesn't natively support NOT. The library converts NOT filters:
// This NOT filter
Filter f1 = Filter.not(metadataKey("status").isEqualTo("draft"));
// Is converted to
Filter f2 = metadataKey("status").isNotEqualTo("draft");
Complex NOT operations may not work as expected.
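To make the idea of converting NOT into positive operations concrete, here is a rough sketch that pushes a negation down through a Chroma-style `where` clause: comparison operators are swapped for their complements, and `$and`/`$or` are swapped via De Morgan's laws. This is an illustration only, not the library's actual conversion code. The class `WhereNegation` and its `negate` method are hypothetical, and the sketch assumes each map holds exactly one key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the conversion code used by LangChain4j.
final class WhereNegation {
    private WhereNegation() {}

    // Chroma comparison operators paired with their logical complements.
    private static final Map<String, String> COMPLEMENT = Map.of(
            "$eq", "$ne", "$ne", "$eq",
            "$gt", "$lte", "$lte", "$gt",
            "$lt", "$gte", "$gte", "$lt",
            "$in", "$nin", "$nin", "$in");

    // Returns a where clause that matches when the given single-key clause does not.
    @SuppressWarnings("unchecked")
    static Map<String, Object> negate(Map<String, Object> where) {
        Map.Entry<String, Object> entry = where.entrySet().iterator().next();
        String key = entry.getKey();

        // De Morgan: NOT (a AND b) == (NOT a) OR (NOT b), and vice versa.
        if (key.equals("$and") || key.equals("$or")) {
            List<Object> negated = new ArrayList<>();
            for (Map<String, Object> child : (List<Map<String, Object>>) entry.getValue()) {
                negated.add(negate(child));
            }
            return Map.of(key.equals("$and") ? "$or" : "$and", negated);
        }

        // Leaf clause of the form {"field": {"$op": value}}: swap the operator.
        Map<String, Object> comparison = (Map<String, Object>) entry.getValue();
        Map.Entry<String, Object> op = comparison.entrySet().iterator().next();
        String complement = COMPLEMENT.get(op.getKey());
        if (complement == null) {
            throw new IllegalArgumentException("Unsupported operator: " + op.getKey());
        }
        return Map.of(key, Map.of(complement, op.getValue()));
    }
}
```

In this sketch, negating `{"status": {"$eq": "draft"}}` gives `{"status": {"$ne": "draft"}}`, which mirrors the `isNotEqualTo("draft")` rewrite shown above.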
// NOT SUPPORTED - Boolean type
// Metadata meta = new Metadata().put("active", true); // Will fail
// WORKAROUND - Use string or int
Metadata meta = new Metadata().put("active", "true");
Filter f = metadataKey("active").isEqualTo("true");
// Or use 0/1
Metadata meta2 = new Metadata().put("active", 1);
Filter f2 = metadataKey("active").isEqualTo(1);
Filter filter = metadataKey("category").isEqualTo("tech");
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(10)
.filter(filter)
.build();
EmbeddingSearchResult<TextSegment> result = store.search(request);
Filter filter = metadataKey("status").isEqualTo("archived")
.or(metadataKey("year").isLessThan(2020));
store.removeAll(filter);
More selective filters perform better:
// High selectivity - fast
Filter specific = metadataKey("id").isEqualTo("exact-id");
// Low selectivity - slower
Filter broad = metadataKey("year").isGreaterThan(2000);
Deeply nested filters may impact performance:
// Simple - fast
Filter simple = metadataKey("status").isEqualTo("published");
// Complex - slower
Filter complex = metadataKey("status").isEqualTo("published")
.and(
metadataKey("category").isIn(manyCategories)
.or(metadataKey("priority").isGreaterThan(5))
)
.and(metadataKey("year").isGreaterThanOrEqualTo(2020));
Apply filters in search requests rather than filtering results afterward:
// GOOD - filter during search
Filter filter = metadataKey("category").isEqualTo("tech");
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.filter(filter)
.build();
// BAD - filter after retrieval
EmbeddingSearchResult<TextSegment> all = store.search(requestWithoutFilter);
List<EmbeddingMatch<TextSegment>> filtered = all.matches().stream()
.filter(match -> "tech".equals(match.embedded().metadata().getString("category")))
.collect(Collectors.toList());
Preview filter results before using in remove operations:
Filter filter = metadataKey("status").isEqualTo("outdated");
// Test with search first
EmbeddingSearchRequest testRequest = EmbeddingSearchRequest.builder()
.queryEmbedding(anyEmbedding)
.maxResults(100)
.filter(filter)
.build();
EmbeddingSearchResult<TextSegment> preview = store.search(testRequest);
System.out.println("Filter matches: " + preview.matches().size() + " documents");
// Then apply to remove
if (confirm()) {
store.removeAll(filter);
}