
tessl/maven-dev-langchain4j--langchain4j-chroma

LangChain4j integration for Chroma embedding store enabling storage, retrieval, and similarity search of vector embeddings with metadata filtering support for both API V1 and V2.


Filter API

Metadata filtering interface and operations for searching and removing embeddings.

Filter Interface

Base interface for metadata filters.

package dev.langchain4j.store.embedding.filter;

public interface Filter

Logical Operators

default Filter and(Filter filter);

Combines this filter with another using AND logic.


default Filter or(Filter filter);

Combines this filter with another using OR logic.

MetadataFilterBuilder

Static utility for building metadata filters.

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.*;

Filter filter = metadataKey("author").isEqualTo("John Doe");

Creating Filters

// Start with metadata key
metadataKey(String key)

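Returns a builder bound to the specified metadata key. The operations below are called on that builder and take only the value to compare against; the key parameter shown in their signatures is the one supplied to metadataKey.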

Filter Operations

Equality

Filter isEqualTo(String key, Object value);

Matches if metadata value equals specified value.

Example:

Filter byStatus = metadataKey("status").isEqualTo("published");
Filter byCount = metadataKey("count").isEqualTo(42);

Filter isNotEqualTo(String key, Object value);

Matches if metadata value does not equal specified value.

Example:

Filter f = metadataKey("status").isNotEqualTo("draft");

Comparison (Numeric Only)

Filter isGreaterThan(String key, Number value);

Matches if metadata value is greater than specified number.

Chroma Limitation: Only works with numeric metadata values.

Example:

Filter byYear = metadataKey("year").isGreaterThan(2020);
Filter byRating = metadataKey("rating").isGreaterThan(4.5);

Filter isGreaterThanOrEqualTo(String key, Number value);

Matches if metadata value is greater than or equal to specified number.

Example:

Filter f = metadataKey("year").isGreaterThanOrEqualTo(2020);

Filter isLessThan(String key, Number value);

Matches if metadata value is less than specified number.

Example:

Filter f = metadataKey("age").isLessThan(30);

Filter isLessThanOrEqualTo(String key, Number value);

Matches if metadata value is less than or equal to specified number.

Example:

Filter f = metadataKey("price").isLessThanOrEqualTo(100.0);

Collection

Filter isIn(String key, Collection<?> values);

Matches if metadata value is in the specified collection.

Example:

Filter f = metadataKey("category")
    .isIn(Arrays.asList("tech", "science", "math"));

Filter isNotIn(String key, Collection<?> values);

Matches if metadata value is not in the specified collection.

Example:

Filter f = metadataKey("status")
    .isNotIn(Arrays.asList("draft", "archived"));

Logical Operations

default Filter and(Filter filter);

Combines filters with AND logic - all must match.

Example:

Filter f = metadataKey("status").isEqualTo("published")
    .and(metadataKey("year").isGreaterThanOrEqualTo(2020));

default Filter or(Filter filter);

Combines filters with OR logic - at least one must match.

Example:

Filter f = metadataKey("category").isEqualTo("tech")
    .or(metadataKey("category").isEqualTo("science"));

static Filter not(Filter filter);

Negates a filter.

Chroma Limitation: Chroma has no native NOT operator, so a NOT filter is automatically rewritten into the inverse operation where possible (for example, NOT isEqualTo becomes isNotEqualTo).

Example:

Filter f = Filter.not(metadataKey("status").isEqualTo("draft"));
// Equivalent to: isNotEqualTo("draft")

Usage Examples

Simple Filters

// Exact match
Filter exactMatch = metadataKey("author").isEqualTo("John Doe");

// Numeric comparison
Filter recentYears = metadataKey("year").isGreaterThanOrEqualTo(2020);

// Collection membership
Filter categories = metadataKey("category")
    .isIn(Arrays.asList("tech", "science", "engineering"));

Combined Filters

// AND combination
Filter published2024 = metadataKey("status").isEqualTo("published")
    .and(metadataKey("year").isEqualTo(2024));

// OR combination
Filter multiCategory = metadataKey("category").isEqualTo("tech")
    .or(metadataKey("category").isEqualTo("science"))
    .or(metadataKey("category").isEqualTo("math"));

// Mixed logic (the urgent flag is stored as 0/1 because booleans are not supported as metadata values)
Filter complex = metadataKey("status").isEqualTo("published")
    .and(
        metadataKey("priority").isGreaterThanOrEqualTo(5)
        .or(metadataKey("urgent").isEqualTo(1))
    );

Numeric Range Filters

// Between range (inclusive)
Filter yearRange = metadataKey("year").isGreaterThanOrEqualTo(2020)
    .and(metadataKey("year").isLessThanOrEqualTo(2024));

// Rating threshold
Filter highRated = metadataKey("rating").isGreaterThan(4.0);

// Price filter
Filter affordable = metadataKey("price").isLessThanOrEqualTo(99.99);

String Filters

// Exact match
Filter author = metadataKey("author").isEqualTo("Jane Smith");

// Multiple values
Filter authors = metadataKey("author")
    .isIn(Arrays.asList("John Doe", "Jane Smith", "Bob Johnson"));

// Exclusion
Filter notDraft = metadataKey("status").isNotEqualTo("draft");

Time-Based Filters

// Recent documents (last 90 days)
long ninetyDaysAgo = System.currentTimeMillis() - (90L * 24 * 60 * 60 * 1000);
Filter recent = metadataKey("timestamp").isGreaterThanOrEqualTo(ninetyDaysAgo);

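// Specific date range (all of 2024)
// parseDate is an application-supplied helper, assumed to return epoch milliseconds at the start of the given day
long startDate = parseDate("2024-01-01");
long endDate = parseDate("2025-01-01");  // exclusive upper bound, so December 31 is included
Filter dateRange = metadataKey("created_at").isGreaterThanOrEqualTo(startDate)
    .and(metadataKey("created_at").isLessThan(endDate));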
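
The parseDate helper used in the date range snippet is not defined in this document and is not a LangChain4j method. A minimal sketch of such a helper, assuming timestamps are stored in metadata as epoch milliseconds in UTC (the class name FilterDates and both method names are hypothetical):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class; not part of LangChain4j or Chroma.
final class FilterDates {

    private FilterDates() {}

    // Converts an ISO-8601 date such as "2024-01-01" to epoch milliseconds
    // at the start of that day in UTC.
    static long parseDate(String isoDate) {
        return LocalDate.parse(isoDate)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    // Returns the epoch-millisecond timestamp that lies `days` days before `now`.
    // Passing the clock in explicitly keeps the calculation easy to test.
    static long daysBefore(Instant now, long days) {
        return now.minus(days, ChronoUnit.DAYS).toEpochMilli();
    }
}
```

Because this helper returns the start of the day, an exclusive upper bound (isLessThan) should be the day after the last day to include. FilterDates.daysBefore(Instant.now(), 90) gives the same value as the manual 90-day arithmetic.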

Chroma-Specific Limitations

Comparison Operators on Numeric Only

// VALID - numeric comparison
Filter f1 = metadataKey("year").isGreaterThan(2020);
Filter f2 = metadataKey("rating").isGreaterThanOrEqualTo(4.5);

// INVALID - string comparison will error
// Filter f3 = metadataKey("name").isGreaterThan("M");  // Error!

Use isEqualTo, isNotEqualTo, isIn, or isNotIn for string values.

NOT Operation Conversion

Chroma doesn't natively support NOT. The library converts NOT filters:

// This NOT filter
Filter f1 = Filter.not(metadataKey("status").isEqualTo("draft"));

// Is converted to
Filter f2 = metadataKey("status").isNotEqualTo("draft");

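Complex NOT operations, such as negating a combined AND/OR filter, may not work as expected. Where possible, write the inverse condition explicitly with isNotEqualTo or isNotIn.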
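
To make the idea of rewriting NOT concrete, here is a small self-contained sketch of one way a negation can be pushed inward until no NOT node remains. It uses a hypothetical miniature model (MiniFilter, MiniOp, Comparison, Logical), not the LangChain4j classes, and it is not a description of how the library itself performs the conversion. It also assumes every record carries the key with a comparable value, which is where real-world results can differ.

```java
// Hypothetical miniature filter model used only to illustrate NOT rewriting.
// These are NOT the LangChain4j filter classes.
interface MiniFilter {
    MiniFilter negate();   // equivalent filter with the negation pushed inward
    String render();       // human-readable form for inspection
}

enum MiniOp {
    EQ, NE, GT, GTE, LT, LTE, IN, NIN;

    // Each comparison has exactly one inverse, so negating it needs no NOT node.
    MiniOp inverse() {
        switch (this) {
            case EQ:  return NE;
            case NE:  return EQ;
            case GT:  return LTE;
            case GTE: return LT;
            case LT:  return GTE;
            case LTE: return GT;
            case IN:  return NIN;
            default:  return IN; // NIN
        }
    }
}

final class Comparison implements MiniFilter {
    private final String key;
    private final MiniOp op;
    private final Object value;

    Comparison(String key, MiniOp op, Object value) {
        this.key = key;
        this.op = op;
        this.value = value;
    }

    @Override
    public MiniFilter negate() {
        return new Comparison(key, op.inverse(), value);
    }

    @Override
    public String render() {
        return key + " " + op + " " + value;
    }
}

final class Logical implements MiniFilter {
    private final boolean isAnd;
    private final MiniFilter left;
    private final MiniFilter right;

    Logical(boolean isAnd, MiniFilter left, MiniFilter right) {
        this.isAnd = isAnd;
        this.left = left;
        this.right = right;
    }

    // De Morgan: NOT (a AND b) == (NOT a) OR (NOT b), and the reverse for OR.
    @Override
    public MiniFilter negate() {
        return new Logical(!isAnd, left.negate(), right.negate());
    }

    @Override
    public String render() {
        return "(" + left.render() + (isAnd ? " AND " : " OR ") + right.render() + ")";
    }
}
```

In this model, negating "status EQ draft AND year LT 2020" yields "status NE draft OR year GTE 2020". Writing that inverted form by hand with the real builder avoids depending on any automatic conversion.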

Boolean Values Not Supported

// NOT SUPPORTED - Boolean type
// Metadata meta = new Metadata().put("active", true);  // Will fail

// WORKAROUND - Use string or int
Metadata meta = new Metadata().put("active", "true");
Filter f = metadataKey("active").isEqualTo("true");

// Or use 0/1
Metadata meta2 = new Metadata().put("active", 1);
Filter f2 = metadataKey("active").isEqualTo(1);
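
Application code often holds flags as booleans, so the 0/1 workaround usually means converting values before they are stored. A minimal sketch of that conversion over a plain map, assuming the metadata is assembled as a Map before being handed to the store (the class and method names are hypothetical and not part of LangChain4j):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of LangChain4j.
final class BooleanMetadata {

    private BooleanMetadata() {}

    // Returns a copy of `raw` in which every Boolean value is replaced by 1 (true)
    // or 0 (false). Other values and the key order are left unchanged.
    static Map<String, Object> booleansToInts(Map<String, Object> raw) {
        Map<String, Object> converted = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : raw.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Boolean) {
                converted.put(entry.getKey(), ((Boolean) value) ? 1 : 0);
            } else {
                converted.put(entry.getKey(), value);
            }
        }
        return converted;
    }
}
```

Filters on such a key then compare against the integer, as in metadataKey("active").isEqualTo(1).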

Filter Application

In Search Operations

Filter filter = metadataKey("category").isEqualTo("tech");

EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(10)
    .filter(filter)
    .build();

EmbeddingSearchResult<TextSegment> result = store.search(request);

In Remove Operations

Filter filter = metadataKey("status").isEqualTo("archived")
    .or(metadataKey("year").isLessThan(2020));

store.removeAll(filter);

Performance Considerations

Filter Selectivity

More selective filters perform better:

// High selectivity - fast
Filter specific = metadataKey("id").isEqualTo("exact-id");

// Low selectivity - slower
Filter broad = metadataKey("year").isGreaterThan(2000);

Complex Filters

Deeply nested filters may impact performance:

// Simple - fast
Filter simple = metadataKey("status").isEqualTo("published");

// Complex - slower
Filter complex = metadataKey("status").isEqualTo("published")
    .and(
        metadataKey("category").isIn(manyCategories)
        .or(metadataKey("priority").isGreaterThan(5))
    )
    .and(metadataKey("year").isGreaterThanOrEqualTo(2020));

Early Filtering

Apply filters in search requests rather than filtering results afterward:

// GOOD - filter during search
Filter filter = metadataKey("category").isEqualTo("tech");
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .filter(filter)
    .build();

// BAD - filter after retrieval
EmbeddingSearchResult<TextSegment> all = store.search(requestWithoutFilter);
List<EmbeddingMatch<TextSegment>> filtered = all.matches().stream()
    .filter(match -> "tech".equals(match.embedded().metadata().getString("category")))
    .collect(Collectors.toList());

Testing Filters

Preview filter results before using in remove operations:

Filter filter = metadataKey("status").isEqualTo("outdated");

// Test with search first
EmbeddingSearchRequest testRequest = EmbeddingSearchRequest.builder()
    .queryEmbedding(anyEmbedding)
    .maxResults(100)
    .filter(filter)
    .build();

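EmbeddingSearchResult<TextSegment> preview = store.search(testRequest);
// The count is capped by maxResults, so a result of 100 means "at least 100 matches"
System.out.println("Filter matches (capped at 100): " + preview.matches().size() + " documents");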

// Then apply to remove
if (confirm()) {
    store.removeAll(filter);
}
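
The confirm() call in the snippet above is left undefined. One possible shape for it, sketched here as a console prompt that defaults to "no" so that data is kept unless the user explicitly agrees (the class name, method signature, and prompt format are assumptions, not part of LangChain4j):

```java
import java.io.InputStream;
import java.io.PrintStream;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical helper class; not provided by LangChain4j.
final class RemovalPrompt {

    private RemovalPrompt() {}

    // Prints a yes/no question and returns true only for an explicit "y" or "yes".
    // End of input or any other answer counts as "no".
    static boolean confirm(String question, InputStream in, PrintStream out) {
        out.print(question + " [y/N]: ");
        out.flush();
        Scanner scanner = new Scanner(in, "UTF-8");
        if (!scanner.hasNextLine()) {
            return false;
        }
        String answer = scanner.nextLine().trim().toLowerCase(Locale.ROOT);
        return answer.equals("y") || answer.equals("yes");
    }
}
```

With this helper the guard becomes RemovalPrompt.confirm("Remove matching embeddings?", System.in, System.out). Taking the streams as parameters keeps the prompt testable without a real console.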

Related APIs

  • Core Types - Metadata class
  • Search Types - Using filters in search
  • ChromaEmbeddingStore - Store operations with filters
