
tessl/maven-dev-langchain4j--langchain4j-chroma

LangChain4j integration for Chroma embedding store enabling storage, retrieval, and similarity search of vector embeddings with metadata filtering support for both API V1 and V2.


LangChain4j Chroma Embedding Store

Java integration for the Chroma vector database that provides storage, retrieval, and similarity search of embeddings. It implements LangChain4j's EmbeddingStore interface, with metadata filtering support for both Chroma API V1 (0.5.16+) and V2 (0.7.0+).

Quick Start

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-chroma</artifactId>
    <version>1.11.0</version>
</dependency>

Minimal setup:

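import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.chroma.ChromaEmbeddingStore;

ChromaEmbeddingStore store = ChromaEmbeddingStore.builder()
    .baseUrl("http://localhost:8000")
    .collectionName("my-documents")
    .build();

// Add an embedding; the store returns its generated ID
Embedding embedding = Embedding.from(new float[]{0.1f, 0.2f, 0.3f});
String id = store.add(embedding);

// Search; the query must have the same dimension as the stored embeddings
Embedding queryEmbedding = Embedding.from(new float[]{0.1f, 0.2f, 0.25f});
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(10)
    .build();
EmbeddingSearchResult<TextSegment> results = store.search(request);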

Core Imports

// Main classes
import dev.langchain4j.store.embedding.chroma.ChromaEmbeddingStore;
import dev.langchain4j.store.embedding.chroma.ChromaApiVersion;

// Core types (langchain4j-core)
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.filter.Filter;

Essential Concepts

Distance Metric: Always uses cosine distance (hnsw:space = cosine)

Score Calculation: score = 1 - (distance / 2) where distance ∈ [0, 2]

  • Score range: [0, 1] where 1 is perfect match
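To make the score formula concrete, here is a small stdlib-only sketch. The helper `cosineDistance` is hypothetical: it mimics the cosine distance (1 minus cosine similarity, range [0, 2]) that `hnsw:space = cosine` measures, and `score` applies the formula above. In practice Chroma computes the distance and you only see the resulting score on each match.

```java
/** Sketch of the score formula: score = 1 - (cosineDistance / 2). */
final class ChromaScoreSketch {

    // Hypothetical helper: cosine distance = 1 - cos(a, b), which lies in [0, 2].
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Maps a distance in [0, 2] to a relevance score in [0, 1].
    static double score(double distance) {
        return 1.0 - (distance / 2.0);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        System.out.println(score(cosineDistance(query, new float[]{1f, 0f})));  // same direction: prints 1.0
        System.out.println(score(cosineDistance(query, new float[]{0f, 1f})));  // orthogonal: prints 0.5
        System.out.println(score(cosineDistance(query, new float[]{-1f, 0f}))); // opposite: prints 0.0
    }
}
```

Because the mapping is linear, a `minScore` of 0.7 corresponds to a maximum cosine distance of 0.6.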

Auto-creation: Collections are automatically created if they don't exist

API Versions:

  • V1 (default): Flat structure, Chroma 0.5.16+
  • V2: Hierarchical tenant/database/collection, Chroma 0.7.0+

Common Operations

Add Embeddings

// Single with auto-generated ID
String id = store.add(embedding);

// Single with specific ID
store.add("custom-id", embedding);

// With text segment and metadata
TextSegment segment = TextSegment.from(
    "document text",
    new Metadata().put("author", "John Doe").put("year", 2024)
);
String segmentId = store.add(embedding, segment);

// Batch add: one request for the whole list
List<String> ids = store.addAll(embeddings);

See: Add Operations | Metadata Guide
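When you use `store.add("custom-id", embedding)`, you choose the ID yourself. One option, sketched here as an assumption rather than a library feature, is to derive the ID deterministically from the text, so you can recompute it later (for example, to call `store.remove(id)`) without storing it. The helper `stableId` is hypothetical and uses only `java.util.UUID`.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical helpers for choosing IDs to pass to store.add(id, embedding). */
final class StableIdSketch {

    // Name-based (type 3) UUID: identical text always yields the same ID.
    static String stableId(String text) {
        return UUID.nameUUIDFromBytes(text.getBytes(StandardCharsets.UTF_8)).toString();
    }

    // Random UUID: every call yields a different ID.
    static String randomId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String id = stableId("document text");
        System.out.println(id);
        System.out.println(id.equals(stableId("document text"))); // prints true
        System.out.println(randomId().equals(randomId()));         // practically always false
    }
}
```

Deterministic IDs trade uniqueness for reproducibility: two segments with identical text map to the same ID, which may or may not be what you want.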

Search for Similar Embeddings

EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(10)
    .minScore(0.7)
    .build();

EmbeddingSearchResult<TextSegment> result = store.search(request);

for (EmbeddingMatch<TextSegment> match : result.matches()) {
    double score = match.score();
    String id = match.embeddingId();
    TextSegment segment = match.embedded();
}

See: Search Operations | Filtering Guide
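The interaction of `maxResults` and `minScore` can be modeled in memory. This sketch is not the store's implementation: `Match` is a hypothetical stand-in for `EmbeddingMatch`, and the model assumes `minScore` is inclusive and that results come back highest score first. It requires Java 16+ for records and `Stream.toList()`.

```java
import java.util.Comparator;
import java.util.List;

/** In-memory model of maxResults + minScore; not the Chroma store's code. */
final class SearchSemanticsSketch {

    // Hypothetical stand-in for a match: an ID plus a relevance score in [0, 1].
    record Match(String id, double score) {}

    // Keep matches with score >= minScore, best first, at most maxResults of them.
    static List<Match> select(List<Match> candidates, int maxResults, double minScore) {
        return candidates.stream()
                .filter(m -> m.score() >= minScore)
                .sorted(Comparator.comparingDouble(Match::score).reversed())
                .limit(maxResults)
                .toList();
    }

    public static void main(String[] args) {
        List<Match> candidates = List.of(
                new Match("a", 0.91), new Match("b", 0.65),
                new Match("c", 0.78), new Match("d", 0.72));
        // maxResults = 2, minScore = 0.7: "b" is below the threshold, "d" is cut by the limit
        System.out.println(select(candidates, 2, 0.7)); // a (0.91), then c (0.78)
    }
}
```

A request can therefore return fewer than `maxResults` matches when too few candidates clear `minScore`.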

Search with Metadata Filters

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.*;

Filter filter = metadataKey("author").isEqualTo("John Doe")
    .and(metadataKey("year").isGreaterThanOrEqualTo(2020));

EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(5)
    .filter(filter)
    .build();

See: Filtering Guide | Filter API
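The filter above selects entries whose metadata has `author` equal to "John Doe" and a numeric `year` of at least 2020. The following stdlib-only model uses `java.util.function.Predicate` to show which metadata maps would pass. It is an illustration of the semantics, not how LangChain4j translates the filter for Chroma. The helpers `authorIs` and `yearAtLeast` are hypothetical, and treating a missing `year` as a non-match is an assumption of this model. It requires Java 16+ for pattern matching with `instanceof`.

```java
import java.util.Map;
import java.util.function.Predicate;

/** In-memory model of: author == "John Doe" AND year >= 2020. */
final class FilterSemanticsSketch {

    // Equality on a string metadata value.
    static Predicate<Map<String, Object>> authorIs(String author) {
        return md -> author.equals(md.get("author"));
    }

    // Numeric comparison; in this model, entries without a numeric "year" never match.
    static Predicate<Map<String, Object>> yearAtLeast(int minYear) {
        return md -> md.get("year") instanceof Number n && n.doubleValue() >= minYear;
    }

    public static void main(String[] args) {
        Predicate<Map<String, Object>> filter = authorIs("John Doe").and(yearAtLeast(2020));
        System.out.println(filter.test(Map.of("author", "John Doe", "year", 2024))); // true
        System.out.println(filter.test(Map.of("author", "John Doe", "year", 2019))); // false
        System.out.println(filter.test(Map.of("author", "Jane Roe", "year", 2024))); // false
        System.out.println(filter.test(Map.of("author", "John Doe")));               // false
    }
}
```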

Remove Embeddings

// Single ID
store.remove("id-to-remove");

// Multiple IDs
store.removeAll(Arrays.asList("id1", "id2", "id3"));

// By metadata filter
store.removeAll(metadataKey("status").isEqualTo("outdated"));

// All embeddings
store.removeAll();

See: Remove Operations

Configuration

V1 API (Default)

ChromaEmbeddingStore store = ChromaEmbeddingStore.builder()
    .baseUrl("http://localhost:8000")
    .collectionName("my-documents")
    .timeout(Duration.ofSeconds(10))
    .logRequests(true)
    .logResponses(true)
    .build();

V2 API (Hierarchical)

ChromaEmbeddingStore store = ChromaEmbeddingStore.builder()
    .apiVersion(ChromaApiVersion.V2)
    .baseUrl("http://localhost:8000")
    .tenantName("my-tenant")      // Default: "default"
    .databaseName("my-database")   // Default: "default"
    .collectionName("my-documents")
    .timeout(Duration.ofSeconds(10))
    .build();

See: Configuration Guide | Builder API


Performance Tips

Use batch operations for multiple embeddings:

// Good: Single HTTP request
List<String> ids = store.addAll(embeddings);

// Bad: N HTTP requests
for (Embedding e : embeddings) { store.add(e); }
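For very large lists, one option is to split the list into fixed-size chunks and call `addAll` once per chunk, so each request stays small enough to finish within the configured timeout. The chunk size of 1,000 below is an arbitrary assumption, not a documented Chroma or LangChain4j limit, and `chunks` is a hypothetical helper built on `List.subList`.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a list into consecutive chunks of at most chunkSize elements. */
final class BatchingSketch {

    // Returns sublist views; they stay valid as long as the source list is not modified.
    static <T> List<List<T>> chunks(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            result.add(items.subList(start, Math.min(start + chunkSize, items.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 2_500; i++) {
            items.add(i);
        }
        for (List<Integer> chunk : chunks(items, 1_000)) {
            // With a List<Embedding>, this is where store.addAll(chunk) would go
            System.out.println(chunk.size()); // prints 1000, 1000, 500
        }
    }
}
```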

Reuse store instances rather than creating a new one per operation.

Adjust timeouts for large operations:

.timeout(Duration.ofSeconds(30))

See: Performance Guide

Migration

Migrating from V1 to V2:

// V1 → V2: Add apiVersion and optional tenant/database
ChromaEmbeddingStore store = ChromaEmbeddingStore.builder()
    .apiVersion(ChromaApiVersion.V2)  // Add this
    .baseUrl("http://localhost:8000")
    .tenantName("my-tenant")          // Optional
    .databaseName("my-database")       // Optional
    .collectionName("my-collection")
    .build();

See: Migration Guide

Troubleshooting

Common issues:

  • Connection refused: Ensure the Chroma server is running and reachable at baseUrl
  • Timeout errors: Increase the timeout for large operations
  • Filter errors: Comparison operators (>, >=, <, <=) only work with numeric values
  • Dimension mismatch: All embeddings in a collection must have the same number of dimensions

See: Error Handling Guide
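To catch dimension mismatches before any request is sent, you could validate vectors on the client. This is a minimal sketch with a hypothetical helper `requireDimension` operating on raw `float[]` arrays; with LangChain4j `Embedding` objects you would pass `embedding.vector()` for each one. The expected dimension is whatever your embedding model produces, which is an input you must supply.

```java
import java.util.List;

/** Hypothetical client-side pre-check for the "dimension mismatch" error. */
final class DimensionCheckSketch {

    // Throws if any vector's length differs from the expected dimension.
    static void requireDimension(List<float[]> vectors, int expected) {
        for (int i = 0; i < vectors.size(); i++) {
            int actual = vectors.get(i).length;
            if (actual != expected) {
                throw new IllegalArgumentException(
                        "vector " + i + " has " + actual + " dimensions, expected " + expected);
            }
        }
    }

    public static void main(String[] args) {
        List<float[]> ok = List.of(new float[]{0.1f, 0.2f, 0.3f}, new float[]{0.4f, 0.5f, 0.6f});
        requireDimension(ok, 3); // passes silently
        try {
            requireDimension(List.of(new float[]{0.1f, 0.2f}), 3);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "vector 0 has 2 dimensions, expected 3"
        }
    }
}
```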

External Resources

  • LangChain4j Documentation
  • Chroma Documentation
  • Integration Guide
  • LangChain4j Examples

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-chroma
