
tessl/maven-dev-langchain4j--langchain4j-chroma

LangChain4j integration for Chroma embedding store enabling storage, retrieval, and similarity search of vector embeddings with metadata filtering support for both API V1 and V2.



Migration Guide

Guide for migrating from V1 to V2 API and upgrading between versions.

V1 to V2 API Migration

Overview

Chroma V2 API (0.7.0+) introduces hierarchical organization with tenants, databases, and collections.

Key Differences:

  • V1: Flat structure (collection only)
  • V2: Hierarchical (tenant → database → collection)
  • V2: Auto-creates tenant and database if they don't exist
  • V2: Multi-tenancy support, since tenants and databases isolate groups of collections from each other
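To make the structural difference concrete, the sketch below models where a collection is addressed under each API version. It assumes Chroma's REST layout of /api/v1/collections for V1 and /api/v2/tenants/{tenant}/databases/{database}/collections for V2. The ChromaPaths class is hypothetical and is not part of langchain4j-chroma, which builds these requests internally.

```java
import java.util.Objects;

// Hypothetical helper (not part of langchain4j-chroma): models the assumed REST paths per API version
final class ChromaPaths {

    private ChromaPaths() {}

    // V1: all collections share one flat namespace
    static String v1CollectionsPath() {
        return "/api/v1/collections";
    }

    // V2: collections are scoped by tenant, then by database
    static String v2CollectionsPath(String tenant, String database) {
        Objects.requireNonNull(tenant, "tenant");
        Objects.requireNonNull(database, "database");
        return "/api/v2/tenants/" + tenant + "/databases/" + database + "/collections";
    }
}
```

For example, ChromaPaths.v2CollectionsPath("production", "main") yields "/api/v2/tenants/production/databases/main/collections". The same collection name can therefore exist independently in several tenant/database pairs.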

Minimal Migration

Before (V1):

ChromaEmbeddingStore store = ChromaEmbeddingStore.builder()
    .baseUrl("http://localhost:8000")
    .collectionName("my-collection")
    .build();

After (V2):

ChromaEmbeddingStore store = ChromaEmbeddingStore.builder()
    .apiVersion(ChromaApiVersion.V2)  // Add this
    .baseUrl("http://localhost:8000")
    .collectionName("my-collection")
    // Defaults: tenantName="default", databaseName="default"
    .build();

Full Migration with Custom Tenant/Database

ChromaEmbeddingStore store = ChromaEmbeddingStore.builder()
    .apiVersion(ChromaApiVersion.V2)
    .baseUrl("http://localhost:8000")
    .tenantName("production")         // New in V2
    .databaseName("main")             // New in V2
    .collectionName("my-collection")
    .timeout(Duration.ofSeconds(15))
    .build();

Migration Checklist

  • Update the Chroma server to 0.7.0 or later (for example, 1.1.0)
  • Update langchain4j-chroma to 1.7.0-beta13+
  • Add .apiVersion(ChromaApiVersion.V2) to builder
  • Optionally specify tenantName and databaseName
  • Test with existing data
  • Update deployment configuration
  • Update documentation
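The first checklist item can be turned into a pre-flight check. The sketch below compares dotted numeric version strings against the 0.7.0 minimum stated above. ChromaVersionCheck is a hypothetical helper, and how you obtain the server's version string is left open. It handles purely numeric versions such as "0.6.3" or "1.1.0" only. Pre-release suffixes like the "-beta13" in 1.7.0-beta13 are not parsed.

```java
// Hypothetical pre-flight helper: checks a numeric server version against the V2 minimum (0.7.0)
final class ChromaVersionCheck {

    private ChromaVersionCheck() {}

    // Compares dotted numeric versions component by component; a missing component counts as 0
    static int compare(String a, String b) {
        String[] left = a.split("\\.");
        String[] right = b.split("\\.");
        int length = Math.max(left.length, right.length);
        for (int i = 0; i < length; i++) {
            int l = i < left.length ? Integer.parseInt(left[i]) : 0;
            int r = i < right.length ? Integer.parseInt(right[i]) : 0;
            if (l != r) {
                return Integer.compare(l, r);
            }
        }
        return 0;
    }

    static boolean supportsV2(String serverVersion) {
        return compare(serverVersion, "0.7.0") >= 0;
    }
}
```

Comparing numerically instead of as strings matters here. For example, "0.10.0" sorts before "0.9.0" as text but is the newer release.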

Data Migration

Approach 1: No Migration Needed

V2 can use the default tenant and database:

// V1 collection "my-collection"
// → V2 "default" tenant / "default" database / "my-collection"

ChromaEmbeddingStore storeV2 = ChromaEmbeddingStore.builder()
    .apiVersion(ChromaApiVersion.V2)
    .baseUrl("http://localhost:8000")
    // Uses default tenant and database
    .collectionName("my-collection")  // Same collection name
    .build();

Note: Chroma handles this automatically. Your existing V1 collections are accessible in V2 under the default tenant and database.

Approach 2: Migrate to Custom Tenant/Database

To organize collections under a custom tenant and database instead, copy the data into a new V2 store:

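import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.chroma.ChromaApiVersion;
import dev.langchain4j.store.embedding.chroma.ChromaEmbeddingStore;

import java.util.ArrayList;
import java.util.List;

public class DataMigration {

    public void migrateToCustomTenant(int dimension) {
        // Source (V1 or V2 default)
        ChromaEmbeddingStore source = ChromaEmbeddingStore.builder()
            .apiVersion(ChromaApiVersion.V1)
            .baseUrl("http://localhost:8000")
            .collectionName("my-collection")
            .build();

        // Destination (V2 custom)
        ChromaEmbeddingStore dest = ChromaEmbeddingStore.builder()
            .apiVersion(ChromaApiVersion.V2)
            .baseUrl("http://localhost:8000")
            .tenantName("production")
            .databaseName("main")
            .collectionName("my-collection")
            .build();

        // Any query vector works, but its dimension must match the stored embeddings
        float[] vector = new float[dimension];
        vector[0] = 1.0f;
        Embedding anyEmbedding = Embedding.from(vector);

        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
            .queryEmbedding(anyEmbedding)
            .maxResults(10000)  // Must be at least the collection size, or entries are skipped
            .build();

        EmbeddingSearchResult<TextSegment> results = source.search(request);

        // Copy to the new tenant/database, keeping the original IDs
        List<String> ids = new ArrayList<>();
        List<Embedding> embeddings = new ArrayList<>();
        List<TextSegment> segments = new ArrayList<>();

        for (EmbeddingMatch<TextSegment> match : results.matches()) {
            ids.add(match.embeddingId());
            embeddings.add(match.embedding());
            segments.add(match.embedded());
        }

        dest.addAll(ids, embeddings, segments);
    }
}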

Note: This approach works for small to medium collections, because a single search returns at most maxResults matches. For large collections, use batch processing.
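Batching also helps on the write side. Instead of one large addAll call, the aligned ID, embedding, and segment lists can be split into chunks. The sketch below shows a generic partition helper. Batches is a hypothetical name, and the right batch size for your server is an assumption you would need to tune.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list into consecutive chunks of at most batchSize elements
final class Batches {

    private Batches() {}

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so later changes to the input list don't affect the batch
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Partitioning the ID, embedding, and segment lists with the same batch size keeps them index-aligned. Batch i of each list can then go into one dest.addAll(ids, embeddings, segments) call.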

Approach 3: Batch Migration for Large Collections

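import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.chroma.ChromaEmbeddingStore;

import java.util.List;

public class BatchMigration {

    // Search cannot skip results already seen, so repeating the same query returns the same
    // batch. To make progress, each batch is removed from the source after it is copied.
    // This is destructive: back up the source collection first.
    public void migrateLargeCollection(
        ChromaEmbeddingStore source,
        ChromaEmbeddingStore dest,
        int batchSize,
        int dimension
    ) {
        // Any query vector works, but its dimension must match the stored embeddings
        float[] vector = new float[dimension];
        vector[0] = 1.0f;
        Embedding queryEmb = Embedding.from(vector);
        int migrated = 0;

        while (true) {
            // Retrieve the next batch
            EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmb)
                .maxResults(batchSize)
                .build();

            List<EmbeddingMatch<TextSegment>> matches = source.search(request).matches();

            if (matches.isEmpty()) {
                break;  // Source is empty: everything has been moved
            }

            List<String> ids = matches.stream().map(EmbeddingMatch::embeddingId).toList();
            List<Embedding> embeddings = matches.stream().map(EmbeddingMatch::embedding).toList();
            List<TextSegment> segments = matches.stream().map(EmbeddingMatch::embedded).toList();

            // Copy first, then delete, so a failed copy leaves the batch in the source
            dest.addAll(ids, embeddings, segments);
            source.removeAll(ids);

            migrated += matches.size();
            System.out.println("Migrated " + migrated + " entries");
        }
    }
}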

Note: Chroma's search doesn't natively support offset/pagination. For very large collections, consider using metadata timestamps to track migration progress.
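The timestamp idea can be sketched as a resumable watermark. Entries are processed in (timestamp, ID) order, and the last copied position is stored as a checkpoint. A restarted run then picks up strictly after that point, and the ID tie-breaker ensures that entries sharing a timestamp are neither skipped nor copied twice. The sketch runs on an in-memory list and assumes each entry carries a hypothetical timestamp metadata key, such as "created_at". MigrationEntry, Checkpoint, and WatermarkMigration are illustrative names. Fetching the entries from Chroma and persisting the checkpoint between runs are not shown.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical entry: an embedding ID plus the timestamp stored in its metadata (assumed key, e.g. "created_at")
record MigrationEntry(String id, long timestamp) {}

// Position of the last copied entry; ties on timestamp are broken by ID
record Checkpoint(long timestamp, String id) {}

final class WatermarkMigration {

    private static final Comparator<MigrationEntry> ORDER =
        Comparator.comparingLong(MigrationEntry::timestamp).thenComparing(MigrationEntry::id);

    private WatermarkMigration() {}

    // True if the entry sorts strictly after the checkpoint (a null checkpoint means "from the beginning")
    private static boolean isAfter(MigrationEntry e, Checkpoint cp) {
        if (cp == null) {
            return true;
        }
        if (e.timestamp() != cp.timestamp()) {
            return e.timestamp() > cp.timestamp();
        }
        return e.id().compareTo(cp.id()) > 0;
    }

    // Copies the next batch after the checkpoint and returns the new checkpoint.
    // Returns the same checkpoint instance when nothing is left to copy.
    static Checkpoint migrateNextBatch(List<MigrationEntry> source, Checkpoint after,
                                       int batchSize, Consumer<List<MigrationEntry>> sink) {
        List<MigrationEntry> batch = source.stream()
            .filter(e -> isAfter(e, after))
            .sorted(ORDER)
            .limit(batchSize)
            .toList();
        if (batch.isEmpty()) {
            return after;
        }
        sink.accept(batch);
        MigrationEntry last = batch.get(batch.size() - 1);
        return new Checkpoint(last.timestamp(), last.id());
    }
}
```

Unlike the delete-after-copy loop, this approach leaves the source untouched. The trade-off is that it depends on every entry having a reliable timestamp in its metadata.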

Version Upgrades

Upgrading langchain4j-chroma

Maven:

<!-- From older version -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-chroma</artifactId>
    <version>0.30.0</version>
</dependency>

<!-- To latest -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-chroma</artifactId>
    <version>1.11.0</version>
</dependency>

Gradle:

// From: implementation 'dev.langchain4j:langchain4j-chroma:0.30.0'
implementation 'dev.langchain4j:langchain4j-chroma:1.11.0'

Breaking Changes

1.7.0-beta13

  • Added V2 API support
  • Deprecated direct constructor in favor of builder

Migration:

// DEPRECATED
ChromaEmbeddingStore store = new ChromaEmbeddingStore(
    "http://localhost:8000",
    "my-collection",
    Duration.ofSeconds(10),
    true,
    true
);

// RECOMMENDED
ChromaEmbeddingStore store = ChromaEmbeddingStore.builder()
    .baseUrl("http://localhost:8000")
    .collectionName("my-collection")
    .timeout(Duration.ofSeconds(10))
    .logRequests(true)
    .logResponses(true)
    .build();

1.11.0

  • Added experimental observability listeners

New features (backward compatible):

// Optional: Add listeners for observability
EmbeddingStore<TextSegment> observed = store
    .addListener(new LoggingEmbeddingStoreListener());

Chroma Server Upgrades

From 0.5.x/0.6.x to 0.7.0+

  1. Backup data:

    # Backup Chroma data directory
    cp -r /path/to/chroma/data /path/to/backup
  2. Update Chroma:

    # Docker
    docker pull chromadb/chroma:latest
    
    # Or update pip package
    pip install --upgrade chromadb
  3. Restart Chroma:

    docker run -p 8000:8000 chromadb/chroma:latest
  4. Verify migration:

    // Test connection
    ChromaEmbeddingStore store = ChromaEmbeddingStore.builder()
        .baseUrl("http://localhost:8000")
        .build();
    
    Embedding test = Embedding.from(new float[]{1.0f, 0.0f, 0.0f});
    String id = store.add(test);
    store.remove(id);
    System.out.println("Migration successful");
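A freshly restarted server may not accept requests immediately, so the step 4 check can fail on its first attempt. The sketch below polls a readiness check a bounded number of times. WaitForReady is a hypothetical helper. In practice, the BooleanSupplier would wrap the add/remove round trip from step 4, returning true on success and false if an exception is thrown.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a readiness check until it succeeds or attempts run out
final class WaitForReady {

    private WaitForReady() {}

    static boolean await(BooleanSupplier isReady, int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isReady.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // Preserve the interrupt status for the caller
                    return false;
                }
            }
        }
        return false;
    }
}
```

Bounding the number of attempts keeps a deployment script from hanging forever when the server never comes up.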

Multi-Tenant Migration

Organizing Existing Collections

If you are migrating to a multi-tenant setup, you can move each per-customer collection into its own tenant:

// Before: All collections in single namespace
// Collections: "customer1-docs", "customer2-docs", "customer3-docs"

// After: Organized by tenant
public class TenantMigration {

    public void migrateToTenants() {
        Map<String, String> collectionToTenant = Map.of(
            "customer1-docs", "customer-1",
            "customer2-docs", "customer-2",
            "customer3-docs", "customer-3"
        );

        for (Map.Entry<String, String> entry : collectionToTenant.entrySet()) {
            String oldCollection = entry.getKey();
            String tenant = entry.getValue();

            // Source: existing V1 collection
            ChromaEmbeddingStore source = ChromaEmbeddingStore.builder()
                .apiVersion(ChromaApiVersion.V1)
                .baseUrl("http://localhost:8000")
                .collectionName(oldCollection)
                .build();

            // Destination with tenant
            ChromaEmbeddingStore dest = ChromaEmbeddingStore.builder()
                .apiVersion(ChromaApiVersion.V2)
                .baseUrl("http://localhost:8000")
                .tenantName(tenant)
                .databaseName("default")
                .collectionName("documents")  // Unified name
                .build();

            // Copy step: reuse the logic from Approach 2 or Approach 3
            migrateCollection(source, dest);
        }
    }
}
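With many customers, a hand-written map becomes hard to maintain. If collection names follow a convention, the collection-to-tenant map can be derived instead. The sketch below assumes the "customer<N>-docs" naming from the example above and maps it to tenant "customer-<N>". TenantMapping is a hypothetical helper, and listing the existing collection names is left to you.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives the tenant from an assumed "customer<N>-docs" naming convention
final class TenantMapping {

    private static final Pattern CUSTOMER_COLLECTION = Pattern.compile("customer(\\d+)-docs");

    private TenantMapping() {}

    static Optional<String> tenantFor(String collectionName) {
        Matcher m = CUSTOMER_COLLECTION.matcher(collectionName);
        return m.matches() ? Optional.of("customer-" + m.group(1)) : Optional.empty();
    }

    // Names that don't match the convention are left out so they can be reviewed manually
    static Map<String, String> buildMapping(List<String> collectionNames) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String name : collectionNames) {
            tenantFor(name).ifPresent(tenant -> mapping.put(name, tenant));
        }
        return mapping;
    }
}
```

Skipping names that don't match the convention, instead of guessing a tenant for them, prevents a shared collection from silently ending up in a customer's tenant.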

Rollback Strategy

Keep V1 Configuration Available

public class VersionedStore {

    private final String baseUrl;
    private final String collectionName;

    public VersionedStore(String baseUrl, String collectionName) {
        this.baseUrl = baseUrl;
        this.collectionName = collectionName;
    }

    public ChromaEmbeddingStore createStore(boolean useV2) {
        // Only the API version differs, so rolling back is a one-flag change
        return ChromaEmbeddingStore.builder()
            .apiVersion(useV2 ? ChromaApiVersion.V2 : ChromaApiVersion.V1)
            .baseUrl(baseUrl)
            .collectionName(collectionName)
            .build();
    }
}

Feature Flag

public class FeatureFlaggedStore {

    public ChromaEmbeddingStore createStore() {
        // System.getenv returns null when the variable is unset, and parseBoolean(null) is false
        boolean useV2 = Boolean.parseBoolean(System.getenv("USE_CHROMA_V2"));

        ChromaEmbeddingStore.Builder builder = ChromaEmbeddingStore.builder()
            .baseUrl("http://localhost:8000")
            .collectionName("my-collection");

        if (useV2) {
            builder.apiVersion(ChromaApiVersion.V2)
                .tenantName("production")
                .databaseName("main");
        }

        return builder.build();
    }
}

Testing Migration

Verify Data Integrity

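import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.chroma.ChromaEmbeddingStore;

import java.util.List;
import java.util.Objects;

public class MigrationValidator {

    public boolean validateMigration(
        ChromaEmbeddingStore source,
        ChromaEmbeddingStore dest,
        List<Embedding> testQueries  // Sample query vectors with the same dimension as the data
    ) {
        boolean valid = true;

        for (Embedding query : testQueries) {
            EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                .queryEmbedding(query)
                .maxResults(10)
                .build();

            List<EmbeddingMatch<TextSegment>> sourceMatches = source.search(request).matches();
            List<EmbeddingMatch<TextSegment>> destMatches = dest.search(request).matches();

            // Compare result counts
            if (sourceMatches.size() != destMatches.size()) {
                System.err.println("Result count mismatch: "
                    + sourceMatches.size() + " vs " + destMatches.size());
                valid = false;
            }

            // Compare content position by position, only over indices present in both lists
            int common = Math.min(sourceMatches.size(), destMatches.size());
            for (int i = 0; i < common; i++) {
                TextSegment sourceSegment = sourceMatches.get(i).embedded();
                TextSegment destSegment = destMatches.get(i).embedded();
                String sourceText = sourceSegment == null ? null : sourceSegment.text();
                String destText = destSegment == null ? null : destSegment.text();

                if (!Objects.equals(sourceText, destText)) {
                    System.err.println("Content mismatch at index " + i);
                    valid = false;
                }
            }
        }

        System.out.println("Validation complete");
        return valid;
    }
}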
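Position-by-position comparison can report false mismatches when two results have equal or nearly equal scores and come back in a different order. An order-insensitive check is the share of the source's top-k results that also appear in the destination's top-k. The sketch below computes that share for lists of strings. Pass embedding IDs if the migration preserved them, or segment texts otherwise. ResultOverlap is a hypothetical helper, and the acceptance threshold is your choice.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: fraction of source top-k keys (IDs or texts) also present in the destination top-k
final class ResultOverlap {

    private ResultOverlap() {}

    static double overlap(List<String> sourceKeys, List<String> destKeys) {
        if (sourceKeys.isEmpty()) {
            // Two empty result lists agree; an empty source against non-empty dest does not
            return destKeys.isEmpty() ? 1.0 : 0.0;
        }
        Set<String> dest = new HashSet<>(destKeys);
        long shared = sourceKeys.stream().filter(dest::contains).count();
        return (double) shared / sourceKeys.size();
    }
}
```

A value of 1.0 means the same entries were returned, regardless of order. Lower values point at entries that were missed or altered during the copy.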

Gradual Migration

Dual-Write Strategy

public class DualWriteStore {

    private final ChromaEmbeddingStore v1Store;
    private final ChromaEmbeddingStore v2Store;

    public DualWriteStore(ChromaEmbeddingStore v1Store, ChromaEmbeddingStore v2Store) {
        this.v1Store = v1Store;
        this.v2Store = v2Store;
    }

    public String add(Embedding embedding, TextSegment segment) {
        // Write to V2 (primary) first, then reuse its ID for V1 so both stores use the same keys
        String id = v2Store.add(embedding, segment);
        v1Store.addAll(List.of(id), List.of(embedding), List.of(segment));

        // Return V2 ID (primary)
        return id;
    }

    public EmbeddingSearchResult<TextSegment> search(
        EmbeddingSearchRequest request
    ) {
        // Read from V2 (primary)
        return v2Store.search(request);
    }
}
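Dual writes raise the question of what happens when only one write succeeds. One common policy is to let the primary write fail the request, while recording a failed secondary write for later replay. The sketch below shows that policy with plain Consumer callbacks instead of store types. DualWriter is a hypothetical name. In DualWriteStore terms, primary would be the V2 write and secondary the V1 write. The replay queue is in memory and not thread-safe, which a production version would need to address.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical dual-write policy: the primary must succeed; secondary failures are queued for replay
final class DualWriter<T> {

    private final Consumer<T> primary;
    private final Consumer<T> secondary;
    private final List<T> pendingReplay = new ArrayList<>();  // Not thread-safe

    DualWriter(Consumer<T> primary, Consumer<T> secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    void write(T item) {
        primary.accept(item);  // Exceptions propagate: the primary store is the source of truth
        try {
            secondary.accept(item);
        } catch (RuntimeException e) {
            pendingReplay.add(item);  // Reconcile later, e.g. from a scheduled job
        }
    }

    List<T> pendingReplay() {
        return List.copyOf(pendingReplay);
    }
}
```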

Gradual Cutover

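public class GradualCutover {

    private final ChromaEmbeddingStore v1Store;
    private final ChromaEmbeddingStore v2Store;

    // Share of traffic sent to V2, from 0.0 to 1.0; volatile because it may change at runtime
    private volatile double v2TrafficPercentage = 0.0;

    public GradualCutover(ChromaEmbeddingStore v1Store, ChromaEmbeddingStore v2Store) {
        this.v1Store = v1Store;
        this.v2Store = v2Store;
    }

    public EmbeddingSearchResult<TextSegment> search(
        EmbeddingSearchRequest request
    ) {
        if (Math.random() < v2TrafficPercentage) {
            return v2Store.search(request);
        } else {
            return v1Store.search(request);
        }
    }

    public void increaseV2Traffic(double percentage) {
        // Clamp to the valid range [0.0, 1.0]
        this.v2TrafficPercentage = Math.max(0.0, Math.min(1.0, percentage));
    }
}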
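Random routing means the same user can get V1 results on one request and V2 results on the next. A sticky alternative hashes a stable key, such as a user or tenant ID, into one of 100 buckets and routes the request to V2 when the bucket is below the rollout percentage. The sketch below is a hypothetical helper. The choice of routing key is an assumption about your application.

```java
// Hypothetical sticky router: the same key always lands on the same side for a given rollout percentage
final class StickyRouter {

    private StickyRouter() {}

    // Maps the key to a bucket in [0, 100) and routes to V2 when the bucket is below rolloutPercent
    static boolean routeToV2(String routingKey, int rolloutPercent) {
        int bucket = Math.floorMod(routingKey.hashCode(), 100);
        return bucket < rolloutPercent;
    }
}
```

Because a key's bucket never changes, raising the percentage only moves keys from V1 to V2 and never back. This keeps each user's experience consistent throughout the cutover.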

Best Practices

  1. Test in non-production first - Verify migration in dev/staging
  2. Backup data - Always backup before major migrations
  3. Use feature flags - Enable easy rollback
  4. Migrate gradually - Use dual-write or gradual cutover
  5. Validate data - Compare results between old and new
  6. Monitor performance - Watch for issues during migration
  7. Document changes - Keep team informed of migration status
  8. Plan rollback - Have rollback strategy ready
  9. Migrate off-peak - If possible, migrate during low-traffic periods
  10. Test thoroughly - Verify all functionality after migration

Troubleshooting

Collections Not Found After Migration

// Ensure correct tenant/database
ChromaEmbeddingStore store = ChromaEmbeddingStore.builder()
    .apiVersion(ChromaApiVersion.V2)
    .baseUrl("http://localhost:8000")
    .tenantName("default")    // Check this matches
    .databaseName("default")  // Check this matches
    .collectionName("my-collection")
    .build();

Performance Degradation

  • Check Chroma server resources
  • Verify network latency
  • Review timeout settings
  • Monitor server logs

Data Loss

  • Restore from backup
  • Verify migration script
  • Check error logs
  • Contact Chroma support if needed

Related

  • Configuration Guide - Setting up V2
  • ChromaApiVersion - API version details
  • Error Handling - Handling migration errors

External Resources

  • Chroma Migration Guide
  • Chroma Changelog
  • LangChain4j Updates

