tessl/maven-dev-langchain4j--langchain4j-pgvector

LangChain4j PGVector integration for PostgreSQL-based vector embedding storage and retrieval

LangChain4j PGVector

LangChain4j PGVector is a PostgreSQL-backed embedding store for LangChain4j. It integrates with the PGVector extension so that vector embeddings can be stored and queried directly in a PostgreSQL database. Two search styles are supported: standard cosine similarity search, and hybrid search that combines vector similarity with full-text keyword search.

Package Information

  • Package Name: langchain4j-pgvector
  • Package Type: Maven
  • Language: Java
  • Group ID: dev.langchain4j
  • Artifact ID: langchain4j-pgvector
  • Version: 1.11.0
  • Installation: Add to your Maven pom.xml:
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-pgvector</artifactId>
    <version>1.11.0</version>
</dependency>

Or for Gradle:

implementation 'dev.langchain4j:langchain4j-pgvector:1.11.0'

Core Imports

import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;
import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore.SearchMode;
import dev.langchain4j.store.embedding.pgvector.MetadataStorageConfig;
import dev.langchain4j.store.embedding.pgvector.MetadataStorageMode;
import dev.langchain4j.store.embedding.pgvector.DefaultMetadataStorageConfig;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;

Basic Usage

Minimal Configuration

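import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;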

// Create embedding store with minimal configuration
EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
    .host("localhost")
    .port(5432)
    .database("postgres")
    .user("my_user")
    .password("my_password")
    .table("my_embeddings")
    .dimension(384)  // Must match your embedding model's dimension
    .build();

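// Add an embedding (embedding, textSegment, and queryEmbedding below are
// assumed to have been produced beforehand, e.g. by your embedding model)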
String id = embeddingStore.add(embedding);

// Add embedding with text segment
String id2 = embeddingStore.add(embedding, textSegment);

// Search for similar embeddings
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(5)
    .minScore(0.7)
    .build();

EmbeddingSearchResult<TextSegment> results = embeddingStore.search(request);

With DataSource (Recommended for Production)

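import javax.sql.DataSource;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;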

// Configure connection pool
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/postgres");
config.setUsername("my_user");
config.setPassword("my_password");
config.setMaximumPoolSize(10);

DataSource dataSource = new HikariDataSource(config);

// Create embedding store with DataSource
EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.datasourceBuilder()
    .datasource(dataSource)
    .table("my_embeddings")
    .dimension(384)
    .build();

Architecture

LangChain4j PGVector is built around several key components:

  • PgVectorEmbeddingStore: Main implementation of the EmbeddingStore<TextSegment> interface, providing storage and retrieval operations
  • Search Modes: Two search strategies, VECTOR (cosine similarity) and HYBRID (vector search plus full-text keyword search, fused with Reciprocal Rank Fusion, RRF)
  • Metadata Storage: Three configurable modes for storing embedding metadata: COLUMN_PER_KEY, COMBINED_JSON, and COMBINED_JSONB
  • Index Support: Optional IVFFlat indexing for improved query performance on large datasets
  • Connection Management: Flexible configuration supporting both direct connection parameters and DataSource integration
  • Automatic Schema Management: Optional automatic table creation and PGVector extension installation

Capabilities

Embedding Store Creation

Create and configure PgVectorEmbeddingStore instances with flexible builder patterns supporting both direct database connections and DataSource integration.

/**
 * Creates a builder for PgVectorEmbeddingStore with individual connection parameters
 * @return PgVectorEmbeddingStoreBuilder instance
 */
public static PgVectorEmbeddingStoreBuilder builder();

/**
 * Creates a builder for PgVectorEmbeddingStore with DataSource
 * @return DatasourceBuilder instance
 */
public static DatasourceBuilder datasourceBuilder();

Embedding Operations

Add, remove, and manage embeddings with support for single and batch operations, including text segments and metadata.

/**
 * Adds an embedding to the store with auto-generated ID
 * @param embedding The embedding to be added
 * @return The auto-generated ID
 */
String add(Embedding embedding);

/**
 * Adds an embedding with text segment to the store
 * @param embedding The embedding to be added
 * @param textSegment The original content that was embedded
 * @return The auto-generated ID
 */
String add(Embedding embedding, TextSegment textSegment);

/**
 * Adds multiple embeddings to the store
 * @param embeddings List of embeddings to be added
 * @return List of auto-generated IDs
 */
List<String> addAll(List<Embedding> embeddings);

/**
 * Removes a single embedding by its ID
 * @param id The ID of the embedding to remove
 */
void remove(String id);

/**
 * Removes all embeddings from the store
 */
void removeAll();

/**
 * Removes embeddings by their IDs
 * @param ids Collection of embedding IDs to remove
 */
void removeAll(Collection<String> ids);
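
When a very large list is passed to a batch operation such as addAll, callers sometimes split it into fixed-size chunks first. The sketch below shows one way to do that in plain Java. The class name BatchSketch, the partition helper, and the batch size are hypothetical, and whether client-side chunking is needed at all depends on how the store batches its inserts internally, which this document does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for splitting work into batches; not part of LangChain4j. */
final class BatchSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Prints [[1, 2], [3, 4], [5]]
        System.out.println(partition(List.of(1, 2, 3, 4, 5), 2));
    }
}
```

Each resulting batch could then be handed to a separate addAll call, with the returned ID lists concatenated in order.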

Search Operations

Search for similar embeddings using vector similarity (VECTOR mode) or hybrid search combining vector similarity with full-text keyword search (HYBRID mode).

/**
 * Searches for the most similar embeddings
 * @param request Search request containing query embedding, filters, and parameters
 * @return Search results with matching embeddings
 */
EmbeddingSearchResult<TextSegment> search(EmbeddingSearchRequest request);
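
To make the minScore parameter more concrete, the following plain-Java sketch computes cosine similarity and maps it to a relevance score in [0, 1] using (cos + 1) / 2. LangChain4j's core RelevanceScore.fromCosineSimilarity uses this mapping; that the PGVector store derives its scores in exactly this way is an assumption here, and the class and method names below are hypothetical.

```java
/** Illustrative scoring helpers; not the store's actual implementation. */
final class CosineScoreSketch {

    /** Cosine similarity of two equal-length, non-zero vectors, in [-1, 1]. */
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Maps cosine similarity from [-1, 1] to a relevance score in [0, 1]. */
    static double relevanceScore(double cosineSimilarity) {
        return (cosineSimilarity + 1) / 2;
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] candidate = {1f, 1f};
        double score = relevanceScore(cosineSimilarity(query, candidate));
        // cos is about 0.707, so the score is about 0.854 and passes a 0.7 threshold.
        System.out.println(score >= 0.7); // prints true
    }
}
```

Under this mapping, orthogonal vectors score 0.5, so a minScore of 0.7 corresponds to a cosine similarity of at least 0.4.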

Metadata Storage Configuration

Configure how metadata is stored and indexed in the database with three flexible storage modes.

/**
 * Metadata storage mode enumeration
 */
enum MetadataStorageMode {
    /** For static metadata when you know the list of keys in advance */
    COLUMN_PER_KEY,
    /** For dynamic metadata stored as JSON */
    COMBINED_JSON,
    /** For dynamic metadata stored as binary JSON (optimized for queries) */
    COMBINED_JSONB
}

/**
 * Creates a default metadata storage configuration
 * @return Default configuration with COMBINED_JSON mode
 */
static MetadataStorageConfig defaultConfig();
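
As an illustration of a non-default mode, the sketch below builds a COLUMN_PER_KEY configuration with DefaultMetadataStorageConfig.builder(). The metadata keys (tenant_id, category), their SQL column definitions, and the class name MetadataConfigSketch are hypothetical; the builder options shown (storageMode, columnDefinitions, indexes, indexType) should be checked against the Javadoc of the version you use.

```java
import dev.langchain4j.store.embedding.pgvector.DefaultMetadataStorageConfig;
import dev.langchain4j.store.embedding.pgvector.MetadataStorageConfig;
import dev.langchain4j.store.embedding.pgvector.MetadataStorageMode;

import java.util.List;

/** Configuration sketch; requires langchain4j-pgvector on the classpath. */
final class MetadataConfigSketch {

    /** Column-per-key storage for two hypothetical metadata keys, both indexed. */
    static MetadataStorageConfig columnPerKey() {
        return DefaultMetadataStorageConfig.builder()
                .storageMode(MetadataStorageMode.COLUMN_PER_KEY)
                // Hypothetical keys; each entry is a SQL column definition.
                .columnDefinitions(List.of("tenant_id uuid null", "category varchar(64) null"))
                // Index the columns that metadata filters are expected to use.
                .indexes(List.of("tenant_id", "category"))
                .indexType("BTREE")
                .build();
    }
}
```

The resulting object would typically be passed to the store builder through its metadataStorageConfig option. COLUMN_PER_KEY suits a fixed, known set of keys; for open-ended metadata, one of the combined JSON modes is the more natural fit.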

Types

SearchMode

/**
 * Search mode enumeration for PgVectorEmbeddingStore
 */
enum SearchMode {
    /** Standard vector similarity search using cosine distance */
    VECTOR,
    /** Combines vector search with full-text keyword search using Reciprocal Rank Fusion (RRF) */
    HYBRID
}
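
The fusion step behind HYBRID mode can be sketched in plain Java. This is a minimal illustration of Reciprocal Rank Fusion, not the library's internal code: each ID receives the sum of 1 / (k + rank) over the rankings it appears in. The class name RrfSketch, the document IDs, and the constant k = 60 (a commonly used default) are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative Reciprocal Rank Fusion; not the library's implementation. */
final class RrfSketch {

    /**
     * Fuses ranked ID lists. Each ID scores the sum of 1 / (k + rank) over the
     * lists it appears in, with rank starting at 1.
     */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; the sort is stable, so ties keep first-seen order.
        fused.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return fused;
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("doc-a", "doc-b", "doc-c");
        List<String> keywordHits = List.of("doc-b", "doc-d", "doc-a");
        // Prints [doc-b, doc-a, doc-d, doc-c]
        System.out.println(fuse(List.of(vectorHits, keywordHits), 60));
    }
}
```

Because RRF uses only ranks, it can combine cosine similarity scores and full-text relevance scores without normalizing either scale.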

PgVectorEmbeddingStore

/**
 * PGVector EmbeddingStore Implementation
 * Only cosine similarity is used for vector distance
 * Only IVFFlat index type is supported
 * Implements EmbeddingStore<TextSegment> interface
 */
class PgVectorEmbeddingStore implements EmbeddingStore<TextSegment> {
    // Main implementation - see capabilities for methods
}
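
Since IVFFlat is the only supported index type, the main tuning decision is the number of lists. The sketch below encodes the starting point suggested in the pgvector documentation: rows / 1000 for up to 1M rows, and sqrt(rows) above that. The class name IvfFlatSketch and the helper are hypothetical, and the result is a starting value to benchmark, not a guarantee of good recall.

```java
/** Hypothetical helper applying pgvector's suggested starting point for IVFFlat lists. */
final class IvfFlatSketch {

    /** Suggested number of IVFFlat lists for a table with rowCount rows (at least 1). */
    static int suggestedLists(long rowCount) {
        if (rowCount < 0) {
            throw new IllegalArgumentException("rowCount must be non-negative");
        }
        long lists = rowCount <= 1_000_000L
                ? rowCount / 1000
                : Math.round(Math.sqrt(rowCount));
        return (int) Math.max(1, lists);
    }

    public static void main(String[] args) {
        System.out.println(suggestedLists(200_000));   // prints 200
        System.out.println(suggestedLists(4_000_000)); // prints 2000
    }
}
```

The computed value would be supplied when enabling the index on the store builder; the options are assumed here to be useIndex and indexListSize, so check the builder Javadoc for your version.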

Prerequisites

  • PostgreSQL database with PGVector extension installed
  • Java 17 or higher (the minimum JDK for LangChain4j 1.x)
  • LangChain4j core library (automatically included as dependency)
  • PostgreSQL JDBC driver (automatically included as dependency)

Key Features

  • Vector Similarity Search: Cosine similarity-based search for finding semantically similar embeddings
  • Hybrid Search: Combines vector similarity with PostgreSQL full-text search using Reciprocal Rank Fusion (RRF) algorithm
  • Flexible Metadata Storage: Three storage modes (COLUMN_PER_KEY, COMBINED_JSON, COMBINED_JSONB) to suit different use cases
  • IVFFlat Indexing: Optional performance optimization for large datasets (>100k embeddings)
  • Connection Pooling: Native support for DataSource and connection pooling libraries like HikariCP
  • Automatic Schema Management: Optional automatic table creation with proper schema and PGVector extension installation
  • Batch Operations: Efficient bulk add and remove operations
  • Filter Support: Metadata filtering for both search and removal operations
  • ACID Guarantees: Leverages PostgreSQL's reliability and transaction support

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-pgvector@1.11.0
