An integration package connecting Chroma and LangChain for vector database operations.
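The usage examples below assume a Chroma vector store has already been constructed. A minimal setup sketch follows; the collection name, embedding model, and persistence path are illustrative, not requirements:

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Any LangChain-compatible embedding function works here.
vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",  # optional: persist the collection to disk
)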
Maximal marginal relevance (MMR) search optimizes for both similarity to the query and diversity among the results. By balancing relevance against diversity, MMR reduces redundancy, making it well suited to producing varied result sets.
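In the standard MMR formulation (written here using this package's lambda_mult parameter name), each candidate document d is scored against the query q and the already-selected set S:

MMR(d) = lambda_mult * sim(q, d) - (1 - lambda_mult) * max(sim(d, d') for d' in S)

so lambda_mult = 1 ranks purely by query relevance (minimum diversity) and lambda_mult = 0 purely by dissimilarity to already-selected results (maximum diversity).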
Perform maximum marginal relevance search using text queries to find diverse, relevant results.
def max_marginal_relevance_search(
    self,
    query: str,
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any,
) -> list[Document]:
    """
    Return documents selected using maximal marginal relevance.

    Optimizes for similarity to the query AND diversity among the selected documents.

    Parameters:
    - query: Text query to search for
    - k: Number of documents to return (default: 4)
    - fetch_k: Number of documents to fetch for the MMR algorithm (default: 20)
    - lambda_mult: Diversity parameter (0-1):
      0 = maximum diversity, 1 = minimum diversity (default: 0.5)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to the ChromaDB query

    Returns:
        List of Document objects selected by maximal marginal relevance

    Raises:
        ValueError: If no embedding function is provided
    """

Usage Example:
# Basic MMR search with balanced diversity
results = vector_store.max_marginal_relevance_search(
    query="machine learning algorithms",
    k=5,              # Return 5 diverse results
    fetch_k=50,       # Consider 50 candidates
    lambda_mult=0.5,  # Balanced relevance/diversity
)

# High diversity search
diverse_results = vector_store.max_marginal_relevance_search(
    query="python programming",
    k=10,
    fetch_k=100,
    lambda_mult=0.2,  # Prioritize diversity
    filter={"category": "tutorial"},
)

# High relevance search
relevant_results = vector_store.max_marginal_relevance_search(
    query="deep learning",
    k=5,
    lambda_mult=0.8,  # Prioritize relevance
    where_document={"$contains": "neural network"},
)

Perform MMR search using pre-computed embedding vectors instead of text queries.
def max_marginal_relevance_search_by_vector(
    self,
    embedding: list[float],
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any,
) -> list[Document]:
    """
    Return documents selected using MMR with a pre-computed embedding vector.

    Parameters:
    - embedding: Pre-computed embedding vector to search with
    - k: Number of documents to return (default: 4)
    - fetch_k: Number of documents to fetch for the MMR algorithm (default: 20)
    - lambda_mult: Diversity parameter (0-1, default: 0.5)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to the ChromaDB query

    Returns:
        List of Document objects selected by maximal marginal relevance
    """

Usage Example:
# MMR search with a pre-computed vector
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
query_vector = embeddings.embed_query("artificial intelligence research")

results = vector_store.max_marginal_relevance_search_by_vector(
    embedding=query_vector,
    k=8,
    fetch_k=40,
    lambda_mult=0.3,  # Favor diversity
    filter={"domain": "research"},
)

for doc in results:
    print(f"Content: {doc.page_content[:100]}...")
    print(f"Metadata: {doc.metadata}")

How MMR works (a sketch follows this list):
1. Fetch the fetch_k documents most similar to the query
2. Score each remaining candidate by its similarity to the query (weighted by lambda_mult) minus its similarity to already-selected documents (weighted by 1 - lambda_mult)
3. Select k documents that are both relevant and diverse
The lambda_mult parameter controls the trade-off between relevance and diversity:

For balanced results (default):
    k=4, fetch_k=20, lambda_mult=0.5

For high diversity (research, exploration):
    k=10, fetch_k=50, lambda_mult=0.2

For high relevance (focused search):
    k=5, fetch_k=15, lambda_mult=0.8

For large result sets:
    k=20, fetch_k=100, lambda_mult=0.4

MMR search integrates seamlessly with LangChain's retriever interface for RAG applications.
Usage Example:
# Create an MMR retriever
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 6,
        "fetch_k": 30,
        "lambda_mult": 0.4,
    },
)

# Use in a RAG chain
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=retriever,
    chain_type="stuff",
)
result = qa_chain.invoke({"query": "What are different machine learning approaches?"})

Install with the Tessl CLI:
npx tessl i tessl/pypi-langchain-chroma