tessl/pypi-chromadb

Chroma - the open-source embedding database

ChromaDB

ChromaDB is an open-source embedding database for building Python or JavaScript LLM applications with memory. It provides a simple API for storing, querying, and managing documents and their embeddings, and it handles tokenization, embedding generation, and indexing automatically.

Package Information

  • Package Name: chromadb
  • Language: Python
  • Installation: pip install chromadb

Core Imports

import chromadb

For specific client types:

from chromadb import EphemeralClient, PersistentClient, HttpClient

For types and embedding functions:

from chromadb.api.types import Documents, Embeddings, Metadatas, Where, Include
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

Basic Usage

import chromadb

# Create a client (in-memory for testing)
client = chromadb.EphemeralClient()

# Create a collection
collection = client.create_collection(name="my_collection")

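# Add documents with metadata. The default embedding function embeds the text
# automatically (it downloads a small local model the first time it runs).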
collection.add(
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)

# Query for similar documents
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2
)

print(results)
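
query returns parallel lists nested one level per query: results["ids"][0] holds the ids matched by the first query text, results["distances"][0] their distances, and so on, with hits ordered from closest to farthest. A small stdlib helper can flatten that shape into one tuple per hit; the name iter_query_hits is hypothetical, not part of chromadb. It assumes the default include, which returns documents, metadatas, and distances alongside ids, and reports any field that was left out as None.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

# (query_index, id, document, metadata, distance)
Hit = Tuple[int, str, Optional[str], Optional[Dict[str, Any]], Optional[float]]


def iter_query_hits(results: Dict[str, Any]) -> Iterator[Hit]:
    """Yield one Hit per matched record, query by query, in rank order."""

    def pick(key: str, q: int, rank: int) -> Any:
        # A field left out of `include` comes back as None (or is absent).
        field = results.get(key)
        return None if field is None else field[q][rank]

    # results["ids"] has one inner list per query text or query embedding.
    for q, ids in enumerate(results["ids"]):
        for rank, doc_id in enumerate(ids):
            yield (
                q,
                doc_id,
                pick("documents", q, rank),
                pick("metadatas", q, rank),
                pick("distances", q, rank),
            )
```

With the results from the example above, `for q, doc_id, doc, meta, dist in iter_query_hits(results): print(doc_id, dist, doc)` prints each hit in rank order.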

Architecture

ChromaDB can run embedded in the application process or as a separate server that clients reach over HTTP, which gives several deployment options:

  • Client Types: Multiple client implementations for different deployment scenarios (in-memory, persistent, remote HTTP, cloud)
  • Collections: Named containers for documents and embeddings with configurable metadata and embedding functions
  • Embedding Functions: Pluggable components for generating embeddings from various providers (OpenAI, Cohere, HuggingFace, etc.)
  • Query System: Vector similarity search with metadata filtering and document text matching
  • Multi-tenancy: Support for tenants and databases for data isolation
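
At its core, the query system ranks stored vectors by their distance to a query vector. The toy below does this by brute force over a plain dict; Chroma itself uses an approximate nearest-neighbour index (HNSW) rather than scanning every vector, so treat this only as a model of what is being computed. Chroma's default l2 space is squared Euclidean distance and its cosine space is one minus cosine similarity; both are implemented here, and the function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_squared(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, the quantity behind the "l2" space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def nearest(
    query: Vector,
    vectors: Dict[str, Vector],
    n_results: int = 2,
    distance: Callable[[Vector, Vector], float] = l2_squared,
) -> List[Tuple[str, float]]:
    """Exact k-nearest-neighbour search: score every stored vector, keep the closest."""
    scored = [(doc_id, distance(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:n_results]


vectors = {"id1": [1.0, 0.0], "id2": [0.0, 1.0], "id3": [0.7, 0.7]}
print(nearest([1.0, 0.1], vectors))                            # ranked by squared L2
print(nearest([1.0, 0.1], vectors, distance=cosine_distance))  # ranked by cosine distance
```

Lower is closer for both functions, which matches how Chroma reports distances: the first hit of each query has the smallest value.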

Capabilities

Client Creation

Factory functions create the ChromaDB client that fits each deployment: in-memory clients for testing, persistent clients that store data on local disk for development, and remote HTTP or cloud clients for production.

def EphemeralClient(settings=None, tenant="default_tenant", database="default_database"): ...
def PersistentClient(path="./chroma", settings=None, tenant="default_tenant", database="default_database"): ...
def HttpClient(host="localhost", port=8000, ssl=False, headers=None, settings=None, tenant="default_tenant", database="default_database"): ...
def CloudClient(tenant=None, database=None, api_key=None, settings=None): ...
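
A common way to use these factories is to choose in-memory, persistent, or HTTP mode from configuration at startup. The sketch below is a hypothetical helper, client_spec, that reads made-up environment variables (APP_CHROMA_MODE, APP_CHROMA_PATH, APP_CHROMA_HOST, APP_CHROMA_PORT); only the factory names and the path, host, and port defaults come from the signatures above. It returns the factory name and keyword arguments instead of importing chromadb, so the decision logic can be tested on its own.

```python
import os
from typing import Any, Dict, Mapping, Optional, Tuple


def client_spec(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Dict[str, Any]]:
    """Return (chromadb factory name, keyword arguments) for the configured mode.

    The APP_CHROMA_* variable names are hypothetical; defaults mirror the
    factory signatures (path="./chroma", host="localhost", port=8000).
    """
    env = os.environ if env is None else env
    mode = env.get("APP_CHROMA_MODE", "memory")
    if mode == "memory":  # tests: nothing survives the process
        return "EphemeralClient", {}
    if mode == "persistent":  # local development: data kept on disk
        return "PersistentClient", {"path": env.get("APP_CHROMA_PATH", "./chroma")}
    if mode == "http":  # production: talk to a running Chroma server
        return "HttpClient", {
            "host": env.get("APP_CHROMA_HOST", "localhost"),
            "port": int(env.get("APP_CHROMA_PORT", "8000")),
        }
    raise ValueError(f"unknown APP_CHROMA_MODE: {mode!r}")
```

The caller then builds the client with `name, kwargs = client_spec()` followed by `client = getattr(chromadb, name)(**kwargs)`.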

Collection Operations

Core collection management including creation, retrieval, modification, and deletion. Collections are the primary containers for documents and embeddings with configurable metadata and embedding functions.

class ClientAPI:
    def create_collection(self, name: str, **kwargs) -> Collection: ...
    def get_collection(self, name: str, **kwargs) -> Collection: ...
    def delete_collection(self, name: str) -> None: ...
    def list_collections(self, limit=None, offset=None) -> Sequence[Collection]: ...
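
list_collections accepts limit and offset, so a deployment with many collections can be read page by page. The generic pager below is a sketch (the name iter_pages is hypothetical): it takes any fetch(limit, offset) callable and stops at the first page shorter than the page size.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_pages(fetch: Callable[[int, int], Sequence[T]], page_size: int = 100) -> Iterator[T]:
    """Yield every item from an offset-paginated call such as list_collections.

    `fetch(limit, offset)` must return at most `limit` items; a short (or empty)
    page signals that nothing is left.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(page_size, offset)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size
```

For example, `for col in iter_pages(lambda limit, offset: client.list_collections(limit=limit, offset=offset)): print(col.name)`. Stopping at a short page costs one extra, empty request when the total is an exact multiple of the page size, but needs no separate count call.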

Document Operations

Adding, updating, querying, and deleting documents within collections. Supports embeddings, metadata, images, and URIs with flexible data formats and automatic embedding generation.

class Collection:
    def add(self, ids, documents=None, embeddings=None, metadatas=None, images=None, uris=None): ...
    def query(self, query_texts=None, query_embeddings=None, n_results=10, where=None, **kwargs): ...
    def get(self, ids=None, where=None, limit=None, **kwargs): ...
    def update(self, ids, documents=None, embeddings=None, metadatas=None, **kwargs): ...
    def delete(self, ids=None, where=None, where_document=None): ...
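
Because add takes parallel lists (the nth id, document, and metadata describe the same record) and Chroma caps how many records one call may submit, large imports are usually split into aligned batches. The helper below (hypothetical name batched_records) slices the lists together, checks that their lengths agree and that ids are unique, and yields keyword-argument dicts ready for collection.add(**batch).

```python
from typing import Any, Dict, Iterator, Optional, Sequence


def batched_records(
    ids: Sequence[str], batch_size: int, **fields: Optional[Sequence[Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield aligned slices of parallel record lists as keyword-argument dicts.

    Fields passed as None (e.g. metadatas=None) are left out of every batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if len(set(ids)) != len(ids):
        # Checked up front so duplicates split across batches are caught too.
        raise ValueError("ids must be unique")
    present = {name: values for name, values in fields.items() if values is not None}
    for name, values in present.items():
        if len(values) != len(ids):
            raise ValueError(f"{name} has {len(values)} entries for {len(ids)} ids")
    for start in range(0, len(ids), batch_size):
        batch: Dict[str, Any] = {"ids": list(ids[start:start + batch_size])}
        for name, values in present.items():
            batch[name] = list(values[start:start + batch_size])
        yield batch
```

For example, `for batch in batched_records(ids, 1000, documents=docs, metadatas=metas): collection.add(**batch)`. The batch size could come from `client.get_max_batch_size()`, assuming your chromadb version exposes that method; otherwise pick a conservative constant.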

Embedding Functions

Pre-built and configurable embedding functions for generating vector embeddings from text, supporting major AI providers and embedding models with consistent interfaces.

class EmbeddingFunction:
    def __call__(self, input): ...
    def embed_with_retries(self, input, **retry_kwargs): ...
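
An embedding function is a callable that turns a list of documents into a list of equal-length vectors. The class below is a toy, not one of chromadb's built-in functions: it hashes words into a fixed number of buckets (the "hashing trick") and normalises the result. It matches the __call__(self, input) shape shown above, but recent chromadb versions may expect extra methods on custom embedding functions, so check the EmbeddingFunction protocol in your installed version before passing it as embedding_function=.

```python
import hashlib
import math
from typing import List


class HashingEmbedding:
    """Toy embedding function using the hashing trick (not a chromadb built-in)."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def __call__(self, input: List[str]) -> List[List[float]]:
        # Documents in, Embeddings out: one vector of length `dim` per text.
        return [self._embed(text) for text in input]

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            # sha256 rather than hash(): Python's str hash is salted per process,
            # so it would give different vectors on every run.
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # L2-normalise so vectors are comparable; an empty text stays all zeros.
        return [v / norm for v in vec] if norm else vec


ef = HashingEmbedding(dim=16)
embeddings = ef(["red apple", "green apple"])
print(len(embeddings), len(embeddings[0]))  # 2 16
```

Texts that share words share buckets, so "red apple" and "green apple" get vectors with positive cosine similarity; the toy has no notion of meaning beyond word overlap, which is what real embedding models add.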

Query and Filtering

Advanced query capabilities including vector similarity search, metadata filtering, and document text matching with logical operators and flexible result formatting.

Where = Dict[Union[str, LogicalOperator], Union[LiteralValue, OperatorExpression, List[Where]]]
WhereDocument = Dict[WhereDocumentOperator, Union[str, List[WhereDocument]]]
Include = List[Literal["documents", "embeddings", "metadatas", "distances", "uris", "data"]]
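
The Where and WhereDocument aliases describe two small filter languages. The evaluator below is a toy model of their semantics in plain Python, not how Chroma executes filters: the operator names ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or, $contains, $not_contains) are Chroma's, while the function names are hypothetical. In Chroma the same dicts are passed as where= and where_document= to query, get, and delete. A bare value such as {"source": "my_source"} is shorthand for $eq, and the toy requires exactly one top-level key per clause, so several conditions are combined with $and or $or.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict

_COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value > target,
    "$gte": lambda value, target: value >= target,
    "$lt": lambda value, target: value < target,
    "$lte": lambda value, target: value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def matches_where(metadata: Mapping, where: Mapping) -> bool:
    """Toy evaluation of a `where` filter against one record's metadata."""
    if len(where) != 1:
        raise ValueError("each clause needs exactly one key; combine clauses with $and/$or")
    ((key, condition),) = where.items()
    if key == "$and":
        return all(matches_where(metadata, clause) for clause in condition)
    if key == "$or":
        return any(matches_where(metadata, clause) for clause in condition)
    if key not in metadata:
        # Toy assumption: a missing key never matches, whatever the operator.
        return False
    if not isinstance(condition, Mapping):
        # Shorthand: {"source": "a"} means {"source": {"$eq": "a"}}.
        condition = {"$eq": condition}
    ((op, target),) = condition.items()
    return _COMPARATORS[op](metadata[key], target)


def matches_where_document(document: str, where_document: Mapping) -> bool:
    """Toy evaluation of a `where_document` filter against one document's text."""
    ((op, operand),) = where_document.items()
    if op == "$and":
        return all(matches_where_document(document, clause) for clause in operand)
    if op == "$or":
        return any(matches_where_document(document, clause) for clause in operand)
    if op == "$contains":
        return operand in document  # plain substring test
    if op == "$not_contains":
        return operand not in document
    raise ValueError(f"unsupported where_document operator: {op}")


meta = {"source": "my_source", "year": 2023}
print(matches_where(meta, {"$and": [{"source": "my_source"}, {"year": {"$gte": 2020}}]}))  # True
```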

Configuration and Settings

Configuration system for customizing ChromaDB behavior, including authentication, server settings, telemetry, and storage options.

class Settings:
    def __init__(self, **kwargs): ...

def configure(**kwargs) -> None: ...
def get_settings() -> Settings: ...

Types

Core Types

from typing import Any, Dict, List, Optional, TypedDict, Union

# Document and metadata types
Documents = List[str]
Metadatas = List[Dict[str, Union[str, int, float, bool]]]
IDs = List[str]
Embeddings = List[List[float]]

# get() result: one flat list entry per matching record (no distances);
# fields not requested via include are None
GetResult = TypedDict('GetResult', {
    'ids': List[str],
    'documents': Optional[List[Optional[str]]],
    'metadatas': Optional[List[Optional[Dict]]],
    'embeddings': Optional[List[List[float]]],
    'uris': Optional[List[Optional[str]]],
    'data': Optional[List[Any]],
    'included': List[str]
})

# query() result: one inner list per query text or query embedding,
# ordered from closest to farthest; fields not requested via include are None
QueryResult = TypedDict('QueryResult', {
    'ids': List[List[str]],
    'documents': Optional[List[List[Optional[str]]]],
    'metadatas': Optional[List[List[Optional[Dict]]]],
    'embeddings': Optional[List[List[Optional[List[float]]]]],
    'distances': Optional[List[List[float]]],
    'uris': Optional[List[List[Optional[str]]]],
    'data': Optional[List[List[Optional[Any]]]],
    'included': List[str]
})
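
The Metadatas alias allows only scalar values (str, int, float, bool), one dict per id. A pre-check like the hypothetical check_metadatas below surfaces a list or nested dict before an add call. chromadb performs its own validation, so this only mirrors the alias above in order to give earlier, more specific errors.

```python
from typing import Any, Dict, List


def check_metadatas(ids: List[str], metadatas: List[Dict[str, Any]]) -> None:
    """Raise ValueError unless `metadatas` fits the Metadatas alias:
    one dict per id, string keys, and str/int/float/bool values only."""
    if len(metadatas) != len(ids):
        raise ValueError(f"{len(metadatas)} metadata dicts for {len(ids)} ids")
    for doc_id, metadata in zip(ids, metadatas):
        for key, value in metadata.items():
            if not isinstance(key, str):
                raise ValueError(f"{doc_id}: metadata key {key!r} is not a string")
            if not isinstance(value, (str, int, float, bool)):
                raise ValueError(
                    f"{doc_id}: metadata[{key!r}] is a {type(value).__name__}; "
                    "only str, int, float and bool are allowed"
                )


# Passes silently for the metadata used in Basic Usage.
check_metadatas(["id1", "id2"], [{"source": "my_source"}, {"source": "my_source"}])
```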

Describes: pkg:pypi/chromadb@1.0.x