An LLM framework for building customizable, production-ready applications with pipelines that connect models, vector databases, and data processors.
```bash
npx @tessl/cli install tessl/pypi-farm-haystack@1.26.0
```

Haystack is a comprehensive end-to-end NLP framework that enables developers to build sophisticated applications powered by Large Language Models (LLMs), Transformer models, and vector search capabilities. The framework provides a modular architecture based on Pipelines that connect various Nodes (preprocessing, retrieval, and language model components) to perform complex NLP tasks such as retrieval-augmented generation (RAG), question answering, semantic document search, and answer generation.
```bash
pip install farm-haystack
```

```python
import haystack
from haystack import Document, Answer, Label, MultiLabel, Span, EvaluationResult, TableCell, Pipeline, hash128
from haystack.nodes.base import BaseComponent
```

Common imports for building pipelines:

```python
from haystack.document_stores import InMemoryDocumentStore, ElasticsearchDocumentStore, FAISSDocumentStore
from haystack.nodes import BM25Retriever, EmbeddingRetriever, FARMReader, TransformersReader
from haystack.pipelines import ExtractiveQAPipeline, DocumentSearchPipeline
```

A complete extractive QA example:

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline
# Create documents
docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="Madrid is the capital of Spain."),
]
# Initialize document store and add documents
document_store = InMemoryDocumentStore(use_bm25=True)  # BM25 support must be enabled for BM25Retriever
document_store.write_documents(docs)
# Create retriever and reader components
retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
# Build pipeline
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
# Ask a question
result = pipeline.run(query="What is the capital of France?")
print(result["answers"][0].answer)  # "Paris"
```

Haystack follows a modular, component-based architecture with three core concepts: data classes that carry information between components, Nodes that process them, and Pipelines that connect Nodes into workflows.
The framework supports both Retrieval-Augmented Generation (RAG) workflows and Agent-based interactive systems, making it suitable for production-grade applications requiring sophisticated natural language processing capabilities.
Fundamental data classes for documents, answers, labels, and evaluation results that form the foundation of all Haystack operations.
```python
class Document:
    def __init__(self, content: Union[str, DataFrame], content_type: str = "text",
                 meta: Dict[str, Any] = None, id: Optional[str] = None): ...

class Answer:
    def __init__(self, answer: str, type: str = "extractive",
                 score: Optional[float] = None, context: Optional[str] = None): ...

class Label:
    def __init__(self, query: str, answer: Answer, is_correct_answer: bool = True,
                 is_correct_document: bool = True): ...
```
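A minimal sketch of constructing these primitives directly; field names follow the signatures above, and the metadata values are illustrative:

```python
from haystack import Answer, Document

# A Document pairs content with arbitrary metadata; an id is derived
# from the content hash when not supplied.
doc = Document(content="Paris is the capital of France.",
               meta={"source": "geography-notes", "lang": "en"})
print(doc.id, doc.meta)

# An Answer carries the extracted span plus its confidence and surrounding context.
ans = Answer(answer="Paris", type="extractive", score=0.97,
             context="Paris is the capital of France.")
print(ans.answer, ans.score)
```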
Backend storage systems supporting vector and keyword search across multiple databases, including Elasticsearch, FAISS, Pinecone, Weaviate, and others.

```python
class BaseDocumentStore:
    def write_documents(self, documents: List[Document]): ...
    def get_all_documents(self) -> List[Document]: ...
    def query(self, query: str, top_k: int = 10) -> List[Document]: ...

class ElasticsearchDocumentStore(BaseDocumentStore): ...
class FAISSDocumentStore(BaseDocumentStore): ...
class PineconeDocumentStore(BaseDocumentStore): ...
```
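Because every store implements the same base interface, code written against one backend ports to another. A sketch using the dependency-free in-memory store (the `topic` metadata field is illustrative):

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore

store = InMemoryDocumentStore()
store.write_documents([
    Document(content="Paris is the capital of France.", meta={"topic": "geo"}),
    Document(content="The Louvre is in Paris.", meta={"topic": "art"}),
])
print(store.get_document_count())                           # 2
print(store.get_all_documents(filters={"topic": ["geo"]}))  # filter on metadata
```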
Dense and sparse retrieval components for finding relevant documents using embeddings, BM25, TF-IDF, and specialized retrieval methods.

```python
class BM25Retriever(BaseRetriever):
    def __init__(self, document_store: BaseDocumentStore): ...
    def retrieve(self, query: str, top_k: int = 10) -> List[Document]: ...

class EmbeddingRetriever(BaseRetriever):
    def __init__(self, document_store: BaseDocumentStore,
                 embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"): ...
```
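A sketch of dense retrieval with `EmbeddingRetriever`. Unlike BM25, it requires document embeddings to be computed up front via `update_embeddings`; the 384-dim setting matches the MiniLM model used here:

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever

store = InMemoryDocumentStore(embedding_dim=384)  # all-MiniLM-L6-v2 outputs 384-dim vectors
store.write_documents([Document(content="Paris is the capital of France.")])

retriever = EmbeddingRetriever(
    document_store=store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)
store.update_embeddings(retriever)  # embed all stored documents once after indexing

docs = retriever.retrieve(query="Which city is France's capital?", top_k=3)
```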
Reading comprehension components for extractive question answering using FARM, Transformers, and specialized table readers.

```python
class FARMReader(BaseReader):
    def __init__(self, model_name_or_path: str = "deepset/roberta-base-squad2"): ...
    def predict(self, query: str, documents: List[Document]) -> Dict[str, Any]: ...

class TransformersReader(BaseReader):
    def __init__(self, model_name_or_path: str = "deepset/roberta-base-squad2"): ...
```
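Readers can also be run standalone on in-memory documents. In Haystack 1.x, `predict` returns a dict whose `"answers"` key holds the `Answer` objects:

```python
from haystack import Document
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
result = reader.predict(
    query="What is the capital of France?",
    documents=[Document(content="Paris is the capital of France.")],
    top_k=1,
)
for ans in result["answers"]:
    print(ans.answer, ans.score)
```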
Language model components for text generation using OpenAI, Transformers, and other LLM providers for generative QA and text synthesis.

```python
class OpenAIAnswerGenerator(BaseGenerator):
    def __init__(self, api_key: str, model: str = "gpt-3.5-turbo"): ...
    def predict(self, query: str, documents: List[Document]) -> Dict[str, Any]: ...

class OpenAIChatGenerator(BaseGenerator):
    def __init__(self, api_key: str, model: str = "gpt-3.5-turbo"): ...
```
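A sketch of grounded answer generation, assuming an `OPENAI_API_KEY` environment variable; like the readers, `predict` returns a dict with an `"answers"` list:

```python
import os

from haystack import Document
from haystack.nodes import OpenAIAnswerGenerator

generator = OpenAIAnswerGenerator(api_key=os.environ["OPENAI_API_KEY"])
result = generator.predict(
    query="What is the capital of France?",
    documents=[Document(content="Paris is the capital of France.")],
    top_k=1,
)
print(result["answers"][0].answer)
```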
Pre-built and custom pipeline templates for orchestrating component workflows, including QA, search, generation, and indexing pipelines.

```python
class Pipeline:
    def __init__(self): ...
    def add_node(self, component: BaseComponent, name: str, inputs: List[str]): ...
    def run(self, **kwargs): ...

class ExtractiveQAPipeline(Pipeline): ...
class GenerativeQAPipeline(Pipeline): ...
class DocumentSearchPipeline(Pipeline): ...
```
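A custom `Pipeline` equivalent to the quick-start's `ExtractiveQAPipeline`, reusing the `retriever` and `reader` built earlier; `"Query"` is the reserved name for the pipeline's input:

```python
from haystack import Pipeline

pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=reader, name="Reader", inputs=["Retriever"])

# Per-node parameters are passed via `params`, keyed by node name.
result = pipe.run(
    query="What is the capital of France?",
    params={"Retriever": {"top_k": 5}, "Reader": {"top_k": 1}},
)
```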
Interactive LLM agents with tool usage, memory management, and conversational capabilities for complex reasoning tasks.

```python
class Agent:
    def __init__(self, prompt_node: PromptNode, memory: Optional[BaseMemory] = None): ...
    def run(self, query: str) -> AgentStep: ...

class ConversationalAgent(Agent): ...

class Tool:
    def __init__(self, name: str, pipeline_or_node: Union[BaseComponent, Pipeline]): ...
```
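A sketch of wiring an agent with a single search tool, reusing the earlier `retriever` and assuming an OpenAI key; the tool name and description are illustrative:

```python
import os

from haystack.agents import Agent, Tool
from haystack.nodes import PromptNode
from haystack.pipelines import DocumentSearchPipeline

search_pipeline = DocumentSearchPipeline(retriever=retriever)

prompt_node = PromptNode(
    model_name_or_path="gpt-3.5-turbo",
    api_key=os.environ["OPENAI_API_KEY"],
    stop_words=["Observation:"],  # stop token for the ReAct-style reasoning loop
)
agent = Agent(prompt_node=prompt_node)
agent.add_tool(Tool(
    name="DocumentSearch",
    pipeline_or_node=search_pipeline,
    description="Useful for finding passages about world capitals",
))

result = agent.run(query="Which country's capital is Paris?")
```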
Document converters and preprocessors for handling PDF, DOCX, HTML, images, and other file formats, with text extraction and cleaning.

```python
class BaseConverter:
    def convert(self, file_path: Path, **kwargs) -> List[Document]: ...

class PDFToTextConverter(BaseConverter): ...
class DocxToTextConverter(BaseConverter): ...

class PreProcessor(BaseComponent):
    def process(self, documents: List[Document]) -> List[Document]: ...
```
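A typical indexing flow converts a file, then splits the result into retriever-sized passages. A sketch, assuming a local `report.pdf` (`PDFToTextConverter` also requires the `pdftotext` system utility):

```python
from haystack.nodes import PDFToTextConverter, PreProcessor

converter = PDFToTextConverter(remove_numeric_tables=True)
docs = converter.convert(file_path="report.pdf", meta={"source": "report.pdf"})

# Clean the extracted text and split it into overlapping ~200-word passages
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    split_by="word",
    split_length=200,
    split_overlap=20,
)
chunks = preprocessor.process(docs)
```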
Evaluation metrics, model evaluation tools, and utility functions for assessing pipeline performance and data processing.

```python
def eval_pipeline(pipeline: Pipeline, eval_labels: List[Label]) -> EvaluationResult: ...

class EvaluationResult:
    def __init__(self): ...
    def calculate_metrics(self) -> Dict[str, float]: ...
```
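In Haystack 1.x, evaluation is typically driven through `Pipeline.eval`, which consumes gold labels wrapped in `MultiLabel` objects and returns an `EvaluationResult`. A sketch reusing the quick-start `pipeline`:

```python
from haystack import Answer, Document, Label, MultiLabel

label = Label(
    query="What is the capital of France?",
    answer=Answer(answer="Paris"),
    document=Document(content="Paris is the capital of France."),
    is_correct_answer=True,
    is_correct_document=True,
    origin="gold-label",
)
eval_result = pipeline.eval(
    labels=[MultiLabel(labels=[label])],
    params={"Retriever": {"top_k": 5}},
)
# Metrics are grouped per node, e.g. retriever recall and reader exact match
metrics = eval_result.calculate_metrics()
print(metrics["Retriever"]["recall_single_hit"], metrics["Reader"]["exact_match"])
```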