# Haystack-AI

An end-to-end LLM framework for building production-ready applications powered by large language models and vector search. Haystack lets developers build retrieval-augmented generation (RAG), document search, and question-answering systems by orchestrating state-of-the-art embedding models and LLMs into flexible pipelines.

## Package Information

- **Package Name**: haystack-ai
- **Language**: Python
- **Installation**: `pip install haystack-ai`

## Core Imports

```python
import haystack
```

Main components:

```python
from haystack import Pipeline, Document, component
from haystack.components.generators import OpenAIGenerator
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever
```

## Basic Usage

```python
from haystack import Pipeline, Document
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.components.embedders import OpenAITextEmbedder, OpenAIDocumentEmbedder

# Create a simple RAG pipeline (requires OPENAI_API_KEY to be set)
documents = [
    Document(content="Python is a programming language."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="Pipelines connect components in Haystack.")
]

# Initialize document store
document_store = InMemoryDocumentStore()

# PromptBuilder templates are Jinja2, so loop over the retrieved documents
template = """Answer the question based on the context.
Context:
{% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ query }}"""

# Create pipeline
rag_pipeline = Pipeline()

# Add components
rag_pipeline.add_component("text_embedder", OpenAITextEmbedder())
rag_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
rag_pipeline.add_component("generator", OpenAIGenerator())

# Connect components
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "generator.prompt")

# Embed and store documents
doc_embedder = OpenAIDocumentEmbedder()
embedded_docs = doc_embedder.run(documents=documents)
document_store.write_documents(embedded_docs["documents"])

# Run the pipeline
response = rag_pipeline.run({
    "text_embedder": {"text": "What is Python?"},
    "prompt_builder": {"query": "What is Python?"}
})

print(response["generator"]["replies"][0])
```

## Architecture

Haystack follows a modular, component-based architecture:

- **Pipeline**: Orchestrates the flow of data between components using a directed acyclic graph (DAG)
- **Components**: Modular building blocks that perform specific tasks (embedding, generation, retrieval, etc.)
- **Document Stores**: Storage systems for documents and embeddings
- **Data Classes**: Structured data types (Document, Answer, ChatMessage, etc.) that flow between components

This design enables flexible composition of AI workflows, from simple Q&A systems to complex multi-step reasoning chains and autonomous agents.

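The DAG execution model can be sketched in plain Python. This is a hypothetical illustration of the idea only, not Haystack's actual implementation: each "component" is a stand-in function, and edges route one component's output into the next component's input in topological order.

```python
from graphlib import TopologicalSorter

def embed(text):
    # Stand-in embedder: map each word to its length as a fake "vector"
    return [float(len(word)) for word in text.split()]

def retrieve(vector):
    # Stand-in retriever: return a canned document list
    return ["Python is a programming language."]

def build_prompt(docs):
    return "Context: " + " ".join(docs)

# Edges: node -> set of predecessors whose output it consumes
graph = {"retrieve": {"embed"}, "build_prompt": {"retrieve"}}
funcs = {"embed": embed, "retrieve": retrieve, "build_prompt": build_prompt}

def run(pipeline_input):
    results = {}
    # Execute components in dependency order; nodes with no
    # predecessors receive the pipeline's external input
    for name in TopologicalSorter(graph).static_order():
        deps = graph.get(name, set())
        arg = results[next(iter(deps))] if deps else pipeline_input
        results[name] = funcs[name](arg)
    return results

print(run("What is Python?")["build_prompt"])
# -> Context: Python is a programming language.
```

A real `Pipeline` generalizes this pattern with named input/output sockets, type validation on `connect`, and support for branching and looping graphs.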
## Capabilities

### Core Framework

Essential framework components for building pipelines, managing data flow, and creating custom components.

```python { .api }
class Pipeline:
    def add_component(self, name: str, instance: Any) -> None: ...
    def connect(self, sender: str, receiver: str) -> None: ...
    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...

class AsyncPipeline:
    async def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...

@component
class MyComponent:
    @component.output_types(result=str)
    def run(self, text: str) -> Dict[str, str]: ...

class Document:
    def __init__(self, content: str, meta: Dict[str, Any] = None): ...
```

[Core Framework](./core-framework.md)

### Text Generation

Large language model integrations for text generation, chat completions, and answer synthesis.

```python { .api }
class OpenAIGenerator:
    def run(self, prompt: str, **kwargs) -> Dict[str, Any]: ...

class OpenAIChatGenerator:
    def run(self, messages: List[ChatMessage], **kwargs) -> Dict[str, Any]: ...

class HuggingFaceLocalGenerator:
    def run(self, prompt: str, **kwargs) -> Dict[str, Any]: ...
```

[Text Generation](./text-generation.md)

### Text Embeddings

Convert text and documents into vector embeddings for semantic search and retrieval.

```python { .api }
class OpenAITextEmbedder:
    def run(self, text: str) -> Dict[str, List[float]]: ...

class OpenAIDocumentEmbedder:
    def run(self, documents: List[Document]) -> Dict[str, List[Document]]: ...

class SentenceTransformersTextEmbedder:
    def run(self, text: str) -> Dict[str, List[float]]: ...
```

[Text Embeddings](./text-embeddings.md)

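What embedding-based retrieval does with these vectors can be shown with a minimal sketch: score each document by cosine similarity to the query vector and rank by score. The vectors below are made up for illustration; real embedders produce them from text.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 1.0]
docs = {"doc_a": [0.9, 0.1, 0.8], "doc_b": [0.0, 1.0, 0.0]}

# Rank documents by similarity to the query, most similar first
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # -> doc_a
```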
### Document Processing

Convert various file formats to Haystack Document objects and preprocess text for optimal retrieval.

```python { .api }
class PyPDFToDocument:
    def run(self, sources: List[str]) -> Dict[str, List[Document]]: ...

class HTMLToDocument:
    def run(self, sources: List[str]) -> Dict[str, List[Document]]: ...

class DocumentSplitter:
    def run(self, documents: List[Document]) -> Dict[str, List[Document]]: ...
```

[Document Processing](./document-processing.md)

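The splitting step can be illustrated with a hypothetical word-window splitter, similar in spirit to splitting by word count with overlap (the function name and parameters here are illustrative, not Haystack's API):

```python
def split_by_words(text, split_length=5, overlap=1):
    # Chunk text into fixed-size word windows; assumes overlap < split_length.
    # Overlapping windows keep context that would otherwise be cut
    # at chunk boundaries.
    words = text.split()
    step = split_length - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + split_length]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + split_length >= len(words):
            break
    return chunks

text = "one two three four five six seven eight"
print(split_by_words(text, split_length=4, overlap=1))
# -> ['one two three four', 'four five six seven', 'seven eight']
```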
### Retrieval

Search and retrieve relevant documents using various retrieval strategies.

```python { .api }
class InMemoryEmbeddingRetriever:
    def run(self, query_embedding: List[float], top_k: int = 10) -> Dict[str, List[Document]]: ...

class InMemoryBM25Retriever:
    def run(self, query: str, top_k: int = 10) -> Dict[str, List[Document]]: ...

class FilterRetriever:
    def run(self, filters: Dict[str, Any]) -> Dict[str, List[Document]]: ...
```

[Retrieval](./retrieval.md)

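Keyword retrieval can be sketched with simple term overlap. This is a deliberately simplified stand-in for BM25, which additionally weights terms by inverse document frequency and normalizes for document length; the functions here are hypothetical.

```python
import re

docs = [
    "Python is a programming language.",
    "Berlin is the capital of Germany.",
    "Pipelines connect components in Haystack.",
]

def terms(text):
    # Lowercase and strip punctuation so "Python?" matches "python"
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query, doc):
    # Count how many query terms appear in the document
    return len(terms(query) & terms(doc))

def retrieve(query, top_k=2):
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:top_k]

print(retrieve("What is Python?", top_k=1))
# -> ['Python is a programming language.']
```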
### Prompt Building

Create and format prompts for language models with dynamic content injection.

```python { .api }
class PromptBuilder:
    def run(self, **kwargs) -> Dict[str, str]: ...

class ChatPromptBuilder:
    def run(self, **kwargs) -> Dict[str, List[ChatMessage]]: ...
```

[Prompt Building](./prompt-building.md)

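PromptBuilder renders Jinja2 templates; the core idea of injecting runtime values into a prompt template can be mimicked in a minimal sketch with plain `str.format` substitution (a simplification, since Jinja2 also supports loops and conditionals):

```python
def build_prompt(template, **variables):
    # Fill named placeholders with runtime values
    return template.format(**variables)

template = (
    "Answer the question based on the context.\n"
    "Context: {documents}\n"
    "Question: {query}"
)

prompt = build_prompt(
    template,
    documents="Python is a programming language.",
    query="What is Python?",
)
print(prompt)
```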
### Document Stores

Storage backends for documents and embeddings with filtering and search capabilities.

```python { .api }
class InMemoryDocumentStore:
    def write_documents(self, documents: List[Document]) -> int: ...
    def filter_documents(self, filters: Dict[str, Any]) -> List[Document]: ...
    def count_documents(self) -> int: ...
```

[Document Stores](./document-stores.md)

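Metadata filtering can be sketched as matching every key/value pair in the filter against each document's `meta`. This is a simplified illustration with hypothetical dict-based documents; real Haystack filters also support comparison operators and logical AND/OR combinations.

```python
docs = [
    {"content": "Guide to Python", "meta": {"lang": "en", "year": 2023}},
    {"content": "Leitfaden", "meta": {"lang": "de", "year": 2023}},
]

def filter_documents(documents, filters):
    # Keep documents whose meta matches every key/value in the filter
    return [
        d for d in documents
        if all(d["meta"].get(k) == v for k, v in filters.items())
    ]

print([d["content"] for d in filter_documents(docs, {"lang": "en"})])
# -> ['Guide to Python']
```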
### Evaluation

Metrics and evaluation components for assessing pipeline performance and answer quality.

```python { .api }
class ContextRelevanceEvaluator:
    def run(self, questions: List[str], contexts: List[List[str]]) -> Dict[str, List[float]]: ...

class FaithfulnessEvaluator:
    def run(self, questions: List[str], contexts: List[List[str]], responses: List[str]) -> Dict[str, List[float]]: ...
```

[Evaluation](./evaluation.md)

### Agent Framework

Build autonomous agents that can use tools and maintain conversation state.

```python { .api }
class Agent:
    def run(self, messages: List[ChatMessage]) -> Dict[str, List[ChatMessage]]: ...

class ToolInvoker:
    def run(self, tool_calls: List[ToolCall]) -> Dict[str, List[ToolCallResult]]: ...
```

[Agent Framework](./agent-framework.md)

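The tool-invocation step an agent performs can be sketched as: look up each requested tool by name, call it with the supplied arguments, and collect the results. This mirrors the shape of the API above with made-up tool functions and dict-based calls rather than real `ToolCall` objects.

```python
def add(a, b):
    return a + b

# Registry mapping tool names to callables
TOOLS = {"add": add}

def invoke(tool_calls):
    results = []
    for call in tool_calls:
        tool = TOOLS[call["tool_name"]]
        results.append({
            "origin": call,
            "result": str(tool(**call["arguments"])),
            "error": False,
        })
    return results

out = invoke([{"tool_name": "add", "arguments": {"a": 2, "b": 3}}])
print(out[0]["result"])  # -> 5
```

In a full agent loop, an LLM would emit the tool calls, and the collected results would be fed back into the conversation as tool messages.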
## Types

```python { .api }
class Document:
    content: str
    meta: Dict[str, Any]
    id: str
    score: Optional[float]
    embedding: Optional[List[float]]

class ChatMessage:
    content: str
    role: ChatRole
    name: Optional[str]
    tool_calls: Optional[List[ToolCall]]
    tool_call_result: Optional[ToolCallResult]

class ChatRole(Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"
    TOOL = "tool"

class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any]

class ExtractedAnswer:
    query: str
    score: Optional[float]
    data: str
    document: Optional[Document]
    context: Optional[str]
    offsets_in_document: List[Span]
    offsets_in_context: List[Span]
    meta: Dict[str, Any]

class ToolCall:
    tool_name: str
    arguments: Dict[str, Any]
    id: Optional[str]

class ToolCallResult:
    result: str
    origin: ToolCall
    error: bool
```
```