# Haystack

Haystack is an end-to-end NLP framework for building applications powered by Large Language Models (LLMs), Transformer models, and vector search. It provides a modular architecture in which Pipelines connect Nodes (preprocessing, retrieval, and language model components) to perform complex NLP tasks such as retrieval-augmented generation (RAG), question answering, semantic document search, and answer generation.

## Package Information

- **Package Name**: farm-haystack
- **Language**: Python
- **Installation**: `pip install farm-haystack`
- **Python Support**: 3.8+
- **License**: Apache-2.0

## Core Imports

```python
import haystack
from haystack import Document, Answer, Label, MultiLabel, Span, EvaluationResult, TableCell, Pipeline, hash128
from haystack.nodes.base import BaseComponent
```

Common imports for building pipelines:

```python
from haystack.document_stores import InMemoryDocumentStore, ElasticsearchDocumentStore, FAISSDocumentStore
from haystack.nodes import BM25Retriever, EmbeddingRetriever, FARMReader, TransformersReader
from haystack.pipelines import ExtractiveQAPipeline, DocumentSearchPipeline
```

## Basic Usage

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# Create documents
docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="Madrid is the capital of Spain."),
]

# Initialize document store and add documents
# (use_bm25=True enables BM25 keyword search; available in farm-haystack 1.15+)
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents(docs)

# Create retriever and reader components
retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

# Build pipeline
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)

# Ask a question
result = pipeline.run(query="What is the capital of France?")
print(result["answers"][0].answer)  # "Paris"
```

## Architecture

Haystack follows a modular, component-based architecture with three core concepts:

- **Components (Nodes)**: Modular processing units that perform specific tasks (retrieval, reading, generation, preprocessing)
- **Document Stores**: Backend storage systems for documents and embeddings (Elasticsearch, FAISS, Pinecone, etc.)
- **Pipelines**: Orchestration layer that connects components in directed graphs to solve complex NLP tasks (see the sketch below)

The framework supports both **Retrieval-Augmented Generation (RAG)** workflows and **Agent-based** interactive systems, making it suitable for production-grade applications that require sophisticated natural language processing.
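
To make the directed-graph idea concrete, here is a minimal sketch of a custom pipeline built with `Pipeline.add_node`, equivalent to the prebuilt `ExtractiveQAPipeline` from Basic Usage; the document content is an illustrative placeholder.

```python
from haystack import Document, Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader

document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([Document(content="Paris is the capital of France.")])

retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

# Wire the graph by hand: "Query" is the reserved input node for query pipelines
pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=reader, name="Reader", inputs=["Retriever"])

result = pipeline.run(query="What is the capital of France?")
print(result["answers"][0].answer)
```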

## Capabilities

### Core Schema & Data Structures

Fundamental data classes for documents, answers, labels, and evaluation results that form the foundation of all Haystack operations.

```python { .api }
class Document:
    def __init__(self, content: Union[str, DataFrame], content_type: str = "text",
                 meta: Dict[str, Any] = None, id: Optional[str] = None): ...

class Answer:
    def __init__(self, answer: str, type: str = "extractive",
                 score: Optional[float] = None, context: Optional[str] = None): ...

class Label:
    def __init__(self, query: str, document: Document, is_correct_answer: bool,
                 is_correct_document: bool, origin: str,
                 answer: Optional[Answer] = None): ...
```

[Core Schema](./core-schema.md)
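
A short usage sketch of these classes; field values such as the `meta` dict are illustrative. In farm-haystack 1.x a `Label` also carries the matched `Document` and an `origin` such as `"gold-label"`:

```python
from haystack import Answer, Document, Label

doc = Document(content="Paris is the capital of France.", meta={"source": "geo-notes"})

answer = Answer(answer="Paris", type="extractive", score=0.95, context=doc.content)

# A gold annotation pairing a query with its expected answer and source document
label = Label(
    query="What is the capital of France?",
    document=doc,
    answer=answer,
    is_correct_answer=True,
    is_correct_document=True,
    origin="gold-label",
)
```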

### Document Stores

Backend storage systems supporting vector and keyword search across multiple databases including Elasticsearch, FAISS, Pinecone, Weaviate, and others.

```python { .api }
class BaseDocumentStore:
    def write_documents(self, documents: List[Document]): ...
    def get_all_documents(self) -> List[Document]: ...
    def query(self, query: str, top_k: int = 10) -> List[Document]: ...

class ElasticsearchDocumentStore(BaseDocumentStore): ...
class FAISSDocumentStore(BaseDocumentStore): ...
class PineconeDocumentStore(BaseDocumentStore): ...
```

[Document Stores](./document-stores.md)
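
All stores expose the same `BaseDocumentStore` interface, so code written against the in-memory store carries over to the production backends. A minimal sketch (the `use_bm25=True` flag assumes a recent 1.x release):

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore

store = InMemoryDocumentStore(use_bm25=True)
store.write_documents([
    Document(content="Berlin is the capital of Germany."),
    Document(content="Madrid is the capital of Spain."),
])

print(store.get_document_count())            # 2
print(store.get_all_documents()[0].content)  # first stored document
```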

### Retriever Components

Dense and sparse retrieval components for finding relevant documents using embeddings, BM25, TF-IDF, and specialized retrieval methods.

```python { .api }
class BM25Retriever(BaseRetriever):
    def __init__(self, document_store: BaseDocumentStore): ...
    def retrieve(self, query: str, top_k: int = 10) -> List[Document]: ...

class EmbeddingRetriever(BaseRetriever):
    def __init__(self, document_store: BaseDocumentStore,
                 embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"): ...
```

[Retriever Components](./retrievers.md)
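
A sketch of sparse retrieval with BM25 (document content is illustrative). For `EmbeddingRetriever`, document embeddings must be computed first via `document_store.update_embeddings(retriever)`:

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever

store = InMemoryDocumentStore(use_bm25=True)
store.write_documents([Document(content="Berlin is the capital of Germany.")])

retriever = BM25Retriever(document_store=store)
docs = retriever.retrieve(query="capital of Germany", top_k=3)
for doc in docs:
    print(doc.score, doc.content)  # retrieved documents carry a relevance score
```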

### Reader Components

Reading comprehension components for extractive question answering using FARM, Transformers, and specialized table readers.

```python { .api }
class FARMReader(BaseReader):
    def __init__(self, model_name_or_path: str = "deepset/roberta-base-squad2"): ...
    def predict(self, query: str, documents: List[Document],
                top_k: Optional[int] = None) -> Dict[str, Any]: ...

class TransformersReader(BaseReader):
    def __init__(self, model_name_or_path: str = "deepset/roberta-base-squad2"): ...
```

[Reader Components](./readers.md)
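
A minimal reader-only sketch; in farm-haystack 1.x, `predict` returns a dict whose `"answers"` list holds `Answer` objects:

```python
from haystack import Document
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)
prediction = reader.predict(
    query="What is the capital of Germany?",
    documents=[Document(content="Berlin is the capital of Germany.")],
    top_k=1,
)
print(prediction["answers"][0].answer)  # "Berlin"
```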

### Generator Components

Language model components for text generation using OpenAI, Transformers, and other LLM providers for generative QA and text synthesis.

```python { .api }
class OpenAIAnswerGenerator(BaseGenerator):
    def __init__(self, api_key: str, model: str = "gpt-3.5-turbo"): ...
    def predict(self, query: str, documents: List[Document],
                top_k: Optional[int] = None) -> Dict[str, Any]: ...

class OpenAIChatGenerator(BaseGenerator):
    def __init__(self, api_key: str, model: str = "gpt-3.5-turbo"): ...
```

[Generator Components](./generators.md)
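
A hedged sketch of generative QA. It assumes an OpenAI key in the `OPENAI_API_KEY` environment variable, and the default model depends on the installed release:

```python
import os

from haystack import Document
from haystack.nodes import OpenAIAnswerGenerator

generator = OpenAIAnswerGenerator(api_key=os.environ["OPENAI_API_KEY"])
prediction = generator.predict(
    query="What is the capital of France?",
    documents=[Document(content="Paris is the capital of France.")],
    top_k=1,
)
print(prediction["answers"][0].answer)
```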

### Pipeline System

Pre-built and custom pipeline templates for orchestrating component workflows including QA, search, generation, and indexing pipelines.

```python { .api }
class Pipeline:
    def __init__(self): ...
    def add_node(self, component: BaseComponent, name: str, inputs: List[str]): ...
    def run(self, **kwargs): ...

class ExtractiveQAPipeline(Pipeline): ...
class GenerativeQAPipeline(Pipeline): ...
class DocumentSearchPipeline(Pipeline): ...
```

[Pipeline System](./pipelines.md)
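
Beyond the hand-wired graph shown under Architecture, the prebuilt templates wrap common layouts; per-node parameters are passed through `params`, keyed by node name. A sketch with `DocumentSearchPipeline` (document content is illustrative):

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever
from haystack.pipelines import DocumentSearchPipeline

document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([Document(content="Berlin is the capital of Germany.")])

search = DocumentSearchPipeline(retriever=BM25Retriever(document_store=document_store))
result = search.run(
    query="capital of Germany",
    params={"Retriever": {"top_k": 5}},  # parameters are routed by node name
)
for doc in result["documents"]:
    print(doc.content)
```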

### Agent System

Interactive LLM agents with tool usage, memory management, and conversational capabilities for complex reasoning tasks.

```python { .api }
class Agent:
    def __init__(self, prompt_node: PromptNode, memory: Optional[BaseMemory] = None): ...
    def run(self, query: str) -> AgentStep: ...

class ConversationalAgent(Agent): ...
class Tool:
    def __init__(self, name: str, pipeline_or_node: Union[BaseComponent, Pipeline]): ...
```

[Agent System](./agents.md)
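
A hedged sketch of an agent that can call an extractive QA pipeline as a tool. It assumes an OpenAI key in `OPENAI_API_KEY`; the tool name and description are illustrative, and in farm-haystack 1.x a `Tool` also takes a `description` the LLM uses to choose tools:

```python
import os

from haystack import Document
from haystack.agents import Agent, Tool
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader, PromptNode
from haystack.pipelines import ExtractiveQAPipeline

# Small extractive QA pipeline to expose as a tool
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([Document(content="Berlin is the capital of Germany.")])
qa_pipeline = ExtractiveQAPipeline(
    reader=FARMReader(model_name_or_path="deepset/roberta-base-squad2"),
    retriever=BM25Retriever(document_store=document_store),
)

# The PromptNode wraps the LLM that drives the agent's reasoning loop
prompt_node = PromptNode(
    model_name_or_path="gpt-3.5-turbo",
    api_key=os.environ["OPENAI_API_KEY"],
    stop_words=["Observation:"],
)
agent = Agent(prompt_node=prompt_node)
agent.add_tool(Tool(
    name="CapitalsQA",
    pipeline_or_node=qa_pipeline,
    description="Answers questions about the capitals of countries",
))

result = agent.run(query="Which country has Berlin as its capital?")
```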

### File Processing

Document converters and preprocessors for handling PDF, DOCX, HTML, images, and other file formats with text extraction and cleaning.

```python { .api }
class BaseConverter:
    def convert(self, file_path: Path, **kwargs) -> List[Document]: ...

class PDFToTextConverter(BaseConverter): ...
class DocxToTextConverter(BaseConverter): ...
class PreProcessor(BaseComponent):
    def process(self, documents: List[Document]) -> List[Document]: ...
```

[File Processing](./file-processing.md)
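
A sketch of a typical indexing flow: convert a PDF, then clean and split it into passages before writing to a store. `"report.pdf"` is a placeholder path and the splitting parameters are illustrative:

```python
from haystack.nodes import PDFToTextConverter, PreProcessor

converter = PDFToTextConverter(remove_numeric_tables=True)
docs = converter.convert(file_path="report.pdf", meta={"source": "report.pdf"})

# Clean and split into overlapping ~200-word passages for retrieval
preprocessor = PreProcessor(
    clean_empty_lines=True,
    split_by="word",
    split_length=200,
    split_overlap=20,
    split_respect_sentence_boundary=False,
)
passages = preprocessor.process(docs)
```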

### Evaluation & Utilities

Evaluation metrics, model evaluation tools, and utility functions for assessing pipeline performance and data processing. In farm-haystack 1.x, evaluation runs through `Pipeline.eval`, which returns an `EvaluationResult`.

```python { .api }
class Pipeline:
    def eval(self, labels: List[MultiLabel],
             params: Optional[dict] = None) -> EvaluationResult: ...

class EvaluationResult:
    def calculate_metrics(self) -> Dict[str, Dict[str, float]]: ...
```

[Evaluation & Utilities](./evaluation-utilities.md)
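
A hedged sketch of evaluating an extractive pipeline against one hand-made gold label; the metric names (`exact_match`, `recall_single_hit`) follow the 1.x per-node metrics, and the label content is illustrative:

```python
from haystack import Answer, Document, Label, MultiLabel
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([Document(content="Paris is the capital of France.")])
pipeline = ExtractiveQAPipeline(
    reader=FARMReader(model_name_or_path="deepset/roberta-base-squad2"),
    retriever=BM25Retriever(document_store=document_store),
)

# One hand-made gold label; origin marks it as a gold annotation
gold = Label(
    query="What is the capital of France?",
    document=Document(content="Paris is the capital of France."),
    answer=Answer(answer="Paris", type="extractive"),
    is_correct_answer=True,
    is_correct_document=True,
    origin="gold-label",
)

eval_result = pipeline.eval(labels=[MultiLabel(labels=[gold])])
metrics = eval_result.calculate_metrics()  # metrics are grouped per node
print(metrics["Reader"]["exact_match"], metrics["Retriever"]["recall_single_hit"])
```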