# Haystack

Haystack is an end-to-end NLP framework for building applications powered by Large Language Models (LLMs), Transformer models, and vector search. Its modular architecture connects Nodes (preprocessing, retrieval, and language model components) into Pipelines that perform complex NLP tasks such as retrieval-augmented generation (RAG), question answering, semantic document search, and answer generation.

## Package Information

- **Package Name**: farm-haystack
- **Language**: Python
- **Installation**: `pip install farm-haystack`
- **Python Support**: 3.8+
- **License**: Apache-2.0

## Core Imports

```python
import haystack
from haystack import Document, Answer, Label, MultiLabel, Span, EvaluationResult, TableCell, Pipeline, hash128
from haystack.nodes.base import BaseComponent
```

Common imports for building pipelines:

```python
from haystack.document_stores import InMemoryDocumentStore, ElasticsearchDocumentStore, FAISSDocumentStore
from haystack.nodes import BM25Retriever, EmbeddingRetriever, FARMReader, TransformersReader
from haystack.pipelines import ExtractiveQAPipeline, DocumentSearchPipeline
```

## Basic Usage

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# Create documents
docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="Madrid is the capital of Spain.")
]

# Initialize document store and add documents
# (BM25 must be enabled on the store for BM25Retriever to query it)
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents(docs)

# Create retriever and reader components
retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

# Build pipeline
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)

# Ask a question
result = pipeline.run(query="What is the capital of France?")
print(result["answers"][0].answer)  # "Paris"
```

## Architecture

Haystack follows a modular component-based architecture with three core concepts:

- **Components (Nodes)**: Modular processing units that perform specific tasks (retrieval, reading, generation, preprocessing)
- **Document Stores**: Backend storage systems for documents and embeddings (Elasticsearch, FAISS, Pinecone, etc.)
- **Pipelines**: Orchestration layer that connects components in directed graphs to solve complex NLP tasks

The framework supports both **Retrieval-Augmented Generation (RAG)** workflows and **Agent-based** interactive systems, making it suitable for production-grade NLP applications.

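For example, the retriever and reader from Basic Usage can be wired into the same extractive QA flow as an explicit graph (a minimal sketch, reusing the `retriever` and `reader` objects defined above):

```python
from haystack import Pipeline

# Equivalent to ExtractiveQAPipeline, built as an explicit directed graph.
# "Query" is the reserved entry point name for query pipelines.
custom_pipeline = Pipeline()
custom_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
custom_pipeline.add_node(component=reader, name="Reader", inputs=["Retriever"])

result = custom_pipeline.run(query="What is the capital of Germany?")
print(result["answers"][0].answer)
```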

## Capabilities

### Core Schema & Data Structures

Fundamental data classes for documents, answers, labels, and evaluation results that form the foundation of all Haystack operations.

```python { .api }
class Document:
    def __init__(self, content: Union[str, DataFrame], content_type: str = "text",
                 meta: Dict[str, Any] = None, id: Optional[str] = None): ...

class Answer:
    def __init__(self, answer: str, type: str = "extractive",
                 score: Optional[float] = None, context: Optional[str] = None): ...

class Label:
    def __init__(self, query: str, answer: Answer, is_correct_answer: bool = True,
                 is_correct_document: bool = True): ...
```

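As a usage sketch: documents carry free-form metadata that can later be used for filtering, and answers record a score and their source context:

```python
from haystack import Answer, Document

# meta is an arbitrary dict; the document id is derived from the content if not given
doc = Document(
    content="Paris is the capital of France.",
    meta={"source": "geography.txt", "topic": "capitals"},
)

# An extractive answer with a confidence score and its surrounding context
ans = Answer(
    answer="Paris",
    type="extractive",
    score=0.98,
    context="Paris is the capital of France.",
)
print(doc.id, ans.answer)
```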

[Core Schema](./core-schema.md)


### Document Stores

Backend storage systems supporting vector and keyword search across multiple databases including Elasticsearch, FAISS, Pinecone, Weaviate, and others.

```python { .api }
class BaseDocumentStore:
    def write_documents(self, documents: List[Document]): ...
    def get_all_documents(self) -> List[Document]: ...
    def query(self, query: str, top_k: int = 10) -> List[Document]: ...

class ElasticsearchDocumentStore(BaseDocumentStore): ...
class FAISSDocumentStore(BaseDocumentStore): ...
class PineconeDocumentStore(BaseDocumentStore): ...
```

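Because every store implements the `BaseDocumentStore` interface, swapping backends usually only changes the constructor call. A minimal sketch with the in-memory store (server-backed stores such as `ElasticsearchDocumentStore` additionally take connection settings):

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore

# use_bm25=True enables keyword (BM25) search on this store
store = InMemoryDocumentStore(use_bm25=True)
store.write_documents([Document(content="Haystack pipelines connect nodes.")])

print(store.get_document_count())  # 1
print(store.get_all_documents()[0].content)
```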

[Document Stores](./document-stores.md)


### Retriever Components

Dense and sparse retrieval components for finding relevant documents using embeddings, BM25, TF-IDF, and specialized retrieval methods.

```python { .api }
class BM25Retriever(BaseRetriever):
    def __init__(self, document_store: BaseDocumentStore): ...
    def retrieve(self, query: str, top_k: int = 10) -> List[Document]: ...

class EmbeddingRetriever(BaseRetriever):
    def __init__(self, document_store: BaseDocumentStore,
                 embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"): ...
```

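Dense retrievers embed documents ahead of query time, so the store's embeddings must be refreshed after writing documents. A minimal sketch with `EmbeddingRetriever` (note that the store's `embedding_dim` must match the model; all-MiniLM-L6-v2 produces 384-dimensional vectors):

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever

document_store = InMemoryDocumentStore(embedding_dim=384)
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)

# Compute and store embeddings for all documents already written to the store
document_store.update_embeddings(retriever)

for doc in retriever.retrieve(query="capital cities", top_k=3):
    print(doc.score, doc.content)
```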

[Retriever Components](./retrievers.md)


### Reader Components

Reading comprehension components for extractive question answering using FARM, Transformers, and specialized table readers.

```python { .api }
class FARMReader(BaseReader):
    def __init__(self, model_name_or_path: str = "deepset/roberta-base-squad2"): ...
    def predict(self, query: str, documents: List[Document]) -> List[Answer]: ...

class TransformersReader(BaseReader):
    def __init__(self, model_name_or_path: str = "deepset/roberta-base-squad2"): ...
```

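Readers also work standalone for extracting answer spans from a handful of documents. A minimal sketch; note that in Haystack 1.x `FARMReader.predict` actually returns a dict whose `"answers"` key holds the `Answer` objects (the summary above simplifies the return type):

```python
from haystack import Document
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

docs = [Document(content="Berlin is the capital of Germany.")]
prediction = reader.predict(query="What is the capital of Germany?", documents=docs, top_k=1)
print(prediction["answers"][0].answer)  # "Berlin"
```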

[Reader Components](./readers.md)


### Generator Components

Language model components for text generation using OpenAI, Transformers, and other LLM providers for generative QA and text synthesis.

```python { .api }
class OpenAIAnswerGenerator(BaseGenerator):
    def __init__(self, api_key: str, model: str = "gpt-3.5-turbo"): ...
    def predict(self, query: str, documents: List[Document]) -> List[Answer]: ...

class OpenAIChatGenerator(BaseGenerator):
    def __init__(self, api_key: str, model: str = "gpt-3.5-turbo"): ...
```

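Generators synthesize free-form answers from retrieved documents instead of extracting spans. A minimal sketch with `OpenAIAnswerGenerator` (assumes an OpenAI key in the `OPENAI_API_KEY` environment variable; the default model is chosen by the library and varies by installed version):

```python
import os

from haystack import Document
from haystack.nodes import OpenAIAnswerGenerator

generator = OpenAIAnswerGenerator(api_key=os.environ["OPENAI_API_KEY"])

docs = [Document(content="Madrid is the capital of Spain.")]
result = generator.predict(query="What is the capital of Spain?", documents=docs, top_k=1)
print(result["answers"][0].answer)
```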

[Generator Components](./generators.md)


### Pipeline System

Pre-built and custom pipeline templates for orchestrating component workflows including QA, search, generation, and indexing pipelines.

```python { .api }
class Pipeline:
    def __init__(self): ...
    def add_node(self, component: BaseComponent, name: str, inputs: List[str]): ...
    def run(self, **kwargs): ...

class ExtractiveQAPipeline(Pipeline): ...
class GenerativeQAPipeline(Pipeline): ...
class DocumentSearchPipeline(Pipeline): ...
```

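The prebuilt templates wrap common graph shapes; `DocumentSearchPipeline`, for instance, wraps a single retriever, and per-node parameters are passed by node name. A sketch reusing the `retriever` from Basic Usage:

```python
from haystack.pipelines import DocumentSearchPipeline

search_pipeline = DocumentSearchPipeline(retriever=retriever)
result = search_pipeline.run(
    query="capital of Spain",
    params={"Retriever": {"top_k": 3}},  # parameters addressed to the "Retriever" node
)
for doc in result["documents"]:
    print(doc.content)
```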

[Pipeline System](./pipelines.md)


### Agent System

Interactive LLM agents with tool usage, memory management, and conversational capabilities for complex reasoning tasks.

```python { .api }
class Agent:
    def __init__(self, prompt_node: PromptNode, memory: Optional[BaseMemory] = None): ...
    def run(self, query: str) -> AgentStep: ...

class ConversationalAgent(Agent): ...

class Tool:
    def __init__(self, name: str, pipeline_or_node: Union[BaseComponent, Pipeline]): ...
```

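An agent drives a `PromptNode` in a reasoning loop and delegates work to registered tools. A minimal sketch (assumes an OpenAI key; the `Tool` here wraps the extractive QA `pipeline` from Basic Usage, and its `description` tells the LLM when to pick it):

```python
import os

from haystack.agents import Agent, Tool
from haystack.nodes import PromptNode

prompt_node = PromptNode(
    model_name_or_path="gpt-3.5-turbo",
    api_key=os.environ["OPENAI_API_KEY"],
    stop_words=["Observation:"],  # end each reasoning step at the tool call
)
agent = Agent(prompt_node=prompt_node)
agent.add_tool(Tool(
    name="CapitalsQA",
    pipeline_or_node=pipeline,
    description="Useful for answering questions about capital cities.",
))

result = agent.run(query="What is the capital of France?")
print(result["answers"][0].answer)
```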

[Agent System](./agents.md)


### File Processing

Document converters and preprocessors for handling PDF, DOCX, HTML, images, and other file formats with text extraction and cleaning.

```python { .api }
class BaseConverter:
    def convert(self, file_path: Path, **kwargs) -> List[Document]: ...

class PDFToTextConverter(BaseConverter): ...
class DocxToTextConverter(BaseConverter): ...

class PreProcessor(BaseComponent):
    def process(self, documents: List[Document]) -> List[Document]: ...
```

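A typical indexing flow converts files into `Document`s, then cleans and splits them before writing to a store. A minimal sketch (assumes a local `sample.pdf`; the PDF converter relies on an external `pdftotext` utility being installed):

```python
from pathlib import Path

from haystack.nodes import PDFToTextConverter, PreProcessor

converter = PDFToTextConverter()
docs = converter.convert(file_path=Path("sample.pdf"), meta={"source": "sample.pdf"})

# Normalize whitespace and split into ~200-word passages for retrieval
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    split_by="word",
    split_length=200,
)
passages = preprocessor.process(docs)
print(f"{len(passages)} passages ready for indexing")
```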

[File Processing](./file-processing.md)


### Evaluation & Utilities

Evaluation metrics, model evaluation tools, and utility functions for assessing pipeline performance and data processing.

```python { .api }
def eval_pipeline(pipeline: Pipeline, eval_labels: List[Label]) -> EvaluationResult: ...

class EvaluationResult:
    def __init__(self): ...
    def calculate_metrics(self) -> Dict[str, float]: ...
```

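In Haystack 1.x this evaluation is exposed as `Pipeline.eval`, which takes gold labels and returns an `EvaluationResult`. A sketch against the extractive QA `pipeline` from Basic Usage (note that the full `Label` constructor also requires the gold document and an `origin`):

```python
from haystack import Answer, Document, Label, MultiLabel

# One gold label: the expected answer and the document it should come from
label = Label(
    query="What is the capital of France?",
    document=Document(content="Paris is the capital of France."),
    answer=Answer(answer="Paris"),
    is_correct_answer=True,
    is_correct_document=True,
    origin="gold-label",
)

eval_result = pipeline.eval(labels=[MultiLabel(labels=[label])])
metrics = eval_result.calculate_metrics()  # nested dict keyed by node name
print(metrics["Reader"]["exact_match"], metrics["Retriever"]["recall_single_hit"])
```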

[Evaluation & Utilities](./evaluation-utilities.md)