tessl/npm-llamaindex

Data framework for your LLM application

docs/chat-engines.md

Chat Engines

Conversational interfaces that maintain context and enable back-and-forth interactions with your data in LlamaIndex.TS.

Import

import { VectorStoreIndex } from "llamaindex";
// Or from specific submodules
import { ContextChatEngine, SimpleChatEngine } from "llamaindex/engines";

Overview

Chat engines in LlamaIndex.TS provide conversational interfaces that can maintain context across multiple turns of conversation. Unlike query engines that handle single queries, chat engines are designed for interactive, multi-turn conversations while leveraging your indexed data.

Base Chat Engine Interface

All chat engines implement a common base interface:

interface BaseChatEngine {
  chat(message: string, options?: ChatOptions): Promise<EngineResponse>;
  achat(message: string, options?: ChatOptions): AsyncIterable<EngineResponse>;
  reset(): void;
  chatHistory: ChatMessage[];
}

interface ChatOptions {
  stream?: boolean;
  chatHistory?: ChatMessage[];
}

interface ChatMessage {
  role: MessageType;
  content: string;
}

type MessageType = "system" | "user" | "assistant";

ContextChatEngine

A chat engine that uses retrieval to provide context-aware responses while maintaining conversation history.

class ContextChatEngine implements BaseChatEngine {
  constructor(args: {
    retriever: BaseRetriever;
    memory?: BaseMemory;
    systemPrompt?: string;
    nodePostprocessors?: BasePostprocessor[];
    contextRole?: string;
  });
  
  chat(message: string, options?: ChatOptions): Promise<EngineResponse>;
  achat(message: string, options?: ChatOptions): AsyncIterable<EngineResponse>;
  reset(): void;
  
  chatHistory: ChatMessage[];
  retriever: BaseRetriever;
  memory: BaseMemory; 
  systemPrompt?: string;
}
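The flow of a single turn can be sketched as: retrieve context for the new message, inject it ahead of the conversation, call the LLM, and record the exchange. This is a simplified, hypothetical sketch of that logic with stubbed `Retriever` and `LLM` types, not the library's internal implementation:

```typescript
type Role = "system" | "user" | "assistant";
interface ChatMessage { role: Role; content: string; }

// Stubbed collaborator types for illustration.
interface Retriever { retrieve(query: string): string[]; }
type LLM = (messages: ChatMessage[]) => string;

// Hypothetical sketch of a context-aware chat turn.
function contextChatTurn(
  retriever: Retriever,
  llm: LLM,
  history: ChatMessage[],
  message: string,
): string {
  // 1. Retrieve context relevant to the new message.
  const context = retriever.retrieve(message).join("\n");
  // 2. Inject it as a system message ahead of the conversation.
  const messages: ChatMessage[] = [
    { role: "system", content: `Context:\n${context}` },
    ...history,
    { role: "user", content: message },
  ];
  // 3. Call the LLM and record the exchange.
  const reply = llm(messages);
  history.push({ role: "user", content: message });
  history.push({ role: "assistant", content: reply });
  return reply;
}
```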

SimpleChatEngine

A basic chat engine that maintains conversation history without retrieval.

class SimpleChatEngine implements BaseChatEngine {
  constructor(args: {
    llm: LLM;
    memory?: BaseMemory;
    systemPrompt?: string;
  });
  
  chat(message: string, options?: ChatOptions): Promise<EngineResponse>;
  achat(message: string, options?: ChatOptions): AsyncIterable<EngineResponse>;
  reset(): void;
  
  chatHistory: ChatMessage[];
  llm: LLM;
  memory: BaseMemory;
}

CondenseQuestionChatEngine

A chat engine that condenses the conversation history and current question into a standalone question for better retrieval.

class CondenseQuestionChatEngine implements BaseChatEngine {
  constructor(args: {
    queryEngine: BaseQueryEngine;
    memory?: BaseMemory;
    systemPrompt?: string;
    condenseQuestionPrompt?: string;
  });
  
  chat(message: string, options?: ChatOptions): Promise<EngineResponse>;
  achat(message: string, options?: ChatOptions): AsyncIterable<EngineResponse>;
  reset(): void;
  
  chatHistory: ChatMessage[];
  queryEngine: BaseQueryEngine;
  memory: BaseMemory;
}

Memory System

BaseMemory Interface

Interface for chat memory implementations.

interface BaseMemory {
  get(initialTokenCount?: number): ChatMessage[];
  getAll(): ChatMessage[];
  put(message: ChatMessage): void;
  set(messages: ChatMessage[]): void;
  reset(): void;
}
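The simplest possible implementation of this contract is an unbounded array of messages. A minimal sketch (hypothetical, with no token accounting; not the library's implementation):

```typescript
type MessageType = "system" | "user" | "assistant";
interface ChatMessage { role: MessageType; content: string; }

// Simplest possible BaseMemory: an unbounded array of messages.
class ArrayMemory {
  private messages: ChatMessage[] = [];

  // Real implementations may use initialTokenCount to trim context;
  // this sketch ignores it and returns everything.
  get(_initialTokenCount?: number): ChatMessage[] {
    return [...this.messages];
  }

  getAll(): ChatMessage[] {
    return [...this.messages];
  }

  put(message: ChatMessage): void {
    this.messages.push(message);
  }

  set(messages: ChatMessage[]): void {
    this.messages = [...messages];
  }

  reset(): void {
    this.messages = [];
  }
}
```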

ChatMemoryBuffer

Simple in-memory buffer for storing chat history.

class ChatMemoryBuffer implements BaseMemory {
  constructor(args?: {
    tokenLimit?: number;
    chatHistory?: ChatMessage[];
  });
  
  get(initialTokenCount?: number): ChatMessage[];
  getAll(): ChatMessage[];
  put(message: ChatMessage): void;
  set(messages: ChatMessage[]): void;
  reset(): void;
  
  tokenLimit?: number;
  chatHistory: ChatMessage[];
}

Basic Usage

Context-Aware Chat

import { VectorStoreIndex, Document } from "llamaindex";

// Create knowledge base
const documents = [
  new Document({ text: "LlamaIndex is a data framework for LLM applications." }),
  new Document({ text: "It supports various document types and vector stores." }),
  new Document({ text: "You can build chatbots and Q&A systems with it." }),
];

const index = await VectorStoreIndex.fromDocuments(documents);

// Create context chat engine
const chatEngine = index.asChatEngine({
  chatMode: "context", // Use context-aware chat
  systemPrompt: "You are a helpful assistant that answers questions about LlamaIndex.",
});

// Start conversation
const response1 = await chatEngine.chat("What is LlamaIndex?");
console.log("Assistant:", response1.toString());

// Continue conversation with context
const response2 = await chatEngine.chat("What can I build with it?");
console.log("Assistant:", response2.toString());

// Check conversation history
console.log("Chat history:", chatEngine.chatHistory);

Simple Chat Without Retrieval

import { SimpleChatEngine, OpenAI } from "llamaindex";

// Create simple chat engine
const simpleChatEngine = new SimpleChatEngine({
  llm: new OpenAI({ model: "gpt-3.5-turbo" }),
  systemPrompt: "You are a helpful assistant.",
});

// Have a conversation
const response = await simpleChatEngine.chat("Hello! How are you?");
console.log("Response:", response.toString());

Streaming Chat

// chat() resolves once with the complete response; for real-time,
// chunk-by-chunk output use achat, which yields pieces as they arrive
for await (const chunk of chatEngine.achat("Tell me about embeddings")) {
  process.stdout.write(chunk.response);
}

Advanced Usage

Custom Memory Configuration

import { ContextChatEngine, ChatMemoryBuffer } from "llamaindex";

// Create chat engine with custom memory
const customMemory = new ChatMemoryBuffer({
  tokenLimit: 4000, // Limit context window
  chatHistory: [
    { role: "system", content: "You are an expert on AI and machine learning." }
  ],
});

const chatEngine = new ContextChatEngine({
  retriever: index.asRetriever(),
  memory: customMemory,
  systemPrompt: "Answer questions about AI using the provided context.",
});

Condense Question Chat Engine

import { CondenseQuestionChatEngine } from "llamaindex/engines";

// Create condense question chat engine for better multi-turn conversations
const condenseEngine = new CondenseQuestionChatEngine({
  queryEngine: index.asQueryEngine(),
  condenseQuestionPrompt: `
    Given the conversation history and a follow-up question, 
    rephrase the follow-up question to be a standalone question.
    
    Chat History: {chat_history}
    Follow-up Input: {question}
    Standalone Question:
  `,
});

// Multi-turn conversation
await condenseEngine.chat("What is machine learning?");
await condenseEngine.chat("How does it differ from deep learning?"); // Will be condensed to standalone question
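Under the hood, the condense step fills the prompt's {chat_history} and {question} placeholders before asking the LLM for a standalone question. A rough sketch of that substitution (hypothetical helper, not the library's internal code):

```typescript
type MessageType = "system" | "user" | "assistant";
interface ChatMessage { role: MessageType; content: string; }

// Fill a condense-question template with serialized history and the
// new question (mirrors the {chat_history}/{question} placeholders).
function fillCondensePrompt(
  template: string,
  history: ChatMessage[],
  question: string,
): string {
  const serialized = history
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  return template
    .replace("{chat_history}", serialized)
    .replace("{question}", question);
}

const prompt = fillCondensePrompt(
  "History:\n{chat_history}\nFollow-up: {question}",
  [
    { role: "user", content: "What is machine learning?" },
    { role: "assistant", content: "A field of AI." },
  ],
  "How does it differ from deep learning?",
);
```

The LLM's answer to the filled prompt then becomes the query sent to the underlying query engine.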

Custom System Prompts

const chatEngine = index.asChatEngine({
  chatMode: "context",
  systemPrompt: `
    You are an expert technical documentation assistant. 
    
    Guidelines:
    - Always provide accurate, technical information
    - Include code examples when relevant
    - Cite your sources when using retrieved context
    - If you don't know something, say so clearly
    - Keep responses concise but comprehensive
  `,
});

Conversation Management

Managing Chat History

// Access full conversation history
const history = chatEngine.chatHistory;
console.log("Conversation turns:", history.length);

// Filter by role
const userMessages = history.filter(msg => msg.role === "user");
const assistantMessages = history.filter(msg => msg.role === "assistant");

// Reset conversation
chatEngine.reset();
console.log("History after reset:", chatEngine.chatHistory.length); // 0
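When re-injecting saved history, it is often useful to keep only the most recent turns. A small hypothetical helper that keeps any leading system message plus the last N non-system messages:

```typescript
type MessageType = "system" | "user" | "assistant";
interface ChatMessage { role: MessageType; content: string; }

// Keep a leading system message (if present) and the last `n`
// non-system messages -- a cheap way to bound history size.
function trimHistory(history: ChatMessage[], n: number): ChatMessage[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  return [...system.slice(0, 1), ...rest.slice(-n)];
}
```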

Conversation Persistence

import fs from "node:fs";

// Save conversation to storage
const saveConversation = (chatEngine: BaseChatEngine, filename: string) => {
  const conversation = {
    history: chatEngine.chatHistory,
    timestamp: new Date().toISOString(),
  };

  fs.writeFileSync(filename, JSON.stringify(conversation, null, 2));
};

// Load conversation from storage
const loadConversation = (
  chatEngine: BaseChatEngine,
  conversationData: { history: ChatMessage[] },
) => {
  chatEngine.chatHistory = conversationData.history;
};

Context Window Management

import { ChatMemoryBuffer } from "llamaindex";

// Create memory with token limit to manage context window
const limitedMemory = new ChatMemoryBuffer({
  tokenLimit: 3000, // Adjust based on your model's context window
});

const chatEngine = new ContextChatEngine({
  retriever: index.asRetriever(),
  memory: limitedMemory,
});

// The memory will automatically truncate old messages when the limit is reached
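That truncation behavior can be sketched as follows, assuming a naive estimate of one token per four characters (a real buffer would use a proper tokenizer):

```typescript
type MessageType = "system" | "user" | "assistant";
interface ChatMessage { role: MessageType; content: string; }

// Crude token estimate: ~4 characters per token (illustrative only).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Walk history from newest to oldest, keeping messages until the
// token budget is exhausted -- how a buffer with tokenLimit might trim.
function fitToTokenLimit(
  history: ChatMessage[],
  tokenLimit: number,
): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > tokenLimit) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```

Walking newest-to-oldest ensures the most recent turns survive, which is usually what a conversation needs.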

Integration with Agents

Knowledge Base as Agent Tool

import { QueryEngineTool, ReActAgent } from "llamaindex";

// QueryEngineTool expects a BaseQueryEngine; chat engines do not implement
// that interface, so expose the same index through a query engine instead
const chatTool = new QueryEngineTool({
  queryEngine: index.asQueryEngine(),
  metadata: {
    name: "knowledge_chat",
    description: "Answer questions about the knowledge base",
  },
});

// Use with agent
const agent = new ReActAgent({
  tools: [chatTool],
  llm: myLLM, // your LLM instance (placeholder name)
});

Multi-Modal Chat

Image and Text Chat

// For multi-modal conversations (requires an LLM that accepts image input).
// Note: the message content here is an array of parts, which extends the
// plain-string ChatMessage.content shown in the base interface
const multiModalResponse = await chatEngine.chat("What's in this image?", {
  chatHistory: [
    {
      role: "user",
      content: [
        { type: "text", text: "Analyze this image:" },
        { type: "image_url", image_url: { url: "data:image/jpeg;base64,..." } }
      ]
    }
  ]
});

Performance Optimization

Async Chat Processing

// Handle multiple chat sessions concurrently
const handleMultipleChats = async (sessions: Array<{chatEngine: BaseChatEngine, message: string}>) => {
  const responses = await Promise.all(
    sessions.map(session => session.chatEngine.chat(session.message))
  );
  
  return responses;
};

Chat Response Caching

// Simple response caching for common questions
// (note: the cache key ignores conversation history, so this suits
// stateless Q&A rather than context-dependent turns)
class CachedChatEngine {
  private cache = new Map<string, EngineResponse>();
  
  constructor(private chatEngine: BaseChatEngine) {}
  
  async chat(message: string): Promise<EngineResponse> {
    const cacheKey = message.toLowerCase().trim();
    
    if (this.cache.has(cacheKey)) {
      return this.cache.get(cacheKey)!;
    }
    
    const response = await this.chatEngine.chat(message);
    this.cache.set(cacheKey, response);
    return response;
  }
}

Error Handling

Robust Chat Implementation

const safeChat = async (chatEngine: BaseChatEngine, message: string): Promise<EngineResponse | null> => {
  try {
    // Validate input
    if (!message || message.trim().length === 0) {
      console.warn("Empty message provided");
      return null;
    }
    
    const response = await chatEngine.chat(message);
    
    // Validate response
    if (!response.response || response.response.trim().length === 0) {
      console.warn("Empty response received");
      return null;
    }
    
    return response;
  } catch (error) {
    console.error("Chat error:", error);
    
    // error is typed unknown in strict TS; narrow before reading .message
    if (error instanceof Error && error.message.includes("context window")) {
      console.error("Context window exceeded - consider resetting conversation");
      chatEngine.reset();
    }
    
    return null;
  }
};

Best Practices

Chat Engine Selection

// Choose the right chat engine for your use case
const createChatEngine = (useCase: string, index: VectorStoreIndex) => {
  switch (useCase) {
    case "simple":
      // Basic conversation without knowledge base
      return new SimpleChatEngine({
        llm: myLLM, // your LLM instance (placeholder name)
      });
      
    case "knowledge":
      // Conversations with knowledge base access
      return index.asChatEngine({ chatMode: "context" });
      
    case "complex":
      // Multi-turn conversations with better context handling
      return new CondenseQuestionChatEngine({
        queryEngine: index.asQueryEngine(),
      });
      
    default:
      return index.asChatEngine();
  }
};

Conversation Quality

// Configure for high-quality conversations
const highQualityChatEngine = new ContextChatEngine({
  retriever: index.asRetriever({
    similarityTopK: 3, // Focused context
  }),
  memory: new ChatMemoryBuffer({
    tokenLimit: 4000, // Manage context window
  }),
  systemPrompt: `
    You are a knowledgeable assistant. Use the provided context to give accurate answers.
    If the context doesn't contain relevant information, say so clearly.
    Always be helpful and conversational while staying factual.
  `,
});

Monitoring Chat Sessions

// Add logging and monitoring
const monitoredChat = async (chatEngine: BaseChatEngine, message: string) => {
  const startTime = Date.now();
  
  try {
    const response = await chatEngine.chat(message);
    const duration = Date.now() - startTime;
    
    console.log({
      timestamp: new Date().toISOString(),
      message: message.substring(0, 100),
      responseLength: response.response.length,
      sourceCount: response.sourceNodes?.length || 0,
      duration: `${duration}ms`,
      historyLength: chatEngine.chatHistory.length,
    });
    
    return response;
  } catch (error) {
    console.error({
      timestamp: new Date().toISOString(),
      message: message.substring(0, 100),
      error: error instanceof Error ? error.message : String(error),
      duration: `${Date.now() - startTime}ms`,
    });
    throw error;
  }
};
