name: add-rag
description: Add Retrieval-Augmented Generation to a Koog 1.0 agent: pick an embedding source (LLM-backed or local), index documents into a vector store, and query the store inside the agent's prompt pipeline or as a tool. Use when the user asks to "add RAG", "embed and search documents", "use a vector store", "build retrieval-augmented generation", or describes grounding the LLM in a corpus.

Add RAG Skill

Name: jbaruch/koog
Rating: 86.26 (1 reviews)
Author: jbaruch

Process steps in order. Do not skip ahead.

Step 1 — Add the Modules

RAG in Koog spans three concerns (embeddings, RAG abstractions, and vector storage), split across four modules:

implementation("ai.koog:embeddings-base:1.0.0")
implementation("ai.koog:embeddings-llm:1.0.0")     // LLM-backed embeddings (OpenAI/Google/etc.)
implementation("ai.koog:rag-base:1.0.0")
implementation("ai.koog:rag-vector:1.0.0")         // vector store + similarity search

Add further modules only for backend-specific vector stores (Pinecone, Qdrant, etc.); where supported, those usually ship as separate rag-vector-<backend> artifacts.

Proceed immediately to Step 2.

Step 2 — Build the Embedder

LLM-backed embeddings via embeddings-llm use the same provider you'd use for completions:

import ai.koog.embeddings.llm.OpenAIEmbedder
import ai.koog.prompt.executor.clients.openai.OpenAIModels

val embedder = OpenAIEmbedder(
    apiKey = System.getenv("OPENAI_API_KEY"),
    model = OpenAIModels.Embedding.TextEmbedding3Small,
)

For self-hosted setups, cheaper local embedders may be available, such as Ollama-based embedders or in-process models; check the embeddings-* module list for what 1.0 ships beyond the LLM-backed embedder.
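To make the embedder's role concrete: it maps text to a fixed-length vector, and the store later compares vectors by similarity. The following is a toy, dependency-free sketch, not Koog's API. The names toyEmbed and cosine are hypothetical, and a hashed bag-of-words stands in for a real embedding model.

```kotlin
import kotlin.math.sqrt

// Hypothetical stand-in for a real embedding model: hashes each word into one
// of `dims` buckets and counts occurrences. Not part of Koog.
fun toyEmbed(text: String, dims: Int = 64): DoubleArray {
    val vector = DoubleArray(dims)
    text.lowercase()
        .split(Regex("\\W+"))
        .filter { it.isNotEmpty() }
        .forEach { word -> vector[word.hashCode().mod(dims)] += 1.0 }
    return vector
}

// Cosine similarity: close to 1.0 for vectors pointing the same way,
// 0.0 for orthogonal vectors or when either vector is all zeros.
fun cosine(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return if (normA == 0.0 || normB == 0.0) 0.0 else dot / (sqrt(normA) * sqrt(normB))
}
```

A real embedder differs in quality, not in contract: texts that share meaning should land near each other, which is what the vector store in Step 3 relies on.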

Proceed immediately to Step 3.

Step 3 — Build the Vector Store

import ai.koog.rag.vector.InMemoryVectorStore

val store = InMemoryVectorStore(embedder = embedder)

// Index documents (`documents` is your own collection; each item exposes id, text, and path)
documents.forEach { doc ->
    store.add(id = doc.id, text = doc.text, metadata = mapOf("source" to doc.path))
}

InMemoryVectorStore is fine for demos and small corpora. For production, swap in a persistent backend — JDBC, Pinecone, Qdrant, Weaviate — via the corresponding rag-vector-* module if shipped, or implement the VectorStore interface against your store of choice.
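If you end up writing your own store, the core logic is small: embed at index time, embed the query, rank by similarity. The sketch below illustrates that flow under stated assumptions. SketchVectorStore and Hit are hypothetical names, the add/search shape only mirrors the calls used above, and Koog's actual VectorStore interface may differ in names and signatures (for example, its methods may be suspending).

```kotlin
import kotlin.math.sqrt

// Hypothetical result type for illustration; not a Koog class.
data class Hit(val id: String, val text: String, val metadata: Map<String, String>, val score: Double)

// Hypothetical in-memory store. `embed` is any text-to-vector function.
class SketchVectorStore(private val embed: (String) -> DoubleArray) {
    private class Entry(val id: String, val text: String, val metadata: Map<String, String>, val vector: DoubleArray)

    private val entries = mutableListOf<Entry>()

    // Embed once at index time; re-adding an id replaces the earlier entry.
    fun add(id: String, text: String, metadata: Map<String, String> = emptyMap()) {
        entries.removeAll { it.id == id }
        entries += Entry(id, text, metadata, embed(text))
    }

    // Embed the query, score every entry by cosine similarity, keep the best topK.
    fun search(query: String, topK: Int = 5): List<Hit> {
        val q = embed(query)
        return entries
            .map { Hit(it.id, it.text, it.metadata, cosine(q, it.vector)) }
            .sortedByDescending { it.score }
            .take(topK)
    }

    private fun cosine(a: DoubleArray, b: DoubleArray): Double {
        val dot = a.indices.sumOf { a[it] * b[it] }
        val norms = sqrt(a.sumOf { it * it }) * sqrt(b.sumOf { it * it })
        return if (norms == 0.0) 0.0 else dot / norms
    }
}
```

The linear scan is fine for a few thousand entries. A persistent backend replaces the list with an index and the scan with an approximate nearest-neighbor query, but the add/search contract stays the same.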

Proceed immediately to Step 4.

Step 4 — Wire Retrieval into the Agent

Two integration patterns:

(a) RAG as a tool — the LLM decides when to retrieve. Wrap the store in a tool:

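// Import paths below are assumptions (Koog's tool annotations and the Step 3 package);
// verify them against your Koog version.
import ai.koog.agents.core.tools.annotations.LLMDescription
import ai.koog.agents.core.tools.annotations.Tool
import ai.koog.agents.core.tools.reflect.ToolSet
import ai.koog.rag.vector.VectorStore

@LLMDescription("Search the documentation corpus")
class DocsSearchTool(private val store: VectorStore) : ToolSet {
    @Tool
    @LLMDescription("Search for documents relevant to the query and return the top matches")
    suspend fun search(@LLMDescription("Search query") query: String): String {
        val results = store.search(query, topK = 5)
        // One "source: text" entry per match, separated by a divider the LLM can parse
        return results.joinToString("\n---\n") { "${it.metadata["source"]}: ${it.text}" }
    }
}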

(b) RAG as a prompt augmenter — retrieval happens automatically before every LLM call. Use a UserPromptAugmenter (see define-prompt) that calls store.search(currentInput) and appends results to the user message. Use this pattern when retrieval is always-on for every input.
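The augmentation step in pattern (b) can be sketched independently of the UserPromptAugmenter API. This is a minimal illustration under stated assumptions: Passage and augmentUserMessage are hypothetical names, the retrieve lambda stands in for a call to store.search, and the "Context / Question" layout is one possible template, not a Koog convention.

```kotlin
// Hypothetical shape of a retrieved passage; field names are illustrative.
data class Passage(val source: String, val text: String)

// Builds the augmented user message: retrieved passages first, then the
// original input. Returns the input unchanged when nothing is retrieved.
fun augmentUserMessage(
    input: String,
    topK: Int = 5,
    retrieve: (query: String, topK: Int) -> List<Passage>,
): String {
    val passages = retrieve(input, topK)
    if (passages.isEmpty()) return input
    val context = passages.joinToString("\n---\n") { "${it.source}: ${it.text}" }
    return "Context:\n$context\n\nQuestion: $input"
}
```

Passing retrieval in as a lambda keeps the formatting logic testable without a live store. Inside an agent, the lambda would delegate to the vector store built in Step 3.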

Proceed immediately to Step 5.

Step 5 — Tune Retrieval

topK: the number of documents returned. The default of 5 is reasonable; raise it for broader context, lower it for tighter prompts.
Chunking: index full documents or split them into chunks? Chunked indexing usually improves recall but loses cross-chunk context. The right choice depends on document length and query shape.
Reranking: if recall is high but precision is low, add a reranker between store.search and the LLM prompt. Koog doesn't ship a built-in reranker; call a cross-encoder model or a second LLM with a scoring prompt.
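The chunking and reranking knobs above can be sketched in plain Kotlin. Both functions are hypothetical helpers, not Koog APIs. The character-based window with overlap is one simple chunking strategy among many, the default sizes are arbitrary, and the score lambda is a placeholder for a cross-encoder or LLM scoring call.

```kotlin
// Splits text into windows of `size` characters; consecutive windows share
// `overlap` characters so content cut at a boundary appears in both chunks.
fun chunkText(text: String, size: Int = 800, overlap: Int = 100): List<String> {
    require(size > 0) { "size must be positive" }
    require(overlap in 0 until size) { "overlap must be in [0, size)" }
    if (text.isEmpty()) return emptyList()
    val step = size - overlap
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = minOf(start + size, text.length)
        chunks += text.substring(start, end)
        if (end == text.length) break // last window reached the end of the text
        start += step
    }
    return chunks
}

// Reranks candidates with a caller-supplied scorer and keeps the best `keep`.
fun <T> rerank(candidates: List<T>, keep: Int, score: (T) -> Double): List<T> =
    candidates
        .map { it to score(it) } // score each candidate exactly once
        .sortedByDescending { it.second }
        .take(keep)
        .map { it.first }
```

A common arrangement is to retrieve with a larger topK than the prompt needs, rerank, and pass only the kept results to the LLM. Overlap trades a larger index for a lower chance of splitting an answer across two chunks.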

Finish here.
