Zero-configuration RAG package that bundles document parsing, embedding, and splitting for easy Retrieval-Augmented Generation in Java applications
Dependencies, limitations, and external resources for easy-rag.
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-easy-rag</artifactId>
<version>1.11.0-beta19</version>
</dependency>
Package Identifier: pkg:maven/dev.langchain4j/langchain4j-easy-rag@1.11.0
License: Apache-2.0
Language: Java
Minimum Java Version: Java 17+
Easy-rag automatically includes these dependencies:
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j</artifactId>
<version>1.11.0</version>
</dependency>
Provides core interfaces and orchestration:
EmbeddingStoreIngestor
EmbeddingStore interface
Document, TextSegment, Embedding types
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-document-parser-apache-tika</artifactId>
<version>1.11.0-beta19</version>
</dependency>
Tika Version: 3.2.3
Provides document parsing for 200+ formats:
ApacheTikaDocumentParser
ApacheTikaDocumentParserFactory (SPI)
Tika Sub-dependencies:
tika-core - Core parsing framework
tika-parsers-standard-package - Format-specific parsers
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-embeddings-bge-small-en-v15-q</artifactId>
<version>1.11.0-beta19</version>
</dependency>
Provides in-process embedding model:
BgeSmallEnV15QuantizedEmbeddingModel
BgeSmallEnV15QuantizedEmbeddingModelFactory (SPI)
Total Dependency Size: Approximately 80-100MB including all transitive dependencies
Via Apache Tika, easy-rag supports 200+ formats including:
Format Detection: Automatic based on content (not just file extension)
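To make "based on content, not just file extension" concrete, here is a minimal sketch of content sniffing by leading bytes. It is not Tika's implementation, which knows far more signatures and container formats; the class name `MagicByteSniffer` and the two signatures checked are illustrative choices only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: a real detector such as Apache Tika checks many more signatures. */
final class MagicByteSniffer {

    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // "PK\3\4": plain ZIP archives, and also ZIP-based containers such as DOCX or XLSX.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Guesses a coarse media type from the first bytes, ignoring the file name entirely. */
    static String sniff(byte[] head) {
        if (startsWith(head, PDF)) return "application/pdf";
        if (startsWith(head, ZIP)) return "application/zip";
        return "application/octet-stream"; // unknown signature
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        // Imagine these bytes came from a file misnamed "report.txt".
        byte[] head = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(head)); // prints application/pdf
    }
}
```

Because the decision uses the bytes rather than the name, a PDF saved with a `.txt` extension is still recognized as a PDF.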
from() factory)
Fixed Chunk Size:
Single Splitter Strategy:
No Incremental Loading:
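For readers new to chunking, the idea behind a fixed chunk size can be sketched as a sliding window with overlap. This is a simplified, character-based illustration: the splitter easy-rag actually ships may work differently (for example on tokens rather than characters), and `FixedWindowSplitter` and the sizes used here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedWindowSplitter {

    /** Splits text into windows of at most maxChars; each window repeats the last `overlap` chars of the previous one. */
    static List<String> split(String text, int maxChars, int overlap) {
        if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
            throw new IllegalArgumentException("need 0 <= overlap < maxChars");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChars - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // the last window reached the end of the text
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghij", 4, 1)); // prints [abcd, defg, ghij]
    }
}
```

With one fixed window size, short notes and long reports are cut the same way, which is the tradeoff a fixed chunk size implies.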
English-Only Optimization:
Fixed Dimensions:
CPU-Bound Execution:
In-Process Only:
Quantization Tradeoffs:
No Built-in Vector Database:
InMemoryEmbeddingStore provided
Linear Search:
InMemoryEmbeddingStore uses brute-force similarity search
Serialization Format:
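Brute-force similarity search means every query is scored against every stored vector. The sketch below shows that pattern with cosine similarity. It is an illustration of the technique, not the source of InMemoryEmbeddingStore, and `LinearScanSketch` is a hypothetical name. It assumes non-zero vectors of equal length and `k >= 0`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LinearScanSketch {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indices of the k stored vectors most similar to the query, best first. */
    static List<Integer> topK(List<float[]> stored, float[] query, int k) {
        final double[] scores = new double[stored.size()];
        Integer[] order = new Integer[stored.size()];
        // Score every stored vector once: n * d multiplications per query.
        for (int i = 0; i < scores.length; i++) {
            scores[i] = cosine(stored.get(i), query);
            order[i] = i;
        }
        // Sort indices by descending score.
        Arrays.sort(order, (x, y) -> Double.compare(scores[y], scores[x]));
        return new ArrayList<>(Arrays.asList(order).subList(0, Math.min(k, order.length)));
    }
}
```

The cost per query grows linearly with the number of stored vectors, which is why dedicated vector databases with approximate indexes become attractive at larger scales.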
Single Implementation Required:
DocumentSplitterFactory on classpath
No Priority/Ordering:
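The SPI constraints above follow from how `java.util.ServiceLoader` discovery works: providers are found on the classpath, with no built-in notion of priority. The sketch below uses a hypothetical `SplitterFactory` interface as a stand-in for a real factory type and shows one defensive option, failing fast unless exactly one provider is present. How LangChain4j itself reacts to several providers is not stated here, so treat `requireSingle` as an assumption, not as the library's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class SpiLookupSketch {

    /** Hypothetical stand-in for a factory interface such as DocumentSplitterFactory. */
    interface SplitterFactory {
        String name();
    }

    /** Collects every provider that ServiceLoader discovers on the classpath. */
    static List<SplitterFactory> discover() {
        List<SplitterFactory> found = new ArrayList<>();
        for (SplitterFactory factory : ServiceLoader.load(SplitterFactory.class)) {
            found.add(factory);
        }
        return found;
    }

    /** Fails fast instead of silently picking one of several candidates. */
    static SplitterFactory requireSingle(List<SplitterFactory> found) {
        if (found.size() != 1) {
            throw new IllegalStateException("expected exactly 1 factory, found " + found.size());
        }
        return found.get(0);
    }

    public static void main(String[] args) {
        // No provider is registered for this hypothetical interface, so discovery finds none.
        System.out.println("providers found: " + discover().size());
    }
}
```

A check like this at startup surfaces an accidental second implementation on the classpath early, rather than leaving the choice to discovery order.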
Embedding Speed:
Memory Usage:
No Caching:
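Where repeated texts make the lack of caching noticeable, an application can add its own cache in front of the embedding call. The following is a minimal sketch of a small LRU cache. `EmbeddingCacheSketch` is hypothetical, the `Function<String, float[]>` is a stand-in for whatever embedding call the application uses, and the class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class EmbeddingCacheSketch {

    private final Function<String, float[]> embedder; // stand-in for a real embedding call
    private final Map<String, float[]> cache;

    EmbeddingCacheSketch(Function<String, float[]> embedder, final int maxEntries) {
        this.embedder = embedder;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > maxEntries; // evict the least recently used entry when full
            }
        };
    }

    /** Returns the cached vector for the text, computing it only on a miss. */
    float[] embed(String text) {
        float[] vector = cache.get(text);
        if (vector == null) {
            vector = embedder.apply(text);
            cache.put(text, vector);
        }
        return vector;
    }

    int cachedCount() {
        return cache.size();
    }
}
```

Keying on the exact text is the simplest choice; it only helps when identical strings recur, for example repeated user queries.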
No Reranking:
No Query Expansion:
Context Boundaries:
✅ Development and prototyping
✅ Learning RAG concepts
✅ Small to medium datasets (< 50k embeddings)
✅ English-language content
✅ Standard document formats
✅ Privacy-sensitive applications (on-premise processing)
✅ No external API dependencies
✅ Simple single-server deployments
❌ Large scale (>100k embeddings) → Use vector database (Pinecone, Weaviate, Qdrant)
❌ High performance requirements → Use API-based embedding models (OpenAI, Cohere)
❌ Multilingual content → Use multilingual embedding models
❌ Specialized domains → Use domain-specific embedding models
❌ Production SLAs → Consider managed services with guarantees
❌ Distributed systems → Use scalable vector databases
❌ Real-time updates → Use databases with update/delete capabilities
❌ Advanced retrieval → Add reranking, query expansion, hybrid search
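The 50k and 100k thresholds above can be put in perspective with a rough memory estimate for vectors held in memory. This sketch counts only the raw float32 payload and excludes segment text, metadata, and JVM object overhead, so real usage will be higher. The 384-dimension figure is an assumption taken from the public model card of bge-small-en-v1.5, not from this page.

```java
final class EmbeddingMemoryEstimate {

    /** Raw bytes for `vectors` float32 embeddings of the given dimension. */
    static long rawVectorBytes(long vectors, int dimensions) {
        return vectors * dimensions * Float.BYTES;
    }

    public static void main(String[] args) {
        int dims = 384; // assumption: output size of bge-small-en-v1.5
        System.out.println(rawVectorBytes(50_000, dims));   // prints 76800000  (about 77 MB)
        System.out.println(rawVectorBytes(100_000, dims));  // prints 153600000 (about 154 MB)
        // A 1536-dimension model quadruples the payload for the same number of vectors.
        System.out.println(rawVectorBytes(50_000, 1536));   // prints 307200000
    }
}
```

The payload itself is modest at these sizes; the linear search cost and the lack of persistence features tend to be the stronger reasons to move to a vector database.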
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-open-ai</artifactId>
<version>1.11.0</version>
</dependency>
Models:
text-embedding-3-small - 1536 dimensions, fast, cost-effective
text-embedding-3-large - 3072 dimensions, highest quality
text-embedding-ada-002 - 1536 dimensions, previous generation
Advantages: High quality, multilingual, fast API
Tradeoffs: Requires API key, usage costs, network dependency
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-cohere</artifactId>
<version>1.11.0</version>
</dependency>
Models:
embed-english-v3.0 - English, retrieval-optimized
embed-multilingual-v3.0 - 100+ languages
Advantages: Retrieval-optimized, compression options
Tradeoffs: Requires API key, usage costs
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-azure-open-ai</artifactId>
<version>1.11.0</version>
</dependency>
Advantages: Enterprise SLAs, data residency, private endpoints
Tradeoffs: Azure setup required, same API costs
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-ollama</artifactId>
<version>1.11.0</version>
</dependency>
Models: Various open-source models (nomic-embed-text, mxbai-embed-large)
Advantages: Local execution, no API costs, privacy
Tradeoffs: Requires Ollama setup, variable quality
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-pinecone</artifactId>
<version>1.11.0</version>
</dependency>
Type: Fully managed cloud vector database
Advantages: Scalable, fast, managed, enterprise features
Tradeoffs: Requires account, usage costs
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-weaviate</artifactId>
<version>1.11.0</version>
</dependency>
Type: Open-source vector database (self-hosted or cloud)
Advantages: Open source, GraphQL API, hybrid search
Tradeoffs: Requires setup, operational overhead (if self-hosted)
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-qdrant</artifactId>
<version>1.11.0</version>
</dependency>
Type: Open-source vector database (self-hosted or cloud)
Advantages: Fast, efficient, easy to deploy
Tradeoffs: Operational overhead (if self-hosted)
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-milvus</artifactId>
<version>1.11.0</version>
</dependency>
Type: Open-source vector database for large-scale deployments
Advantages: Very scalable, GPU support, distributed
Tradeoffs: Complex setup, operational overhead
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-redis</artifactId>
<version>1.11.0</version>
</dependency>
Type: Redis with vector search (RediSearch module)
Advantages: Familiar Redis infrastructure, fast
Tradeoffs: Requires Redis Stack or Cloud
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-elasticsearch</artifactId>
<version>1.11.0</version>
</dependency>
Type: Elasticsearch with vector search
Advantages: Existing Elasticsearch infrastructure, hybrid search
Tradeoffs: Requires Elasticsearch 8.0+
1.11.0-beta19:
Future versions:
Install with Tessl CLI
npx tessl i tessl/maven-dev-langchain4j--langchain4j-easy-rag@1.11.0