In-process all-minilm-l6-v2 (quantized) embedding model
A quantized version of the SentenceTransformers all-MiniLM-L6-v2 embedding model that runs directly within Java applications without requiring external services. This package generates 384-dimensional embeddings for text using ONNX Runtime, with the quantized model providing efficient in-process execution suitable for semantic search, similarity matching, RAG (Retrieval-Augmented Generation) applications, and other NLP tasks.
pom.xml:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-embeddings-all-minilm-l6-v2-q</artifactId>
    <version>1.11.0</version>
</dependency>

Or for Gradle:

implementation 'dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2-q:1.11.0'

import dev.langchain4j.model.embedding.onnx.allminilml6v2q.AllMiniLmL6V2QuantizedEmbeddingModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.model.output.FinishReason;
// Create the embedding model with default settings
EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel();
// Embed a single text string
Response<Embedding> response = model.embed("Hello, world!");
Embedding embedding = response.content();
// Access the vector
float[] vector = embedding.vector();
int dimension = embedding.dimension(); // Returns 384
// Get embedding dimension without generating embeddings
int dim = model.dimension(); // Returns 384

Bundled model files:
- all-minilm-l6-v2-q.onnx (loaded from classpath)
- all-minilm-l6-v2-q-tokenizer.json (loaded from classpath)

This package's key dependencies, notably the ONNX Runtime, are included automatically.
Note: This package bundles the ONNX model files within the JAR. No additional model downloads are required at runtime.
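Since this model produces unit-length 384-dimensional vectors, cosine similarity between two embeddings reduces to a dot product. A minimal, library-independent sketch of comparing embeddings by cosine similarity (toy 3-dimensional arrays stand in for real `embedding.vector()` output):

```java
public class CosineSimilarityExample {

    // Cosine similarity between two vectors; for unit-length vectors
    // (as this model produces) the denominator is 1, so this is just the dot product.
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy vectors standing in for 384-dimensional embeddings
        float[] query = {1f, 0f, 0f};
        float[] close = {0.9f, 0.1f, 0f};
        float[] far   = {0f, 1f, 0f};
        System.out.println(cosineSimilarity(query, close) > cosineSimilarity(query, far)); // true
    }
}
```

In a semantic-search setting you would compute this score between the query embedding and each document embedding, then sort descending.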
Potential Conflicts:
Create embedding model instances with default or custom executor settings.
// Default constructor - uses cached thread pool with threads = available processors
public AllMiniLmL6V2QuantizedEmbeddingModel()
// Constructor with custom executor for parallel processing control
public AllMiniLmL6V2QuantizedEmbeddingModel(java.util.concurrent.Executor executor)

Parameters:
- executor (Executor): Custom executor for parallelizing the embedding process. Must not be null.

Throws:
- NullPointerException: If executor is null (when using the second constructor)

Default Executor Behavior:
Usage Examples:
Default instantiation:
EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel();

Custom executor for controlled parallelization:
import java.util.concurrent.Executors;
import java.util.concurrent.Executor;
Executor customExecutor = Executors.newFixedThreadPool(4);
EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel(customExecutor);

Null Handling:
Resource Management:
Embed a single text string or TextSegment to generate a 384-dimensional vector representation.
// Embed a plain string
Response<Embedding> embed(String text)
// Embed a TextSegment (text with metadata)
Response<Embedding> embed(TextSegment textSegment)

Parameters:
- text (String): The text to embed. Can be null, empty, or any length.
- textSegment (TextSegment): A text segment containing text and optional metadata. Can contain null text.

Returns: Response<Embedding> containing:
- content(): The generated Embedding (never null)
- tokenUsage(): Token usage statistics (input tokens only, excludes the special tokens [CLS] and [SEP])
- finishReason(): Always null for embedding models
- metadata(): Empty map for this model

Null Handling:
- null text: Treated as an empty string; produces a valid embedding
- null TextSegment: May throw NullPointerException

Edge Cases:
Performance:
Usage Examples:
Embedding a string:
Response<Embedding> response = model.embed("The quick brown fox jumps over the lazy dog");
Embedding embedding = response.content();
float[] vector = embedding.vector(); // 384-dimensional float array

Embedding a TextSegment:
import dev.langchain4j.data.segment.TextSegment;
TextSegment segment = TextSegment.from("Machine learning is fascinating");
Response<Embedding> response = model.embed(segment);
Embedding embedding = response.content();

Handling null or empty text:
// Empty text
Response<Embedding> response1 = model.embed("");
Embedding emb1 = response1.content(); // Valid embedding
// Null text treated as empty
Response<Embedding> response2 = model.embed((String) null);
Embedding emb2 = response2.content(); // Valid embedding

Embed multiple text segments in a single call with automatic parallel processing for efficiency.
// Embed multiple text segments
Response<java.util.List<Embedding>> embedAll(java.util.List<TextSegment> textSegments)

Parameters:
- textSegments (List<TextSegment>): List of text segments to embed. Must not be null or empty.

Returns: Response<List<Embedding>> containing:
- content(): List of Embedding objects, one per input segment in the same order (never null)
- tokenUsage(): Aggregated token usage across all segments (input tokens only, excludes special tokens)
- finishReason(): Always null for embedding models
- metadata(): Empty map for this model

Throws:
- IllegalArgumentException: If textSegments is null or empty
- NullPointerException: If textSegments contains null elements

Behavior:

Null Handling:
- null list: Throws IllegalArgumentException
- null elements in list: Throws NullPointerException

Edge Cases:
Performance:
Usage Examples:
import dev.langchain4j.data.segment.TextSegment;
import java.util.List;
import java.util.Arrays;
List<TextSegment> segments = Arrays.asList(
TextSegment.from("First document about artificial intelligence"),
TextSegment.from("Second document about machine learning"),
TextSegment.from("Third document about deep learning")
);
Response<List<Embedding>> response = model.embedAll(segments);
List<Embedding> embeddings = response.content(); // 3 embeddings
// Access individual embeddings
Embedding firstEmbedding = embeddings.get(0);
Embedding secondEmbedding = embeddings.get(1);
// Check token usage
Integer inputTokens = response.tokenUsage().inputTokenCount();

Handling errors:
try {
List<TextSegment> segments = Arrays.asList(/* ... */);
Response<List<Embedding>> response = model.embedAll(segments);
// Process embeddings
} catch (IllegalArgumentException e) {
// Handle null or empty list
System.err.println("Invalid input: " + e.getMessage());
}

Large batch processing with memory management:
import java.util.List;
import java.util.ArrayList;
List<TextSegment> allSegments = /* large list */;
int batchSize = 50;
List<Embedding> allEmbeddings = new ArrayList<>();
// Process in batches to manage memory
for (int i = 0; i < allSegments.size(); i += batchSize) {
int end = Math.min(i + batchSize, allSegments.size());
List<TextSegment> batch = allSegments.subList(i, end);
Response<List<Embedding>> response = model.embedAll(batch);
allEmbeddings.addAll(response.content());
}

Get the dimension of embeddings produced by this model without generating embeddings.
// Returns the embedding dimension
int dimension()

Returns: int - The embedding dimension (always 384 for this model)
Usage Example:
int dim = model.dimension(); // Returns 384

Use Cases:
Get the name identifier of the underlying embedding model.
// Returns the model name
String modelName()

Returns: String - The model name, or "unknown" if not specified by the implementation
Usage Example:
String name = model.modelName();

Note: The returned name is implementation-specific and may be "unknown" for this model.
Wrap the embedding model with listeners to observe and monitor embedding operations.
// Add a single listener
EmbeddingModel addListener(dev.langchain4j.model.embedding.listener.EmbeddingModelListener listener)
// Add multiple listeners
EmbeddingModel addListeners(java.util.List<dev.langchain4j.model.embedding.listener.EmbeddingModelListener> listeners)

Parameters:
- listener (EmbeddingModelListener): A listener to observe embedding operations. If null, the model is returned unchanged.
- listeners (List<EmbeddingModelListener>): List of listeners to observe embedding operations, called in iteration order. If null or empty, the model is returned unchanged.

Returns: EmbeddingModel - An observing embedding model that dispatches events to the provided listener(s)

Null Handling:
- null listener: Returns the original model unchanged (no-op)
- null or empty listeners list: Returns the original model unchanged (no-op)
- null elements in listeners list: Skipped during event dispatch

Annotation: @Experimental (since v1.11.0)
Behavior:
Usage Example:
import dev.langchain4j.model.embedding.listener.EmbeddingModelListener;
import dev.langchain4j.model.embedding.listener.EmbeddingModelRequestContext;
import dev.langchain4j.model.embedding.listener.EmbeddingModelResponseContext;
import dev.langchain4j.model.embedding.listener.EmbeddingModelErrorContext;
// Add a listener to monitor embedding operations
EmbeddingModel observedModel = model.addListener(new EmbeddingModelListener() {
@Override
public void onRequest(EmbeddingModelRequestContext ctx) {
System.out.println("Embedding " + ctx.textSegments().size() + " segments");
// Store start time in attributes for performance tracking
ctx.attributes().put("startTime", System.currentTimeMillis());
}
@Override
public void onResponse(EmbeddingModelResponseContext ctx) {
long startTime = (Long) ctx.attributes().get("startTime");
long duration = System.currentTimeMillis() - startTime;
System.out.println("Completed in " + duration + "ms");
}
@Override
public void onError(EmbeddingModelErrorContext ctx) {
System.err.println("Error: " + ctx.error().getMessage());
}
});
// Use the observed model
Response<Embedding> response = observedModel.embed("test");

Factory class for creating model instances via the SPI (Service Provider Interface) mechanism.
public class AllMiniLmL6V2QuantizedEmbeddingModelFactory implements dev.langchain4j.spi.model.embedding.EmbeddingModelFactory
// Create a new model instance with default settings
public EmbeddingModel create()

Package: dev.langchain4j.model.embedding.onnx.allminilml6v2q
Returns: EmbeddingModel - A new AllMiniLmL6V2QuantizedEmbeddingModel instance with default settings (default executor)
Usage: Typically used by frameworks and service loaders rather than direct instantiation.
Example:
import dev.langchain4j.spi.model.embedding.EmbeddingModelFactory;
import java.util.ServiceLoader;
// Load via SPI
ServiceLoader<EmbeddingModelFactory> loader = ServiceLoader.load(EmbeddingModelFactory.class);
for (EmbeddingModelFactory factory : loader) {
if (factory instanceof AllMiniLmL6V2QuantizedEmbeddingModelFactory) {
EmbeddingModel model = factory.create();
break;
}
}

Represents the reason why a model call finished.
public enum FinishReason {
// The model call finished because the model decided the request was done
STOP,
// The call finished because the token length was reached
LENGTH,
// The call finished signalling a need for tool execution
TOOL_EXECUTION,
// The call finished signalling a need for content filtering
CONTENT_FILTER,
// The call finished for some other reason
OTHER
}

Package: dev.langchain4j.model.output
Note: For embedding models, the finish reason is always null.
Represents a dense vector embedding of text.
public class Embedding {
// Constructor
public Embedding(float[] vector)
// Get the vector array
public float[] vector()
// Get vector as a list
public java.util.List<Float> vectorAsList()
// Get embedding dimension
public int dimension()
// Normalize the vector in-place
public void normalize()
// Factory methods
public static Embedding from(float[] vector)
public static Embedding from(java.util.List<Float> vector)
}

Package: dev.langchain4j.data.embedding
Key Methods:
- vector(): Returns the raw float array representing the embedding. The returned array is the internal array (not a copy), so modifications will affect the embedding.
- vectorAsList(): Returns a copy of the vector as a List<Float>. This is a defensive copy, so modifications won't affect the embedding.
- dimension(): Returns the length of the vector (384 for this model)
- normalize(): Normalizes the vector to unit length (magnitude = 1.0) in place. This model already produces normalized vectors, so calling it is typically unnecessary.
- from(float[] vector): Static factory method that creates an Embedding from a float array. The array is stored directly (not copied).
- from(List<Float> vector): Static factory method that creates an Embedding from a list. The list is converted to a float array.

Null Handling:
- from() methods with null: Throw NullPointerException

Important Notes:
- vector() exposes a mutable array; avoid modifying it unless you intend to change the embedding
- normalize() is unnecessary for this model, which already emits unit-length vectors
- For read-only access, prefer vectorAsList(), which returns a defensive copy

Represents metadata associated with a Document or TextSegment as key-value pairs.
public class Metadata {
// Constructors
public Metadata()
public Metadata(java.util.Map<String, ?> metadata)
// Getter methods for typed access
public String getString(String key)
public java.util.UUID getUUID(String key)
public Integer getInteger(String key)
public Long getLong(String key)
public Float getFloat(String key)
public Double getDouble(String key)
// Check for key existence
public boolean containsKey(String key)
// Add key-value pairs (fluent API)
public Metadata put(String key, String value)
public Metadata put(String key, java.util.UUID value)
public Metadata put(String key, int value)
public Metadata put(String key, long value)
public Metadata put(String key, float value)
public Metadata put(String key, double value)
public Metadata putAll(java.util.Map<String, Object> metadata)
// Remove a key
public Metadata remove(String key)
// Copy and convert
public Metadata copy()
public java.util.Map<String, Object> toMap()
// Merge with another Metadata object
public Metadata merge(Metadata another)
// Factory methods
public static Metadata from(String key, String value)
public static Metadata from(java.util.Map<String, ?> metadata)
public static Metadata metadata(String key, String value)
}

Package: dev.langchain4j.data.document
Supported Value Types: String, UUID, Integer, Long, Float, Double
Key Methods:
- getString(String key), getInteger(String key), etc.: Return typed values; return null if the key is not present or the value cannot be cast to the requested type.
- put(String key, T value): Adds a key-value pair and returns this for chaining (fluent API)
- containsKey(String key): Checks whether the key exists
- toMap(): Returns a copy as Map<String, Object>
- merge(Metadata another): Merges two Metadata objects. Throws an exception if keys overlap.

Null Handling:
- null key in put/get methods: May throw NullPointerException (depends on the internal map implementation)
- null value in put methods: Stores a null value
- null Metadata in merge: Returns this unchanged

Edge Cases:
Usage Examples:
// Create metadata with fluent API
Metadata meta = new Metadata()
.put("source", "document.pdf")
.put("page", 5)
.put("score", 0.95);
// Type-safe retrieval
String source = meta.getString("source");
Integer page = meta.getInteger("page");
Float score = meta.getFloat("score");
// Null handling
Integer missing = meta.getInteger("nonexistent"); // Returns null

Represents a semantically meaningful segment of text with optional metadata.
public class TextSegment {
// Constructor
public TextSegment(String text, Metadata metadata)
// Get the text content
public String text()
// Get the metadata
public Metadata metadata()
// Factory methods
public static TextSegment from(String text)
public static TextSegment from(String text, Metadata metadata)
public static TextSegment textSegment(String text)
public static TextSegment textSegment(String text, Metadata metadata)
}

Package: dev.langchain4j.data.segment
Key Methods:
- text(): Returns the text content. May return null if the TextSegment was created with null text.
- metadata(): Returns the associated metadata. Never null; returns empty Metadata if none was provided.
- from(String text): Creates a TextSegment with empty metadata
- from(String text, Metadata metadata): Creates a TextSegment with the specified metadata
- textSegment(String text): Alternative factory method (same as from(String text))
- textSegment(String text, Metadata metadata): Alternative factory method (same as from(String text, Metadata metadata))

Null Handling:
- null text: Accepted and stored as-is (embedding models typically treat it as empty)
- null metadata: Replaced with an empty Metadata instance

Usage Examples:
// Simple text segment
TextSegment segment1 = TextSegment.from("This is a document");
// Text segment with metadata
Metadata meta = new Metadata().put("source", "doc1.txt");
TextSegment segment2 = TextSegment.from("Content here", meta);
// Accessing content
String text = segment2.text();
String source = segment2.metadata().getString("source");

Generic wrapper for model responses containing the generated content and metadata.
public class Response<T> {
// Constructors
public Response(T content)
public Response(T content, TokenUsage tokenUsage, FinishReason finishReason)
public Response(T content, TokenUsage tokenUsage, FinishReason finishReason, java.util.Map<String, Object> metadata)
// Get the content
public T content()
// Get token usage statistics
public TokenUsage tokenUsage()
// Get finish reason
public FinishReason finishReason()
// Get response metadata
public java.util.Map<String, Object> metadata()
// Factory methods
public static <T> Response<T> from(T content)
public static <T> Response<T> from(T content, TokenUsage tokenUsage)
public static <T> Response<T> from(T content, TokenUsage tokenUsage, FinishReason finishReason)
public static <T> Response<T> from(T content, TokenUsage tokenUsage, FinishReason finishReason, java.util.Map<String, Object> metadata)
}

Package: dev.langchain4j.model.output
Type Parameter:
- T: The type of content (Embedding or List<Embedding> for this model)

Key Methods:
- content(): Returns the generated content (Embedding or List<Embedding>). Never null.
- tokenUsage(): Returns token usage statistics. May be null if not provided.
- finishReason(): Returns the finish reason. Always null for embedding models.
- metadata(): Returns response metadata. Returns an empty map if not provided (never null).

Null Handling:
- null content: Stored as-is (may cause issues downstream)
- null tokenUsage: Accepted and stored
- null finishReason: Accepted and stored
- null metadata map: Replaced with an empty map

Usage Example:
Response<Embedding> response = model.embed("test");
Embedding emb = response.content(); // Never null
TokenUsage usage = response.tokenUsage(); // May be null
FinishReason reason = response.finishReason(); // Always null for embeddings
Map<String, Object> meta = response.metadata(); // Empty map for this model

Represents token usage statistics for a model response.
public class TokenUsage {
// Constructors
public TokenUsage()
public TokenUsage(Integer inputTokenCount)
public TokenUsage(Integer inputTokenCount, Integer outputTokenCount)
public TokenUsage(Integer inputTokenCount, Integer outputTokenCount, Integer totalTokenCount)
// Get input token count
public Integer inputTokenCount()
// Get output token count (always null for embedding models)
public Integer outputTokenCount()
// Get total token count
public Integer totalTokenCount()
// Add two TokenUsage instances
public TokenUsage add(TokenUsage that)
// Static method to sum two TokenUsage instances
public static TokenUsage sum(TokenUsage first, TokenUsage second)
}

Package: dev.langchain4j.model.output
Key Methods:
- inputTokenCount(): Returns the number of input tokens. May be null. For this model, excludes the special tokens [CLS] and [SEP].
- outputTokenCount(): Returns the number of output tokens. Always null for embedding models.
- totalTokenCount(): Returns the total token count. May be null. For this model, equals inputTokenCount when populated.
- add(TokenUsage that): Adds the token usage of another TokenUsage instance to this one, returning a new TokenUsage with summed values. Returns this instance unchanged if that is null.
- sum(TokenUsage first, TokenUsage second): Static method that adds two TokenUsage instances. Returns the non-null instance if one is null, a new TokenUsage with summed values if both are non-null, or null if both are null.

Null Handling:
- add(null): Returns this unchanged
- sum(null, null): Returns null

Note: For embedding models, only inputTokenCount is populated, representing the number of tokens in the input text (excluding special tokens).
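The null-handling rules for summing token counts can be sketched independently of the library. `nullableSum` below is a hypothetical helper (not part of the API) that mirrors the documented semantics: null + null yields null, one null yields the other operand, and two present counts are added.

```java
public class TokenSumSketch {

    // Hypothetical helper mirroring the documented sum semantics
    static Integer nullableSum(Integer a, Integer b) {
        if (a == null) return b;   // null + x -> x
        if (b == null) return a;   // x + null -> x
        return a + b;              // both present -> arithmetic sum
    }

    public static void main(String[] args) {
        System.out.println(nullableSum(null, null)); // null
        System.out.println(nullableSum(null, 7));    // 7
        System.out.println(nullableSum(3, 4));       // 7
    }
}
```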
Usage Example:
Response<Embedding> response = model.embed("test text");
TokenUsage usage = response.tokenUsage();
if (usage != null) {
Integer inputTokens = usage.inputTokenCount(); // May be null
Integer totalTokens = usage.totalTokenCount(); // Equals inputTokens
}
// Summing token usage from multiple responses
TokenUsage total = usage1.add(usage2).add(usage3);

Context object containing the input text segments and attributes for embedding model requests.
public class EmbeddingModelRequestContext {
// Get the input text segments to be embedded
public java.util.List<TextSegment> textSegments()
// Get the embedding model instance
public EmbeddingModel embeddingModel()
// Get the attributes map for passing data between listeners
public java.util.Map<Object, Object> attributes()
// Builder pattern for constructing instances
public static Builder builder()
// Inner Builder class
public static class Builder {
public Builder textSegments(java.util.List<TextSegment> textSegments)
public Builder embeddingModel(EmbeddingModel embeddingModel)
public Builder attributes(java.util.Map<Object, Object> attributes)
public EmbeddingModelRequestContext build()
}
}

Package: dev.langchain4j.model.embedding.listener
Annotation: @Experimental (since v1.11.0)
Key Methods:
- textSegments(): Returns the list of input text segments to be embedded. Never null.
- embeddingModel(): Returns the embedding model that will process the request. Never null.
- attributes(): Returns a mutable map for passing data between listener methods (e.g., logging context or timing information). Never null; modifications are visible to subsequent callbacks.
- builder(): Static factory method that creates a new Builder for constructing the context

Usage: This context is passed to EmbeddingModelListener.onRequest() before the embedding operation begins. Listeners can use the attributes map to store request-specific data that will be available in subsequent response or error callbacks.
Example:
@Override
public void onRequest(EmbeddingModelRequestContext ctx) {
// Store timing information
ctx.attributes().put("startTime", System.currentTimeMillis());
// Log request details
int segmentCount = ctx.textSegments().size();
String modelName = ctx.embeddingModel().modelName();
System.out.println("Embedding " + segmentCount + " segments with " + modelName);
}

Context object containing the embedding response, input text segments, and attributes for successful embedding operations.
public class EmbeddingModelResponseContext {
// Get the embedding response containing the list of embeddings
public Response<java.util.List<Embedding>> response()
// Get the input text segments that were embedded
public java.util.List<TextSegment> textSegments()
// Get the embedding model instance
public EmbeddingModel embeddingModel()
// Get the attributes map for passing data between listeners
public java.util.Map<Object, Object> attributes()
// Builder pattern for constructing instances
public static Builder builder()
// Inner Builder class
public static class Builder {
public Builder response(Response<java.util.List<Embedding>> response)
public Builder textSegments(java.util.List<TextSegment> textSegments)
public Builder embeddingModel(EmbeddingModel embeddingModel)
public Builder attributes(java.util.Map<Object, Object> attributes)
public EmbeddingModelResponseContext build()
}
}

Package: dev.langchain4j.model.embedding.listener
Annotation: @Experimental (since v1.11.0)
Key Methods:
- response(): Returns the Response object containing the list of generated embeddings and metadata (token usage, etc.). Never null.
- textSegments(): Returns the input text segments that were successfully embedded. Never null.
- embeddingModel(): Returns the embedding model that processed the request. Never null.
- attributes(): Returns the attributes map passed through from the request context. Never null; contains any data stored during onRequest().
- builder(): Static factory method that creates a new Builder for constructing the context

Usage: This context is passed to EmbeddingModelListener.onResponse() after a successful embedding operation. It provides access to both the request data and the resulting embeddings.
Example:
@Override
public void onResponse(EmbeddingModelResponseContext ctx) {
// Retrieve timing information from request
Long startTime = (Long) ctx.attributes().get("startTime");
long duration = System.currentTimeMillis() - startTime;
// Access response data
List<Embedding> embeddings = ctx.response().content();
TokenUsage usage = ctx.response().tokenUsage();
System.out.println("Generated " + embeddings.size() + " embeddings in " + duration + "ms");
System.out.println("Token usage: " + usage.inputTokenCount() + " tokens");
}

Context object containing the error, input text segments, and attributes when an embedding operation fails.
public class EmbeddingModelErrorContext {
// Get the error that occurred during the embedding operation
public Throwable error()
// Get the input text segments that caused the error
public java.util.List<TextSegment> textSegments()
// Get the embedding model instance
public EmbeddingModel embeddingModel()
// Get the attributes map for passing data between listeners
public java.util.Map<Object, Object> attributes()
// Builder pattern for constructing instances
public static Builder builder()
// Inner Builder class
public static class Builder {
public Builder error(Throwable error)
public Builder textSegments(java.util.List<TextSegment> textSegments)
public Builder embeddingModel(EmbeddingModel embeddingModel)
public Builder attributes(java.util.Map<Object, Object> attributes)
public EmbeddingModelErrorContext build()
}
}

Package: dev.langchain4j.model.embedding.listener
Annotation: @Experimental (since v1.11.0)
Key Methods:
- error(): Returns the Throwable (exception or error) that occurred during the embedding operation. Never null.
- textSegments(): Returns the input text segments that caused the error. Never null.
- embeddingModel(): Returns the embedding model that encountered the error. Never null.
- attributes(): Returns the attributes map passed through from the request context. Never null; contains any data stored during onRequest().
- builder(): Static factory method that creates a new Builder for constructing the context

Usage: This context is passed to EmbeddingModelListener.onError() when an embedding operation fails. It provides access to both the request data and the error details for logging, monitoring, or recovery purposes.
Example:
@Override
public void onError(EmbeddingModelErrorContext ctx) {
// Log error details
Throwable error = ctx.error();
int segmentCount = ctx.textSegments().size();
System.err.println("Error embedding " + segmentCount + " segments: " + error.getMessage());
error.printStackTrace();
// Could implement retry logic, fallback, or alerting here
}

Interface for listening to embedding model requests, responses, and errors.
public interface EmbeddingModelListener {
// Called before the request is executed against the embedding model
default void onRequest(EmbeddingModelRequestContext requestContext) {}
// Called after a successful embedding operation completes
default void onResponse(EmbeddingModelResponseContext responseContext) {}
// Called when an error occurs during interaction with the embedding model
default void onError(EmbeddingModelErrorContext errorContext) {}
}

Package: dev.langchain4j.model.embedding.listener
Annotation: @Experimental (since v1.11.0)
Key Methods:
- onRequest(EmbeddingModelRequestContext requestContext): Called before embedding execution. The request context contains the input and an attributes map for passing data between listener methods.
- onResponse(EmbeddingModelResponseContext responseContext): Called after successful embedding. The response context contains the response, the corresponding request, and the attributes.
- onError(EmbeddingModelErrorContext errorContext): Called when an error occurs. The error context contains the error, the corresponding request, and the attributes.

Important Characteristics:
Thread Safety: Listener methods may be called from multiple threads concurrently if the model is used concurrently. Implementations should be thread-safe or synchronized as needed.
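One way to keep cross-request listener state safe under concurrency is a ConcurrentHashMap with atomic update methods (the RetryListener example further below relies on the same structure). A self-contained sketch of atomic counting via ConcurrentHashMap.merge, detached from the listener API for illustration:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentCountSketch {

    static final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    // merge is atomic; a get-then-put sequence would lose updates under contention
    static void recordRequest() {
        counts.merge("requests", 1, Integer::sum);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable r = () -> { for (int i = 0; i < 1000; i++) recordRequest(); };
        Thread t1 = new Thread(r), t2 = new Thread(r);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counts.get("requests")); // 2000
    }
}
```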
Usage Example:
public class LoggingListener implements EmbeddingModelListener {
@Override
public void onRequest(EmbeddingModelRequestContext ctx) {
ctx.attributes().put("startTime", System.nanoTime());
System.out.println("[REQUEST] Embedding " + ctx.textSegments().size() + " segments");
}
@Override
public void onResponse(EmbeddingModelResponseContext ctx) {
long duration = System.nanoTime() - (Long) ctx.attributes().get("startTime");
System.out.println("[RESPONSE] Completed in " + (duration / 1_000_000) + "ms");
}
@Override
public void onError(EmbeddingModelErrorContext ctx) {
System.err.println("[ERROR] Failed: " + ctx.error().getMessage());
}
}
// Usage
EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel();
EmbeddingModel observedModel = model.addListener(new LoggingListener());

This section documents exceptions that may be thrown during embedding operations.
When Thrown:
- Constructor (null executor)
- embedAll() (null elements in the list)

Example:
try {
EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel(null);
} catch (NullPointerException e) {
System.err.println("Executor cannot be null");
}

When Thrown:
- embedAll() (null or empty list)

Example:
try {
Response<List<Embedding>> response = model.embedAll(Collections.emptyList());
} catch (IllegalArgumentException e) {
System.err.println("Cannot embed empty list: " + e.getMessage());
}

When Thrown:
Prevention:
// Batch processing to prevent OOM
int batchSize = 100;
for (int i = 0; i < allSegments.size(); i += batchSize) {
int end = Math.min(i + batchSize, allSegments.size());
List<TextSegment> batch = allSegments.subList(i, end);
try {
Response<List<Embedding>> response = model.embedAll(batch);
// Process batch
} catch (OutOfMemoryError e) {
// Reduce batch size and retry
System.err.println("OOM error, reducing batch size");
break;
}
}

When Thrown:
Note: These exceptions typically occur during class initialization and cannot be caught in normal operation.
Example:
try {
EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel();
} catch (ExceptionInInitializerError e) {
System.err.println("Failed to initialize model: " + e.getCause().getMessage());
// This indicates a serious environment problem (missing dependencies, etc.)
}

public List<Embedding> embedWithFallback(List<TextSegment> segments) {
try {
Response<List<Embedding>> response = model.embedAll(segments);
return response.content();
} catch (OutOfMemoryError e) {
// Fall back to sequential processing
List<Embedding> embeddings = new ArrayList<>();
for (TextSegment segment : segments) {
Response<Embedding> response = model.embed(segment);
embeddings.add(response.content());
}
return embeddings;
} catch (Exception e) {
System.err.println("Embedding failed: " + e.getMessage());
// Return empty list or throw custom exception
return Collections.emptyList();
}
}

public Response<Embedding> embedWithRetry(String text, int maxRetries) {
int attempts = 0;
Exception lastException = null;
while (attempts < maxRetries) {
try {
return model.embed(text);
} catch (Exception e) {
lastException = e;
attempts++;
if (attempts < maxRetries) {
try {
Thread.sleep(1000L * (1L << (attempts - 1))); // Exponential backoff: 1s, 2s, 4s, ...
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
throw new RuntimeException("Interrupted during retry", ie);
}
}
}
}
throw new RuntimeException("Failed after " + maxRetries + " attempts", lastException);
}

public class RetryListener implements EmbeddingModelListener {
private final int maxRetries;
private final Map<Object, Integer> attemptCounts = new ConcurrentHashMap<>();
public RetryListener(int maxRetries) {
this.maxRetries = maxRetries;
}
@Override
public void onError(EmbeddingModelErrorContext ctx) {
Object requestId = ctx.attributes().get("requestId");
int attempts = attemptCounts.getOrDefault(requestId, 0) + 1;
if (attempts < maxRetries) {
attemptCounts.put(requestId, attempts);
System.out.println("Retrying (attempt " + attempts + ")");
// Trigger retry (would need custom retry logic)
} else {
System.err.println("Failed after " + maxRetries + " attempts");
attemptCounts.remove(requestId);
}
}
}

Cause: ONNX Runtime or other dependencies are missing from the classpath
Solution:
<!-- Ensure all transitive dependencies are resolved -->
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-embeddings-all-minilm-l6-v2-q</artifactId>
<version>1.11.0</version>
</dependency>
<!-- No exclusions should be applied to ONNX Runtime -->

Cause: Insufficient heap memory for large batch processing
Solution:
# Increase JVM heap size
java -Xmx4g -jar your-application.jar
# Or use batch processing in code
int batchSize = 50; // Adjust based on available memory

Cause: This model is deterministic; inconsistent results suggest concurrent modification or model reloading
Solution:
// Ensure model instance is reused (thread-safe)
private static final EmbeddingModel MODEL = new AllMiniLmL6V2QuantizedEmbeddingModel();
// Do not modify embedding vectors after generation
Embedding emb = model.embed("text").content();
float[] vector = emb.vector();
// Do not modify 'vector' array

Cause: Text exceeds the recommended 256-token limit
Solution:
// Split long documents into chunks
public List<Embedding> embedLongDocument(String longText) {
// Split into ~200 token chunks (roughly 150 words)
String[] chunks = splitIntoChunks(longText, 150);
List<TextSegment> segments = Arrays.stream(chunks)
.map(TextSegment::from)
.collect(Collectors.toList());
Response<List<Embedding>> response = model.embedAll(segments);
return response.content();
}

Cause: Sequential processing or suboptimal executor configuration
Solution:
// Use custom executor with appropriate thread pool size
int threads = Runtime.getRuntime().availableProcessors();
ExecutorService executor = Executors.newFixedThreadPool(threads);
EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel(executor);
// Use batch embedding for multiple segments
Response<List<Embedding>> response = model.embedAll(segments); // Parallelized

Cause: Optional fields (tokenUsage, finishReason, metadata) may be null
Solution:
Response<Embedding> response = model.embed("text");
// Always check for null
TokenUsage usage = response.tokenUsage();
if (usage != null && usage.inputTokenCount() != null) {
int tokens = usage.inputTokenCount();
System.out.println("Used " + tokens + " tokens");
}

EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel()
.addListener(new EmbeddingModelListener() {
@Override
public void onRequest(EmbeddingModelRequestContext ctx) {
System.out.println("Request: " + ctx.textSegments().size() + " segments");
}
@Override
public void onResponse(EmbeddingModelResponseContext ctx) {
System.out.println("Response: " + ctx.response().content().size() + " embeddings");
}
@Override
public void onError(EmbeddingModelErrorContext ctx) {
ctx.error().printStackTrace();
}
});

int dim = model.dimension(); // Should always be 384
assert dim == 384 : "Unexpected dimension: " + dim;

Embedding emb = model.embed("test").content();
float[] vector = emb.vector();
// Check dimension
assert vector.length == 384;
// Check normalization (magnitude ≈ 1.0)
double magnitude = 0.0;
for (float v : vector) {
magnitude += v * v;
}
magnitude = Math.sqrt(magnitude);
System.out.println("Magnitude: " + magnitude); // Should be ≈ 1.0

Runtime runtime = Runtime.getRuntime();
long usedMemory = runtime.totalMemory() - runtime.freeMemory();
System.out.println("Memory used: " + (usedMemory / 1024 / 1024) + " MB");

The model accepts text of any length, but embedding quality degrades beyond roughly 256 tokens. For long texts (over 510 tokens), the model automatically splits the text and averages the resulting embeddings.
// Long text is automatically handled
String longText = "...text with more than 510 tokens...";
Response<Embedding> response = model.embed(longText);
Embedding embedding = response.content(); // Still 384-dimensional, averaged if needed

Best Practice for Long Documents:
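The chunking examples in this section rely on a chunk-splitting helper (`splitIntoChunks` / `splitIntoSemanticChunks`) that is left undefined. A minimal word-count-based sketch is shown below; note that whitespace word counts only approximate the tokenizer's token counts, so for exact limits you would count tokens with the model's tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal whitespace-based chunk splitter (illustrative helper, not part
// of langchain4j). Word counts only approximate token counts.
class ChunkSplitter {

    static String[] splitIntoChunks(String text, int wordsPerChunk) {
        String[] words = text.trim().split("\\s+");
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < words.length; i += wordsPerChunk) {
            int end = Math.min(i + wordsPerChunk, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
        }
        return chunks.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] chunks = splitIntoChunks("one two three four five", 2);
        System.out.println(chunks.length + " chunks"); // 3 chunks
    }
}
```

A production splitter would preferably break on sentence or paragraph boundaries rather than mid-sentence.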
public List<Embedding> embedLongDocumentWithChunking(String document) {
// Split document into semantic chunks (e.g., paragraphs or sentences)
List<String> chunks = splitIntoSemanticChunks(document, 200); // ~200 words per chunk
List<TextSegment> segments = chunks.stream()
.map(TextSegment::from)
.collect(Collectors.toList());
Response<List<Embedding>> response = model.embedAll(segments);
return response.content();
}
// For document-level embedding, average the chunk embeddings
public Embedding getDocumentEmbedding(List<Embedding> chunkEmbeddings) {
int dim = chunkEmbeddings.get(0).dimension();
float[] avgVector = new float[dim];
for (Embedding emb : chunkEmbeddings) {
float[] vector = emb.vector();
for (int i = 0; i < dim; i++) {
avgVector[i] += vector[i];
}
}
for (int i = 0; i < dim; i++) {
avgVector[i] /= chunkEmbeddings.size();
}
Embedding docEmbedding = Embedding.from(avgVector);
docEmbedding.normalize(); // Normalize after averaging
return docEmbedding;
}

Use cosine similarity to compare embeddings (since they're normalized, the dot product equals the cosine similarity).
import dev.langchain4j.store.embedding.CosineSimilarity;
import dev.langchain4j.store.embedding.RelevanceScore;
Embedding emb1 = model.embed("Hello world").content();
Embedding emb2 = model.embed("Hi there").content();
// Compute cosine similarity
double cosineSim = CosineSimilarity.between(emb1, emb2);
// Convert to relevance score (0 to 1 scale)
double relevance = RelevanceScore.fromCosineSimilarity(cosineSim);
// Manual cosine similarity (since vectors are normalized, just dot product)
float[] v1 = emb1.vector();
float[] v2 = emb2.vector();
double dotProduct = 0.0;
for (int i = 0; i < v1.length; i++) {
dotProduct += v1[i] * v2[i];
}
// dotProduct is the cosine similarity (vectors are unit length)

Similarity Thresholds:
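As a rough illustration, cosine-similarity scores are often bucketed with rules of thumb like the following. The 0.90/0.75/0.50 cut-offs are assumptions to calibrate against your own corpus, not fixed properties of the model.

```java
// Illustrative similarity buckets. The cut-off values are rules of thumb,
// not properties of the model; tune them on your own data.
class SimilarityThresholds {

    static String classify(double cosineSimilarity) {
        if (cosineSimilarity >= 0.90) return "near-duplicate";
        if (cosineSimilarity >= 0.75) return "strongly related";
        if (cosineSimilarity >= 0.50) return "loosely related";
        return "unrelated";
    }

    public static void main(String[] args) {
        System.out.println(classify(0.95)); // near-duplicate
        System.out.println(classify(0.40)); // unrelated
    }
}
```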
The model is thread-safe and supports concurrent embedding operations:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.List;
import java.util.ArrayList;
EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel();
ExecutorService executor = Executors.newFixedThreadPool(10);
List<Future<Embedding>> futures = new ArrayList<>();
// Submit multiple embedding tasks concurrently
for (String text : texts) {
futures.add(executor.submit(() -> model.embed(text).content()));
}
// Collect results
for (Future<Embedding> future : futures) {
Embedding embedding = future.get();
// Process embedding
}
executor.shutdown();

Thread Safety Notes:
Control the parallel processing behavior by providing a custom executor:
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;
// Create custom executor with specific thread pool size
ExecutorService customExecutor = Executors.newFixedThreadPool(8);
// Pass to model constructor
EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel(customExecutor);
// When embedding multiple segments, uses custom executor
Response<List<Embedding>> response = model.embedAll(segments);
// Don't forget to shutdown when done (or use try-with-resources pattern)
customExecutor.shutdown();

Executor Selection Guidelines:
Performance Tuning:
// For CPU-bound tasks, use core count
int threads = Runtime.getRuntime().availableProcessors();
ExecutorService executor = Executors.newFixedThreadPool(threads);
// For mixed workloads, use slightly more threads
int mixedThreads = Runtime.getRuntime().availableProcessors() + 2;
ExecutorService mixedExecutor = Executors.newFixedThreadPool(mixedThreads);

import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
// Create embedding store
EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
// Embed and store documents
List<TextSegment> documents = Arrays.asList(
TextSegment.from("First document", new Metadata().put("id", "doc1")),
TextSegment.from("Second document", new Metadata().put("id", "doc2"))
);
Response<List<Embedding>> response = model.embedAll(documents);
List<Embedding> embeddings = response.content();
// Store embeddings with their documents
for (int i = 0; i < documents.size(); i++) {
embeddingStore.add(embeddings.get(i), documents.get(i));
}

// Query embedding
Embedding queryEmbedding = model.embed("search query").content();
// Find similar documents
int maxResults = 5;
double minScore = 0.7;
List<EmbeddingMatch<TextSegment>> matches = embeddingStore.findRelevant(
queryEmbedding,
maxResults,
minScore
);
// Process results
for (EmbeddingMatch<TextSegment> match : matches) {
TextSegment segment = match.embedded();
double score = match.score();
System.out.println("Score: " + score + ", Text: " + segment.text());
}

To avoid recomputing embeddings for the same text:
import java.util.concurrent.ConcurrentHashMap;
import java.util.Map;
public class CachedEmbeddingModel {
private final EmbeddingModel model;
private final Map<String, Embedding> cache;
public CachedEmbeddingModel(EmbeddingModel model) {
this.model = model;
this.cache = new ConcurrentHashMap<>();
}
public Embedding embed(String text) {
return cache.computeIfAbsent(text, t ->
model.embed(t).content()
);
}
public void clearCache() {
cache.clear();
}
public int getCacheSize() {
return cache.size();
}
}
// Usage
EmbeddingModel baseModel = new AllMiniLmL6V2QuantizedEmbeddingModel();
CachedEmbeddingModel cachedModel = new CachedEmbeddingModel(baseModel);
Embedding emb1 = cachedModel.embed("test"); // Computed
Embedding emb2 = cachedModel.embed("test"); // Retrieved from cache
assert emb1 == emb2; // Same instance

Cache Considerations:
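One consideration the unbounded `ConcurrentHashMap` above glosses over is memory growth: every distinct text stays cached forever. Below is a sketch of a size-bounded LRU variant built on `LinkedHashMap`'s access-order eviction; the capacity and the `Function`-based interface are illustrative choices, not part of the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Size-bounded LRU embedding cache (illustrative sketch). When the cache
// exceeds maxEntries, the least-recently-used entry is evicted.
// LinkedHashMap is not thread-safe, so access is synchronized.
class BoundedEmbeddingCache {
    private final int maxEntries;
    private final Function<String, float[]> compute;
    private final Map<String, float[]> cache;

    BoundedEmbeddingCache(int maxEntries, Function<String, float[]> compute) {
        this.maxEntries = maxEntries;
        this.compute = compute;
        // accessOrder = true orders entries from least- to most-recently used
        this.cache = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > BoundedEmbeddingCache.this.maxEntries;
            }
        };
    }

    synchronized float[] embed(String text) {
        return cache.computeIfAbsent(text, compute);
    }

    synchronized int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        // Stand-in "embedding" function for demonstration
        BoundedEmbeddingCache cache =
                new BoundedEmbeddingCache(2, t -> new float[]{t.length()});
        cache.embed("a");
        cache.embed("bb");
        cache.embed("ccc"); // evicts "a", the least recently used entry
        System.out.println(cache.size()); // 2
    }
}
```

In production, a caching library such as Caffeine provides the same bounded-LRU behavior with better concurrency than coarse synchronization.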
public List<Embedding> embedAllInBatches(List<String> texts, int batchSize) {
List<Embedding> allEmbeddings = new ArrayList<>();
for (int i = 0; i < texts.size(); i += batchSize) {
int end = Math.min(i + batchSize, texts.size());
List<TextSegment> batch = texts.subList(i, end).stream()
.map(TextSegment::from)
.collect(Collectors.toList());
Response<List<Embedding>> response = model.embedAll(batch);
allEmbeddings.addAll(response.content());
}
return allEmbeddings;
}

public List<Embedding> embedAllAdaptive(List<String> texts) {
int batchSize = 100;
List<Embedding> allEmbeddings = new ArrayList<>();
for (int i = 0; i < texts.size(); i += batchSize) {
int end = Math.min(i + batchSize, texts.size());
List<TextSegment> batch = texts.subList(i, end).stream()
.map(TextSegment::from)
.collect(Collectors.toList());
try {
Response<List<Embedding>> response = model.embedAll(batch);
allEmbeddings.addAll(response.content());
} catch (OutOfMemoryError e) {
// Reduce batch size and retry
batchSize = Math.max(1, batchSize / 2); // Guard against batchSize reaching 0 (infinite loop)
i -= batchSize; // The loop increment re-adds the new batch size, so the failed batch is retried
System.err.println("OOM: reducing batch size to " + batchSize);
}
}
return allEmbeddings;
}

public class PerformanceMonitoringListener implements EmbeddingModelListener {
private final AtomicLong totalRequests = new AtomicLong(0);
private final AtomicLong totalTime = new AtomicLong(0);
private final AtomicLong totalTokens = new AtomicLong(0);
@Override
public void onRequest(EmbeddingModelRequestContext ctx) {
totalRequests.incrementAndGet();
ctx.attributes().put("startTime", System.nanoTime());
}
@Override
public void onResponse(EmbeddingModelResponseContext ctx) {
long startTime = (Long) ctx.attributes().get("startTime");
long duration = System.nanoTime() - startTime;
totalTime.addAndGet(duration);
TokenUsage usage = ctx.response().tokenUsage();
if (usage != null && usage.inputTokenCount() != null) {
totalTokens.addAndGet(usage.inputTokenCount());
}
}
public void printStats() {
long requests = totalRequests.get();
if (requests == 0) {
    System.out.println("No requests recorded");
    return;
}
long avgTimeMs = totalTime.get() / requests / 1_000_000;
double avgTokens = (double) totalTokens.get() / requests;
System.out.println("Total requests: " + requests);
System.out.println("Average time: " + avgTimeMs + "ms");
System.out.println("Average tokens: " + avgTokens);
}
}
// Usage
PerformanceMonitoringListener monitor = new PerformanceMonitoringListener();
EmbeddingModel model = new AllMiniLmL6V2QuantizedEmbeddingModel()
.addListener(monitor);
// ... use model ...
monitor.printStats();

The model (all-minilm-l6-v2-q.onnx) and tokenizer (all-minilm-l6-v2-q-tokenizer.json) are loaded from the JAR's classpath during class initialization.
Embeddings are already L2-normalized, so calling normalize() on embeddings from this model is unnecessary.
Note: Times vary significantly with hardware (CPU speed, cores) and JVM configuration.
embedAll() is more efficient than multiple embed() calls