Spring AI Transformers

ONNX-based Transformer models for text embeddings within the Spring AI framework. Provides local embedding generation with ONNX Runtime, accelerated on CPU or GPU, with no external API dependencies for inference.

Package Information

  • Package: org.springframework.ai:spring-ai-transformers
  • Version: 1.1.2
  • Language: Java
  • Type: Maven

Installation

Maven:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-transformers</artifactId>
    <version>1.1.2</version>
</dependency>

Gradle:

implementation 'org.springframework.ai:spring-ai-transformers:1.1.2'

Quick Start

Spring Boot Auto-Configuration

@Service
public class MyService {
    private final TransformersEmbeddingModel embeddingModel;

    public MyService(TransformersEmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    public float[] embed(String text) {
        return embeddingModel.embed(text);
    }
}

Manual Initialization

TransformersEmbeddingModel model = new TransformersEmbeddingModel();
model.afterPropertiesSet(); // Required before use
float[] embedding = model.embed("Hello world");

Core Concepts

Architecture

  • TransformersEmbeddingModel: Main API implementing Spring AI's EmbeddingModel interface
  • ONNX Runtime: Executes pre-trained transformer models
  • HuggingFace Tokenizers: Converts text to token IDs
  • ResourceCacheService: Caches models and tokenizers locally
  • Default Model: all-MiniLM-L6-v2 (384 dimensions)

Key Features

  • Local inference (no external API calls)
  • CPU and GPU support
  • Automatic resource caching
  • Spring Boot auto-configuration
  • Batch processing
  • Micrometer observation support
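Because embeddings come back as plain float[] vectors, downstream similarity math needs no extra dependencies. A minimal sketch of cosine similarity between two embedding vectors; the short vectors below are illustrative stand-ins for real 384-dimensional model output:

```java
// Cosine similarity between two embedding vectors.
public class CosineSimilarity {
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1.0f, 0.0f, 1.0f};
        float[] v2 = {1.0f, 0.0f, 1.0f};
        float[] v3 = {0.0f, 1.0f, 0.0f};
        System.out.println(cosine(v1, v2)); // identical vectors -> 1.0
        System.out.println(cosine(v1, v3)); // orthogonal vectors -> 0.0
    }
}
```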

API Quick Reference

Embedding Operations

// Single text
float[] embed(String text);

// Document with metadata
float[] embed(Document document);

// Batch processing
List<float[]> embed(List<String> texts);

// With response metadata
EmbeddingResponse embedForResponse(List<String> texts);

// Full request
EmbeddingResponse call(EmbeddingRequest request);

// Get dimensions
int dimensions();

Configuration

// Model and tokenizer
void setModelResource(String modelResourceUri);
void setTokenizerResource(String tokenizerResourceUri);
void setTokenizerOptions(Map<String, String> tokenizerOptions);

// Hardware
void setGpuDeviceId(int gpuDeviceId);

// Caching
void setDisableCaching(boolean disableCaching);
void setResourceCacheDirectory(String resourceCacheDir);

// Model output
void setModelOutputName(String modelOutputName);

// Initialization (required)
void afterPropertiesSet() throws Exception;

Spring Boot Properties

# Model configuration
spring.ai.embedding.transformer.onnx.model-uri=classpath:/models/model.onnx
spring.ai.embedding.transformer.onnx.gpu-device-id=0
spring.ai.embedding.transformer.onnx.model-output-name=last_hidden_state

# Tokenizer configuration
spring.ai.embedding.transformer.tokenizer.uri=classpath:/tokenizers/tokenizer.json
spring.ai.embedding.transformer.tokenizer.options.modelMaxLength=512

# Cache configuration
spring.ai.embedding.transformer.cache.enabled=true
spring.ai.embedding.transformer.cache.directory=/var/cache/spring-ai

# Metadata mode
spring.ai.embedding.transformer.metadata-mode=NONE

Core Imports

import org.springframework.ai.transformers.TransformersEmbeddingModel;
import org.springframework.ai.transformers.ResourceCacheService;
import org.springframework.ai.embedding.EmbeddingResponse;
import org.springframework.ai.embedding.EmbeddingRequest;
import org.springframework.ai.document.Document;
import org.springframework.ai.document.MetadataMode;

Default Behavior

| Setting        | Default Value                              |
| -------------- | ------------------------------------------ |
| Model          | all-MiniLM-L6-v2                           |
| Dimensions     | 384                                        |
| Hardware       | CPU (GPU via setGpuDeviceId(0))            |
| Cache Location | {java.io.tmpdir}/spring-ai-onnx-generative |
| Metadata Mode  | NONE                                       |
| Caching        | Enabled                                    |

Common Patterns

Singleton Pattern (Recommended)

@Configuration
public class EmbeddingConfig {
    @Bean
    public TransformersEmbeddingModel embeddingModel() throws Exception {
        TransformersEmbeddingModel model = new TransformersEmbeddingModel();
        model.afterPropertiesSet();
        return model;
    }
}

GPU with Fallback

TransformersEmbeddingModel model = new TransformersEmbeddingModel();
model.setGpuDeviceId(0);
try {
    model.afterPropertiesSet();
} catch (Exception e) {
    String message = e.getMessage();
    if (message != null && message.contains("GPU")) {
        model.setGpuDeviceId(-1); // Fall back to CPU
        model.afterPropertiesSet();
    } else {
        throw e; // Not a GPU problem; don't swallow it
    }
}

Batch Processing

List<String> texts = List.of("text1", "text2", "text3");
List<float[]> embeddings = model.embed(texts); // More efficient than loop
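A common use of batch embeddings is ranking candidates against a query vector. The sketch below picks the most similar candidate by dot product; the vectors are hypothetical placeholders for what model.embed(texts) would return:

```java
import java.util.List;

// Rank candidate embedding vectors against a query embedding by dot product.
public class NearestEmbedding {
    static int mostSimilar(float[] query, List<float[]> candidates) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < candidates.size(); i++) {
            float[] c = candidates.get(i);
            double score = 0;
            for (int j = 0; j < query.length; j++) {
                score += query[j] * c[j];
            }
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best; // index of the highest-scoring candidate
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        List<float[]> candidates = List.of(
                new float[]{0f, 1f},    // orthogonal to the query
                new float[]{0.9f, 0.1f} // close to the query
        );
        System.out.println(mostSimilar(query, candidates)); // prints 1
    }
}
```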

Key Types

// Main model class
public class TransformersEmbeddingModel extends AbstractEmbeddingModel implements InitializingBean

// Resource caching
public class ResourceCacheService

// Metadata handling
public enum MetadataMode { NONE, EMBED, ALL }

// Response types
public class EmbeddingResponse
public class Embedding
public class EmbeddingRequest

Error Handling

// Initialization errors
try {
    model.afterPropertiesSet();
} catch (Exception e) {
    // Handle: model loading, GPU, cache, network errors
}

// Runtime errors
try {
    float[] embedding = model.embed(text);
} catch (IllegalArgumentException e) {
    // Null parameters
} catch (RuntimeException e) {
    // ONNX inference errors
}

Performance Characteristics

  • First call: Includes model download (~50MB) and initialization (100-500ms)
  • Subsequent calls: 5-20ms (CPU), 2-5ms (GPU)
  • Batch processing: 30-50% faster than sequential
  • Memory footprint: ~170MB per model instance
  • Thread-safe: After initialization
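Since instances are thread-safe after afterPropertiesSet(), a single model can serve concurrent requests. A sketch of fanning embedding work out over a worker pool; the embed call is stubbed with a hypothetical deterministic function so the example stands alone without the library on the classpath:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Share one (thread-safe) embedding function across a worker pool.
public class ConcurrentEmbed {
    // Stand-in for embeddingModel.embed(text): returns a tiny fake vector.
    static float[] embed(String text) {
        return new float[]{text.length(), text.hashCode() % 7};
    }

    static List<float[]> embedAll(List<String> texts) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            return texts.stream()
                    .map(t -> CompletableFuture.supplyAsync(() -> embed(t), pool))
                    .toList()                     // submit all tasks first
                    .stream()
                    .map(CompletableFuture::join) // then collect results in order
                    .toList();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<float[]> result = embedAll(List.of("alpha", "beta", "gamma"));
        System.out.println(result.size()); // prints 3
    }
}
```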

Documentation

Guides

  • Quick Start Guide - Detailed getting started
  • Configuration Guide - Complete configuration options
  • Integration Guide - Spring Boot integration


Troubleshooting

Auto-Configuration Not Working

# Enable debug logging
debug=true
logging.level.org.springframework.boot.autoconfigure=DEBUG

GPU Initialization Fails

# Fallback to CPU
spring.ai.embedding.transformer.onnx.gpu-device-id=-1

Slow Startup

# Use local resources
spring.ai.embedding.transformer.onnx.model-uri=classpath:/models/model.onnx
spring.ai.embedding.transformer.cache.directory=/persistent/cache

Version Compatibility

  • Spring AI: 1.x.x
  • Spring Boot: 3.x.x (for auto-configuration)
  • Java: 17+
  • ONNX Runtime: Compatible with ONNX opset 13+
