
LangChain4j OpenAI Integration providing Java access to OpenAI APIs including chat models, embeddings, image generation, audio transcription, and moderation.


LangChain4j OpenAI Integration

A comprehensive Java library for integrating OpenAI's powerful AI capabilities into applications through the LangChain4j framework. This module provides unified access to OpenAI's complete API suite including GPT-4o and GPT-4 chat models, text embeddings, DALL-E image generation, Whisper audio transcription, and content moderation capabilities.

The integration supports advanced features like streaming responses, structured JSON outputs with schema validation, tool/function calling with parallel execution, reasoning capabilities for o1/o3 models, prompt caching, and comprehensive token usage tracking. All models follow consistent builder patterns and support extensive configuration including retry logic, custom HTTP settings, and observability through listeners.

Package Information

  • Package Name: dev.langchain4j:langchain4j-open-ai
  • Package Type: Maven
  • Language: Java
  • Installation: Add to Maven pom.xml:
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai</artifactId>
    <version>1.11.0</version>
</dependency>

Or Gradle:

implementation 'dev.langchain4j:langchain4j-open-ai:1.11.0'

Agent Decision Guide

When implementing OpenAI integration, choose components based on your requirements:

| Requirement | Component | File Reference |
| --- | --- | --- |
| Conversational AI with history | OpenAiChatModel | Chat Models |
| Real-time streaming responses | OpenAiStreamingChatModel | Chat Models |
| Simple text completion (legacy) | OpenAiLanguageModel | Language Models |
| Semantic search / RAG | OpenAiEmbeddingModel | Embedding Models |
| Image generation from text | OpenAiImageModel | Image Models |
| Audio to text transcription | OpenAiAudioTranscriptionModel | Audio Transcription |
| Content policy checking | OpenAiModerationModel | Moderation Models |
| Cost estimation before calls | OpenAiTokenCountEstimator | Token Management |
| Query available models | OpenAiModelCatalog | Model Catalog |

Core Imports

// Chat models
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.openai.OpenAiStreamingChatModel;
import dev.langchain4j.model.openai.OpenAiChatModelName;

// Language models (completion interface)
import dev.langchain4j.model.openai.OpenAiLanguageModel;
import dev.langchain4j.model.openai.OpenAiStreamingLanguageModel;
import dev.langchain4j.model.openai.OpenAiLanguageModelName;

// Embedding models
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModelName;

// Image generation models
import dev.langchain4j.model.openai.OpenAiImageModel;
import dev.langchain4j.model.openai.OpenAiImageModelName;

// Audio transcription models
import dev.langchain4j.model.openai.OpenAiAudioTranscriptionModel;
import dev.langchain4j.model.openai.OpenAiAudioTranscriptionModelName;

// Moderation models
import dev.langchain4j.model.openai.OpenAiModerationModel;
import dev.langchain4j.model.openai.OpenAiModerationModelName;

// Token management
import dev.langchain4j.model.openai.OpenAiTokenCountEstimator;
import dev.langchain4j.model.openai.OpenAiTokenUsage;

// Request and response metadata
import dev.langchain4j.model.openai.OpenAiChatRequestParameters;
import dev.langchain4j.model.openai.OpenAiChatResponseMetadata;

// Model catalog
import dev.langchain4j.model.openai.OpenAiModelCatalog;

// Core LangChain4j types
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.audio.AudioTranscriptionRequest;
import dev.langchain4j.model.audio.AudioTranscriptionResponse;
import dev.langchain4j.model.catalog.ModelDescription;
import dev.langchain4j.model.output.Response;

Quick Start Examples

Simple Chat Completion

import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.openai.OpenAiChatModelName;

// Create a chat model with explicit defaults
OpenAiChatModel model = OpenAiChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))  // Required
    .modelName(OpenAiChatModelName.GPT_4_O)    // Recommended: GPT-4o
    .temperature(0.7)                           // Default: 1.0, Range: 0.0-2.0
    .build();

// Generate a response
String response = model.chat("What is the capital of France?");
System.out.println(response);  // e.g. "The capital of France is Paris."

Streaming Chat

import dev.langchain4j.model.openai.OpenAiStreamingChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.data.message.UserMessage;

OpenAiStreamingChatModel streamingModel = OpenAiStreamingChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("gpt-4o")
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(UserMessage.from("Tell me a story"))
    .build();

streamingModel.chat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse);  // Print each token as it arrives
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println("\nDone!");
        System.out.println("Tokens: " + completeResponse.tokenUsage().totalTokenCount());
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});

Text Embeddings

import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModelName;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.data.embedding.Embedding;
import java.util.List;

OpenAiEmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName(OpenAiEmbeddingModelName.TEXT_EMBEDDING_3_SMALL)  // Default: 1536 dimensions
    .build();

Response<List<Embedding>> embeddings = embeddingModel.embedAll(
    List.of(
        TextSegment.from("Hello world"),
        TextSegment.from("Goodbye world")
    )
);

System.out.println("Generated " + embeddings.content().size() + " embeddings");
// Each embedding is a float[] of length 1536

Architecture

The LangChain4j OpenAI integration is built around a consistent architecture:

Model Hierarchy

All OpenAI models implement standard LangChain4j interfaces:

  • ChatModel / StreamingChatModel: For chat completions with conversation context
  • LanguageModel / StreamingLanguageModel: For simple text completions
  • EmbeddingModel: For generating text embeddings
  • ImageModel: For image generation
  • ModerationModel: For content moderation
  • AudioTranscriptionModel: For speech-to-text

Builder Pattern

Every model uses a fluent builder pattern for configuration:

  1. Call ModelClass.builder() to get a builder
  2. Configure settings using builder methods (all return this for chaining)
  3. Call build() to create the model instance
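The pattern can be illustrated with a minimal, hypothetical builder in plain Java (ChatConfig is an illustration only, not a library class):

```java
// Hypothetical ChatConfig: each setter returns `this` so calls chain,
// and build() produces the configured instance.
class ChatConfig {
    private String modelName = "gpt-4o";
    private double temperature = 1.0;

    static Builder builder() { return new Builder(); }   // step 1

    String modelName() { return modelName; }
    double temperature() { return temperature; }

    static class Builder {
        private final ChatConfig config = new ChatConfig();

        Builder modelName(String modelName) {            // step 2: configure
            config.modelName = modelName;
            return this;                                 // return this for chaining
        }

        Builder temperature(double temperature) {
            config.temperature = temperature;
            return this;
        }

        ChatConfig build() { return config; }            // step 3: build
    }
}
```

The real builders follow the same shape, with many more configuration methods.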

Request/Response Flow

  1. Request Configuration: Set parameters via builder or per-request parameters
  2. Authentication: API key (required), organization ID (optional), project ID (optional)
  3. HTTP Transport: Configurable HTTP client with retry logic
  4. Response Processing: Parse responses with metadata including token usage
  5. Error Handling: Automatic retries with exponential backoff (default: 2 retries)
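The retry strategy in step 5 can be sketched as a plain exponential-backoff loop (an illustration of the approach, not the library's internal implementation):

```java
import java.util.concurrent.Callable;

class RetryingCaller {
    // Retry a call up to maxRetries extra times, doubling the delay each attempt.
    static <T> T withRetry(Callable<T> call, int maxRetries, long baseDelayMillis) {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;  // remember the failure, maybe retry
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(baseDelayMillis << attempt);  // 1x, 2x, 4x, ...
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        throw new RuntimeException(last);  // all attempts exhausted
    }
}
```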

Streaming Architecture

Streaming models use a handler-based approach:

  • StreamingChatResponseHandler (chat models): onPartialResponse() is called for each token as it arrives; onCompleteResponse() is called with the final response including metadata
  • StreamingResponseHandler<T> (legacy language models): onNext() per token, onComplete() with the final response
  • onError(): called on errors (network, API, rate limit) in both interfaces

Token Management

The integration provides comprehensive token tracking:

  • OpenAiTokenCountEstimator: Estimate token counts before API calls (uses the jtokkit library)
  • OpenAiTokenUsage: Detailed usage including cached and reasoning tokens
  • Text token counts match OpenAI's tokenization; per-message counts include a small estimated overhead
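Once a token count is known (e.g. from OpenAiTokenCountEstimator), a pre-call cost check is simple arithmetic; the per-million-token prices in the usage example are placeholders, not current OpenAI pricing:

```java
class CostEstimate {
    // Estimate request cost in USD from token counts and per-million-token prices.
    static double estimateUsd(int inputTokens, int outputTokens,
                              double inputPricePerMillion, double outputPricePerMillion) {
        return inputTokens  / 1_000_000.0 * inputPricePerMillion
             + outputTokens / 1_000_000.0 * outputPricePerMillion;
    }
}
```

For example, 500k input tokens at $2/M plus 100k output tokens at $8/M estimates to $1.80.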

Capabilities

Chat Models

Provides access to OpenAI's conversational models (GPT-4o, GPT-4, GPT-3.5, o1, o3) with support for multi-turn conversations, system messages, and chat history. Includes both synchronous and streaming interfaces.

public class OpenAiChatModel implements ChatModel {
    public static OpenAiChatModelBuilder builder();
    public Response<AiMessage> generate(List<ChatMessage> messages);
    public ChatResponse doChat(ChatRequest chatRequest);
    public OpenAiChatRequestParameters defaultRequestParameters();
    public Set<Capability> supportedCapabilities();
    public List<ChatModelListener> listeners();
    public ModelProvider provider();
}
public class OpenAiStreamingChatModel implements StreamingChatModel {
    public static OpenAiStreamingChatModelBuilder builder();
    public void generate(List<ChatMessage> messages, StreamingResponseHandler<AiMessage> handler);
    public void doChat(ChatRequest chatRequest, StreamingChatResponseHandler handler);
    public OpenAiChatRequestParameters defaultRequestParameters();
    public List<ChatModelListener> listeners();
    public ModelProvider provider();
}

Chat Models

Language Models

Legacy completion interface for models like gpt-3.5-turbo-instruct. Supports simple text-to-text completion without conversation context. Chat models are recommended for most use cases.

public class OpenAiLanguageModel implements LanguageModel {
    public static OpenAiLanguageModelBuilder builder();
    public Response<String> generate(String prompt);
    public String modelName();
}
public class OpenAiStreamingLanguageModel implements StreamingLanguageModel {
    public static OpenAiStreamingLanguageModelBuilder builder();
    public void generate(String prompt, StreamingResponseHandler<String> handler);
    public String modelName();
}

Language Models

Embedding Models

Generate dense vector representations of text for semantic search, clustering, and similarity comparisons. Supports text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002.

public class OpenAiEmbeddingModel extends DimensionAwareEmbeddingModel {
    public static OpenAiEmbeddingModelBuilder builder();
    public Response<Embedding> embed(String text);
    public Response<List<Embedding>> embedAll(List<TextSegment> textSegments);
    public Integer knownDimension();
    public String modelName();
}
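Embeddings from this model are typically compared with cosine similarity; a minimal helper in plain Java, independent of the library:

```java
class Cosine {
    // Cosine similarity between two equal-length vectors: dot(a,b) / (|a| * |b|).
    static double similarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Pass `embedding.vector()` for both arguments; identical texts score near 1.0, unrelated texts near 0.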

Embedding Models

Image Models

Generate images from text descriptions using DALL-E 2 or DALL-E 3. Supports various sizes, quality levels, and styles.

public class OpenAiImageModel implements ImageModel {
    public static OpenAiImageModelBuilder builder();
    public Response<Image> generate(String prompt);
    public Response<List<Image>> generate(String prompt, int n);
    public String modelName();
}

Image Models

Audio Transcription Models

Convert audio to text using Whisper and GPT-4o audio models. Supports multiple audio formats and optional speaker diarization.

public class OpenAiAudioTranscriptionModel implements AudioTranscriptionModel {
    public static Builder builder();
    public AudioTranscriptionResponse transcribe(AudioTranscriptionRequest audioRequest);
    public ModelProvider provider();
}

Audio Transcription Models

Moderation Models

Analyze text content for policy violations including hate speech, violence, self-harm, and sexual content. Returns binary flagged status.

public class OpenAiModerationModel implements ModerationModel {
    public static OpenAiModerationModelBuilder builder();
    public Response<Moderation> moderate(String text);
    public Response<Moderation> moderate(List<ChatMessage> messages);
    public String modelName();
}

Moderation Models

Token Management

Estimate token usage and costs before making API calls. Provides detailed token usage information including cached tokens and reasoning tokens.

public class OpenAiTokenCountEstimator implements TokenCountEstimator {
    public OpenAiTokenCountEstimator(String modelName);
    public OpenAiTokenCountEstimator(OpenAiChatModelName modelName);
    public OpenAiTokenCountEstimator(OpenAiEmbeddingModelName modelName);
    public OpenAiTokenCountEstimator(OpenAiLanguageModelName modelName);
    public int estimateTokenCountInText(String text);
    public int estimateTokenCountInMessage(ChatMessage message);
    public int estimateTokenCountInMessages(Iterable<ChatMessage> messages);
    public List<Integer> encode(String text);
    public List<Integer> encode(String text, int maxTokensToEncode);
    public String decode(List<Integer> tokens);
}
public class OpenAiTokenUsage extends TokenUsage {
    public static Builder builder();
    public Integer inputTokenCount();
    public Integer outputTokenCount();
    public Integer totalTokenCount();
    public InputTokensDetails inputTokensDetails();
    public OutputTokensDetails outputTokensDetails();
    public OpenAiTokenUsage add(TokenUsage other);
}

Token Management

Request and Response Metadata

OpenAI-specific extensions to standard LangChain4j request parameters and response metadata, providing access to advanced features like reasoning effort, service tiers, and detailed token breakdowns.

public class OpenAiChatRequestParameters extends DefaultChatRequestParameters {
    public static Builder builder();
    public Integer maxCompletionTokens();
    public Map<String, Integer> logitBias();
    public Boolean parallelToolCalls();
    public Integer seed();
    public String user();
    public Boolean store();
    public Map<String, String> metadata();
    public String serviceTier();
    public String reasoningEffort();
    public Map<String, Object> customParameters();
    public OpenAiChatRequestParameters overrideWith(ChatRequestParameters other);
    public OpenAiChatRequestParameters defaultedBy(ChatRequestParameters defaults);
}
public class OpenAiChatResponseMetadata extends ChatResponseMetadata {
    public static Builder builder();
    public String id();
    public String modelName();
    public OpenAiTokenUsage tokenUsage();
    public FinishReason finishReason();
    public Long created();
    public String serviceTier();
    public String systemFingerprint();
}

Request and Response Metadata

Model Catalog

Query available OpenAI models and their capabilities through the API.

public class OpenAiModelCatalog implements ModelCatalog {
    public static Builder builder();
    public List<ModelDescription> listModels();
    public ModelProvider provider();
}

Model Catalog

Advanced Features

Experimental and advanced capabilities including the OpenAI Responses API for prompt caching, SPI factories for custom builder creation, and internal utilities.

public class OpenAiResponsesStreamingChatModel implements StreamingChatModel {
    public static Builder builder();
    public void doChat(ChatRequest chatRequest, StreamingChatResponseHandler handler);
    public ChatRequestParameters defaultRequestParameters();
    public List<ChatModelListener> listeners();
    public ModelProvider provider();
}

Advanced Features

Common Types

Model Names

// Chat model names
enum OpenAiChatModelName {
    GPT_3_5_TURBO,           // gpt-3.5-turbo (default snapshot)
    GPT_4,                   // gpt-4 (default snapshot)
    GPT_4_TURBO,             // gpt-4-turbo (default snapshot)
    GPT_4_O,                 // gpt-4o (default snapshot)
    GPT_4_O_MINI,            // gpt-4o-mini (default snapshot)
    O1,                      // o1 (reasoning model)
    O3,                      // o3 (reasoning model)
    O3_MINI,                 // o3-mini (reasoning model)
    O4_MINI,                 // o4-mini (reasoning model)
    GPT_4_1,                 // gpt-4.1 (default snapshot)
    GPT_4_1_MINI,            // gpt-4.1-mini (default snapshot)
    GPT_4_1_NANO,            // gpt-4.1-nano (default snapshot)
    GPT_5,                   // gpt-5 (default snapshot)
    GPT_5_MINI;              // gpt-5-mini (default snapshot)

    String toString();       // Returns model ID string
}

// Embedding model names
enum OpenAiEmbeddingModelName {
    TEXT_EMBEDDING_3_SMALL,  // 1536 dimensions (default), configurable down to 256
    TEXT_EMBEDDING_3_LARGE,  // 3072 dimensions (default), configurable down to 256
    TEXT_EMBEDDING_ADA_002;  // 1536 dimensions (fixed)

    String toString();                            // Returns model ID string
    Integer dimension();                          // Returns default dimension
    static Integer knownDimension(String modelName);  // Get dimension by model name
}

// Language model names
enum OpenAiLanguageModelName {
    GPT_3_5_TURBO_INSTRUCT;  // gpt-3.5-turbo-instruct (legacy completion)

    String toString();
}

// Image model names
enum OpenAiImageModelName {
    DALL_E_2,                // dall-e-2 (lower quality, multiple images)
    DALL_E_3;                // dall-e-3 (higher quality, single image)

    String toString();
}

// Moderation model names
enum OpenAiModerationModelName {
    TEXT_MODERATION_STABLE,      // Frozen, consistent version
    TEXT_MODERATION_LATEST,      // Updates over time, best accuracy
    OMNI_MODERATION_LATEST,      // Supports text + images
    OMNI_MODERATION_2024_09_26;  // Frozen version at specific date

    String toString();
}

// Audio transcription model names
enum OpenAiAudioTranscriptionModelName {
    WHISPER_1,                   // whisper-1 (general purpose)
    GPT_4_O_TRANSCRIBE,          // gpt-4o-transcribe (enhanced accuracy)
    GPT_4_O_MINI_TRANSCRIBE,     // gpt-4o-mini-transcribe (fast, good quality)
    GPT_4_O_TRANSCRIBE_DIARIZE;  // gpt-4o-transcribe-diarize (speaker identification)

    String toString();
}

Core LangChain4j Types

// Chat messages
interface ChatMessage {
    ChatMessageType type();  // USER, ASSISTANT, SYSTEM, TOOL_EXECUTION_RESULT
    String text();
}

class UserMessage implements ChatMessage {
    public static UserMessage from(String text);
    public static UserMessage from(String name, String text);
    public static UserMessage from(String text, List<Content> contents);
}

class AiMessage implements ChatMessage {
    public static AiMessage from(String text);
    public static AiMessage from(ToolExecutionRequest toolExecutionRequest);
    public String text();
    public boolean hasToolExecutionRequests();
    public List<ToolExecutionRequest> toolExecutionRequests();
}

class SystemMessage implements ChatMessage {
    public static SystemMessage from(String text);
}

// Response wrapper
class Response<T> {
    public T content();                  // The actual response content
    public TokenUsage tokenUsage();      // Token usage information
    public FinishReason finishReason();  // Why generation stopped
}

// Token usage
class TokenUsage {
    public Integer inputTokenCount();    // Tokens in prompt
    public Integer outputTokenCount();   // Tokens in response
    public Integer totalTokenCount();    // Sum of input + output
}

// Embeddings
class Embedding {
    public float[] vector();             // Dense vector representation
    public List<Float> vectorAsList();   // Vector as list
    public int dimension();              // Vector dimensionality
}

// Text segments
class TextSegment {
    public static TextSegment from(String text);
    public static TextSegment from(String text, Metadata metadata);
    public String text();
    public Metadata metadata();
}

// Images
class Image {
    public URI url();                    // URL to generated image (expires after 1 hour)
    public String base64Data();          // Base64-encoded image data
    public String revisedPrompt();       // AI-revised version of prompt (DALL-E 3)
}

// Audio
class AudioTranscriptionRequest {
    public byte[] audioData();           // Audio file bytes
    public String fileName();            // File name with extension
    public String language();            // ISO-639-1 code (e.g., "en")
    public String prompt();              // Context hint
    public Double temperature();         // Sampling temperature (0.0-1.0)
    public String responseFormat();      // "json", "text", "srt", "verbose_json", "vtt"
}

class AudioTranscriptionResponse {
    public String text();                // Transcribed text
}

// Moderation
class Moderation {
    public boolean flagged();            // true if content violates policy
    public String flaggedText();         // The specific flagged text (null if not flagged)
}

// Model provider
enum ModelProvider {
    OPEN_AI;                             // OpenAI provider identifier
}

// Finish reasons
enum FinishReason {
    STOP,                                // Natural completion
    LENGTH,                              // Max tokens reached
    TOOL_EXECUTION,                      // Tool call made
    CONTENT_FILTER,                      // Content filtered
    OTHER;                               // Other reason
}

// Capabilities
enum Capability {
    RESPONSE_FORMAT_JSON_SCHEMA,         // Structured JSON with schema
    RESPONSE_FORMAT_TEXT,                // Plain text response
    THINKING;                            // Reasoning/thinking capability (o1/o3)
}

Streaming Handlers

interface StreamingResponseHandler<T> {
    /**
     * Called when a new token is received.
     * @param token The token text
     */
    void onNext(String token);

    /**
     * Called when generation is complete.
     * @param response The complete response with metadata
     */
    void onComplete(Response<T> response);

    /**
     * Called when an error occurs during generation.
     * @param error The error that occurred
     */
    void onError(Throwable error);
}

interface StreamingChatResponseHandler {
    /**
     * Called for each partial response (token) as it arrives.
     * @param partialResponse The partial response text
     */
    void onPartialResponse(String partialResponse);

    /**
     * Called when generation is complete.
     * @param completeResponse The complete response with metadata
     */
    void onCompleteResponse(ChatResponse completeResponse);

    /**
     * Called when an error occurs during generation.
     * @param error The error that occurred
     */
    void onError(Throwable error);
}

HTTP Configuration

// HTTP client builder from langchain4j-http-client module
interface HttpClientBuilder {
    HttpClientBuilder connectTimeout(Duration timeout);
    HttpClientBuilder readTimeout(Duration timeout);
    HttpClientBuilder proxy(Proxy proxy);
    HttpClientBuilder sslContext(SSLContext sslContext);
    HttpClient build();
}

Error Handling

Common Error Scenarios

Authentication Errors (401):

  • Invalid API key
  • Expired API key
  • Missing API key
try {
    OpenAiChatModel model = OpenAiChatModel.builder()
        .apiKey("invalid-key")
        .build();
    model.generate("test");
} catch (Exception e) {
    // Handle: Check API key, verify it's not expired
    System.err.println("Authentication failed: " + e.getMessage());
}

Rate Limit Errors (429):

  • Too many requests
  • Token limit exceeded
// Automatic retry is built-in (default: 2 retries with exponential backoff)
OpenAiChatModel model = OpenAiChatModel.builder()
    .apiKey(apiKey)
    .maxRetries(5)  // Increase retries for rate limits
    .build();

Context Length Exceeded (400):

  • Input + output exceeds model's context window
// Use token estimator to validate before calling
OpenAiTokenCountEstimator estimator = new OpenAiTokenCountEstimator(OpenAiChatModelName.GPT_4_O);
int tokens = estimator.estimateTokenCountInMessages(messages);

if (tokens > 128000) {  // GPT-4o context window
    // Truncate or summarize messages
}
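One simple truncation strategy is to drop the oldest non-system messages until the estimated total fits the budget; sketched here in plain Java with a stand-in token counter:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

class HistoryTruncator {
    // Drop the oldest messages (after index 0, assumed to be the system message)
    // until the estimated token total fits the budget.
    static <M> List<M> truncate(List<M> messages, ToIntFunction<M> tokenCounter, int budget) {
        List<M> kept = new ArrayList<>(messages);
        int total = kept.stream().mapToInt(tokenCounter).sum();
        while (total > budget && kept.size() > 1) {
            total -= tokenCounter.applyAsInt(kept.remove(1));  // keep the first (system) message
        }
        return kept;
    }
}
```

With the real library, `estimator::estimateTokenCountInMessage` can serve as the token counter.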

Content Policy Violation (400):

  • Content violates OpenAI usage policies
// Use moderation model to check before calling
OpenAiModerationModel moderationModel = OpenAiModerationModel.builder()
    .apiKey(apiKey)
    .build();

if (moderationModel.moderate(content).content().flagged()) {
    // Content violates policy, reject it
}

Performance Characteristics

Latency Expectations

| Operation | Typical Latency | Notes |
| --- | --- | --- |
| Chat completion (streaming) | 50-200ms to first token | Subsequent tokens: 10-50ms |
| Chat completion (sync) | 1-5 seconds | Depends on response length |
| Embedding (single) | 50-200ms | Batching is more efficient |
| Embedding (batch of 100) | 200-500ms | Use batching for best throughput |
| Image generation (DALL-E 2) | 10-30 seconds | Size dependent |
| Image generation (DALL-E 3) | 20-60 seconds | Quality dependent |
| Audio transcription | 10-30% of audio duration | Diarization adds ~50% |
| Moderation | 100-500ms | Very fast |

Cost Optimization

Use appropriate model tiers:

  • GPT-4o-mini for simple tasks (roughly 15x cheaper than GPT-4o)
  • text-embedding-3-small for embeddings (cheaper than large)
  • DALL-E 2 for image generation (cheaper than DALL-E 3)

Enable caching for repeated prompts:

  • Use consistent system messages
  • Structure prompts to maximize cache hits
  • Monitor cached token counts in response metadata

Batch operations:

  • Embed multiple texts in one call
  • Use DALL-E 2 for multiple image variations

Security Best Practices

API Key Management:

  • Never hardcode API keys
  • Use environment variables or secret management systems
  • Rotate keys regularly
  • Use separate keys for development/production

Content Filtering:

  • Always moderate user-generated content before processing
  • Implement rate limiting per user
  • Log and monitor for abuse patterns

Data Privacy:

  • Set store(false) to prevent conversation storage
  • Use user parameter for per-user tracking
  • Be aware of data retention policies

Common Mistakes to Avoid

  1. Not handling rate limits: Always set maxRetries appropriately
  2. Ignoring token limits: Use OpenAiTokenCountEstimator to validate inputs
  3. Hardcoding API keys: Use environment variables
  4. Not checking moderation: Filter user content before processing
  5. Using wrong model for task: Chat models != Language models
  6. Forgetting to handle streaming errors: Always implement onError()
  7. Not caching embeddings: Embed once, reuse many times
  8. Ignoring finish reasons: Check why generation stopped (LENGTH vs STOP)
  9. Not setting timeouts: Long requests can hang without timeout
  10. Using deprecated parameters: Check release notes for deprecations
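Mistake 7 (re-embedding identical text) can be avoided with a small cache; sketched here with a stand-in embedding function rather than a real model call:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class EmbeddingCache {
    private final Map<String, float[]> cache = new HashMap<>();
    private final Function<String, float[]> embedder;  // e.g. wraps embeddingModel.embed(...)

    EmbeddingCache(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    // Embed once per distinct text; later calls for the same text hit the cache.
    float[] embed(String text) {
        return cache.computeIfAbsent(text, embedder);
    }
}
```

For production use, bound the cache size (e.g. LRU eviction) and key on a hash of the text plus the model name.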

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-open-ai
