tessl/maven-dev-langchain4j--langchain4j-github-models

This package provides a deprecated integration module that lets Java applications interact with GitHub Models through the LangChain4j framework. It offers chat models (both synchronous and streaming), embedding models, and AI services with tool integration, JSON-schema responses, and responsible-AI features. The module wraps the Azure AI Inference SDK to provide a unified API for language models hosted on GitHub Models, covering chat completion, embedding generation, and content-filtering management. As of version 1.10.0, this module is deprecated and scheduled for removal; users should migrate to the langchain4j-openai-official module for enhanced functionality and better integration. The library is designed as a foundational component for LLM-powered Java applications that need GitHub-hosted AI models, offering builder patterns for configuration, proxy options, custom timeouts, and model service versioning.


docs/reference/best-practices.md

Best Practices

Recommended patterns and practices for using langchain4j-github-models effectively.

Security

Token Management

✅ DO:

  • Store tokens in environment variables
  • Use secret management systems (Vault, AWS Secrets Manager, Azure Key Vault)
  • Rotate tokens periodically
  • Use different tokens for different environments
// ✅ Good
.gitHubToken(System.getenv("GITHUB_TOKEN"))

// ✅ Good
.gitHubToken(secretManager.getSecret("github-token"))

❌ DON'T:

  • Hardcode tokens in source code
  • Commit tokens to version control
  • Log or print tokens
  • Share tokens across applications unnecessarily
// ❌ Bad
.gitHubToken("ghp_hardcoded_token_12345")
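One way to make a missing token fail loudly at startup, rather than as an opaque 401 on the first API call, is a small fail-fast lookup. The helper below is illustrative, not part of the library:

```java
// Hypothetical helper - fails fast if a required variable is absent,
// so a missing GITHUB_TOKEN surfaces at startup with a clear message.
class EnvConfig {
    static String requireEnv(String name) {
        String value = System.getenv(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(
                "Missing required environment variable: " + name);
        }
        return value;
    }
}
```

Then configure the model with `.gitHubToken(EnvConfig.requireEnv("GITHUB_TOKEN"))`.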

Error Messages

✅ DO:

  • Log sanitized error information
  • Include correlation IDs for tracing
  • Separate sensitive data from logs

❌ DON'T:

  • Include tokens in error messages
  • Log full request/response in production
  • Expose internal details to end users
// ✅ Good
logger.error("API call failed: correlationId={}", correlationId);

// ❌ Bad
logger.error("API call failed with token: {}", token);

Configuration

Model Instance Reuse

✅ DO: Create once, reuse many times

public class ChatService {
    private final GitHubModelsChatModel model;

    public ChatService() {
        this.model = GitHubModelsChatModel.builder()
            .gitHubToken(System.getenv("GITHUB_TOKEN"))
            .modelName("gpt-4o")
            .build();
    }

    public String chat(String message) {
        return model.chat(createRequest(message))
            .aiMessage().text();
    }
}

❌ DON'T: Create new instances for every request

// ❌ Bad - creates new model every time
public String chat(String message) {
    GitHubModelsChatModel model = GitHubModelsChatModel.builder()
        .gitHubToken(System.getenv("GITHUB_TOKEN"))
        .modelName("gpt-4o")
        .build();
    return model.chat(createRequest(message)).aiMessage().text();
}

Use Type-Safe Enums

✅ DO: Use enum constants for model names

// ✅ Good - type-safe, autocomplete
.modelName(GitHubModelsChatModelName.GPT_4_O)

⚠️ OK: Use strings when flexibility needed

// ⚠️ OK - flexible but error-prone
.modelName("gpt-4o")

Set Appropriate Timeouts

✅ DO: Match timeout to expected response time

// Interactive UI
.timeout(Duration.ofSeconds(30))

// Background processing
.timeout(Duration.ofSeconds(120))

// Streaming
.timeout(Duration.ofSeconds(90))

❌ DON'T: Use extreme values

// ❌ Too short - likely to fail
.timeout(Duration.ofSeconds(5))

// ❌ Too long - poor UX
.timeout(Duration.ofMinutes(30))

Error Handling

Always Handle Errors

✅ DO: Wrap API calls in try-catch

try {
    ChatResponse response = model.chat(request);
    return response.aiMessage().text();
} catch (HttpResponseException e) {
    logger.error("Chat failed: {}", e.getMessage());
    return "I apologize, but I'm having trouble responding right now.";
}

❌ DON'T: Let exceptions propagate to users

Check Finish Reasons

✅ DO: Handle different finish reasons

ChatResponse response = model.chat(request);

switch (response.metadata().finishReason()) {
    case STOP:
        return response.aiMessage().text();
    case LENGTH:
        logger.warn("Response truncated");
        return response.aiMessage().text() + "...";
    case CONTENT_FILTER:
        logger.info("Content filtered");
        return "Response unavailable due to content policy.";
    default:
        return response.aiMessage().text();
}

Implement Retry Logic

✅ DO: Retry transient failures with backoff

int maxRetries = 3;
for (int i = 0; i < maxRetries; i++) {
    try {
        return model.chat(request);
    } catch (HttpResponseException e) {
        if (i == maxRetries - 1 || !isRetryable(e)) {
            throw e;
        }
        try {
            Thread.sleep(1000L * (1L << i));  // exponential backoff: 1s, 2s, 4s
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw e;
        }
    }
}
throw new IllegalStateException("unreachable");
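The `isRetryable` helper in the loop above is not provided by the library. A minimal sketch based on HTTP status codes might look like this (names are hypothetical; in the loop, the status would come from the caught exception, e.g. `e.getResponse().getStatusCode()` with the Azure SDK):

```java
// Hypothetical retry policy: 429 (rate limit) and 5xx server errors
// are typically transient; other 4xx client errors (bad request,
// auth failure) will not succeed on retry.
class RetryPolicy {
    static boolean isRetryable(int statusCode) {
        return statusCode == 429 || (statusCode >= 500 && statusCode < 600);
    }

    // Exponential backoff delay in milliseconds: 1s, 2s, 4s, ...
    static long backoffMillis(int attempt) {
        return 1000L * (1L << attempt);
    }
}
```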

Performance

Choose Appropriate Models

✅ DO: Match model to task

// Simple, high-volume tasks
.modelName(GitHubModelsChatModelName.GPT_4_O_MINI)

// Complex reasoning
.modelName(GitHubModelsChatModelName.GPT_4_O)

// Vision tasks
.modelName(GitHubModelsChatModelName.PHI_3_5_VISION_INSTRUCT)

Optimize Token Usage

✅ DO: Set reasonable max_tokens

// Short answers
.maxTokens(100)

// Paragraphs
.maxTokens(500)

// Articles
.maxTokens(2000)

❌ DON'T: Request more tokens than needed

// ❌ Wastes tokens and time
.maxTokens(4000)  // When you only need 200

Batch Embeddings Efficiently

✅ DO: Let the model handle batching

// ✅ Good - automatic batching
List<TextSegment> allSegments = loadSegments();  // e.g., 100
Response<List<Embedding>> response = model.embedAll(allSegments);

❌ DON'T: Process one at a time

// ❌ Bad - inefficient
for (TextSegment segment : segments) {
    model.embedAll(Arrays.asList(segment));  // 100 separate calls!
}

Use Streaming for Long Responses

✅ DO: Use streaming for better UX

// ✅ Good - user sees progress
GitHubModelsStreamingChatModel streamingModel = ...
streamingModel.chat(request, handler);

⚠️ OK: Use synchronous for short responses

// ⚠️ OK for quick responses
GitHubModelsChatModel model = ...
ChatResponse response = model.chat(request);

Prompt Engineering

Be Specific and Clear

✅ DO: Provide clear instructions

SystemMessage.from("You are a helpful assistant that answers in 2-3 sentences.")
UserMessage.from("Explain what photosynthesis is.")

❌ DON'T: Use vague prompts

UserMessage.from("Tell me about plants.")

Provide Context

✅ DO: Include relevant context

SystemMessage.from("Answer based on this document: " + documentText)
UserMessage.from("What is the main conclusion?")

Use System Messages

✅ DO: Set behavior with system messages

ChatRequest.builder()
    .messages(
        SystemMessage.from("You are a technical support agent. Be concise and helpful."),
        UserMessage.from("My app crashed.")
    )
    .build();

Control Output Length

✅ DO: Specify desired length in prompt and max_tokens

SystemMessage.from("Answer in exactly 3 bullet points.")
// Also set
.maxTokens(200)

Testing

Use Separate Test Tokens

✅ DO: Separate test and production tokens

public static String getToken() {
    String env = System.getenv("APP_ENV");
    if ("test".equals(env)) {
        return System.getenv("GITHUB_TOKEN_TEST");
    }
    return System.getenv("GITHUB_TOKEN");
}

Mock for Unit Tests

✅ DO: Use mocks for unit tests

@Test
public void testChatService() {
    ChatCompletionsClient mockClient = mock(ChatCompletionsClient.class);
    when(mockClient.complete(any())).thenReturn(mockResponse());

    GitHubModelsChatModel model = GitHubModelsChatModel.builder()
        .chatCompletionsClient(mockClient)
        .modelName("test")
        .build();

    // Test your code
}

Integration Tests with Real API

✅ DO: Have integration tests with real API

@Test
@Tag("integration")
public void testRealAPI() {
    GitHubModelsChatModel model = GitHubModelsChatModel.builder()
        .gitHubToken(System.getenv("GITHUB_TOKEN_TEST"))
        .modelName("gpt-4o-mini")  // Use cheaper model
        .build();

    ChatResponse response = model.chat(request);
    assertNotNull(response.aiMessage().text());
}

Monitoring and Observability

Add Correlation IDs

✅ DO: Track requests with correlation IDs

Map<String, String> headers = new HashMap<>();
headers.put("X-Correlation-ID", UUID.randomUUID().toString());

GitHubModelsChatModel.builder()
    .customHeaders(headers)
    .build();

Use Listeners for Metrics

✅ DO: Implement listeners for observability

public class MetricsListener implements ChatModelListener {
    @Override
    public void onRequest(ChatModelRequestContext context) {
        metrics.increment("chat.requests");
    }

    @Override
    public void onResponse(ChatModelResponseContext context) {
        int tokens = context.response().metadata()
            .tokenUsage().totalTokenCount();
        metrics.gauge("chat.tokens", tokens);
    }

    @Override
    public void onError(ChatModelErrorContext context) {
        metrics.increment("chat.errors");
    }
}

Log Important Events

✅ DO: Log with appropriate levels

logger.info("Chat request processed: correlationId={}, tokens={}",
    correlationId, tokenCount);

logger.warn("Response truncated: maxTokens={}", maxTokens);

logger.error("Chat failed: correlationId={}, error={}",
    correlationId, e.getMessage());

Production Deployment

Environment-Specific Configuration

✅ DO: Configure by environment

public static GitHubModelsChatModel createModel() {
    String env = System.getenv("APP_ENV");
    boolean isProduction = "production".equals(env);

    return GitHubModelsChatModel.builder()
        .gitHubToken(System.getenv("GITHUB_TOKEN"))
        .modelName("gpt-4o")
        .timeout(isProduction ? Duration.ofSeconds(60) : Duration.ofMinutes(5))
        .maxRetries(isProduction ? 5 : 1)
        .logRequestsAndResponses(!isProduction)
        .build();
}

Health Checks

✅ DO: Implement health checks

public boolean isModelHealthy() {
    try {
        ChatResponse response = model.chat(ChatRequest.builder()
            .messages(UserMessage.from("test"))
            .build());
        return response != null;
    } catch (Exception e) {
        logger.error("Health check failed", e);
        return false;
    }
}

Graceful Degradation

✅ DO: Handle failures gracefully

public String chat(String message) {
    ChatRequest request = createRequest(message);
    try {
        return model.chat(request).aiMessage().text();
    } catch (HttpResponseException e) {
        logger.error("Primary model failed, trying fallback", e);

        try {
            return fallbackModel.chat(request).aiMessage().text();
        } catch (Exception fallbackError) {
            logger.error("Fallback also failed", fallbackError);
            return "Service temporarily unavailable.";
        }
    }
}

Rate Limiting

✅ DO: Implement application-level rate limiting

RateLimiter rateLimiter = RateLimiter.create(10.0); // Guava RateLimiter: 10 requests/second

public ChatResponse chat(ChatRequest request) {
    if (!rateLimiter.tryAcquire()) {
        throw new RateLimitException("Rate limit exceeded");
    }
    return model.chat(request);
}
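If you would rather not pull in a library for this, a minimal single-process token-bucket sketch (class and field names are illustrative) could look like:

```java
// Minimal token bucket: refills continuously at permitsPerSecond,
// holds at most one second's worth of permits.
class SimpleRateLimiter {
    private final double permitsPerSecond;
    private double available;
    private long lastRefillNanos;

    SimpleRateLimiter(double permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
        this.available = permitsPerSecond; // start with a full bucket
        this.lastRefillNanos = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        available = Math.min(permitsPerSecond,
            available + (now - lastRefillNanos) / 1_000_000_000.0 * permitsPerSecond);
        lastRefillNanos = now;
        if (available >= 1.0) {
            available -= 1.0;
            return true;
        }
        return false;
    }
}
```

This trades Guava's smoothing behavior for zero dependencies; for multi-instance deployments, rate limiting needs to move to a shared store or gateway.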

Anti-Patterns to Avoid

Don't Create Models in Loops

// ❌ BAD - creates model every iteration
for (String message : messages) {
    GitHubModelsChatModel model = GitHubModelsChatModel.builder()
        .gitHubToken(token)
        .modelName("gpt-4o")
        .build();
    model.chat(createRequest(message));
}

// ✅ GOOD - reuse model
GitHubModelsChatModel model = GitHubModelsChatModel.builder()
    .gitHubToken(token)
    .modelName("gpt-4o")
    .build();

for (String message : messages) {
    model.chat(createRequest(message));
}

Don't Ignore Finish Reasons

// ❌ BAD - ignores truncation
String response = model.chat(request).aiMessage().text();

// ✅ GOOD - handles truncation
ChatResponse response = model.chat(request);
if (response.metadata().finishReason() == FinishReason.LENGTH) {
    logger.warn("Response was truncated");
}

Don't Use Magic Numbers

// ❌ BAD
.temperature(0.7523)
.maxTokens(1247)

// ✅ GOOD
.temperature(0.7)  // Balanced creativity
.maxTokens(1000)   // Approximately 750 words

Don't Catch and Ignore Errors

// ❌ BAD
try {
    model.chat(request);
} catch (Exception e) {
    // Silently ignored!
}

// ✅ GOOD
try {
    model.chat(request);
} catch (Exception e) {
    logger.error("Chat failed", e);
    throw new ServiceException("Chat service unavailable", e);
}

Code Organization

Separate Configuration from Logic

✅ DO:

// Configuration
public class ModelConfig {
    public static GitHubModelsChatModel createChatModel() {
        return GitHubModelsChatModel.builder()
            .gitHubToken(System.getenv("GITHUB_TOKEN"))
            .modelName("gpt-4o")
            .temperature(0.7)
            .maxTokens(1000)
            .build();
    }
}

// Service logic
public class ChatService {
    private final GitHubModelsChatModel model;

    public ChatService() {
        this.model = ModelConfig.createChatModel();
    }

    public String chat(String message) {
        return model.chat(createRequest(message))
            .aiMessage().text();
    }
}

Use Dependency Injection

✅ DO: Inject models as dependencies

@Service
public class ChatService {
    private final GitHubModelsChatModel model;

    @Autowired
    public ChatService(GitHubModelsChatModel model) {
        this.model = model;
    }
}

Centralize Error Handling

✅ DO: Create reusable error handlers

public class ModelErrorHandler {
    public static String handleError(Exception e) {
        if (e instanceof HttpResponseException) {
            HttpResponseException httpError = (HttpResponseException) e;
            int status = httpError.getResponse().getStatusCode();
            if (status == 429) {
                return "Service is busy. Please try again in a moment.";
            }
            if (status == 401 || status == 403) {
                return "Service configuration error.";
            }
        }
        return "Service temporarily unavailable.";
    }
}

Summary Checklist

Security:

  • Tokens stored securely (not hardcoded)
  • Sensitive data not logged
  • Different tokens per environment

Configuration:

  • Model instances reused
  • Appropriate timeouts set
  • Type-safe enums used

Error Handling:

  • All API calls wrapped in try-catch
  • Finish reasons checked
  • Retry logic implemented

Performance:

  • Appropriate model selected
  • Token usage optimized
  • Batching used for embeddings

Production:

  • Health checks implemented
  • Monitoring/metrics in place
  • Graceful degradation configured

See Also

  • Error Handling
  • Model Catalog
  • Configuration Guide
  • Authentication

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-github-models
