tessl/maven-org-springframework-ai--spring-ai-openai

OpenAI models support for Spring AI, providing comprehensive integration for chat completion, embeddings, image generation, audio transcription, text-to-speech, and content moderation capabilities within Spring Boot applications.

docs/reference/moderation-models.md

Moderation Models

Analyze and detect potentially harmful content using OpenAI's content moderation API, identifying categories such as hate speech, violence, sexual content, and self-harm.

Capabilities

OpenAiModerationModel

Content moderation for detecting and classifying potentially harmful or policy-violating content.

/**
 * OpenAI content moderation model implementation
 */
public class OpenAiModerationModel implements ModerationModel {
    /**
     * Analyze content for moderation
     * @param moderationPrompt Prompt containing content to moderate
     * @return ModerationResponse with flagged categories and scores
     */
    public ModerationResponse call(ModerationPrompt moderationPrompt);

    /**
     * Get the default options for this model
     * @return ModerationOptions configuration
     */
    public OpenAiModerationOptions getDefaultOptions();

    /**
     * Set default options for this model
     * @param defaultOptions New default options
     * @return This model instance with updated defaults
     */
    public OpenAiModerationModel withDefaultOptions(OpenAiModerationOptions defaultOptions);
}

Constructors:

// Basic constructor
public OpenAiModerationModel(OpenAiModerationApi openAiModerationApi);

// With retry support
public OpenAiModerationModel(
    OpenAiModerationApi openAiModerationApi,
    RetryTemplate retryTemplate
);

Note: OpenAiModerationModel uses constructor-based initialization only and does not provide a builder pattern. Use the appropriate constructor based on your configuration needs.
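
In a Spring Boot application, the model is typically exposed as a bean. A minimal configuration sketch, assuming manual wiring rather than Spring AI's auto-configuration (the class name is illustrative):

import org.springframework.ai.openai.OpenAiModerationModel;
import org.springframework.ai.openai.api.OpenAiModerationApi;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ModerationConfig {

    // Reads the API key from the environment; adapt to your own configuration strategy
    @Bean
    public OpenAiModerationModel moderationModel() {
        var api = OpenAiModerationApi.builder()
            .apiKey(System.getenv("OPENAI_API_KEY"))
            .build();
        return new OpenAiModerationModel(api);
    }
}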

Usage Example:

import org.springframework.ai.openai.OpenAiModerationModel;
import org.springframework.ai.openai.OpenAiModerationOptions;
import org.springframework.ai.openai.api.OpenAiModerationApi;
import org.springframework.ai.moderation.ModerationPrompt;
import org.springframework.ai.moderation.ModerationResponse;
import org.springframework.retry.support.RetryTemplate;

import java.util.List;

// Create API client
var moderationApi = OpenAiModerationApi.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .build();

// Configure retry template
var retryTemplate = RetryTemplate.builder()
    .maxAttempts(3)
    .exponentialBackoff(1000, 2.0, 10000)
    .build();

// Create moderation model
var moderationModel = new OpenAiModerationModel(
    moderationApi,
    retryTemplate
);

// Moderate single text
var response = moderationModel.call(
    new ModerationPrompt("I want to hurt someone")
);

var result = response.getResult();
if (result.isFlagged()) {
    System.out.println("Content flagged!");
    System.out.println("Violence: " + result.getCategoryScores().get("violence"));
    System.out.println("Hate: " + result.getCategoryScores().get("hate"));
}

// Moderate multiple texts
var multiPrompt = new ModerationPrompt(
    List.of("Hello, how are you?", "This contains harmful content...")
);

var multiResponse = moderationModel.call(multiPrompt);
multiResponse.getResults().forEach(result -> {
    System.out.println("Flagged: " + result.isFlagged());
    System.out.println("Categories: " + result.getCategories());
});

Content Filtering System:

// Build a content filtering system
public boolean isContentSafe(String userInput) {
    var response = moderationModel.call(new ModerationPrompt(userInput));
    var result = response.getResult();

    if (result.isFlagged()) {
        // Log which categories were violated
        result.getCategories().forEach((category, flagged) -> {
            if (flagged) {
                System.out.println("Violated category: " + category);
                double score = result.getCategoryScores().get(category);
                System.out.println("Score: " + score);
            }
        });
        return false;
    }

    return true;
}

// Use in application
String userComment = getUserComment();
if (!isContentSafe(userComment)) {
    return "Your comment violates our content policy.";
}

Category-Specific Checks:

// Check specific categories with thresholds
var response = moderationModel.call(new ModerationPrompt(content));
var result = response.getResult();

// Custom threshold for sexual content
double sexualScore = result.getCategoryScores().get("sexual");
if (sexualScore > 0.5) {
    System.out.println("High sexual content detected");
}

// Check for violence
boolean violenceDetected = result.getCategories().get("violence");
double violenceScore = result.getCategoryScores().get("violence");

if (violenceDetected || violenceScore > 0.3) {
    System.out.println("Violence detected with score: " + violenceScore);
}

// Check for self-harm content
if (result.getCategories().get("self-harm")) {
    // Take immediate action - this is critical content
    alertModerationTeam(content);
}

OpenAiModerationOptions

Configuration options for moderation requests.

/**
 * Configuration options for OpenAI content moderation
 */
public class OpenAiModerationOptions implements ModerationOptions {
    /**
     * Create a new builder for moderation options
     * @return Builder instance
     */
    public static Builder builder();

    /**
     * Get the moderation model identifier
     * @return Model name
     */
    public String getModel();
    public void setModel(String model);
}

Builder Pattern:

public static class Builder {
    public Builder model(String model);
    public OpenAiModerationOptions build();
}

Usage Example:

// Use specific moderation model
var options = OpenAiModerationOptions.builder()
    .model("text-moderation-latest")
    .build();

var model = moderationModel.withDefaultOptions(options);

// Use stable model version
var stableOptions = OpenAiModerationOptions.builder()
    .model("text-moderation-stable")
    .build();

Configuration Parameters Reference

Model (String)

Moderation model identifier:

  • omni-moderation-latest: Latest omni moderation model, continuously updated with improvements. This is the default (OpenAiModerationApi.DEFAULT_MODERATION_MODEL).
  • text-moderation-latest: Latest text moderation model, continuously updated with improvements
  • text-moderation-stable: Stable model version with less frequent updates, for more consistent behavior

Types

Request Types

// High-level moderation prompt (from spring-ai-core)
public class ModerationPrompt {
    public ModerationPrompt(String text);
    public ModerationPrompt(List<String> texts);
    public ModerationPrompt(String text, ModerationOptions options);
    public ModerationPrompt(List<String> texts, ModerationOptions options);

    public List<String> getInstructions();
    public ModerationOptions getOptions();
}

// Low-level moderation request
public record OpenAiModerationRequest(
    Object input,                        // String or List<String>
    String model                         // Moderation model
) {}
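
The two-argument ModerationPrompt constructors allow per-request option overrides without touching the model's defaults. A short sketch using the types above:

// Override options for a single request instead of changing the model's defaults
var prompt = new ModerationPrompt(
    "Some user-provided text",
    OpenAiModerationOptions.builder()
        .model("text-moderation-stable")
        .build()
);
var response = moderationModel.call(prompt);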

Response Types

// High-level moderation response (from spring-ai-core)
public interface ModerationResponse {
    ModerationResult getResult();        // Convenience accessor for the first result
    List<ModerationResult> getResults();
    ModerationResponseMetadata getMetadata();
}

public interface ModerationResult {
    boolean isFlagged();
    Map<String, Boolean> getCategories();
    Map<String, Double> getCategoryScores();
}

// Low-level moderation response
public record OpenAiModerationResponse(
    String id,                           // Response ID
    String model,                        // Model used
    List<OpenAiModerationResult> results // Moderation results
) {}

public record OpenAiModerationResult(
    Boolean flagged,                     // Overall flagged status
    Categories categories,               // Category flags
    CategoryScores categoryScores        // Category scores
) {}

Category Types

// Category flags (boolean indicators)
public record Categories(
    Boolean sexual,                      // Sexual content
    Boolean hate,                        // Hate speech
    Boolean harassment,                  // Harassment
    Boolean selfHarm,                    // Self-harm content
    Boolean sexualMinors,                // Sexual content involving minors
    Boolean hateThreatening,             // Threatening hate speech
    Boolean violenceGraphic,             // Graphic violence
    Boolean selfHarmIntent,              // Intent to self-harm
    Boolean selfHarmInstructions,        // Self-harm instructions
    Boolean harassmentThreatening,       // Threatening harassment
    Boolean violence                     // Violence
) {}

// Category scores (0.0 to 1.0, confidence levels)
public record CategoryScores(
    Double sexual,
    Double hate,
    Double harassment,
    Double selfHarm,
    Double sexualMinors,
    Double hateThreatening,
    Double violenceGraphic,
    Double selfHarmIntent,
    Double selfHarmInstructions,
    Double harassmentThreatening,
    Double violence
) {}

Metadata Types

public class OpenAiModerationGenerationMetadata implements ModerationGenerationMetadata {
    // Currently empty implementation
    // Metadata specific to individual moderation results
}

public interface ModerationResponseMetadata {
    String getId();
    String getModel();
}
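
A small sketch reading the response-level metadata declared above:

// Read response-level metadata: the request ID and the model that served it
var response = moderationModel.call(new ModerationPrompt(content));
var metadata = response.getMetadata();
System.out.println("Request ID: " + metadata.getId());
System.out.println("Model: " + metadata.getModel());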

Moderation Categories

Sexual Content

Content that is sexual in nature or explicitly describes sexual acts.

  • Category: sexual
  • Examples: Explicit sexual descriptions, adult content

Sexual Content (Minors)

Sexual content involving individuals under 18 years old.

  • Category: sexual/minors
  • Severity: Critical - requires immediate action
  • Examples: Any sexual content involving children

Hate Speech

Content expressing, inciting, or promoting hate based on protected characteristics.

  • Category: hate
  • Examples: Racist, sexist, or discriminatory content
  • Subcategory: hate/threatening - Includes threats of violence

Harassment

Content intended to harass, bully, or intimidate an individual.

  • Category: harassment
  • Examples: Bullying, targeted insults
  • Subcategory: harassment/threatening - Includes threats

Self-Harm

Content related to self-harm, suicide, or eating disorders.

  • Category: self-harm
  • Subcategories:
    • self-harm/intent - Expresses intent to self-harm
    • self-harm/instructions - Provides self-harm instructions
  • Severity: Critical - may require intervention

Violence

Content depicting or glorifying violence.

  • Category: violence
  • Examples: Violent acts, death, weapons
  • Subcategory: violence/graphic - Extremely graphic violence
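
The category keys above can be used directly when ranking flagged results. A minimal sketch that maps each category to a severity level; the levels and their assignments are an illustrative assumption, not part of the API:

import java.util.Comparator;
import java.util.Map;

// Illustrative severity ranking; adjust the assignments to your own policy
public class CategorySeverity {
    public enum Severity { CRITICAL, HIGH, MEDIUM }

    public static final Map<String, Severity> BY_CATEGORY = Map.of(
        "sexual/minors", Severity.CRITICAL,
        "self-harm/intent", Severity.CRITICAL,
        "self-harm/instructions", Severity.CRITICAL,
        "hate/threatening", Severity.HIGH,
        "harassment/threatening", Severity.HIGH,
        "violence/graphic", Severity.HIGH,
        "self-harm", Severity.HIGH,
        "hate", Severity.MEDIUM,
        "harassment", Severity.MEDIUM,
        "violence", Severity.MEDIUM
    );

    // Most severe flagged category of a moderation result, or null if none flagged
    public static Severity highest(Map<String, Boolean> categories) {
        return categories.entrySet().stream()
            .filter(Map.Entry::getValue)
            .map(e -> BY_CATEGORY.getOrDefault(e.getKey(), Severity.MEDIUM))
            .min(Comparator.naturalOrder())
            .orElse(null);
    }
}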

Common Use Cases

User-Generated Content Filtering

// Filter comments before posting
public class CommentModerator {
    private final OpenAiModerationModel moderationModel;

    public CommentModerator(OpenAiModerationModel moderationModel) {
        this.moderationModel = moderationModel;
    }

    public CommentModerationResult moderateComment(String comment) {
        var response = moderationModel.call(new ModerationPrompt(comment));
        var result = response.getResult();

        if (result.isFlagged()) {
            // Determine reason for flagging
            List<String> violatedCategories = result.getCategories().entrySet()
                .stream()
                .filter(Map.Entry::getValue)
                .map(Map.Entry::getKey)
                .toList();

            return CommentModerationResult.rejected(violatedCategories);
        }

        return CommentModerationResult.approved();
    }
}

// Usage
var moderator = new CommentModerator(moderationModel);
var moderationResult = moderator.moderateComment(userComment);

if (moderationResult.isRejected()) {
    return "Your comment violates our policies: " +
           String.join(", ", moderationResult.getViolatedCategories());
}

saveComment(userComment);

Chat Application Safety

// Real-time chat moderation
public class ChatModerator {
    private final OpenAiModerationModel moderationModel;
    private static final double THRESHOLD = 0.7;

    public ChatMessageDecision moderateMessage(ChatMessage message) {
        var response = moderationModel.call(
            new ModerationPrompt(message.getContent())
        );

        var result = response.getResult();

        // Check critical categories first
        if (isCriticalContent(result)) {
            // Block immediately and alert moderators
            alertModerators(message, result);
            return ChatMessageDecision.block("Critical policy violation");
        }

        // Check if any category exceeds threshold
        boolean hasViolation = result.getCategoryScores().values()
            .stream()
            .anyMatch(score -> score > THRESHOLD);

        if (hasViolation) {
            return ChatMessageDecision.review(
                "Flagged for manual review"
            );
        }

        return ChatMessageDecision.allow();
    }

    private boolean isCriticalContent(ModerationResult result) {
        return result.getCategories().get("sexual/minors") ||
               result.getCategories().get("self-harm/intent") ||
               result.getCategories().get("hate/threatening");
    }
}

Content Moderation Pipeline

// Multi-stage moderation pipeline
public class ModerationPipeline {
    private final OpenAiModerationModel moderationModel;

    public ContentDecision moderate(String content) {
        // Stage 1: Quick moderation check
        var response = moderationModel.call(new ModerationPrompt(content));
        var result = response.getResult();

        if (!result.isFlagged()) {
            return ContentDecision.approve();
        }

        // Stage 2: Analyze specific categories
        var categories = result.getCategories();
        var scores = result.getCategoryScores();

        // Critical content - immediate rejection
        if (categories.get("sexual/minors") ||
            categories.get("self-harm/intent")) {
            return ContentDecision.reject("Critical violation");
        }

        // High-confidence violations - reject
        long highConfidenceViolations = scores.values()
            .stream()
            .filter(score -> score > 0.9)
            .count();

        if (highConfidenceViolations > 0) {
            return ContentDecision.reject("Policy violation");
        }

        // Medium-confidence - queue for human review
        long mediumConfidenceViolations = scores.values()
            .stream()
            .filter(score -> score > 0.5)
            .count();

        if (mediumConfidenceViolations > 0) {
            return ContentDecision.queueForReview();
        }

        // Low confidence flags - allow with monitoring
        return ContentDecision.approveWithMonitoring();
    }
}

Batch Content Screening

// Screen multiple content items efficiently
public class BatchModerator {
    private final OpenAiModerationModel moderationModel;

    public BatchModerator(OpenAiModerationModel moderationModel) {
        this.moderationModel = moderationModel;
    }

    public Map<String, ModerationResult> moderateBatch(List<String> contents) {
        // Moderate up to 100 texts at once
        var response = moderationModel.call(new ModerationPrompt(contents));

        // Note: duplicate input strings collapse to a single map entry
        Map<String, ModerationResult> results = new HashMap<>();
        for (int i = 0; i < contents.size(); i++) {
            results.put(contents.get(i), response.getResults().get(i));
        }

        return results;
    }

    public List<String> filterSafeContent(List<String> contents) {
        var results = moderateBatch(contents);

        return contents.stream()
            .filter(content -> !results.get(content).isFlagged())
            .toList();
    }
}

// Usage
var batchModerator = new BatchModerator(moderationModel);
List<String> userPosts = getUserPosts();
List<String> safePosts = batchModerator.filterSafeContent(userPosts);

API Input Validation

// Validate API inputs before processing
public class ApiInputValidator {
    private final OpenAiModerationModel moderationModel;

    public void validateInput(String userInput) throws InvalidInputException {
        var response = moderationModel.call(new ModerationPrompt(userInput));
        var result = response.getResult();

        if (result.isFlagged()) {
            // Log the violation
            logViolation(userInput, result);

            // Get specific violation details
            String violationType = result.getCategories().entrySet()
                .stream()
                .filter(Map.Entry::getValue)
                .map(Map.Entry::getKey)
                .findFirst()
                .orElse("policy_violation");

            throw new InvalidInputException(
                "Input violates content policy: " + violationType
            );
        }
    }
}

// Usage in an API endpoint; 'validator' is an injected ApiInputValidator
@PostMapping("/api/process")
public Response processRequest(@RequestBody Request request) {
    try {
        validator.validateInput(request.getText());
        return processValidRequest(request);
    } catch (InvalidInputException e) {
        return Response.error(400, e.getMessage());
    }
}

Automated Reporting System

// Generate moderation reports
public class ModerationReporter {
    private final OpenAiModerationModel moderationModel;

    public ModerationReport analyzeContent(List<String> contents) {
        var report = new ModerationReport();

        for (String content : contents) {
            var response = moderationModel.call(new ModerationPrompt(content));
            var result = response.getResult();

            if (result.isFlagged()) {
                // Violation is assumed to be a simple record of (content, categories, categoryScores)
                report.addViolation(new Violation(
                    content,
                    result.getCategories(),
                    result.getCategoryScores()
                ));
            } else {
                report.incrementSafeCount();
            }
        }

        return report;
    }
}

// Generate summary report
public class ModerationReport {
    private int safeCount = 0;
    private final List<Violation> violations = new ArrayList<>();

    public void addViolation(Violation violation) {
        violations.add(violation);
    }

    public void incrementSafeCount() {
        safeCount++;
    }

    public String generateSummary() {
        int totalViolations = violations.size();
        int totalChecked = safeCount + totalViolations;

        var categoryBreakdown = violations.stream()
            .flatMap(v -> v.categories().entrySet().stream())
            .filter(Map.Entry::getValue)
            .collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.counting()
            ));

        return String.format("""
            Moderation Summary:
            Total Items Checked: %d
            Safe: %d (%.1f%%)
            Violations: %d (%.1f%%)

            Category Breakdown:
            %s
            """,
            totalChecked,
            safeCount,
            (safeCount * 100.0 / totalChecked),
            totalViolations,
            (totalViolations * 100.0 / totalChecked),
            formatCategoryBreakdown(categoryBreakdown)
        );
    }

    // Renders one "category: count" line per violated category for the summary
    private String formatCategoryBreakdown(Map<String, Long> breakdown) {
        return breakdown.entrySet().stream()
            .map(e -> "  " + e.getKey() + ": " + e.getValue())
            .collect(Collectors.joining("\n"));
    }
}

Proactive Content Monitoring

// Monitor content over time
public class ContentMonitor {
    private final OpenAiModerationModel moderationModel;
    private final Map<String, List<ModerationResult>> userHistory = new HashMap<>();

    public MonitoringDecision checkContent(String userId, String content) {
        var response = moderationModel.call(new ModerationPrompt(content));
        var result = response.getResult();

        // Store in user history
        userHistory.computeIfAbsent(userId, k -> new ArrayList<>())
                   .add(result);

        // Current content check
        if (result.isFlagged()) {
            return MonitoringDecision.reject();
        }

        // Check the user's accumulated violation history
        var userViolations = userHistory.get(userId);
        long pastViolations = userViolations.stream()
            .filter(ModerationResult::isFlagged)
            .count();

        // Pattern detection: repeated violations over time trigger a warning
        if (pastViolations >= 3) {
            return MonitoringDecision.warn(
                "Multiple policy violations detected"
            );
        }

        return MonitoringDecision.allow();
    }
}

Best Practices

Threshold Configuration

  • The API's default thresholds are conservative
  • Consider custom thresholds based on your use case, as sketched below
  • More permissive for creative content platforms
  • Stricter for children's content or professional platforms
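
A minimal sketch of use-case-specific thresholds; the profile names and values are hypothetical and should be tuned against your own data:

import java.util.Map;

// Hypothetical per-platform threshold profiles
public class ThresholdPolicy {
    public record Profile(double defaultThreshold, Map<String, Double> overrides) {}

    public static final Profile CREATIVE_PLATFORM = new Profile(0.85, Map.of("violence", 0.95));
    public static final Profile CHILDRENS_PLATFORM = new Profile(0.30, Map.of("sexual", 0.10));

    public static boolean exceeds(Profile profile, String category, double score) {
        return score > profile.overrides().getOrDefault(category, profile.defaultThreshold());
    }
}

// Usage: flag any category whose score exceeds the platform's threshold
result.getCategoryScores().forEach((category, score) -> {
    if (ThresholdPolicy.exceeds(ThresholdPolicy.CHILDRENS_PLATFORM, category, score)) {
        System.out.println("Over threshold: " + category + " (" + score + ")");
    }
});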

Performance Optimization

  • Batch multiple texts in single API call (up to 100 items)
  • Cache moderation results for identical content (see the sketch below)
  • Implement rate limiting to manage API costs
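
A minimal caching wrapper for identical inputs; the class and its API are illustrative, and a bounded cache (for example Caffeine) would be preferable in production:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.ai.moderation.ModerationPrompt;
import org.springframework.ai.moderation.ModerationResult;
import org.springframework.ai.openai.OpenAiModerationModel;

// Caches results by exact input text so repeated content skips the API call
public class CachingModerator {
    private final OpenAiModerationModel moderationModel;
    private final Map<String, ModerationResult> cache = new ConcurrentHashMap<>();

    public CachingModerator(OpenAiModerationModel moderationModel) {
        this.moderationModel = moderationModel;
    }

    public ModerationResult moderate(String content) {
        return cache.computeIfAbsent(content, text ->
            moderationModel.call(new ModerationPrompt(text)).getResult());
    }
}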

User Experience

  • Provide clear feedback when content is rejected
  • Explain which policy was violated (without revealing exact scores)
  • Allow users to edit and resubmit content
  • Consider "shadow flagging" for borderline content

Critical Content Handling

  • Immediate action for: sexual/minors, self-harm/intent
  • Alert human moderators for critical violations
  • Consider local authority reporting for illegal content
  • Implement escalation procedures

Privacy and Compliance

  • Do not store flagged content longer than necessary
  • Anonymize moderation logs (one hashing approach is sketched below)
  • Comply with GDPR and local privacy regulations
  • Provide transparency about moderation policies
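
One way to anonymize moderation logs is to record a hash of the content rather than the raw text. A minimal sketch using the JDK's MessageDigest; the class name is illustrative:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Store a fingerprint of flagged content instead of the raw text
public final class ModerationLogUtil {
    public static String contentFingerprint(String content) {
        try {
            var digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);  // stable ID for correlating repeats
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);  // guaranteed on all JVMs
        }
    }
}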

False Positives

  • Moderation isn't perfect - expect some false positives
  • Implement appeal process for rejected content
  • Manual review queue for borderline cases
  • Regular audit of moderation decisions

Multi-Language Support

  • Moderation works across multiple languages
  • Performance may vary by language
  • Consider language-specific thresholds
  • Test thoroughly for your target languages

Install with Tessl CLI

npx tessl i tessl/maven-org-springframework-ai--spring-ai-openai
