tessl/maven-com-embabel-agent--embabel-agent-rag-core

RAG (Retrieval-Augmented Generation) framework for the Embabel Agent platform providing content ingestion, chunking, hierarchical navigation, and semantic search capabilities

Data Models API Reference

Core data structures for representing content, entities, relationships, and retrievable items in the RAG framework.

Base Interfaces

Foundation interfaces for all data objects in the system.

Datum

Base interface for all data objects with metadata support.

sealed interface Datum {
    val id: String
    val uri: String?
    val metadata: Map<String, Any?>

    fun propertiesToPersist(): Map<String, Any?>
    fun labels(): Set<String>
}

Properties:

  • id: Unique identifier for the datum
  • uri: Optional URI reference for the datum
  • metadata: Key-value metadata associated with the datum

Methods:

  • propertiesToPersist(): Returns properties to be persisted to storage
  • labels(): Returns set of type labels for the datum

Embedded

Interface for objects that have embeddings.

interface Embedded {
    val embedding: Embedding?
}

Properties:

  • embedding: Optional vector embedding for semantic search

Embeddable

Interface for objects that can be embedded.

interface Embeddable {
    fun embeddableValue(): String
}

Methods:

  • embeddableValue(): Returns text representation to be embedded

Retrievable

Base interface for RAG-retrievable objects with stable IDs.

interface Retrievable : HasInfoString, Datum, Embeddable

Combines:

  • HasInfoString: Can produce human-readable info strings
  • Datum: Has ID, URI, and metadata
  • Embeddable: Can be converted to embedding

ContentElement

Base interface for all content elements.

interface ContentElement : Datum

Represents any element in a content hierarchy.

HierarchicalContentElement

Content elements that exist within a hierarchy.

interface HierarchicalContentElement : ContentElement {
    val parentId: String?
}

Properties:

  • parentId: ID of the parent element (null for root elements)
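Because each element stores only its parent's ID, a path from the root can be rebuilt by following parentId links upward. The sketch below is illustrative only: Element and pathFromRoot are hypothetical stand-ins, not library types, and the lookup map is assumed to be built by the caller.

```kotlin
// Hypothetical stand-in for a HierarchicalContentElement: just an id and a parent link.
data class Element(val id: String, val parentId: String?)

// Walk parentId links upward from `id`, returning ids ordered root-first.
// Unknown ids yield an empty path; the `seen` set guards against cyclic links.
fun pathFromRoot(id: String, byId: Map<String, Element>): List<String> {
    val path = ArrayDeque<String>()
    val seen = mutableSetOf<String>()
    var current: Element? = byId[id]
    while (current != null && seen.add(current.id)) {
        path.addFirst(current.id)
        current = current.parentId?.let { byId[it] }
    }
    return path.toList()
}

fun main() {
    val byId = listOf(
        Element("doc-1", null),
        Element("container-1", "doc-1"),
        Element("section-2", "container-1"),
    ).associateBy { it.id }
    println(pathFromRoot("section-2", byId)) // [doc-1, container-1, section-2]
}
```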

Document Hierarchy

Structures for representing hierarchical documents with sections and content.

ContentRoot

Root of a structured document.

interface ContentRoot : HierarchicalContentElement {
    override val uri: String // Required, non-null
    val title: String
    val ingestionTimestamp: Instant
}

Properties:

  • uri: Required URI for the document (non-null)
  • title: Document title
  • ingestionTimestamp: When document was ingested

Section

Base interface for all sections in the hierarchy.

sealed interface Section : HierarchicalContentElement {
    val title: String
}

Properties:

  • title: Section title

NavigableSection

Section with direct children navigation.

interface NavigableSection : Section {
    val children: Iterable<NavigableSection>
}

Properties:

  • children: Direct child sections

ContainerSection

Section that contains child sections.

interface ContainerSection : Section

NavigableContainerSection

Container section with navigation methods for traversing the hierarchy.

interface NavigableContainerSection : ContainerSection, NavigableSection {
    /**
     * Get all descendant sections (recursive)
     */
    fun descendants(): Iterable<NavigableSection>

    /**
     * Get all descendant leaf sections
     */
    fun leaves(): Iterable<LeafSection>
}

Methods:

  • descendants(): Returns all descendant sections recursively
  • leaves(): Returns all leaf sections (terminal nodes)
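The two traversal methods can be pictured as a depth-first walk over children. The sketch below uses simplified stand-in types (Node, Leaf, and Container are hypothetical, not the library's classes) and assumes pre-order traversal; the ordering the library actually uses is not stated here.

```kotlin
// Simplified stand-ins for NavigableSection, LeafSection and NavigableContainerSection.
sealed interface Node {
    val id: String
    val children: List<Node>
}

data class Leaf(override val id: String, val text: String) : Node {
    override val children: List<Node> = emptyList()
}

data class Container(override val id: String, override val children: List<Node>) : Node

// Every section below this node, depth-first, each parent before its own children.
fun Node.descendants(): List<Node> =
    children.flatMap { listOf(it) + it.descendants() }

// Only the terminal sections among the descendants.
fun Node.leaves(): List<Leaf> = descendants().filterIsInstance<Leaf>()

fun main() {
    val setup = Container("setup", listOf(Leaf("install", "Install steps"), Leaf("config", "Config steps")))
    val root = Container("doc", listOf(Leaf("intro", "Welcome"), setup))
    println(root.descendants().map { it.id }) // [intro, setup, install, config]
    println(root.leaves().map { it.id })      // [intro, install, config]
}
```

Note that descendants() includes container sections as well as leaves, which is why the two counts can differ for the same document.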

LeafSection

Terminal section containing content without further subdivisions.

data class LeafSection(
    val text: String,
    override val title: String,
    override val parentId: String?,
    override val id: String,
    override val uri: String?,
    override val metadata: Map<String, Any?> = emptyMap()
) : NavigableSection, Retrievable, HasContent

Properties:

  • text: Content text
  • title: Section title
  • parentId: Parent section ID
  • id: Unique identifier
  • uri: Optional URI
  • metadata: Metadata map

DefaultMaterializedContainerSection

In-memory representation of a container section.

data class DefaultMaterializedContainerSection(
    override val title: String,
    override val children: List<NavigableSection>,
    override val id: String,
    override val parentId: String?,
    override val uri: String?,
    override val metadata: Map<String, Any?> = emptyMap()
) : NavigableContainerSection

Properties:

  • title: Section title
  • children: List of child sections
  • id: Unique identifier
  • parentId: Parent section ID
  • uri: Optional URI
  • metadata: Metadata map

NavigableDocument

Root of a navigable document: a ContentRoot that also supports hierarchy traversal.

interface NavigableDocument : ContentRoot, NavigableContainerSection {
    /**
     * Create a copy with additional metadata
     */
    fun withMetadata(additionalMetadata: Map<String, Any?>): NavigableDocument
}

Methods:

  • withMetadata(): Returns new document with merged metadata
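A minimal sketch of the copy-and-merge behaviour described above. Doc is a hypothetical stand-in, and the collision rule (the additional value wins) is an assumption borrowed from Kotlin's map `+` operator, not something this reference confirms for the library.

```kotlin
// Hypothetical stand-in for a document type; only the metadata handling is modelled.
data class Doc(
    val id: String,
    val title: String,
    val metadata: Map<String, Any?> = emptyMap(),
) {
    // Returns a copy; the receiver is left untouched.
    // Assumption: on a key collision the additional value replaces the existing one.
    fun withMetadata(additionalMetadata: Map<String, Any?>): Doc =
        copy(metadata = metadata + additionalMetadata)
}

fun main() {
    val original = Doc("doc-1", "User Guide", mapOf("version" to "1.0"))
    val enriched = original.withMetadata(mapOf("version" to "1.1", "author" to "Alice"))
    println(original.metadata) // {version=1.0}
    println(enriched.metadata) // {version=1.1, author=Alice}
}
```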

MaterializedDocument

In-memory representation of a complete document.

data class MaterializedDocument(
    override val title: String,
    override val ingestionTimestamp: Instant,
    override val children: List<NavigableSection>,
    override val id: String,
    override val uri: String,
    override val metadata: Map<String, Any?> = emptyMap()
) : NavigableDocument

Properties:

  • title: Document title
  • ingestionTimestamp: Ingestion timestamp
  • children: Top-level sections
  • id: Unique identifier
  • uri: Document URI (required)
  • metadata: Metadata map

Chunk Model

Traditional RAG text chunks with metadata for indexing and retrieval.

Chunk

Text chunk interface with support for transformation and metadata enrichment.

interface Chunk : Source, HierarchicalContentElement {
    val text: String // Indexed text
    val urtext: String // Raw text for citation
    override val parentId: String // Non-null parent
    val pathFromRoot: List<String>?
    override val uri: String?

    /**
     * Create a new chunk with transformed text
     */
    fun withText(transformed: String): Chunk

    /**
     * Create a new chunk with additional metadata
     */
    fun withAdditionalMetadata(metadata: Map<String, Any?>): Chunk

    companion object {
        /**
         * Create a chunk
         */
        operator fun invoke(
            id: String,
            text: String,
            metadata: Map<String, Any?>,
            parentId: String
        ): Chunk

        @JvmStatic
        fun create(
            text: String,
            parentId: String,
            metadata: Map<String, Any?> = emptyMap(),
            id: String = UUID.randomUUID().toString(),
            urtext: String = text
        ): Chunk
    }
}

Properties:

  • text: Indexed text (may be transformed)
  • urtext: Original raw text for citation
  • parentId: Parent section ID (required)
  • pathFromRoot: Path from document root to chunk
  • uri: Optional URI

Methods:

  • withText(): Returns new chunk with modified text
  • withAdditionalMetadata(): Returns new chunk with merged metadata

Factory Methods:

  • invoke(): Create chunk with specified properties
  • create(): Create chunk with defaults (generates UUID)
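The split between text and urtext lets the indexed text be rewritten while the original stays available for citation. The sketch below models that with a hypothetical SimpleChunk class; that withText() leaves id and urtext unchanged is an assumption drawn from the property descriptions above, not verified against the library.

```kotlin
import java.util.UUID

// Hypothetical stand-in for a Chunk implementation.
data class SimpleChunk(
    val id: String,
    val text: String,
    val urtext: String,
    val parentId: String,
    val metadata: Map<String, Any?>,
) {
    // Replace the indexed text but keep the raw text (and id) for citation.
    fun withText(transformed: String): SimpleChunk = copy(text = transformed)

    // Merge extra metadata into a new chunk; the receiver is unchanged.
    fun withAdditionalMetadata(extra: Map<String, Any?>): SimpleChunk =
        copy(metadata = metadata + extra)

    companion object {
        // Mirrors the defaults listed for Chunk.create: random UUID id, urtext = text.
        fun create(
            text: String,
            parentId: String,
            metadata: Map<String, Any?> = emptyMap(),
            id: String = UUID.randomUUID().toString(),
            urtext: String = text,
        ): SimpleChunk = SimpleChunk(id, text, urtext, parentId, metadata)
    }
}

fun main() {
    val chunk = SimpleChunk.create(text = "Kotlin is concise.", parentId = "section-1")
    val indexed = chunk.withText("Section: Introduction\n${chunk.text}")
    println(indexed.urtext)         // Kotlin is concise.
    println(indexed.id == chunk.id) // true
}
```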

Standard Chunk Metadata Keys

Constants for the standard chunk metadata keys, defined on the ContentChunker companion object:

companion object {
    const val CHUNK_INDEX = "chunk_index"
    const val TOTAL_CHUNKS = "total_chunks"
    const val SEQUENCE_NUMBER = "sequence_number"
    const val ROOT_DOCUMENT_ID = "root_document_id"
    const val CONTAINER_SECTION_ID = "container_section_id"
    const val CONTAINER_SECTION_TITLE = "container_section_title"
    const val CONTAINER_SECTION_URL = "container_section_url"
    const val LEAF_SECTION_ID = "leaf_section_id"
    const val LEAF_SECTION_TITLE = "leaf_section_title"
    const val LEAF_SECTION_URL = "leaf_section_url"
}

Metadata Keys:

  • CHUNK_INDEX: Index within parent section (0-based)
  • TOTAL_CHUNKS: Total chunks from parent section
  • SEQUENCE_NUMBER: Global sequence number across document
  • ROOT_DOCUMENT_ID: ID of document root
  • CONTAINER_SECTION_ID: ID of container section
  • CONTAINER_SECTION_TITLE: Title of container section
  • CONTAINER_SECTION_URL: URL of container section
  • LEAF_SECTION_ID: ID of leaf section
  • LEAF_SECTION_TITLE: Title of leaf section
  • LEAF_SECTION_URL: URL of leaf section
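One use of these keys is to regroup retrieved chunks by leaf section and restore their order. The sketch below copies two key strings from the constants above; ChunkRecord and reassemble are hypothetical helpers. Reading the index as a Number is a defensive choice, on the assumption that some stores return integers as Long.

```kotlin
// Key strings copied from the constants above; ChunkRecord is a hypothetical stand-in.
const val CHUNK_INDEX = "chunk_index"
const val LEAF_SECTION_ID = "leaf_section_id"

data class ChunkRecord(val text: String, val metadata: Map<String, Any?>)

// Group chunks by leaf section and restore the original order within each section.
// Chunks missing either key are skipped rather than guessed at.
fun reassemble(chunks: List<ChunkRecord>): Map<String, String> =
    chunks
        .mapNotNull { c ->
            val section = c.metadata[LEAF_SECTION_ID] as? String ?: return@mapNotNull null
            val index = (c.metadata[CHUNK_INDEX] as? Number)?.toInt() ?: return@mapNotNull null
            Triple(section, index, c.text)
        }
        .groupBy({ it.first }, { it.second to it.third })
        .mapValues { (_, parts) -> parts.sortedBy { it.first }.joinToString(" ") { it.second } }

fun main() {
    val chunks = listOf(
        ChunkRecord("world", mapOf(LEAF_SECTION_ID to "s1", CHUNK_INDEX to 1)),
        ChunkRecord("hello", mapOf(LEAF_SECTION_ID to "s1", CHUNK_INDEX to 0)),
        ChunkRecord("orphan", emptyMap()),
    )
    println(reassemble(chunks)) // {s1=hello world}
}
```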

Source Model

Input data for RAG systems (chunks or facts).

Source

Base interface for RAG input data.

sealed interface Source : Retrievable

Fact

Factual assertion with authority.

data class Fact(
    val assertion: String,
    val authority: String,
    override val uri: String?,
    override val metadata: Map<String, Any?>,
    override val id: String
) : Source

Properties:

  • assertion: The factual statement
  • authority: Source of authority for the fact
  • uri: Optional URI reference
  • metadata: Associated metadata
  • id: Unique identifier

Named Entity Model

Structured entities with properties and relationships.

NamedEntity

Base contract for named entities.

interface NamedEntity : Retrievable, NamedAndDescribed {
    override val id: String
    override val name: String
    override val description: String
    override val uri: String? get() = null
    override val metadata: Map<String, Any?> get() = emptyMap()

    override fun labels(): Set<String>
    override fun embeddableValue(): String
    fun infoString(verbose: Boolean? = null, indent: Int = 0): String
}

Properties:

  • id: Unique identifier
  • name: Entity name
  • description: Entity description
  • uri: Optional URI (default null)
  • metadata: Metadata map (default empty)

Methods:

  • labels(): Returns type labels for entity
  • embeddableValue(): Returns text for embedding
  • infoString(): Returns formatted info string

NamedEntityData

Storage format for named entities with arbitrary properties.

interface NamedEntityData : NamedEntity {
    val properties: Map<String, Any>
    val linkedDomainType: DomainType?

    /**
     * Convert to typed instance using ObjectMapper
     */
    fun <T : NamedEntity> toTypedInstance(objectMapper: ObjectMapper): T?

    fun <T : NamedEntity> toTypedInstance(
        objectMapper: ObjectMapper,
        type: Class<T>
    ): T?

    fun <T : NamedEntity> toTypedInstance(
        objectMapper: ObjectMapper,
        type: Class<T>,
        navigator: RelationshipNavigator?
    ): T?

    /**
     * Create dynamic proxy instance implementing specified interfaces
     */
    fun <T : NamedEntity> toInstance(
        vararg interfaces: Class<out NamedEntity>
    ): T

    fun <T : NamedEntity> toInstance(
        navigator: RelationshipNavigator?,
        vararg interfaces: Class<out NamedEntity>
    ): T

    companion object {
        val DEFAULT_EXCLUDED_PROPERTIES = setOf("embedding", "id")
        const val ENTITY_LABEL = "__Entity__"
        const val HAS_ENTITY = "HAS_ENTITY"
    }
}

Properties:

  • properties: Map of arbitrary properties
  • linkedDomainType: Optional domain type reference

Methods:

  • toTypedInstance(): Convert to typed entity class using ObjectMapper
  • toInstance(): Create dynamic proxy implementing specified interfaces

Constants:

  • DEFAULT_EXCLUDED_PROPERTIES: Properties not persisted by default
  • ENTITY_LABEL: Standard label for entities
  • HAS_ENTITY: Standard relationship name
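To illustrate the idea behind toInstance(), the sketch below builds a JDK dynamic proxy whose getters read from a property map. It is one possible approach under stated assumptions, not the library's implementation: Staff and proxyOver are hypothetical names, and relationship navigation is left out entirely.

```kotlin
import java.lang.reflect.InvocationHandler
import java.lang.reflect.Proxy

// Hypothetical view interface; Kotlin compiles `val role` to a `getRole()` method.
interface Staff {
    val name: String
    val role: String
}

// Build a JDK dynamic proxy for `iface` whose getters look up values in `properties`.
fun <T : Any> proxyOver(properties: Map<String, Any?>, iface: Class<T>): T {
    val handler = InvocationHandler { proxy, method, args ->
        when (method.name) {
            // java.lang.Object methods are also routed through the handler.
            "toString" -> "Proxy(${iface.simpleName}, $properties)"
            "hashCode" -> properties.hashCode()
            "equals" -> proxy === args?.get(0)
            else -> {
                // getRole -> role
                val key = method.name.removePrefix("get").replaceFirstChar { it.lowercaseChar() }
                properties[key] ?: error("No property '$key' for ${method.name}()")
            }
        }
    }
    return iface.cast(Proxy.newProxyInstance(iface.classLoader, arrayOf(iface), handler))
}

fun main() {
    val staff = proxyOver(mapOf("name" to "Alice Smith", "role" to "engineer"), Staff::class.java)
    println("${staff.name} (${staff.role})") // Alice Smith (engineer)
}
```

The appeal of a proxy here is that callers can declare whatever view interface suits them without writing a class per entity type.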

SimpleNamedEntityData

Simple implementation of NamedEntityData.

data class SimpleNamedEntityData(
    override val id: String,
    override val name: String,
    override val description: String,
    override val properties: Map<String, Any> = emptyMap(),
    override val linkedDomainType: DomainType? = null,
    override val uri: String? = null,
    override val metadata: Map<String, Any?> = emptyMap()
) : NamedEntityData

Properties:

  • id: Unique identifier
  • name: Entity name
  • description: Entity description
  • properties: Property map (default empty)
  • linkedDomainType: Optional domain type
  • uri: Optional URI
  • metadata: Metadata map (default empty)

Relationship Model

Annotations and interfaces for entity relationships.

Relationship Annotation

Marks a getter method as a relationship accessor: calling it navigates the named relationship.

@Target(AnnotationTarget.FUNCTION)
@Retention(AnnotationRetention.RUNTIME)
annotation class Relationship(
    val name: String = "",
    val direction: RelationshipDirection = RelationshipDirection.OUTGOING
)

Parameters:

  • name: Relationship name (derived from method name if empty)
  • direction: Direction of relationship traversal

RelationshipDirection

Direction of relationships.

enum class RelationshipDirection {
    OUTGOING,
    INCOMING,
    BOTH
}

Values:

  • OUTGOING: Follow relationships from source to target
  • INCOMING: Follow relationships from target to source
  • BOTH: Follow relationships in both directions

RelationshipNavigator

Provides relationship navigation capabilities.

interface RelationshipNavigator {
    fun findRelated(
        source: RetrievableIdentifier,
        relationshipName: String,
        direction: RelationshipDirection
    ): List<NamedEntityData>
}

Methods:

  • findRelated(): Find entities related to source by relationship

Parameters:

  • source: Source entity identifier
  • relationshipName: Name of relationship to follow
  • direction: Direction to traverse

Returns: List of related entities
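The effect of the direction parameter is easiest to see against a small edge list. The sketch below is a hypothetical in-memory navigator: Direction, Edge, and InMemoryNavigator are stand-ins that work on plain string IDs rather than RetrievableIdentifier and NamedEntityData.

```kotlin
enum class Direction { OUTGOING, INCOMING, BOTH }

// A directed, named edge between two entity ids (hypothetical storage shape).
data class Edge(val from: String, val name: String, val to: String)

class InMemoryNavigator(private val edges: List<Edge>) {
    // Returns ids of entities connected to `sourceId` by `relationshipName`.
    fun findRelated(sourceId: String, relationshipName: String, direction: Direction): List<String> {
        val matching = edges.filter { it.name == relationshipName }
        val outgoing = matching.filter { it.from == sourceId }.map { it.to }
        val incoming = matching.filter { it.to == sourceId }.map { it.from }
        return when (direction) {
            Direction.OUTGOING -> outgoing
            Direction.INCOMING -> incoming
            Direction.BOTH -> (outgoing + incoming).distinct()
        }
    }
}

fun main() {
    val nav = InMemoryNavigator(
        listOf(
            Edge("project-1", "DEPENDS_ON", "lib-a"),
            Edge("app-9", "DEPENDS_ON", "project-1"),
        )
    )
    println(nav.findRelated("project-1", "DEPENDS_ON", Direction.OUTGOING)) // [lib-a]
    println(nav.findRelated("project-1", "DEPENDS_ON", Direction.INCOMING)) // [app-9]
    println(nav.findRelated("project-1", "DEPENDS_ON", Direction.BOTH))     // [lib-a, app-9]
}
```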

Relationship Utility Functions

Top-level functions for working with relationships.

/**
 * Derive relationship name from method name
 * Converts method names like "getColleagues" to "COLLEAGUES"
 * @param methodName Method name to convert
 * @return Derived relationship name
 */
fun deriveRelationshipName(methodName: String): String

Parameters:

  • methodName: Method name to convert

Returns: Derived relationship name (uppercase, without "get" prefix)
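A derivation along these lines could look like the sketch below. It is an illustrative re-implementation, not the library's source: deriveName is a hypothetical name, and the handling of multi-word names (UPPER_SNAKE_CASE) is an assumption, since only the single-word case "getColleagues" to "COLLEAGUES" is documented above.

```kotlin
// Illustrative re-implementation; the library's own function is deriveRelationshipName.
fun deriveName(methodName: String): String {
    // Strip "get" only when it is a real getter prefix (followed by an uppercase letter).
    val isGetter = methodName.length > 3 &&
        methodName.startsWith("get") &&
        methodName[3].isUpperCase()
    val stem = if (isGetter) methodName.substring(3) else methodName
    // Insert "_" at each lower-to-upper boundary, then uppercase the whole name.
    return stem.replace(Regex("(?<=[a-z0-9])(?=[A-Z])"), "_").uppercase()
}

fun main() {
    println(deriveName("getColleagues"))    // COLLEAGUES
    println(deriveName("getDirectReports")) // DIRECT_REPORTS (assumed convention)
}
```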


Usage Examples

Working with Documents

import com.embabel.agent.rag.model.*
import java.time.Instant

// Create a document structure
val leafSection = LeafSection(
    text = "This is the content of the section.",
    title = "Introduction",
    parentId = "doc-1",
    id = "section-1",
    uri = "https://example.com/docs#intro"
)

val document = MaterializedDocument(
    title = "User Guide",
    ingestionTimestamp = Instant.now(),
    children = listOf(leafSection),
    id = "doc-1",
    uri = "https://example.com/docs"
)

// Navigate the document
val leaves = document.leaves().toList()
val allSections = document.descendants().toList()

// Add metadata
val enrichedDoc = document.withMetadata(
    mapOf("author" to "Alice", "version" to "1.0")
)

Working with Chunks

import com.embabel.agent.rag.model.Chunk
import com.embabel.agent.rag.ingestion.ContentChunker

// Create a chunk
val chunk = Chunk.create(
    text = "This is the indexed text content.",
    parentId = "section-1",
    metadata = mapOf(
        ContentChunker.CHUNK_INDEX to 0,
        ContentChunker.TOTAL_CHUNKS to 5,
        ContentChunker.ROOT_DOCUMENT_ID to "doc-1"
    )
)

// Transform chunk text
val transformed = chunk.withText("TRANSFORMED: ${chunk.text}")

// Add metadata
val enriched = chunk.withAdditionalMetadata(
    mapOf("sentiment" to "positive", "language" to "en")
)

// Access metadata
val chunkIndex = chunk.metadata[ContentChunker.CHUNK_INDEX] as? Int
val rootId = chunk.metadata[ContentChunker.ROOT_DOCUMENT_ID] as? String

Working with Named Entities

import com.embabel.agent.rag.model.*

// Create a named entity
val person = SimpleNamedEntityData(
    id = "person-123",
    name = "Alice Smith",
    description = "Senior software engineer",
    properties = mapOf(
        "role" to "engineer",
        "team" to "platform",
        "yearsExperience" to 8
    )
)

// Convert to typed instance (requires ObjectMapper)
data class Person(
    override val id: String,
    override val name: String,
    override val description: String,
    val role: String,
    val team: String,
    val yearsExperience: Int
) : NamedEntity {
    override fun labels() = setOf("Person")
    override fun embeddableValue() = "$name: $description"
    override fun infoString(verbose: Boolean?, indent: Int) = name
}

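// Assumption: ObjectMapper is Jackson's; jacksonObjectMapper() comes from jackson-module-kotlin
val objectMapper = com.fasterxml.jackson.module.kotlin.jacksonObjectMapper()
// Result is nullable (Person?)
val typedPerson = person.toTypedInstance<Person>(objectMapper, Person::class.java)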

// Create dynamic proxy for interfaces
// (`navigator` is a RelationshipNavigator; see "Working with Relationships")
interface Employee : NamedEntity {
    @Relationship(name = "WORKS_WITH")
    fun getColleagues(): List<NamedEntity>
}

val employee = person.toInstance<Employee>(
    navigator,
    Employee::class.java
)

Working with Relationships

import com.embabel.agent.rag.model.*
import com.embabel.agent.rag.service.*

// Define an entity interface with relationships
interface Project : NamedEntity {
    @Relationship(name = "HAS_CONTRIBUTOR", direction = RelationshipDirection.INCOMING)
    fun getContributors(): List<NamedEntity>

    @Relationship(name = "DEPENDS_ON")
    fun getDependencies(): List<NamedEntity>
}

// Navigate relationships using RelationshipNavigator
val navigator: RelationshipNavigator = TODO("Supply an implementation backed by your entity store")

val contributors = navigator.findRelated(
    source = RetrievableIdentifier("project-1", "Project"),
    relationshipName = "HAS_CONTRIBUTOR",
    direction = RelationshipDirection.INCOMING
)

val dependencies = navigator.findRelated(
    source = RetrievableIdentifier("project-1", "Project"),
    relationshipName = "DEPENDS_ON",
    direction = RelationshipDirection.OUTGOING
)

// Derive relationship name from method
val relationshipName = deriveRelationshipName("getColleagues")
// Returns: "COLLEAGUES"

Working with Facts

import com.embabel.agent.rag.model.Fact

// Create a fact
val fact = Fact(
    id = "fact-456",
    assertion = "The framework supports vector search.",
    authority = "Official documentation",
    uri = "https://docs.example.com/features",
    metadata = mapOf(
        "confidence" to 0.95,
        "source" to "manual"
    )
)

// Access properties
println("Assertion: ${fact.assertion}")
println("Authority: ${fact.authority}")
println("Embeddable: ${fact.embeddableValue()}")

Building Document Hierarchies

import com.embabel.agent.rag.model.*
import java.time.Instant

// Build a complex document structure
val section1 = LeafSection(
    text = "Introduction content",
    title = "Introduction",
    parentId = "doc-1",
    id = "section-1",
    uri = "https://example.com/docs#intro"
)

// Both leaves below are children of container-1, so parentId points at it
val section2 = LeafSection(
    text = "Getting started content",
    title = "Getting Started",
    parentId = "container-1",
    id = "section-2",
    uri = "https://example.com/docs#getting-started"
)

val subsection = LeafSection(
    text = "Installation instructions",
    title = "Installation",
    parentId = "container-1",
    id = "section-2-1",
    uri = "https://example.com/docs#installation"
)

val containerSection = DefaultMaterializedContainerSection(
    title = "Setup",
    children = listOf(section2, subsection),
    id = "container-1",
    parentId = "doc-1",
    uri = "https://example.com/docs#setup"
)

val document = MaterializedDocument(
    title = "Complete Guide",
    ingestionTimestamp = Instant.now(),
    children = listOf(section1, containerSection),
    id = "doc-1",
    uri = "https://example.com/docs",
    metadata = mapOf(
        "author" to "Documentation Team",
        "version" to "2.0",
        "language" to "en"
    )
)

// Navigate hierarchy
val allLeaves = document.leaves().toList()
println("Total leaf sections: ${allLeaves.size}")

val allDescendants = document.descendants().toList()
println("Total sections: ${allDescendants.size}")

tessl i tessl/maven-com-embabel-agent--embabel-agent-rag-core@0.3.1
