tessl/pypi-source-jina-ai-reader

Airbyte source connector for Jina AI Reader API enabling web content extraction and search through intelligent reading services

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/source-jina-ai-reader@0.1.x

To install, run

npx @tessl/cli install tessl/pypi-source-jina-ai-reader@0.1.0


Jina AI Reader Source Connector

An Airbyte source connector for the Jina AI Reader API. It exposes two data streams: one reads web content from a URL, and the other performs web searches. Link and image summaries are optional and are controlled through the configuration.

Package Information

  • Package Name: source-jina-ai-reader
  • Package Type: pypi
  • Language: Python
  • Installation: pip install source-jina-ai-reader or poetry add source-jina-ai-reader
  • Framework: Airbyte CDK Declarative (Low-Code)

Core Imports

from source_jina_ai_reader import SourceJinaAiReader

For running the connector:

from source_jina_ai_reader.run import run

For config migration:

from source_jina_ai_reader.config_migration import JinaAiReaderConfigMigration

For custom components:

from source_jina_ai_reader.components import JinaAiHttpRequester

Airbyte CDK imports:

from airbyte_cdk.sources.declarative.types import StreamSlice, StreamState
from airbyte_cdk.sources.message import MessageRepository, InMemoryMessageRepository

Basic Usage

As an Airbyte Source Connector

from source_jina_ai_reader import SourceJinaAiReader

# Initialize the connector
source = SourceJinaAiReader()

# Use with Airbyte framework
# Configuration is handled through Airbyte's config system

Command Line Usage

# Install via poetry
poetry install --with dev

# Run connector operations
poetry run source-jina-ai-reader spec
poetry run source-jina-ai-reader check --config config.json
poetry run source-jina-ai-reader discover --config config.json  
poetry run source-jina-ai-reader read --config config.json --catalog catalog.json
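The `read` command writes Airbyte protocol messages to standard output as one JSON object per line. The following is a minimal sketch of collecting the `RECORD` messages per stream. The helper names `iter_records` and `group_by_stream` are hypothetical, and the sample lines are made up for illustration, not real connector output.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (stream_name, data) for every RECORD message in the output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON message
        if isinstance(message, dict) and message.get("type") == "RECORD":
            record = message["record"]
            yield record["stream"], record["data"]


def group_by_stream(lines: Iterable[str]) -> Dict[str, List[Dict[str, Any]]]:
    """Collect record payloads into a dict keyed by stream name."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for stream, data in iter_records(lines):
        grouped[stream].append(data)
    return dict(grouped)


# Made-up sample messages standing in for real connector output.
SAMPLE_OUTPUT = [
    json.dumps(message)
    for message in [
        {"type": "LOG", "log": {"level": "INFO", "message": "Starting sync"}},
        {
            "type": "RECORD",
            "record": {
                "stream": "reader",
                "data": {"title": "Example Domain", "url": "https://example.com"},
                "emitted_at": 0,
            },
        },
        {
            "type": "RECORD",
            "record": {"stream": "search", "data": {"title": "A result"}, "emitted_at": 0},
        },
    ]
]

records = group_by_stream(SAMPLE_OUTPUT)
print(sorted(records))
```

To consume real output, pipe the `read` command into a script that calls `group_by_stream(sys.stdin)`.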

Configuration Example

{
  "api_key": "jina_your_api_key_here",
  "read_prompt": "https://example.com",
  "search_prompt": "AI%20powered%20search",
  "gather_links": true,
  "gather_images": true
}
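Because `search_prompt` is expected in URL-encoded form, a config like the one above can be generated rather than written by hand. This is a sketch only: `build_config` is a hypothetical helper, not part of the connector, and it assumes plain percent-encoding with the standard library is what the connector expects.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import quote


def build_config(
    read_url: str,
    search_query: str,
    api_key: Optional[str] = None,
    gather_links: bool = False,
    gather_images: bool = False,
) -> Dict[str, Any]:
    """Build a connector config dict, percent-encoding the search query."""
    config: Dict[str, Any] = {
        "read_prompt": read_url,
        # safe="" encodes every reserved character, including "/"
        "search_prompt": quote(search_query, safe=""),
        "gather_links": gather_links,
        "gather_images": gather_images,
    }
    if api_key:
        # The API key is optional, so it is only included when provided.
        config["api_key"] = api_key
    return config


config = build_config(
    "https://example.com",
    "AI powered search",
    api_key="jina_your_api_key_here",
    gather_links=True,
    gather_images=True,
)
print(json.dumps(config, indent=2))
```

Writing the result of `json.dumps` to `config.json` produces a file that can be passed to the `--config` option.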

Architecture

The connector follows Airbyte's declarative source pattern using YAML configuration:

  • SourceJinaAiReader: Main connector class inheriting from YamlDeclarativeSource
  • JinaAiHttpRequester: Custom HTTP requester handling Bearer token authentication
  • JinaAiReaderConfigMigration: Runtime configuration migration that URL-encodes the search prompt
  • Manifest Configuration: YAML-based stream definitions with two data streams
  • CLI Interface: Standard Airbyte operations (spec, check, discover, read)

The connector transforms Jina AI's web content extraction and search APIs into structured Airbyte data streams, handling authentication, request formatting, and data transformation automatically.

Capabilities

Core Connector Interface

Main connector class and entry point functions that provide the foundation for Airbyte integration and command-line usage.

class SourceJinaAiReader(YamlDeclarativeSource):
    def __init__(self): ...

def run() -> None: ...

Full reference: core-interface.md

Configuration Management

Configuration handling including validation, migration, and URL encoding for search prompts to ensure proper API integration.

class JinaAiReaderConfigMigration:
    @classmethod
    def should_migrate(cls, config: Mapping[str, Any]) -> bool: ...
    
    @classmethod  
    def modify(cls, config: Mapping[str, Any]) -> Mapping[str, Any]: ...
    
    @classmethod
    def migrate(cls, args: List[str], source: Source) -> None: ...

Full reference: configuration.md
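The migration's job of URL-encoding the search prompt can be illustrated with standalone functions. This is a sketch of the idea under stated assumptions, not the connector's actual implementation: the function bodies and the `encode_prompt` helper are hypothetical, and only the names `should_migrate` and `modify` mirror the class methods listed above.

```python
from typing import Any, Dict, Mapping
from urllib.parse import quote, unquote


def encode_prompt(prompt: str) -> str:
    """Percent-encode a prompt; decoding first avoids double-encoding."""
    return quote(unquote(prompt), safe="")


def should_migrate(config: Mapping[str, Any]) -> bool:
    """Return True when search_prompt is present but not yet URL-encoded."""
    prompt = config.get("search_prompt")
    return isinstance(prompt, str) and encode_prompt(prompt) != prompt


def modify(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config with search_prompt URL-encoded."""
    migrated = dict(config)  # leave the caller's mapping untouched
    migrated["search_prompt"] = encode_prompt(config["search_prompt"])
    return migrated


old_config = {"search_prompt": "AI powered search", "gather_links": True}
new_config = modify(old_config)
print(new_config["search_prompt"])
```

Decoding before encoding makes the operation idempotent, so a config that has already been migrated is not flagged again.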

HTTP Request Handling

Custom HTTP requester with Bearer token authentication for secure API access to Jina AI services.

class JinaAiHttpRequester(HttpRequester):
    def get_request_headers(
        self,
        *,
        stream_state: Optional[StreamState] = None,
        stream_slice: Optional[StreamSlice] = None, 
        next_page_token: Optional[Mapping[str, Any]] = None,
    ) -> Mapping[str, Any]: ...

Full reference: http-handling.md
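Bearer token authentication amounts to adding an `Authorization` header when an API key is configured. The sketch below shows that logic in isolation, without the Airbyte CDK. `build_request_headers` and its `base_headers` parameter are hypothetical, and skipping the header when no key is set is an assumption based on `api_key` being optional.

```python
from typing import Any, Dict, Mapping, Optional


def build_request_headers(
    config: Mapping[str, Any],
    base_headers: Optional[Mapping[str, Any]] = None,
) -> Dict[str, str]:
    """Merge base headers with a Bearer token header when an API key is set."""
    headers: Dict[str, Any] = dict(base_headers or {})
    api_key = config.get("api_key")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    # Normalize keys and values to strings before sending.
    return {str(key): str(value) for key, value in headers.items()}


with_key = build_request_headers(
    {"api_key": "jina_your_api_key_here"},
    base_headers={"Accept": "application/json"},
)
without_key = build_request_headers({})
print(with_key)
```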

Data Streams

Two main data streams providing web content extraction and search functionality through Jina AI's intelligent reading services.

Stream Types:

  • reader: Extracts content from specified URLs
  • search: Performs web searches with optional content summarization
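The `read` command needs a configured catalog that selects these streams. A minimal sketch of building one in the Airbyte protocol's catalog format follows. `build_catalog` is a hypothetical helper, the empty `json_schema` is a placeholder, and full-refresh support for both streams is an assumption; the output of `discover` is the authoritative source for schemas and sync modes.

```python
import json
from typing import Any, Dict, Iterable


def build_catalog(stream_names: Iterable[str]) -> Dict[str, Any]:
    """Build a configured catalog selecting the given streams in full refresh."""
    return {
        "streams": [
            {
                "stream": {
                    "name": name,
                    "json_schema": {},  # placeholder; use the schema from discover
                    "supported_sync_modes": ["full_refresh"],  # assumed
                },
                "sync_mode": "full_refresh",
                "destination_sync_mode": "overwrite",
            }
            for name in stream_names
        ]
    }


catalog = build_catalog(["reader", "search"])
print(json.dumps(catalog, indent=2))
```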

Data Schema:

  • title: string - Content title
  • url: string - Source URL
  • content: string - Extracted/searched content
  • description: string - Content description
  • links: object - Optional links summary

Full reference: data-streams.md
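Downstream code can check incoming records against the schema above before using them. The sketch below is illustrative: `record_problems` is a hypothetical helper, and treating the four string fields as always present is an assumption, since the schema only marks `links` as optional.

```python
from typing import Any, List, Mapping

# String fields listed in the data schema (assumed to always be present).
STRING_FIELDS = ("title", "url", "content", "description")


def record_problems(record: Mapping[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means the record fits."""
    problems: List[str] = []
    for field in STRING_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], str):
            problems.append(f"{field} should be a string")
    links = record.get("links")
    if links is not None and not isinstance(links, Mapping):
        problems.append("links should be an object")
    return problems


# Made-up record for illustration.
good_record = {
    "title": "Example Domain",
    "url": "https://example.com",
    "content": "This domain is for use in illustrative examples.",
    "description": "",
    "links": {"More information": "https://www.iana.org/domains/example"},
}
print(record_problems(good_record))
```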

Types

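from typing import Any, Dict, Mapping, TypedDict

from airbyte_cdk.sources.message import InMemoryMessageRepository

# Configuration types
ConfigDict = Mapping[str, Any]

# Airbyte CDK stream types, shown here as plain mappings for reference.
# Signatures that accept a missing value wrap these in Optional[...] themselves.
StreamState = Mapping[str, Any]  # Current state of a data stream, used for incremental sync
StreamSlice = Mapping[str, Any]  # Current slice (partition) of the stream being processed

# Airbyte CDK message types
MessageRepository = InMemoryMessageRepository  # In-memory repository for storing connector messages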

# Data record structure returned by both streams
class ContentRecord(TypedDict):
    title: str  # Content or search result title
    url: str  # Source URL of the content
    content: str  # Extracted text content
    description: str  # Brief description or summary
    links: Dict[str, Any]  # Optional links summary with dynamic properties

# Configuration specification
class ConfigSpec(TypedDict, total=False):
    api_key: str  # Optional Jina AI API key (marked as secret in manifest)
    read_prompt: str  # URL to read content from (default: "https://www.google.com")
    search_prompt: str  # URL-encoded search query (default: "Search%20airbyte")
    gather_links: bool  # Include links summary section (optional parameter)
    gather_images: bool  # Include images summary section (optional parameter)