
tessl/pypi-source-jina-ai-reader

Airbyte source connector for Jina AI Reader API enabling web content extraction and search through intelligent reading services



HTTP Request Handling

JinaAiHttpRequester is a custom HTTP requester that adds Bearer token authentication for access to Jina AI services. It extends Airbyte's standard HttpRequester with the authentication and header management that Jina AI's API requires.

Capabilities

Custom HTTP Requester

Extends Airbyte's HttpRequester to provide custom authentication and header handling for Jina AI API integration.

from dataclasses import dataclass
from typing import Mapping, Optional, Union

from airbyte_cdk.sources.declarative.requesters.http_requester import HttpRequester


@dataclass
class JinaAiHttpRequester(HttpRequester):
    """
    Custom HTTP requester for Jina AI Reader API integration.

    Extends Airbyte CDK's HttpRequester to provide Bearer token authentication
    and custom header management for Jina AI Reader and Search APIs.

    Attributes:
        request_headers (Optional[Union[str, Mapping[str, str]]]):
            Custom headers configuration for API requests
    """

    request_headers: Optional[Union[str, Mapping[str, str]]] = None

Post-Initialization Setup

Handles setup of header interpolation after object initialization.

def __post_init__(self, parameters: Mapping[str, Any]) -> None:
    """
    Post-initialization setup for header interpolation.
    
    Args:
        parameters (Mapping[str, Any]): Configuration parameters from the connector
        
    Initializes the headers interpolator that processes template variables
    in request headers, enabling dynamic header values based on configuration
    and runtime context.
    """
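To make the idea of header interpolation concrete, here is a minimal sketch of what resolving `{{ config['...'] }}` placeholders could look like. It is a stand-in written with a regular expression, not the Airbyte CDK interpolator; the names `render_header_templates` and `_to_header_text` are hypothetical, and rendering booleans in lowercase is an assumption chosen to match the header values shown elsewhere in this document.

```python
import re
from typing import Any, Mapping

# Matches placeholders of the form {{ config['name'] }} (single or double quotes).
_PLACEHOLDER = re.compile(r"\{\{\s*config\[(['\"])(\w+)\1\]\s*\}\}")


def _to_header_text(value: Any) -> str:
    # Assumption: booleans are rendered lowercase, as in this document's examples.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def render_header_templates(templates: Mapping[str, str], config: Mapping[str, Any]) -> dict:
    """Replace each {{ config['key'] }} placeholder with the matching config value."""

    def substitute(match: re.Match) -> str:
        # group(2) is the config key named inside the placeholder.
        return _to_header_text(config[match.group(2)])

    return {name: _PLACEHOLDER.sub(substitute, template) for name, template in templates.items()}


templates = {
    "Accept": "application/json",
    "X-With-Links-Summary": "{{ config['gather_links'] }}",
    "X-With-Images-Summary": "{{ config['gather_images'] }}",
}
rendered = render_header_templates(templates, {"gather_links": True, "gather_images": False})
```

Header values without placeholders, such as `Accept`, pass through unchanged; a placeholder that names a missing config key raises `KeyError` in this sketch.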

Request Header Management

Builds and manages HTTP request headers including Bearer token authentication.

def get_request_headers(
    self,
    *,
    stream_state: Optional[StreamState] = None,
    stream_slice: Optional[StreamSlice] = None,
    next_page_token: Optional[Mapping[str, Any]] = None,
) -> Mapping[str, Any]:
    """
    Generate HTTP request headers with Bearer token authentication.
    
    Args:
        stream_state (Optional[StreamState]): Current state of the data stream
        stream_slice (Optional[StreamSlice]): Current slice being processed
        next_page_token (Optional[Mapping[str, Any]]): Pagination token if applicable
        
    Returns:
        Mapping[str, Any]: Dictionary of HTTP headers including authentication
        
    This method:
    1. Evaluates header templates using the interpolator
    2. Checks for api_key in configuration
    3. Adds Bearer token authentication header if api_key is present
    4. Returns complete header dictionary for API requests
    
    The Bearer token is only added when api_key is configured, making
    authentication optional for public API access.
    """
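The four steps listed in the docstring can be sketched as a standalone function. This is an illustration under stated assumptions rather than the connector's actual code: `build_request_headers` is a hypothetical name, and the template evaluation of step 1 is passed in as a plain callable so the example runs without the Airbyte CDK.

```python
from typing import Any, Callable, Mapping


def build_request_headers(
    evaluate_templates: Callable[[], Mapping[str, Any]],
    config: Mapping[str, Any],
) -> dict:
    # 1. Evaluate the header templates (stand-in for the interpolator).
    headers = dict(evaluate_templates())
    # 2. Check for api_key in the configuration.
    api_key = config.get("api_key")
    # 3. Add the Bearer token only when a non-empty api_key is present.
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    # 4. Return the complete header dictionary.
    return headers


def evaluated() -> dict:
    # Stand-in for already-interpolated headers.
    return {"Accept": "application/json"}


with_key = build_request_headers(evaluated, {"api_key": "jina_abc123xyz"})
without_key = build_request_headers(evaluated, {})
```

Because the check is a simple truthiness test, an empty-string `api_key` is treated the same as a missing one in this sketch.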

HTTP Request Configuration

The HTTP requester is configured through the manifest.yaml file and supports the following patterns:

Authentication Headers

# Headers sent when api_key is configured (example value shown)
api_key = "jina_abc123xyz"
headers = {
    "Authorization": f"Bearer {api_key}",  # Added automatically only if api_key is present
    "Accept": "application/json",           # Set in the manifest's request_headers
    "X-With-Links-Summary": "true",         # From the gather_links config value
    "X-With-Images-Summary": "false",       # From the gather_images config value
}

API Endpoints

The requester handles requests to two main Jina AI endpoints:

Reader Stream:

  • Base URL: https://r.jina.ai/{read_prompt}
  • Method: GET
  • Purpose: Extract content from specified URLs

Search Stream:

  • Base URL: https://s.jina.ai/{search_prompt}
  • Method: GET
  • Purpose: Perform web searches with content extraction
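As a rough illustration of how the two base URLs are composed, the sketch below builds the request URL for each stream. The helper names `reader_url` and `search_url` are hypothetical, and percent-encoding the search prompt is an assumption made so that queries containing spaces yield a valid URL; the connector itself may pass the configured value through unchanged.

```python
from urllib.parse import quote

READER_BASE = "https://r.jina.ai/"
SEARCH_BASE = "https://s.jina.ai/"


def reader_url(read_prompt: str) -> str:
    """URL for the Reader stream: the target page URL is appended to the base."""
    return READER_BASE + read_prompt


def search_url(search_prompt: str) -> str:
    """URL for the Search stream: the query is percent-encoded (assumption)."""
    return SEARCH_BASE + quote(search_prompt)


reader = reader_url("https://example.com")
search = search_url("airbyte connectors")
```

Both URLs are then requested with GET, as listed above.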

Request Headers Configuration

Headers are declared as templates in the manifest, and each template can reference configuration values that are interpolated at request time:

request_headers:
  Accept: application/json
  X-With-Links-Summary: "{{ config['gather_links'] }}"
  X-With-Images-Summary: "{{ config['gather_images'] }}"

Integration with Airbyte CDK

HttpRequester Inheritance

The custom requester inherits from Airbyte CDK's HttpRequester:

  • Base Functionality: Standard HTTP request handling, retries, error handling
  • Custom Extensions: Bearer token authentication, header interpolation
  • Template Support: Dynamic header values based on configuration
  • Stream Context: Access to stream state and pagination context

Declarative Configuration

The requester is declared in manifest.yaml as a custom requester:

requester:
  type: CustomRequester
  class_name: source_jina_ai_reader.components.JinaAiHttpRequester
  url_base: "https://r.jina.ai/{{ config['read_prompt'] }}"
  http_method: "GET"
  path: "/"
  authenticator:
    type: NoAuth  # Authentication handled by custom requester

Usage Examples

With API Key Authentication

# Configuration with API key
config = {
    "api_key": "jina_abc123xyz",
    "read_prompt": "https://example.com",
    "gather_links": True,
    "gather_images": False
}

# Results in headers:
# {
#     "Authorization": "Bearer jina_abc123xyz",
#     "Accept": "application/json",
#     "X-With-Links-Summary": "true",
#     "X-With-Images-Summary": "false"
# }

Without API Key (Public Access)

# Configuration without API key
config = {
    "read_prompt": "https://example.com",
    "gather_links": False,
    "gather_images": True
}

# Results in headers:
# {
#     "Accept": "application/json",
#     "X-With-Links-Summary": "false",
#     "X-With-Images-Summary": "true"
# }
# No Authorization header added

Error Handling

  • Header Validation: Ensures headers are properly formatted dictionaries
  • Authentication: Gracefully handles missing api_key by omitting auth header
  • Template Processing: Robust interpolation of configuration values
  • HTTP Errors: Inherits standard Airbyte CDK error handling and retry logic
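The header validation point can be illustrated with a small defensive helper. This is a sketch only: `normalize_headers` is a hypothetical name, and falling back to an empty dictionary for malformed input (instead of raising) is an assumption about how such validation might behave, not a statement about the connector's code.

```python
from collections.abc import Mapping
from typing import Any


def normalize_headers(evaluated: Any) -> dict:
    """Return a str-to-str header dict, or an empty dict for non-mapping input."""
    if not isinstance(evaluated, Mapping):
        # Assumption: malformed header output is dropped rather than raising.
        return {}
    return {str(name): str(value) for name, value in evaluated.items()}


ok = normalize_headers({"Accept": "application/json", "X-Retries": 3})
dropped = normalize_headers("Accept: application/json")
```

Converting every value to a string keeps the result safe to hand to an HTTP client even when an interpolated value comes back as a number.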

Install with Tessl CLI

npx tessl i tessl/pypi-source-jina-ai-reader
