Airbyte source connector for Jina AI Reader API enabling web content extraction and search through intelligent reading services
Custom HTTP requester with Bearer token authentication for Jina AI services. JinaAiHttpRequester extends Airbyte CDK's HttpRequester to handle the authentication and header management that the Jina AI API requires.
@dataclass
class JinaAiHttpRequester(HttpRequester):
    """
    Custom HTTP requester for Jina AI Reader API integration.

    Extends Airbyte CDK's HttpRequester to provide Bearer token authentication
    and custom header management for Jina AI Reader and Search APIs.

    Attributes:
        request_headers (Optional[Union[str, Mapping[str, str]]]):
            Custom headers configuration for API requests
    """

    request_headers: Optional[Union[str, Mapping[str, str]]] = None

Handles setup of header interpolation after object initialization.
def __post_init__(self, parameters: Mapping[str, Any]) -> None:
    """
    Post-initialization setup for header interpolation.

    Args:
        parameters (Mapping[str, Any]): Configuration parameters from the connector

    Initializes the headers interpolator that processes template variables
    in request headers, enabling dynamic header values based on configuration
    and runtime context.
    """

Builds and manages HTTP request headers including Bearer token authentication.
def get_request_headers(
    self,
    *,
    stream_state: Optional[StreamState] = None,
    stream_slice: Optional[StreamSlice] = None,
    next_page_token: Optional[Mapping[str, Any]] = None,
) -> Mapping[str, Any]:
    """
    Generate HTTP request headers with Bearer token authentication.

    Args:
        stream_state (Optional[StreamState]): Current state of the data stream
        stream_slice (Optional[StreamSlice]): Current slice being processed
        next_page_token (Optional[Mapping[str, Any]]): Pagination token if applicable

    Returns:
        Mapping[str, Any]: Dictionary of HTTP headers including authentication

    This method:
    1. Evaluates header templates using the interpolator
    2. Checks for api_key in configuration
    3. Adds Bearer token authentication header if api_key is present
    4. Returns complete header dictionary for API requests

    The Bearer token is only added when api_key is configured, making
    authentication optional for public API access.
    """

The HTTP requester is configured through the manifest.yaml file and supports the following patterns:
# Automatic Bearer token authentication when api_key is configured
headers = {
    "Authorization": f"Bearer {api_key}",  # Added automatically if api_key present
    "Accept": "application/json",          # Always included
    "X-With-Links-Summary": "true",        # Based on gather_links config
    "X-With-Images-Summary": "false"       # Based on gather_images config
}

The requester handles requests to two main Jina AI endpoints:
Reader Stream:
https://r.jina.ai/{read_prompt}
Search Stream:
https://s.jina.ai/{search_prompt}

Headers are configured through template interpolation supporting:
request_headers:
  Accept: application/json
  X-With-Links-Summary: "{{ config['gather_links'] }}"
  X-With-Images-Summary: "{{ config['gather_images'] }}"

The custom requester inherits from Airbyte CDK's HttpRequester and is configured through manifest.yaml as a custom requester:
requester:
  type: CustomRequester
  class_name: source_jina_ai_reader.components.JinaAiHttpRequester
  url_base: "https://r.jina.ai/{{ config['read_prompt'] }}"
  http_method: "GET"
  path: "/"
  authenticator:
    type: NoAuth  # Authentication handled by custom requester

# Configuration with API key
config = {
    "api_key": "jina_abc123xyz",
    "read_prompt": "https://example.com",
    "gather_links": True,
    "gather_images": False
}
# Results in headers:
# {
#     "Authorization": "Bearer jina_abc123xyz",
#     "Accept": "application/json",
#     "X-With-Links-Summary": "true",
#     "X-With-Images-Summary": "false"
# }

# Configuration without API key
config = {
    "read_prompt": "https://example.com",
    "gather_links": False,
    "gather_images": True
}
# Results in headers:
# {
#     "Accept": "application/json",
#     "X-With-Links-Summary": "false",
#     "X-With-Images-Summary": "true"
# }
# No Authorization header added

Install with Tessl CLI
npx tessl i tessl/pypi-source-jina-ai-reader
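
The header-building steps listed for get_request_headers (evaluate the header templates, check for api_key, add the Bearer header only when a key is present) can be sketched without the Airbyte CDK. This is a minimal illustration under stated assumptions, not the connector's implementation: build_headers is a hypothetical helper, and rendering booleans as lowercase strings is an assumption taken from the example outputs shown earlier.

```python
from typing import Any, Dict, Mapping


def build_headers(config: Mapping[str, Any]) -> Dict[str, str]:
    """Hypothetical helper mirroring the documented header behaviour."""
    headers = {
        # Always included.
        "Accept": "application/json",
        # Assumption: booleans become lowercase strings, as in the examples.
        "X-With-Links-Summary": str(bool(config.get("gather_links", False))).lower(),
        "X-With-Images-Summary": str(bool(config.get("gather_images", False))).lower(),
    }

    # Authentication is optional: add the Bearer token only if api_key is set.
    api_key = config.get("api_key")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


with_key = build_headers(
    {
        "api_key": "jina_abc123xyz",
        "read_prompt": "https://example.com",
        "gather_links": True,
        "gather_images": False,
    }
)
without_key = build_headers(
    {
        "read_prompt": "https://example.com",
        "gather_links": False,
        "gather_images": True,
    }
)
```

Here with_key carries an Authorization header and without_key does not, which corresponds to the two configuration examples in this document.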