tessl/pypi-source-jina-ai-reader

Airbyte source connector for Jina AI Reader API enabling web content extraction and search through intelligent reading services

Configuration Management

This page covers how the connector validates and migrates its configuration, in particular how it URL-encodes search_prompt values before they are used in API requests. The JinaAiReaderConfigMigration class performs this transformation at runtime.

Capabilities

Configuration Migration

Migrates configuration values at runtime. The one migration described here URL-encodes search_prompt so that older, unencoded configs keep working with the API.

from airbyte_cdk.sources.message import InMemoryMessageRepository, MessageRepository

class JinaAiReaderConfigMigration:
    """
    Handles runtime configuration migration for the Jina AI Reader connector.

    Primary purpose is to ensure search_prompt values are properly URL-encoded
    for API consumption while maintaining backward compatibility.
    """

    # Shared repository that queues control messages before they are printed
    message_repository: MessageRepository = InMemoryMessageRepository()

Migration Detection

Determines whether configuration migration is needed for a given config.

@classmethod
def should_migrate(cls, config: Mapping[str, Any]) -> bool:
    """
    Determines if configuration migration is required.
    
    Args:
        config (Mapping[str, Any]): Configuration dictionary containing connector settings
        
    Returns:
        bool: True if search_prompt needs URL encoding, False otherwise
        
    The method checks if the search_prompt parameter is properly URL-encoded.
    If the search_prompt is not encoded, migration is required.
    """

URL Encoding Validation

Compares a string with its unquoted form to decide whether it still needs URL encoding. Despite the method name, a True result means the string is not yet encoded.

@classmethod
def is_url_encoded(cls, s: str) -> bool:
    """
    Check if a string is URL-encoded by comparing with its unquoted version.
    
    Args:
        s (str): String to validate for URL encoding
        
    Returns:
        bool: True if string is not URL-encoded (needs encoding), False if already encoded
        
    Implementation uses urllib.parse.unquote to determine if the string has been URL-encoded.
    Returns True when the string equals its unquoted version (meaning it needs encoding).
    This method is used internally by should_migrate() to determine migration necessity.
    """
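
The comparison described above can be sketched with the standard library alone. The function names is_unencoded and needs_migration are hypothetical stand-ins rather than the connector's own methods, and returning False for a missing search_prompt is an assumption drawn from the error-handling notes on this page.

```python
from typing import Any, Mapping
from urllib.parse import unquote


def is_unencoded(s: str) -> bool:
    """Return True when s equals its unquoted form (no percent-escapes were decoded)."""
    return s == unquote(s)


def needs_migration(config: Mapping[str, Any]) -> bool:
    """Assumed behavior: skip migration when search_prompt is absent."""
    if "search_prompt" not in config:
        return False
    return is_unencoded(config["search_prompt"])


print(is_unencoded("AI news"))    # True: nothing to decode, so it still needs encoding
print(is_unencoded("AI%20news"))  # False: unquoting changes it, so it is already encoded
print(is_unencoded("airbyte"))    # True: a plain word is also reported as unencoded
```

The last case shows a limit of this kind of check: a string with no characters that require escaping is indistinguishable from an unencoded one, although encoding it would leave it unchanged.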

Configuration Transformation

URL-encodes search_prompt in the given configuration and returns the modified configuration.

@classmethod
def modify(cls, config: Mapping[str, Any]) -> Mapping[str, Any]:
    """
    Apply configuration modifications, specifically URL-encoding the search_prompt.
    
    Args:
        config (Mapping[str, Any]): Original configuration dictionary
        
    Returns:
        Mapping[str, Any]: Modified configuration with URL-encoded search_prompt
        
    Raises:
        ValueError: If configuration is invalid or malformed
        
    If search_prompt exists in config and needs encoding, applies urllib.parse.quote.
    """
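
As a rough illustration of the transformation, the sketch below calls urllib.parse.quote with its default arguments; the page does not say which arguments the connector passes, so that is an assumption. encode_search_prompt is a hypothetical helper, and raising ValueError for a missing key is likewise assumed.

```python
from typing import Any, Dict, Mapping
from urllib.parse import quote


def encode_search_prompt(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with search_prompt percent-encoded (hypothetical helper)."""
    if "search_prompt" not in config:
        # Assumption: a config without search_prompt is treated as invalid here.
        raise ValueError("Invalid config: search_prompt is missing")
    migrated = dict(config)  # copy so the caller's mapping is left untouched
    migrated["search_prompt"] = quote(config["search_prompt"])
    return migrated


print(encode_search_prompt({"search_prompt": "AI news"}))
# {'search_prompt': 'AI%20news'}

# Encoding an already-encoded value escapes its percent signs a second time.
print(quote("AI%20news"))
# AI%2520news
```

The second call shows why a check is worth running before the transformation: quoting is not idempotent.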

Configuration Persistence

Applies modify() to the configuration and writes the result to the file at config_path.

@classmethod
def modify_and_save(cls, config_path: str, source: Source, config: Mapping[str, Any]) -> Mapping[str, Any]:
    """
    Modify configuration and save to specified path.
    
    Args:
        config_path (str): Path to configuration file
        source (Source): Airbyte source instance for file operations
        config (Mapping[str, Any]): Configuration to modify and save
        
    Returns:
        Mapping[str, Any]: The modified configuration
        
    Applies modifications via modify() method then saves using source.write_config().
    """
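
The modify-then-save sequence might look like the following when written against the standard library only. modify_and_save_sketch is a hypothetical function that writes the file directly instead of going through source.write_config(), and JSON as the on-disk format is an assumption.

```python
import json
import os
import tempfile
from typing import Any, Dict, Mapping
from urllib.parse import quote


def modify_and_save_sketch(config_path: str, config: Mapping[str, Any]) -> Dict[str, Any]:
    """Encode search_prompt, write the result as JSON, and return it (hypothetical)."""
    migrated = dict(config)
    migrated["search_prompt"] = quote(migrated["search_prompt"])
    with open(config_path, "w", encoding="utf-8") as fh:
        json.dump(migrated, fh)
    return migrated


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    result = modify_and_save_sketch(path, {"search_prompt": "AI news"})
    with open(path, encoding="utf-8") as fh:
        on_disk = json.load(fh)

print(result == on_disk)  # True: the returned config matches what was written
```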

Control Message Emission

Emits Airbyte control messages for configuration changes.

@classmethod
def emit_control_message(cls, migrated_config: Mapping[str, Any]) -> None:
    """
    Emit Airbyte control message for configuration changes.
    
    Args:
        migrated_config (Mapping[str, Any]): The migrated configuration
        
    Creates and emits an Airbyte connector config control message to notify
    the Airbyte platform about configuration changes. Messages are queued
    and printed to stdout for platform consumption.
    """
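
For orientation, a control message of this kind could be approximated with a plain dictionary as below. build_control_message is hypothetical, and the field names are an assumption modeled on the Airbyte protocol's connector-config control message; they should be checked against the protocol definition before being relied on.

```python
import json
import time
from typing import Any, Dict, Mapping


def build_control_message(migrated_config: Mapping[str, Any]) -> Dict[str, Any]:
    """Build a dict shaped like a connector-config control message (assumed field names)."""
    return {
        "type": "CONTROL",
        "control": {
            "type": "CONNECTOR_CONFIG",
            "emitted_at": time.time() * 1000,  # milliseconds since the epoch
            "connectorConfig": {"config": dict(migrated_config)},
        },
    }


message = build_control_message({"search_prompt": "AI%20news"})
print(json.dumps(message))  # serialized as a single JSON line on stdout
```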

Migration Orchestration

Entry point that runs the complete migration workflow from the connector's command-line arguments.

@classmethod
def migrate(cls, args: List[str], source: Source) -> None:
    """
    Main migration orchestration method.
    
    Args:
        args (List[str]): Command-line arguments from sys.argv
        source (Source): Airbyte source instance
        
    Orchestrates the complete migration process:
    1. Extracts config path from command-line arguments
    2. Reads existing configuration if --config argument provided
    3. Checks if migration is needed via should_migrate()
    4. Applies modifications and saves config via modify_and_save()
    5. Emits control message via emit_control_message()
    
    Only performs migration when --config argument is provided and migration is needed.
    """
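
The five steps listed in the docstring can be sketched end to end as follows. Every name here (extract_config_path, migrate_sketch) is hypothetical, the file is read and written directly as JSON rather than through the Source instance, and step 5 prints a simplified stand-in instead of a real control message.

```python
import json
import os
import tempfile
from typing import Any, Dict, List, Optional
from urllib.parse import quote, unquote


def extract_config_path(args: List[str]) -> Optional[str]:
    """Return the value following --config, or None if the flag is absent."""
    if "--config" in args:
        idx = args.index("--config")
        if idx + 1 < len(args):
            return args[idx + 1]
    return None


def migrate_sketch(args: List[str]) -> Optional[Dict[str, Any]]:
    """Run the workflow; return the migrated config, or None if nothing was done."""
    config_path = extract_config_path(args)            # step 1
    if config_path is None:
        return None
    with open(config_path, encoding="utf-8") as fh:    # step 2
        config = json.load(fh)
    prompt = config.get("search_prompt")
    if prompt is None or prompt != unquote(prompt):    # step 3: absent or already encoded
        return None
    config["search_prompt"] = quote(prompt)            # step 4: modify ...
    with open(config_path, "w", encoding="utf-8") as fh:
        json.dump(config, fh)                          # ... and save
    print(json.dumps({"migrated_config": config}))     # step 5: stand-in for the control message
    return config


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"search_prompt": "AI news"}, fh)
    first = migrate_sketch(["check", "--config", path])   # migrates the file
    second = migrate_sketch(["check", "--config", path])  # already encoded, so a no-op
    skipped = migrate_sketch(["spec"])                     # no --config, so a no-op
```

Running the sketch twice against the same file shows the intended property: once the saved prompt is encoded, later runs leave it alone.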

Configuration Schema

The connector accepts the following configuration parameters:

from typing import TypedDict

class ConfigSpec(TypedDict, total=False):
    """Configuration specification for Jina AI Reader connector.

    total=False because every key is either optional or has a default.
    """
    api_key: str  # Optional API key for authentication (marked as secret)
    read_prompt: str  # URL to read content from (default: "https://www.google.com")
    search_prompt: str  # URL-encoded search query (default: "Search%20airbyte")
    gather_links: bool  # Include links summary section (optional)
    gather_images: bool  # Include images summary section (optional)

Usage Examples

Basic Configuration

config = {
    "api_key": "jina_your_api_key_here",
    "read_prompt": "https://example.com/article",
    "search_prompt": "machine%20learning%20news",
    "gather_links": True,
    "gather_images": False
}

Migration Example

from source_jina_ai_reader.config_migration import JinaAiReaderConfigMigration

# Check if migration needed
config = {"search_prompt": "AI news"}  # Not URL encoded
needs_migration = JinaAiReaderConfigMigration.should_migrate(config)  # True

# Apply migration
migrated_config = JinaAiReaderConfigMigration.modify(config)
# Result: {"search_prompt": "AI%20news"}  # URL encoded

Error Handling

  • ValueError: Raised when the configuration is malformed or invalid, or when migration is attempted on a search_prompt that is already URL-encoded
  • Missing search_prompt: If the search_prompt key is absent from the config, migration is skipped
  • Validation: Checking the encoding before migrating avoids API request failures caused by unencoded prompts
  • Backward Compatibility: Migration keeps existing configs working with newer connector versions
  • Control Messages: Config changes are reported to the platform through emit_control_message(), following the Airbyte protocol
  • File Operations: Config files are written through write_config() from the Airbyte CDK, which handles file-writing errors
  • Command Line Parsing: migrate() does nothing when --config is absent; it proceeds only if a config path was extracted from the arguments

Install with Tessl CLI

npx tessl i tessl/pypi-source-jina-ai-reader
