
tessl/pypi-source-jina-ai-reader

Airbyte source connector for Jina AI Reader API enabling web content extraction and search through intelligent reading services

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/source-jina-ai-reader@0.1.x

To install, run

npx @tessl/cli install tessl/pypi-source-jina-ai-reader@0.1.0

# Jina AI Reader Source Connector

An Airbyte source connector for the Jina AI Reader API that enables intelligent web content extraction, search, and reading through Jina AI's services. The connector provides two main data streams for reading web content from URLs and performing web searches with optional link and image summarization.

## Package Information

- **Package Name**: source-jina-ai-reader
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install source-jina-ai-reader` or `poetry add source-jina-ai-reader`
- **Framework**: Airbyte CDK Declarative (Low-Code)

## Core Imports

```python
from source_jina_ai_reader import SourceJinaAiReader
```

For running the connector:

```python
from source_jina_ai_reader.run import run
```

For config migration:

```python
from source_jina_ai_reader.config_migration import JinaAiReaderConfigMigration
```

For custom components:

```python
from source_jina_ai_reader.components import JinaAiHttpRequester
```

Airbyte CDK imports:

```python
from airbyte_cdk.sources.declarative.types import StreamSlice, StreamState
from airbyte_cdk.sources.message import MessageRepository, InMemoryMessageRepository
```

## Basic Usage

### As an Airbyte Source Connector

```python
from source_jina_ai_reader import SourceJinaAiReader

# Initialize the connector
source = SourceJinaAiReader()

# Use with Airbyte framework
# Configuration is handled through Airbyte's config system
```

### Command Line Usage

```bash
# Install via poetry
poetry install --with dev

# Run connector operations
poetry run source-jina-ai-reader spec
poetry run source-jina-ai-reader check --config config.json
poetry run source-jina-ai-reader discover --config config.json
poetry run source-jina-ai-reader read --config config.json --catalog catalog.json
```
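The `read` command takes a configured catalog alongside the config file. As a minimal sketch, a catalog selecting the `reader` and `search` streams could be generated as below. The helper name `build_catalog` is hypothetical, and the full-refresh sync mode is an assumption, since the supported sync modes are not stated here; `discover` reports the real schema and modes.

```python
import json
from typing import Any, Dict, List


def build_catalog(stream_names: List[str]) -> Dict[str, Any]:
    """Build a configured catalog in the Airbyte protocol's JSON layout."""
    return {
        "streams": [
            {
                "stream": {
                    "name": name,
                    # Left empty in this sketch; `discover` returns the actual schema.
                    "json_schema": {},
                    # Assumption: full refresh is supported by both streams.
                    "supported_sync_modes": ["full_refresh"],
                },
                "sync_mode": "full_refresh",
                "destination_sync_mode": "overwrite",
            }
            for name in stream_names
        ]
    }


catalog = build_catalog(["reader", "search"])
# Print the JSON; redirect the output to catalog.json to use it with `read`.
print(json.dumps(catalog, indent=2))
```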

### Configuration Example

```json
{
  "api_key": "jina_your_api_key_here",
  "read_prompt": "https://example.com",
  "search_prompt": "AI%20powered%20search",
  "gather_links": true,
  "gather_images": true
}
```
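Before handing such a file to `check` or `read`, a quick local type check can catch simple mistakes. The following is a minimal sketch, assuming the five keys shown are the only ones of interest and that all of them are optional; `validate_config` is a hypothetical helper, not part of the package.

```python
from typing import Any, List, Mapping

# Config keys from the example above and the Python types they should have.
EXPECTED_TYPES = {
    "api_key": str,
    "read_prompt": str,
    "search_prompt": str,
    "gather_links": bool,
    "gather_images": bool,
}


def validate_config(config: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no type issues were found."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        # Keys are treated as optional, so only present values are checked.
        if key in config and not isinstance(config[key], expected):
            actual = type(config[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return problems


example = {
    "api_key": "jina_your_api_key_here",
    "read_prompt": "https://example.com",
    "search_prompt": "AI%20powered%20search",
    "gather_links": True,
    "gather_images": True,
}
print(validate_config(example))
```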

## Architecture

The connector follows Airbyte's declarative source pattern using YAML configuration:

- **SourceJinaAiReader**: Main connector class inheriting from YamlDeclarativeSource
- **JinaAiHttpRequester**: Custom HTTP requester handling Bearer token authentication
- **JinaAiReaderConfigMigration**: Runtime configuration migration for URL encoding
- **Manifest Configuration**: YAML-based stream definitions with two data streams
- **CLI Interface**: Standard Airbyte operations (spec, check, discover, read)

The connector transforms Jina AI's web content extraction and search APIs into structured Airbyte data streams, handling authentication, request formatting, and data transformation automatically.

## Capabilities

### Core Connector Interface

Main connector class and entry point functions that provide the foundation for Airbyte integration and command-line usage.

```python { .api }
class SourceJinaAiReader(YamlDeclarativeSource):
    def __init__(self): ...

def run() -> None: ...
```

[Core Interface](./core-interface.md)

### Configuration Management

Configuration handling including validation, migration, and URL encoding for search prompts to ensure proper API integration.

```python { .api }
class JinaAiReaderConfigMigration:
    @classmethod
    def should_migrate(cls, config: Mapping[str, Any]) -> bool: ...

    @classmethod
    def modify(cls, config: Mapping[str, Any]) -> Mapping[str, Any]: ...

    @classmethod
    def migrate(cls, args: List[str], source: Source) -> None: ...
```

[Configuration](./configuration.md)
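The bodies of these methods are not shown here, so the following is only a sketch of what URL-encoding a search prompt could look like using the standard library. `encode_search_prompt` is a hypothetical helper, and decoding before encoding is an assumption made so that already-encoded prompts are not encoded twice.

```python
from typing import Any, Dict, Mapping
from urllib.parse import quote, unquote


def encode_search_prompt(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config with `search_prompt` percent-encoded."""
    prompt = config.get("search_prompt")
    if prompt is None:
        # Nothing to encode; return an unchanged copy.
        return dict(config)
    # Decode first so an already-encoded prompt is not double-encoded,
    # then encode every reserved character, including "/" (safe="").
    encoded = quote(unquote(prompt), safe="")
    return {**config, "search_prompt": encoded}


print(encode_search_prompt({"search_prompt": "AI powered search"}))
```

One limitation of this approach: a raw prompt that literally contains a sequence such as `%20` would be treated as already encoded.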

### HTTP Request Handling

Custom HTTP requester with Bearer token authentication for secure API access to Jina AI services.

```python { .api }
class JinaAiHttpRequester(HttpRequester):
    def get_request_headers(
        self,
        *,
        stream_state: Optional[StreamState] = None,
        stream_slice: Optional[StreamSlice] = None,
        next_page_token: Optional[Mapping[str, Any]] = None,
    ) -> Mapping[str, Any]: ...
```

[HTTP Handling](./http-handling.md)
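A minimal sketch of Bearer-token header construction, written as a plain function rather than an `HttpRequester` subclass so that it runs without the CDK installed. `build_auth_headers` is a hypothetical name, and omitting the header when no `api_key` is configured is an assumption, not confirmed behavior of `JinaAiHttpRequester`.

```python
from typing import Any, Dict, Mapping


def build_auth_headers(config: Mapping[str, Any]) -> Dict[str, str]:
    """Build request headers carrying the API key as a Bearer token."""
    api_key = config.get("api_key")
    if not api_key:
        # Assumption: requests without a key are sent with no auth header.
        return {}
    return {"Authorization": f"Bearer {api_key}"}


print(build_auth_headers({"api_key": "jina_your_api_key_here"}))
```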

### Data Streams

Two main data streams providing web content extraction and search functionality through Jina AI's intelligent reading services.

**Stream Types:**

- `reader`: Extracts content from specified URLs
- `search`: Performs web searches with optional content summarization

**Data Schema:**

- `title`: string - Content title
- `url`: string - Source URL
- `content`: string - Extracted/searched content
- `description`: string - Content description
- `links`: object - Optional links summary

[Data Streams](./data-streams.md)
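To illustrate the schema, here is a small sketch that projects an arbitrary payload onto the five documented fields. `to_content_record` is a hypothetical helper; defaulting missing strings to `""` and a missing `links` value to an empty object are assumptions, not documented connector behavior.

```python
from typing import Any, Dict, Mapping

# The four string fields listed in the data schema.
STRING_FIELDS = ("title", "url", "content", "description")


def to_content_record(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only the documented fields, filling gaps with empty defaults."""
    record: Dict[str, Any] = {
        field: str(raw.get(field) or "") for field in STRING_FIELDS
    }
    # `links` is optional, so fall back to an empty object when absent.
    record["links"] = dict(raw.get("links") or {})
    return record


sample = {"title": "Example", "url": "https://example.com", "extra": 1}
print(to_content_record(sample))
```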

## Types

```python { .api }
# Configuration types
ConfigDict = Mapping[str, Any]

# Airbyte CDK stream types
StreamState = Optional[Mapping[str, Any]]  # Current state of data stream for incremental sync
StreamSlice = Optional[Mapping[str, Any]]  # Current slice being processed for parallel execution

# Airbyte CDK message types
MessageRepository = InMemoryMessageRepository  # Repository for storing connector messages

# Data record structure returned by both streams
class ContentRecord(TypedDict):
    title: str  # Content or search result title
    url: str  # Source URL of the content
    content: str  # Extracted text content
    description: str  # Brief description or summary
    links: Dict[str, Any]  # Optional links summary with dynamic properties

# Configuration specification
class ConfigSpec(TypedDict, total=False):
    api_key: str  # Optional Jina AI API key (marked as secret in manifest)
    read_prompt: str  # URL to read content from (default: "https://www.google.com")
    search_prompt: str  # URL-encoded search query (default: "Search%20airbyte")
    gather_links: bool  # Include links summary section (optional parameter)
    gather_images: bool  # Include images summary section (optional parameter)
```