# HTTP Request Handling

A custom HTTP requester that adds Bearer token authentication for API access to Jina AI services. `JinaAiHttpRequester` extends Airbyte's standard HTTP requester with the authentication and header management that Jina AI's API requires.

## Capabilities

### Custom HTTP Requester

Extends Airbyte's `HttpRequester` to provide custom authentication and header handling for Jina AI API integration.

```python { .api }
from dataclasses import dataclass
from typing import Mapping, Optional, Union

from airbyte_cdk.sources.declarative.requesters.http_requester import HttpRequester


@dataclass
class JinaAiHttpRequester(HttpRequester):
    """
    Custom HTTP requester for Jina AI Reader API integration.

    Extends Airbyte CDK's HttpRequester to provide Bearer token authentication
    and custom header management for Jina AI Reader and Search APIs.

    Attributes:
        request_headers (Optional[Union[str, Mapping[str, str]]]):
            Custom headers configuration for API requests
    """

    request_headers: Optional[Union[str, Mapping[str, str]]] = None
```

### Post-Initialization Setup

Handles setup of header interpolation after object initialization.

```python { .api }
def __post_init__(self, parameters: Mapping[str, Any]) -> None:
    """
    Post-initialization setup for header interpolation.

    Args:
        parameters (Mapping[str, Any]): Configuration parameters from the connector

    Initializes the headers interpolator that processes template variables
    in request headers, enabling dynamic header values based on configuration
    and runtime context.
    """
```

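
As a rough illustration of what header interpolation does, the sketch below resolves `{{ config['key'] }}` placeholders with a small regex-based helper. It is a simplified stand-in, not the Airbyte CDK's Jinja-based interpolator; `interpolate_headers` and `_PLACEHOLDER` are hypothetical names introduced here, and only plain config lookups are handled.

```python
import re
from typing import Any, Mapping

# Matches placeholders of the form {{ config['key'] }} (simplified, hypothetical helper).
_PLACEHOLDER = re.compile(r"\{\{\s*config\['([^']+)'\]\s*\}\}")


def interpolate_headers(
    templates: Mapping[str, str], config: Mapping[str, Any]
) -> dict:
    """Replace each {{ config['key'] }} placeholder with the config value as a string."""

    def substitute(match: re.Match) -> str:
        # Missing keys resolve to an empty string in this sketch.
        return str(config.get(match.group(1), ""))

    return {
        name: _PLACEHOLDER.sub(substitute, value)
        for name, value in templates.items()
    }


templates = {
    "Accept": "application/json",
    "X-With-Links-Summary": "{{ config['gather_links'] }}",
}
resolved = interpolate_headers(templates, {"gather_links": "true"})
print(resolved)
```

Headers without placeholders, such as `Accept`, pass through unchanged.
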
44
45
### Request Header Management
46
47
Builds and manages HTTP request headers including Bearer token authentication.
48
49
```python { .api }
50
def get_request_headers(
51
self,
52
*,
53
stream_state: Optional[StreamState] = None,
54
stream_slice: Optional[StreamSlice] = None,
55
next_page_token: Optional[Mapping[str, Any]] = None,
56
) -> Mapping[str, Any]:
57
"""
58
Generate HTTP request headers with Bearer token authentication.
59
60
Args:
61
stream_state (Optional[StreamState]): Current state of the data stream
62
stream_slice (Optional[StreamSlice]): Current slice being processed
63
next_page_token (Optional[Mapping[str, Any]]): Pagination token if applicable
64
65
Returns:
66
Mapping[str, Any]: Dictionary of HTTP headers including authentication
67
68
This method:
69
1. Evaluates header templates using the interpolator
70
2. Checks for api_key in configuration
71
3. Adds Bearer token authentication header if api_key is present
72
4. Returns complete header dictionary for API requests
73
74
The Bearer token is only added when api_key is configured, making
75
authentication optional for public API access.
76
"""
77
```
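
The four steps in the docstring above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the connector's implementation: `build_request_headers` is a hypothetical name, and the already-interpolated headers are passed in directly rather than produced by the CDK interpolator.

```python
from typing import Any, Mapping


def build_request_headers(
    base_headers: Mapping[str, str], config: Mapping[str, Any]
) -> dict:
    """Return request headers, adding a Bearer token only when api_key is set."""
    # Step 1 is assumed done: base_headers holds the evaluated header templates.
    headers = dict(base_headers)
    # Steps 2-3: look up api_key and add the Authorization header if it is present.
    api_key = config.get("api_key")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    # Step 4: return the complete header dictionary.
    return headers


with_key = build_request_headers(
    {"Accept": "application/json"}, {"api_key": "jina_abc123xyz"}
)
without_key = build_request_headers({"Accept": "application/json"}, {})
```

Copying `base_headers` before updating keeps the caller's mapping untouched, which avoids leaking the token into shared configuration.
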
## HTTP Request Configuration

The HTTP requester is configured through the `manifest.yaml` file and supports the following patterns:

### Authentication Headers

```python
# Automatic Bearer token authentication when api_key is configured
api_key = "jina_abc123xyz"  # placeholder value for illustration

headers = {
    "Authorization": f"Bearer {api_key}",  # Added automatically if api_key present
    "Accept": "application/json",          # Always included
    "X-With-Links-Summary": "true",        # Based on gather_links config
    "X-With-Images-Summary": "false",      # Based on gather_images config
}
```

### API Endpoints

The requester handles requests to two main Jina AI endpoints:

**Reader Stream:**
- Base URL: `https://r.jina.ai/{read_prompt}`
- Method: GET
- Purpose: Extract content from specified URLs

**Search Stream:**
- Base URL: `https://s.jina.ai/{search_prompt}`
- Method: GET
- Purpose: Perform web searches with content extraction
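
To make the two URL patterns concrete, here is a small sketch that builds each URL from its prompt. The helper names `reader_url` and `search_url` are hypothetical, and percent-encoding the search query is an assumption of this sketch rather than documented connector behavior.

```python
from urllib.parse import quote

# Base URLs as listed above.
READER_BASE = "https://r.jina.ai/"
SEARCH_BASE = "https://s.jina.ai/"


def reader_url(read_prompt: str) -> str:
    """Append the target URL to the Reader base URL."""
    return f"{READER_BASE}{read_prompt}"


def search_url(search_prompt: str) -> str:
    """Append the search query to the Search base URL.

    Percent-encoding the query is an assumption of this sketch.
    """
    return f"{SEARCH_BASE}{quote(search_prompt, safe='')}"


print(reader_url("https://example.com"))
print(search_url("airbyte connectors"))
```
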

### Request Headers Configuration

Headers are configured through template interpolation, which supports references to configuration values:

```yaml
request_headers:
  Accept: application/json
  X-With-Links-Summary: "{{ config['gather_links'] }}"
  X-With-Images-Summary: "{{ config['gather_images'] }}"
```

## Integration with Airbyte CDK

### HttpRequester Inheritance

The custom requester inherits from Airbyte CDK's HttpRequester:

- **Base Functionality**: Standard HTTP request handling, retries, error handling
- **Custom Extensions**: Bearer token authentication, header interpolation
- **Template Support**: Dynamic header values based on configuration
- **Stream Context**: Access to stream state and pagination context

### Declarative Configuration

Configured through manifest.yaml as a custom requester:

```yaml
requester:
  type: CustomRequester
  class_name: source_jina_ai_reader.components.JinaAiHttpRequester
  url_base: "https://r.jina.ai/{{ config['read_prompt'] }}"
  http_method: "GET"
  path: "/"
  authenticator:
    type: NoAuth  # Authentication handled by custom requester
```

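
The `class_name` value is a dotted path to a Python class. As a general illustration of how such a path can be resolved, the sketch below uses the standard library's `importlib`. The CDK's own loading logic is not shown in this document and may differ; `resolve_class` is a hypothetical helper, and a standard-library class stands in for the connector class so the example runs without the connector installed.

```python
import importlib


def resolve_class(dotted_path: str) -> type:
    """Import the module part of a dotted path and return the named class."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for "source_jina_ai_reader.components.JinaAiHttpRequester".
cls = resolve_class("collections.OrderedDict")
print(cls.__name__)
```
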

## Usage Examples

### With API Key Authentication

```python
# Configuration with API key
config = {
    "api_key": "jina_abc123xyz",
    "read_prompt": "https://example.com",
    "gather_links": True,
    "gather_images": False
}

# Results in headers:
# {
#     "Authorization": "Bearer jina_abc123xyz",
#     "Accept": "application/json",
#     "X-With-Links-Summary": "true",
#     "X-With-Images-Summary": "false"
# }
```

### Without API Key (Public Access)

```python
# Configuration without API key
config = {
    "read_prompt": "https://example.com",
    "gather_links": False,
    "gather_images": True
}

# Results in headers:
# {
#     "Accept": "application/json",
#     "X-With-Links-Summary": "false",
#     "X-With-Images-Summary": "true"
# }
# No Authorization header added
```

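
To show how these two configurations translate into an outgoing request, the sketch below builds a standard-library `urllib.request.Request` object for each case without sending it. The connector itself sends requests through the Airbyte CDK, so this is only an illustration; `build_reader_request` is a hypothetical helper.

```python
from typing import Any, Mapping
from urllib.request import Request


def build_reader_request(config: Mapping[str, Any]) -> Request:
    """Build (but do not send) a GET request for the Reader endpoint."""
    headers = {"Accept": "application/json"}
    api_key = config.get("api_key")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    url = f"https://r.jina.ai/{config['read_prompt']}"
    return Request(url, headers=headers, method="GET")


with_key = build_reader_request(
    {"api_key": "jina_abc123xyz", "read_prompt": "https://example.com"}
)
without_key = build_reader_request({"read_prompt": "https://example.com"})

print(with_key.get_header("Authorization"))
print(without_key.has_header("Authorization"))
```

Constructing a `Request` performs no network I/O, so the headers can be inspected safely before anything is sent.
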

## Error Handling

- **Header Validation**: Ensures headers are properly formatted dictionaries
- **Authentication**: Omits the Authorization header when `api_key` is missing
- **Template Processing**: Interpolates configuration values into header templates when headers are built
- **HTTP Errors**: Inherits standard Airbyte CDK error handling and retry logic
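
The header validation point can be sketched as a small guard. Falling back to an empty dictionary when the evaluated headers are not a mapping is an assumption of this sketch, and `validated_headers` is a hypothetical name.

```python
from typing import Any, Mapping


def validated_headers(evaluated: Any) -> dict:
    """Return a plain dict of headers, or an empty dict if the input is not a mapping."""
    if isinstance(evaluated, Mapping):
        return dict(evaluated)
    # Fallback chosen for this sketch: ignore anything that is not a mapping.
    return {}


print(validated_headers({"Accept": "application/json"}))
print(validated_headers("Accept: application/json"))
```
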