# Airbyte Source Notion

A Python-based Airbyte source connector for the Notion API. It extracts data from Notion workspaces so that users can sync databases, pages, blocks, users, and comments to their preferred data destinations. The connector is built on Airbyte's declarative low-code CDK framework, with custom Python streams for operations that need custom logic.

## Package Information

- **Package Name**: airbyte-source-notion
- **Package Type**: pypi
- **Language**: Python
- **Installation**: Distributed as an Airbyte connector (typically not installed directly via pip)
- **Local Development**: Clone the Airbyte repository and navigate to `airbyte-integrations/connectors/source-notion/`
- **Python Version**: 3.9+

## Core Imports

```python
from source_notion import SourceNotion
from source_notion.run import run
```

To import individual stream classes and custom components:

```python
from source_notion.streams import (
    Pages, Blocks, NotionStream, IncrementalNotionStream,
    StateValueWrapper, NotionAvailabilityStrategy, MAX_BLOCK_DEPTH
)
from source_notion.components import (
    NotionUserTransformation,
    NotionPropertiesTransformation,
    NotionDataFeedFilter
)
```

## Basic Usage

### As Airbyte Connector (Command Line)

```bash
# Display connector specification
source-notion spec

# Test connection
source-notion check --config config.json

# Discover available streams
source-notion discover --config config.json

# Extract data
source-notion read --config config.json --catalog catalog.json
```
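
The `read` command writes Airbyte protocol messages to stdout, one JSON object per line. The sketch below groups `RECORD` payloads by stream and keeps the last `STATE` payload. It assumes the standard Airbyte message layout (`type`, `record.stream`, `record.data`, `state`), does not invoke the CLI itself, and `collect_messages` is a hypothetical helper name.

```python
import json
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Iterable, List, Optional, Tuple


def collect_messages(
    lines: Iterable[str],
) -> Tuple[Dict[str, List[Dict[str, Any]]], Optional[Dict[str, Any]]]:
    """Group RECORD payloads by stream name and return the last STATE payload seen.

    Assumes one JSON-encoded Airbyte message per line; blank or non-JSON lines are skipped.
    """
    records: DefaultDict[str, List[Dict[str, Any]]] = defaultdict(list)
    last_state: Optional[Dict[str, Any]] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a protocol message
        if not isinstance(message, dict):
            continue  # valid JSON, but not an Airbyte message object
        if message.get("type") == "RECORD":
            record = message["record"]
            records[record["stream"]].append(record["data"])
        elif message.get("type") == "STATE":
            last_state = message["state"]  # later STATE messages supersede earlier ones
    return dict(records), last_state


if __name__ == "__main__":
    sample = [
        '{"type": "LOG", "log": {"level": "INFO", "message": "Starting sync"}}',
        '{"type": "RECORD", "record": {"stream": "users", "data": {"id": "u1"}, "emitted_at": 0}}',
    ]
    by_stream, state = collect_messages(sample)
    print(by_stream)  # {'users': [{'id': 'u1'}]}
    print(state)      # None
```

To feed it real output, one option is `subprocess.run(["source-notion", "read", "--config", "config.json", "--catalog", "catalog.json"], capture_output=True, text=True).stdout.splitlines()`.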
### As Python Library

```python
import logging

from source_notion import SourceNotion

# Initialize the connector and a logger (check() expects a logging.Logger)
logger = logging.getLogger("airbyte")
source = SourceNotion()

# Configuration with OAuth2.0
config = {
    "credentials": {
        "auth_type": "OAuth2.0",
        "client_id": "your_client_id",
        "client_secret": "your_client_secret",
        "access_token": "your_access_token"
    },
    "start_date": "2023-01-01T00:00:00.000Z"
}

# Get available streams
streams = source.streams(config)
print([stream.name for stream in streams])

# Check connection
connection_status = source.check(logger, config)
print(connection_status.status)
```

## Architecture

The connector uses Airbyte's hybrid architecture, combining:

- **Declarative YAML Configuration**: Standard streams (users, databases, comments) defined in `manifest.yaml`
- **Python Streams**: Streams that require custom logic (pages, blocks)
- **Authentication Layer**: Supports both OAuth2.0 and token-based authentication
- **Incremental Sync**: Tracks a cursor value in stream state between syncs, while API results are fetched with Notion's cursor-based pagination
- **Error Handling**: Custom retry logic for Notion API rate limits and errors

Key components:

- **SourceNotion**: Main connector class extending `YamlDeclarativeSource`
- **Stream Classes**: Custom stream implementations for Notion API specifics
- **Transformations**: Data processing for Notion-specific response formats
- **Filters**: Custom filtering for incremental sync optimization

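
Notion's list endpoints page their results: a request passes `start_cursor` (and a `page_size` of at most 100), and each response carries `results`, `has_more`, and `next_cursor`. The loop below is a minimal offline sketch of that pattern. `paginate` and `fake_fetch_factory` are hypothetical names, and the fake fetcher's offset-based cursors stand in for Notion's opaque cursor strings.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# A page fetcher takes (start_cursor, page_size) and returns one Notion-style list
# response with "results", "has_more", and "next_cursor" keys.
FetchPage = Callable[[Optional[str], int], Dict[str, Any]]


def paginate(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield results from every page, passing next_cursor back as start_cursor."""
    cursor: Optional[str] = None
    while True:
        response = fetch_page(cursor, page_size)
        yield from response.get("results", [])
        cursor = response.get("next_cursor")
        # Stop when the API reports no more pages, or when no cursor came back
        # (guards against looping forever on a malformed response).
        if not response.get("has_more") or cursor is None:
            return


def fake_fetch_factory(items: List[Dict[str, Any]]) -> FetchPage:
    """Hypothetical in-memory stand-in for a Notion list endpoint."""

    def fetch(start_cursor: Optional[str], page_size: int) -> Dict[str, Any]:
        start = int(start_cursor) if start_cursor else 0
        end = start + page_size
        more = end < len(items)
        return {"results": items[start:end], "has_more": more, "next_cursor": str(end) if more else None}

    return fetch


if __name__ == "__main__":
    pages = [{"id": i} for i in range(5)]
    print([r["id"] for r in paginate(fake_fetch_factory(pages), page_size=2)])  # [0, 1, 2, 3, 4]
```

Injecting the fetcher keeps the paging logic independent of HTTP, so it can be exercised without credentials.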
## Capabilities

### Connector Initialization and Configuration

Core functionality for setting up and configuring the Notion source connector with authentication and stream management.

```python { .api }
class SourceNotion(YamlDeclarativeSource):
    def __init__(self): ...
    def streams(self, config: Mapping[str, Any]) -> List[Stream]: ...
    def _get_authenticator(self, config: Mapping[str, Any]) -> TokenAuthenticator: ...

def run(): ...
```

[Connector Setup](./connector-setup.md)

### Data Stream Management

Base classes and functionality for managing Notion data streams with pagination, error handling, and incremental sync capabilities.

```python { .api }
class NotionStream(HttpStream, ABC):
    url_base: str
    primary_key: str
    page_size: int
    def backoff_time(self, response: requests.Response) -> Optional[float]: ...
    def should_retry(self, response: requests.Response) -> bool: ...

class IncrementalNotionStream(NotionStream, CheckpointMixin, ABC):
    cursor_field: str
    def read_records(self, sync_mode: SyncMode, stream_state: Mapping[str, Any] = None, **kwargs) -> Iterable[Mapping[str, Any]]: ...
```

[Stream Management](./stream-management.md)
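
`IncrementalNotionStream` declares a `cursor_field` and reads with a `stream_state`. The function below is a standalone sketch of that idea, not the connector's `read_records`. It assumes the cursor is `last_edited_time` and that the effective lower bound is the later of `start_date` and the saved state; `read_incremental` and `parse_ts` are hypothetical names.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List, Mapping, Optional, Tuple

CURSOR_FIELD = "last_edited_time"  # assumed cursor field for this sketch


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2023-01-01T00:00:00.000Z into an aware datetime."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def read_incremental(
    records: Iterable[Mapping[str, Any]],
    stream_state: Optional[Mapping[str, Any]],
    start_date: str,
) -> Tuple[List[Mapping[str, Any]], Dict[str, str]]:
    """Return records at or after the effective lower bound, plus the updated state."""
    lower_bound = parse_ts(start_date)
    saved = (stream_state or {}).get(CURSOR_FIELD)
    if saved:
        # The saved cursor wins whenever it is later than start_date.
        lower_bound = max(lower_bound, parse_ts(saved))

    emitted: List[Mapping[str, Any]] = []
    newest = lower_bound
    for record in records:
        edited = parse_ts(record[CURSOR_FIELD])
        # ">=" re-emits records sharing the saved timestamp rather than risk skipping them.
        if edited >= lower_bound:
            emitted.append(record)
            newest = max(newest, edited)
    return emitted, {CURSOR_FIELD: newest.isoformat()}
```

Comparing parsed datetimes, rather than raw strings, keeps the filter correct even if the stored state and the API use different ISO 8601 layouts (for example `+00:00` versus `.000Z`).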
### Data Extraction Streams

Specific stream implementations for extracting different types of data from Notion workspaces, including pages and nested block content.

```python { .api }
class Pages(IncrementalNotionStream):
    state_checkpoint_interval: int
    def __init__(self, **kwargs): ...

class Blocks(HttpSubStream, IncrementalNotionStream):
    block_id_stack: List[str]
    def stream_slices(self, sync_mode: SyncMode, cursor_field: List[str] = None, stream_state: Mapping[str, Any] = None) -> Iterable[Optional[Mapping[str, Any]]]: ...
    def read_records(self, **kwargs) -> Iterable[Mapping[str, Any]]: ...
```

[Data Streams](./data-streams.md)
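
The `blocks` stream walks nested block content below each page, bounded by `MAX_BLOCK_DEPTH`. The generator below sketches a bounded, stack-based walk, assuming a caller-supplied `get_children(block_id)` that returns one block's children (against the live API that would be the paginated `GET /v1/blocks/{block_id}/children` endpoint). `walk_blocks` and `DEPTH_LIMIT` are hypothetical, and the value 30 is illustrative rather than the connector's constant.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# Illustrative depth cap; the connector defines its own MAX_BLOCK_DEPTH.
DEPTH_LIMIT = 30

GetChildren = Callable[[str], List[Dict[str, Any]]]


def walk_blocks(
    root_id: str,
    get_children: GetChildren,
    depth_limit: int = DEPTH_LIMIT,
    skip_types: Iterable[str] = ("ai_block",),
) -> Iterator[Dict[str, Any]]:
    """Yield every block below root_id, bounded by depth_limit.

    An explicit LIFO stack of (block_id, depth) pairs replaces recursion, so very
    deep pages cannot hit Python's recursion limit.
    """
    skipped = set(skip_types)
    stack: List[Tuple[str, int]] = [(root_id, 0)]
    while stack:
        parent_id, depth = stack.pop()
        if depth >= depth_limit:
            continue  # too deep: do not fetch this block's children
        for block in get_children(parent_id):
            if block.get("type") in skipped:
                continue  # drop unsupported block types and their subtrees
            yield block
            if block.get("has_children"):
                stack.append((block["id"], depth + 1))
```

Each parent's children are emitted together, and subtrees are expanded afterwards in LIFO order, so output order differs from a strict pre-order listing of the page.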
### Data Transformations and Filtering

Custom components for transforming Notion API responses and filtering data for efficient incremental synchronization.

```python { .api }
class NotionUserTransformation(RecordTransformation):
    def transform(self, record: MutableMapping[str, Any], **kwargs) -> MutableMapping[str, Any]: ...

class NotionPropertiesTransformation(RecordTransformation):
    def transform(self, record: MutableMapping[str, Any], **kwargs) -> MutableMapping[str, Any]: ...

class NotionDataFeedFilter(RecordFilter):
    def filter_records(self, records: List[Mapping[str, Any]], stream_state: StreamState, stream_slice: Optional[StreamSlice] = None, **kwargs) -> List[Mapping[str, Any]]: ...
```

[Transformations](./transformations.md)
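
Both transformations reshape records before they are emitted. The functions below are standalone sketches, not the connector's classes, and rest on two assumptions: that `NotionPropertiesTransformation` turns the `properties` mapping into a list of `name`/`value` pairs, and that `NotionUserTransformation` moves a bot owner's type-keyed payload (for example `owner["workspace"]`) under a uniform `owner["info"]` key. `transform_properties` and `move_owner_info` are hypothetical names.

```python
from typing import Any, MutableMapping


def transform_properties(record: MutableMapping[str, Any]) -> MutableMapping[str, Any]:
    """Replace the properties mapping {name: value} with a list of {"name", "value"} items."""
    properties = record.get("properties") or {}
    record["properties"] = [{"name": name, "value": value} for name, value in properties.items()]
    return record


def move_owner_info(record: MutableMapping[str, Any]) -> MutableMapping[str, Any]:
    """Move bot.owner[<owner type>] under bot.owner["info"] so bot owners share one shape."""
    owner = (record.get("bot") or {}).get("owner")
    if owner:
        owner_type = owner.get("type")
        if owner_type and owner_type in owner:
            owner["info"] = owner.pop(owner_type)
    return record
```

A list of name/value pairs gives every page record the same schema no matter which property names a database defines, which suits destinations with fixed column layouts.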
## Configuration Schema

The connector accepts three configuration formats: OAuth2.0, an integration token, and a legacy top-level `access_token` kept for backward compatibility. Each example below also sets `start_date` as an ISO 8601 UTC timestamp.

### OAuth2.0 Authentication

```json
{
  "credentials": {
    "auth_type": "OAuth2.0",
    "client_id": "notion_client_id",
    "client_secret": "notion_client_secret",
    "access_token": "oauth_access_token"
  },
  "start_date": "2023-01-01T00:00:00.000Z"
}
```

### Token Authentication

```json
{
  "credentials": {
    "auth_type": "token",
    "token": "notion_integration_token"
  },
  "start_date": "2023-01-01T00:00:00.000Z"
}
```

### Legacy Format (Backward Compatibility)

```json
{
  "access_token": "notion_token",
  "start_date": "2023-01-01T00:00:00.000Z"
}
```
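
All three configuration shapes reduce to a single bearer token. The sketch below shows one way to resolve it, playing the role of `_get_authenticator` without reproducing it. `resolve_token` and `build_headers` are hypothetical, the precedence (nested `credentials` first, then the legacy top-level `access_token`) is an assumption, and the `Notion-Version` default is an illustrative API version string.

```python
from typing import Any, Dict, Mapping


def resolve_token(config: Mapping[str, Any]) -> str:
    """Return the bearer token for the OAuth2.0, token, or legacy config shape."""
    credentials = config.get("credentials")
    if credentials:
        auth_type = credentials.get("auth_type")
        if auth_type == "OAuth2.0":
            return credentials["access_token"]
        if auth_type == "token":
            return credentials["token"]
        raise ValueError(f"Unsupported auth_type: {auth_type!r}")
    if "access_token" in config:  # legacy top-level format
        return config["access_token"]
    raise ValueError("No Notion credentials found in config")


def build_headers(config: Mapping[str, Any], notion_version: str = "2022-06-28") -> Dict[str, str]:
    """Build request headers; the Notion-Version default here is illustrative."""
    return {
        "Authorization": f"Bearer {resolve_token(config)}",
        "Notion-Version": notion_version,
    }
```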
## Available Data Streams

The connector provides access to these Notion API resources:

1. **users** - Workspace users and bots (full refresh)
2. **databases** - Notion databases with metadata (incremental)
3. **pages** - Pages from databases and workspaces (incremental)
4. **blocks** - Block content with recursive hierarchy traversal (incremental)
5. **comments** - Comments on pages and databases (incremental)
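
The `read` command needs a configured catalog. The sketch below builds a minimal one in the Airbyte protocol's `ConfiguredAirbyteCatalog` JSON shape from the stream list above. It assumes full refresh stays available for the incremental streams and uses an empty `json_schema` placeholder; in practice, copying the `stream` objects from `discover` output is safer. `STREAM_MODES` and `build_catalog` are hypothetical names.

```python
import json
from typing import Any, Dict, List

# Sync modes taken from the stream list above.
STREAM_MODES: Dict[str, str] = {
    "users": "full_refresh",
    "databases": "incremental",
    "pages": "incremental",
    "blocks": "incremental",
    "comments": "incremental",
}


def build_catalog(selected: List[str]) -> Dict[str, List[Dict[str, Any]]]:
    """Build a minimal configured catalog for `source-notion read --catalog`."""
    streams: List[Dict[str, Any]] = []
    for name in selected:
        mode = STREAM_MODES[name]  # raises KeyError for unknown stream names
        incremental = mode == "incremental"
        streams.append(
            {
                "stream": {
                    "name": name,
                    "json_schema": {},  # placeholder; `discover` returns the real schema
                    "supported_sync_modes": ["full_refresh", "incremental"] if incremental else ["full_refresh"],
                },
                "sync_mode": mode,
                "destination_sync_mode": "append" if incremental else "overwrite",
            }
        )
    return {"streams": streams}


if __name__ == "__main__":
    # Save this output as catalog.json for the `read` command.
    print(json.dumps(build_catalog(["users", "pages"]), indent=2))
```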
## Error Handling

The connector handles these common Notion API error scenarios:

- **Rate Limiting**: Automatic backoff using `Retry-After` headers (Notion allows an average of about 3 requests per second)
- **Gateway Timeouts**: Reduces the page size after 504 responses
- **Permission Errors**: Clear error messages for 403/404 access issues
- **Invalid Cursors**: Graceful handling of pagination cursor errors
- **Unsupported Content**: Filtering of unsupported block types (`ai_block`)
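
The rate-limit and gateway-timeout behaviour above can be sketched as a small retry policy. This is an illustration, not the connector's `backoff_time`/`should_retry`: `RetryPolicy`, `FakeResponse`, the halving rule, the page-size floor of 10, and the 1-second fallback are all hypothetical choices. Only honouring `Retry-After` on 429 responses and shrinking the page size on 504 come from the list.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MIN_PAGE_SIZE = 10     # hypothetical floor for this sketch
DEFAULT_BACKOFF = 1.0  # hypothetical fallback delay in seconds


@dataclass
class FakeResponse:
    """Minimal stand-in for requests.Response (its headers are case-sensitive, unlike requests')."""
    status_code: int
    headers: Dict[str, str] = field(default_factory=dict)


class RetryPolicy:
    """Decide whether to retry, how long to wait, and the page size for the next request."""

    def __init__(self, page_size: int = 100) -> None:
        self.page_size = page_size

    def should_retry(self, response: FakeResponse) -> bool:
        # Retry rate limits and server-side errors; 403/404 permission errors are not retried.
        return response.status_code == 429 or response.status_code >= 500

    def backoff_time(self, response: FakeResponse) -> Optional[float]:
        if response.status_code == 429:
            retry_after = response.headers.get("Retry-After")
            return float(retry_after) if retry_after else DEFAULT_BACKOFF
        if response.status_code == 504:
            # Smaller pages give the API less work per request after a gateway timeout.
            self.page_size = max(MIN_PAGE_SIZE, self.page_size // 2)
        return None  # no explicit delay; the caller applies its default backoff
```

Returning `None` leaves the delay to the caller; in the Airbyte CDK that means its default exponential backoff.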
## Dependencies

- **airbyte-cdk**: Airbyte Connector Development Kit
- **pendulum**: Date/time manipulation
- **pydantic**: Data validation and serialization
- **requests**: HTTP client for API communication