tessl/pypi-airbyte-source-notion

Airbyte source connector for extracting data from Notion workspaces with OAuth2.0 and token authentication support.

Workspace: tessl · Visibility: Public
Describes: pkg:pypi/airbyte-source-notion@3.0.x

To install, run

npx @tessl/cli install tessl/pypi-airbyte-source-notion@3.0.0


Airbyte Source Notion

A Python-based Airbyte source connector for integrating with the Notion API. This connector enables data extraction from Notion workspaces, allowing users to sync databases, pages, blocks, users, and comments to their preferred data destinations. Built using Airbyte's declarative low-code CDK framework with custom Python streams for complex operations.

Package Information

  • Package Name: airbyte-source-notion
  • Package Type: pypi
  • Language: Python
  • Installation: Available as Airbyte connector (typically not installed directly via pip)
  • Local Development: Clone Airbyte repository and navigate to airbyte-integrations/connectors/source-notion/
  • Python Version: 3.9+

Core Imports

from source_notion import SourceNotion
from source_notion.run import run

For accessing individual stream classes:

from source_notion.streams import (
    Pages, Blocks, NotionStream, IncrementalNotionStream,
    StateValueWrapper, NotionAvailabilityStrategy, MAX_BLOCK_DEPTH
)
from source_notion.components import (
    NotionUserTransformation,
    NotionPropertiesTransformation, 
    NotionDataFeedFilter
)

Basic Usage

As Airbyte Connector (Command Line)

# Display connector specification
source-notion spec

# Test connection
source-notion check --config config.json

# Discover available streams
source-notion discover --config config.json

# Extract data
source-notion read --config config.json --catalog catalog.json

As Python Library

import logging

from source_notion import SourceNotion

# Initialize the connector
source = SourceNotion()

# Configuration with OAuth2.0
config = {
    "credentials": {
        "auth_type": "OAuth2.0",
        "client_id": "your_client_id",
        "client_secret": "your_client_secret",
        "access_token": "your_access_token"
    },
    "start_date": "2023-01-01T00:00:00.000Z"
}

# Get available streams
streams = source.streams(config)

# Check connection (check() expects a logger instance)
logger = logging.getLogger("airbyte")
connection_status = source.check(logger, config)

Architecture

The connector is built using Airbyte's hybrid architecture combining:

  • Declarative YAML Configuration: For standard streams (users, databases, comments) using manifest.yaml
  • Python Streams: For complex operations requiring custom logic (pages, blocks)
  • Authentication Layer: Supports both OAuth2.0 and token-based authentication
  • Incremental Sync: Uses cursor-based pagination with state management
  • Error Handling: Custom retry logic for Notion API rate limits and errors

Key components:

  • SourceNotion: Main connector class extending YamlDeclarativeSource
  • Stream Classes: Custom stream implementations for Notion API specifics
  • Transformations: Data processing for Notion-specific response formats
  • Filters: Custom filtering for incremental sync optimization

Capabilities

Connector Initialization and Configuration

Core functionality for setting up and configuring the Notion source connector with authentication and stream management.

class SourceNotion(YamlDeclarativeSource):
    def __init__(self): ...
    def streams(self, config: Mapping[str, Any]) -> List[Stream]: ...
    def _get_authenticator(self, config: Mapping[str, Any]) -> TokenAuthenticator: ...

def run(): ...

Connector Setup

Data Stream Management

Base classes and functionality for managing Notion data streams with pagination, error handling, and incremental sync capabilities.

class NotionStream(HttpStream, ABC):
    url_base: str
    primary_key: str
    page_size: int
    def backoff_time(self, response: requests.Response) -> Optional[float]: ...
    def should_retry(self, response: requests.Response) -> bool: ...
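The `page_size` and pagination behavior above follow the Notion API's cursor scheme: each list response carries `has_more` and `next_cursor` fields. A minimal sketch of that loop, independent of the connector (`fetch_page` is a hypothetical callable that issues one API request):

```python
def paginate(fetch_page):
    # Notion list endpoints return {"results": [...], "has_more": bool,
    # "next_cursor": str | None}; the next request passes next_cursor
    # back as start_cursor.
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["results"]
        if not page.get("has_more"):
            return
        cursor = page["next_cursor"]
```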

class IncrementalNotionStream(NotionStream, CheckpointMixin, ABC):
    cursor_field: str
    def read_records(self, sync_mode: SyncMode, stream_state: Mapping[str, Any] = None, **kwargs) -> Iterable[Mapping[str, Any]]: ...
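IncrementalNotionStream checkpoints on a cursor field; Notion records carry an ISO-8601 `last_edited_time` timestamp that serves this purpose. A hedged sketch of how such cursor state advances as records are read (`update_state` is illustrative, not connector API):

```python
def update_state(state, record, cursor_field="last_edited_time"):
    # Keep the maximum cursor value seen so far; ISO-8601 timestamps
    # compare correctly as strings, so max() is sufficient here.
    latest = max(state.get(cursor_field, ""), record.get(cursor_field, ""))
    return {cursor_field: latest}
```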

Stream Management

Data Extraction Streams

Specific stream implementations for extracting different types of data from Notion workspaces, including pages and nested block content.

class Pages(IncrementalNotionStream):
    state_checkpoint_interval: int
    def __init__(self, **kwargs): ...

class Blocks(HttpSubStream, IncrementalNotionStream):
    block_id_stack: List[str]
    def stream_slices(self, sync_mode: SyncMode, cursor_field: List[str] = None, stream_state: Mapping[str, Any] = None) -> Iterable[Optional[Mapping[str, Any]]]: ...
    def read_records(self, **kwargs) -> Iterable[Mapping[str, Any]]: ...
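The Blocks stream walks block trees recursively (the package exports a MAX_BLOCK_DEPTH cap, seen in the Core Imports above). A simplified, self-contained sketch of that traversal; `walk_blocks` and the fixed depth value are assumptions for illustration, not connector API:

```python
MAX_BLOCK_DEPTH = 30  # assumption: the connector caps recursion at some depth


def walk_blocks(block, fetch_children, depth=0):
    # Depth-first traversal: yield the block itself, then recurse into
    # children when the Notion API flags it with has_children=True.
    if depth >= MAX_BLOCK_DEPTH:
        return
    yield block
    if block.get("has_children"):
        for child in fetch_children(block["id"]):
            yield from walk_blocks(child, fetch_children, depth + 1)
```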

Data Streams

Data Transformations and Filtering

Custom components for transforming Notion API responses and filtering data for efficient incremental synchronization.

class NotionUserTransformation(RecordTransformation):
    def transform(self, record: MutableMapping[str, Any], **kwargs) -> MutableMapping[str, Any]: ...

class NotionPropertiesTransformation(RecordTransformation):
    def transform(self, record: MutableMapping[str, Any], **kwargs) -> MutableMapping[str, Any]: ...

class NotionDataFeedFilter(RecordFilter):
    def filter_records(self, records: List[Mapping[str, Any]], stream_state: StreamState, stream_slice: Optional[StreamSlice] = None, **kwargs) -> List[Mapping[str, Any]]: ...
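To make the transformation idea concrete: Notion returns page properties as a name-keyed mapping, which is awkward for fixed-schema destinations. A sketch in the spirit of NotionPropertiesTransformation, reshaping that mapping into a list of name/value entries (`flatten_properties` and the exact output shape are assumptions, not the connector's actual implementation):

```python
def flatten_properties(record):
    # Convert {"Name": {...}, "Status": {...}} into
    # [{"name": "Name", "value": {...}}, ...] so downstream schemas
    # do not depend on user-defined property names as keys.
    props = record.get("properties", {})
    record["properties"] = [{"name": k, "value": v} for k, v in props.items()]
    return record
```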

Transformations

Configuration Schema

The connector supports flexible authentication methods:

OAuth2.0 Authentication

{
  "credentials": {
    "auth_type": "OAuth2.0",
    "client_id": "notion_client_id",
    "client_secret": "notion_client_secret",
    "access_token": "oauth_access_token"
  },
  "start_date": "2023-01-01T00:00:00.000Z"
}

Token Authentication

{
  "credentials": {
    "auth_type": "token", 
    "token": "notion_integration_token"
  },
  "start_date": "2023-01-01T00:00:00.000Z"
}

Legacy Format (Backward Compatibility)

{
  "access_token": "notion_token",
  "start_date": "2023-01-01T00:00:00.000Z"
}
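All three formats ultimately resolve to a single bearer token, which `_get_authenticator` handles internally. An illustrative reduction of that logic (`extract_token` is not part of the connector's public API):

```python
def extract_token(config):
    creds = config.get("credentials", {})
    if creds.get("auth_type") == "OAuth2.0":
        return creds["access_token"]
    if creds.get("auth_type") == "token":
        return creds["token"]
    # Legacy flat format: the token lives at the top level of the config.
    return config.get("access_token")
```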

Available Data Streams

The connector provides access to these Notion API resources:

  1. users - Workspace users and bots (full refresh)
  2. databases - Notion databases with metadata (incremental)
  3. pages - Pages from databases and workspaces (incremental)
  4. blocks - Block content with recursive hierarchy traversal (incremental)
  5. comments - Comments on pages and databases (incremental)
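When running `read`, the configured catalog selects which of these streams to sync and in what mode. A minimal catalog.json sketch following the Airbyte protocol (the json_schema is abbreviated here):

```json
{
  "streams": [
    {
      "stream": {
        "name": "pages",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh", "incremental"]
      },
      "sync_mode": "incremental",
      "destination_sync_mode": "append"
    }
  ]
}
```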

Error Handling

The connector implements comprehensive error handling for common Notion API scenarios:

  • Rate Limiting: Automatic backoff using retry-after headers (~3 req/sec limit)
  • Gateway Timeouts: Page size throttling for 504 responses
  • Permission Errors: Clear messaging for 403/404 access issues
  • Invalid Cursors: Graceful handling of pagination cursor errors
  • Unsupported Content: Filtering of unsupported block types (ai_block)
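The rate-limit handling above can be sketched as a backoff policy keyed on status code and the Retry-After header. This is a simplified illustration of the idea behind `NotionStream.backoff_time`, not the connector's exact implementation:

```python
def backoff_time(status_code, headers):
    # 429: honor Notion's Retry-After header (seconds), defaulting to 5s.
    if status_code == 429:
        return float(headers.get("retry-after", 5))
    # 504: wait before retrying (the connector may also shrink page_size).
    if status_code == 504:
        return 30.0
    return None  # other statuses: no automatic retry delay
```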

Dependencies

  • airbyte-cdk: Airbyte Connector Development Kit
  • pendulum: Date/time manipulation
  • pydantic: Data validation and serialization
  • requests: HTTP client for API communication