tessl/pypi-airbyte-source-notion

Airbyte source connector for extracting data from Notion workspaces with OAuth2.0 and token authentication support.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/airbyte-source-notion@3.0.x

To install, run `npx @tessl/cli install tessl/pypi-airbyte-source-notion@3.0.0`.

# Airbyte Source Notion

A Python-based Airbyte source connector for integrating with the Notion API. This connector enables data extraction from Notion workspaces, allowing users to sync databases, pages, blocks, users, and comments to their preferred data destinations. It is built using Airbyte's declarative low-code CDK framework, with custom Python streams for complex operations.

## Package Information

- **Package Name**: airbyte-source-notion
- **Package Type**: pypi
- **Language**: Python
- **Installation**: Available as an Airbyte connector (typically not installed directly via pip)
- **Local Development**: Clone the Airbyte repository and navigate to `airbyte-integrations/connectors/source-notion/`
- **Python Version**: 3.9+

## Core Imports

```python
from source_notion import SourceNotion
from source_notion.run import run
```

For accessing individual stream classes:

```python
from source_notion.streams import (
    Pages, Blocks, NotionStream, IncrementalNotionStream,
    StateValueWrapper, NotionAvailabilityStrategy, MAX_BLOCK_DEPTH
)
from source_notion.components import (
    NotionUserTransformation,
    NotionPropertiesTransformation,
    NotionDataFeedFilter
)
```

## Basic Usage

### As Airbyte Connector (Command Line)

```bash
# Display connector specification
source-notion spec

# Test connection
source-notion check --config config.json

# Discover available streams
source-notion discover --config config.json

# Extract data
source-notion read --config config.json --catalog catalog.json
```
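
The `check`, `discover`, and `read` commands above all take a `config.json`. As a minimal sketch, assuming the token-based configuration shape described in this document, the file could be generated as follows. The helper name `build_config` and the placeholder token are illustrative and not part of the connector.

```python
import json
from pathlib import Path


def build_config(token: str, start_date: str) -> str:
    """Return config.json content for token authentication (hypothetical helper)."""
    config = {
        "credentials": {"auth_type": "token", "token": token},
        "start_date": start_date,
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder token: replace with a real Notion integration token.
    content = build_config("notion_integration_token", "2023-01-01T00:00:00.000Z")
    Path("config.json").write_text(content)
```

Keeping the JSON construction separate from the file write makes the configuration easy to inspect before any credentials touch disk.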

### As Python Library

```python
import logging

from source_notion import SourceNotion

# Logger passed to the connection check
logger = logging.getLogger("airbyte")

# Initialize the connector
source = SourceNotion()

# Configuration with OAuth2.0
config = {
    "credentials": {
        "auth_type": "OAuth2.0",
        "client_id": "your_client_id",
        "client_secret": "your_client_secret",
        "access_token": "your_access_token"
    },
    "start_date": "2023-01-01T00:00:00.000Z"
}

# Get available streams
streams = source.streams(config)

# Check connection
connection_status = source.check(logger, config)
```

## Architecture

The connector is built using Airbyte's hybrid architecture combining:

- **Declarative YAML Configuration**: For standard streams (users, databases, comments) using manifest.yaml
- **Python Streams**: For complex operations requiring custom logic (pages, blocks)
- **Authentication Layer**: Supports both OAuth2.0 and token-based authentication
- **Incremental Sync**: Uses cursor-based pagination with state management
- **Error Handling**: Custom retry logic for Notion API rate limits and errors

Key components:

- **SourceNotion**: Main connector class extending YamlDeclarativeSource
- **Stream Classes**: Custom stream implementations for Notion API specifics
- **Transformations**: Data processing for Notion-specific response formats
- **Filters**: Custom filtering for incremental sync optimization

## Capabilities

### Connector Initialization and Configuration

Core functionality for setting up and configuring the Notion source connector with authentication and stream management.

```python { .api }
class SourceNotion(YamlDeclarativeSource):
    def __init__(self): ...
    def streams(self, config: Mapping[str, Any]) -> List[Stream]: ...
    def _get_authenticator(self, config: Mapping[str, Any]) -> TokenAuthenticator: ...

def run(): ...
```

[Connector Setup](./connector-setup.md)

### Data Stream Management

Base classes and functionality for managing Notion data streams with pagination, error handling, and incremental sync capabilities.

```python { .api }
class NotionStream(HttpStream, ABC):
    url_base: str
    primary_key: str
    page_size: int
    def backoff_time(self, response: requests.Response) -> Optional[float]: ...
    def should_retry(self, response: requests.Response) -> bool: ...

class IncrementalNotionStream(NotionStream, CheckpointMixin, ABC):
    cursor_field: str
    def read_records(self, sync_mode: SyncMode, stream_state: Mapping[str, Any] = None, **kwargs) -> Iterable[Mapping[str, Any]]: ...
```

[Stream Management](./stream-management.md)
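
To make the roles of `should_retry` and `backoff_time` concrete, here is a standalone sketch of how such hooks might decide on a retry and honor a `retry-after` header. It is not the connector's implementation: the set of retryable status codes and the 5-second fallback are assumptions, and the functions take a status code and header mapping instead of a `requests.Response` so the example runs without dependencies.

```python
from typing import Mapping, Optional

# Assumption: status codes treated as retryable in this sketch.
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}
# Assumption: fallback wait when no usable retry-after value is present.
DEFAULT_BACKOFF_SECONDS = 5.0


def should_retry(status_code: int) -> bool:
    """Return True when the response status suggests a transient failure."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_time(status_code: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return seconds to wait for rate-limited responses, else None."""
    if status_code != 429:
        return None  # caller falls back to its default backoff strategy
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    try:
        return float(normalized.get("retry-after", DEFAULT_BACKOFF_SECONDS))
    except ValueError:
        # Non-numeric values (for example an HTTP date) use the fallback.
        return DEFAULT_BACKOFF_SECONDS
```

Returning `None` for non-429 responses leaves room for a generic exponential backoff elsewhere, which is a common split of responsibilities in HTTP stream classes.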

### Data Extraction Streams

Specific stream implementations for extracting different types of data from Notion workspaces, including pages and nested block content.

```python { .api }
class Pages(IncrementalNotionStream):
    state_checkpoint_interval: int
    def __init__(self, **kwargs): ...

class Blocks(HttpSubStream, IncrementalNotionStream):
    block_id_stack: List[str]
    def stream_slices(self, sync_mode: SyncMode, cursor_field: List[str] = None, stream_state: Mapping[str, Any] = None) -> Iterable[Optional[Mapping[str, Any]]]: ...
    def read_records(self, **kwargs) -> Iterable[Mapping[str, Any]]: ...
```

[Data Streams](./data-streams.md)
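
The idea behind nested block extraction can be illustrated with a depth-limited recursive walk. The sketch below is an assumption-laden illustration, not the `Blocks` implementation: `walk_blocks` and `fetch_children` are hypothetical names, the depth limit of 30 is an assumed value for `MAX_BLOCK_DEPTH`, and the children lookup is an in-memory stand-in for API calls. It relies on the `has_children` flag that Notion block objects carry.

```python
from typing import Any, Callable, Dict, Iterator, List

MAX_BLOCK_DEPTH = 30  # assumed value, for illustration only

Block = Dict[str, Any]


def walk_blocks(
    root_id: str,
    fetch_children: Callable[[str], List[Block]],
    max_depth: int = MAX_BLOCK_DEPTH,
) -> Iterator[Block]:
    """Yield blocks depth-first, stopping once max_depth levels are visited."""

    def _walk(block_id: str, depth: int) -> Iterator[Block]:
        if depth > max_depth:
            return
        for block in fetch_children(block_id):
            if block.get("type") == "ai_block":
                continue  # unsupported block type is skipped
            yield block
            if block.get("has_children"):
                yield from _walk(block["id"], depth + 1)

    yield from _walk(root_id, 1)


# Usage with an in-memory stand-in for the block children endpoint.
tree = {
    "page": [
        {"id": "a", "type": "paragraph", "has_children": True},
        {"id": "b", "type": "ai_block", "has_children": False},
    ],
    "a": [{"id": "c", "type": "to_do", "has_children": False}],
}
block_ids = [block["id"] for block in walk_blocks("page", lambda i: tree.get(i, []))]
```

A hard depth limit keeps a pathological or cyclic hierarchy from turning one page into an unbounded number of requests.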

### Data Transformations and Filtering

Custom components for transforming Notion API responses and filtering data for efficient incremental synchronization.

```python { .api }
class NotionUserTransformation(RecordTransformation):
    def transform(self, record: MutableMapping[str, Any], **kwargs) -> MutableMapping[str, Any]: ...

class NotionPropertiesTransformation(RecordTransformation):
    def transform(self, record: MutableMapping[str, Any], **kwargs) -> MutableMapping[str, Any]: ...

class NotionDataFeedFilter(RecordFilter):
    def filter_records(self, records: List[Mapping[str, Any]], stream_state: StreamState, stream_slice: Optional[StreamSlice] = None, **kwargs) -> List[Mapping[str, Any]]: ...
```

[Transformations](./transformations.md)
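
As a rough illustration of record filtering for incremental sync, the sketch below keeps only records whose cursor value is at or after the value stored in state. It is not the logic of `NotionDataFeedFilter`: the function names are hypothetical, and the cursor field `last_edited_time` is an assumption based on the timestamp Notion objects expose.

```python
from datetime import datetime
from typing import Any, List, Mapping


def parse_notion_time(value: str) -> datetime:
    """Parse a timestamp such as 2023-01-01T00:00:00.000Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_newer_records(
    records: List[Mapping[str, Any]],
    stream_state: Mapping[str, Any],
    cursor_field: str = "last_edited_time",  # assumed cursor field
) -> List[Mapping[str, Any]]:
    """Return records at or after the state cursor; all records if no state."""
    state_value = stream_state.get(cursor_field)
    if not state_value:
        return list(records)
    threshold = parse_notion_time(state_value)
    return [r for r in records if parse_notion_time(r[cursor_field]) >= threshold]
```

Comparing parsed datetimes instead of raw strings avoids surprises if two timestamps use different but equivalent formats.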

## Configuration Schema

The connector supports flexible authentication methods:

### OAuth2.0 Authentication

```json
{
  "credentials": {
    "auth_type": "OAuth2.0",
    "client_id": "notion_client_id",
    "client_secret": "notion_client_secret",
    "access_token": "oauth_access_token"
  },
  "start_date": "2023-01-01T00:00:00.000Z"
}
```

### Token Authentication

```json
{
  "credentials": {
    "auth_type": "token",
    "token": "notion_integration_token"
  },
  "start_date": "2023-01-01T00:00:00.000Z"
}
```

### Legacy Format (Backward Compatibility)

```json
{
  "access_token": "notion_token",
  "start_date": "2023-01-01T00:00:00.000Z"
}
```
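
The three configuration shapes above carry the API token in different places. A minimal sketch of resolving the token from any of them might look like the following; `extract_token` is a hypothetical helper written for illustration and may differ from how the connector's `_get_authenticator` resolves credentials.

```python
from typing import Any, Mapping


def extract_token(config: Mapping[str, Any]) -> str:
    """Return the bearer token from any of the three documented config shapes."""
    credentials = config.get("credentials")
    if credentials:
        auth_type = credentials.get("auth_type")
        if auth_type == "OAuth2.0":
            return credentials["access_token"]
        if auth_type == "token":
            return credentials["token"]
        raise ValueError(f"Unsupported auth_type: {auth_type!r}")
    # Legacy format: the token sits at the top level of the config.
    if "access_token" in config:
        return config["access_token"]
    raise ValueError("No Notion credentials found in config")
```

Checking the nested `credentials` object first means a config that carries both shapes resolves to the newer format.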

## Available Data Streams

The connector provides access to these Notion API resources:

1. **users** - Workspace users and bots (full refresh)
2. **databases** - Notion databases with metadata (incremental)
3. **pages** - Pages from databases and workspaces (incremental)
4. **blocks** - Block content with recursive hierarchy traversal (incremental)
5. **comments** - Comments on pages and databases (incremental)
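
The `read` command takes a configured catalog that selects streams and sync modes. The sketch below builds one for the streams listed above, following the general shape of an Airbyte configured catalog. Several parts are assumptions for illustration: the `build_catalog` helper, the empty `json_schema` placeholder (in practice the schema comes from `discover` output), and the choice of destination sync modes.

```python
import json
from typing import Any, Dict, List

# Sync modes taken from the stream list above.
STREAM_SYNC_MODES = {
    "users": "full_refresh",
    "databases": "incremental",
    "pages": "incremental",
    "blocks": "incremental",
    "comments": "incremental",
}


def build_catalog(stream_names: List[str]) -> Dict[str, Any]:
    """Build a configured-catalog dictionary for the selected streams."""
    streams = []
    for name in stream_names:
        mode = STREAM_SYNC_MODES[name]
        incremental = mode == "incremental"
        streams.append({
            "stream": {
                "name": name,
                "json_schema": {},  # placeholder; normally copied from discover
                "supported_sync_modes": (
                    ["full_refresh", "incremental"] if incremental else ["full_refresh"]
                ),
            },
            "sync_mode": mode,
            # Assumption: append for incremental, overwrite for full refresh.
            "destination_sync_mode": "append" if incremental else "overwrite",
        })
    return {"streams": streams}


catalog_json = json.dumps(build_catalog(["users", "pages"]), indent=2)
```

Writing `catalog_json` to `catalog.json` would give the `read` command a catalog limited to the two selected streams.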

## Error Handling

The connector implements comprehensive error handling for common Notion API scenarios:

- **Rate Limiting**: Automatic backoff using retry-after headers (~3 req/sec limit)
- **Gateway Timeouts**: Page size throttling for 504 responses
- **Permission Errors**: Clear messaging for 403/404 access issues
- **Invalid Cursors**: Graceful handling of pagination cursor errors
- **Unsupported Content**: Filtering of unsupported block types (ai_block)
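
Page size throttling for gateway timeouts can be sketched as halving the requested page size after each 504 until the request succeeds or a floor is reached. The code below is an illustration under stated assumptions, not the connector's code: the starting size of 100, the floor of 10, the attempt limit, and both function names are hypothetical, and `fetch_page` stands in for an HTTP call that returns a status code and a list of records.

```python
from typing import Any, Callable, Dict, List, Tuple

DEFAULT_PAGE_SIZE = 100  # assumed starting page size
MIN_PAGE_SIZE = 10       # assumed lower bound

FetchPage = Callable[[int], Tuple[int, List[Dict[str, Any]]]]


def throttled_page_size(current: int) -> int:
    """Halve the page size, never going below the minimum."""
    return max(current // 2, MIN_PAGE_SIZE)


def fetch_with_throttling(
    fetch_page: FetchPage,
    page_size: int = DEFAULT_PAGE_SIZE,
    max_attempts: int = 5,
) -> Tuple[int, List[Dict[str, Any]]]:
    """Retry a page request with smaller pages while it returns 504."""
    for _ in range(max_attempts):
        status, records = fetch_page(page_size)
        if status != 504:
            # Other error statuses are out of scope for this sketch.
            return page_size, records
        page_size = throttled_page_size(page_size)
    raise RuntimeError("Gateway timeouts persisted after throttling page size")


def fake_fetch(size: int) -> Tuple[int, List[Dict[str, Any]]]:
    """Stand-in endpoint that times out for pages larger than 25 records."""
    if size > 25:
        return 504, []
    return 200, [{"id": i} for i in range(size)]


final_size, records = fetch_with_throttling(fake_fetch)
```

With the stand-in endpoint, the request is attempted at 100, 50, and then 25 records, where it succeeds.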

## Dependencies

- **airbyte-cdk**: Airbyte Connector Development Kit
- **pendulum**: Date/time manipulation
- **pydantic**: Data validation and serialization
- **requests**: HTTP client for API communication