Airbyte source connector for extracting data from Microsoft OneDrive cloud storage with OAuth authentication and file-based streaming capabilities.
npx @tessl/cli install tessl/pypi-source-microsoft-onedrive@0.2.00
# Microsoft OneDrive Source Connector

An Airbyte source connector that extracts and synchronizes data from Microsoft OneDrive cloud storage. It is built on the Airbyte CDK file-based framework and provides OAuth 2.0 authentication, automated file discovery, and configuration management for data integration workflows.

## Package Information

- **Package Name**: source-microsoft-onedrive
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install source-microsoft-onedrive`
- **Version**: 0.2.44

## Core Imports

```python
from source_microsoft_onedrive import SourceMicrosoftOneDrive
```

For CLI usage:

```python
from source_microsoft_onedrive.run import run
```

Internal imports (for advanced usage):

```python
from source_microsoft_onedrive.spec import SourceMicrosoftOneDriveSpec
from source_microsoft_onedrive.stream_reader import SourceMicrosoftOneDriveStreamReader, SourceMicrosoftOneDriveClient
```

## Basic Usage

### As Airbyte Source Connector

```python
from source_microsoft_onedrive import SourceMicrosoftOneDrive
from airbyte_cdk import launch

# Configuration with OAuth credentials
config = {
    "credentials": {
        "auth_type": "Client",
        "tenant_id": "your-tenant-id",
        "client_id": "your-client-id",
        "client_secret": "your-client-secret",
        "refresh_token": "your-refresh-token"
    },
    "drive_name": "OneDrive",
    "search_scope": "ALL",
    "folder_path": ".",
    "streams": [{
        "name": "files",
        "globs": ["*.csv", "*.json"],
        "validation_policy": "Emit Record",
        "format": {"filetype": "csv"}
    }]
}

# Initialize and run connector
source = SourceMicrosoftOneDrive(None, config, None)
launch(source, ["read", "--config", "config.json", "--catalog", "catalog.json"])
```

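The `launch` call in the example is given file paths rather than the in-memory dictionary, so the configuration presumably also needs to exist on disk as `config.json`. The following is a minimal sketch of writing it out with the standard library only; `write_config` is a hypothetical helper and not part of the connector, and the placeholder values are the ones used in the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Mapping


def write_config(config: Mapping[str, Any], path: Path) -> Path:
    """Serialize a connector configuration to a JSON file (hypothetical helper)."""
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


# Placeholder values copied from the usage example; replace with real credentials.
example_config = {
    "credentials": {
        "auth_type": "Client",
        "tenant_id": "your-tenant-id",
        "client_id": "your-client-id",
        "client_secret": "your-client-secret",
        "refresh_token": "your-refresh-token",
    },
    "drive_name": "OneDrive",
    "search_scope": "ALL",
    "folder_path": ".",
    "streams": [{
        "name": "files",
        "globs": ["*.csv", "*.json"],
        "validation_policy": "Emit Record",
        "format": {"filetype": "csv"},
    }],
}

# A temporary directory keeps the sketch from touching the working tree.
config_path = write_config(example_config, Path(tempfile.mkdtemp()) / "config.json")
```

Because the file contains a client secret and a refresh token, it is worth keeping it out of version control.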
### CLI Usage

```bash
# Install via poetry
poetry install

# Run connector commands
source-microsoft-onedrive spec
source-microsoft-onedrive check --config config.json
source-microsoft-onedrive discover --config config.json
source-microsoft-onedrive read --config config.json --catalog catalog.json
```

## Architecture

The connector is built on Airbyte's file-based framework with these key components:

- **SourceMicrosoftOneDrive**: Main connector class extending FileBasedSource
- **SourceMicrosoftOneDriveStreamReader**: Handles file discovery and reading from OneDrive
- **SourceMicrosoftOneDriveClient**: Microsoft Graph API client with MSAL authentication
- **Configuration Models**: Pydantic models for OAuth and service authentication

The connector supports both OAuth (user delegation) and service principal authentication, can search across accessible drives and shared items, handles nested folder structures, and integrates with smart-open for efficient file reading across various formats.

## Capabilities

### Source Connector

Core Airbyte source connector functionality including specification generation, configuration validation, stream discovery, and data reading with OAuth authentication support.

```python { .api }
class SourceMicrosoftOneDrive(FileBasedSource):
    def __init__(self, catalog: Optional[ConfiguredAirbyteCatalog], config: Optional[Mapping[str, Any]], state: Optional[TState]): ...
    def spec(self, *args: Any, **kwargs: Any) -> ConnectorSpecification: ...
```

[Source Connector](./source-connector.md)

### Configuration Management

Comprehensive configuration models supporting OAuth and service authentication with validation, schema generation, and documentation URL management.

```python { .api }
class SourceMicrosoftOneDriveSpec(AbstractFileBasedSpec, BaseModel):
    credentials: Union[OAuthCredentials, ServiceCredentials]
    drive_name: Optional[str]
    search_scope: str
    folder_path: str

    @classmethod
    def documentation_url(cls) -> str: ...
    @classmethod
    def schema(cls, *args: Any, **kwargs: Any) -> Dict[str, Any]: ...
```

[Configuration](./configuration.md)

### File Operations

File discovery, enumeration, and reading capabilities across OneDrive drives and shared items with glob pattern matching and metadata extraction.

```python { .api }
class SourceMicrosoftOneDriveStreamReader(AbstractFileBasedStreamReader):
    def get_matching_files(self, globs: List[str], prefix: Optional[str], logger: logging.Logger) -> Iterable[RemoteFile]: ...
    def open_file(self, file: RemoteFile, mode: FileReadMode, encoding: Optional[str], logger: logging.Logger) -> IOBase: ...
    def get_all_files(self): ...
```

[File Operations](./file-operations.md)

### Authentication

Microsoft Graph API authentication using MSAL with support for OAuth refresh tokens and service principal credentials.

```python { .api }
class SourceMicrosoftOneDriveClient:
    def __init__(self, config: SourceMicrosoftOneDriveSpec): ...
    @property
    def client(self): ...
    def _get_access_token(self): ...
```

[Authentication](./authentication.md)

## CLI Entry Points

```python { .api }
def run():
    """Main CLI entry point that processes command-line arguments and launches the connector."""
```

## Types

```python { .api }
from typing import Any, Dict, List, Mapping, Optional, Union, Iterable
from datetime import datetime
from io import IOBase

# Airbyte CDK imports
from airbyte_cdk import ConfiguredAirbyteCatalog, ConnectorSpecification, TState
from airbyte_cdk.sources.file_based.file_based_source import FileBasedSource
from airbyte_cdk.sources.file_based.stream.cursor.default_file_based_cursor import DefaultFileBasedCursor
from airbyte_cdk.sources.file_based.file_based_stream_reader import AbstractFileBasedStreamReader, FileReadMode
from airbyte_cdk.sources.file_based.remote_file import RemoteFile
from airbyte_cdk.sources.file_based.config.abstract_file_based_spec import AbstractFileBasedSpec

# Pydantic for configuration models
from pydantic import BaseModel, Field

# Microsoft authentication
from msal import ConfidentialClientApplication
from office365.graph_client import GraphClient

# Additional imports for error handling and web requests
from airbyte_cdk import AirbyteTracedException, FailureType
import requests
import smart_open
import logging
```