
tessl/pypi-source-microsoft-onedrive

Airbyte source connector for extracting data from Microsoft OneDrive cloud storage with OAuth authentication and file-based streaming capabilities.


Authentication

Authenticates against the Microsoft Graph API using MSAL (Microsoft Authentication Library), with support for OAuth refresh tokens and service principal (client credentials) authentication. The client acquires access tokens for OneDrive resources and raises MsalServiceError when token acquisition fails.

Capabilities

Authentication Client

Main client class for handling Microsoft Graph API authentication and providing GraphClient instances.

class SourceMicrosoftOneDriveClient:
    def __init__(self, config: SourceMicrosoftOneDriveSpec):
        """
        Initialize the OneDrive authentication client.
        
        Parameters:
        - config: SourceMicrosoftOneDriveSpec - Configuration containing authentication credentials
        """
        
    @property
    def msal_app(self):
        """
        Returns a cached MSAL app instance for authentication.
        Uses @lru_cache for efficient reuse across operations.
        
        Returns:
        ConfidentialClientApplication: MSAL application configured with client credentials
        """
        
    @property  
    def client(self):
        """
        Initializes and returns a GraphClient instance for Microsoft Graph API operations.
        Creates the client on first access and reuses for subsequent calls.
        
        Returns:
        GraphClient: Configured Office365 GraphClient for OneDrive operations
        
        Raises:
        - ValueError: If configuration is missing or invalid
        
        Implementation:
        Checks if configuration exists, then lazily initializes _client with GraphClient
        passing the _get_access_token method as the token provider.
        """
        
    def _get_access_token(self):
        """
        Retrieves an access token for OneDrive access using configured authentication method.
        Handles both OAuth refresh token and service principal authentication flows.
        
        Returns:
        Dict: Token response containing access_token and metadata
        
        Raises:
        - MsalServiceError: If token acquisition fails
        """

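The lazy initialization described in the client docstring can be sketched without the real libraries. This is a minimal illustration rather than the connector's source: `FakeGraphClient` and `LazyClientHolder` are hypothetical stand-ins, and the token method returns a canned dictionary instead of calling MSAL.

```python
from typing import Callable, Dict, Optional


class FakeGraphClient:
    """Hypothetical stand-in for the Office365 GraphClient; it only stores the token provider."""

    def __init__(self, token_provider: Callable[[], Dict[str, str]]):
        self.token_provider = token_provider


class LazyClientHolder:
    """Illustrative holder that builds its client on first access and then reuses it."""

    def __init__(self, config: Optional[dict]):
        self.config = config
        self._client: Optional[FakeGraphClient] = None

    def _get_access_token(self) -> Dict[str, str]:
        # Placeholder: the real method would call MSAL here.
        return {"access_token": "placeholder-token", "token_type": "Bearer"}

    @property
    def client(self) -> FakeGraphClient:
        if not self.config:
            raise ValueError("Configuration is missing; cannot create the client.")
        if self._client is None:
            # Pass the bound method itself, so a token is fetched only when a request needs one.
            self._client = FakeGraphClient(self._get_access_token)
        return self._client


holder = LazyClientHolder({"credentials": {"auth_type": "Client"}})
first = holder.client
second = holder.client  # same object as `first`; no second client is built
```

Passing the method rather than a token value means the client can ask for a token each time it needs one, which is what lets refresh happen behind the scenes.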
MSAL Application Configuration

The MSAL application uses the same authority, client credential, and scope for both authentication methods:

# OAuth/Service Principal Configuration
authority = f"https://login.microsoftonline.com/{tenant_id}"
client_credential = client_secret
scope = ["https://graph.microsoft.com/.default"]
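A small helper can make these shared values explicit. This is only a sketch: `build_msal_settings` and `GRAPH_DEFAULT_SCOPE` are hypothetical names, and the returned dictionary simply mirrors the three values above rather than any structure the connector defines.

```python
from typing import Dict, List, Union

GRAPH_DEFAULT_SCOPE = "https://graph.microsoft.com/.default"


def build_msal_settings(tenant_id: str, client_secret: str) -> Dict[str, Union[str, List[str]]]:
    """Return the authority, credential, and scope values shown above."""
    if not tenant_id or not client_secret:
        raise ValueError("tenant_id and client_secret are required")
    return {
        "authority": f"https://login.microsoftonline.com/{tenant_id}",
        "client_credential": client_secret,
        "scope": [GRAPH_DEFAULT_SCOPE],
    }


settings = build_msal_settings("your-tenant-id", "your-client-secret")
```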

Authentication Flows

OAuth Flow (User Delegation)

For delegated (user) access using a previously issued refresh token:

# OAuth configuration
oauth_config = {
    "credentials": {
        "auth_type": "Client",
        "tenant_id": "12345678-1234-1234-1234-123456789012",
        "client_id": "87654321-4321-4321-4321-210987654321",
        "client_secret": "your-client-secret", 
        "refresh_token": "your-refresh-token"
    }
}

# Client initialization
from source_microsoft_onedrive.stream_reader import SourceMicrosoftOneDriveClient
from source_microsoft_onedrive.spec import SourceMicrosoftOneDriveSpec

config = SourceMicrosoftOneDriveSpec(**oauth_config)
auth_client = SourceMicrosoftOneDriveClient(config)

# Get GraphClient for API operations
graph_client = auth_client.client

Service Principal Flow (Application-Only)

For service-to-service authentication without user interaction:

# Service principal configuration
service_config = {
    "credentials": {
        "auth_type": "Service",
        "tenant_id": "12345678-1234-1234-1234-123456789012", 
        "user_principal_name": "serviceuser@company.onmicrosoft.com",
        "client_id": "87654321-4321-4321-4321-210987654321",
        "client_secret": "your-app-secret"
    }
}

config = SourceMicrosoftOneDriveSpec(**service_config)
auth_client = SourceMicrosoftOneDriveClient(config)

# Access specific user's OneDrive
graph_client = auth_client.client
user_drive = graph_client.users.get_by_principal_name(
    config.credentials.user_principal_name
).drive.get().execute_query()

Token Management

Access Token Retrieval

# Direct access token retrieval
access_token = auth_client._get_access_token()

# Token structure
{
    "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIs...",
    "token_type": "Bearer",
    "expires_in": 3600,
    "scope": "https://graph.microsoft.com/.default"
}
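Because `expires_in` is a relative lifetime in seconds, callers that hold on to a token usually convert it to an absolute expiry. The sketch below assumes a response shaped like the one above; `ParsedToken` and `parse_token_response` are hypothetical names, not part of the connector.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Mapping


@dataclass(frozen=True)
class ParsedToken:
    access_token: str
    token_type: str
    expires_at: datetime


def parse_token_response(response: Mapping[str, object], issued_at: datetime) -> ParsedToken:
    """Turn a token response into a typed value with an absolute expiry time."""
    if "access_token" not in response:
        raise ValueError("token response has no access_token")
    lifetime = timedelta(seconds=int(response.get("expires_in", 0)))
    return ParsedToken(
        access_token=str(response["access_token"]),
        token_type=str(response.get("token_type", "Bearer")),
        expires_at=issued_at + lifetime,
    )


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
parsed = parse_token_response(
    {"access_token": "placeholder", "token_type": "Bearer", "expires_in": 3600},
    issued,
)
```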

Token Refresh (OAuth)

For OAuth flows, tokens are automatically refreshed using the refresh token:

# Automatic refresh in _get_access_token()
if refresh_token:
    result = self.msal_app.acquire_token_by_refresh_token(refresh_token, scopes=scope)
else:
    result = self.msal_app.acquire_token_for_client(scopes=scope)
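The branch above can be exercised offline with a stub. In this sketch `StubMsalApp` is a hypothetical stand-in that exposes only the two MSAL method names used above, and `acquire_token` raises a plain `RuntimeError` where the connector is documented to raise `MsalServiceError`.

```python
from typing import Dict, List, Optional


class StubMsalApp:
    """Hypothetical stand-in for the MSAL app; it records which flow was called."""

    def __init__(self, succeed: bool = True):
        self.succeed = succeed
        self.calls: List[str] = []

    def _result(self) -> Dict[str, str]:
        if self.succeed:
            return {"access_token": "placeholder-token"}
        return {"error": "invalid_grant", "error_description": "refresh token rejected"}

    def acquire_token_by_refresh_token(self, refresh_token: str, scopes: List[str]) -> Dict[str, str]:
        self.calls.append("refresh_token")
        return self._result()

    def acquire_token_for_client(self, scopes: List[str]) -> Dict[str, str]:
        self.calls.append("client_credentials")
        return self._result()


def acquire_token(app: StubMsalApp, refresh_token: Optional[str], scope: List[str]) -> Dict[str, str]:
    """Pick the flow from the presence of a refresh token and fail loudly on an error response."""
    if refresh_token:
        result = app.acquire_token_by_refresh_token(refresh_token, scopes=scope)
    else:
        result = app.acquire_token_for_client(scopes=scope)
    if "access_token" not in result:
        # MSAL reports failures as a dictionary with "error" keys instead of raising.
        raise RuntimeError(f"{result.get('error')}: {result.get('error_description')}")
    return result


scope = ["https://graph.microsoft.com/.default"]
oauth_app, service_app = StubMsalApp(), StubMsalApp()
acquire_token(oauth_app, "your-refresh-token", scope)  # takes the refresh-token branch
acquire_token(service_app, None, scope)                # takes the client-credentials branch
```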

Service Principal Authentication

For service principal flows, client credentials are used directly:

# Service principal token acquisition
result = self.msal_app.acquire_token_for_client(scopes=scope)

Usage Examples

Basic Authentication Setup

from source_microsoft_onedrive.stream_reader import SourceMicrosoftOneDriveClient
from source_microsoft_onedrive.spec import SourceMicrosoftOneDriveSpec

# OAuth configuration
config_data = {
    "credentials": {
        "auth_type": "Client",
        "tenant_id": "your-tenant-id",
        "client_id": "your-client-id",
        "client_secret": "your-client-secret",
        "refresh_token": "your-refresh-token"
    },
    "drive_name": "OneDrive",
    "search_scope": "ALL",
    "folder_path": "."
}

# Initialize authentication
config = SourceMicrosoftOneDriveSpec(**config_data)
auth_client = SourceMicrosoftOneDriveClient(config)

# Get authenticated GraphClient
try:
    graph_client = auth_client.client
    print("Authentication successful")
except Exception as e:
    print(f"Authentication failed: {e}")

Token Validation

# Validate token acquisition, then list accessible drives
try:
    token_info = auth_client._get_access_token()

    if "access_token" in token_info:
        print("Token acquired successfully")

        # Use the GraphClient for OneDrive operations
        graph_client = auth_client.client
        drives = graph_client.drives.get().execute_query()
        print(f"Found {len(drives)} accessible drives")

except Exception as e:
    print(f"Token acquisition failed: {e}")

Multi-User Service Authentication

# Service principal with multiple user access
service_users = [
    "user1@company.onmicrosoft.com",
    "user2@company.onmicrosoft.com"
]

for user_principal in service_users:
    service_config = {
        "credentials": {
            "auth_type": "Service",
            "tenant_id": "your-tenant-id",
            "user_principal_name": user_principal,
            "client_id": "your-app-id", 
            "client_secret": "your-app-secret"
        }
    }
    
    config = SourceMicrosoftOneDriveSpec(**service_config)
    auth_client = SourceMicrosoftOneDriveClient(config)
    
    try:
        graph_client = auth_client.client
        user_drive = graph_client.users.get_by_principal_name(user_principal).drive.get().execute_query()
        print(f"Accessed drive for {user_principal}: {user_drive.name}")
    except Exception as e:
        print(f"Failed to access drive for {user_principal}: {e}")

Error Handling

Authentication Errors

from msal.exceptions import MsalServiceError

try:
    token = auth_client._get_access_token()
except MsalServiceError as e:
    # str(e) includes the error code and description returned by the token endpoint
    print(f"MSAL error: {e}")

Common Error Scenarios

  • Invalid Credentials: Wrong client_id, client_secret, or tenant_id
  • Expired Refresh Token: OAuth refresh token has expired
  • Insufficient Permissions: Application lacks required Microsoft Graph permissions
  • Tenant Access Issues: Service principal not configured for tenant access
  • Network Connectivity: Unable to reach Microsoft authentication endpoints
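Several of these scenarios can be told apart from the `error` field of a failed token response. The sketch below pairs the standard OAuth 2.0 token-endpoint error codes (RFC 6749, section 5.2) with the scenarios above. The pairing is an assumption made for illustration, and `describe_token_error` is a hypothetical helper. Network failures never produce such a response and are not covered.

```python
from typing import Mapping

# Standard OAuth 2.0 error codes mapped to the scenario they most plausibly indicate.
# The mapping is illustrative; the error_description field carries the authoritative detail.
ERROR_HINTS = {
    "invalid_client": "Invalid Credentials: check client_id and client_secret",
    "invalid_grant": "Expired Refresh Token: re-authenticate to obtain a new refresh token",
    "unauthorized_client": "Tenant Access Issues: the application is not authorized for this flow",
    "invalid_scope": "Insufficient Permissions: the requested scope is not valid for this application",
}


def describe_token_error(result: Mapping[str, str]) -> str:
    """Return a human-readable hint for an error-style token response."""
    code = result.get("error", "")
    hint = ERROR_HINTS.get(code, "Unrecognized error; inspect error_description")
    return f"{code or 'unknown'} -> {hint}"


message = describe_token_error({"error": "invalid_grant", "error_description": "placeholder"})
```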

Security Considerations

Credential Protection

  • All authentication credentials are marked as secrets in configuration
  • Tokens are not logged or exposed in error messages
  • HTTPS-only communication with Microsoft authentication endpoints
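When a configuration has to be logged for debugging, masking the secret fields first keeps credentials out of log output. This is a minimal sketch: `redact_secrets` is a hypothetical helper, and the field names in `SECRET_FIELDS` are an assumption based on the credential keys shown in this document, not the connector's own secret list.

```python
import copy
from typing import Any, Dict

# Field names treated as sensitive in this sketch; the connector's actual list may differ.
SECRET_FIELDS = {"client_secret", "refresh_token", "access_token"}


def redact_secrets(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of config with sensitive values replaced by a mask."""

    def _walk(node: Any) -> Any:
        if isinstance(node, dict):
            return {k: ("****" if k in SECRET_FIELDS else _walk(v)) for k, v in node.items()}
        if isinstance(node, list):
            return [_walk(item) for item in node]
        return node

    return _walk(copy.deepcopy(config))


original = {
    "credentials": {
        "auth_type": "Client",
        "client_id": "your-client-id",
        "client_secret": "your-client-secret",
        "refresh_token": "your-refresh-token",
    }
}
safe_to_log = redact_secrets(original)
```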

Permission Scopes

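The connector requests the https://graph.microsoft.com/.default scope, which resolves to the Microsoft Graph permissions already granted to the app registration. For this connector those permissions are expected to cover: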

  • Read access to OneDrive files and folders
  • Access to drive metadata and sharing information
  • User profile information for service principal flows
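To see which permissions a token actually carries, its payload can be decoded while debugging. Microsoft identity platform access tokens list application permissions in the `roles` claim and delegated permissions in the space-separated `scp` claim. The sketch below does not verify the signature, so it is suitable for inspection only; the helper names and the hand-built sample token are hypothetical.

```python
import base64
import json
from typing import Any, Dict, List


def decode_jwt_claims(token: str) -> Dict[str, Any]:
    """Decode the payload segment of a JWT without verifying its signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the padding that base64url encoding strips
    return json.loads(base64.urlsafe_b64decode(payload))


def granted_permissions(claims: Dict[str, Any]) -> List[str]:
    """Collect application permissions ("roles") and delegated permissions ("scp")."""
    roles = list(claims.get("roles", []))
    scopes = str(claims.get("scp", "")).split()
    return sorted(set(roles + scopes))


def _segment(data: Dict[str, Any]) -> str:
    """Encode a dictionary as an unpadded base64url JWT segment."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hand-built, unsigned sample token so the sketch runs offline.
sample_token = ".".join([_segment({"alg": "none"}), _segment({"roles": ["Files.Read.All"]}), "sig"])
permissions = granted_permissions(decode_jwt_claims(sample_token))
```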

Token Lifecycle

  • Access tokens typically expire after 1 hour
  • Refresh tokens have a longer lifetime (varies by tenant configuration)
  • Service principal tokens are acquired on-demand for each operation
  • Failed token acquisition raises MsalServiceError; the next call to _get_access_token() requests a fresh token
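A caller that wants to avoid requesting a token for every operation can wrap the acquisition function in a small cache keyed on `expires_in`. This is a sketch under stated assumptions: `CachedTokenProvider` is a hypothetical class, the 300-second safety margin is an arbitrary choice, and whether the connector or MSAL already caches tokens is outside what this document states.

```python
import time
from typing import Callable, Dict, Optional


class CachedTokenProvider:
    """Reuse a token until shortly before it expires, then call the acquire function again."""

    def __init__(self, acquire: Callable[[], Dict[str, object]],
                 clock: Callable[[], float] = time.monotonic,
                 margin_seconds: float = 300.0):
        self._acquire = acquire
        self._clock = clock
        self._margin = margin_seconds
        self._token: Optional[Dict[str, object]] = None
        self._expires_at = 0.0

    def __call__(self) -> Dict[str, object]:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._acquire()
            self._expires_at = now + float(self._token.get("expires_in", 0))
        return self._token


# Demonstration with a fake clock and a fake acquire function.
calls = []
fake_now = [0.0]


def fake_acquire() -> Dict[str, object]:
    calls.append(fake_now[0])
    return {"access_token": f"token-{len(calls)}", "expires_in": 3600}


provider = CachedTokenProvider(fake_acquire, clock=lambda: fake_now[0])
provider()            # first call acquires a token
fake_now[0] = 1000.0
provider()            # still fresh, so the cached token is reused
fake_now[0] = 3400.0
latest = provider()   # within the safety margin of expiry, so a new token is acquired
```

Injecting the clock keeps the example deterministic; in real use the default `time.monotonic` would apply.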

Microsoft Graph API Integration

The authentication client provides a configured GraphClient for:

  • Drive enumeration and access
  • File and folder operations
  • Shared item discovery
  • User profile access (service principal mode)
  • Metadata extraction and file streaming
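For orientation, the operations above correspond to Microsoft Graph v1.0 REST resources such as `/users/{user}/drive` and `/drives/{drive-id}/root/children`. The sketch below only builds those URLs; the function names are hypothetical, and whether the GraphClient issues exactly these requests on the connector's behalf is an assumption.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def user_drive_url(user_principal_name: str) -> str:
    """URL of the default drive for a user, as addressed in service principal mode."""
    return f"{GRAPH_BASE}/users/{quote(user_principal_name, safe='@')}/drive"


def drive_children_url(drive_id: str) -> str:
    """URL listing the items in the root folder of a drive."""
    return f"{GRAPH_BASE}/drives/{quote(drive_id, safe='')}/root/children"


drive_url = user_drive_url("serviceuser@company.onmicrosoft.com")
children_url = drive_children_url("b!abc")
```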

Install with Tessl CLI

npx tessl i tessl/pypi-source-microsoft-onedrive
