tessl/pypi-apache-airflow-providers-apache-pinot

Apache Airflow provider package enabling integration with Apache Pinot for real-time analytics queries and administrative operations.

Describes: pkg:pypi/apache-airflow-providers-apache-pinot@4.8.x

To install, run

npx @tessl/cli install tessl/pypi-apache-airflow-providers-apache-pinot@4.8.0


Apache Airflow Providers Apache Pinot

Apache Airflow provider package that enables integration with Apache Pinot, a real-time distributed OLAP datastore designed for ultra-low-latency analytics. The provider ships two hooks: one for SQL queries against Pinot brokers and one for administrative operations through the pinot-admin.sh CLI.

Package Information

  • Package Name: apache-airflow-providers-apache-pinot
  • Language: Python
  • Installation: pip install apache-airflow-providers-apache-pinot

Core Imports

from airflow.providers.apache.pinot.hooks.pinot import PinotDbApiHook, PinotAdminHook

Version access:

from airflow.providers.apache.pinot import __version__
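
For example, to confirm the installed provider version (the printed value depends on the installed release):

from airflow.providers.apache.pinot import __version__

print(__version__)  # e.g. "4.8.0"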

Basic Usage

Query Operations

from airflow.providers.apache.pinot.hooks.pinot import PinotDbApiHook

# Initialize database hook (uses the default connection ID, pinot_broker_default)
hook = PinotDbApiHook()

# Execute SQL queries
sql = "SELECT COUNT(*) FROM my_table WHERE timestamp > '2023-01-01'"
results = hook.get_records(sql)

# Get first record only
first_result = hook.get_first(sql)

# Get connection URI
uri = hook.get_uri()

Administrative Operations

from airflow.providers.apache.pinot.hooks.pinot import PinotAdminHook

# Initialize admin hook
admin_hook = PinotAdminHook(conn_id='pinot_admin_default')

# Add schema
admin_hook.add_schema('path/to/schema.json')

# Add table
admin_hook.add_table('path/to/table_config.json')

# Create and upload segment
admin_hook.create_segment(
    table_name='my_table',
    data_dir='/path/to/data',
    out_dir='/path/to/segments'
)
admin_hook.upload_segment('/path/to/segments/my_segment')
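
In a pipeline, these calls typically live inside a task. A minimal sketch using the TaskFlow API (the DAG and task names are illustrative, and my_table is a placeholder):

from airflow.decorators import dag, task
from airflow.providers.apache.pinot.hooks.pinot import PinotDbApiHook

@dag(schedule=None, catchup=False)
def pinot_example():
    @task
    def count_rows() -> int:
        hook = PinotDbApiHook()
        # get_first returns the first result row, e.g. (42,)
        return hook.get_first("SELECT COUNT(*) FROM my_table")[0]

    count_rows()

pinot_example()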

Architecture

The provider is built around two main hook classes that handle different aspects of Pinot integration:

  • PinotDbApiHook: Extends Apache Airflow's DbApiHook to provide SQL query capabilities against Pinot brokers using the pinotdb client library
  • PinotAdminHook: Wraps the pinot-admin.sh command-line tool for administrative operations like schema management, table creation, and segment operations

Both hooks integrate with Airflow's connection management system, supporting authentication and configuration through Airflow connections.

Capabilities

Database Query Operations

SQL-based querying capabilities for retrieving data from Pinot clusters through the broker API. Supports standard SQL queries, with results returned as rows via the standard DB-API interface.

class PinotDbApiHook(DbApiHook):
    def get_conn(self): ...
    def get_uri(self) -> str: ...
    def get_records(self, sql: str | list[str], parameters=None, **kwargs): ...
    def get_first(self, sql: str | list[str], parameters=None): ...

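Because get_conn() returns a standard DB-API connection from pinotdb, the underlying cursor can also be used directly. A minimal sketch, assuming a reachable broker and an existing my_table:

from airflow.providers.apache.pinot.hooks.pinot import PinotDbApiHook

hook = PinotDbApiHook()
conn = hook.get_conn()  # pinotdb connection to the Pinot broker

cursor = conn.cursor()
cursor.execute("SELECT city, COUNT(*) FROM my_table GROUP BY city LIMIT 10")
for row in cursor.fetchall():
    print(row)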

Administrative Operations

Administrative functionality for managing Pinot clusters including schema management, table configuration, segment creation, and data ingestion workflows.

class PinotAdminHook(BaseHook):
    def add_schema(self, schema_file: str, with_exec: bool = True): ...
    def add_table(self, file_path: str, with_exec: bool = True): ...
    def create_segment(self, **kwargs): ...
    def upload_segment(self, segment_dir: str, table_name: str | None = None): ...
    def run_cli(self, cmd: list[str], verbose: bool = True) -> str: ...
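
For administrative commands without a dedicated wrapper, run_cli passes arguments through to pinot-admin.sh. A sketch that mirrors what add_schema does internally; it assumes the pinot_admin_default connection exists, and the host/port values are placeholders:

from airflow.providers.apache.pinot.hooks.pinot import PinotAdminHook

admin_hook = PinotAdminHook()

# run_cli prepends the configured pinot-admin.sh path to the argument list
output = admin_hook.run_cli(
    [
        "AddSchema",
        "-controllerHost", "pinot-controller.example.com",  # placeholder
        "-controllerPort", "9000",                          # placeholder
        "-schemaFile", "path/to/schema.json",
        "-exec",
    ],
    verbose=True,
)
print(output)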

Connection Types

The provider registers two Airflow connection types:

  • pinot: For database query operations via PinotDbApiHook
  • pinot_admin: For administrative operations via PinotAdminHook
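
A sketch of registering both connections programmatically against the Airflow metadata database (hostnames and ports are placeholders; in practice connections are usually created through the UI, CLI, or environment variables):

from airflow.models import Connection
from airflow.settings import Session

session = Session()
session.add(Connection(
    conn_id="pinot_broker_default",   # default connection ID for PinotDbApiHook
    conn_type="pinot",
    host="pinot-broker.example.com",
    port=8099,
))
session.add(Connection(
    conn_id="pinot_admin_default",    # default connection ID for PinotAdminHook
    conn_type="pinot_admin",
    host="pinot-controller.example.com",
    port=9000,
))
session.commit()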

Version Compatibility

from airflow.providers.apache.pinot.version_compat import AIRFLOW_V_3_1_PLUS, BaseHook

def get_base_airflow_version_tuple() -> tuple[int, int, int]: ...

The provider maintains compatibility with Airflow 2.10+ and includes version-specific compatibility handling for different Airflow releases.
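
A sketch of branching on the compatibility flag; the guarded code paths are placeholders:

from airflow.providers.apache.pinot.version_compat import AIRFLOW_V_3_1_PLUS

if AIRFLOW_V_3_1_PLUS:
    ...  # code path for Airflow 3.1 and later
else:
    ...  # fallback for earlier supported releases (2.10+)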

Types

# Typing imports used in hook signatures
from typing import TYPE_CHECKING, Any
from collections.abc import Iterable, Mapping

if TYPE_CHECKING:
    from airflow.models import Connection