Workspace: tessl
Visibility: Public
Describes: pkg:pypi/apache-airflow-backport-providers-apache-sqoop@2021.3.x (PyPI)

tessl/pypi-apache-airflow-backport-providers-apache-sqoop

tessl install tessl/pypi-apache-airflow-backport-providers-apache-sqoop@2021.3.0

Apache Airflow backport provider package for Apache Sqoop integration, providing SqoopHook and SqoopOperator for data import/export between relational databases and Hadoop
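
For reference, a minimal sketch of how the hook and operator named above are imported, assuming Airflow 1.10.x with this backport package installed; backport providers expose the same airflow.providers.* namespace as the Airflow 2 provider packages.

# Import paths exposed by the Apache Sqoop provider / backport provider.
from airflow.providers.apache.sqoop.hooks.sqoop import SqoopHook
from airflow.providers.apache.sqoop.operators.sqoop import SqoopOperator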

Agent Success: 92% (agent success rate when using this tile)
Improvement: 1.39x (success rate improvement over the baseline)
Baseline: 66% (agent success rate without this tile)

evals/scenario-3/task.md

Sqoop Data Bridge DAG

Create an Airflow DAG that orchestrates a Sqoop-based data exchange between a relational source and HDFS, then sends curated files back into a warehouse table. The DAG should be easy to configure via parameters and macros, and should behave safely when runs are cancelled.

Capabilities

Import filtered rows into dated HDFS directories

  • When triggered for a run date like 2025-02-10, builds an import that reads only rows updated on or after that date and writes to hdfs_import_dir/<ds_nodash>/<source_table>. The import must use the provided split column for parallel mappers. @test
  • The import command must render its WHERE filter and target path via templates/macros rather than hardcoded literals so each run pulls only that run's slice (see the sketch after this list). @test
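
A minimal sketch of such an import task using SqoopOperator. The column name updated_at, the number of mappers, and the placeholder values standing in for the create_dag() parameters (conn_id, source_table, split_column, hdfs_import_dir) are assumptions; the where, target_dir and split_by arguments are rendered through the operator's template fields.

from airflow.providers.apache.sqoop.operators.sqoop import SqoopOperator

# Hypothetical stand-ins for the create_dag() parameters defined under API below.
conn_id = "sqoop_default"
source_table = "orders"
split_column = "id"
hdfs_import_dir = "/data/imports"

sqoop_import = SqoopOperator(
    task_id="sqoop_import",
    conn_id=conn_id,
    cmd_type="import",
    table=source_table,
    # Templated WHERE filter: only rows updated on or after the run date.
    where="updated_at >= '{{ ds }}'",
    # Templated dated target path: hdfs_import_dir/<ds_nodash>/<source_table>.
    target_dir=f"{hdfs_import_dir}/{{{{ ds_nodash }}}}/{source_table}",
    # Parallel mappers split on the provided column.
    split_by=split_column,
    num_mappers=4,
)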

Export curated directory back to warehouse table

  • Exports the provided export_dir contents into target_table in batch mode, with explicit null sentinels for string and non-string fields applied during export (see the export sketch below). @test
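
A minimal sketch of the export side. The values for conn_id, export_dir and target_table are hypothetical stand-ins, and \N is used here only as a common Sqoop null-sentinel convention, not a requirement of the task.

from airflow.providers.apache.sqoop.operators.sqoop import SqoopOperator

# Hypothetical stand-ins for the create_dag() parameters defined under API below.
conn_id = "sqoop_default"
export_dir = "/data/curated/orders"
target_table = "warehouse_orders"

sqoop_export = SqoopOperator(
    task_id="sqoop_export",
    conn_id=conn_id,
    cmd_type="export",
    export_dir=export_dir,
    table=target_table,
    batch=True,                # batch-mode JDBC inserts
    input_null_string="\\N",   # sentinel for NULL string columns
    input_null_non_string="\\N",  # sentinel for NULL non-string columns
)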

DAG orchestration and cancellation safety

  • The export task must wait for the import task and be skipped when the import fails so no partial data is pushed. @test
  • Cancelling the DAG run should terminate any active Sqoop child process started by a running task to avoid orphaned jobs (illustrated in the orchestration sketch below). @test
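
A minimal orchestration sketch tying the two tasks together. The dag_id, schedule, start date and task arguments are placeholders, and the closing comment only restates the expectation above that cancellation relies on the operator's on_kill terminating the Sqoop child process.

from datetime import datetime

from airflow.models import DAG
from airflow.providers.apache.sqoop.operators.sqoop import SqoopOperator

with DAG(
    dag_id="sqoop_data_bridge",      # placeholder dag_id
    schedule_interval="@daily",      # placeholder schedule
    start_date=datetime(2025, 1, 1),
    catchup=False,
) as dag:
    sqoop_import = SqoopOperator(
        task_id="sqoop_import",
        conn_id="sqoop_default",
        cmd_type="import",
        table="orders",
        target_dir="/data/imports/{{ ds_nodash }}/orders",
        split_by="id",
    )
    sqoop_export = SqoopOperator(
        task_id="sqoop_export",
        conn_id="sqoop_default",
        cmd_type="export",
        export_dir="/data/curated/orders",
        table="warehouse_orders",
        batch=True,
    )

    # With the default "all_success" trigger rule the export task does not run
    # when the import fails, so no partial data reaches the warehouse.
    sqoop_import >> sqoop_export

# When a DAG run is cancelled, Airflow calls on_kill() on running task instances;
# the cancellation-safety requirement above relies on the Sqoop operator/hook
# terminating its child process there.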

Implementation

@generates

API

from airflow.models import DAG

def create_dag(
    dag_id: str,
    schedule: str,
    conn_id: str,
    source_table: str,
    split_column: str,
    hdfs_import_dir: str,
    export_dir: str,
    target_table: str,
) -> DAG:
    """
    Returns an Airflow DAG that loads filtered rows from the source table into a dated HDFS path
    and exports a curated directory back into a warehouse table using the Sqoop provider.
    """

Dependencies { .dependencies }

Apache Airflow Sqoop provider { .dependency }

Provides the Sqoop import/export operator for DAG tasks. @satisfied-by