tessl install tessl/pypi-apache-airflow-backport-providers-apache-sqoop@2021.3.0

Apache Airflow backport provider package for Apache Sqoop integration, providing SqoopHook and SqoopOperator for data import/export between relational databases and Hadoop.
Agent Success: 92% (agent success rate when using this tile)
Improvement: 1.39x (agent success rate improvement when using this tile compared to baseline)
Baseline: 66% (agent success rate without this tile)
Create an Airflow DAG that orchestrates a Sqoop-based data exchange between a relational source and HDFS, then sends curated files back into a warehouse table. The DAG should be easy to configure via parameters and macros, and should behave safely when runs are cancelled.
Imports filtered rows from source_table into hdfs_import_dir/<ds_nodash>/<source_table>. The import must leverage the provided split column for parallel mappers. @test

Exports export_dir contents into target_table in batch mode, with explicit null sentinels for string and non-string fields applied during export. @test

@generates
from airflow.models import DAG


def create_dag(
    dag_id: str,
    schedule: str,
    conn_id: str,
    source_table: str,
    split_column: str,
    hdfs_import_dir: str,
    export_dir: str,
    target_table: str,
) -> DAG:
    """
    Returns an Airflow DAG that loads filtered rows from the source table into a dated HDFS path
    and exports a curated directory back into a warehouse table using the Sqoop provider.
    """

Provides Sqoop import/export operator for DAG tasks. @satisfied-by