tessl install tessl/pypi-apache-airflow-backport-providers-apache-sqoop@2021.3.0

Apache Airflow backport provider package for Apache Sqoop integration, providing SqoopHook and SqoopOperator for data import/export between relational databases and Hadoop.
Agent Success
Agent success rate when using this tile
92%
Improvement
Agent success rate improvement when using this tile compared to baseline
1.39x
Baseline
Agent success rate without this tile
66%
Move results of a SQL query from a JDBC source into an HDFS directory with format controls and optional parallelization.
Executes a SQL query containing the required $CONDITIONS placeholder and writes results to a specified HDFS directory, returning the path when successful. @test

When append is False, the destination directory is cleared before writing new results; when True, new rows are appended without removing existing files. @test

file_format must be text or parquet; any other file_format value raises a ValueError. @test

Uses the split_by column and num_mappers count to drive parallel import shards; defaults to a single mapper when omitted. @test

@generates
from typing import Optional, Dict, Any

def import_query_to_hdfs(
    conn_id: str,
    query: str,
    target_dir: str,
    *,
    file_format: str = "text",
    append: bool = False,
    split_by: Optional[str] = None,
    num_mappers: int = 1,
    extra_options: Optional[Dict[str, Any]] = None,
) -> str:
    """
    Executes a SQL query against the connection identified by conn_id and imports the results into target_dir on HDFS.

    Raises ValueError if the query lacks the required $CONDITIONS placeholder for partitioning or if file_format is unsupported.
    Additional import flags can be supplied via extra_options for passthrough to the underlying import tool.
    Returns the HDFS directory containing imported data.
    """

Provides Sqoop-based JDBC-to-HDFS import support.
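The validation rules above (the $CONDITIONS requirement and the restricted file_format set) can be sketched without Airflow installed. This is a minimal illustration, not the package's implementation; the helper name validate_import_args is hypothetical. Sqoop free-form query imports require the $CONDITIONS token so per-mapper WHERE clauses can be substituted when the import is parallelized.

```python
SUPPORTED_FORMATS = ("text", "parquet")

def validate_import_args(query: str, file_format: str) -> None:
    # Sqoop substitutes a per-shard WHERE clause at the $CONDITIONS marker,
    # so a free-form query without it cannot be partitioned across mappers.
    if "$CONDITIONS" not in query:
        raise ValueError("query must contain the $CONDITIONS placeholder")
    # Only the two formats named in the spec are accepted here.
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file_format: {file_format!r}")
```

In a real implementation these checks would run before delegating to the provider's SqoopHook, so that bad arguments fail fast rather than surfacing as a Sqoop job error.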