Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/apache-airflow-backport-providers-apache-sqoop@2021.3.x

# tessl/pypi-apache-airflow-backport-providers-apache-sqoop

tessl install tessl/pypi-apache-airflow-backport-providers-apache-sqoop@2021.3.0

Apache Airflow backport provider package for Apache Sqoop integration, providing SqoopHook and SqoopOperator for data import/export between relational databases and Hadoop.
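As a minimal sketch of what the package provides, the DAG below wires the provider's SqoopOperator into an Airflow 1.10 pipeline; the connection id, table name, and HDFS path are illustrative placeholders.

```python
# Minimal sketch, assuming an Airflow 1.10 deployment with this backport
# provider installed. Connection id, table, and HDFS path are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.sqoop.operators.sqoop import SqoopOperator

with DAG(
    dag_id="sqoop_import_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    import_orders = SqoopOperator(
        task_id="import_orders",
        conn_id="sqoop_default",    # JDBC connection defined in Airflow
        cmd_type="import",          # pull rows from the RDBMS into Hadoop
        table="orders",             # placeholder source table
        target_dir="/data/orders",  # HDFS destination directory
        split_by="order_id",        # column used to shard the import
        num_mappers=4,              # run four parallel map tasks
    )
```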

| Metric | Value | Description |
| --- | --- | --- |
| Agent Success | 92% | Agent success rate when using this tile |
| Improvement | 1.39x | Agent success rate improvement when using this tile compared to baseline |
| Baseline | 66% | Agent success rate without this tile |

evals/scenario-8/task.md

# HDFS Query Import Job

Move results of a SQL query from a JDBC source into an HDFS directory with format controls and optional parallelization.

## Capabilities

### Query-driven import

- Accepts a SQL query containing $CONDITIONS and writes results to a specified HDFS directory, returning the path when successful. @test

### Append vs overwrite

- When append is False, the destination directory is cleared before writing new results; when True, new rows are appended without removing existing files. @test

### File format selection

- Supports writing as text or parquet; any other file_format value raises a ValueError. @test

### Parallelization controls

- Applies an optional split_by column and num_mappers count to drive parallel import shards; defaults to a single mapper when omitted. @test
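Taken together, the capabilities above suggest call shapes like the following hedged sketch; `import_query_to_hdfs` is the function specified under API below, and the connection id, query, and target paths are placeholders.

```python
# Illustrative calls only; connection id, query, and paths are placeholders.
query = "SELECT id, total FROM orders WHERE region = 'emea' AND $CONDITIONS"

# Text import that overwrites any previous contents of the target directory.
path = import_query_to_hdfs("jdbc_default", query, "/data/orders_emea")

# Parquet import appended alongside existing files, split across 4 mappers.
path = import_query_to_hdfs(
    "jdbc_default",
    query,
    "/data/orders_emea",
    file_format="parquet",
    append=True,
    split_by="id",
    num_mappers=4,
)
```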

## Implementation

@generates

## API

```python
from typing import Optional, Dict, Any

def import_query_to_hdfs(
    conn_id: str,
    query: str,
    target_dir: str,
    *,
    file_format: str = "text",
    append: bool = False,
    split_by: Optional[str] = None,
    num_mappers: int = 1,
    extra_options: Optional[Dict[str, Any]] = None,
) -> str:
    """
    Executes a SQL query against the connection identified by conn_id and imports the results into target_dir on HDFS.
    Raises ValueError if query lacks the required $CONDITIONS placeholder or if file_format is unsupported.
    Additional import flags can be supplied via extra_options for passthrough to the underlying import tool.
    Returns the HDFS directory containing imported data.
    """
```

## Dependencies { .dependencies }

### apache-airflow-providers-apache-sqoop { .dependency }

Provides Sqoop-based JDBC-to-HDFS import support.