
tessl/pypi-apache-airflow-backport-providers-apache-sqoop

Apache Airflow backport provider package for Apache Sqoop integration, providing SqoopHook and SqoopOperator for data import/export between relational databases and Hadoop

Overall score: 92%


HDFS Table Import Utility

A small utility that moves an entire relational table into HDFS using the bundled Sqoop provider dependency.

Capabilities

Import full table

  • Given a connection ID, table name, and target directory, runs a Sqoop import of the entire table into HDFS through the dependency, writes Parquet output by default, and returns the final HDFS directory path. @test
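The Parquet default described above can be sketched as a small validation step. The helper name `resolve_file_type` is hypothetical, and the set of accepted values (`avro`, `sequence`, `text`, `parquet`) is an assumption about what the Sqoop hook accepts; check it against the installed provider version.

```python
from typing import Optional

# Assumed set of file types the Sqoop hook accepts; verify for your provider version.
SUPPORTED_FILE_TYPES = frozenset({"avro", "sequence", "text", "parquet"})


def resolve_file_type(file_format: Optional[str] = None) -> str:
    """Return a normalized file type, falling back to Parquet when none is given."""
    if file_format is None or not file_format.strip():
        return "parquet"
    normalized = file_format.strip().lower()
    if normalized not in SUPPORTED_FILE_TYPES:
        raise ValueError(
            f"Unsupported file format {file_format!r}; "
            f"expected one of {sorted(SUPPORTED_FILE_TYPES)}"
        )
    return normalized
```

Rejecting unknown formats before launching the job keeps a typo from surfacing only after Sqoop has started.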

Filtered column import

  • Accepts an optional list of column names and an optional WHERE fragment; imports only the listed columns and only the rows that satisfy the filter. @test
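One way to prepare these optional filters is sketched below. Sqoop's `--columns` option takes a comma-separated list and `--where` takes the condition without the `WHERE` keyword; the helper name `build_filter_kwargs` and the choice to strip a leading `WHERE` are illustrative assumptions, not part of the task.

```python
from typing import Dict, Iterable, Optional


def build_filter_kwargs(
    columns: Optional[Iterable[str]] = None,
    where: Optional[str] = None,
) -> Dict[str, str]:
    """Build the optional column and row filter arguments for an import call."""
    kwargs: Dict[str, str] = {}
    if columns is not None:
        # A bare string would otherwise be iterated character by character.
        if isinstance(columns, str):
            columns = columns.split(",")
        # Drop blanks and surrounding whitespace, keeping the caller's order.
        names = [name.strip() for name in columns if name and name.strip()]
        if names:
            kwargs["columns"] = ",".join(names)
    if where is not None and where.strip():
        fragment = where.strip()
        # Accept "WHERE id > 1" as well as "id > 1" (hypothetical convenience).
        if fragment.upper().startswith("WHERE "):
            fragment = fragment[len("WHERE "):].strip()
        kwargs["where"] = fragment
    return kwargs
```

Omitting a key entirely when no filter is given lets the import fall back to all columns and all rows.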

Append versus replace

  • Supports toggling append mode; when append is false it replaces existing data at the target directory, and when true it preserves existing files and appends new rows. @test
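The toggle above might map onto Sqoop options roughly as follows. `--append` and `--delete-target-dir` are Sqoop import flags; passing the latter through an `extra_import_options` dictionary (option name without the `--` prefix, empty string for a valueless flag) is an assumption about the hook's interface, and `build_write_mode_kwargs` is a hypothetical helper name.

```python
from typing import Any, Dict


def build_write_mode_kwargs(append: bool = False) -> Dict[str, Any]:
    """Translate the append flag into keyword arguments for the import call."""
    if append:
        # Keep existing files in the target directory and add new ones.
        return {"append": True}
    # Replace mode: ask Sqoop to remove the target directory before importing.
    # Assumption: the hook forwards extra_import_options as "--<key> <value>".
    return {"append": False, "extra_import_options": {"delete-target-dir": ""}}
```

The sketch treats the two modes as mutually exclusive, so a single call never requests both appending and deleting the target directory.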

Parallel and driver controls

  • Allows specifying a split-by column and mapper count for parallel import, and optionally enables Sqoop's direct mode (the database's native fast path) and a JDBC driver class override for non-default dialects. @test
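A minimal sketch of how these controls could be validated and grouped is shown below. It assumes the mapper count belongs to the hook's constructor while the split column, direct flag, and driver belong to the import call; that split, and the helper name `build_parallel_kwargs`, are assumptions to confirm against the provider's documentation.

```python
from typing import Any, Dict, Optional, Tuple


def build_parallel_kwargs(
    split_by: Optional[str] = None,
    num_mappers: Optional[int] = None,
    direct: bool = False,
    driver: Optional[str] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (hook_kwargs, import_kwargs) for the parallelism and driver controls."""
    if num_mappers is not None and num_mappers < 1:
        raise ValueError("num_mappers must be a positive integer")

    # Assumed to be a constructor argument of the hook.
    hook_kwargs: Dict[str, Any] = {}
    if num_mappers is not None:
        hook_kwargs["num_mappers"] = num_mappers

    # Assumed to be arguments of the import call itself.
    import_kwargs: Dict[str, Any] = {"direct": bool(direct)}
    if split_by:
        import_kwargs["split_by"] = split_by
    if driver:
        import_kwargs["driver"] = driver
    return hook_kwargs, import_kwargs
```

Leaving unset options out of the dictionaries lets Sqoop apply its own defaults for the split column and mapper count.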

Implementation

@generates

API

from typing import Iterable, Optional

def import_table_to_hdfs(
    conn_id: str,
    table: str,
    target_dir: str,
    columns: Optional[Iterable[str]] = None,
    where: Optional[str] = None,
    append: bool = False,
    split_by: Optional[str] = None,
    num_mappers: Optional[int] = None,
    direct: bool = False,
    driver: Optional[str] = None,
    file_format: str = "parquet",
) -> str:
    """
    Imports a relational table into HDFS using the provided dependency.
    Returns the HDFS directory path where data landed.
    Raises an exception if the import fails.
    """

Dependencies { .dependencies }

apache-airflow-backport-providers-apache-sqoop { .dependency }

Provides Sqoop-based import support for moving relational data into HDFS.
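For orientation, here is one possible shape for the function behind the API signature above, backed by this dependency. It is a sketch under stated assumptions, not a reference implementation: it assumes the package exposes `SqoopHook` at `airflow.providers.apache.sqoop.hooks.sqoop`, that the constructor accepts `conn_id` and `num_mappers`, that `import_table` accepts the keyword arguments shown (including `extra_import_options`), and that the hook raises on a failed job.

```python
from typing import Iterable, Optional


def import_table_to_hdfs(
    conn_id: str,
    table: str,
    target_dir: str,
    columns: Optional[Iterable[str]] = None,
    where: Optional[str] = None,
    append: bool = False,
    split_by: Optional[str] = None,
    num_mappers: Optional[int] = None,
    direct: bool = False,
    driver: Optional[str] = None,
    file_format: str = "parquet",
) -> str:
    """Import a relational table into HDFS and return the target directory."""
    # Imported lazily so the module loads even where the provider is absent.
    # Assumption: this is the hook's import path in the installed package.
    from airflow.providers.apache.sqoop.hooks.sqoop import SqoopHook

    hook = SqoopHook(conn_id=conn_id, num_mappers=num_mappers)
    # Assumption: the hook raises if the Sqoop job fails, so errors propagate.
    hook.import_table(
        table=table,
        target_dir=target_dir,
        append=append,
        file_type=file_format,
        # Sqoop expects a comma-separated column list.
        columns=",".join(columns) if columns else None,
        split_by=split_by,
        where=where,
        direct=direct,
        driver=driver,
        # Replace mode: delete the target directory before importing.
        extra_import_options=None if append else {"delete-target-dir": ""},
    )
    return target_dir
```

Returning `target_dir` unchanged assumes Sqoop writes directly into that directory; if a deployment rewrites or qualifies the path, the return value would need to reflect that.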

Install with Tessl CLI

npx tessl i tessl/pypi-apache-airflow-backport-providers-apache-sqoop
