Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/apache-airflow-backport-providers-apache-sqoop@2021.3.x (tile.json)

tessl/pypi-apache-airflow-backport-providers-apache-sqoop

tessl install tessl/pypi-apache-airflow-backport-providers-apache-sqoop@2021.3.0

Apache Airflow backport provider package for Apache Sqoop integration, providing SqoopHook and SqoopOperator for data import/export between relational databases and Hadoop

Agent Success: 92% (agent success rate when using this tile)
Improvement: 1.39x (success rate improvement when using this tile compared to baseline)
Baseline: 66% (agent success rate without this tile)

evals/scenario-6/task.md

HCatalog Import Job

Create a small utility that pulls data from a relational source directly into a Hive/HCatalog table, using the package's HCatalog-aware import support and an opt-in table creation flag.

Capabilities

Imports into HCatalog

  • When given a source table, warehouse directory, HCatalog database, and HCatalog table, the job imports the table into that Hive location (not just a raw HDFS folder) and reports the final warehouse path. @test

Optional table creation

  • If the target Hive table is missing and create_table is true, the job creates it before loading; if create_table is false, a missing table yields a clear failure, and an existing table is left intact. @test

Static partition load

  • When provided a partition mapping (for example {"dt": "2024-11-11"}), the import lands data under that static partition path within the Hive table without overwriting other partitions. @test
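
For orientation, the capabilities above line up with Sqoop's HCatalog import options. A rough, illustrative mapping (flag names come from the sqoop import CLI; the example values are not part of the spec):

# Illustrative only: each capability expressed as the Sqoop option it relies on.
HCATALOG_IMPORT_FLAGS = {
    "hcatalog-database": "analytics",           # --hcatalog-database: target Hive database
    "hcatalog-table": "orders",                 # --hcatalog-table: target Hive table
    "create-hcatalog-table": "",                # --create-hcatalog-table: opt-in table creation
    "hcatalog-partition-keys": "dt",            # --hcatalog-partition-keys: static partition key(s)
    "hcatalog-partition-values": "2024-11-11",  # --hcatalog-partition-values: matching value(s)
}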

Implementation

@generates

API

from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class HCatalogImportConfig:
    source_table: str                               # relational source table to import
    warehouse_dir: str                              # Hive warehouse root used to report the final path
    hcatalog_database: str                          # target HCatalog/Hive database
    hcatalog_table: str                             # target HCatalog/Hive table
    create_table: bool                              # opt-in: create the Hive table if it is missing
    partition: Optional[Dict[str, str]] = None      # static partition mapping, e.g. {"dt": "2024-11-11"}
    split_by: Optional[str] = None                  # column used to split work across mappers
    num_mappers: int = 1                            # number of parallel map tasks
    extra_options: Optional[Dict[str, str]] = None  # passthrough for additional Sqoop options

@dataclass
class ImportResult:
    warehouse_path: str   # final Hive warehouse location of the imported table
    rows_imported: int    # record count reported by the import

def run_hcatalog_import(config: HCatalogImportConfig) -> ImportResult:
    """Runs an HCatalog-targeted import job using the configured data movement provider."""

Dependencies { .dependencies }

apache-airflow-providers-apache-sqoop { .dependency }

Provides Sqoop-based HCatalog import capabilities and optional Hive table creation.
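
Inside a DAG, the same capability is usually exercised through the provider's SqoopOperator. A brief sketch, assuming it sits inside a with DAG(...) block (task id, connection id, and table names are illustrative):

from airflow.providers.apache.sqoop.operators.sqoop import SqoopOperator

import_orders = SqoopOperator(
    task_id="import_orders_to_hcatalog",
    conn_id="sqoop_default",
    cmd_type="import",
    table="orders",
    num_mappers=1,
    hcatalog_database="analytics",
    hcatalog_table="orders",
    extra_import_options={
        "create-hcatalog-table": "",  # opt-in table creation
        "hcatalog-partition-keys": "dt",  # static partition
        "hcatalog-partition-values": "2024-11-11",
    },
)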