Sqoop Import & Export

Overview

The Sqoop Import and Sqoop Export job entries let users orchestrate efficient bulk data transfers between structured datastores, such as relational databases, and a Hadoop cluster as part of a PDI job. We interface with the Sqoop command-line tool to provide seamless integration for users who already work with Sqoop.
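To illustrate what "interfacing with the Sqoop command-line tool" involves, the sketch below turns a handful of settings into the argument list that follows the `sqoop` executable. It is a minimal sketch, not PDI's implementation: the class `SqoopArgs` and its methods are hypothetical. The flags (`--connect`, `--username`, `--table`, `--num-mappers`, `--target-dir`, `--export-dir`) are standard Sqoop 1 options. Import copies an RDBMS table into an HDFS directory, and export copies an HDFS directory into an RDBMS table.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: assembles the arguments that follow the `sqoop`
 * executable for a basic import or export. Not PDI's actual code.
 */
final class SqoopArgs {

    /** RDBMS table -> HDFS directory (`sqoop import ... --target-dir DIR`). */
    static List<String> importArgs(String connect, String username, String table,
                                   String targetDir, int numMappers) {
        List<String> args = common("import", connect, username, table, numMappers);
        args.add("--target-dir");
        args.add(targetDir);
        return args;
    }

    /** HDFS directory -> RDBMS table (`sqoop export ... --export-dir DIR`). */
    static List<String> exportArgs(String connect, String username, String table,
                                   String exportDir, int numMappers) {
        List<String> args = common("export", connect, username, table, numMappers);
        args.add("--export-dir");
        args.add(exportDir);
        return args;
    }

    // Options shared by both tools: JDBC connect string, user, table, parallelism.
    private static List<String> common(String tool, String connect, String username,
                                       String table, int numMappers) {
        List<String> args = new ArrayList<>();
        args.add(tool);
        args.add("--connect");
        args.add(connect);
        args.add("--username");
        args.add(username);
        args.add("--table");
        args.add(table);
        args.add("--num-mappers");
        args.add(Integer.toString(numMappers));
        return args;
    }
}
```

For example, `SqoopArgs.importArgs("jdbc:mysql://db.example.com/sales", "etl", "orders", "/user/etl/orders", 4)` yields `import --connect jdbc:mysql://db.example.com/sales --username etl --table orders --num-mappers 4 --target-dir /user/etl/orders`. The connection details are placeholders.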

Modes of Operation

Both import and export offer two modes of operation: "Quick Setup" and "Advanced Mode". Quick Setup exposes only the minimum options required to complete a Sqoop import from an RDBMS into HDFS, or a Sqoop export from HDFS into an RDBMS. Advanced Mode lets the user configure every available Sqoop import or export setting, and also accepts an existing, working command-line configuration.
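Accepting an existing command line in Advanced Mode implies splitting that text into individual arguments. Quick Setup, in turn, implies checking that the minimum options are present. The sketch below shows both steps under stated assumptions. `CommandLineTokenizer`, `tokenize` and `missing` are hypothetical names, not PDI's parser. The tokenizer honours double quotes but not backslash escapes. The "required" set used in the example (`--connect` and `--table`) is an illustrative assumption, since Sqoop also accepts, for instance, `--query` in place of `--table` for imports.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for the two modes; not the actual PDI implementation. */
final class CommandLineTokenizer {

    /**
     * Splits a command line on whitespace, keeping text inside double quotes
     * together as one argument (the quote characters themselves are dropped).
     * Backslash escapes are not handled in this sketch.
     */
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false; // true once the current argument has started (so "" yields an empty argument)
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                hasToken = true;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Returns the options from {@code required} that do not appear in {@code args}. */
    static List<String> missing(List<String> args, List<String> required) {
        List<String> result = new ArrayList<>();
        for (String option : required) {
            if (!args.contains(option)) {
                result.add(option);
            }
        }
        return result;
    }
}
```

For example, `tokenize("import --connect jdbc:mysql://db/sales --table orders --where \"region = 'EU'\"")` keeps `region = 'EU'` as a single argument after `--where`. A simple hand-written scanner is used here because the JDK has no public shell-style tokenizer.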

Screenshots (as released in PDI 4.4.0-GA and Pentaho Suite 4.8)

Quick Setup

Advanced Options (List)

Advanced Options (Command Line)

Developer Notes

  • Base implementations for a "blocking" job entry were created to support multiple job entries that require launching processes and polling for their completion.
  • The notion of a "job config" was introduced as the model behind a job that can be serialized to disk or a repository without the developer needing to implement their own serialization mechanism. (Note: this should be intrinsic to Kettle but currently is not.)
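The "blocking" pattern in the first note (launch a process, then poll until it finishes) can be sketched with the standard `ProcessBuilder` and `Process.waitFor(long, TimeUnit)` APIs. The class `BlockingProcessRunner`, the method `runAndPoll`, and the stop-request hook are hypothetical and do not reflect the actual PDI base classes.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a "blocking" runner; not the actual PDI base class. */
final class BlockingProcessRunner {

    /**
     * Starts the command, then polls until it exits or a stop is requested.
     * Returns the process exit code (usually non-zero if the process was stopped).
     */
    static int runAndPoll(List<String> command, long pollMillis, BooleanSupplier stopRequested)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // child shares this JVM's stdin/stdout/stderr
                .start();
        // waitFor(timeout) returns false while the process is still running,
        // so each loop iteration is one poll.
        while (!process.waitFor(pollMillis, TimeUnit.MILLISECONDS)) {
            if (stopRequested.getAsBoolean()) {
                process.destroy();
                process.waitFor(); // wait for termination so exitValue() is available
                break;
            }
        }
        return process.exitValue();
    }
}
```

A caller might pass a `sqoop import ...` argument list, a one-second poll interval, and a supplier that reports whether the PDI job has been stopped. Polling with a timeout, rather than a plain `waitFor()`, lets the entry react to a stop request while Sqoop is still running.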
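One way to get the "job config" benefit from the second note (a model that can be saved without hand-written serialization code) is to drive serialization by reflection over the config's fields. The sketch below assumes that approach. `ExampleImportConfig`, `ConfigSerializer` and its methods are hypothetical names, `java.util.Properties` stands in for the disk/repository format, and only `String`, `int` and `boolean` fields are restored.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Properties;

/** Hypothetical config model: plain fields, no hand-written serialization code. */
class ExampleImportConfig {
    String connect;
    String table;
    int numMappers = 1;
    boolean verbose;
}

/** Hypothetical generic (de)serializer driven by reflection over declared fields. */
final class ConfigSerializer {

    /** Writes each non-static, non-null field as name=value (via toString()). */
    static Properties toProperties(Object config) throws IllegalAccessException {
        Properties props = new Properties();
        for (Field f : config.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            f.setAccessible(true);
            Object value = f.get(config);
            if (value != null) {
                props.setProperty(f.getName(), value.toString());
            }
        }
        return props;
    }

    /** Creates a new instance and restores its String, int and boolean fields. */
    static <T> T fromProperties(Properties props, Class<T> type) throws ReflectiveOperationException {
        T config = type.getDeclaredConstructor().newInstance();
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            String raw = props.getProperty(f.getName());
            if (raw == null) continue; // keep the field's default value
            f.setAccessible(true);
            Class<?> t = f.getType();
            if (t == String.class) {
                f.set(config, raw);
            } else if (t == int.class) {
                f.setInt(config, Integer.parseInt(raw));
            } else if (t == boolean.class) {
                f.setBoolean(config, Boolean.parseBoolean(raw));
            }
            // Other field types are ignored in this sketch.
        }
        return config;
    }
}
```

`Properties.store(Writer, String)` can then write the result to a file, and `Properties.load(Reader)` reads it back. A repository-backed variant would map the same name/value pairs onto its own storage. Adding a field to the config class requires no change to the serializer, which is the point of the pattern.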