
Sqoop Import

The Sqoop Import job allows you to import data from a relational database into the Hadoop Distributed File System (HDFS) using Apache Sqoop. This job has two setup modes:

  • Quick Mode provides the minimum options necessary to perform a successful Sqoop import.
  • Advanced Mode's default view provides options to better control your Sqoop import. Advanced Mode also has a command line view into which you can paste an existing Sqoop command line.

For additional information about Apache Sqoop, visit http://sqoop.apache.org/.

Quick Setup

Option: Definition

Name: The name of this job as it appears in the transformation workspace.
Database Connection: Select the database connection to import from. Clicking Edit... allows you to edit an existing connection, or you can create a new connection from this dialog by clicking New....
Table: Source table to import from. If the source database requires it, a schema may be supplied in the format SCHEMA.YOUR_TABLE_NAME.
Namenode Host: Host name of the target Hadoop NameNode.
Namenode Port: Port number of the target Hadoop NameNode.
Jobtracker Host: Host name of the target Hadoop JobTracker.
Job Tracker Port: Port number of the target Hadoop JobTracker.
Target Directory: Path of the directory to import into.
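
For reference, the Quick Setup fields correspond roughly to a plain Sqoop command line like the sketch below. This is illustrative only: the JDBC connection string, credentials, table, host names, ports, and target directory are placeholder values, and the command the job entry actually generates may differ.

  sqoop import \
    -D fs.default.name=hdfs://namenodehost:8020 \
    -D mapred.job.tracker=jobtrackerhost:8021 \
    --connect jdbc:mysql://dbserver:3306/sales \
    --username dbuser \
    --password dbpassword \
    --table SCHEMA.YOUR_TABLE_NAME \
    --target-dir /user/pentaho/sqoop_import

Here the -D properties pass the NameNode and JobTracker addresses as Hadoop configuration; the hosts and ports shown are examples only.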

Advanced Setup

Option: Definition

Default/List view: List of property and value pair settings which can be modified to suit your needs, including options to configure an import to Hive or HBase.
Command line view: Field which accepts command line arguments, typically used to paste an existing Sqoop command line.
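
As an illustration of what might be pasted into the Command line view, the following sketch imports a table directly into HBase. The connection string, credentials, table name, and the HBase table, column family, and row key names are all placeholders, not values taken from this documentation.

  sqoop import \
    --connect jdbc:mysql://dbserver:3306/sales \
    --username dbuser \
    --table ORDERS \
    --hbase-table orders \
    --column-family cf1 \
    --hbase-row-key ORDER_ID \
    --hbase-create-table

A Hive import would instead use options such as --hive-import and --hive-table.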

Additional Instructions

This section contains additional instructions for using this step.

Import a Table to HBase with Sqoop (MapR 3.1)

If you want to run a job that uses Sqoop to import data to an HBase table on a MapR 3.1 (or later) secured cluster, you will need to specify the path to the MapR security JAR in the Sqoop Import job entry.

  1. Add the Sqoop Import entry to your job.
  2. Select the Advanced Options link, then click the List View icon.
  3. Set the HBase arguments and any other arguments needed for your job.
  4. Click the Command Line View icon.
  5. In the Command Line field, set the -libjars parameter to the path of the MapR security JAR. The path you enter depends on whether you plan to run the job locally on the Spoon node or remotely on the DI Server node (see the example after these steps).
  • Local (Spoon) Node Path: -libjars plugins/pentaho-big-data-plugin/hadoop-configurations/mapr31/lib/pentaho-hadoop-shims-mapr31-security-<version number>.jar
  • Remote (DI Server) Node Path: -libjars ../../pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr31/lib/pentaho-hadoop-shims-mapr31-security-<version number>.jar

Note: Replace <version number> with the version number of the pentaho-hadoop-shims-mapr31-security JAR file that is on your system.

  6. Click OK to close the Sqoop Import entry.
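
As an illustration only, a command line combining the local (Spoon) node -libjars path from step 5 with HBase import arguments might look like the sketch below. The connection string, credentials, table, and HBase names are placeholders, and <version number> must still be replaced as described in the note above.

  sqoop import \
    -libjars plugins/pentaho-big-data-plugin/hadoop-configurations/mapr31/lib/pentaho-hadoop-shims-mapr31-security-<version number>.jar \
    --connect jdbc:mysql://dbserver:3306/sales \
    --username dbuser \
    --table ORDERS \
    --hbase-table orders \
    --column-family cf1 \
    --hbase-row-key ORDER_ID

Note that -libjars is a Hadoop generic option and is placed before the Sqoop-specific arguments.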
