
Sqoop Import

The Sqoop Import job allows you to import data from a relational database into the Hadoop Distributed File System (HDFS) using Apache Sqoop. This job has two setup modes:

  • Quick Mode provides the minimum options necessary to perform a successful Sqoop import.
  • Advanced Mode's default view provides options to better control your Sqoop import. Advanced Mode also has a command line view into which you can paste an existing Sqoop command line.

For additional information about Apache Sqoop, visit http://sqoop.apache.org/.

Quick Setup

Option: Definition

Name: The name of this job as it appears in the transformation workspace.
Database Connection: Select the database connection to import from. Clicking Edit... allows you to edit an existing connection, or you can create a new connection from this dialog by clicking New....
Table: Source table to import from. If the source database requires it, a schema may be supplied in the format SCHEMA.YOUR_TABLE_NAME.
Namenode Host: Host name of the target Hadoop NameNode.
Namenode Port: Port number of the target Hadoop NameNode.
Jobtracker Host: Host name of the target Hadoop JobTracker.
Job Tracker Port: Port number of the target Hadoop JobTracker.
Target Directory: Path of the directory to import into.
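
For reference, the Quick Setup fields correspond roughly to a plain Sqoop command line like the sketch below. This is illustrative only: the JDBC connection string, credentials, table, host names, ports, and target directory are placeholder values, and the command the job entry actually generates may differ.

  sqoop import \
    -D fs.default.name=hdfs://namenodehost:8020 \
    -D mapred.job.tracker=jobtrackerhost:8021 \
    --connect jdbc:mysql://dbserver:3306/sales \
    --username dbuser \
    --password dbpassword \
    --table SCHEMA.YOUR_TABLE_NAME \
    --target-dir /user/pentaho/sqoop_import

Here the -D properties pass the NameNode and JobTracker addresses as Hadoop configuration; the hosts and ports shown are examples only.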

Advanced Setup

Option: Definition

Default/List view: List of property and value pair settings which can be modified to suit your needs, including options to configure an import to Hive or HBase.
Command line view: Field which accepts command line arguments, typically used to paste an existing Sqoop command line.
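
As an illustration of what might be pasted into the Command line view, the following sketch imports a table directly into HBase. The connection string, credentials, table name, and the HBase table, column family, and row key names are all placeholders, not values taken from this documentation.

  sqoop import \
    --connect jdbc:mysql://dbserver:3306/sales \
    --username dbuser \
    --table ORDERS \
    --hbase-table orders \
    --column-family cf1 \
    --hbase-row-key ORDER_ID \
    --hbase-create-table

A Hive import would instead use options such as --hive-import and --hive-table.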

Additional Instructions

This section contains additional instructions for using this step.

Import a Table to HBase with Sqoop (MapR 3.1)

If you want to run a job that uses Sqoop to import data to an HBase table on a MapR 3.1 (or later) secured cluster, you will need to specify the path to the MapR security JAR in the Sqoop Import job entry.

  1. Add the Sqoop Import entry to your job.
  2. Select the Advanced Options link, then click the List View icon.
  3. Set the HBase arguments and any other arguments needed for your job.
  4. Click the Command Line View icon.
  5. In the Command Line field, set the -libjars parameter to the path of the MapR security JAR. The path you enter depends on whether you plan to run the job locally on the Spoon node or remotely on the DI Server node (see the example after these steps).
  • Local (Spoon) Node Path: -libjars plugins/pentaho-big-data-plugin/hadoop-configurations/mapr31/lib/pentaho-hadoop-shims-mapr31-security-<version number>.jar
  • Remote (DI Server) Node Path: -libjars ../../pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr31/lib/pentaho-hadoop-shims-mapr31-security-<version number>.jar

Note: Replace <version number> with the version number of the pentaho-hadoop-shims-mapr31-security JAR file that is on your system.

  6. Click OK to close the Sqoop Import entry.
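
As an illustration only, a command line combining the local (Spoon) node -libjars path from step 5 with HBase import arguments might look like the sketch below. The connection string, credentials, table, and HBase names are placeholders, and <version number> must still be replaced as described in the note above.

  sqoop import \
    -libjars plugins/pentaho-big-data-plugin/hadoop-configurations/mapr31/lib/pentaho-hadoop-shims-mapr31-security-<version number>.jar \
    --connect jdbc:mysql://dbserver:3306/sales \
    --username dbuser \
    --table ORDERS \
    --hbase-table orders \
    --column-family cf1 \
    --hbase-row-key ORDER_ID

Note that -libjars is a Hadoop generic option and is placed before the Sqoop-specific arguments.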
