Description

Start a PDI Cluster on YARN starts a cluster of Carte servers on Hadoop nodes, assigns them ports, and passes cluster access credentials. When this step runs and a cluster is created, the metadata for that cluster is stored in the shared.xml file or, if you are using the enterprise repository, in the DI Repository. For more information on Carte clusters, see Use Carte Clusters in the Pentaho Help documentation.
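To see which cluster schemas have been recorded when no repository is in use, you can inspect shared.xml directly. The following is a minimal sketch, not an official tool. It assumes the file has a sharedobjects root element containing clusterschema elements, each with a name child; the function name and the sample content are hypothetical, so compare them against your own file before relying on this.

```python
import xml.etree.ElementTree as ET


def cluster_schema_names(shared_xml_text):
    """Return the names of all cluster schemas found in shared.xml content.

    Assumes each schema is a <clusterschema> element with a direct
    <name> child (an assumption about the file layout).
    """
    root = ET.fromstring(shared_xml_text)
    names = []
    # iter() finds clusterschema elements at any depth under the root.
    for schema in root.iter("clusterschema"):
        # findtext() only looks at direct children, so nested
        # slave server names are not picked up by mistake.
        name = schema.findtext("name")
        if name:
            names.append(name.strip())
    return names


# Hypothetical sample content, used only for illustration.
SAMPLE = """<sharedobjects>
  <clusterschema>
    <name>yarn-cluster</name>
  </clusterschema>
</sharedobjects>"""

print(cluster_schema_names(SAMPLE))
```

To run it against a real file, read the file's text and pass it to cluster_schema_names.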

In earlier versions of Spoon, this step was labeled Start a YARN Kettle Cluster.

Context

Use this step to start a cluster of Carte servers. The Carte servers in the cluster continue to run until a Stop a PDI Cluster on YARN step is executed, or until you stop the cluster manually.
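Because the servers keep running until they are stopped, it can be useful to check whether a given Carte server is still up. The sketch below is one possible approach, not part of this step. It assumes Carte exposes its status page at /kettle/status/ with the xml=Y parameter, that the response is a serverstatus document with a statusdesc element, and that HTTP basic authentication is in use. The host, port, and credentials are placeholders that you supply.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET


def status_url(host, port):
    """Build the assumed Carte status URL for one server."""
    return f"http://{host}:{port}/kettle/status/?xml=Y"


def parse_status(xml_text):
    """Extract the status description from a status XML document.

    Returns None if the assumed <statusdesc> element is missing.
    """
    root = ET.fromstring(xml_text)
    return root.findtext("statusdesc")


def fetch_status(host, port, user, password, timeout=5):
    """Query one Carte server using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        status_url(host, port),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_status(response.read().decode("utf-8"))
```

Call fetch_status with the host and port of one of the started servers; a connection error suggests that the server is no longer running or is not reachable.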

...

If you assign the cluster a name that has not been used before, you will need to create a cluster schema in Spoon. You only need to specify the cluster name when you create the cluster schema. See the Create a Cluster Schema in Spoon topic in the Pentaho Help documentation for more information. A YARN Hadoop configuration should already be set up. For information on setting up a YARN Hadoop configuration, see Additional Configuration for YARN Shims.

Options

You can configure the cluster through the Start Kettle Cluster on YARN dialog, which appears when you double-click the job icon. This dialog contains a Step Name field and two tabs. The Step Name field holds the entry name, which can be customized or left as the default. The two tabs, Cluster and Files, contain the configuration options.

Cluster

The items in the Cluster tab contain cluster configuration details:

...

If you run the job from a user's PDI installation, the config files from that user's KETTLE_HOME directory are used. If the job is scheduled or otherwise runs on a Pentaho DI Server, the config files from that server's configured KETTLE_HOME are copied when the job starts.

If you want to use configuration files other than those in your or the server's KETTLE_HOME directory, copy those files manually into the YARN workspace folder and ensure that the corresponding checkboxes in the Copy local resource files to YARN section of the Files tab are not selected.
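Before deciding whether to copy files manually, it helps to know which local configuration files would be picked up. The sketch below is an illustration only. It assumes the usual PDI convention that configuration lives in a .kettle folder under KETTLE_HOME, falling back to the user's home directory when KETTLE_HOME is not set. The file names in CONFIG_FILES are assumed for illustration and may not match the checkboxes in your version of the dialog.

```python
import os
from pathlib import Path

# Assumed file names, for illustration only.
CONFIG_FILES = ("kettle.properties", "shared.xml", "repositories.xml")


def kettle_config_dir(env=None):
    """Return the .kettle directory implied by KETTLE_HOME.

    Falls back to the user's home directory when KETTLE_HOME is
    unset or empty (an assumption based on common PDI behavior).
    """
    env = os.environ if env is None else env
    base = env.get("KETTLE_HOME")
    root = Path(base) if base else Path.home()
    return root / ".kettle"


def existing_config_files(config_dir):
    """List the assumed config files that actually exist in config_dir."""
    return [
        config_dir / name
        for name in CONFIG_FILES
        if (config_dir / name).is_file()
    ]


if __name__ == "__main__":
    directory = kettle_config_dir()
    print("Config directory:", directory)
    for path in existing_config_files(directory):
        print("Found:", path)
```

Running the script on the machine that launches the job shows which files are present locally.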

...