Carte Configuration

Note: See also the more recent page Configure Static and Dynamic Carte Clusters on help.pentaho.com, which covers the extended properties available since PDI version 5.3. General information can be found in Use Carte Clusters.

There are two ways to configure a Carte instance. The first is to specify a hostname and a port on the command line:

sh carte.sh localhost 8080

This command will launch a Carte slave server on your machine on port 8080 with default values.

Specifying options on the command line is not ideal, so Pentaho created an XML file format for administrators who want better control over the functionality of the Carte service. The second way is to pass the configuration file to carte.sh instead of a hostname and port:

sh carte.sh configuration.xml

XML Configuration File Format

All options are placed under the slave_config root element:

<slave_config>
...
</slave_config>

Example

<slave_config>
  <slaveserver>
    <name>Slave01</name>
    <hostname>localhost</hostname>
    <port>9081</port>
  </slaveserver>


  <masters>
    <slaveserver>
      <name>master1</name>
      <hostname>localhost</hostname>
      <port>9080</port>
      <!--<webAppName>pentaho-di</webAppName>-->
      <username>admin</username>
      <password>password</password>
      <master>Y</master>
    </slaveserver>
  </masters>


  <report_to_masters>Y</report_to_masters>

  <max_log_lines>10000</max_log_lines>
  <max_log_timeout_minutes>1440</max_log_timeout_minutes>
  <object_timeout_minutes>1440</object_timeout_minutes>
</slave_config>

Carte options

To specify the hostname, interface, or port on which the Carte slave server runs, add a slaveserver element:

<slaveserver>
  <name>slave server name</name>
  <hostname>hostname</hostname>
  <port>port-number</port>
</slaveserver>

Optionally you can also add the networking_interface option instead of the hostname.  The IP-address to run on will then be automatically determined by looking at the IP-address of the specified interface.  For example, if you have an eth0 interface containing IP-address 192.168.1.3, then that's the address that will be taken.
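
As a minimal sketch of the option described above, the fragment below names an interface instead of a hostname. The element name is used as written on this page, and its placement inside slaveserver is an assumption; the name Slave01 and port 9081 are reused from the earlier example. Check the exact element name against the documentation for your PDI version before relying on it.

```xml
<slave_config>
  <slaveserver>
    <name>Slave01</name>
    <!-- Assumption: the interface option takes the place of <hostname> inside <slaveserver> -->
    <networking_interface>eth0</networking_interface>
    <port>9081</port>
  </slaveserver>
</slave_config>
```

With an eth0 interface holding 192.168.1.3, as in the example above, Carte would be expected to listen on 192.168.1.3:9081.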

Tuning options

You can place the following elements under the root element:

max_log_lines
  Value: Integer value indicating the maximum number of log lines kept in memory.
  Description: When set to 0, all log lines are kept in memory until a job or transformation is removed from memory. This value is not exact. A few hundred extra rows might be kept in memory for performance reasons.

max_log_timeout_minutes
  Value: Integer value configuring the longest time a log line is kept in memory.
  Description: When set to 0, all log lines are kept in memory until a job or transformation is removed from memory. This value is not exact. The log buffer is only verified periodically (every minute).

object_timeout_minutes
  Value: Integer value configuring the longest time a transformation or a job is kept in memory.
  Description: While it is useful to be able to verify whether a transformation or a job ran correctly by asking Carte or the DI Server, there usually isn't a need to keep execution history around indefinitely. With this parameter you can specify the number of minutes after which the object in question is removed from memory. A value of 0 means that all objects are kept forever or until they are manually removed. This value is not exact. The execution history of transformations and jobs is only cleaned periodically.

For example:

<slave_config>
  <max_log_lines>10000</max_log_lines>
  <max_log_timeout_minutes>1440</max_log_timeout_minutes>
  <object_timeout_minutes>1440</object_timeout_minutes>
</slave_config>

Dynamic clustering options

If you want your slave server to be part of a dynamic cluster, you need to tell it which master to report to. You can do this by specifying a masters element and by adding a report_to_masters element set to Y. For example:

<masters>

  <slaveserver>
    <name>master1</name>
    <hostname>localhost</hostname>
    <port>8080</port>
    <!--<webAppName>pentaho-di</webAppName>-->
    <username>cluster</username>
    <password>cluster</password>
    <master>Y</master>
  </slaveserver>

</masters>

<report_to_masters>Y</report_to_masters>

(All of these elements go directly under the root element.)

Note: If a DI Server will be the master, an additional "webAppName" element must be included under the "slaveserver" element.
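
For illustration, the masters fragment above could look as follows when the master is a DI Server. The value pentaho-di is taken from the commented-out line in the examples on this page; whether it matches the web application name of your own DI Server is an assumption to verify.

```xml
<masters>
  <slaveserver>
    <name>master1</name>
    <hostname>localhost</hostname>
    <port>8080</port>
    <!-- Assumption: pentaho-di is the web application name of the DI Server -->
    <webAppName>pentaho-di</webAppName>
    <username>cluster</username>
    <password>cluster</password>
    <master>Y</master>
  </slaveserver>
</masters>

<report_to_masters>Y</report_to_masters>
```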

Defining a repository connection

Starting from PDI version 5 you can specify a connection to a repository.  This connection will typically be used for the following reasons:

  • to automatically find transformations that can act as a service (SQL/JDBC table), see also the Thin JDBC driver page.
  • to allow jobs defined in the repository to be executed through a servlet.  You can execute a job in the repository by using the following URL format:  http://hostname:port/kettle/runJob/?job=/path/to/jobname&level=DebugLevel&ParameterName=ParameterValue*
  • to allow transformations defined in the repository to be executed through a servlet.  You can execute a transformation in the repository by using the following URL format:  http://hostname:port/kettle/runTrans/?trans=/path/to/transname&level=DebugLevel&ParameterName=ParameterValue*

To define a repository, add a repository element to the root. Please keep in mind that the username and password refer to the repository, not the Carte server:

<repository>
  <name>Repository Name (id)</name>
  <username>username</username>
  <password>password</password>
</repository>
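
To show where the element sits, here is a minimal sketch of a complete configuration file with a repository connection. The slaveserver values are reused from the earlier example, and the repository name and credentials are the same placeholders as above, not real values.

```xml
<slave_config>
  <slaveserver>
    <name>Slave01</name>
    <hostname>localhost</hostname>
    <port>9081</port>
  </slaveserver>

  <!-- These credentials are for the repository, not for the Carte server -->
  <repository>
    <name>Repository Name (id)</name>
    <username>username</username>
    <password>password</password>
  </repository>
</slave_config>
```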

Sequences

Carte is also capable of handing out sequence ranges for use in a clustered environment. One use case is the Get ID from slave server step. You can configure this by using the sequences element:

<sequences>
 <sequence>
  <name>test</name>
  <start>0</start>
  <connection>MySQL</connection>
  <schema/>
  <table>SEQ_TABLE</table>
  <sequence_field>SEQ_NAME</sequence_field>
  <value_field>SEQ_VALUE</value_field>
 </sequence>
</sequences>

Since it's a burden to manually add each sequence in the config file, it is also possible to automatically configure sequences with the autosequence element:

<autosequence>
 <connection>MySQL</connection>
 <schema/>
 <start>1234</start>
 <table>SEQ_TABLE</table>
 <sequence_field>SEQ_NAME</sequence_field>
 <value_field>SEQ_VALUE</value_field>
 <autocreate>N</autocreate>
</autosequence>

Please note that in both cases you need to add a connection XML element. You can copy this information from Spoon (right-click a connection, then copy XML):

<connection>
 <name>MySQL</name>
 <server>localhost</server>
 <type>MYSQL</type>
 <access>Native</access>
 <database>test</database>
 <port>3306</port>
 <username>matt</username>
 <password>Encrypted 2be98afc86aa7f2e4cb79ce10df90acde</password>
</connection>
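
Putting the pieces together, a configuration with one sequence and its connection might look like the sketch below. It assumes that the connection element goes directly under the root element and that the sequence refers to it by name (MySQL here); neither point is stated explicitly on this page, so treat both as assumptions. All values are reused from the examples above.

```xml
<slave_config>
  <slaveserver>
    <name>Slave01</name>
    <hostname>localhost</hostname>
    <port>9081</port>
  </slaveserver>

  <!-- Assumption: the connection definition sits directly under the root element -->
  <connection>
    <name>MySQL</name>
    <server>localhost</server>
    <type>MYSQL</type>
    <access>Native</access>
    <database>test</database>
    <port>3306</port>
    <username>matt</username>
    <password>Encrypted 2be98afc86aa7f2e4cb79ce10df90acde</password>
  </connection>

  <sequences>
    <sequence>
      <name>test</name>
      <start>0</start>
      <!-- Assumption: this value matches the <name> of the connection above -->
      <connection>MySQL</connection>
      <schema/>
      <table>SEQ_TABLE</table>
      <sequence_field>SEQ_NAME</sequence_field>
      <value_field>SEQ_VALUE</value_field>
    </sequence>
  </sequences>
</slave_config>
```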