.01 Introduction to Spoon

Unknown macro: {scrollbar}

This page does not contain the latest version of Spoon documentation.  To learn more about Spoon or any other PDI topic, visit http://help.pentaho.com.

What is Spoon?

Kettle is an acronym for "Kettle E.T.T.L. Environment." Kettle is designed to help you with your ETTL needs, which include the Extraction, Transformation, Transportation and Loading of data.

Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the Kettle tools — Pan and Kitchen. Pan is a data transformation engine that performs a multitude of functions such as reading, manipulating, and writing data to and from various data sources. Kitchen is a program that executes jobs designed by Spoon in XML or in a database repository. Jobs are usually scheduled in batch mode to be run automatically at regular intervals.

Note: For a complete description of Pan or Kitchen, see the Pan and Kitchen user guides.

Transformations and Jobs can describe themselves using an XML file or can be put in a Kettle database repository. Pan or Kitchen can then read the data to execute the steps described in the transformation or to run the job. In summary, Pentaho Data Integration makes data warehouses easier to build, update, and maintain.

The following topics are covered in this section:

Installing Spoon


Follow the instructions below to install Spoon:

  1. Install the Sun Microsystems Java Runtime Environment version 1.5 or higher. You can download a JRE for free at http://www.javasoft.com/.
  2. Unzip the binary distribution zip-file in a directory of your choice.
  3. Under Unix-like environments (Solaris, Linux, MacOS, for example), you must make the shell scripts executable. Execute these commands to make all shell scripts in the Kettle directory executable:
cd Kettle
chmod +x *.sh

Launching Spoon


The scripts below allow you to launch Spoon on different platforms:

  • Spoon.bat: launch Spoon on the Windows platform.
  • spoon.sh: launch Spoon on a Unix-like platform such as Linux, Apple OSX, Solaris

To make a shortcut under the Windows platform, an icon is provided. Use "spoon.ico" to set the correct icon. Point the shortcut to the spoon.bat file.

Supported Platforms


The Spoon GUI is supported on the following platforms:

  • Microsoft Windows: all platforms since Windows 95, including Vista
  • Linux GTK: on i386 and x86_64 processors, works best on Gnome
  • Apple's OSX: works both on PowerPC and Intel machines
  • Solaris: using a Motif interface (GTK optional)
  • AIX: using a Motif interface
  • HP-UX: using a Motif interface (GTK optional)
  • FreeBSD: preliminary support on i386, not yet on x86_64
     

Known Issues


Below is a list of known issues associated with Spoon.

Linux
Occasional JVM crashes running SuSE Linux and KDE. Running under Gnome presents no problems. (detected on SUSE Linux 10.1 but earlier versions also have the same problem)

FreeBSD
Problems with drag and drop. Use the right click pop up menu on the canvas as a workaround.

Check the Tracker lists at http://jira.pentaho.com for up-to-date information about recently discovered issues.

User Interface Overview


The Main tree in the upper-left panel of Spoon allows you to browse connections associated with the jobs and transformations you have open. When designing a transformation, the Core Objects palate in the lower left-panel contains the available steps used to build your transformation including input, output, lookup, transform, joins, scripting steps and more. When designing a job, the Core objects palate contains the available job entries. The Core Objects bar contains a variety of job entry types. These items are described in detail in the following chapters: .03 Database Connections, .06 Hops, .09 Transformation Steps, .11 Job Entries, .12 Graphical View.

Command Line Options

Below are the command line options that you can use when starting the Spoon application:

-file=<filename>

Runs the specified transformation or job

Default: Last opened files

  • .ktr : Kettle Transformation
  • .kjb : Kettle Job
-file=transform.ktr


-logfile=<Logging Filename>

Specify the location of the log file.

Default: Standard output.

-logfile=log_file.txt


-level=<Logging Level> sets the log level for the transformation being run.

-level=Basic

Below are the possible values:

Nothing

Do not display any output

Error

Only display errors

Minimal

Use minimal logging

Basic

This is the default basic logging level

Detailed

Give detailed logging output

Debug

Show detailed output for debugging purposes.

Rowlevel

Detailed logging at a row level. Warning - this generates a lot of data.


-rep=<Repository name> -user=<Username> -pass=<password> -trans=<TransName>

-rep=<Repository name> -user=<Username> -pass=<password> -job=<JobName> 

Automatically load a transformation or a job from a Repository upon startup.

You must specify the options -user, -pass , -job and*-trans*.

The repository details are loaded from the filerepositories.xml in the local directory or in the Kettle directory:

  • $HOME/.kettle/
  • C:\Documents and Settings\<username>\.kettle on Windows.

Default: Dialog box will ask for Repository details

-rep=repos -user=admin -pass=adminpasswd -trans=trans.ktr

* Important*:

  1. On Windows, Pentaho advises you to use the /option:value format to avoid command line parsing problems by the MS-DOS shell.
  2. Fields in italic represent the values that the options use.
  3. Use quotes or double quotes if there are spaces in option values; quotes ensure that option values stay together.

Repository


Spoon allows you to store transformation and job files to the local file system or in the Kettle repository. The Kettle repository can be housed in any common relational database. To load a transformation from a database repository, you must connect to this repository. Define a database connection to the repository when you start Spoon, as shown below:

 

 

The information associated with repositories is stored in "repositories.xml". This file is located in the hidden directory ".kettle" in your default home directory. On Windows, the file is located in C:\Documents and Settings\.kettle

Note: The complete path and file name associated with repositories.xml file is displayed on the Spoon console.
If you don't want Welcome dialog box to appear each time Spoon starts up, disable Present this dialog at startup or use the Options dialog under the Edit / Options menu.

Note: The default password for the administrator is admin. Pentaho strongly recommends that you change this default password using the Repository Explorer or the Repository/Edit User menu.

Repository Auto-Login

You can have Spoon automatically log into the repository by setting the following environment variables: KETTLE_REPOSITORY, KETTLE_USER and KETTLE_PASSWORD.

(warning) Important: Because logging automatically poses a security risk, Pentaho strongly recommends that you always lock your computer to prevent unauthorized access to the repository.

License


Beginning with version 2.2.0, Kettle was released into the public domain under the LGPL license. Please refer to Appendix A for the full text of this license.

Note: Pentaho Data Integration is referred to as "Kettle" below.

Copyright (C) 2006 Pentaho Corporation

Kettle is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

Kettle is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with the Kettle distribution; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Transformation Definitions


The table below contains a list of transformation definitions:

Transformation

Description

Value

Values are part of a row and can contain any type of data: strings, floating point numbers, unlimited precision BigNumbers, integers, dates or boolean values

Row

A row consists of 0 or more values that are processed together as a single entry.

Input Stream

A stack of rows that enters a step

Hop

A graphical representation of one or more data streams between two steps; a hop always represents the output stream for one step and the input stream for another — the number of streams is equal to the copies of the destination step (one or more)

Note

Descriptive text that can be added to a transformation

Job Definitions


The table below contains a list of job definitions:

Job

Description

Job Entry

A part of a job that performs a specific task

Hop

A graphical representation of one or more data streams between two steps; a hop always represents the output stream for one step and the input stream for another — the number of streams is equal to the copies of the destination step (one or more)

Note

Descriptive text that that can be added to a job

Toolbar Icons


The icons on the toolbar of the main screen are from left to right:

Icon

Description

Create a new job or transformation

Open transformation/job from file if you're not connected to a repository or from the repository if you are connected to one.

Save the transformation/job to a file or to the repository.

Save the transformation/job under a different name or filename.

Open the print dialog.

Run transformation/job: runs the current transformation from XML file or repository.

Preview transformation: runs the current transformation from memory. You can preview the rows that are produced by selected steps.

Run the transformation in debug mode allowing you to troubleshoot execution errors.

Replay the processing of a transformation for a certain date and time. This will cause certain steps (Text File Input and Excel Input) to only process rows that failed to be interpreted correctly during the run on that particular date and time.

Verify transformation: Spoon runs a number of checks for every step to see if everything is going to run as it should.

Run an impact analysis: what impact does the transformation have on the used databases.

Generate the SQL that is needed to run the loaded transformation.

Launches the database explorer allowing you to preview data, run SQL queries, generate DDL and more.

Kettle Options


Kettle options allow you to customize properties associated with the behavior and look and feel of the graphical user interface. Examples include startup options such as whether or not to display tips and the Kettle Welcome Page, and user interface options such as fonts and colors. To access the options dialog, select Edit, then Options... from the menu bar.

General Tab

Feature

Description

Maximum Undo Level

Sets the maximum number of steps that can be undone (or redone) by Spoon

Default number of lines in preview dialog

Allows you to change the default number of rows that are requested from a step during transformation previews

Maximum nr of lines in the logging windows

Specifies the maximum limit of rows to display in the logging window

Show tips at startup?

Sets the display of tips at startup

Show welcome page at startup?

Controls whether or not to display the Welcome page when launching Spoon

Use database cache?

Spoon caches information that is stored on the source and target databases. In some instances, caching causes incorrect results when you are making database changes. To prevent errors you can disable the cache altogether instead of clearing the cache every time.

Note: Spoon automatically clears the database cache when you launch DDL (Data Definition Language) statements towards a database connection; however, when using third party tools, clearing the database cache manually may be necessary.

Feature

Description

Open last file at startup?

Loads the last transformation you used (opened or saved) from XML or repository automatically

Auto save changed files?

Automatically saves a changed transformation before running

Only show the active file in the main tree?

Reduces the number of transformation and job items in the main tree on the left by only showing the currently active file

Only save used connections to XML?

Limits the XML export of a transformation to the used connections in that transformation. This is helpful while exchanging sample transformations to avoid having all defined connections to be included.

Ask about replacing existing connections on open/import?

Requests permission before replacing existing database connections during import

Replace existing connections on open/import?

This is the action that's being taken when there is no dialog shown. *(see previous option) *

Show "Save" dialog?

Allows you to turn off the confirmation dialogs you receive when a transformation has been changed

Automatically split hops?

Turns off the confirmation dialogs you get when you want to split a hop (see also 7.4. Splitting A Hop)

Show "copy or distribute" dialog?

Turns off the warning message that appears when you link a step to multiple outputs. This warning message describes the two options for handling multiple outputs:

  1. Distribute rows - destination steps receive the rows in turns (round robin)
  2. Copy rows - all rows are sent to all destinations

Show repository dialog at startup?

Controls whether or not the repositories dialog shows up at startup.

Ask user when exiting?

Controls whether or not to display the confirmation dialog when a user chooses to exit the application.

Clear custom parameters (steps/plug-ins)

Clears all parameters and flags that were set in the plug-in or step dialogs.

Display tool tips?

This option controls whether or not to display tool tips for the buttons on the main tool bar.

Look and Feel tab

Feature

Description

Fixed width font

The font that is used in the dialog boxes, trees, input fields, and so on

Font on workspace

The font that is used on the graphical view

Font for notes

The font to use in notes that are displayed in the Graphical View

Background color

Sets the background color in Spoon. It affects all dialogs too

Workspace background color

Sets the background color in the Graphical View of Spoon

Tab color

The color that is being used to indicate tabs that are active/selected.

Icon size in workspace

Affects the size of the icons in the graph window. The original size of an icon is 32x32 pixels. The best results (graphically) are probably at sizes 16,24,32,48,64 and other multiples of 32.

Line width on workspace

Affects the line width of the hops on the Graphical View and the border around the step.

Shadow size on workspace

If this size is larger then 0, a shadow of the steps, hops, and notes is drawn on the canvas, making it look like the transformation floats above the canvas.

Dialog middle percentage

By default, a parameter is drawn at 35% of the width of the dialog, counted from the left. You can change this with this parameter. May be useful in cases where you use unusually large fonts.

Canvas anti-aliasing?

Some platforms like Windows, OSX and Linux support anti-aliasing through GDI, Carbon or Cairo. Check this to enable smoother lines and icons in your graph view. If you enable this and your environment doesn't work any more afterwards, change the value for option "EnableAntiAliasing" to "N" in file $HOME/.kettle/.spoonrc (C:\Documents and Settings\<user>\.kettle\.spoonrc on Windows)

Use look of OS?

Checking this on Windows allows you to use the default system settings for fonts and colors in Spoon. On other platforms, this is always the case.

Show branding graphics

Enabling this option will draw Pentaho Data Integration branding graphics on the canvas and in the left hand side "expand bar".

Preferred Language

Here you can specify the default language setting. If a certain text hasn't been translated into this locale, Kettle will fall back to the fail over locale.

Alternative Language

Because the original language in which Kettle was written is English, it's best to set this locale to English.

Search Metadata


This option will search in any available fields, connectors or notes of all loaded jobs and transformations for the string specified in the Filter field. The metadata search returns a detailed result set showing the location of any search hits. This feature is accessed by choosing Edit|Search metadata from the menu bar.

Set Environment Variable


The Set Environment Variable feature allows you to create and set environment variables for the current user session explicitly. This is a useful feature when designing transformations for testing variable substitutions that are normally set dynamically by another job or transformation.

This feature is accessible by choosing Edit|Set Environment Variable from the menu bar.

Note: This page also displays when you run a transformation that use undefined variables. This allows you to define them right before execution time.

Show environment variables

This feature displays the current list of environment variables and their values. It is accessed by selecting the Edit|Show environment variables option from the menu bar.

Execution Log history


If you have configured your Job or Transformation to store log information in a database table, you can view the log information from previous executions by right-clicking on the job or transformation in the Main Tree and selecting 'Open History View'. A view similar to the one below appears:

Note: todo: replace screenshot when PDI-224 is fixed

Note: The log history for a job or transformation also opens by default each next time you execute the file.

Replay


The Replay feature allows you to re-run a transformation that failed. Replay functionality is implemented for Text File Input and Excel input. It allows you to send files that had errors back to the source and have the data corrected. Only the lines that failed before processed during the replay if a .line file is present. The Replay feature uses the date in the filename of the .line file to match the entered replay date.

Generate Mapping Against Target Step


In cases where you have a fixed target table, map the fields from the stream to their corresponding fields in the target output table. Use a Select Values step in your transformation. The 'Generate mapping against target' option provides you with an easy-to-use dialog for defining these mappings that automatically creates the resulting Select Values step that can be dropped into your transformation flow prior to the table output step.

To access the 'Generate mapping against target' option right click in the table output step.

After defining your mappings, select OK and the Select Values step containing your mappings will appear on the workspace. Attach the mapping step into your transformation just before the table output step.

Generate mappings example

Below is an example of a simple transformation in which we want to generate mappings to our target output table:

  1. Begin by right-clicking on the Table output step and selecting 'Generate mappings against target'.
  2. Add all necessary mappings using the Generate Mapping dialog shown above and click OK. You will now see a Table output mapping step has been added to the canvas.
  3. Drag the generated Table output Mapping step into your transformation flow prior to the table output step:

Safe Mode


Use safe mode when you are mixing rows from many sources to ensure that all rows have the same layout in all conditions. The safe mode option is available in the Spoon logging window or on the Execute a Transformation/Job window. When running in safe mode, the transformation checks every row that passes and ensures that all layouts are identical. If a row is does not have the same layout as the first row, an error is generated and reported.

Note: This option is also available in Pan.

Welcome Page


The Welcome page displays the first time you launch Spoon 3.0; it provides you with links to additional information about Pentaho Data Integration. You can disable Welcome page in Spoon options by selecting Options from the Edit menu.