Unique Rows

PLEASE NOTE: This documentation applies to an earlier version. For the most recent documentation, visit the Pentaho Enterprise Edition documentation site.

Description

The Unique rows step removes duplicate rows from the input stream(s).

Important: Make sure that the input stream is sorted on the fields being compared; otherwise, only consecutive duplicate rows are detected and removed.

See also the Unique rows (HashSet) step, which does not require the rows to be sorted.
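
The sorting requirement is easier to see in a small sketch. The Python below is not the step's implementation; it only illustrates the "compare each row with the previous one" behavior described above. The function name drop_consecutive_duplicates and the sample rows are made up for illustration.

```python
from itertools import groupby


def drop_consecutive_duplicates(rows):
    """Keep the first row of every run of identical adjacent rows."""
    # groupby only groups neighbours, so equal rows that are not
    # adjacent end up in separate groups and are both kept.
    return [key for key, _run in groupby(rows)]


unsorted_rows = [("a",), ("b",), ("a",), ("a",)]

# Unsorted input: ("a",) survives twice because "b" sits between the copies.
print(drop_consecutive_duplicates(unsorted_rows))

# Sorted input: every duplicate is adjacent, so each row appears once.
print(drop_consecutive_duplicates(sorted(unsorted_rows)))
```

Sorting first makes all equal rows adjacent, which is why a sort on the compared fields is normally placed in front of this step.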

Options

The table below contains descriptions of all options for the Unique rows step:

Option

Description

Step name

Name of the step; this name must be unique within a single transformation.

Add counter to output?

Check this option to add a counter field to the stream.

Counter field

Define the counter field name.

Redirect duplicate row

Treats duplicate rows as errors and redirects them to the error stream of the step. Requires error handling to be configured for this step.

Error Description

Sets the error handling description to display when duplicate rows are detected. Only available when Redirect duplicate row is checked.

Fields to compare table

Specify the field names on which you want to force uniqueness, or click Get to insert all fields from the input stream(s). You can choose to ignore case by setting the Ignore case flag to Y. For example, Kettle, KETTLE, and kettle are treated as the same value if the comparison is case-insensitive. In this instance, the first occurrence (Kettle) is passed to the next step(s).
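
The options above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the step's actual code: the function unique_rows and its parameters are hypothetical, the returned duplicates list stands in for the error stream, and the counter is assumed to hold the number of input rows collapsed into each output row.

```python
def unique_rows(rows, compare_fields, ignore_case=False, counter_field=None):
    """Collapse adjacent rows that match on compare_fields.

    rows are dicts and are assumed to be sorted on compare_fields already.
    Returns (unique, duplicates); duplicates stands in for the error stream.
    """
    def key_of(row):
        values = [row[name] for name in compare_fields]
        if ignore_case:
            # Assumption: only string values are folded to lower case.
            values = [v.lower() if isinstance(v, str) else v for v in values]
        return tuple(values)

    unique, duplicates = [], []
    previous_key = None
    for row in rows:
        key = key_of(row)
        if unique and key == previous_key:
            # Same key as the previous row: treat it as a duplicate.
            duplicates.append(row)
            if counter_field is not None:
                unique[-1][counter_field] += 1
            continue
        first = dict(row)  # copy so the input row is not modified
        if counter_field is not None:
            first[counter_field] = 1
        unique.append(first)
        previous_key = key
    return unique, duplicates


rows = [{"name": "Kettle"}, {"name": "KETTLE"}, {"name": "kettle"}, {"name": "Spoon"}]
unique, duplicates = unique_rows(rows, ["name"], ignore_case=True, counter_field="count")
print(unique)      # [{'name': 'Kettle', 'count': 3}, {'name': 'Spoon', 'count': 1}]
print(duplicates)  # [{'name': 'KETTLE'}, {'name': 'kettle'}]
```

With ignore_case left at False, the three spellings would be kept as three separate rows, each with a count of 1.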