PLEASE NOTE: This documentation applies to an earlier version. For the most recent documentation, visit the Pentaho Enterprise Edition documentation site.

Description

The Sort rows step sorts rows based on the fields you specify and on whether they should be sorted in ascending or descending order.
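As an illustration of sorting on several fields with a direction per field, here is a minimal Python sketch. It is not the step's implementation; the function name `sort_rows`, the dictionary row format, and the sample data are assumptions made for this example.

```python
from operator import itemgetter


def sort_rows(rows, sort_fields):
    """Sort dict rows by (field, ascending) pairs; the first pair is most significant."""
    result = list(rows)
    # Python's sort is stable, so sorting on the least significant field first
    # and the most significant field last gives the combined ordering.
    for field, ascending in reversed(sort_fields):
        result.sort(key=itemgetter(field), reverse=not ascending)
    return result


rows = [
    {"country": "NL", "sales": 10},
    {"country": "BE", "sales": 30},
    {"country": "NL", "sales": 25},
    {"country": "BE", "sales": 5},
]

# Country ascending, then sales descending within each country.
sorted_rows = sort_rows(rows, [("country", True), ("sales", False)])
for row in sorted_rows:
    print(row["country"], row["sales"])
```

The order of the field list matters: the first field decides the overall order, and later fields only break ties.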

Notes:

Options

The following table describes the options associated with the Sort rows step:

Option

Description

Step name

Name of the step; this name must be unique within a single transformation.

Sort directory

The directory in which temporary files are stored when they are needed; the default is the standard temporary directory for the system.

TMP-file prefix

Choose an easily recognized prefix so you can identify the files when they show up in the temp directory.

Sort size

The number of rows to keep in memory. The more rows you store in memory, the faster the sorting process, because fewer temporary files must be used and less I/O is generated.
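The interaction between the sort directory, the TMP-file prefix, and the sort size can be sketched as a generic external merge sort. The Python below is an illustration under stated assumptions, not the step's actual code: the function names, the `sort_` prefix, and the use of `pickle` for the temporary files are all hypothetical choices made for this example.

```python
import heapq
import os
import pickle
import tempfile


def _write_chunk(sorted_chunk, sort_directory, tmp_prefix):
    # dir=None makes mkstemp fall back to the system's standard temp directory.
    fd, path = tempfile.mkstemp(prefix=tmp_prefix, suffix=".tmp", dir=sort_directory)
    with os.fdopen(fd, "wb") as handle:
        for row in sorted_chunk:
            pickle.dump(row, handle)
    return path


def _read_chunk(path):
    with open(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                return


def external_sort(rows, key, sort_size, sort_directory=None, tmp_prefix="sort_"):
    """Return (sorted_rows, number_of_temp_files_used)."""
    paths, buffer = [], []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= sort_size:
            # The in-memory buffer is full: sort it and spill it to a temp file.
            paths.append(_write_chunk(sorted(buffer, key=key), sort_directory, tmp_prefix))
            buffer = []
    if not paths:
        # Everything fitted in memory, so no temp files and no extra I/O.
        return sorted(buffer, key=key), 0
    if buffer:
        paths.append(_write_chunk(sorted(buffer, key=key), sort_directory, tmp_prefix))
    try:
        merged = list(heapq.merge(*(_read_chunk(p) for p in paths), key=key))
    finally:
        for p in paths:
            os.remove(p)
    return merged, len(paths)


data = [5, 3, 9, 1, 7, 2, 8]
in_memory, files_small = external_sort(data, key=lambda r: r, sort_size=100)
spilled, files_many = external_sort(data, key=lambda r: r, sort_size=2)
print(files_small, files_many)
```

With a sort size larger than the input, no temporary file is written; with a small sort size, the same input is spread over several files that must be read back and merged, which is where the extra I/O comes from.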

Free memory threshold (in %)

If the sort algorithm finds that the available free memory has dropped below the indicated percentage, it starts to page data to disk.

Note: This is not an exact science, because:

  1. Free memory is checked only every 1000 rows. Depending on the row size and on other steps within complex transformations, this could still lead to an OutOfMemoryError.
  2. In a Java Virtual Machine, it is not possible to know the exact amount of free memory. As such, we recommend that you do not use this option for very complex transformations with other steps and processes that use a lot of memory.
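The periodic check described in this note can be sketched as follows. This is a simplified Python illustration, not the step's code: `free_memory_percent` is a hypothetical probe standing in for whatever estimate the runtime provides, `spill` is a hypothetical callback that pages the buffer to disk, and the readings in the usage part are simulated.

```python
def collect_with_threshold(rows, free_memory_percent, threshold_percent, spill,
                           check_interval=1000):
    """Buffer rows in memory and spill them when free memory looks too low."""
    buffer = []
    for count, row in enumerate(rows, start=1):
        buffer.append(row)
        # Free memory is sampled only every check_interval rows, so memory can
        # still run out between two checks if the rows are large.
        if count % check_interval == 0 and free_memory_percent() < threshold_percent:
            spill(buffer)
            buffer = []
    return buffer


# Simulated free-memory percentages, one per check (hypothetical values).
readings = iter([40, 20, 35])
spilled_batches = []

remaining = collect_with_threshold(
    range(3500),
    free_memory_percent=lambda: next(readings),
    threshold_percent=25,
    spill=lambda batch: spilled_batches.append(len(batch)),
)
print(spilled_batches, len(remaining))
```

In this run the second check reports 20% free memory, which is below the 25% threshold, so the 2000 buffered rows are spilled; nothing is examined for the 999 rows between checks, which is the first weakness the note describes.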


Compress TMP Files

Compresses temporary files when they are needed to complete the sort.
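To show the general trade-off behind compressing temporary files (smaller files at the cost of extra CPU work), here is a small Python sketch. The use of gzip, the `pickle` file format, the `sort_` prefix, and the function names are assumptions for this example; the source does not state which compression the step uses.

```python
import gzip
import os
import pickle
import tempfile


def write_tmp_file(rows, compress, prefix="sort_"):
    """Write rows to a temp file, optionally gzip-compressed, and return its path."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".tmp")
    os.close(fd)
    opener = gzip.open if compress else open
    with opener(path, "wb") as handle:
        pickle.dump(rows, handle)
    return path


def read_tmp_file(path, compress):
    """Read rows back, using the same compression setting as when writing."""
    opener = gzip.open if compress else open
    with opener(path, "rb") as handle:
        return pickle.load(handle)


rows = [{"id": i, "name": "customer"} for i in range(5000)]
plain_path = write_tmp_file(rows, compress=False)
gzip_path = write_tmp_file(rows, compress=True)

plain_size = os.path.getsize(plain_path)
gzip_size = os.path.getsize(gzip_path)
restored = read_tmp_file(gzip_path, compress=True)

os.remove(plain_path)
os.remove(gzip_path)
print(plain_size, gzip_size)
```

Repetitive row data such as this sample compresses well, so the compressed file is noticeably smaller while the rows read back unchanged.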

Only pass unique rows?

Enable to pass only unique rows to the output stream(s).
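Once rows are sorted, duplicates sit next to each other, so they can be dropped in a single pass. The Python sketch below illustrates that idea. It assumes that uniqueness is judged on the sort key fields only, which the text above does not confirm; the function name and sample rows are also hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def unique_sorted_rows(sorted_rows, key_fields):
    """Keep the first row of each run of rows that share the same key values.

    Assumption: only the key fields are compared, not the whole row.
    """
    key = itemgetter(*key_fields)
    return [next(group) for _, group in groupby(sorted_rows, key=key)]


rows = [
    {"id": 2, "city": "Gent"},
    {"id": 1, "city": "Lier"},
    {"id": 2, "city": "Gent"},
    {"id": 1, "city": "Mol"},
]

rows.sort(key=itemgetter("id"))
unique = unique_sorted_rows(rows, ["id"])
print(unique)
```

Under this assumption, the row with `id` 1 and city `Mol` is dropped even though its other field differs, so include every field that should count toward uniqueness in the key list.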

Fields table

Specify the fields to sort on and the direction (ascending/descending) for each. Optionally, you can specify whether to perform a case-sensitive sort.
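The effect of the case-sensitivity setting can be illustrated with a short Python sketch. The function name and sample values are hypothetical, and the case-sensitive ordering shown is Python's code-point ordering; the step's exact collation may differ.

```python
def sort_names(values, case_sensitive):
    """Sort strings either by raw code points or ignoring case."""
    # str.casefold maps each value to a case-insensitive comparison key.
    key = None if case_sensitive else str.casefold
    return sorted(values, key=key)


names = ["banana", "Apple", "cherry", "apple", "Banana"]

sensitive = sort_names(names, case_sensitive=True)
insensitive = sort_names(names, case_sensitive=False)

print(sensitive)
print(insensitive)
```

In the case-sensitive result, all capitalized values come before the lowercase ones; in the case-insensitive result, `Apple` and `apple` sit next to each other.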

Get Fields

Click to retrieve a list of all fields coming in on the stream(s).

Metadata Injection Support

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.