Transformation Deadlocks

Overview

A transformation can run into a deadlock situation (also known as blocking, stalling, or hanging) for different reasons:

  • An external component, such as a table lock held by the database, can block the transformation. This is explained in detail in Database Locks When Reading and Updating from a Single Table.
  • The "Block this step until steps finish" step might run into a deadlock when there are more rows to process than the number of Rows in Rowset. Please see the step documentation for more details.
  • Within a transformation, streams are sometimes split and joined again in a way that blocks by design. This design is particularly dangerous because the transformation may work fine with a limited number of rows (e.g. in development and testing environments) and then run into a deadlock in production. This is explained in more detail here:

Example for a deadlock transformation with the Stream lookup step

In the following example, the transformation hangs when the number of input rows is higher than 10,000, which is the value of the transformation setting "Nr. of rows in rowset". (In more complex scenarios with multiple steps this number varies: it is x times the "Nr. of rows in rowset", where x is the number of hops between the root data step and the joining or lookup step.)

The reason is that every hop can hold up to 10,000 rows in its buffer. The Stream lookup step does not process the rows coming directly from the Generate Rows step until all rows have arrived from the Group by step. Once more than 10,000 rows pile up in that buffer, the Generate Rows step stops sending further rows, so the Group by step never finishes, and neither do the remaining steps.
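The mechanism can be illustrated outside of PDI with a bounded queue (a minimal Python sketch, not PDI code; the buffer size of 5 stands in for the 10,000-row rowset, and the step names in the comments refer to the example above):

```python
import queue

ROWSET = 5   # stand-in for "Nr. of rows in rowset" (10,000 in PDI)
TOTAL = 20   # more input rows than one hop buffer can hold

# The hop from Generate Rows directly to Stream lookup is a bounded buffer.
direct_leg = queue.Queue(maxsize=ROWSET)

blocked_at = None
for row in range(TOTAL):
    # Generate Rows sends each row down both legs. The Group by leg keeps
    # consuming, but Stream lookup will not drain the direct leg until
    # Group by has emitted its end-of-stream result.
    try:
        direct_leg.put_nowait(row)   # a blocking put() would hang forever here
    except queue.Full:
        blocked_at = row
        break

print(f"Generate Rows stalls after {blocked_at} rows; Group by never sees "
      "the remaining input, so it never finishes, so Stream lookup never "
      "starts draining -- deadlock.")
```

The sketch uses `put_nowait` only so that it can report the stall instead of actually hanging; the real transformation blocks at exactly this point.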

Workarounds

One workaround is to increase the buffer size ("Nr. of rows in rowset"), but this is dangerous: if the number of rows to process grows over time, the deadlock will simply return later. This workaround should be avoided.

Another workaround is to separate the input streams and use two identical input streams, as illustrated in the following transformation:

Another option is to split the transformation into separate transformations and write intermediate data to a temporary file or table. A built-in step that can be used as a workaround here is the Blocking step: put it into the stream and select the "Pass all rows" option, as illustrated in the following transformation.

Consider setting the other options of the Blocking step, such as the cache size, according to your needs.
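The effect of this workaround can be sketched with a bounded queue plus an unbounded holding buffer (illustrative Python, not PDI code; the Blocking step's spill-to-disk cache is ignored here):

```python
import queue

ROWSET = 5   # stand-in for "Nr. of rows in rowset"
TOTAL = 20   # more rows than one hop buffer can hold

direct_leg = queue.Queue(maxsize=ROWSET)  # the hop that deadlocked before
blocking_buffer = []  # the Blocking step with "Pass all rows": accepts
                      # every row and holds them all back until end of input

for row in range(TOTAL):
    try:
        direct_leg.put_nowait(row)
    except queue.Full:
        # The Blocking step keeps draining the hop, so the producer is
        # never stuck for good and can deliver the whole input.
        while not direct_leg.empty():
            blocking_buffer.append(direct_leg.get_nowait())
        direct_leg.put_nowait(row)

# End of input: the Blocking step now releases everything downstream.
while not direct_leg.empty():
    blocking_buffer.append(direct_leg.get_nowait())

print(len(blocking_buffer))  # all 20 rows passed through
```

Because the holding buffer is unbounded, the producer can always make progress; the price is that all rows are kept in memory (or in the Blocking step's cache file) until the input is exhausted.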

Example for a deadlock transformation with the Merge join step

This example is similar to the previous one, but the step and the number of hops in between differ. This transformation stalls when the number of rows exceeds 20,002, since there are two hops (buffers of 10,000 rows each) between the split (where the rows are copied) and the merge.
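The stall threshold can be estimated from the buffers along the blocked path. Reading the 20,002 figure as two full hop buffers plus two rows held in flight by the steps themselves (the "+ 2" is our interpretation, not something the documentation states):

```python
ROWSET = 10_000           # "Nr. of rows in rowset"
hops_in_blocked_path = 2  # hops between the split and the merge

# Each hop buffers up to ROWSET rows; a couple of rows are additionally
# in flight inside the steps themselves (assumed to explain 20,002).
threshold = hops_in_blocked_path * ROWSET + 2
print(threshold)  # 20002
```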

The workarounds are the same as described in the Stream lookup example above. With the Blocking step, the transformation would look like this:

It can be a little tricky to decide into which leg the Blocking step should be inserted. This depends on the logic and the Join Type (in this case "left outer") of the Merge join step. Since the Sort rows step already blocks all rows, inserting the Blocking step into the other stream is the obvious and easy choice. Other, more complicated designs may require closer analysis.

References

Related JIRA cases about warning the user in the design phase:

  • PDI-4588 Detect dead lock situations in transformations and jobs
  • PDI-1729 As an ETL developer, I would like a warning when the same table within the same connection and transformation is used for read and write to avoid race conditions.

The samples are attached to this page: BlockingSamples.zip