Overview

A transformations can run into a deadlock situation (aka as blocking, stall, hang) out of different reasons:

Example for a deadlock transformation with the Stream lookup step

In the following example, the transformation hangs when the number of input rows is higher then 10,000 (this is identical to the transformation setting "Nr. of rows in rowset". (In more complex scenarios with multiple steps, this number varies and is x times the "Nr. of rows in rowset" where x is the number of steps between the root data step and the joining or lookup step.)

The reason is that every hop can hold up to 10,000 rows in it's buffer. The Stream lookup step waits for further processing of the rows coming directly from the Generate Rows step until all rows came in from the Group by step. Since we have more than 10,000 rows now in the buffer, the Generate rows step stops sending further rows, so the Group By step will never finish and also the other steps.

Workarounds

One workaround is increasing the buffer size "Nr. of rows in rowset", but this is very dangerous when the number of rows to process will get more over time and this workaround should be avoided.

Separate the input streams and have two identical input streams as illustrated in the following transformation:

Another option is to split the transformation into separate transformations and write intermediate data to a temporary file or table. A build in step that can be used as a workaround here, is the Blocking Step. Put this step into the stream and select the option "Pass all rows" as illustrated in the following transformation.

Consider setting the other options in the Blocking step like Cache size according to your needs.

Example for a deadlock transformation with the Merge join step

This example is similar to the previous but the step is different and the number of hobs in between. This transformation stalls when the number of rows is more than 20,002 since there are two hops (buffers or each 10,000 rows) in between from the split with the copy rows and the merge.

The workarounds are the same as described in the previous Stream lookup step example. When we look at the workaround with the Blocking step, it would look like this:

It is a little bit tricky to decide in what leg the Blocking step should be inserted. This depends on the logic and the Join Type (in this case "left outer") of the Merge Join step. Since the Sort rows step blocks all rows, the decision to insert the Blocking step in the other stream is obvious and easy to decide. But there could be other more complicated design logics.

References

Related JIRA cases to warn the user in the design phase:

The samples are attached to this page: BlockingSamples.zip