Fork-Join parallelization: feature or problem?

Java 8′s ForkJoinPool & parallel streams are receiving some serious attention now. They seem a sophisticated concept to parallelize work, but some are warning of serious limitations.

Edward Harned of CoopSoft, in particular, has analyzed the Fork/Join design implemented in Java.

He documents numerous problems with the suitability, efficiency & scalability of the “work stealing” algorithm it implements.

Key problems include:

  • work-stealing causes high contention
  • work-stealing is inefficient at parallelizing jobs
  • I/O, long-running or blocking tasks do not schedule efficiently
  • “recursive decomposition” overflows with uneven structures
  • submission queues become a bottleneck
  • framework may spawn hundreds of “continuation” threads
  • tasks may run on unexpected threads
  • ForkJoinPool is unreliable in EJB containers

There are also severe reservations about the efficiency of scheduler, and the handling of blocking tasks.

Essentially, his position is that Fork/Join is not “Parallel Processing”; instead, it’s a narrow and specialized technique, not applicable for general applications programming but only to a narrow category of “recursive decomposition” tasks.

We should consider our requirements carefully, and remember not to ignore alternatives such as ThreadPoolExecutor.

Are you experimenting with parallel streams? Would these problems affect your applications? Add your comment now.

See:
Edward Harned: A Java Form-Join Calamity
Pierre-yves Saumont: What’s Wrong in Java 8, Parallel Streams
Stack Overflow: Performance problems using new Fork Join framework

One thought on “Fork-Join parallelization: feature or problem?”

  1. Carsten Habicht asked a very good question:

    Bad rant or insightful?

    But if we read the JDK’s own provided examples for ForkJoin usage, we may be able to make our own mind up.

    Oracle’s ‘RecursiveTask’ example, they themselves describes as a “dumb” way to calculate Fibonacci & a linear algorithm would be faster in practice.

    Oracle’s RecursiveAction ‘sumOfSquares’ example introduces both manual scheduling tweaks, task tracking & execution.. no clean algorithm here.

    The only reason I can presume for Oracle to require all these tweaks, would be poor/failed performance without. Such problems even with the documentation examples (!) tend to support Harned’s thesis.

    - RecursiveTask Javadoc
    - RecursiveAction Javadoc

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>